Technology
CA-BED: Conversation-Aware Bayesian Experimental Design
CA-BED is a Bayesian dialog planning framework that improves LLM question selection in interactive contexts. It boosts success rates by 21.8% with minimal extra conversational turns.
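As a rough illustration of the general idea behind Bayesian experimental design for question selection (not CA-BED's actual method, which the summary does not specify), the sketch below picks the question with the highest expected information gain over a small discrete set of hypotheses. The function names, the question labels, and the toy likelihood tables are all hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_information_gain(prior, likelihood):
    """Expected entropy reduction from asking one question.

    likelihood[h][a] is the assumed P(answer a | hypothesis h).
    """
    n_hyp = len(prior)
    n_answers = len(likelihood[0])
    eig = entropy(prior)
    for a in range(n_answers):
        # Marginal probability of observing answer a.
        p_a = sum(prior[h] * likelihood[h][a] for h in range(n_hyp))
        if p_a == 0:
            continue
        # Bayes update of the belief given answer a.
        posterior = [prior[h] * likelihood[h][a] / p_a for h in range(n_hyp)]
        eig -= p_a * entropy(posterior)
    return eig

def select_question(prior, questions):
    """Return the question name with the largest expected information gain."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))

# Toy setup: four equally likely hypotheses, yes/no answers (illustrative only).
prior = [0.25, 0.25, 0.25, 0.25]
questions = {
    "halves": [[1, 0], [1, 0], [0, 1], [0, 1]],       # splits hypotheses 2 vs 2
    "singles_out": [[1, 0], [0, 1], [0, 1], [0, 1]],  # splits hypotheses 1 vs 3
    "uninformative": [[0.5, 0.5]] * 4,                # answer ignores the hypothesis
}
best = select_question(prior, questions)
```

In this toy setup the even split is the most informative question, which matches the usual intuition that a good question divides the remaining hypotheses as evenly as possible.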
Topological Ignorability for Structural Causal Effects Beyond Means
This paper introduces topological ignorability and geometric causal metrics to detect structural distribution changes beyond means. Validated under hidden confounding, these methods identify effects missed by traditional mean-based approaches.
pcbGPT: Automatic PCB Schematic Synthesis from Natural Language Requirements
pcbGPT synthesizes editable KiCad schematics from natural language using a Python DSL and validation tools. It achieves high accuracy on embedded tasks but still requires expert review for reliability.
Low-Resource Safety Failures Are Action Failures, Not Representation Failures
Safety failures in low-resource languages stem from action deficits, not representation gaps. Recalibrating a gate with a few examples significantly improves refusal rates without retraining.
Implicit Geographic Inference in LLM Medical Triage: Language-Driven Disparities in Emergency Recommendations
LLMs exhibit language-driven disparities in medical triage, with ER recommendations varying by input language. These biases stem from implicit geographic inference, as adding location anchors significantly alters emergency advice.
DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation
DiscourseFlip is a stealthy, black-box attack that manipulates opinions across broad query networks in RAG systems. It outperforms baselines in efficacy and evasion, exposing critical gaps in current defenses.
TECCI: Tricky Edits of Collected and Curated Images
TECCI is a rigorous benchmark exposing weaknesses in image editing models, where no model exceeded 22% success. Nano Banana Pro emerged as the top performer in this challenging evaluation.
Fine-Tuning Diffusion Models for Molecular Generation via Reinforcement Learning and Fast Sampling
FTDiff fine-tunes diffusion models with reinforcement learning and fast sampling to efficiently generate high-quality, structurally constrained molecules. It outperforms existing methods without costly post-processing.
Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs
APEIRIA distills neuro-symbolic programs into 3D multi-modal LLMs (MLLMs) via a three-stage curriculum, combining transparent symbolic reasoning with open-vocabulary flexibility. It outperforms prior neuro-symbolic 3D (NS3D) methods while matching state-of-the-art 3D MLLM performance on spatial reasoning benchmarks.
Connecting the Dots: Benchmarking Reflective Memory in Long-Horizon Dialogue
This paper introduces RefMem-Bench to evaluate reflective memory in long dialogues and REMIND, a framework that enhances models' ability to synthesize fragmented signals into sophisticated interpretations.
Hybrid Imbalanced Regression Through Unified Data-Level and Algorithm-Level Balancing
This paper introduces a hybrid framework for imbalanced regression, unifying data-level and algorithm-level balancing. It enhances predictive accuracy by combining adaptive binning, representation learning, and a novel latent-density weighted loss.
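To make the idea of a density-weighted loss concrete, here is a minimal sketch that weights squared errors by the inverse frequency of each target's histogram bin. This is a simplification: the paper's loss is described as using latent density, whereas this stand-in uses label density, and the function names and bin count are hypothetical.

```python
from collections import Counter

def inverse_density_weights(targets, n_bins=5):
    """Weight each sample by the inverse count of its target-value bin.

    Weights are rescaled so that they average to 1.
    """
    lo, hi = min(targets), max(targets)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all targets identical: everything falls in bin 0
    # Clamp the maximum value into the last bin.
    bins = [min(int((t - lo) / width), n_bins - 1) for t in targets]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def weighted_mse(preds, targets, weights):
    """Mean squared error with per-sample weights."""
    return sum(w * (p - t) ** 2 for p, t, w in zip(preds, targets, weights)) / len(targets)

# Toy data: four common targets near 1 and one rare target at 9 (illustrative only).
targets = [1.0, 1.1, 1.2, 1.3, 9.0]
weights = inverse_density_weights(targets, n_bins=2)
loss = weighted_mse([1.0, 1.0, 1.0, 1.0, 1.0], targets, weights)
```

With this weighting, an error on the rare target contributes more to the loss than the same error on a common target, which is the basic mechanism by which algorithm-level balancing counteracts a skewed target distribution.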
Understanding LLM Behavior in Multi-Target Cross-Lingual Summarization
This study introduces MEA, a 24-language benchmark for multi-target cross-lingual summarization (MTXLS), and finds that LLMs process summarization and translation concurrently. An activation steering technique leveraging English summaries significantly improves cross-lingual output quality.
PALTO: Physics-Informed Active Learning for Tri-Gate FinFET Design Optimization for Vertical Power Delivery
PALTO optimizes GaN tri-gate FinFETs via physics-informed active learning, identifying two devices with superior switching efficiency and drive current compared to industrial benchmarks.
DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance
DeepIPCv3 fuses LiDAR and dynamic vision sensor (DVS) data via Transformer attention to avoid collisions with suddenly crossing pedestrians. It achieves state-of-the-art, lighting-independent safety by eliminating motion blur and reducing control errors.
IndoBias: A Dual Track Culturally Grounded Benchmark for LLMs Bias Evaluation in Indonesian Languages
IndoBias evaluates LLM bias in Indonesian and regional languages using a dual-track cultural benchmark. Results show decoder models favor prototypical Indonesian sentences, while local languages trigger higher ideological bias.
RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning
POPO boosts LLM reasoning by prioritizing effective off-policy samples via group replay and decoupled importance sampling. This reduces rollout costs while maintaining robust performance across diverse reasoning domains.
Knowledge-Intensive Video Generation
KIVI introduces a framework and benchmark to evaluate video generation's factual accuracy and helpfulness. Results show current models lag behind humans in delivering clear, reliable instructional content.
Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning
This study reveals that latent visual reasoning gains stem from boundary markers and attention patterns, not visual memory. Retaining only markers preserves most performance, challenging the assumption that latent tokens encode visual evidence.
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution
BenchEvolver synthesizes challenging coding tasks by evolving solutions, effectively differentiating top-tier LLMs. This approach enables scalable benchmark creation and improves model performance via reinforcement learning.
What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression
This paper unifies the analysis of knowledge transfer through a spectral analysis of SGD, revealing how spectral horizon expansion and denoising drive its effectiveness in high-dimensional linear regression.