Global News Digest

Technology

arXiv

CA-BED: Conversation-Aware Bayesian Experimental Design

CA-BED is a Bayesian dialog planning framework that improves LLM question selection in interactive contexts. It boosts success rates by 21.8% with minimal extra conversational turns.
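For readers unfamiliar with Bayesian experimental design, the sketch below shows the generic idea of choosing the question with the highest expected information gain. It is an illustration only, not CA-BED's actual method: the function names, the discrete hypothesis set, and the toy likelihood tables are all assumptions made for this example.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_information_gain(prior, likelihood):
    """Expected entropy reduction from asking one question.

    prior[h] is P(hypothesis h); likelihood[h][a] is
    P(answer a | hypothesis h). Both are hypothetical inputs.
    """
    n_hyp = len(prior)
    n_answers = len(likelihood[0])
    eig = entropy(prior)
    for a in range(n_answers):
        # Marginal probability of observing answer a.
        p_a = sum(prior[h] * likelihood[h][a] for h in range(n_hyp))
        if p_a == 0:
            continue
        # Bayes update: posterior over hypotheses given answer a.
        posterior = [prior[h] * likelihood[h][a] / p_a for h in range(n_hyp)]
        eig -= p_a * entropy(posterior)
    return eig

def select_question(prior, questions):
    """Return the name of the question with the highest expected gain."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))

# Toy setup: four equally likely hypotheses, yes/no answers.
prior = [0.25, 0.25, 0.25, 0.25]
questions = {
    "splits_in_half": [[1, 0], [1, 0], [0, 1], [0, 1]],
    "isolates_one": [[1, 0], [0, 1], [0, 1], [0, 1]],
    "uninformative": [[0.5, 0.5]] * 4,
}
best = select_question(prior, questions)
```

In this toy setup the question that splits the hypotheses evenly is selected, since it removes one full bit of uncertainty whatever the answer.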

arXiv

Topological Ignorability for Structural Causal Effects Beyond Means

This paper introduces topological ignorability and geometric causal metrics to detect structural distribution changes beyond means. Validated under hidden confounding, these methods identify effects missed by traditional mean-based approaches.

arXiv

pcbGPT: Automatic PCB Schematic Synthesis from Natural Language Requirements

pcbGPT synthesizes editable KiCad schematics from natural language using a Python DSL and validation tools. It achieves high accuracy on embedded tasks but still requires expert review for reliability.

arXiv

Low-Resource Safety Failures Are Action Failures, Not Representation Failures

Safety failures in low-resource languages stem from action deficits, not representation gaps. A gate recalibrated on only a few examples significantly improves refusal rates without retraining.

arXiv

Implicit Geographic Inference in LLM Medical Triage: Language-Driven Disparities in Emergency Recommendations

LLMs exhibit language-driven disparities in medical triage, with ER recommendations varying by input language. These biases stem from implicit geographic inference, as adding location anchors significantly alters emergency advice.

arXiv

DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation

DiscourseFlip is a stealthy, black-box attack that manipulates opinions across broad query networks in RAG systems. It outperforms baselines in efficacy and evasion, exposing critical gaps in current defenses.

arXiv

TECCI: Tricky Edits of Collected and Curated Images

TECCI is a rigorous benchmark exposing weaknesses in image editing models, where no model exceeded 22% success. Nano Banana Pro emerged as the top performer in this challenging evaluation.

arXiv

Fine-Tuning Diffusion Models for Molecular Generation via Reinforcement Learning and Fast Sampling

FTDiff fine-tunes diffusion models via reinforcement learning and fast sampling to efficiently generate high-quality, structurally constrained molecules. It outperforms existing methods without costly post-processing.

arXiv

Distilling Neuro-Symbolic Programs into 3D Multi-modal LLMs

APEIRIA distills neuro-symbolic logic into 3D MLLMs via a three-stage curriculum, merging transparent symbolic reasoning with open-vocabulary flexibility. It outperforms prior NS3D methods while matching state-of-the-art 3D MLLM performance on spatial reasoning benchmarks.

arXiv

Connecting the Dots: Benchmarking Reflective Memory in Long-Horizon Dialogue

This paper introduces RefMem-Bench, a benchmark for evaluating reflective memory in long dialogues, and REMIND, a framework that enhances models' ability to synthesize fragmented signals into sophisticated interpretations.

arXiv

Hybrid Imbalanced Regression Through Unified Data-Level and Algorithm-Level Balancing

This paper introduces a hybrid framework for imbalanced regression, unifying data-level and algorithm-level balancing. It enhances predictive accuracy by combining adaptive binning, representation learning, and a novel latent-density weighted loss.

arXiv

Understanding LLM Behavior in Multi-Target Cross-Lingual Summarization

This study introduces MEA, a 24-language benchmark for multi-target cross-lingual summarization (MTXLS), and finds that LLMs process summarization and translation concurrently. An activation steering technique leveraging English summaries significantly improves cross-lingual output quality.

arXiv

PALTO: Physics-Informed Active Learning for Tri-Gate FinFET Design Optimization for Vertical Power Delivery

PALTO optimizes GaN tri-gate FinFETs via physics-informed active learning, identifying two devices with superior switching efficiency and drive current compared to industrial benchmarks.

arXiv

DeepIPCv3: Event-Aware Multi-Modal Sensor Fusion for Sudden Pedestrian Crossing Avoidance

DeepIPCv3 fuses LiDAR and dynamic vision sensor (DVS) data via Transformer attention to prevent sudden pedestrian collisions. It achieves state-of-the-art safety regardless of lighting conditions by eliminating motion blur and reducing control errors.

arXiv

IndoBias: A Dual Track Culturally Grounded Benchmark for LLMs Bias Evaluation in Indonesian Languages

IndoBias evaluates LLM bias in Indonesian and regional languages using a dual-track cultural benchmark. Results show decoder models favor prototypical Indonesian sentences, while local languages trigger higher ideological bias.

arXiv

RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning

POPO boosts LLM reasoning by prioritizing effective off-policy samples via group replay and decoupled importance sampling. This reduces rollout costs while maintaining robust performance across diverse reasoning domains.

arXiv

Knowledge-Intensive Video Generation

KIVI introduces a framework and benchmark to evaluate video generation's factual accuracy and helpfulness. Results show current models lag behind humans in delivering clear, reliable instructional content.

arXiv

Beyond Visual Memory: Mechanistic Diagnostics of Latent Visual Reasoning

This study reveals that latent visual reasoning gains stem from boundary markers and attention patterns, not visual memory. Retaining only markers preserves most performance, challenging the assumption that latent tokens encode visual evidence.

arXiv

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

BenchEvolver synthesizes challenging coding tasks by evolving solutions, effectively differentiating top-tier LLMs. This approach enables scalable benchmark creation and improves model performance via reinforcement learning.

arXiv

What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

This paper presents a unified spectral analysis of knowledge transfer under SGD, showing how spectral horizon expansion and denoising drive its effectiveness in high-dimensional linear regression.