Global News Digest

Technology

arXiv

ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL

ReSkill harmonizes skill creation with policy optimization in Agentic RL using assertion-driven updates and Thompson Sampling. It outperforms existing methods by ensuring skills co-evolve with the policy, significantly boosting performance on unseen tasks.

arXiv

MobEvolve: An Agentic Self-Evolving Heuristic System for Interpretable Human Mobility Generation

MobEvolve is an agentic, self-evolving heuristic system that uses LLM agents to refine mobility models. It outperforms existing methods in realism, interpretability, and efficiency on Singapore and Montreal datasets.

arXiv

Evidence-Gated LLM Priors for Multi-Objective Bayesian Optimization

This study introduces an evidence-gated framework for multi-objective Bayesian optimization, dynamically calibrating LLM priors via objective-specific reputation markets. Results show this approach enhances robustness over static priors, though raw LLM confidence proves inconsistent across benchmarks.

arXiv

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization

S-SPPO stabilizes Self-Play Preference Optimization via semantic calibration, preventing policy degeneration. It achieves superior AlpacaEval 2.0 performance with Llama-3-8B without extra human annotations.

arXiv

TrafficRAG: A Multimodal RAG Framework for Traffic Accident Liability Determination

TrafficRAG is a multimodal RAG framework that combines vision-language models with hybrid retrieval to automate traffic accident liability determination. It outperforms baselines, achieving 77.32% legal accuracy and 81.71% factual faithfulness.

arXiv

Characterization of Multi-Model Agentic AI Systems on General Tasks via Trace-Driven Simulation

This study introduces GAIATrace and Vidur-Agent to simulate and analyze multi-model agentic AI systems. These tools enable reproducible, cost-effective evaluation of system dynamics and design choices on general tasks.

arXiv

TriAlign: Towards Universal Truth Consistency in Personalized LLM Alignment

TriAlign introduces Truth-Invariant Alignment via multi-agent RL, balancing personalized LLM outputs with universal truth consistency. It reduces factual disparities across social groups while maintaining high personalization quality.

arXiv

EvoBrain: Continual Learning of EEG Foundation Models Across Heterogeneous BCI Tasks

EvoBrain enables continual learning for EEG foundation models via Neuro-Spectral Task Normalization and Response-Affinity Distillation. It outperforms state-of-the-art methods across six BCI tasks, achieving unified decoding with minimal catastrophic forgetting.

arXiv

Stochastic convergence of parallel asynchronous adaptive first-order methods

This paper analyzes the stochastic convergence of parallel asynchronous adaptive first-order methods for non-convex optimization, proving an O(1/√t) rate. Empirical results confirm their suitability for large-scale, heterogeneous machine learning environments.

arXiv

Structure-Guided Adaptive Propagation for Protein-Protein Interaction Site Prediction

SGAP-PPIS uses structure-guided adaptive propagation to dynamically adjust information diffusion for accurate protein-protein interaction site prediction. It outperforms rigid models by leveraging equivariant graph neural networks to tailor propagation to local geometric contexts.

arXiv

Consistency evaluation of benchmarks used for causal discovery

This study introduces an LLM-based workflow to verify 11 causal discovery benchmarks against 38,081 papers, revealing significant discrepancies with current literature. These findings highlight critical reliability issues in widely used benchmarks for causal discovery research.

arXiv

Token Predictors Are Not Planners: Building Physically Grounded Causal Reasoners

The authors introduce Causal-Plan-Bench and Causal Planner to shift embodied AI from token prediction to physical causal reasoning. Their model achieves superior performance by internalizing physical logic, validating a Causal Scaling Law.

arXiv

OctoT2I: A Self-Evolving Agentic Text-to-Image Router

OctoT2I is a self-evolving agentic router that optimizes text-to-image generation via an unsupervised, multi-round routing strategy. It achieves superior speed and energy efficiency while maintaining high-quality outputs without human supervision.

arXiv

Evaluation of Baseline Methods for IDD-based SSD External Memory Search

This study evaluates simple baseline methods for SSD-based A* search using immediate duplicate detection (IDD), addressing gaps in prior research on external memory search strategies.

arXiv

CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback

CAPF guides LLM search agents using verifier-side feedback to repair failed rollouts, boosting Qwen3-4B’s QA accuracy from 44.7% to 48.5% across seven benchmarks.

arXiv

EVA-Net: Subject-Independent EEG Motor Decoding with Video-Derived Motor Priors

EVA-Net uses video-derived motor priors to align EEG features, enabling robust subject-independent decoding. It achieves an 8.66% leave-one-subject-out (LOSO) accuracy gain on EEGMMI, outperforming text-based methods.

arXiv

Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction

This study benchmarks quantized and sparse LLMs using conformal prediction, revealing that compression decouples accuracy from uncertainty reliability. It urges incorporating uncertainty-aware metrics into model compression workflows for safer deployment.

arXiv

Absorbing Complexity: An Interaction-Native Knowledge Harness for Financial LLM Agents

InKH is a framework for financial LLM agents that internalizes complexity via structured memory, reducing latency by 83% and token costs by 82% while improving task quality and auditability.

arXiv

WorldCoder-Bench: Benchmarking Physically Grounded 3D World Synthesis

WorldCoder-Bench benchmarks LLMs on synthesizing physically grounded 3D worlds using StateProbe to verify hidden runtime states. Results show top models achieve low verification coverage, highlighting significant challenges in generating robust, interactive 3D environments.

arXiv

Community-Aware Assessment of Social Textual Engagement and Resonance: A Human-Centric Perspective on User-Generated Content Evaluation

The study introduces CASTER and MEDEA, a human-centric framework using Social-CoT to assess the resonance of user-generated content (UGC) via simulated community personas. MEDEA outperforms baselines on the new CASTER-Bench with empathetic, interpretable reasoning.