Global News Digest

Technology

arXiv

TrafficClaw: A Generalizable LLM Agent in the Unified Physical Environment for Urban Traffic Control

TrafficClaw is an LLM agent for unified urban traffic control, using spatiotemporal reasoning and RL to optimize interconnected subsystems. It demonstrates robust generalization and cross-subsystem coordination across six tasks in three major cities.

arXiv

Perspective on Bias in Biomedical AI: Preventing Downstream Healthcare Disparities

Biomedical AI risks exacerbating healthcare disparities due to biased, non-representative training data. Adopting transparency and provenance standards is essential to mitigate these inequities.

arXiv

Neural Decision-Propagation for Answer Set Programming

Neural Decision-Propagation (NDProp) replaces classical ASP solvers with neural networks and fuzzy logic to improve scalability. It efficiently learns stable models, enhancing accuracy in neuro-symbolic benchmarks.

arXiv

Trustworthy AI Suffers from Invariance Conflicts and Causality is The Solution

This paper argues that causal reasoning resolves invariance conflicts among trustworthy AI goals like fairness and robustness. It offers a framework to mitigate trade-offs in both ML and foundation models.

arXiv

From Features to Actions: Explainability in Traditional and Agentic AI Systems

This study contrasts static feature attribution with trace-based diagnostics, revealing that the latter effectively diagnoses agentic AI failures. It advocates for trajectory-level explainability to evaluate autonomous behaviors.

arXiv

KnowledgeBerg: Evaluating Systematic Knowledge Coverage and Compositional Reasoning in Large Language Models

KnowledgeBerg benchmarks LLMs on systematic knowledge coverage and compositional reasoning, revealing significant performance deficits across models and languages. Test-time compute and retrieval bring improvements, but limitations in structured knowledge management persist.

arXiv

ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor for Inductive Logic Programming

ANDRE is a neuro-symbolic ILP framework using attention-based differentiable operators to extract interpretable rules from noisy, probabilistic data. It outperforms existing methods in stability and rule quality.

arXiv

The Refusal-Compliance Tradeoff: A Large-Scale Safety Behavior Audit of Large Language Models

This audit reveals that LLMs trade off over-refusal of safe queries against harmful compliance, with safety behaviors driven more by post-training objectives than by architecture.

arXiv

Towards a Virtual Neuroscientist: Autonomous Neuroimaging Analysis via Multi-Agent Collaboration

NEXUS is an autonomous multi-agent framework for neuroimaging that dynamically adapts workflows to improve biomarker prediction. It outperforms static pipelines on ADHD-200 and ADNI datasets through collaborative, code-centric execution and hierarchical quality control.

arXiv

Causal state binding predicts action control in language agents

The study introduces "causal state binding" to test whether language agents’ internal states genuinely drive their actions. Results show structured agents outperform controls, improving constraint-clean issue localization.

arXiv

RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation

RADAR is a redundancy-aware diffusion framework that iteratively generates adaptive multi-agent communication topologies. It outperforms baselines in accuracy, robustness, and token efficiency across six benchmarks.

arXiv

CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing

CVEvolve is an autonomous, zero-code framework using LLMs to discover algorithms for unstructured scientific data. It outperforms baselines and generalizes better, empowering scientists to process complex images without coding expertise.

arXiv

MMSkills: Towards Multimodal Skills for General Visual Agents

MMSkills introduces reusable multimodal procedural knowledge for visual agents, using state-conditioned packages to enhance runtime decision-making. It converts public trajectories into skills via an agentic generator and branch-loaded deployment.

arXiv

Herculean: An Agentic Benchmark for Financial Intelligence

Herculean is a new benchmark evaluating AI agents' financial skills across trading, hedging, insights, and auditing. It reveals that while agents handle trading well, they struggle with complex, long-horizon tasks like hedging and auditing.

arXiv

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

ECC enhances LLM capability assessment by refining semantic embeddings with posterior model comparisons. It outperforms baselines by ~18% and improves downstream applications like query routing.

arXiv

Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps

This benchmark reveals that top deep research agents struggle with expert consulting tasks, achieving acceptance rates below 22%. The study highlights significant gaps in multi-document analysis and susceptibility to cognitive traps.

arXiv

Coding Agent Is Good As World Simulator

This paper introduces a coding agent framework that generates physics-based world models via executable code, outperforming video-based models in physical accuracy and visual quality for simulations.

arXiv

Ethical Hyper-Velocity (EHV): A Hardware-Rooted Zero-Trust Runtime Enforcement Architecture for Agentic AI Systems

EHV is a hardware-rooted architecture for agentic AI that enforces policies in O(1) time using TEEs and formal verification. It eliminates governance latency, ensuring safety without compromising deployment speed.

arXiv

Towards a General Intelligence and Interface for Wearable Health Data

Researchers developed a foundation model for wearable health data, pretrained on massive unlabeled datasets, to improve health predictions. Integrated with LLM agents, it enables a Personal Health Agent for context-aware, safe insights.

arXiv

LLM-Guided Communication for Cooperative Multi-Agent Reinforcement Learning

LMAC uses LLMs to optimize communication protocols, enabling agents to accurately reconstruct shared states. This approach significantly improves performance and state recovery in cooperative multi-agent reinforcement learning benchmarks.