Technology News - Global News Digest

arXiv

POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems

June 2, 2026 · Iñaki Dellibarda Varela, R. Sendra-Arranz, Pablo Romero-Sorozabal, J. M. Valverde-García, Annemarie F. Laudanski, Álvaro Gutiérrez, Eduardo Rocon, Manuel Cebrian

POIROT has multi-agent systems audit themselves for failures by interrogating their own agents, outperforming single-LLM evaluators. This open-source protocol enables internal safety oversight without external judgment.

arXiv

Forget Attention: Importance-Aware Attention Is All You Need

June 2, 2026 · Soohyeong Shin, Yeongwook Yang

SISA integrates SSM-derived importance into attention scores, outperforming Transformers and Mamba-3 in retrieval accuracy and speed. It achieves perfect NIAH scores while maintaining standard SDPA efficiency, establishing a new score-level fusion paradigm.

arXiv

Repair Before Veto: Repair-Augmented Constraint Learning for Contextual Decisions

June 2, 2026 · Yifan Wang

RACL learns to repair candidates before vetoing, reducing false rejections. It outperforms baselines by integrating known modifications into constraint learning.

arXiv

Coordination Graphs for Constrained Multi-Agent Reinforcement Learning

June 2, 2026 · Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson

CG-CMARL uses coordination graphs and Lagrangian duality to solve constrained multi-agent RL efficiently. It scales to large teams and generates Pareto fronts without retraining, outperforming baselines in cooperative navigation.

arXiv

COMAP: Co-Evolving World Models and Agent Policies for LLM Agents

June 2, 2026 · Youwei Liu, Jian Wang, Hanlin Wang, Wenjie Li

COMAP co-evolves LLM agent policies and textual world models via closed-loop interaction, enabling dynamic adaptation. It outperforms baselines, with a reported improvement of 16.75% on Qwen3-4B.

arXiv

MOC: Multi-Order Communication in LLM-based Multi-Agent Systems

June 2, 2026 · Yao Guan, Lin Wang, Zhihu Lu, Ziyi Wang, Wenzhu Yan, Qiang Duan

MOC enhances LLM multi-agent systems by formalizing multi-hop communication and merging semantics to preserve evidence fidelity. It consistently improves task performance while reducing communication costs across diverse datasets.

arXiv

SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training

June 2, 2026 · Zhongyu He, Yuanfan Li, Fei Huang, Tianyu Chen, Siyuan Chen, Xingyang Li, Meng Hsuan Yu, Xiangrong Liu, Leyi Wei, Lu Pan, Ke Zeng, Xunliang Cai

SIRI enables LLM agents to autonomously discover, validate, and internalize skills, eliminating external dependencies. It boosts performance on ALFWorld and WebShop benchmarks by distilling beneficial skills into the base policy.

arXiv

A Mathematical Conflict Framework for Contextual Data Modulation

June 2, 2026 · Hakan Emre Kartal

This paper proposes a generalized operator-based framework treating conflict as an independent, context-dependent metric. It unifies weighting and mapping under a single abstract operator, offering a versatile, algorithm-agnostic foundation for data modulation.

arXiv

Spatial Representation Learning Beyond Pixels: Unifying Raster Data and Vector Semantics for Human-Centric Geospatial Foundation Models

June 2, 2026 · Steffen Knoblauch, Hao Li, Gengchen Mai, Konstantin Klemmer, Song Gao, WenWen Li

This paper advocates for unifying raster imagery and vector semantics in geospatial foundation models. It argues that joint spatial representation learning captures human-centric insights missing in pixel-only approaches.

arXiv

AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design

June 2, 2026 · Sahil Rahman, Maxx Richard Rahman

AgentPLM integrates reasoning-augmented decoding and contrastive policy optimization to enable agentic, feedback-driven protein sequence design. It achieves state-of-the-art performance by dynamically correcting errors using external biophysical tools.

arXiv

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

June 2, 2026 · Pengcheng Jiang, Zhiyi Shi, Kelly Hong, Xueqiang Xu, Jiashuo Sun, Jimeng Sun, Hammad Bashir, Jiawei Han

Harness-1 uses RL to train a search agent where a harness manages state, boosting curated recall to 0.730. It outperforms open subagents by 11.4 points and rivals larger models.

arXiv

Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback

June 2, 2026 · Giulia Pucci, Emily Hemendinger, Ruizhe Li, Gavin Abercrombie, Tanvi Dinkar, Arabella Sinclair

This study reveals LLMs’ dangerous tendency to accommodate unsafe eating disorder queries, identifying linguistic markers that trigger hazardous outputs. It highlights the critical need for clinical oversight to mitigate these risks.

arXiv

Bridging the Sim-to-Real Gap in Semiconductor Visual Program Synthesis via Input Binarization

June 2, 2026 · Yusuke Ohtsubo, Kota Dohi, Koichiro Yawata, Koki Takeshita, Tatsuya Sasaki

This study bridges the sim-to-real gap in semiconductor visual program synthesis by binarizing SEM inputs. This technique improves the mean Dice coefficient from 0.4393 to 0.5256, enhancing geometric precision.
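For readers unfamiliar with the metric, the Dice coefficient of two binary masks is twice the size of their overlap divided by the sum of their sizes, so it ranges from 0 (no overlap) to 1 (identical masks). The sketch below is the standard definition only, not the study's evaluation code; the function name, the list-of-rows mask layout, and the choice to score two empty masks as 1.0 are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are given as lists of rows (hypothetical layout chosen for this
    sketch); any truthy cell counts as foreground.
    """
    flat_pred = [bool(v) for row in pred for v in row]
    flat_target = [bool(v) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of cells")

    # Cells that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumption: two empty masks are treated as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_coefficient(predicted, reference))
```

In the toy masks above, one cell overlaps and the masks contain two and one foreground cells, giving 2 · 1 / 3.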

arXiv

LLM-Evolved Pattern Generators for Optimal Classical Planning

June 2, 2026 · Windy Phung, Dominik Drexler, Arnaud Lequen, Jendrik Seipp

This paper introduces LLM-evolved pattern generators that learn admissible, domain-dependent heuristics for optimal classical planning. The approach achieves state-of-the-art coverage with negligible overhead by synthesizing programs for saturated cost partitioning.

arXiv

Beyond One-shot: AI Agents for Learning in Field Experiments

June 2, 2026 · Junjie Luo, Ritu Agarwal, Gordon Gao

This study shows AI agents outperform humans in optimizing healthcare messaging by learning from experimental data. General LLMs failed without this specific context, indicating that domain data is crucial for success.

arXiv

HLL: Can Agents Cross Humanity's Last Line of Verification?

June 2, 2026 · Xinhao Song, Su Su, Sirui Song, Hongliang Wu, Wen Shen, Zhihua Wei, Gongshen Liu, Linfeng Zhang, Dongrui Liu

HLL benchmarks eight multimodal agents on CAPTCHA verification, revealing their fragility in realistic GUI environments. The study highlights deficiencies in localization and action calibration, showing agents struggle to replace humans in secure workflows.

arXiv

AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents

June 2, 2026 · Yiheng Shu, Bernal Jiménez Gutiérrez, Saisri Padmaja Jonnalagedda, Yuguang Yao, Huan Sun, Yu Su

AgentCL introduces a rigorous framework for evaluating continual learning in language agents via controlled, reusable task streams. It also presents MemProbe to analyze how memory designs impact learning across diverse tasks.

arXiv

Iteris: Agentic Research Loops for Computational Mathematics

June 2, 2026 · Leheng Chen, Zihao Liu, Wanyi He, Bin Dong

Iteris, an agentic AI system, advances computational mathematics by combining proofs with numerical experimentation. It successfully generated verified results for two open problems, demonstrating AI's potential in research workflows.

arXiv

RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering

June 2, 2026 · Yuyang Li, Zihe Yan, Tobias Käfer

RASER reduces multi-hop QA costs by selectively escalating to expensive retrieval only when needed, cutting token usage by over half. It maintains competitive accuracy across benchmarks without requiring extra LLM calls for routing decisions.

arXiv

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

June 2, 2026 · Wenhao Wang, Peizhi Niu, Gongyi Zou, Xiyuan Yang, Jingxing Wang, Haoting Shi, Yaxin Du, Jingyi Chai, Xianghe Pang, Shuo Tang, Yanfeng Wang, Siheng Chen

MCP-Persona is the first benchmark evaluating LLM agents on real-world personal applications like Reddit and Slack. It reveals significant performance gaps in using customized MCP tools.