Technology News - Global News Digest

arXiv

Breaking the Reversal Curse in Autoregressive Language Models via Identity Bridge

June 2, 2026 · Xutao Ma, Yixiao Huang, Hanlin Zhu, Somayeh Sojoudi

The paper introduces an "Identity Bridge" training method to overcome the reversal curse in LLMs. This low-cost approach enables models to learn higher-level rules, significantly improving logical reasoning performance.

arXiv

Beyond End-to-End Video Models: An LLM-Based Multi-Agent System for Educational Video Generation

June 2, 2026 · Lingyong Yan, Jiulong Wu, Dong Xie, Weixian Shi, Deguo Xia, Jizhou Huang

LASEV is an LLM-based multi-agent system that generates precise educational videos by orchestrating specialized agents for reasoning, visualization, and narration. It ensures logical accuracy and synchronization through structured script assembly rather than direct pixel generation.

arXiv

Prototype Transformer: Towards Language Model Architectures Interpretable by Design

June 2, 2026 · Yordan Yordanov, Matteo Forasassi, Bayar Menzat, Ruizhi Wang, Chang Qi, Markus Kaltenberger, Amine M'Charrak, Tommaso Salvatori, Thomas Lukasiewicz

The Prototype Transformer replaces self-attention with linear-cost prototypes, enabling inherent interpretability by learning identifiable concepts. It maintains competitive performance while offering transparent, scalable language modeling.

arXiv

REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment

June 2, 2026 · Kai Ye, Xianwei Mao, Sheng Zhou, Zirui Shao, Ye Mo, Liangliang Liu, Haikuan Huang, Bin Li, Jiajun Bu

REAL resolves knowledge conflicts in VQA via Reasoning-Pivot Alignment, using RPA-SFT and RPGD to detect and mitigate contradictions. This approach significantly improves discrimination accuracy and overall performance across datasets.

arXiv

Benchmarking at the Edge of Comprehension

June 2, 2026 · Samuele Marro, Jialin Yu, Emanuele La Malfa, Oishi Deb, Jiawei Li, Yibo Yang, Ebey Abraham, Sunando Sengupta, Eric Sommerlade, Michael Wooldridge, Philip Torr

Critique-Resilient Benchmarking uses adversarial verification to evaluate LLMs in the "post-comprehension regime," where human understanding is insufficient. This method maintains evaluation integrity by focusing on localized claims rather than full task comprehension.

arXiv

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

June 2, 2026 · Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren, Brucek Khailany, Jishen Zhao

LLM4Cov is an offline framework for high-coverage hardware verification that uses execution-aware agentic learning. A 4B model achieved 90.4% coverage, outperforming its teacher despite being significantly smaller.

arXiv

LLM-WikiRace Benchmark: How Far Can LLMs Plan over Real-World Knowledge Graphs?

June 2, 2026 · Juliusz Ziomek, William Bankes, Lorenz Wolf, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic

LLM-WikiRace reveals that while top LLMs excel at easy tasks, their performance drops significantly on hard reasoning challenges. Results show long-horizon planning, not just knowledge, is the primary bottleneck for current models.

arXiv

PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering

June 2, 2026 · Junkai Lu, Peng Chen, Xingjian Wu, Yang Shu, Chenjuan Guo, Christian S. Jensen, Bin Yang

PATRA addresses LLM limitations in time series reasoning by extracting trend and seasonality patterns and using balanced rewards. It outperforms baselines on time series question answering (TSQA) tasks, enhancing cross-modal understanding and deep logical analysis.

arXiv

On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents

June 2, 2026 · Deyu Zou, Yongqiang Chen, Fan Feng, Mufei Li, Pan Li, Yu Gong, James Cheng

The paper identifies "information self-locking" in RL-based LLM agents, where poor action selection and belief tracking create a bottleneck. It proposes AREW, an advantage reweighting technique, to alleviate this issue and boost performance by up to 60 points.

arXiv

OpenHospital: A Thing-in-itself Arena for Evolving and Benchmarking LLM-based Collective Intelligence

June 2, 2026 · Peigen Liu, Rui Ding, Yuren Mao, Ziyan Jiang, Yuxiang Ye, Yunjun Gao, Ying Zhang, Renjie Sun, Longbin Lai, Zhengping Qian

OpenHospital is an interactive simulation for evolving and benchmarking LLM-based collective intelligence. It enables physician agents to develop medical competence through dynamic patient interactions.

arXiv

AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents

June 2, 2026 · Shengda Fan, Xuyan Ye, Yupeng Huo, Zhi-Yuan Chen, Yiju Guo, Shenzhi Yang, Wenkai Yang, Shuqi Ye, Jingwen Chen, Haotian Chen, Xin Cong, Yankai Lin

AgentProcessBench evaluates step-level quality in tool-using agents with 1,000 human-annotated trajectories. It reveals model fragilities and shows that process evaluation complements outcome-based supervision.

arXiv

Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints

June 2, 2026 · Minh-Khoi Pham, Thang-Long Nguyen Ho, Thao Thi Phuong Dao, Tai Tan Mai, Minh-Triet Tran, Marie E. Ward, Una Geary, Rob Brennan, Nick McDonald, Martin Crane, Marija Bezbradica

The study introduces AWARE, a retrieval-aligned framework that significantly improves clinical risk prediction in EHRs by addressing data heterogeneity and imbalance, outperforming naive retrieval methods.

arXiv

Rashomon Memory: Towards Argumentation-Driven Retrieval for Multi-Perspective Agent Memory

June 2, 2026 · Albert Sadowski, Jarosław A. Chudziak

Rashomon Memory uses parallel, goal-conditioned agents to encode contradictory experiences, employing argumentation to retrieve and explain multi-perspective interpretations.

arXiv

Vision Language Models Cannot Reason About Physical Transformation

June 2, 2026 · Dezhi Luo, Yijiang Li, Maijunxian Wang, Tianwei Zhao, Bingyang Wang, Siheng Wang, Pinyuan Feng, Pooyan Rahmanzadehgervi, Ziqiao Ma, Hokin Deng

Vision-Language Models fail to reason about physical transformations, performing at chance on ConservationBench. They rely on textual priors rather than visual understanding and cannot maintain invariant representations of physical attributes.

arXiv

FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

June 2, 2026 · Zeyu Wang, Jingye Xu, Xiaogang Li, Peiyao Xiao, Qinhao Kong, Ben Wang, Chengliang Xu, Zichao Chen, Bing Zhao, Hu Wei

FeynmanBench reveals that while multimodal LLMs excel at local diagram recognition, they fail at global topological and algebraic reasoning, exposing critical architectural limits in scientific diagram comprehension.

arXiv

PECKER: A Precisely Efficient Critical Knowledge Erasure Recipe For Machine Unlearning in Diffusion Models

June 2, 2026 · Zhiyong Ma, Zhitao Deng, Huan Tang, Jialin Chen, Zhijun Zheng, Zhengping Li, Qingyuan Chuai

PECKER is an efficient machine unlearning method for diffusion models that uses saliency masks to prioritize critical parameter updates, reducing computational overhead while maintaining unlearning efficacy.

arXiv

What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning

June 2, 2026 · Songze Li, Xiaoke Guo, Tianqi Liu, Biao Yi, Zhaoyan Gong, Zhiqiang Liu, Huajun Chen, Wen Zhang

The authors propose UILoop, an iterative UI-in-the-loop framework for multimodal GUI reasoning, enhancing element comprehension and transparency. They also introduce UI Comprehension-Bench, a new benchmark with 26,000 samples to evaluate these advancements.

arXiv

MAVEN-T: Reinforced Heterogeneous Distillation for Real-Time Multi-Agent Trajectory Prediction

June 2, 2026 · Wenchang Duan, Zhenguo Gao, Jinguo Xian, Yi Shi

MAVEN-T uses reinforced heterogeneous distillation to create a real-time, lightweight multi-agent trajectory predictor. It combines graph-based teacher knowledge with PPO-refined student training for safe, efficient autonomous driving.

arXiv

Process Reward Agents for Steering Knowledge-Intensive Reasoning

June 2, 2026 · Jiwoong Sohn, Tomasz Sternal, Kenneth Styppa, Torsten Hoefler, Michael Moor

Process Reward Agents (PRA) provide online, step-by-step rewards to guide knowledge-intensive reasoning, outperforming baselines on MedQA. PRA boosts accuracy across 0.5B-8B models without retraining, decoupling reasoning engines from domain-specific rewards.

arXiv

RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

June 2, 2026 · Mélanie Roschewitz, Kenneth Styppa, Yitian Tao, Jiwoong Sohn, Jean-Benoit Delbrouck, Benjamin Gundersen, Nicolas Deperrois, Christian Bluethgen, Julia E. Vogt, Bjoern Menze, Farhad Nooralahzadeh, Michael Krauthammer, Michael Moor

RadAgent is a tool-using AI agent that generates chest CT reports via stepwise, transparent reasoning. It outperforms 3D VLMs in accuracy, robustness, and faithfulness, enabling clinicians to inspect and validate AI decisions.