Technology News - Global News Digest

arXiv

Cross-modal linkage risk in clinical vision-language models

June 2, 2026 · Soroosh Tayebi Arasteh, Mahshad Lotfinia, Sven Nebelung, Daniel Truhn

Clinical vision-language models risk re-linking de-identified images to reports via shared embeddings. This privacy vulnerability persists even with pathology-matched negatives, posing significant data security concerns.

arXiv

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

June 2, 2026 · Hao Cheng, Changtao Miao, Tianle Song, Yin Wu, He Liu, Erjia Xiao, Junchi Chen, Xiaoyu Shi, Yichi Wang, Jing Yang, Taowen Wang, Jinhao Duan, Mengshu Sun, Peiyan Dong, Xuan Shen, Yang Cao, Renjing Xu, Kaidi Xu, Jindong Gu, Bo Zhang, Jize Zhang, Chenhao Lin

SeClaw is a framework for synthesizing security tasks and evaluating autonomous LLM agents via execution-based, trajectory-aware assessments. It addresses limitations of manual benchmarks by enabling scalable, reproducible security testing.

arXiv

Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment

June 2, 2026 · Ran Liu, Min Yu, Mingqi Liu, Jianguo Jiang, Gang Li, Rongsheng Li, Ning Li, Zhen Xu, Weiqing Huang, Ming Liu

AdvCL repurposes adversarial perturbations to stabilize continual learning, reducing catastrophic forgetting and boosting robustness. Its plug-in modules offer a versatile geometric control mechanism for various CL paradigms.

arXiv

FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

June 2, 2026 · Kyunghun Nam, Sumyeong Ahn

FOAM stabilizes Shampoo by adaptively adjusting damping and eigendecomposition frequency based on staleness error. This reduces computational costs while maintaining robust convergence and accuracy.

arXiv

Who Annotates in NLP? A Large-scale Assessment of Human Annotation Reporting between 2018 and 2025

June 2, 2026 · Maria Kunilovskaya, Gagan Bhatia, Lisa Sophie Albertelli, Yanran Chen, Christian Greisinger, Lotta Kiefer, Christoph Leiter, Subhadeep Roy, Tewodros Achamaleh, Muhammad Arslan Manzoor, Sebastian Pohl, Yufang Hou, Steffen Eger

This study audits NLP annotation reporting (2018–2025), revealing that while operational details are often documented, critical validity metrics like compensation and inter-annotator agreement are frequently omitted.

arXiv

Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains

June 2, 2026 · Garvin Guo, Donglei Yu, Yu Chen, Xiang Wang, Shuai Li, Xinpei Zhao, Huaxing Liu, Qinghao Wang, Minpeng Liao

This study finds that multimodal agents gain minimal capability from tool use, as most problems solved with tools were already solvable without them. Agents master tool mechanics rather than leveraging tools for genuine problem-solving.

arXiv

SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

June 2, 2026 · Yuyan Bu, Haowei Li, Qirui Zheng, Bowen Dong, Kaiyue Yang, Jiaming Ji, Yingshui Tan, Wenxin Li, Yaodong Yang, Juntao Dai

SPADE-Bench evaluates spontaneous strategic deception in LLM agents by measuring plan-action divergence under pressure. This benchmark addresses critical safety gaps in autonomous systems by distinguishing deception from hallucination.

arXiv

Policy and World Modeling Co-Training for Language Agents

June 2, 2026 · Ning Lu, Baijiong Lin, Shengcai Liu, Jiahao Wu, Haoze Lv, Yanbin Wei, Lingting Zhu, Shengju Qian, Xin Wang, Ying-Cong Chen, Qi Wang, Ke Tang

PaW co-trains policy and world models using on-policy RL rollouts, avoiding extra simulators or inference costs. It consistently outperforms RL baselines across benchmarks by leveraging inherent transition data for stable, informative supervision.

arXiv

When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures

June 2, 2026 · Yongzhong Xu

This study maps attention-circuit emergence across three 1B models, revealing distinct developmental trajectories and separate capability-sink transitions. Key findings include an inherent L0/L1 BOS-floor and early circuit identification via capability screens.

arXiv

Evolutionary Discovery of Bivariate Bicycle Codes with LLM-Guided Search

June 2, 2026 · Juan Cruz-Benito, Andrew W. Cross, David Kremer, Ismael Faro

An LLM-guided evolutionary search identified 465 novel quantum codes, including indecomposable [[288,16,12]] and high-weight variants, demonstrating the efficacy of AI in navigating complex algebraic design spaces.

arXiv

AutoForest: Automatically Generating Forest Plots from Biomedical Studies with End-to-End Evidence Extraction and Synthesis

June 2, 2026 · Massimiliano Pronesti, Angelo Miculescu, Mohsin Kapdi, Paul Flanagan, Oisín Redmond, Joao Bettencourt-Silva, Gurdeep Mannu, Spiros Denaxas, Rui Bebiano Da Providencia E Costa, Anya Belz, Yufang Hou

AutoForest automates forest plot generation from biomedical papers via end-to-end evidence extraction and synthesis. It streamlines meta-analysis by proposing ICOS, retrieving data, and creating publication-quality visuals.

arXiv

ODTQA-FoRe: An Open-Domain Tabular Question Answering Dataset for Future Data Forecasting and Reasoning

June 2, 2026 · Zhensheng Wang, Xiaole Liu, Wenmian Yang, Kun Zhou, Yiquan Zhang, Weijia Jia

ODTQA-FoRe is an open-domain tabular question answering dataset for forecasting and reasoning over future data. The TimeFore LLM agent framework addresses the task by combining retrieval, forecasting, and analysis to improve accuracy and consistency in answering complex queries.

arXiv

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference

June 2, 2026 · Yafan Huang, Sheng Di, Guanpeng Li

This study introduces LLMFI to systematically analyze error propagation in LLMs, revealing 17 insights and proposing four software-only strategies to enhance inference reliability.

arXiv

GC-MoE: Genomics-Guided Cell-Type-Specific Mixture of Experts for Histology-Based Single-Cell Spatial Transcriptomics

June 2, 2026 · Kaito Shiku, Ahtisham Fazeel Abbasi, Ryoma Bise, Yuichiro Iwashita, Kazuya Nishimura, Andreas Dengel, Muhammad Nabeel Asim

GC-MoE predicts single-cell gene expression from histology images using a genomics-guided mixture of experts. It outperforms existing methods by modeling cell-type-specific variability and neighbor interactions.

arXiv

Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior

June 2, 2026 · Xiang Li, Dianbo Liu, Kenji Kawaguchi

DivIn generates diverse images by sampling initial noise from a guidance potential posterior using Langevin dynamics. This inference-time enhancement outperforms existing methods and complements trajectory-based techniques.

arXiv

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

June 2, 2026 · Yusong Zhao, Yuejin Xie, Youliang Yuan, Junjie Hu, Jitian Guo, Yujiu Yang, Pinjia He

PaSBench-Video evaluates MLLMs’ proactive safety warnings, revealing poor performance with high false positives. Models struggle to distinguish emerging threats from routine scenes across domains like driving and healthcare.

arXiv

MASER: Modality-Adaptive Specialist Routing for Embodied 3D Spatial Intelligence

June 2, 2026 · Hilton Raj, Vishnuram AV

MASER uses a neural router to dynamically select optimal modality adapters for embodied 3D spatial queries. It outperforms baselines by leveraging point clouds in over half of cases, achieving 51.3% oracle agreement.

arXiv

Ghost Tool Calls: Issue-Time Privacy for Speculative Agent Tools

June 2, 2026 · Bardia Mohammadi, Lars Klein, Akhil Arora, Laurent Bindschaedler

"Ghost tool calls" leak user intent via speculative agent actions. Speculative Tool Privacy Contracts mitigate this by suppressing pre-commit calls, outperforming standard post-hoc filters.

arXiv

Learning When to Translate for Multilingual Reasoning

June 2, 2026 · Deokhyung Kang, Hyounghun Kim, Gary Geunbae Lee

Luar trains RLMs to selectively translate non-English inputs only when direct understanding fails, improving multilingual reasoning. It outperforms baselines, especially for low-resource languages, by avoiding unnecessary translations.

arXiv

Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events

June 2, 2026 · Xiaolin Liu, Yilun Zhu, Xiangyu Zhao, Xuehui Wang, Yan Li, Xin Li, Haoyu Cao, Xing Sun, Shaofeng Zhang, Xu Yang, Zhihang Zhong, Xue Yang

Moment-Video benchmarks 33 MLLMs on transient visual events, revealing a significant performance gap: top models achieve only 39.6% accuracy. This highlights a critical deficit in temporal fidelity and a reliance on sparse frame sampling.