Technology News - Global News Digest

arXiv

On the Theoretical Limitations of Embedding-based Link Prediction

June 2, 2026 · Samy Badreddine, Emile van Krieken, Luciano Serafini

This study proves linear output layers in knowledge graph embeddings create rank bottlenecks, limiting scalability. Non-linear alternatives significantly improve performance on large, dense graphs.
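The rank bottleneck mentioned above can be illustrated with a toy example. The sketch below is a generic linear-algebra demonstration, not the paper's proof or setup: the embedding values, the dimension (2), and the helper names `matmul` and `rank` are all hypothetical. It shows that a score matrix produced by a purely linear (dot-product) output over 2-dimensional embeddings has rank at most 2, so it cannot match a target pattern of rank 4.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank(matrix):
    """Exact matrix rank via Gaussian elimination over fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        # Find a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Hypothetical 2-dimensional embeddings for 4 queries and 4 candidate entities.
queries = [[1, 0], [0, 1], [1, 1], [2, -1]]        # 4 x 2
entities_t = [[1, 2, 0, -1], [0, 1, 3, 1]]         # 2 x 4 (already transposed)

# Linear output layer: every score is a dot product of two embeddings.
scores = matmul(queries, entities_t)               # 4 x 4

# Hypothetical target: each query links to exactly one distinct entity.
target = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

score_rank = rank(scores)
target_rank = rank(target)
print(score_rank, target_rank)
```

However the embedding values are chosen, `score_rank` cannot exceed the embedding dimension, whereas the target pattern here has full rank.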

arXiv

Query Circuits: Explaining How Language Models Answer User Prompts

June 2, 2026 · Tung-Yu Wu, Fazl Barez

Query circuits explain specific LLM responses by tracking internal information flow, offering faithful, efficient explanations. Using NDF metrics, they recover significant performance with sparse circuits.

arXiv

ACON: Optimizing Context Compression for Long-horizon LLM Agents

June 2, 2026 · Minki Kang, Wei-Ning Chen, Dongge Han, Huseyin A. Inan, Lukas Wutschitz, Yanzhi Chen, Robert Sim, Saravan Rajmohan

ACON optimizes long-horizon LLM agents by compressing context via natural language optimization, reducing token usage by up to 54% and boosting task success rates without fine-tuning.

arXiv

InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning

June 2, 2026 · Gautam Sreekumar, Vishnu Naresh Boddeti

InPhyRe reveals LMMs struggle with inductive physical reasoning, failing to apply unseen laws and relying on language bias, undermining their trustworthiness for safety-critical tasks.

arXiv

Taming System Complexity: Demystifying Software Engineering Agents in Diagnosing Linux Kernel Faults

June 2, 2026 · Zhenhao Zhou, Zhuochen Huang, Yike He, Chong Wang, Jiajun Wang, Yijian Wu, Xin Peng, Yiling Lou

The study introduces LinuxFLBench, revealing LLM agents struggle with kernel fault localization. It proposes LinuxFL+ to significantly boost accuracy with minimal cost.

arXiv

REBot: From RAG to CatRAG with Semantic Enrichment and Graph Routing

June 2, 2026 · Thanh Ma, Tri-Tam La, Lam-Thu Le Huu, Minh-Nghi Nguyen, Khanh-Van Pham Luu

REBot uses CatRAG, a hybrid framework with semantically enriched graphs, to provide precise academic policy guidance. It achieved a state-of-the-art F1 score of 98.89% on regulation-specific tasks.

arXiv

A Unified Evaluation-Instructed Framework for Query-Dependent Prompt Optimization

June 2, 2026 · Ke Chen, Yifeng Wang, Hassan Almosapeeh, Haohan Wang

This paper introduces a unified framework using an execution-free evaluator to guide query-dependent prompt optimization. It outperforms baselines by providing stable, interpretable improvements across diverse tasks.

arXiv

Multimodal Function Vectors for Visual Relations

June 2, 2026 · Shuhao Fu, Esther Goldberg, Ying Nian Wu, Hongjing Lu

Researchers isolate "function vectors" in LMMs to enhance visual relation reasoning without updating core parameters. This method boosts zero-shot accuracy and enables generalization to unseen relationships via linear combination.

arXiv

Addressing Longstanding Challenges in Cognitive Science with Language Models

June 2, 2026 · Dirk U. Wulff, Rui Mata

Language models can resolve cognitive science’s fragmentation by formalizing theories and synthesizing data, but risks like bias and oversimplification require careful, human-supervised application.

arXiv

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

June 2, 2026 · Hang He, Chuhuai Yue, Chengqi Dong, Mingxue Tian, Hao Chen, Zhenfeng Liu, Jiajun Chai, Xiaohan Wang, Yufei Zhang, Qun Liao, Guojun Yin, Wei Lin, Chengcheng Wan, Haiying Sun, Ting Su

LocalSearchBench evaluates agentic search in local services using 1.3M records and 900 multi-hop queries. Advanced LRMs struggle, with top accuracy at 35.6%, highlighting the need for specialized domain training.

arXiv

ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning

June 2, 2026 · Nearchos Potamitis, Vansh Ramani, Har Ashish Arora, Dhairya Kuchhal, Lars Klein, Akhil Arora

ReasonBENCH reveals that LLM reasoning scores are highly unstable, with single-run evaluations often misrepresenting capabilities. It proposes analyzing quality and cost as distributions to account for structured variance in performance.
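As a rough illustration of reporting a score as a distribution rather than a single run, the sketch below summarizes repeated evaluation runs with standard-library statistics. It is not ReasonBENCH's protocol: the function name `summarize_runs` and the accuracy values are hypothetical placeholders.

```python
import statistics

def summarize_runs(scores):
    """Summarize repeated evaluation runs instead of reporting one number."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
    }

# Hypothetical accuracies from five runs of the same model on the same task.
run_scores = [0.62, 0.71, 0.58, 0.66, 0.69]
summary = summarize_runs(run_scores)
print(summary)
```

With these placeholder values, a single run could report anything between the minimum and the maximum, which is the kind of spread a one-off evaluation hides.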

arXiv

On the Collapse of Generative Paths: A Criterion and Correction for Diffusion Steering

June 2, 2026 · Ziseok Lee, Minyeong Hwang, Wooyeol Lee, Sanghyun Jo, Jihyung Ko, Young Bin Park, Jae-Mun Choi, Eunho Yang, Kyungsu Kim

This paper introduces ACE, a framework preventing "Marginal Path Collapse" in diffusion steering via time-varying exponents. ACE ensures stable, well-defined generative paths, outperforming baselines in drug design and image generation.

arXiv

Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention

June 2, 2026 · Yang Yu, Zhuangzhuang Chen, Lanqing Li, Xiaomeng Li

Selective-adversarial Entropy Intervention (SaEI) enhances RL-based visual reasoning by perturbing visual inputs to boost response diversity. This method improves policy exploration and reasoning capabilities without compromising factual knowledge.

arXiv

MobiBench: Multi-Branch, Modular Benchmark for Mobile GUI Agents

June 2, 2026 · Youngmin Im, Byeongung Jo, Jaeyoung Wi, Seungwoo Baek, Tae Hoon Min, Joo Hyung Lee, Sangeun Oh, Insik Shin, Sunjae Lee

MobiBench is a modular, multi-path offline benchmark for mobile GUI agents, offering scalable, reproducible evaluation with 94.72% human agreement. It enables detailed module-level analysis to improve agent design and performance.

arXiv

Safety Alignment of LMs via Non-cooperative Games

June 2, 2026 · Anselm Paulus, Ilia Kulikov, Brandon Amos, Rémi Munos, Ivan Evtimov, Kamalika Chaudhuri, Arman Zharmagambetov

AdvGame aligns LMs via non-cooperative games, jointly training attacker and defender models through online reinforcement learning. This preference-based approach enhances both safety and utility while creating a robust red-teaming tool.

arXiv

PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Vision-Language Models

June 2, 2026 · Zhisheng Chen, Tingyu Wu, Zijie Zhou, Zhengwei Xie, Jinhan Li, Ziyan Weng, Liang Lin, Jingwei Song, Zikai Xiao, Yingwei Zhang

PolarMem introduces a training-free polarized graph memory for VLMs, explicitly storing verified absent evidence to reduce contradictions. It enhances retrieval-intensive tasks by prioritizing logical consistency over semantic similarity.

arXiv

Unplugging a Seemingly Sentient Machine Is the Rational Choice -- A Metaphysical Perspective

June 2, 2026 · Erik J Bekkers, Anna Ciaunica

Challenging physicalism, the authors argue AI lacks true consciousness, making its disconnection rational. They advocate prioritizing human life over machine mimicry via Biological Idealism.

arXiv

MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop

June 2, 2026 · Xuancheng Li, Haitao Li, Yujia Zhou, Yiqun Liu, Qingyao Ai

MulFeRL enhances RLVR by using multi-turn verbal feedback to guide failed reasoning attempts. It outperforms baselines on math tasks and generalizes well to new domains.

arXiv

Structure Enables Effective Self-Localization of Errors in LLMs

June 2, 2026 · Ankur Samanta, Akshayaa Magesh, Ayush Jain, Kavosh Asadi, Youliang Yu, Daniel Jiang, Boris Vidolov, Kaveh Hassani, Paul Sajda, Jalaj Bhandari, Yonathan Efroni

Structured reasoning via Thought-ICS enables LLMs to precisely localize errors in flawed steps. This approach significantly boosts self-correction rates, achieving 20-40% improvements over baselines.

arXiv

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Emergent Adaptation

June 2, 2026 · Jingqi Zhou, Sheng Wang, Dezhao Deng, Junwen Lu, Junwei Su, Qintong Li, Jiahui Gao, Hao Wu, Jiyue Jiang, Lingpeng Kong, Dunhong Jin, Chuan Wu

ToolSelf combines task execution and self-reconfiguration behind a single tool interface, enabling dynamic runtime adaptation. Trained with CAT, it significantly outperforms static baselines and eliminates the need for manual guidance.