Technology News - Global News Digest

arXiv

MindClaw: Closed-Loop Embodied Mental-State Reasoning for Precision Intervention

June 2, 2026 · Ruoxuan Zhang, Qiaoqiao Wan, Zhengguang Wang, Chenghao Yu, Hongxia Xie, Jianlong Fu, Wen-Huang Cheng

MindClaw is a closed-loop framework for embodied Theory of Mind that enables precise, real-time assistance by integrating belief memory and cognitive triggers. It outperforms VLM baselines by optimizing intervention timing and maintaining dynamic environmental connectivity.

arXiv

TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection

June 2, 2026 · Bohan Yang, Yijun Gong, Zhi Zhang, Ge Zhang, Wenpeng Xing, Meng Han

TriLens detects LLM hallucinations by tracking per-layer logit-lens entropy across attention, FFN, and residual streams. This white-box method uses compact entropy trajectories to identify internal uncertainty without storing high-dimensional states.
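
As a rough illustration of the entropy-trajectory idea, the sketch below computes the Shannon entropy of a softmax distribution at each layer. It assumes the per-layer logits have already been obtained by projecting hidden states through the unembedding matrix; the toy logit values and the function names are hypothetical, and this is not the authors' implementation. It covers a single stream, whereas the summary describes three.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_trajectory(per_layer_logits):
    # One scalar per layer instead of a full hidden state.
    return [entropy(softmax(layer)) for layer in per_layer_logits]

# Toy logits over a 4-token vocabulary for three layers (assumed values).
layers = [
    [0.0, 0.0, 0.0, 0.0],  # early layer: uniform, maximal uncertainty
    [2.0, 0.0, 0.0, 0.0],  # middle layer: one token starts to dominate
    [8.0, 0.0, 0.0, 0.0],  # late layer: nearly deterministic
]
trajectory = entropy_trajectory(layers)
```

A trajectory like this is compact (one float per layer), which matches the summary's point about avoiding storage of high-dimensional states.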

arXiv

Before the Model Learns the Bug: Fuzzing RLVR Verifiers

June 2, 2026 · Jaideep Ray

This paper introduces a fuzzing framework for RLVR verifiers to detect flaws before models learn them. It quantifies verifier bugs via adversarial testing and performance metrics.
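
To make the idea of fuzzing a verifier concrete, here is a minimal sketch under stated assumptions: a deliberately flawed, hypothetical verifier that accepts any answer containing the expected string, a strict reference check, and a mutation loop that counts false accepts. None of these names or the mutation strategy come from the paper.

```python
import random

def buggy_verifier(answer: str, expected: str) -> bool:
    # Hypothetical flawed verifier: accepts if the expected string
    # appears anywhere in the answer.
    return expected in answer

def reference_check(answer: str, expected: str) -> bool:
    # Strict check used as ground truth in this toy setup.
    return answer.strip() == expected

def fuzz(verifier, expected, trials=200, seed=0):
    # Wrap the correct answer in random digits and record every
    # candidate the verifier accepts but the reference rejects.
    rng = random.Random(seed)
    false_accepts = []
    for _ in range(trials):
        prefix = "".join(rng.choice("0123456789") for _ in range(rng.randint(0, 2)))
        suffix = "".join(rng.choice("0123456789") for _ in range(rng.randint(0, 2)))
        candidate = prefix + expected + suffix
        if verifier(candidate, expected) and not reference_check(candidate, expected):
            false_accepts.append(candidate)
    return false_accepts

found = fuzz(buggy_verifier, "42")
print(f"false accepts found: {len(found)}")
```

Any non-empty result is a verifier flaw that a policy could learn to exploit during training, which is the failure mode the summary describes.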

arXiv

AnyEdit++: Adaptive Long-Form Knowledge Editing via Bayesian Surprise

June 2, 2026 · Bowen Tian, Caixue He, Jiemin Wu, Jingying Wang, Wenshuo Chen, Zexi Li, Yutao Yue

AnyEdit++ enhances long-form knowledge editing in LLMs by using Bayesian Surprise to identify semantic boundaries, ensuring structural coherence. This approach outperforms baselines in reasoning, coding, and narrative tasks.

arXiv

CAREAgent: Clinical Agent with Structured Reasoning and Tool-Integrated for Order Generation

June 2, 2026 · Ruihui Hou, Ziyue Huai, Chennuo Zhang, Ziyan Liu, Siran Zhao, Yao Yu, Jie Zhai, Tong Ruan

CAREAgent generates executable clinical orders using structured reasoning and tools. It outperforms existing methods on ClinicalBench, achieving significant F1 score improvements.

arXiv

Diagnosing LLM Arbitration Behavior over Pre-evidence Epistemic States in RAG-based Fact-Checking

June 2, 2026 · Yuxi Sun, Wenbo Shang, Wei Gao, Xin Huang, Jing Ma

The study introduces PAVE to diagnose how LLMs resolve conflicts between prior beliefs and retrieved evidence in RAG fact-checking. It reveals inconsistent arbitration behaviors and proposes a lightweight JSD-based method to improve factual reliability.
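
The summary does not say which distributions the JSD-based method compares. As a sketch, assuming it compares a model's verdict distribution before and after seeing retrieved evidence, Jensen-Shannon divergence can be computed as follows; the verdict labels and probabilities are hypothetical.

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence in nats; terms with p_i == 0 are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def jsd(p, q):
    # Jensen-Shannon divergence: symmetric, bounded above by log(2).
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Assumed verdict distributions over (supported, refuted, not enough info).
prior = [0.7, 0.2, 0.1]           # before evidence
post_evidence = [0.2, 0.7, 0.1]   # after evidence

shift = jsd(prior, post_evidence)
```

A value near zero would mean the evidence barely moved the model, while a value near log(2) would mean the verdict distribution changed almost completely.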

arXiv

Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches

June 2, 2026 · Teddy Ferdinan, Bartłomiej Koptyra, Mikołaj Langner, Tomasz Adamczyk, Łukasz Radliński, Maciej Markiewicz, Aleksander Szczęsny, Stanisław Woźniak, Tymoteusz Romanowicz, Dzmitry Pihulski, Mateusz Zbrocki, Mateusz Śmigielski, Michał

This survey examines Reasoning Language Models across 28 scientific fields, revealing significant adoption disparities. It proposes a maturity framework to address imbalances and guide broader, equitable integration in scientific research.

arXiv

Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification

June 2, 2026 · Shihao Ji, Haotao Tan, Zihui Song, Mingyu Li

The paper introduces Expected Value Alignment (EVA) to improve generative reward modeling for formal math verification. EVA extracts continuous scores from discrete token distributions, reducing discretization artifacts while preserving interpretability.
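
One way to read "continuous scores from discrete token distributions" is as a probability-weighted average over the score tokens. The sketch below assumes the reward model emits logits for integer score tokens 1 to 5; the scale, the logit values, and the function name are illustrative assumptions, not the paper's setup.

```python
import math

def expected_score(score_logits):
    # score_logits maps each numeric score to the logit of its token.
    # Softmax restricted to the score tokens, then the expected value.
    m = max(score_logits.values())
    weights = {s: math.exp(l - m) for s, l in score_logits.items()}
    z = sum(weights.values())
    return sum(s * w / z for s, w in weights.items())

# Assumed logits: the model is split evenly between scores 4 and 5.
logits = {1: -20.0, 2: -20.0, 3: -20.0, 4: 1.0, 5: 1.0}
score = expected_score(logits)
```

Taking the argmax here would force an arbitrary choice between 4 and 5, whereas the expected value lands between them, which illustrates the discretization artifact the summary mentions.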

arXiv

SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision

June 2, 2026 · Yuxuan Liu, Zhaochen Su, Lingyun Xie, Yuhao Zhang, Qing Zong, Jiahe Guo, Zhongwei Xie, Yiyan Ji, Yauwai Yim, Hongyu Luo, Xiyu Ren, Ruan Chenyu, Haoran Li, Yangqiu Song

SkillRevise iteratively improves LLM-authored agent skills using execution traces, boosting success rates from 36.05% to 61.63% (a gain of 25.58 percentage points). It outperforms baselines by leveraging empirical data for robust, transferable procedural knowledge.

arXiv

The Case for Model Science: Verify, Explore, Steer, Refine

June 2, 2026 · Przemyslaw Biecek, Luca Longo, Jianlong Zhou, Thomas Fel, Andreas Holzinger, Wojciech Samek

The authors propose "Model Science" to replace limited benchmarking with a systematic approach using Verify, Explore, Steer, and Refine. This discipline aims to explain model mechanisms and failures, drawing insights from fields like neuroscience and medicine.

arXiv

Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts

June 2, 2026 · Ya Shen, Gang Chen, Hui Ma, Mengjie Zhang

DEFT is a novel DRL scheduler using a Mixture-of-Experts architecture to optimize dynamic cloud workflows with varying deadlines. It outperforms baselines by reducing costs and deadline violations via a graph-adaptive gating mechanism.

arXiv

Can LLM Agents Sustain Long-Horizon Organizational Dynamics?

June 2, 2026 · Xuancheng Zhu, Yang Yue, Shuaibing Wan, Zihan Dou, Xiaohan Zhang, Yongrui Liu, Guoshun Nan

TaskWeave, a hierarchical framework using dependency-aware trace memory, enables LLM agents to sustain coherent, long-term organizational dynamics. Evaluated on a year-long IT simulation, it outperforms baselines in coherence and grounding.

arXiv

"Skill issues": data-centric optimization of lakehouse agents

June 2, 2026 · Nicole Rose Schneider, Davide Ghilardi, Giacomo Piccinini, Jacopo Tagliabue

This study optimizes lakehouse agents via data-centric pipelines, achieving a 31.9% accuracy gain. By verifying lakehouse states rather than just outputs, it refines agent capabilities for write-heavy workflows.

arXiv

The Shape of Wisdom: Decision Trajectories in Language Models

June 2, 2026 · Shailesh Rana

This study maps decision trajectories in LLMs, revealing that "unstable-correct" answers are most common. It offers a methodology to distinguish stable from precarious model outputs.

arXiv

Advanced Mathematics Learning Behavior Prediction and Academic Early Warning Model Based on Multimodal Data Analysis

June 2, 2026 · Liu Qiong, Li Zhengbo

This study uses multimodal data and hierarchical knowledge graphs to predict advanced math learning behaviors and issue early academic warnings. Empirical results show the model effectively identifies at-risk students and enhances mastery through targeted interventions.

arXiv

HomeFlow: A Data Flywheel for Smart Home Agent Training with Verifiable Simulation

June 2, 2026 · Yi Gu, Huacan Wang, Shuo Zhang, Yuqing Hou, Lei Xue, Weipeng Ming, Chen Liu, Fangzhou Yu, Kuan Li, Ronghao Chen, Sen Hu, Xiaofeng Mou, Yi Xu

HomeFlow is a verifiable data flywheel using simulation and tree search to train smart home agents. Its models outperform GPT-5.5, achieving up to 87% task success rates on the new SmartHome-Bench.

arXiv

Application of Algorithms in Energy-Efficient Design Platforms for Green Building

June 2, 2026 · Na Yu, Fu Wenli, Guo Fei

A novel BIM-integrated platform using evolutionary algorithms reduced office building energy use by 29.3% with minimal cost and discomfort. This validates its effectiveness for sustainable, energy-efficient green building design.

arXiv

SIRIUS-SQL: Anchoring Multi-Candidate Text-to-SQL in Execution Feedback

June 2, 2026 · Leo Luo, Haining Xie, Siqi Shen, Zhipeng Ma, Rui Ling, Hang Xu, Hefeng Jiang, Dingwei Chen, Yang Li, Peng Chen, Jie Jiang

SIRIUS-SQL improves text-to-SQL by using execution feedback and reinforcement learning to generate diverse, executable candidates. It outperforms existing systems, achieving 75.88% on BIRD and 91.20% on SPIDER.

arXiv

Emergent Ordinal Geometry in Transformers Trained on Local Comparisons

June 2, 2026 · Nishit Singh

Transformers trained on local comparisons develop an internal number line, mirroring human symbolic distance effects. Their embeddings collapse into a 1D manifold recovering hidden rank order, bridging cognitive science and neural networks.

arXiv

ANDES: Agent Native Data Evolving Synthesis Tool for Autonomous Instruction Alignment

June 2, 2026 · Zhengyang Zhao, Shengjie Ye, Lu Ma, Hao Liang, Hengyi Feng, Wentao Zhang

ANDES is an agent-native framework that enhances autonomous instruction alignment by providing a modular data synthesis skill. It overcomes agent context limits via a self-evolving World Tree, achieving state-of-the-art post-training results.