Technology News - Global News Digest

arXiv

SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes

June 2, 2026 · Kuan Li, Shuo Zhang, Huacan Wang, Fangzhou Yu, Zecheng Sheng, Yi Gu, Weipeng Ming, Lei Xue, Chen Liu, Sen Hu, Ronghao Chen, Siyue Lin, Yuqing Hou, Xiaofeng Mou, Yi Xu

SMH-Bench evaluates LLM agents in smart homes using 1,100 tasks of varying complexity. It reveals that while LLMs handle explicit controls well, they struggle with scheduling, ambiguity, and personalized reasoning in complex environments.

arXiv

Bayesian Spectral Emotion Transition Discovery from Multi-Annotator Disagreement

June 2, 2026 · Keito Inoshita, Takato Ueno

BSETD uses Bayesian spectral analysis of multi-annotator disagreement to uncover emotion transition patterns, revealing distinct affective spaces and validating robustly across diverse corpora.

arXiv

VET: A Framework for Analyzing AI Discourse

June 2, 2026 · Meredith Ringel Morris

The VET Framework classifies AI discourse by valence, effectiveness, and trajectory to critically assess polarized narratives like AI Doom and Hype. It serves as a practical tool for improving AI literacy by enabling rigorous vetting of extreme viewpoints.

arXiv

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

June 2, 2026 · Junqi Liu, Salena Song, Yuhan Wang, Jiawei Mao, Hardy Chen, Xiaoke Huang, Tianhao Qi, Pengfei Guo, Yucheng Tang, Yufan He, Can Zhao, Andriy Myronenko, Dong Yang, Daguang Xu, Yuyin Zhou

AutoMedBench evaluates agentic AI in medical research via a five-stage workflow, revealing validation as the weakest link. It assesses performance across imaging tasks, highlighting verification failures as primary error sources.

arXiv

Algorithmic algorithm development with LLMs: A Case Study on LLM-Usage for Contraction Order Optimization in Tensor Networks

June 2, 2026 · Fabian Hoppe, Melven Röhrig-Zöllner, Philipp Knechtges

This study uses LLMs to optimize tensor network contraction orders, demonstrating the potential of verifier-guided evolutionary coding. However, it emphasizes the enduring necessity of human oversight for validation and interpretation.

arXiv

SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning

June 2, 2026 · Lichao Wang, Zhaoxing Ren, Tianzhuo Yang, Jiaming Ji, Chi Harold Liu, Yaodong Yang, Juntao Dai

SafeMCP is a server-side defense plugin that uses predictive reasoning to proactively filter hazardous tools for LLM agents. It mitigates power-seeking risks while preserving agent utility through a novel training pipeline.

arXiv

Physically-Constrained Mamba-SDE for Remaining Useful Life Prediction under Irregular Observations

June 2, 2026 · Deyu Zhuang, Peiliang Gong, Yang Shao, Liyuan Shu, Qi Zhu, Xiaoli Li, Daoqiang Zhang

PC-MambaSDE predicts remaining useful life under irregular observations by embedding physical constraints into a continuous-time Mamba-SDE framework. It ensures physically plausible, monotonic degradation trajectories, outperforming existing methods on industrial benchmarks.

arXiv

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

June 2, 2026 · Ekaterina Alimaskina, Darya Rudas, Denis Shveykin, Gleb Molodtsov, Pavel Vasiliev, Aleksandr Beznosikov

This study reveals that 2-bit quantization causes reasoning loops in large reasoning models (LRMs), but targeted recovery via FP16 planning and loop rescue restores accuracy. These methods enable efficient, high-performance extreme low-bit inference without sacrificing speed.

arXiv

RL-ACRGNet: Reinforcement Learning-Based Chest Radiology Report Generation Network

June 2, 2026 · Yogesh Kumar Meena, Saurabh Agarwal, K. V. Arya

RL-ACRGNet uses reinforcement learning to automate chest X-ray reporting, outperforming benchmarks on IU-Xray and MIMIC-CXR. It improves report quality and clinical consistency through a novel encoder-decoder architecture.

arXiv

Topological texture analysis of microscopy images of dynamic casein gelation and its relation to rheological properties

June 2, 2026 · Zahra Tabatabaei, Diana Soto Aguilar, Jose C. Bonilla, Mathias P. Clausen, Jon Sporring

This study links casein gelation’s rheology to microscopy via TDA, DBC, MFP, and LBP. It reveals microstructural phases, offering a robust tool for analyzing complex food material dynamics.

arXiv

An NLP-Driven Framework for Curriculum-Labor Market Alignment: Schema-Constrained LLM Extraction, ESCO-Anchored Semantic Matching, and Multi-Dimensional Gap Quantification

June 2, 2026 · Sherzod Turaev, Mary John, Mamoun Awad, Nazar Zaki, Khaled Shuaib

This study presents an NLP framework aligning curricula with labor markets using schema-constrained LLMs and ESCO-based semantic matching. Applied to a CS program, it achieved high extraction reliability and comprehensive gap quantification.

arXiv

Explainable Data-driven Deep Reinforcement Learning Methods for Optimal Energy Management in Buildings

June 2, 2026 · Hallah Shahid Butt, Qiong Huang, Gökhan Demirel, Kevin Förderer, Erfan Tajalli-Ardekani, Simnon Waczowicz, Luigi Spatafora, Veit Hagenmeyer, Benjamin Schäfer

This study introduces an explainable deep reinforcement learning framework for optimal building energy management, demonstrating that on-policy algorithms like PPO achieve superior stability and cost savings.

arXiv

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

June 2, 2026 · Jiaming Wang, Ziteng Feng, Jiangtao Wu, Ruihao Li, Qianqian Xie, Yuxiang Ren, He Zhu, Xueming Han, Fanyu Meng, Junlan Feng, Jiaheng Liu

This study introduces TELBench and DRIFT to localize span-level errors in deep-research agents, improving error detection accuracy by 30%. It shifts focus from final outputs to trajectory reliability.

arXiv

eMoT: evolving Memory-of-Thought via Symbolic Anchoring and Memory Corrosion

June 2, 2026 · Xiang Li, Jiwei Wei, Ke Liu, Yitong Qin, Jinyu Guo, Malu Zhang, Peng Wang, Yang Yang

eMoT stabilizes LLM reasoning via symbolic anchoring and memory corrosion, achieving superior accuracy on math benchmarks with lightweight models.

arXiv

S3TS: Stochastic Scenario-Structured Tree Search for Advanced Planning Under Uncertainty

June 2, 2026 · Fabio Pavirani, Bert Claessens, Pierre Pinson, Chris Develder

S3TS integrates scenario trees with non-linear models to optimize grid planning under uncertainty. It outperforms baselines, reducing costs by up to 51% in non-linear contexts.

arXiv

Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning

June 2, 2026 · Liuji Chen, Dianxing Tang, Xing Shi, Dingshuo Chen, Qiang Liu, Shu Wu, Liang Wang

EAPO mitigates tool abuse in agentic RL via difficulty-sensitive rewards and confidence-based reweighting. It boosts accuracy by ~10% while cutting tool calls by ~20% across Qwen and Llama models.

arXiv

An Abstract Worlds Semantic Framework for Belief Change Operators

June 2, 2026 · Daniel Grimaldi, M. Vanina Martinez, Ricardo O. Rodriguez

This paper introduces Abstract Worlds Semantics, a syntax-free set-theoretic framework for belief change. It unifies classical and non-prioritized models, generalizing AGM, KM, and Multiple Change theories.

arXiv

BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

June 2, 2026 · Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma, Nathan Miller

BADGER unifies text-to-SQL and agentic evaluation for enterprise AI. Its Hybrid-EX metric achieves 87.3% accuracy, significantly outperforming existing frameworks.

arXiv

From Capability Models to Automated Planning: An AAS-Native Approach for Automatic PDDL Generation

June 2, 2026 · Hamied Nabizada, Thomas Wirt, Luis Miguel Vieira da Silva, Felix Gehlhoff, Alexander Fay

This study enables automatic PDDL generation from AAS capability models, allowing engineers to verify production layouts without PDDL expertise. It validates the approach by comparing layout variants in a laboratory system.

arXiv

CEON: Circular Economy Ontology Network

June 2, 2026 · Huanyu Li, Els de Vleeschauwer, Robin Keskisärkkä, Mikael Lindecrantz, Mina Abd Nikooie Pour, Ying Li, Ben De Meester, Patrick Lambrix, Eva Blomqvist

CEON addresses semantic interoperability gaps in the circular economy by establishing cross-sectorial concepts. It facilitates data documentation across construction, electronics, and textile industries.