Global News Digest

Technology

arXiv

SMH-Bench: Benchmarking LLM Agents for Environment-Grounded Reasoning and Action in Smart Homes

SMH-Bench evaluates LLM agents in smart homes on 1,100 tasks of varying complexity. It finds that while LLMs handle explicit controls well, they struggle with scheduling, ambiguity, and personalized reasoning in complex environments.

arXiv

Bayesian Spectral Emotion Transition Discovery from Multi-Annotator Disagreement

BSETD applies Bayesian spectral analysis to multi-annotator disagreement to uncover emotion transition patterns. It reveals distinct affective spaces, and the findings hold up across diverse corpora.

arXiv

VET: A Framework for Analyzing AI Discourse

The VET Framework classifies AI discourse by valence, effectiveness, and trajectory to critically assess polarized narratives like AI Doom and Hype. It serves as a practical tool for improving AI literacy by enabling rigorous vetting of extreme viewpoints.

arXiv

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

AutoMedBench evaluates agentic AI in medical research via a five-stage workflow, revealing validation as the weakest link. It assesses performance across imaging tasks, highlighting verification failures as primary error sources.

arXiv

Algorithmic algorithm development with LLMs: A Case Study on LLM-Usage for Contraction Order Optimization in Tensor Networks

This study uses LLMs to optimize tensor network contraction orders, demonstrating the potential of verifier-guided evolutionary coding. However, it emphasizes the enduring necessity of human oversight for validation and interpretation.

arXiv

SafeMCP: Proactive Power Regulation for LLM Agent Defense via Environment-Grounded Look-Ahead Reasoning

SafeMCP is a server-side defense plugin that uses predictive reasoning to proactively filter hazardous tools for LLM agents. It mitigates power-seeking risks while preserving agent utility through a novel training pipeline.

arXiv

Physically-Constrained Mamba-SDE for Remaining Useful Life Prediction under Irregular Observations

PC-MambaSDE predicts remaining useful life under irregular observations by embedding physical constraints into a continuous-time Mamba-SDE framework. It ensures physically plausible, monotonic degradation trajectories, outperforming existing methods on industrial benchmarks.

arXiv

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

This study finds that 2-bit quantization causes reasoning loops in large reasoning models (LRMs), but targeted recovery via FP16 planning and loop rescue restores accuracy. These methods enable efficient, high-performance extreme low-bit inference without sacrificing speed.

arXiv

RL-ACRGNet: Reinforcement Learning-Based Chest Radiology Report Generation Network

RL-ACRGNet uses reinforcement learning to automate chest X-ray reporting, outperforming benchmarks on IU-Xray and MIMIC-CXR. It improves report quality and clinical consistency through a novel encoder-decoder architecture.

arXiv

Topological texture analysis of microscopy images of dynamic casein gelation and its relation to rheological properties

This study links casein gelation’s rheology to microscopy via TDA, DBC, MFP, and LBP. It reveals microstructural phases, offering a robust tool for analyzing complex food material dynamics.

arXiv

An NLP-Driven Framework for Curriculum-Labor Market Alignment: Schema-Constrained LLM Extraction, ESCO-Anchored Semantic Matching, and Multi-Dimensional Gap Quantification

This study presents an NLP framework aligning curricula with labor markets using schema-constrained LLMs and ESCO-based semantic matching. Applied to a CS program, it achieved high extraction reliability and comprehensive gap quantification.

arXiv

Explainable Data-driven Deep Reinforcement Learning Methods for Optimal Energy Management in Buildings

This study introduces an explainable deep reinforcement learning framework for optimal building energy management, demonstrating that on-policy algorithms like PPO achieve superior stability and cost savings.

arXiv

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

This study introduces TELBench and DRIFT to localize span-level errors in deep-research agents, improving error detection accuracy by 30%. It shifts focus from final outputs to trajectory reliability.

arXiv

eMoT: evolving Memory-of-Thought via Symbolic Anchoring and Memory Corrosion

eMoT stabilizes LLM reasoning via symbolic anchoring and memory corrosion, achieving superior accuracy on math benchmarks with lightweight models.

arXiv

S3TS: Stochastic Scenario-Structured Tree Search for Advanced Planning Under Uncertainty

S3TS integrates scenario trees with non-linear models to optimize grid planning under uncertainty. It outperforms baselines, reducing costs by up to 51% in non-linear contexts.

arXiv

Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning

EAPO mitigates tool abuse in agentic RL via difficulty-sensitive rewards and confidence-based reweighting. It boosts accuracy by ~10% while cutting tool calls by ~20% across Qwen and Llama models.

arXiv

An Abstract Worlds Semantic Framework for Belief Change Operators

This paper introduces Abstract Worlds Semantics, a syntax-free set-theoretic framework for belief change. It unifies classical and non-prioritized models, generalizing AGM, KM, and Multiple Change theories.

arXiv

BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

BADGER unifies text-to-SQL and agentic evaluation for enterprise AI. Its Hybrid-EX metric achieves 87.3% accuracy, significantly outperforming existing frameworks.

arXiv

From Capability Models to Automated Planning: An AAS-Native Approach for Automatic PDDL Generation

This study enables automatic generation of PDDL (Planning Domain Definition Language) from Asset Administration Shell (AAS) capability models, allowing engineers to verify production layouts without PDDL expertise. It validates the approach by comparing layout variants in a laboratory system.

arXiv

CEON: Circular Economy Ontology Network

CEON addresses semantic interoperability gaps in the circular economy by establishing cross-sectoral concepts. It facilitates data documentation across the construction, electronics, and textile industries.