Technology News - Global News Digest

arXiv

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

June 2, 2026 · Yanyu Chen, Jiyue Jiang, Dianzhi Yu, Zheng Wu, Jiahong Liu, Jiaming Han, Xiao Guo, Jinhu Qi, Yu Li, Yifei Zhang, Irwin King

LC-ERD mines latent logic via consistency-regulated reward decomposition to solve label noise and coarse supervision in LLM self-alignment. It enables resilient self-evolving reasoning by cleaning the reasoning manifold and measuring individual step utility.

arXiv

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs

June 2, 2026 · Yifan Zeng, Yiran Wu, Yaolun Zhang, Wentian Zhao, Kun Wan, Qingyun Wu, Huazheng Wang

Whether multi-agent RL improves LLM workflows depends on tradeoffs among workflow, scale, and policy sharing: isolated policies yield higher peak accuracy but risk terminal cliffs, while shared policies cause asymmetric gradient dominance.

arXiv

Fundamental Limitation in Explaining AI

June 2, 2026 · Atsushi Suzuki, Jing Wang

This paper proves a fundamental quadrilemma: AI performance, interpretability, faithfulness, and environmental complexity cannot all be maximized simultaneously. Thus, complete faithfulness in AI explanations is theoretically impossible.

arXiv

Test-Time Deep Thinking to Explore Implicit Rules

June 2, 2026 · Wentong Chen, Xin Cong, Zhong Zhang, Yaxi Lu, Siyuan Zhao, Yesai Wu, Qinyu Luo, Haotian Chen, Yankai Lin, Zhiyuan Liu, Maosong Sun

TTExplore uses a 7B Exp-Thinker to deduce implicit rules via stable RL, boosting agent performance by 14–19 points in embodied tasks.

arXiv

Hypothesis Generation and Inductive Inference in Children and Language Models

June 2, 2026 · Jeffrey Qin, Wasu Top Piriyakulkij, Zhuangfei Gao, Mia Radovanovic, Jessica Sommerville, Kevin Ellis, Marta Kryven

This study compares children and LLMs in an inductive inference task, finding that both discount unreliable evidence and dissociate task completion from rule generalization.

arXiv

Experiments in Agentic AI for Science

June 2, 2026 · Judy Fox, Geoffrey Fox

This study introduces DeepTS and DeepScribe, agentic AI frameworks for automating time-series data curation and physics lecture analysis. These systems leverage hybrid architectures to enhance scientific workflows beyond current LLM limitations.

arXiv

Beyond the Frontier: Stochastic Backtracking for Efficient Test-Time Scaling

June 2, 2026 · Dao Tran, Duc Anh Le, Ngoc Luu, Quan Pham, Tung Pham, Hung Bui

This paper introduces stochastic backtracking over a persistent prefix pool to improve test-time scaling. It achieves higher accuracy with fewer tokens than frontier-only PRM methods.

arXiv

BatteryMFormer: Multi-level Learning for Battery Degradation Trajectory Forecasting

June 2, 2026 · Ruifeng Tan, Jintao Dong, Weixiang Hong, Jia Li, Jiaqiang Huang, Tong-Yi Zhang

BatteryMFormer is a multi-level Transformer framework for early battery degradation forecasting. It outperforms baselines by leveraging aging-condition priors, meta-pattern memory, and dual-view encoding.

arXiv

FrontierOR: Benchmarking LLMs' Capacity for Efficient Algorithm Design in Large-Scale Optimization

June 2, 2026 · Minwei Kong, Chonghe Jiang, Ao Qu, Wenbin Ouyang, Zhaoming Zeng, Xiaotong Guo, Zhekai Li, Junyi Li, Yi Fan, Xinshou Zheng, Xi Jing, Yikai Zhang, Zhiwei Liang, Seonghoo Kim, Runqing Yang, Zijian Zhou, Sirui Li, Han Zheng, Wangyang Ying, Ou Zheng, Chonghuan

FrontierOR benchmarks LLMs on large-scale optimization, revealing that even top models struggle to outperform standard solvers in efficiency and quality.

arXiv

Cross-Entropy Games and Frost Training

June 2, 2026 · Arthur Renard, Franck Gabriel, Valentin Hartmann, Clément Hongler

Frost Training uses reward gradients in embedding space to accelerate LLM policy optimization in Cross-Entropy Games. It significantly boosts output quality and computational efficiency during GRPO training.

arXiv

RULER: Representation-Level Verification of Machine Unlearning

June 2, 2026 · Georgina Cosma, Axel Finke

RULER introduces representation-level metrics (M2, M4) to verify machine unlearning, revealing that models passing output-level checks often retain forgotten data. This framework exposes hidden memorization across diverse domains where traditional methods fail.

arXiv

Agyn: An Open-Source Platform for AI Agents with Scalable On-Demand Execution, Agent Definition as a Code, and Zero-Trust Access

June 2, 2026 · Nikita Benkovich, Vitalii Valkov

Agyn is an open-source platform for scalable AI agent deployment, featuring a Kubernetes-based serverless runtime, Terraform-based agent definitions as code, and zero-trust security.

arXiv

Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration

June 2, 2026 · Hankyeol Kim, Pilsung Kang

This study reveals LLM confidence calibration is highly sensitive to measurement protocols, challenging assumptions about Instruct model improvements. It shows verbalized confidence fails to distinguish correct from plausible incorrect answers.

arXiv

Benchmarking AI for low-resource contexts: Thinking beyond leaderboards

June 2, 2026 · Aakash Pant, Kavya Shah, Apoorv Agnihotri, Sneha Nikam, Prasaanth Balraj, Nakul Jain

This study argues for evaluating fully deployed AI systems in low-resource contexts, integrating deployment variables like hardware limits and connectivity. It proposes a standardized reporting framework to replace abstract leaderboard metrics with actionable, context-aware assessments.

arXiv

FundaPod: A Multi-Persona Agent Pod Platform with Knowledge Graph Memory for AI-Assisted Fundamental Investment Research

June 2, 2026 · Di Zhu, Lei Nico Zheng, Zihan Chen

FundaPod uses multi-persona AI agents and knowledge graph memory to support transparent, human-centric fundamental investment research. It enables independent agent analysis for portfolio managers to adjudicate divergent views and build verifiable investment plans.

arXiv

c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization

June 2, 2026 · Shuhei Watanabe, Frank Hutter

c-TPE modifies Tree-Structured Parzen Estimators to handle inequality constraints in expensive hyperparameter optimization. It outperforms existing methods across 81 tasks and is available via OptunaHub.

arXiv

Cookie-Bench: Continuous On-screen Key Interaction Evaluation for Web Generation

June 2, 2026 · Haoyue Yang, Zhangxiao Shen, Fan Ding, Hangting Lou, Yifeng Kou, Haoqing Yu, Jingyao Li, Zhengfan Wu, Siqi Bao, Jing Liu, Hua Wu

Cookie-Bench introduces a reference-free, autonomous framework evaluating web generation via continuous on-screen interaction. It correlates strongly with human ratings, offering scalable, holistic assessment of functionality and aesthetics.

arXiv

Stability Analysis of Sharpness-Aware Minimization

June 2, 2026 · Hoki Kim, Jinseong Park, Yujin Choi, Jaewook Lee

This study proves SAM gets trapped at saddle points due to inferior diffusion compared to vanilla gradient descent. It shows momentum and batch size are crucial for alleviating this instability and improving generalization.

arXiv

DeepIPCv2: LiDAR-powered Robust Environmental Perception and Navigational Control for Autonomous Vehicle

June 2, 2026 · Oskar Natan, Jun Miura

DeepIPCv2 is an end-to-end autonomous driving system using LiDAR for robust perception and control. It outperforms existing methods in precision and under varying lighting conditions, with code to be open-sourced.

arXiv

Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning

June 2, 2026 · Mattia Silvestri, Senne Berden, Jayanta Mandi, Ali İrfan Mahmutoğulları, Brandon Amos, Tias Guns, Michele Lombardi

This paper introduces a novel decision-focused learning method using score function gradient estimation to remove structural assumptions. It effectively handles nonlinear objectives and uncertain constraints, matching specialized techniques while offering broader applicability.
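
For readers unfamiliar with the term, score function gradient estimation refers to the identity that the gradient of E[f(x)] with respect to a distribution parameter equals E[f(x) times the gradient of log p(x)], which needs no derivative of f itself. The sketch below illustrates that generic estimator only; it is not the paper's method. The Gaussian distribution, the function name score_function_grad_mu, and the test objective f(x) = x^2 are all assumptions chosen for illustration.

```python
import random


def score_function_grad_mu(f, mu, sigma, n_samples=200_000, seed=0):
    """Monte Carlo estimate of d/dmu E[f(x)] for x ~ Normal(mu, sigma^2).

    Uses the score function identity:
        d/dmu E[f(x)] = E[f(x) * d/dmu log p(x)]
    For a Gaussian, d/dmu log p(x) = (x - mu) / sigma^2.
    f is treated as a black box; it is never differentiated.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, sigma)
        score = (x - mu) / sigma**2  # gradient of the log-density w.r.t. mu
        total += f(x) * score
    return total / n_samples


if __name__ == "__main__":
    # Illustrative objective (an assumption): f(x) = x^2, so
    # E[f(x)] = mu^2 + sigma^2 and the analytic gradient w.r.t. mu is 2 * mu.
    estimate = score_function_grad_mu(lambda x: x * x, mu=1.5, sigma=1.0)
    print(f"estimate: {estimate:.3f}  (analytic value: 3.0)")
```

Because only samples and evaluations of f are required, the same estimator applies when f is non-differentiable or involves a solver call, which is the property that makes it attractive when structural assumptions are unavailable. The plain estimator has high variance; practical uses typically add a baseline or other variance reduction, omitted here for brevity.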