Technology News - Global News Digest

arXiv

MLLM-Microscope: Unlocking Hidden Structure Within Multimodal Large Language Models

June 2, 2026 · Ravil Mussabayev, Rustam Mussabayev

MLLM-Microscope analyzes MLLM internal representations, revealing that fusion techniques significantly impact embedding linearity and dimensionality. These insights guide future model architecture design and optimization.

arXiv

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

June 2, 2026 · Ismail Hossain, Sai Puppala, Zhuoran Lu, Sajedul Talukder, Nan Jiang

SkillVetBench benchmarks security in open agentic skill ecosystems via semantic vetting and sandbox execution. It finds that static methods alone are inadequate and that runtime verification detects hidden malicious intent.

arXiv

CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences

June 2, 2026 · Fangzhou Lin, Peiran Li, Lingyu Xu, Wenjing Chen, Qianwen Ge, Shuo Xing, Mingyang Wu, Xiangbo Gao, Siyuan Yang, Kazunori Yamada, Ziming Zhang, Haichong Zhang, Zhen Dong, Ming-Hsuan Yang, Zhengzhong Tu

CV-Arena is a benchmark for instructional computer vision, using 12k pairs and Active Elo to evaluate human-AI collaborative preferences. It reveals significant shortcomings in current systems regarding instruction adherence and physical reasoning.
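
For readers unfamiliar with Elo-style rankings, here is a minimal sketch of the standard Elo update for one pairwise preference. It is not CV-Arena's Active Elo procedure, which the summary does not describe. The function names, the K-factor of 32, and the starting ratings of 1000 are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k is an assumed step size, not a value taken from the paper.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two systems start level; a judge prefers the first one.
print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

An upset win against a higher-rated system moves the ratings more than an expected win, which is why such a ranking can settle with relatively few comparisons.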

arXiv

Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink

June 2, 2026 · Yuhang Jiang

Mamba-2’s state sink splits into causal execution heads and non-causal detection heads. Single-bucket probes miss the critical execution layer, revealing that representational similarity does not guarantee functional importance.

arXiv

Explainable deep reinforcement learning reveals energy-efficient control strategies for turbulent drag reduction

June 2, 2026 · Federica Tonti, Ricardo Vinuesa

This study uses explainable deep reinforcement learning to achieve 34.44% drag reduction and 34.01% net energy savings. The optimal strategy combines SHAP attributions for skin-friction and pressure, outperforming baselines with minimal actuation cost.

arXiv

Silent Failures in Federated Personalization of Foundation Models

June 2, 2026 · YongKyung Oh, Alex Bui

This paper introduces "Silent Failures" in federated foundation model personalization, where privacy obscures trustworthiness issues. It proposes a taxonomy and advocates for privacy-preserving behavioral evaluation to detect these hidden risks.

arXiv

SS-ZKR: Spatial-Semantic Zero-Knowledge Routing for Privacy-Preserving Multi-Agent Collaboration

June 2, 2026 · Hassan Touheed

SS-ZKR enables privacy-preserving multi-agent routing by using zero-knowledge proofs to hide payload content from intermediaries. This allows compliant cross-organizational collaboration in regulated sectors without decrypting sensitive data.

arXiv

An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation

June 2, 2026 · Bingyu Li, Da Zhang, Tao Huo, Zhiyuan Zhao, Junyu Gao, Xuelong Li

This paper introduces MTRS, a new task for segmenting temporal changes, and MTRefSeg-21K, a 21K-sample benchmark. It proposes MTRefSeg-R1, a specialized large vision-language model (LVLM) framework that outperforms existing baselines in multi-temporal referring segmentation.

arXiv

Lodestar: An Online-Learning LLM Inference Router

June 2, 2026 · Gangmuk Lim, Wanyu Zhao, Brighten Godfrey, Jiaxin Shan, Le Xu, Liguang Xie

Lodestar is a learning-based router that optimizes LLM inference by predicting optimal GPU assignments to minimize latency. It outperforms existing heuristics, reducing time to first token (TTFT) by up to 2.15x in specialized environments.

arXiv

Cross-Axis Feature Fusion with Joint-Wise Motion Difference Prediction for Text-Based 3D Human Motion Editing

June 2, 2026 · Gyojin Han, Junmo Kim

This study proposes a cross-axis feature fusion model with joint-wise motion difference prediction for text-based 3D human motion editing. It achieves state-of-the-art semantic alignment and fidelity on the MotionFix dataset.

arXiv

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

June 2, 2026 · Zhiyao Xu, Aoxue Liu, Zhanjie Ding, Dan Zhao, Yong Jiang, Qing Li

TACG optimizes MoE inference via task-aware expert grouping, while GESR ensures robustness through selective replication. Together, they significantly reduce communication costs and improve load balancing in multi-task environments.

arXiv

FVSpec: Real-World Property-Based Tests as Lean Challenges

June 2, 2026 · Quinn Dougherty, Max von Hippel, Hazel Shackleton, Mike Dodds

FVSpec converts 2,772 real-world Python property-based tests into 9,415 Lean 4 specifications using an LLM pipeline. This open-source benchmark assesses AI capabilities in formal software verification.

arXiv

AI-IoT-Robotics Integration: Survey of Frameworks, Emerging Trends, and the Path Toward Connected Robotics

June 2, 2026 · Ranulfo Bezerra, Satoshi Tadokoro, Kazunori Ohno

This survey proposes a modular architecture integrating AI, IoT, and robotics using a hybrid of small language models (SLMs) and LLMs. It outlines a blueprint for adaptive, connected robotic ecosystems addressing current interoperability and scalability challenges.

arXiv

Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding

June 2, 2026 · Xin Su, Dawid Majchrowski, Fangyuan Yu, Vanshil Atul Shah, Sebastian Rogawski, Pawel Morkisz, Anahita Bhiwandiwalla, Phillip Howard

Hybrid Verified Decoding optimizes speculative decoding by dynamically selecting cache-based or model-based drafters according to predicted acceptance. It achieves a 2.73x speedup over EAGLE3 in agentic workflows by efficiently allocating verification resources.

arXiv

ProductWebGen: Benchmarking Multimodal Product Webpage Generation

June 2, 2026 · Zhihong Liu, Siqi Kou, Zheng Li, Ye Ma, Quan Chen, Peng Jiang, Kai Yu, Zhijie Deng

ProductWebGen benchmarks multimodal product webpage generation, comparing editing-based and unified model approaches. It reveals trade-offs in instruction adherence and visual consistency, introducing a 1k fine-tuning dataset.

arXiv

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

June 2, 2026 · Sicheng Yang, Shulan Ruan, Shiwei Wu, Yu Liu, Lu Fan, Zhi Li, You He

PolySpeech-100 benchmarks 22 Speech-LLMs across 110 languages, revealing that open-source models excel with dialects but struggle with low-resource languages. It also finds that Chain-of-Thought prompting often hinders performance.

arXiv

Data Collection for Training Quality-Control AI in Carpet Manufacturing

June 2, 2026 · Akbar Erkinov

This paper proposes an inline machine-vision system for real-time carpet inspection and systematic data collection. The framework supports continuous AI training via a phased strategy, addressing quality control bottlenecks in woven-carpet manufacturing.

arXiv

Temporally-Aligned Evaluation for Audio-Driven Talking Head Generation

June 2, 2026 · Zhicheng Zhang, Lei Wang, Yu Zhang, Yongsheng Gao

This paper proposes temporally-aligned evaluation using Soft Dynamic Time Warping to address flaws in rigid frame-wise metrics for audio-driven talking heads. It demonstrates that sequence-level alignment yields more robust, consistent, and fair comparisons across diverse generative methods.
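
As a rough illustration of why sequence-level alignment can be more forgiving than frame-wise comparison, the sketch below implements a basic Soft Dynamic Time Warping distance on 1-D sequences. It is not the paper's evaluation code. The squared-difference cost, the toy sequences, and the smoothing value `gamma` are assumptions chosen for illustration.

```python
import math


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between two 1-D sequences with squared-difference cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # r[i][j] holds the soft alignment cost of x[:i] against y[:j].
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            r[i][j] = cost + soft_min(
                [r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]], gamma
            )
    return r[n][m]


# A one-frame time shift: heavily penalized frame by frame, cheap once aligned.
reference = [0.0, 0.0, 1.0, 0.0, 0.0]
generated = [0.0, 1.0, 0.0, 0.0, 0.0]
frame_wise = sum((a - b) ** 2 for a, b in zip(reference, generated))
print(frame_wise, soft_dtw(reference, generated, gamma=0.1))
```

Smaller `gamma` values bring the result closer to classic DTW, while larger values average over more alignment paths. Note that Soft-DTW can be slightly negative even for well-matched sequences, because the smooth minimum is never larger than the true minimum.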

arXiv

OPD+: Rethinking the Advantage Design for On-Policy Distillation

June 2, 2026 · Hanyang Zhao, Haoxian Chen, Han Lin, Genta Indra Winata, David Yao, Wenpin Tang

OPD+ corrects biased advantage estimation in on-policy distillation by removing stop-gradients, enabling diverse f-divergences. It outperforms baseline KL approaches on tool-use and reasoning benchmarks.

arXiv

DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs

June 2, 2026 · Longxuan Yu, Yunshu Wu, Yu Fu, Siheng Xiong, Rob Brekelmans, Hui Liu, Yue Dong, Greg Ver Steeg

DSL-LLaDA adapts an 8B masked diffusion LM for continuous denoising via lightweight training, enabling simultaneous embedding evolution. It outperforms discrete models in few-step summarization by avoiding repetition and premature termination.