Global News Digest

Technology

arXiv

MLLM-Microscope: Unlocking Hidden Structure Within Multimodal Large Language Models

MLLM-Microscope analyzes MLLM internal representations, revealing that fusion techniques significantly impact embedding linearity and dimensionality. These insights guide future model architecture design and optimization.

arXiv

Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems

SkillVetBench benchmarks security in open agentic ecosystems via semantic vetting and sandbox execution. It finds static methods inadequate and shows that runtime verification detects hidden malicious intent.

arXiv

CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences

CV-Arena is a benchmark for instructional computer vision, using 12k pairs and Active Elo to evaluate human-AI collaborative preferences. It reveals significant shortcomings in current systems regarding instruction adherence and physical reasoning.
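
For readers unfamiliar with Elo-style ranking from pairwise preferences, here is a minimal sketch of the classic Elo update. The summary does not describe how "Active Elo" differs, so this shows only the standard rule and is not CV-Arena's method; the function names, the K-factor of 32, and the starting ratings are illustrative assumptions.

```python
def expected_score(r_a, r_b):
    # Probability that A is preferred over B under the standard Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a, r_b, score_a, k=32.0):
    # score_a is 1.0 if A was preferred, 0.0 if B was, 0.5 for a tie.
    # k=32 is an assumed, commonly used step size, not a value from the paper.
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical systems start level; A wins one comparison.
    print(update_elo(1000.0, 1000.0, 1.0))
```

Each comparison moves the two ratings by equal and opposite amounts, so the total rating in the pool stays constant.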

arXiv

Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink

Mamba-2’s state sink splits into causal execution heads and non-causal detection heads. Single-bucket probes miss the critical execution layer, revealing that representational similarity does not guarantee functional importance.

arXiv

Explainable deep reinforcement learning reveals energy-efficient control strategies for turbulent drag reduction

This study uses explainable deep reinforcement learning to achieve 34.44% drag reduction and 34.01% net energy savings. The optimal strategy combines SHAP attributions for skin-friction and pressure, outperforming baselines with minimal actuation cost.

arXiv

Silent Failures in Federated Personalization of Foundation Models

This paper introduces "Silent Failures" in federated foundation model personalization, where privacy obscures trustworthiness issues. It proposes a taxonomy and advocates for privacy-preserving behavioral evaluation to detect these hidden risks.

arXiv

SS-ZKR: Spatial-Semantic Zero-Knowledge Routing for Privacy-Preserving Multi-Agent Collaboration

SS-ZKR enables privacy-preserving multi-agent routing by using zero-knowledge proofs to hide payload content from intermediaries. This allows compliant cross-organizational collaboration in regulated sectors without decrypting sensitive data.

arXiv

An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation

This paper introduces MTRS, a new task for segmenting temporal changes, and MTRefSeg-21K, a 21K-sample benchmark. It proposes MTRefSeg-R1, a specialized LVLM framework that outperforms existing baselines in multi-temporal referring segmentation.

arXiv

Lodestar: An Online-Learning LLM Inference Router

Lodestar is a learning-based router that optimizes LLM inference by predicting optimal GPU assignments to minimize latency. It outperforms existing heuristics, reducing time to first token (TTFT) by up to 2.15x in specialized environments.

arXiv

Cross-Axis Feature Fusion with Joint-Wise Motion Difference Prediction for Text-Based 3D Human Motion Editing

This study proposes a cross-axis feature fusion model with joint-wise motion difference prediction for text-based 3D human motion editing. It achieves state-of-the-art semantic alignment and fidelity on the MotionFix dataset.

arXiv

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

TACG optimizes MoE inference via task-aware expert grouping, while GESR ensures robustness through selective replication. Together, they significantly reduce communication costs and improve load balancing in multi-task environments.

arXiv

FVSpec: Real-World Property-Based Tests as Lean Challenges

FVSpec converts 2,772 real-world Python property-based tests into 9,415 Lean 4 specifications using an LLM pipeline. This open-source benchmark assesses AI capabilities in formal software verification.

arXiv

AI-IoT-Robotics Integration: Survey of Frameworks, Emerging Trends, and the Path Toward Connected Robotics

This survey proposes a modular architecture integrating AI, IoT, and robotics using hybrid SLMs and LLMs. It outlines a blueprint for adaptive, connected robotic ecosystems addressing current interoperability and scalability challenges.

arXiv

Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding

Hybrid Verified Decoding optimizes speculative decoding by dynamically selecting cache or model-based drafters based on predicted acceptance. It achieves a 2.73x speedup over EAGLE3 in agentic workflows by efficiently allocating verification resources.

arXiv

ProductWebGen: Benchmarking Multimodal Product Webpage Generation

ProductWebGen benchmarks multimodal product webpage generation, comparing editing-based and unified model approaches. It reveals trade-offs in instruction adherence and visual consistency, introducing a 1k fine-tuning dataset.

arXiv

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

PolySpeech-100 benchmarks 22 Speech-LLMs across 110 languages, revealing that open-source models excel with dialects but struggle with low-resource languages. It also finds that Chain-of-Thought prompting often hinders performance.

arXiv

Data Collection for Training Quality-Control AI in Carpet Manufacturing

This paper proposes an inline machine-vision system for real-time carpet inspection and systematic data collection. The framework supports continuous AI training via a phased strategy, addressing quality control bottlenecks in woven-carpet manufacturing.

arXiv

Temporally-Aligned Evaluation for Audio-Driven Talking Head Generation

This paper proposes temporally-aligned evaluation using Soft Dynamic Time Warping to address flaws in rigid frame-wise metrics for audio-driven talking heads. It demonstrates that sequence-level alignment yields more robust, consistent, and fair comparisons across diverse generative methods.
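
To illustrate why sequence-level alignment can be more forgiving than frame-wise comparison, here is a minimal sketch of textbook Soft-DTW on 1-D sequences. It is not the paper's evaluation code: the squared-difference cost, the smoothing value gamma=0.1, and the toy signals are all assumptions chosen for illustration.

```python
import math


def soft_min(values, gamma):
    # Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), shifted for stability.
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=0.1):
    # Dynamic-programming recursion of Soft-DTW with a squared-difference cost.
    n, m = len(x), len(y)
    inf = float("inf")
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            r[i][j] = cost + soft_min(
                [r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]], gamma
            )
    return r[n][m]


def framewise_sq_error(x, y):
    # Rigid metric: compares frame t of x only with frame t of y.
    return sum((a - b) ** 2 for a, b in zip(x, y))


if __name__ == "__main__":
    # The same pulse, delayed by one frame (hypothetical toy signals).
    reference = [0.0, 0.0, 1.0, 0.0, 0.0]
    generated = [0.0, 1.0, 0.0, 0.0, 0.0]
    print("frame-wise:", framewise_sq_error(reference, generated))
    print("soft-DTW:  ", soft_dtw(reference, generated))
```

On this toy pair the frame-wise error penalizes the one-frame delay at two positions, while Soft-DTW can warp the time axis and so assigns a much lower cost. Note that Soft-DTW values can be negative, because the smoothed minimum never exceeds the true minimum.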

arXiv

OPD+: Rethinking the Advantage Design for On-Policy Distillation

OPD+ corrects biased advantage estimation in on-policy distillation by removing stop-gradients, enabling diverse f-divergences. It outperforms baseline KL approaches on tool-use and reasoning benchmarks.

arXiv

DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs

DSL-LLaDA adapts an 8B masked diffusion LM for continuous denoising via lightweight training, enabling simultaneous embedding evolution. It outperforms discrete models in few-step summarization by avoiding repetition and premature termination.