Technology
MLLM-Microscope: Unlocking Hidden Structure Within Multimodal Large Language Models
MLLM-Microscope analyzes MLLM internal representations, revealing that fusion techniques significantly impact embedding linearity and dimensionality. These insights guide future model architecture design and optimization.
Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems
SkillVetBench benchmarks security in open agentic ecosystems via semantic vetting and sandbox execution. It shows that static methods are inadequate and that runtime verification detects hidden malicious intent.
CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences
CV-Arena is a benchmark for instructional computer vision, using 12k pairs and Active Elo to evaluate human-AI collaborative preferences. It reveals significant shortcomings in current systems regarding instruction adherence and physical reasoning.
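For readers unfamiliar with Elo-style ranking from pairwise preferences, the following is a minimal sketch of the standard Elo update for a single comparison. It is not CV-Arena's Active Elo procedure, which the summary does not describe; the function names, the K-factor of 32, and the 1500 starting rating are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    The K-factor default is an assumed, commonly used value.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two systems start level; system A wins one preference vote.
    print(elo_update(1500.0, 1500.0, 1.0))
```

With equal starting ratings the expected score is 0.5, so a single win moves each rating by half the K-factor in opposite directions.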
Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink
Mamba-2’s state sink splits into causal execution heads and non-causal detection heads. Single-bucket probes miss the critical execution layer, revealing that representational similarity does not guarantee functional importance.
Explainable deep reinforcement learning reveals energy-efficient control strategies for turbulent drag reduction
This study uses explainable deep reinforcement learning to achieve 34.44% drag reduction and 34.01% net energy savings. The optimal strategy combines SHAP attributions for skin-friction and pressure, outperforming baselines with minimal actuation cost.
Silent Failures in Federated Personalization of Foundation Models
This paper introduces "Silent Failures" in federated foundation model personalization, where privacy obscures trustworthiness issues. It proposes a taxonomy and advocates for privacy-preserving behavioral evaluation to detect these hidden risks.
SS-ZKR: Spatial-Semantic Zero-Knowledge Routing for Privacy-Preserving Multi-Agent Collaboration
SS-ZKR enables privacy-preserving multi-agent routing by using zero-knowledge proofs to hide payload content from intermediaries. This allows compliant cross-organizational collaboration in regulated sectors without decrypting sensitive data.
An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation
This paper introduces MTRS, a new task for segmenting temporal changes, and MTRefSeg-21K, a 21K-sample benchmark. It proposes MTRefSeg-R1, a specialized large vision-language model (LVLM) framework that outperforms existing baselines in multi-temporal referring segmentation.
Lodestar: An Online-Learning LLM Inference Router
Lodestar is a learning-based router that optimizes LLM inference by predicting optimal GPU assignments to minimize latency. It outperforms existing heuristics, reducing time to first token (TTFT) by up to 2.15x in specialized environments.
Cross-Axis Feature Fusion with Joint-Wise Motion Difference Prediction for Text-Based 3D Human Motion Editing
This study proposes a cross-axis feature fusion model with joint-wise motion difference prediction for text-based 3D human motion editing. It achieves state-of-the-art semantic alignment and fidelity on the MotionFix dataset.
Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference
TACG optimizes MoE inference via task-aware expert grouping, while GESR ensures robustness through selective replication. Together, they significantly reduce communication costs and improve load balancing in multi-task environments.
FVSpec: Real-World Property-Based Tests as Lean Challenges
FVSpec converts 2,772 real-world Python property-based tests into 9,415 Lean 4 specifications using an LLM pipeline. This open-source benchmark assesses AI capabilities in formal software verification.
AI-IoT-Robotics Integration: Survey of Frameworks, Emerging Trends, and the Path Toward Connected Robotics
This survey proposes a modular architecture integrating AI, IoT, and robotics using hybrid SLMs and LLMs. It outlines a blueprint for adaptive, connected robotic ecosystems addressing current interoperability and scalability challenges.
Hybrid Verified Decoding: Learning to Allocate Verification in Speculative Decoding
Hybrid Verified Decoding optimizes speculative decoding by dynamically selecting cache or model-based drafters based on predicted acceptance. It achieves a 2.73x speedup over EAGLE3 in agentic workflows by efficiently allocating verification resources.
ProductWebGen: Benchmarking Multimodal Product Webpage Generation
ProductWebGen benchmarks multimodal product webpage generation, comparing editing-based and unified model approaches. It reveals trade-offs in instruction adherence and visual consistency, introducing a 1k fine-tuning dataset.
PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects
PolySpeech-100 benchmarks 22 Speech-LLMs across 110 languages, revealing that open-source models excel with dialects but struggle with low-resource languages. It also finds that Chain-of-Thought prompting often hinders performance.
Data Collection for Training Quality-Control AI in Carpet Manufacturing
This paper proposes an inline machine-vision system for real-time carpet inspection and systematic data collection. The framework supports continuous AI training via a phased strategy, addressing quality control bottlenecks in woven-carpet manufacturing.
Temporally-Aligned Evaluation for Audio-Driven Talking Head Generation
This paper proposes temporally-aligned evaluation using Soft Dynamic Time Warping to address flaws in rigid frame-wise metrics for audio-driven talking heads. It demonstrates that sequence-level alignment yields more robust, consistent, and fair comparisons across diverse generative methods.
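To make the idea of sequence-level alignment concrete, here is a minimal sketch of the generic Soft Dynamic Time Warping recursion, in which the hard minimum of classic DTW is replaced by a smooth minimum controlled by a temperature `gamma`. The 1-D inputs, the squared-error cost, and the function names are assumptions for illustration; the summary does not specify the features or cost the paper uses.

```python
import math


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between two 1-D sequences (squared-error cost assumed)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # r[i][j] holds the soft alignment cost of x[:i] against y[:j].
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            r[i][j] = cost + soft_min(
                (r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]), gamma
            )
    return r[n][m]


if __name__ == "__main__":
    reference = [0.0, 1.0, 2.0]
    delayed = [0.0, 0.0, 1.0, 2.0]  # same motion, first frame held one step
    print(soft_dtw(reference, delayed, gamma=0.01))
```

Because the recursion allows one frame to match several frames of the other sequence, a prediction that is merely delayed scores close to zero at small `gamma`, whereas a rigid frame-by-frame comparison would penalize the same delay.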
OPD+: Rethinking the Advantage Design for On-Policy Distillation
OPD+ corrects biased advantage estimation in on-policy distillation by removing stop-gradients, enabling diverse f-divergences. It outperforms baseline KL approaches on tool-use and reasoning benchmarks.
DSL-LLaDA: Scaling Continuous Denoising to 8B Masked Diffusion LMs
DSL-LLaDA adapts an 8B masked diffusion LM for continuous denoising via lightweight training, enabling simultaneous embedding evolution. It outperforms discrete models in few-step summarization by avoiding repetition and premature termination.