Global News Digest

Technology

arXiv

BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization

BitsMoE uses spectral energy-guided bit allocation to quantize MoE LLMs efficiently while preserving accuracy in ultra-low-bit settings. It outperforms GPTQ by 27.83% in accuracy and quantizes 12.3x faster.
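As a rough illustration of budgeted mixed-precision allocation (not BitsMoE's published algorithm), the sketch below assumes each expert already has a scalar spectral-energy score and that the candidate bit-widths are 2, 3 and 4. It greedily grants extra bits to the highest-scoring experts until an average bit budget is used up. The function name `allocate_bits` and the greedy rule are hypothetical.

```python
from typing import List, Sequence


def allocate_bits(
    energy: Sequence[float],
    avg_budget: float,
    choices: Sequence[int] = (2, 3, 4),
) -> List[int]:
    """Hypothetical greedy allocator: start every expert at the lowest
    bit-width, then repeatedly upgrade the highest-energy expert that can
    still be upgraded without exceeding the average bit budget."""
    levels = sorted(choices)
    n = len(energy)
    level_idx = [0] * n              # per-expert index into `levels`
    used = levels[0] * n             # total bits currently assigned
    budget = avg_budget * n          # total bits allowed
    order = sorted(range(n), key=lambda i: energy[i], reverse=True)

    while True:
        for i in order:              # highest-energy experts get priority
            if level_idx[i] + 1 < len(levels):
                step = levels[level_idx[i] + 1] - levels[level_idx[i]]
                if used + step <= budget:
                    level_idx[i] += 1
                    used += step
                    break            # rescan from the top after each upgrade
        else:
            break                    # no upgrade fits the budget: done
    return [levels[k] for k in level_idx]


# Four experts (assumed energy scores), average budget of 3 bits per weight.
print(allocate_bits([0.9, 0.5, 0.2, 0.1], avg_budget=3.0))  # → [4, 4, 2, 2]
```

A uniform scheme would give every expert 3 bits; this rule instead spends the same total budget where the assumed score is highest.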

arXiv

Flow-Based Generative Modeling for Optimizing Sampling Policies in Compressed Sensing Applications

This study introduces a flow-based generative framework for optimizing subsampling policies in compressed sensing. It achieves state-of-the-art reconstruction performance on image and MRI tasks at minimal computational cost.

arXiv

DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions

DAStatFormer integrates statistical features into a hybrid multibranch Transformer for efficient pattern recognition in distributed acoustic sensing (DAS) data. It achieves 99.4% accuracy at a lower computational cost than existing models.

arXiv

Planktonzilla: Multimodal dataset and models for understanding plankton ecosystems

Planktonzilla-17M is a large, unified plankton dataset on which supervised classifiers outperform CLIP-style models. By providing standardized, large-scale data, it addresses generalization issues in marine imaging.

arXiv

From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models

The paper introduces Demo2Reward, a test-time adaptation method that optimizes the prompts of VLM reward models using only a few expert demonstrations. The approach significantly reduces false positives and improves policy learning in robotics without requiring additional model training.

arXiv

Hoeffding Concept Bottleneck Models with Applications to Overhead Images

Hoeffding Concept Bottleneck Models (HCBMs) use non-linear, sparse aggregation of concepts to improve both interpretability and performance. They outperform linear CBMs in overhead image analysis and object detection.

arXiv

Can Predicted Dynamics Exist in the Physical World?

This study defines physical admissibility as a criterion for filtering AI-generated actions through kinematic and dynamic checks. Experiments show that this admissibility gate blocks invalid proposals while maintaining high task progress.
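One simple way to picture such a gate is a filter that rejects proposed trajectories violating joint limits. The sketch below is not the paper's definition of physical admissibility: it assumes a single joint sampled at a fixed time step, estimates velocity and acceleration with finite differences, and treats an acceleration bound as a crude stand-in for dynamic feasibility. `Limits`, `is_admissible` and `gate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Limits:
    q_min: float   # lower joint position limit
    q_max: float   # upper joint position limit
    v_max: float   # max |velocity| (kinematic check)
    a_max: float   # max |acceleration|, a crude proxy for dynamic feasibility


def is_admissible(traj: Sequence[float], dt: float, lim: Limits) -> bool:
    """True if a 1-D joint trajectory sampled every `dt` seconds respects
    position, velocity and acceleration limits."""
    if any(q < lim.q_min or q > lim.q_max for q in traj):
        return False
    vel = [(b - a) / dt for a, b in zip(traj, traj[1:])]
    if any(abs(v) > lim.v_max for v in vel):
        return False
    acc = [(b - a) / dt for a, b in zip(vel, vel[1:])]
    return all(abs(a) <= lim.a_max for a in acc)


def gate(proposals: List[List[float]], dt: float, lim: Limits) -> List[List[float]]:
    """Keep only the proposals that pass every check."""
    return [p for p in proposals if is_admissible(p, dt, lim)]


lim = Limits(q_min=-1.0, q_max=1.0, v_max=2.0, a_max=10.0)
smooth = [0.0, 0.1, 0.2, 0.3]   # steady motion
jump = [0.0, 0.5, 0.0]          # violates the velocity limit
out_of_range = [0.0, 1.5]       # violates the position limit
print(gate([smooth, jump, out_of_range], dt=0.1, lim=lim))  # → [[0.0, 0.1, 0.2, 0.3]]
```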

arXiv

Structured Visual Evidence Decomposition for Evidence-Grounded Multimodal Screening of Obstructive Sleep Apnea-Hypopnea Syndrome

EviOSAHS decomposes facial images into structured evidence cards, merging them with clinical data for accurate OSAHS screening. It achieved 88.47% accuracy and 94.86% sensitivity in a 642-subject study.

arXiv

SentimentLens: Reconciling Sentiment and Ratings via Dual-Modality in the Hospitality Sector

SentimentLens reconciles textual sentiment with numerical ratings through dual-modality analysis of more than 10,000 hotel reviews. It transforms unstructured feedback into actionable insights for hospitality management.
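A simple way to surface disagreements between review text and star ratings is to put both on a common scale and flag large gaps. This is an illustrative sketch, not SentimentLens's method: it assumes sentiment scores already lie in [-1, 1], maps a 1-to-5 star rating linearly onto the same range, and uses a hypothetical threshold of 1.0.

```python
from typing import List, Tuple


def rating_to_polarity(stars: int, max_stars: int = 5) -> float:
    """Map a 1..max_stars rating linearly onto [-1, 1]."""
    if not 1 <= stars <= max_stars:
        raise ValueError("rating out of range")
    return 2 * (stars - 1) / (max_stars - 1) - 1


def flag_mismatches(reviews: List[Tuple[float, int]], threshold: float = 1.0) -> List[int]:
    """Return indices of (sentiment, stars) pairs whose polarities disagree
    by more than `threshold`."""
    return [
        i for i, (sentiment, stars) in enumerate(reviews)
        if abs(sentiment - rating_to_polarity(stars)) > threshold
    ]


# (assumed text sentiment in [-1, 1], star rating)
reviews = [(0.9, 5), (-0.8, 5), (0.1, 3)]
print(flag_mismatches(reviews))  # → [1]: negative text paired with five stars
```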

arXiv

Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization

This study integrates cellular sheaves with attention-based multiple instance learning (MIL) to improve weakly-supervised pathology localization. On Camelyon16, the method significantly raises patch-level AUC to 0.940 and attention performance to 0.953.
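The attention-based MIL component can be illustrated with the standard non-gated attention pooling of Ilse et al. (2018): each patch receives a softmax weight, the slide embedding is the weighted sum of patch embeddings, and the weights double as a coarse localization map. This sketch omits the cellular-sheaf alignment entirely, and the toy matrices `V` and `w` are assumptions standing in for learned parameters.

```python
import math
from typing import List, Tuple

Vector = List[float]


def matvec(m: List[Vector], x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def attention_mil_pool(
    patches: List[Vector], V: List[Vector], w: Vector
) -> Tuple[Vector, List[float]]:
    """Non-gated attention pooling: score_k = w . tanh(V h_k),
    weights = softmax(scores), bag = sum_k weight_k * h_k."""
    scores = [sum(wi * math.tanh(z) for wi, z in zip(w, matvec(V, h))) for h in patches]
    top = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(patches[0])
    bag = [sum(wk * h[d] for wk, h in zip(weights, patches)) for d in range(dim)]
    return bag, weights


# Toy slide with three 2-D patch embeddings.
patches = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 0.0]
bag, weights = attention_mil_pool(patches, V, w)
print([round(a, 3) for a in weights])  # largest weight falls on the third patch
```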

arXiv

Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems

This review identifies a critical gap in runtime authorization for Physical AI, where silent failures occur despite high model confidence. It proposes a taxonomy of guardrails and a set of evaluation metrics to bridge the safety and capability research tracks.

arXiv

Diffusion Image Generation with Explicit Modeling of Data Manifold Geometry

MIND explicitly models data manifold geometry through discrete patch tokenization in diffusion models, achieving better FID scores on ImageNet. It outperforms baselines such as DiT and LlamaGen while using significantly fewer parameters.

arXiv

Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation

The Hierarchical Semantic-Geometric Map bridges the 2D-3D gap in Vision-Language Navigation by aligning 3D geometry with VLMs. This zero-shot framework improves navigation by separating semantic planning from low-level path execution.

arXiv

Diversity Over Frequency: Rethinking Tool Use in Visual Chain-of-Thought Agents

This study identifies "tool-use collapse" in visual agents and shows that prioritizing trajectory diversity over tool-use frequency improves reasoning. The proposed entropy-regularization method improves performance even as tool usage decreases.
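Entropy regularization of this kind generally adds a bonus proportional to the entropy of the agent's behaviour. The sketch below assumes each rollout is summarized by the tuple of tools it called, measures diversity as the Shannon entropy of those tuples across a batch, and adds `beta` times that entropy to the mean reward. The exact quantity the paper regularizes is not given in the summary, so the signature representation and `beta` are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def empirical_entropy(items: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of `items`."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def regularized_objective(
    rewards: Sequence[float],
    trajectories: Sequence[Tuple[str, ...]],
    beta: float = 0.1,
) -> float:
    """Mean task reward plus an entropy bonus over trajectory signatures."""
    mean_reward = sum(rewards) / len(rewards)
    return mean_reward + beta * empirical_entropy(trajectories)


rewards = [1.0, 1.0, 1.0, 1.0]
collapsed = [("zoom",)] * 4                               # every rollout identical
diverse = [("zoom",), ("crop",), (), ("zoom", "crop")]    # four distinct strategies
print(regularized_objective(rewards, collapsed))  # → 1.0
print(regularized_objective(rewards, diverse))    # 1.0 + 0.1 * ln 4
```

Equal rewards with identical tool counts on average still score differently, so the bonus rewards varied strategies rather than more tool calls.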

arXiv

DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models

DLLM-JEPA combines JEPAs with masked diffusion language models, cutting training costs by 33% and improving accuracy on benchmarks such as GSM8K. It eliminates the need for parallel data pairs while outperforming diffusion-only fine-tuning.

arXiv

PEACE: A Planner-Executor Agent with Constraint Enforcement for UAVs

PEACE is a planner-executor agent for UAVs that separates LLM-based planning from ROS 2 execution, ensuring constraint adherence and explainability. Validated in PX4 simulations, it reduces the number of LLM calls while improving safety and recovery capabilities.

arXiv

CoCoVideo: The High-Quality Commercial-Model-Based Contrastive Benchmark for AI-Generated Video Detection

CoCoVideo introduces a high-quality dataset built from videos generated by commercial models, together with a contrastive detection framework based on MLLMs. It achieves state-of-the-art performance in identifying realistic AI-generated video forgeries.

arXiv

CoilDrop-MRI: Self-supervised physics-guided MRI reconstruction with coil dropout

CoilDrop-MRI is a self-supervised, physics-guided MRI reconstruction method based on coil dropout. It outperforms existing techniques and reaches quality comparable to supervised methods without requiring fully sampled training data.
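The coil-dropout idea can be pictured as repeatedly splitting the receiver coils into a subset the network sees and a held-out subset that serves as a self-supervised target. The helper below performs only that random split; the drop fraction, the rule that each side keeps at least one coil, and the name `coil_dropout_split` are assumptions, and how CoilDrop-MRI actually builds its loss is not described in the summary.

```python
import random
from typing import List, Optional, Tuple


def coil_dropout_split(
    num_coils: int, drop_frac: float, rng: Optional[random.Random] = None
) -> Tuple[List[int], List[int]]:
    """Randomly partition coil indices into (kept, dropped).
    Hypothetical helper: kept coils would feed the reconstruction network,
    dropped coils would be held out as the training target."""
    if num_coils < 2:
        raise ValueError("need at least two coils to split")
    if not 0.0 < drop_frac < 1.0:
        raise ValueError("drop_frac must be in (0, 1)")
    rng = rng or random.Random()
    # Keep at least one coil on each side of the split.
    n_drop = max(1, min(num_coils - 1, round(num_coils * drop_frac)))
    dropped = sorted(rng.sample(range(num_coils), n_drop))
    kept = [c for c in range(num_coils) if c not in dropped]
    return kept, dropped


# An 8-coil acquisition with a quarter of the coils held out; a fresh split
# would be drawn for every training iteration.
kept, dropped = coil_dropout_split(8, 0.25, random.Random(0))
print("kept:", kept, "dropped:", dropped)
```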

arXiv

Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning

VGID unlearns sensitive data in MLLMs via visual-noise guided in-context distillation, achieving robust parameter-level unlearning without external teachers or retraining.

arXiv

Motif-based morphology signatures for interpretable ECG screening and monitoring

This study introduces motif-based ECG signatures to quantify morphological drift, enabling interpretable screening. Results show these metrics effectively differentiate normal from abnormal cardiac rhythms across short- and long-term datasets.
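A common way to compare a heartbeat against a reference motif while ignoring amplitude and baseline offsets is the z-normalized Euclidean distance used in time-series motif discovery. The sketch below tracks that distance beat by beat as a simple drift signal. It assumes beats are already segmented and resampled to the reference length, and it is not the paper's published signature; `morphology_drift` is a hypothetical name.

```python
import math
from typing import List, Sequence


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a segment (zero mean, unit population std)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd == 0:
        return [0.0] * len(x)   # flat segment: no shape information
    return [(v - mu) / sd for v in x]


def znorm_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between z-normalized segments of equal length."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(znorm(a), znorm(b))))


def morphology_drift(beats: Sequence[Sequence[float]], reference: Sequence[float]) -> List[float]:
    """Distance of each beat from the reference motif; rising values suggest drift."""
    return [znorm_distance(beat, reference) for beat in beats]


reference = [0.0, 1.0, 3.0, 1.0, 0.0]
rescaled = [2 * v + 5 for v in reference]   # same shape, different gain and offset
inverted = [-v for v in reference]          # genuinely different morphology
print(morphology_drift([rescaled, inverted], reference))  # first ≈ 0, second large
```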