Technology
BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization
BitsMoE uses spectral energy-guided bit allocation to efficiently quantize MoE LLMs, preserving accuracy in ultra-low-bit scenarios. It outperforms GPTQ by 27.83% in accuracy and quantizes 12.3x faster.
Flow-Based Generative Modeling for Optimizing Sampling Policies in Compressed Sensing Applications
This study introduces a flow-based generative framework to optimize subsampling in compressed sensing. It achieves state-of-the-art reconstruction performance in image and MRI tasks with minimal computational cost.
DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions
DAStatFormer integrates statistical features into a hybrid Transformer for efficient DAS pattern recognition. It achieves 99.4% accuracy with lower computational costs than existing models.
Planktonzilla: Multimodal dataset and models for understanding plankton ecosystems
Planktonzilla-17M is a massive, unified plankton dataset on which supervised classification outperforms CLIP-style models. It addresses generalization issues in marine imaging through standardized, large-scale data.
From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models
The paper introduces Demo2Reward, a test-time adaptation method that optimizes VLM reward models using minimal expert demonstrations. This approach significantly reduces false positives and enhances policy learning in robotics without requiring additional model training.
Hoeffding Concept Bottleneck Models with Applications to Overhead Images
Hoeffding Concept Bottleneck Models (HCBMs) use non-linear, sparse aggregation to improve interpretability and performance. They outperform linear CBMs in overhead image analysis and object detection.
Can Predicted Dynamics Exist in the Physical World?
This study defines physical admissibility to filter AI-generated actions using kinematic and dynamic checks. Experiments show this gate effectively blocks invalid proposals while maintaining high task progress.
Structured Visual Evidence Decomposition for Evidence-Grounded Multimodal Screening of Obstructive Sleep Apnea-Hypopnea Syndrome
EviOSAHS decomposes facial images into structured evidence cards, merging them with clinical data for accurate OSAHS screening. It achieved 88.47% accuracy and 94.86% sensitivity in a 642-subject study.
SentimentLens: Reconciling Sentiment and Ratings via Dual-Modality in the Hospitality Sector
SentimentLens reconciles textual sentiment with numerical ratings using dual-modality analysis on 10,000+ hotel reviews. It transforms unstructured feedback into actionable insights for hospitality management.
Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization
This study integrates cellular sheaves with attention-based multiple instance learning to improve weakly-supervised pathology localization. The method significantly boosts patch-level AUC to 0.940 and attention performance to 0.953 on Camelyon16.
Silent Failures in Physical AI: A Literature Review of Runtime Action Authorization for Autonomous Systems
This review identifies a critical gap in runtime authorization for Physical AI, where silent failures occur despite model confidence. It proposes a taxonomy of guardrails and evaluation metrics to bridge safety and capability tracks.
Diffusion Image Generation with Explicit Modeling of Data Manifold Geometry
MIND explicitly models data manifold geometry via discrete patch tokenization in diffusion models, achieving superior FID scores on ImageNet. It outperforms baselines like DiT and LlamaGen with significantly fewer parameters.
Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation
The Hierarchical Semantic-Geometric Map bridges the 2D-3D gap in Vision-Language Navigation by aligning 3D geometry with VLMs. This zero-shot framework improves navigation by separating semantic planning from low-level path execution.
Diversity Over Frequency: Rethinking Tool Use in Visual Chain-of-Thought Agents
This study identifies "tool-use collapse" in visual agents, showing that prioritizing trajectory diversity over tool frequency improves reasoning. An entropy regularization method enhances performance despite reduced tool usage.
DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models
DLLM-JEPA combines JEPAs with masked diffusion models, cutting training costs by 33% and boosting accuracy on benchmarks like GSM8K. It eliminates the need for parallel data pairs while outperforming diffusion-only fine-tuning.
PEACE: A Planner-Executor Agent with Constraint Enforcement for UAVs
PEACE is a UAV planner-executor agent separating LLM planning from ROS 2 execution, ensuring constraint adherence and explainability. Validated via PX4 simulations, it reduces LLM calls while enhancing safety and recovery capabilities.
CoCoVideo: The High-Quality Commercial-Model-Based Contrastive Benchmark for AI-Generated Video Detection
CoCoVideo introduces a high-quality commercial-model-based dataset and a contrastive detection framework using MLLMs. It achieves state-of-the-art performance in identifying realistic AI-generated video forgeries.
CoilDrop-MRI: Self-supervised physics-guided MRI reconstruction with coil dropout
CoilDrop-MRI is a self-supervised, physics-guided MRI reconstruction method using coil dropout. It outperforms existing techniques, achieving supervised-level quality without fully sampled training data.
Visual-Noise Guided In-Context Distillation for Multimodal Large Language Model Unlearning
VGID unlearns sensitive data in MLLMs via visual-noise guided in-context distillation. It achieves robust parameter-level unlearning without external teachers or retraining.
Motif-based morphology signatures for interpretable ECG screening and monitoring
This study introduces motif-based ECG signatures to quantify morphological drift, enabling interpretable screening. Results show these metrics effectively differentiate normal from abnormal cardiac rhythms across short- and long-term datasets.