Technology
Calibrating Uncertainty for Zero-Shot Adversarial CLIP
This paper introduces an adversarial fine-tuning method for CLIP that reparameterizes outputs as Dirichlet distributions to balance accuracy and uncertainty. It restores calibrated uncertainty under adversarial perturbations while preserving zero-shot generalization.
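The summary does not say how outputs become Dirichlet distributions. The sketch below shows how such outputs can yield both class probabilities and an uncertainty score. It assumes the softplus-evidence recipe common in evidential deep learning (alpha = evidence + 1), and the paper's exact mapping may differ. `dirichlet_from_logits` and `dirichlet_summary` are hypothetical names, not the authors' code.

```python
import math

def softplus(z):
    # Numerically stable softplus: log(1 + e^z) without overflow for large z.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def dirichlet_from_logits(logits):
    # Assumed mapping: non-negative evidence e_k = softplus(z_k), alpha_k = e_k + 1.
    return [softplus(z) + 1.0 for z in logits]

def dirichlet_summary(alpha):
    # Dirichlet strength S = sum_k alpha_k.
    strength = sum(alpha)
    # Expected class probabilities under the Dirichlet: alpha_k / S.
    probs = [a / strength for a in alpha]
    # Uncertainty u = K / S: equals 1 when all evidence is zero, shrinks as evidence grows.
    uncertainty = len(alpha) / strength
    return probs, uncertainty

confident = dirichlet_summary(dirichlet_from_logits([8.0, -2.0, -2.0]))
flat = dirichlet_summary(dirichlet_from_logits([0.0, 0.0, 0.0]))
print("confident u =", round(confident[1], 3))
print("flat u =", round(flat[1], 3))
```

Confident logits produce a lower `u` than flat ones, so a single output carries both a prediction and a measure of how much evidence backs it.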
VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio
VocSim is a training-free benchmark evaluating zero-shot content identity in 125k single-source audio clips. It reveals generalization gaps in low-resource speech while validating embeddings via bioacoustic and HEAR benchmarks.
Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation
This study introduces TOP-R, a privacy risk where LLM agents leak sensitive data by combining non-sensitive tool outputs. The authors propose TOP-Align, a post-training method that significantly reduces leakage compared to prompt-based safeguards.
Ev-Trust: An Evolutionarily Stable Trust Mechanism for Decentralized LLM-Based Multi-Agent Service Economies
Ev-Trust introduces an evolutionarily stable trust mechanism for decentralized LLM-agent economies, using cross-validation and revenue integration to ensure cooperative stability and reduce fraud.
Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)
This study uses the TD3 reinforcement learning algorithm to control a Twin Rotor Aerodynamic System. Simulations and lab tests show it outperforms PID controllers under wind disturbances.
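The summary names TD3 but not its update rule. As a reference point, TD3 changes the DDPG critic target in two ways: target policy smoothing and clipped double-Q learning. Both fit in a few lines. This is a generic sketch, not the study's controller code. It uses scalar actions, and plain callables stand in for the target networks. The hyperparameter defaults are values commonly used with TD3 and are assumptions here.

```python
import random

def td3_target(reward, next_state, done, actor_target, critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=None):
    """Critic target y for one transition with a scalar action (illustrative sketch)."""
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target actor's action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
    next_action = actor_target(next_state) + noise
    # Keep the smoothed action inside the actuator limits.
    next_action = max(action_low, min(action_high, next_action))
    # Clipped double-Q: use the smaller of the two target critics to curb overestimation.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    # Bootstrap only if the episode did not terminate.
    return reward + gamma * (1.0 - float(done)) * q_next

# Toy stand-ins for the target networks (hypothetical, for illustration only).
actor = lambda s: 0.5 * s
q1 = lambda s, a: s + a
q2 = lambda s, a: s + a + 1.0
print(td3_target(1.0, 0.4, False, actor, q1, q2, policy_noise=0.0))  # ≈ 1 + 0.99 * min(0.6, 1.6)
```

Full TD3 also delays policy updates: the actor and the target networks are updated only once every few critic steps. That schedule is omitted here.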
Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)
This study uses Soft Actor-Critic RL to control quadrotor thrust vectors instead of rotor RPMs. The approach achieves faster training and superior, smoother path-following performance.
Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism
This study compares dynamic entropy tuning in SAC against TD3 for quadcopter control. Results show dynamic entropy significantly improves performance by enhancing exploration and mitigating catastrophic forgetting.
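Dynamic entropy tuning in SAC is commonly implemented as automatic temperature adjustment. The entropy coefficient alpha is learned so that the policy's entropy tracks a target, often set to minus the action dimension. The summary does not say which variant the study uses, so the sketch below assumes this standard form. It follows the common practice of optimizing `log_alpha`, which keeps alpha positive. Log-probabilities of sampled actions are passed in as plain floats. The four-dimensional action space is an assumption.

```python
import math

def update_log_alpha(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient-descent step on the SAC temperature loss
    L(log_alpha) = -mean(log_alpha * (log_pi + target_entropy)),
    with (log_pi + target_entropy) treated as a constant."""
    grad = -sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    return log_alpha - lr * grad

# Common heuristic: target entropy = -(action dimension); 4 motor commands assumed.
target_entropy = -4.0
log_alpha = 0.0
# Entropy estimate -mean(log_pi) = -6 < target -4: too deterministic, so alpha rises.
low_entropy = update_log_alpha(log_alpha, [6.0, 6.0], target_entropy, lr=0.1)
# Entropy estimate -2 > target -4: too random, so alpha falls.
high_entropy = update_log_alpha(log_alpha, [2.0, 2.0], target_entropy, lr=0.1)
print(math.exp(low_entropy), math.exp(high_entropy))
```

A larger alpha weights the entropy bonus more heavily in the actor loss and so pushes the policy back toward exploration. This feedback is what distinguishes a tuned coefficient from a fixed one.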
Uncovering Competency Gaps in Large Language Models and Their Benchmarks
This study introduces an unsupervised method using sparse autoencoders to automatically detect competency gaps in LLMs and benchmarks. It reveals hidden model weaknesses and benchmark deficiencies, offering a complementary tool for refining evaluation frameworks.
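The method builds on sparse autoencoders (SAEs) trained on model activations. To illustrate what such an SAE computes, here is a common single-layer form in plain Python: a ReLU encoder, a linear decoder, and an L1 sparsity penalty. The sizes and coefficient are arbitrary. Training and the study's mapping from SAE features to competency gaps are not reproduced.

```python
import random

def matvec(m, v):
    # Multiply a matrix stored as a list of rows by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sae_forward(x, w_enc, b_enc, w_dec, b_dec, l1_coeff=1e-3):
    """Return (features, reconstruction, loss) for one activation vector x."""
    # Encoder: subtract the decoder bias, project to the wider feature space, apply ReLU.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    features = [max(0.0, h + b) for h, b in zip(matvec(w_enc, centered), b_enc)]
    # Decoder: linear map back to the activation space.
    x_hat = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    # Loss: squared reconstruction error plus an L1 penalty that pushes most features to zero.
    recon = sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))
    sparsity = sum(features)  # features are non-negative after ReLU, so this is the L1 norm
    return features, x_hat, recon + l1_coeff * sparsity

# Toy sizes: 4-dimensional activations, 8 dictionary features (an overcomplete basis).
rng = random.Random(0)
d, m = 4, 8
w_enc = [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(m)]
w_dec = [[rng.gauss(0.0, 0.5) for _ in range(m)] for _ in range(d)]
b_enc, b_dec = [0.0] * m, [0.0] * d
features, x_hat, loss = sae_forward([1.0, -0.5, 0.2, 0.0], w_enc, b_enc, w_dec, b_dec)
print(sum(f > 0 for f in features), "of", m, "features active; loss =", round(loss, 4))
```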
MGRegBench: A Novel Benchmark Dataset with Anatomical Landmarks for Mammography Image Registration
MGRegBench is a novel benchmark dataset with anatomical landmarks for mammography registration, enabling standardized, reproducible comparisons of classical and deep learning methods.
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
Avatar Forcing enables real-time, interactive head avatars using diffusion forcing and label-free optimization. It achieves 500 ms latency and highly expressive reactions, outperforming baselines in speed and user preference.
VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
VLM4VLA reveals that VLM general capabilities and specialized embodied skills poorly predict VLA performance. Instead, the visual module is the primary bottleneck, and adding control-relevant supervision to it yields consistent improvements.
Paradoxical noise preference in RNNs
Contrary to standard practice, continuous-time RNNs often perform best with noise during inference, as removing it biases outputs near activation nonlinearities. This effect stems from noise-induced shifts in the network's stochastic dynamics.
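The summary attributes the effect to noise interacting with activation nonlinearities. The one-dimensional illustration below (not the paper's analysis) shows why that matters. Because tanh is curved, its average over noisy inputs differs from its value at the noise-free input. A network whose weights were fitted with noise present therefore sees its outputs shift when the noise is removed. The noise level `sigma=1.0` is an arbitrary choice.

```python
import math
import random

def noisy_tanh_mean(x, sigma, n=100_000, seed=0):
    """Monte Carlo estimate of E[tanh(x + sigma * eps)] with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    return sum(math.tanh(x + sigma * rng.gauss(0.0, 1.0)) for _ in range(n)) / n

# Compare the noise-free output tanh(x) with its average under input noise.
for x in (0.0, 0.5, 1.5):
    print(f"x={x:.1f}  tanh(x)={math.tanh(x):.3f}  "
          f"E[tanh(x + noise)]={noisy_tanh_mean(x, sigma=1.0):.3f}")
```

At x = 0 the two agree by symmetry. For x > 0, tanh bends over, so the noisy average is pulled toward zero relative to tanh(x).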
Safe-FedLLM: Delving into the Safety of Federated Large Language Models
Safe-FedLLM defends Federated LLMs by using lightweight classifiers to detect malicious LoRA updates. This three-tiered framework ensures robustness against attacks without compromising performance or training speed.
Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics
This study identifies prototypicality bias in multimodal metrics, which favor stereotypical over semantically accurate images. The PROTOBIAS benchmark exposes these flaws, highlighting the gap between automated scores and human judgment.
FastSLM: Hierarchical Temporal Abstraction for Efficient Long-Form Speech Adaptation
FastSLM uses a Hierarchical Temporal Abstractor to compress long-form audio by 97% without losing context, achieving state-of-the-art performance with fewer resources.
Hot-Start Chinese Language Modeling: Visual Glyphs Accelerate Sample-Efficient Learning
Visual glyphs accelerate early Chinese language model training but converge to similar final accuracy as token IDs. This "hot-start" effect stems from pre-encoded radical structures, offering faster alignment without enhancing ultimate model capacity.
DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion
DSA-Tokenizer disentangles speech into semantic and acoustic tokens via flow matching, enabling high-fidelity reconstruction and voice cloning. It achieves efficient, controllable generation with low error rates, proving effective for large-model speech tasks.
SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models
SilentDrift exploits VLA action chunking to launch stealthy backdoor attacks using C2-continuous perturbations. It achieves a 93.2% attack success rate with a poisoning rate below 2% while maintaining high clean-task performance.
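The summary mentions C2-continuous perturbations without defining them. A curve is C2-continuous when it and its first two derivatives are continuous. A perturbation with this property therefore introduces no jumps in position, velocity, or acceleration. One standard way to build such a perturbation is the quintic smoothstep. The sketch below uses it to ramp a drift across a chunk of scalar actions. `add_smooth_drift` is a hypothetical helper that illustrates the continuity property only. It is not the attack from the paper.

```python
def quintic_ramp(t):
    """Smoothstep 6t^5 - 15t^4 + 10t^3 on [0, 1].

    Its value goes from 0 to 1, and its first and second derivatives are zero at
    both ends, so joining it to constant segments gives a C2-continuous curve.
    """
    t = min(1.0, max(0.0, t))
    return t * t * t * (t * (6.0 * t - 15.0) + 10.0)

def add_smooth_drift(chunk, max_offset):
    """Hypothetical helper: offset a chunk of scalar actions by a drift that rises
    from 0 to max_offset along the quintic ramp."""
    n = len(chunk)
    if n < 2:
        return list(chunk)
    return [a + max_offset * quintic_ramp(i / (n - 1)) for i, a in enumerate(chunk)]

drifted = add_smooth_drift([0.0] * 8, max_offset=0.05)
print([round(v, 4) for v in drifted])
```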
MASCOT: Towards Multi-Agent Socio-Collaborative Companion Systems
MASCOT is a multi-agent framework using bi-level optimization to prevent persona collapse and sycophancy. It enhances role consistency and dialogue diversity, outperforming state-of-the-art baselines in socio-collaborative companionship.
Physics-Encoded Inverse Modeling for Arctic Snow Depth Prediction
PhysE-Inv predicts Arctic snow depth using physics-encoded inverse modeling with LSTM and contrastive learning. It outperforms baselines, reducing MSE by 24.7% and boosting parameter estimation by 17.3%.
