Technology

Calibrating Uncertainty for Zero-Shot Adversarial CLIP
arXiv

This paper introduces an adversarial fine-tuning method for CLIP that reparameterizes outputs as Dirichlet distributions to balance accuracy and uncertainty. It restores calibrated uncertainty under adversarial perturbations while preserving zero-shot generalization.
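
The summary does not give the paper's exact parameterization, so the following is only a minimal sketch of the common evidential formulation, in which non-negative "evidence" derived from class scores defines a Dirichlet distribution. The softplus evidence function, the `alpha = evidence + 1` convention, and the example scores are assumptions for illustration, not details confirmed by the source.

```python
import math

def softplus(x: float) -> float:
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dirichlet_from_logits(logits):
    """Map raw class scores to Dirichlet parameters, mean class
    probabilities, and a total-evidence-based uncertainty score."""
    evidence = [softplus(z) for z in logits]  # non-negative evidence per class
    alpha = [e + 1.0 for e in evidence]       # Dirichlet concentration parameters
    strength = sum(alpha)                     # total concentration
    probs = [a / strength for a in alpha]     # expected class probabilities
    uncertainty = len(alpha) / strength       # in (0, 1]; near 1 when evidence is scarce
    return alpha, probs, uncertainty

# Hypothetical image-text scores for three candidate labels.
confident = dirichlet_from_logits([12.0, -4.0, -4.0])
unsure = dirichlet_from_logits([-4.0, -4.0, -4.0])
print("uncertainty (strong evidence):", round(confident[2], 3))
print("uncertainty (weak evidence):  ", round(unsure[2], 3))
```

Unlike a plain softmax, this representation separates "which class" (the mean probabilities) from "how much evidence" (the total concentration), which is what makes a separate uncertainty estimate possible.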

VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio
arXiv

VocSim is a training-free benchmark evaluating zero-shot content identity in 125k single-source audio clips. It reveals generalization gaps in low-resource speech while validating embeddings via bioacoustic and HEAR benchmarks.

Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation
arXiv

This study introduces TOP-R, a privacy risk where LLM agents leak sensitive data by combining non-sensitive tool outputs. The authors propose TOP-Align, a post-training method that significantly reduces leakage compared to prompt-based safeguards.

Ev-Trust: An Evolutionarily Stable Trust Mechanism for Decentralized LLM-Based Multi-Agent Service Economies
arXiv

Ev-Trust introduces an evolutionarily stable trust mechanism for decentralized LLM-agent economies, using cross-validation and revenue integration to ensure cooperative stability and reduce fraud.

Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)
arXiv

This study uses the TD3 reinforcement learning algorithm to control a Twin Rotor Aerodynamic System. Simulations and lab tests show it outperforms PID controllers under wind disturbances.
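
For readers unfamiliar with TD3, its core is the critic target: two target critics are evaluated at a noise-smoothed target action and the smaller value is used. The sketch below shows that computation for a scalar action. The toy actor and critics, the hyperparameter defaults, and all function names are illustrative assumptions; the study's actual networks and settings are not given in the summary.

```python
import random

def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=random):
    """Clipped double-Q target with target policy smoothing (scalar action)."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = target_actor(next_state) + noise
    # Keep the smoothed action inside the valid actuator range.
    next_action = max(-act_limit, min(act_limit, next_action))
    # Clipped double-Q: take the more pessimistic of the two target critics.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    # No bootstrapping past terminal transitions (done is 0.0 or 1.0).
    return reward + gamma * (1.0 - done) * q_min

# Toy stand-ins for the target networks (hypothetical, for illustration only).
actor = lambda s: 0.5 * s
q1 = lambda s, a: s + a
q2 = lambda s, a: s + a + 1.0

print(td3_target(1.0, 0.0, 1.0, actor, q1, q2, rng=random.Random(0)))
```

The third ingredient of TD3, updating the actor less often than the critics, lives in the training loop and is omitted here.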

Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)
arXiv

This study uses Soft Actor-Critic RL to control quadrotor thrust vectors instead of rotor RPMs. The approach achieves faster training and superior, smoother path-following performance.

Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism
arXiv

This study compares dynamic entropy tuning in SAC against TD3 for quadcopter control. Results show dynamic entropy significantly improves performance by enhancing exploration and mitigating catastrophic forgetting.
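
As background, dynamic entropy tuning in SAC usually means learning the temperature `alpha` so that the policy's entropy tracks a target value. The sketch below shows one gradient step on `log_alpha` under the commonly used loss `-log_alpha * (mean_log_prob + target_entropy)`. The learning rate, the sample log-probabilities, and the target of `-4.0` (the usual "negative action dimension" heuristic, assuming four rotor commands) are assumptions, not values from the study.

```python
import math

def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on the SAC temperature.

    Loss: -log_alpha * (mean(log_probs) + target_entropy),
    so d(loss)/d(log_alpha) = -(mean(log_probs) + target_entropy).
    """
    mean_log_prob = sum(log_probs) / len(log_probs)
    grad = -(mean_log_prob + target_entropy)
    return log_alpha - lr * grad

target_entropy = -4.0  # assumed heuristic: -(number of action dimensions)
log_alpha = 0.0        # alpha starts at exp(0) = 1

# Policy too deterministic (entropy estimate -6 < target -4): alpha rises.
log_alpha = update_log_alpha(log_alpha, [6.0, 6.0], target_entropy)
print("alpha after low-entropy batch: ", math.exp(log_alpha))

# Policy too random (entropy estimate -2 > target -4): alpha falls.
log_alpha = update_log_alpha(log_alpha, [2.0, 2.0], target_entropy)
print("alpha after high-entropy batch:", math.exp(log_alpha))
```

A larger `alpha` weights the entropy bonus more heavily and so pushes the policy toward exploration, while a smaller `alpha` lets it become more deterministic. TD3, by contrast, uses a deterministic policy with fixed exploration noise.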

Uncovering Competency Gaps in Large Language Models and Their Benchmarks
arXiv

This study introduces an unsupervised method using sparse autoencoders to automatically detect competency gaps in LLMs and benchmarks. It reveals hidden model weaknesses and benchmark deficiencies, offering a complementary tool for refining evaluation frameworks.

MGRegBench: A Novel Benchmark Dataset with Anatomical Landmarks for Mammography Image Registration
arXiv

MGRegBench is a novel benchmark dataset with anatomical landmarks for mammography registration, enabling standardized, reproducible comparisons of classical and deep learning methods.

Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
arXiv

Avatar Forcing enables real-time, interactive head avatars using diffusion forcing and label-free optimization. It achieves 500 ms latency and highly expressive reactions, outperforming baselines in speed and user preference.

VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
arXiv

VLM4VLA reveals that VLM general capabilities and specialized embodied skills poorly predict VLA performance. Instead, the visual module is the primary bottleneck, and adding control-relevant supervision to it yields consistent improvements.

Paradoxical noise preference in RNNs
arXiv

Contrary to standard practice, continuous-time RNNs often perform best with noise during inference, as removing it biases outputs near activation nonlinearities. This effect stems from noise-induced shifts in the network's stochastic dynamics.
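
A single-unit toy calculation can illustrate why removing noise might shift outputs near a nonlinearity. It is an illustration only, not the paper's analysis; the ReLU activation, the Gaussian noise model, and the chosen inputs are assumptions. For a ReLU unit with input `x` and noise `sigma * z`, where `z` is standard normal, the mean output has the closed form `x * Phi(x / sigma) + sigma * phi(x / sigma)`, which differs most from the noise-free output near the kink at zero.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def mean_relu_under_noise(x: float, sigma: float) -> float:
    """Closed-form E[relu(x + sigma * z)] for z ~ N(0, 1)."""
    if sigma == 0.0:
        return relu(x)
    t = x / sigma
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))          # Phi(t)
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)   # phi(t)
    return x * cdf + sigma * pdf

sigma = 1.0
for x in (-3.0, 0.0, 3.0):
    gap = mean_relu_under_noise(x, sigma) - relu(x)
    print(f"x = {x:+.1f}  noisy mean minus noise-free output = {gap:.4f}")
```

A network trained with noise would see the noisy mean during training, so switching the noise off at inference changes the effective activation most for units operating near the nonlinearity.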

Safe-FedLLM: Delving into the Safety of Federated Large Language Models
arXiv

Safe-FedLLM defends Federated LLMs by using lightweight classifiers to detect malicious LoRA updates. This three-tiered framework ensures robustness against attacks without compromising performance or training speed.

Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics
arXiv

This study identifies prototypicality bias in multimodal metrics, which favor stereotypical over semantically accurate images. The PROTOBIAS benchmark exposes these flaws, highlighting the gap between automated scores and human judgment.

FastSLM: Hierarchical Temporal Abstraction for Efficient Long-Form Speech Adaptation
arXiv

FastSLM uses a Hierarchical Temporal Abstractor to compress long-form audio by 97% without losing context, achieving state-of-the-art performance with fewer resources.

Hot-Start Chinese Language Modeling: Visual Glyphs Accelerate Sample-Efficient Learning
arXiv

Visual glyph inputs accelerate early Chinese language model training but converge to a final accuracy similar to that of token IDs. This "hot-start" effect stems from pre-encoded radical structures, which speed up alignment without raising the model's ultimate capacity.

DSA-Tokenizer: Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion
arXiv

DSA-Tokenizer disentangles speech into semantic and acoustic tokens via flow matching, enabling high-fidelity reconstruction and voice cloning. It achieves efficient, controllable generation with low error rates, proving effective for large-model speech tasks.

SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models
arXiv

SilentDrift exploits VLA action chunking to launch stealthy backdoor attacks using C2-continuous perturbations. It achieves 93.2% success with <2% poisoning while maintaining high clean task performance.

MASCOT: Towards Multi-Agent Socio-Collaborative Companion Systems
arXiv

MASCOT is a multi-agent framework using bi-level optimization to prevent persona collapse and sycophancy. It enhances role consistency and dialogue diversity, outperforming state-of-the-art baselines in socio-collaborative companionship.

Physics-Encoded Inverse Modeling for Arctic Snow Depth Prediction
arXiv

PhysE-Inv predicts Arctic snow depth using physics-encoded inverse modeling with LSTM and contrastive learning. It outperforms baselines, reducing MSE by 24.7% and boosting parameter estimation by 17.3%.