Technology
Detect Before You Leap: Mirage Detection in Vision-Language Models
This study introduces TC-LIA, a pre-release mirage detection method for VLMs that tracks image-text alignment across layers. It achieves ~94.7% accuracy and reduces mirage rates to below 3%, compared with 21% for baselines.
CodeCytos: AI-assisted spatial molecular imaging analysis via code-augmented agent action space
CodeCytos uses AI agents to automate and customize spatial molecular imaging analysis. It enables dynamic, code-driven exploration of complex tissue data without extensive manual input.
Pre-Deployment Robustness Stress Testing for CT Segmentation Systems Using Clinically Motivated Multi-Corruption Augmentation
RAMP, a multi-corruption augmentation framework, significantly improves CT segmentation robustness against clinical noise. It reduces performance gaps on corrupted images, ensuring safer real-world deployment.
TabChange: Precise Attribute Changes in Tabular Data
TabChange precisely edits tabular attributes by adapting to feature correlations, ensuring realistic, instance-specific changes. It outperforms baselines by generating more valid counterfactuals while preserving natural data structures.
Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning
SelSkill enables selective skill invocation via dual-granularity preference learning, significantly boosting task success and precision on benchmarks like ALFWorld and BFCL.
V-LynX: Token Interface Alignment for Video+X LLMs
V-LynX integrates new modalities into Video LLMs via a lightweight auxiliary pathway, aligning them with the model’s internal token interface. This approach achieves state-of-the-art performance without heavy encoders or paired supervision.
PaCo-VLA: Passivity-Shielded Compliance Prior for Contact-Rich Vision-Language-Action Manipulation
PaCo-VLA decouples semantic reasoning from control by using a passivity shield to regulate VLA compliance proposals, ensuring safe, high-frequency contact dynamics. This approach prevents unsafe predictions from overriding physics, achieving superior precision in contact-rich manipulation tasks.
CAFOSat: A Strongly Annotated Dataset for Infrastructure-Aware CAFO Mapping Using High-Resolution Imagery
CAFOSat is a strongly annotated dataset of 45,000 high-resolution patches for mapping US concentrated animal feeding operations (CAFOs) with infrastructure details. It enhances model robustness through refined annotations and synthetic augmentation.
Interpretable Policy Distillation for Power Grid Topology Control
This study distills a deep RL power grid controller into interpretable tree models, achieving higher performance and transparency. The lightweight surrogates enable real-time deployment while revealing key operational drivers.
A Practical Upper Bound on Selection Bias Effects in Medical Prediction Models
This study introduces a practical upper bound to estimate selection bias effects in medical prediction models using partially observable data. Validated on synthetic and real-world datasets, it offers a robust framework for assessing model generalizability in healthcare.
Richer Representations for Neural Algorithmic Reasoning via Auxiliary Reconstruction
This study enhances neural algorithmic reasoning by introducing auxiliary reconstruction to improve encoder representations. This approach boosts performance on benchmarks by preserving input details and capturing intra-state feature dependencies.
Revisiting Parameter-Based Knowledge Editing in Large Language Models: Theoretical Limits and Empirical Evidence
This study reveals that parameter-based knowledge editing in LLMs causes reasoning collapse due to interference. Retrieval-based methods consistently outperform editing, highlighting the need to preserve core model capabilities.
On the Difficulty of Learning a Meta-network for Training Data Selection
This study identifies low gradient signal-to-noise ratio and poor feature correlation as key hurdles in meta-network training data selection. Increasing batch size and using informative features significantly improve performance across benchmarks.
Improving Visual Representation Alignment Generation with GRPO
VRPO replaces static alignment with reinforcement learning, boosting diffusion transformer training efficiency and image fidelity. It achieves 1.8x FID improvement and 2.3x faster convergence than REPA with minimal overhead.
Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback
Critic-R improves agentic search via a critic model providing natural language feedback. It refines queries at inference and optimizes retrievers using automatic supervision, boosting accuracy without manual annotations.
SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering
SPADER enhances multi-answer QA via step-wise peer advantage and diversity-aware rewards. It outperforms existing methods in recall and F1 on benchmarks like QAMPARI.
CARE-RL: Capability-Aware Reinforcement Learning for Mitigating Cross-Domain Conflicts
CARE-RL mitigates cross-domain conflicts in RL via protocol-aware rewards and capability-aware optimization. It outperforms baselines in experiments with the Qwen2.5-7B and Qwen3-4B models.
MemGraphRAG: Memory-based Multi-Agent System for Graph Retrieval-Augmented Generation
MemGraphRAG uses a memory-based multi-agent system to build coherent knowledge graphs, resolving fragmentation issues in GraphRAG. It outperforms state-of-the-art baselines in retrieval accuracy and efficiency.
MemPro: Agentic Memory Systems as Evolvable Programs
MemPro treats agentic memory as evolvable code, using an agent to iteratively refine the entire retrieval pipeline. It outperforms static baselines by adapting to failures across diverse benchmarks.
Authenticity Debt and the Synthetic Content Threat Landscape: A Layered Framework for Trust, Provenance, and IP Governance in the Generative AI Era
The paper defines "authenticity debt," which accrues from unverified AI content, and proposes a layered framework combining cryptographic provenance, human verification, and governance to mitigate risks and ensure trust.