Technology
A Primer in Post-Training Reasoning Data: What We Know About How It Works
This primer consolidates insights from 150+ studies to structure post-training reasoning data research. It addresses data nature, effectiveness, construction, and scaling to guide future model development.
Jailbreaking Multimodal Large Language Models using Multi-Clip Video
This study introduces Multi-Clip Video SafetyBench, revealing that increasing video clip variety significantly boosts jailbreak success in MLLMs. It proposes leveraging image stability as a defensive strategy against these video-specific vulnerabilities.
LALE: Lightweight-Transformer Architecture for Land-Cover Estimation
LALE is a lightweight transformer for land-cover estimation that balances efficiency and performance. It achieves high accuracy with significantly fewer parameters and computational costs than baselines.
How Hard Can It Be? Hardness-Aware Multi-Objective Unlearning
HAMU uses hardness-aware multi-objective optimization to guarantee forget quality improvements while minimizing retain utility loss. It identifies unavoidable trade-offs and outperforms baselines on image and text datasets.
Variational Learning for Insertion-based Generation
The Insertion Process (IP) model learns variable-length generation and insertion order via permutation-based variational inference. It outperforms fixed-grid methods in molecular and planning tasks by adapting to non-monotonic structures.
Understanding-Enhanced Model Collaboration for Long-Tailed Egocentric Mistake Detection
UE-MCM combines lightweight and large models to detect rare egocentric errors, using dynamic collaboration and specialized loss functions to handle long-tailed distributions efficiently.
Rethinking Evaluation Paradigms in IBP-based Certified Training
This paper proposes Pareto front comparisons to fairly evaluate IBP-based certified training, revealing that prior methods were often undertuned. This approach establishes new state-of-the-art results and exposes significant performance complementarities among existing techniques.
VLBM: Variational Latent Basis Modeling for OOD Robust Multivariate Time Series Forecasting
VLBM is a variational latent basis model that enhances OOD robustness in multivariate time series forecasting by decomposing stable dynamics from OOD deviations. It achieves state-of-the-art performance, improving MAE by 15.08% and MSE by 7.74% across diverse benchmarks.
Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis
This study compares multimodal models on the RVL-CDIP benchmark, finding specialized transformers outperform LLMs for complex documents. Visual data proves more critical than OCR for accurate classification.
Predicting the risk of colorectal anastomotic leak based on preoperative mapping of the blood supply of the bowel
This protocol outlines an AI system that uses preoperative CT scans to predict colorectal anastomotic leak risk. It integrates vascular analysis with historical case retrieval to enhance surgical decision-making.
Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages
The study introduces MIDI, a multilingual idiom dataset across varying resource levels, revealing that models struggle with literal idioms and low-resource languages. While conversational context helps, it cannot fully bridge performance gaps or overcome current model limitations.
Order within Chaos: Capturing Intrinsic Energy Anomalies for AI-Manipulated Image Forgery Localization
FLAME localizes AI image forgeries by detecting intrinsic energy anomalies from diffusion processes, outperforming existing methods. It also introduces EditStream, an automated pipeline for continuous, instruction-based training data synthesis.
On the Generalization in Topology Optimization via Sensitivity-Conditioned Bernoulli Flow Matching
This study proves adjoint sensitivity is the optimal conditioning signal for topology optimization generalization. It introduces pseudo-sensitivities and validates their efficacy via Bernoulli flow matching across structural and CFD benchmarks.
Consistency Training while Mitigating Obfuscation via Rate Matching
Rate Matching Consistency Training (RMCT) mitigates obfuscation by stabilizing behavior rates rather than forcing identical outputs. This preserves monitorability while effectively reducing biases like sycophancy in language models.
Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing
SAGC dynamically adjusts RL group sizes to mitigate stragglers, boosting wall-clock efficiency and model performance. It outperforms static baselines in training speed and reasoning benchmarks without explicit length penalties.
FW-NKF: Frequency-Weighted Neural Kalman Filters
FW-NKF integrates spectral shaping into neural Kalman filters to suppress band-limited noise. It reduces localization error by 10% and improves orientation accuracy across diverse benchmarks.
AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations
AgentRedBench introduces a dynamic redteaming benchmark for LLM agents, revealing high vulnerability to indirect prompt injections. Its companion defense, AgentRedGuard, drastically reduces attack success rates while maintaining low false positives.
Towards Resolving Optimization Conflicts Between Image- and Text-Based Person Re-Identification
This study proposes a decoupled, two-stage training framework to resolve optimization conflicts between image- and text-based person ReID. Results show that image-to-image (I2I) pre-training and integrated textual supervision significantly boost unified representation performance.
CityTrajBench: A Unified Benchmark for City-Scale Vehicle Trajectory Generation
CityTrajBench standardizes city-scale trajectory generation via a unified framework and protocol. It evaluates diverse models across five dimensions, revealing distinct trade-offs in realism, fidelity, and efficiency.
Quantitative Movement Testing: Measuring Patient Movements from a Single Smartphone Video
QMT extracts 3D kinematic biomarkers from smartphone videos and is validated against motion capture. It reliably monitors chronic pain patients’ movements in home settings.