Global News Digest

Technology

arXiv

A Primer in Post-Training Reasoning Data: What We Know About How It Works

This primer consolidates insights from 150+ studies to organize research on post-training reasoning data. It covers the nature of the data, its effectiveness, its construction, and its scaling, with the aim of guiding future model development.

arXiv

Jailbreaking Multimodal Large Language Models using Multi-Clip Video

This study introduces Multi-Clip Video SafetyBench, revealing that increasing video clip variety significantly boosts jailbreak success in MLLMs. It proposes leveraging image stability as a defensive strategy against these video-specific vulnerabilities.

arXiv

LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

LALE is a lightweight transformer for land-cover estimation that balances efficiency and performance. It achieves high accuracy with significantly fewer parameters and lower computational cost than baselines.

arXiv

How Hard Can It Be? Hardness-Aware Multi-Objective Unlearning

HAMU uses hardness-aware multi-objective optimization to guarantee forget quality improvements while minimizing retain utility loss. It identifies unavoidable trade-offs and outperforms baselines on image and text datasets.

arXiv

Variational Learning for Insertion-based Generation

The Insertion Process (IP) model learns variable-length generation and insertion order via permutation-based variational inference. It outperforms fixed-grid methods in molecular and planning tasks by adapting to non-monotonic structures.

arXiv

Understanding-Enhanced Model Collaboration for Long-Tailed Egocentric Mistake Detection

UE-MCM combines lightweight and large models to detect rare egocentric errors, using dynamic collaboration and specialized loss functions to handle long-tailed distributions efficiently.

arXiv

Rethinking Evaluation Paradigms in IBP-based Certified Training

This paper proposes Pareto front comparisons to fairly evaluate IBP-based certified training, revealing that prior methods were often undertuned. This approach establishes new state-of-the-art results and exposes significant performance complementarities among existing techniques.

arXiv

VLBM: Variational Latent Basis Modeling for OOD Robust Multivariate Time Series Forecasting

VLBM is a variational latent basis model that improves out-of-distribution (OOD) robustness in multivariate time series forecasting by separating stable dynamics from OOD deviations. It achieves state-of-the-art performance, improving MAE by 15.08% and MSE by 7.74% across diverse benchmarks.

arXiv

Multimodal Approaches for Visually-Rich Document Type Classification: A Comparative Analysis

This study compares multimodal models on the RVL-CDIP benchmark, finding that specialized transformers outperform LLMs on complex documents. Visual data proves more critical than OCR for accurate classification.

arXiv

Predicting the risk of colorectal anastomotic leak based on preoperative mapping of the blood supply of the bowel

This protocol outlines an AI system using preoperative CT scans to predict colorectal anastomotic leak risk. It integrates vascular analysis with historical case retrieval to enhance surgical decision-making.

arXiv

Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages

The study introduces MIDI, a multilingual idiom dataset across varying resource levels, revealing that models struggle with literal idioms and low-resource languages. While conversational context helps, it cannot fully bridge performance gaps or overcome current model limitations.

arXiv

Order within Chaos: Capturing Intrinsic Energy Anomalies for AI-Manipulated Image Forgery Localization

FLAME localizes AI image forgeries by detecting intrinsic energy anomalies from diffusion processes, outperforming existing methods. It also introduces EditStream, an automated pipeline for continuous, instruction-based training data synthesis.

arXiv

On the Generalization in Topology Optimization via Sensitivity-Conditioned Bernoulli Flow Matching

This study proves adjoint sensitivity is the optimal conditioning signal for topology optimization generalization. It introduces pseudo-sensitivities and validates their efficacy via Bernoulli flow matching across structural and CFD benchmarks.

arXiv

Consistency Training while Mitigating Obfuscation via Rate Matching

Rate Matching Consistency Training (RMCT) mitigates obfuscation by stabilizing behavior rates rather than forcing identical outputs. This preserves monitorability while effectively reducing biases like sycophancy in language models.

arXiv

Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing

SAGC dynamically adjusts RL group sizes to mitigate stragglers, boosting wall-clock efficiency and model performance. It outperforms static baselines in training speed and reasoning benchmarks without explicit length penalties.

arXiv

FW-NKF: Frequency-Weighted Neural Kalman Filters

FW-NKF integrates spectral shaping into neural Kalman filters to suppress band-limited noise. It reduces localization error by 10% and improves orientation accuracy across diverse benchmarks.

arXiv

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations

AgentRedBench introduces a dynamic redteaming benchmark for LLM agents, revealing high vulnerability to indirect prompt injections. Its companion defense, AgentRedGuard, drastically reduces attack success rates while maintaining low false positives.

arXiv

Towards Resolving Optimization Conflicts Between Image- and Text-Based Person Re-Identification

This study proposes a decoupled, two-stage training framework to resolve optimization conflicts between image- and text-based person re-identification (ReID). Results show that pre-training with I2I and then integrating textual supervision significantly boosts unified representation performance.

arXiv

CityTrajBench: A Unified Benchmark for City-Scale Vehicle Trajectory Generation

CityTrajBench standardizes city-scale trajectory generation via a unified framework and protocol. It evaluates diverse models across five dimensions, revealing distinct trade-offs in realism, fidelity, and efficiency.

arXiv

Quantitative Movement Testing: Measuring Patient Movements from a Single Smartphone Video

QMT extracts 3D kinematic biomarkers from a single smartphone video and is validated against motion capture. It reliably monitors chronic pain patients’ movements in home settings.