Global News Digest

Technology

arXiv

TuneAgent: Agentic Operating System Kernel Tuning with Reinforcement Learning

TuneAgent uses RL-driven LLMs to autonomously tune Linux kernels, achieving up to 5.6% performance gains. It ensures valid configurations via structured rewards and a two-phase training approach.

arXiv

Language-Native Materials Processing Design by Lightly Structured Text Database and Reasoning Large Language Model

This framework optimizes materials synthesis by using lightly structured text and reasoning LLMs to extract procedural logic from unstructured data. It successfully streamlined boron nitride nanosheet production, reducing trial-and-error cycles through iterative, evidence-based protocol refinement.

arXiv

Position: Beyond Sensitive Attributes, ML Fairness Should Quantify Structural Injustice via Social Determinants

This paper argues ML fairness must quantify structural injustice via social determinants, not just sensitive attributes. Auditing these determinants before mitigation prevents new injustices and addresses systemic inequities.

arXiv

Towards a Physics Foundation Model

The General Physics Transformer (GPhyT) demonstrates that a single model can simulate diverse physical phenomena, outperforming specialized solvers and enabling zero-shot generalization. This work establishes a viable foundation for universal Physics Foundation Models.

arXiv

Deep Learning as the Disciplined Construction of Tame Objects

This paper uses tame geometry to provide convergence guarantees for stochastic gradient descent in nonsmooth, nonconvex deep learning. It frames deep learning models as compositions of tame functions, offering a rigorous mathematical framework for AI analysis.

arXiv

T-POP: Test-Time Personalization with Online Preference Feedback

T-POP enables real-time LLM personalization by learning user preferences via online feedback and dueling bandits, without updating model parameters. It effectively solves the cold-start problem, outperforming existing baselines with rapid, data-efficient adaptation.

arXiv

End-to-End Deep Learning for Predicting Metric Space-Valued Outputs

E2M predicts metric space-valued outputs via weighted Fréchet means, preserving intrinsic geometry without surrogate embeddings. It achieves state-of-the-art results on diverse structured data, including networks and distributions.

arXiv

v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound

v-HUB is a new benchmark for evaluating multimodal LLMs on video humor understanding. It reveals that audio cues significantly aid models in comprehending humor compared to visual-only inputs.

arXiv

Distillation of Large Language Models via Concrete Score Matching

Concrete Score Distillation (CSD) improves LLM distillation by aligning relative logit differences, overcoming softmax blurring and shift invariance limits. It consistently outperforms recent methods in fidelity and diversity across various benchmarks.
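
To make the shift-invariance point concrete, here is a minimal sketch, not the paper's actual objective. It assumes a simple squared penalty on pairwise logit gaps; the function names and toy logits are illustrative only. It shows that adding a constant to every logit leaves the softmax unchanged, that a pairwise-difference loss ignores such a shift, and that a naive per-logit loss penalizes it.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def direct_logit_mse(student, teacher):
    # Naive per-logit matching: penalizes a constant shift even though
    # the shift does not change the predicted distribution.
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def pairwise_logit_diff_loss(student, teacher):
    # Hypothetical loss for illustration: compares every gap s_i - s_j
    # with the teacher's gap t_i - t_j, so constant offsets cancel out.
    n = len(student)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gap = (student[i] - student[j]) - (teacher[i] - teacher[j])
            total += gap * gap
    return total / (n * n)

teacher = [2.0, 0.5, -1.0]
shifted = [x + 3.0 for x in teacher]   # same distribution, different logits
mismatched = [2.0, 0.5, 0.0]           # genuinely different distribution

print(softmax(teacher))
print(softmax(shifted))
print(direct_logit_mse(shifted, teacher))
print(pairwise_logit_diff_loss(shifted, teacher))
print(pairwise_logit_diff_loss(mismatched, teacher))
```

In this toy setup the shifted student is a perfect match under the pairwise loss but not under the per-logit loss, which is the kind of freedom a difference-based objective leaves to the student.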

arXiv

Make a Video Call with LLM: A Measurement Campaign over Six Mainstream Apps

This study benchmarks six LLM video chat apps across quality, latency, and overhead, revealing that AI capabilities, not network latency, primarily drive user experience.

arXiv

Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards

MAHALO aligns LLMs across verifiable and subjective rewards using PRM-guided decoding and multi-action heads, enabling concurrent optimization with minimal interference and flexible user control.

arXiv

Verifying Meta-Awareness via Predictive Rewards in Reasoning Models

MAPR enhances reasoning models by predicting rollout statistics to optimize processing, boosting accuracy by 83.18% on AIME25 and accelerating training by 1.28x.

arXiv

Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization

MADPO uses a reward model to adaptively weight DPO loss per sample, improving granular control over heterogeneous preference data. It outperforms existing methods by stabilizing training and preserving valuable signals.
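
As a rough illustration of per-sample weighting, the sketch below scales the standard DPO loss by a weight computed from a reward-model margin. The weighting function, its direction (larger weights for pairs with small reward margins), and the parameters `center` and `scale` are assumptions made for this example, not the scheme from the paper.

```python
import math

def dpo_loss(policy_margin, beta=0.1):
    # Standard DPO loss for one pair: -log(sigmoid(beta * margin)), where
    # margin = log-ratio(chosen) - log-ratio(rejected) against the reference.
    z = beta * policy_margin
    # Stable softplus(-z) to avoid overflow for large |z|.
    if z > 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def margin_weight(reward_margin, center=1.0, scale=1.0):
    # Hypothetical weight in (0, 2): equals 1 at `center`, grows for
    # pairs the reward model finds close, shrinks for easy pairs.
    return 2.0 / (1.0 + math.exp((reward_margin - center) / scale))

def weighted_dpo_loss(batch, beta=0.1):
    # batch: list of (policy_margin, reward_margin) tuples.
    losses = [margin_weight(r) * dpo_loss(p, beta) for p, r in batch]
    return sum(losses) / len(losses)

batch = [(0.5, 0.2), (4.0, 3.0), (-1.0, 1.0)]
print(weighted_dpo_loss(batch))
```

A fixed weight of 1 for every pair recovers plain DPO, so the weighting function is the only piece that would need to change to try a different scheme.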

arXiv

Domain-Shift-Aware Conformal Prediction for Large Language Models

DS-CP adapts conformal prediction for LLMs under domain shifts by weighting calibration samples based on test prompt proximity. It ensures reliable coverage and computational efficiency, enhancing trustworthy uncertainty quantification.

arXiv

HRTFformer: A Spatially-Aware Transformer for Individual HRTF Upsampling in Immersive Audio Rendering

HRTFformer is a transformer-based model that upsamples sparse HRTF data using spherical harmonics and attention mechanisms. It outperforms existing methods in accuracy and perceptual realism for immersive audio rendering.

arXiv

Value Flows

Value Flows uses flow-based models to estimate full return distributions, improving decision-making by quantifying state uncertainty. It achieves a 1.3x success rate boost across 62 benchmark tasks.

arXiv

StreamingVLM: Real-Time Understanding for Infinite Video Streams

StreamingVLM enables real-time comprehension of infinite video streams via efficient KV caching and SFT. It achieves 8 FPS on an H100, outperforming GPT-4o mini and boosting general VQA capabilities.

arXiv

SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management

SHERLOCK integrates domain knowledge with LLMs to automate e-commerce fraud detection. It boosts investigation throughput by 386.7% and maintains accuracy via a self-evolving data flywheel.

arXiv

Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?

This study reveals that current RL benchmarks fail to distinguish genuine progress due to data leakage, hiding poor generalization. It proposes new principles for robust evaluation to accurately assess RL methods.

arXiv

Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization

Non-Transferable Examples (NTEs) recode data into model-specific subspaces, enabling authorized models to access information while blocking unauthorized ones. This training-free method ensures purpose limitation without relying on data perturbation or controlled training processes.