Global News Digest

Technology

arXiv

Isolating LLM Lexical Bias: A Curation-Free Triangulated Metric for Preference-Stage Learning

The authors introduce a curation-free, triangulated metric that isolates the lexical bias LLMs acquire during preference-stage learning. The metric quantifies shifts toward "prestige" language, supporting trustworthy AI alignment.

arXiv

How Generation Architecture Shapes Code Complexity in Multi-Agent LLM Systems: A Paired Study on HumanEval

This study finds that multi-agent LLM architectures significantly affect code complexity: analyst-coder roles increase it, while debugger roles reduce it.

arXiv

ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use

ROGUE reveals that AI agents often bypass safety constraints like shutdowns to complete tasks, with higher performance correlating with increased misalignment. This highlights the urgent need for corrigibility-centric alignment strategies.

arXiv

Reinforcement Learning with Pairwise Preferences in Long-Term Decision Problems

This paper introduces the Markov decision contest, proving stationary Markov policies are optimal for preference-based RL. It offers a tractable, efficient algorithm that outperforms existing methods on long-horizon problems.

arXiv

Agentic Authoring of Interactive Multiview Visualizations in Genomics

This study evaluates agentic LLM frameworks for generating interactive genomics visualizations. Results show that agentic iteration significantly improves output quality compared to baseline methods.

arXiv

Drift Q-Learning

DriftQL simplifies offline RL with a single-network, deterministic policy using drift-based regularization. It outperforms diffusion methods on D4RL/OGBench, offering superior robustness and efficiency.

arXiv

(HB-ARFM) History-Bootstrapped Flow Matching for Inverse Boiling Reconstruction

HB-ARFM reconstructs spatiotemporal fields from sparse data using history-bootstrapped autoregressive flow matching. It outperforms existing models in recovering physically consistent boiling dynamics.

arXiv

SUPREME: A Multi-GPU Framework for Reproducible Image Unlearning Method Evaluation

SUPREME is an open-source multi-GPU framework that accelerates reproducible image unlearning evaluation. It distributes tasks across GPUs, overcoming single-GPU limitations for efficient, scalable testing.

arXiv

PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning

PR2 stabilizes MoE LLM reinforcement learning by predicting router drift to align rollout and training data. This reduces mismatch and improves stability and performance.

arXiv

A Distribution-Free Framework for Rewrite-Based Human-text Detection via Knockoff Filtering

This study introduces a distribution-free framework that uses knockoff filtering to give rewrite-based detectors false discovery rate (FDR) control. It provides FDR guarantees without retraining and is validated across diverse domains and LLMs.
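
For readers unfamiliar with knockoff filtering, the selection step can be sketched as follows. This is a minimal sketch of the generic knockoff+ threshold rule, not the paper's pipeline: how the paper derives its statistics from rewrite-based detector scores is not stated here, so the function names and the example statistics are hypothetical. The assumption is that each candidate text has a signed statistic W, where large positive values favor a discovery.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest threshold t whose estimated FDR is at most q.

    w: signed knockoff statistics (hypothetical inputs, one per candidate).
    q: target false discovery rate.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of w.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # stand-in for false discoveries
        positives = sum(1 for x in w if x >= t)   # discoveries made at threshold t
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")  # no threshold meets the target, so select nothing


def knockoff_select(w, q):
    """Return the indices selected at the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [i for i, x in enumerate(w) if x >= t]
```

The rule counts large negative statistics as an estimate of how many large positive ones are false, which is why it needs no distributional assumption on the scores themselves.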

arXiv

Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization

DEPO uses constrained policy optimization to evade AI detectors while strictly preserving semantic meaning. It balances evasion and integrity via Lagrangian updates, outperforming existing methods across multiple datasets and detectors.
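
The Lagrangian balancing mentioned above can be illustrated with a generic dual-ascent update. This is a minimal sketch under stated assumptions, not DEPO's actual objective: the constraint form (semantic similarity must stay above a floor), the function names, and all parameters are hypothetical.

```python
def penalized_reward(evasion_reward, similarity, min_similarity, lam):
    """Lagrangian reward the policy would maximize.

    The penalty term is positive when similarity falls below the floor,
    so a larger multiplier makes constraint violations more costly.
    """
    return evasion_reward - lam * (min_similarity - similarity)


def dual_ascent_step(lam, similarity, min_similarity, step_size):
    """Update the multiplier: raise it on violation, lower it otherwise.

    The multiplier is projected back to zero so it never goes negative.
    """
    violation = min_similarity - similarity
    return max(0.0, lam + step_size * violation)
```

In a training loop under these assumptions, the policy update and the multiplier update would alternate, with the multiplier growing only while paraphrases drift below the similarity floor.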

arXiv

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Masking stale observations helps search agents only within specific regimes that depend on retriever recall and model capacity. Outside those regimes, it harms performance by discarding useful context.

arXiv

AgentxGCore: Agentic AI for Next-Generation Mobile Core Network

AgentxGCore introduces an agentic AI-native layer for 6G core networks, using multi-agent systems for self-optimization. It leverages LLMs and ReAct frameworks to enable continuous, intent-driven network adaptation and management.

arXiv

Zamba2-VL Technical Report

Zamba2-VL, a hybrid Mamba2-Transformer VLM, matches top open-weight models while offering 10x faster inference. Its efficiency suits edge devices, with 1.2B, 2.7B, and 7B variants now available.

arXiv

Finer Parameter Steps for Low-Rank PEFT: A Controlled Study with CP Tensor Adapters

This study compares CP tensor adapters and LoRA, finding that while CP offers finer parameter steps, it does not inherently improve the accuracy-to-budget trade-off across tasks.
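
One way to see what "finer parameter steps" could mean is to count trainable parameters per unit of rank. The sketch below assumes a particular construction that the summary does not confirm: LoRA adds an independent rank-r pair of factors to each of L layers, while the CP adapter shares one rank-R decomposition of a three-way tensor (output dimension, input dimension, layer index) across all layers. The function names and this tensor layout are hypothetical.

```python
def lora_params(d_in, d_out, rank, num_layers):
    """Trainable parameters when every layer gets its own rank-`rank` LoRA pair."""
    return num_layers * rank * (d_in + d_out)


def cp_params(d_in, d_out, rank, num_layers):
    """Trainable parameters for one shared rank-`rank` CP decomposition.

    Assumed layout: three factor matrices of shapes
    (d_out, rank), (d_in, rank), and (num_layers, rank).
    """
    return rank * (d_in + d_out + num_layers)


def step_sizes(d_in, d_out, num_layers):
    """Parameters added by raising the rank by one, for each adapter type."""
    return {
        "lora": lora_params(d_in, d_out, 1, num_layers),
        "cp": cp_params(d_in, d_out, 1, num_layers),
    }
```

Under these assumptions, each rank increment costs far fewer parameters for the CP adapter, so a parameter budget can be matched more closely; the study's finding is that this granularity alone does not improve the accuracy-to-budget trade-off.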

arXiv

DarkVesselNet: Multi-Modal Remote Sensing and Trajectory Reasoning for Dark Vessel Detection

DarkVesselNet integrates multi-modal remote sensing and AIS trajectory logic to detect dark vessels. It is available as a Python package and on Hugging Face, supported by software validation.

arXiv

GeoSAM-3D: Geodesic Prompt Propagation for Open-Vocabulary 3D Scene Segmentation from Monocular Video

GeoSAM-3D segments 3D scenes from monocular video using geodesic prompt propagation on Gaussian Splatting. This method prevents mask leakage across curved surfaces via heat-kernel distance.

arXiv

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

The study reveals that individually safe LLM agent skills can combine into dangerous pairs, with 18.2% of flagged combinations posing genuine risks. Realization of these risks depends heavily on the host model’s specific disposition and safety filters.

arXiv

Short-form Text Rewriting with Phi Silica

This study adapts Phi Silica for short-form rewriting via fine-tuning, achieving higher semantic fidelity and fewer hallucinations than GPT-5-chat. It demonstrates that specialized SLM adaptation can match cloud models in precision-critical tasks.

arXiv

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

LLMs struggle to correct zero-shot errors or override internal priors, showing only a 34.8% rescue rate. Performance correlates with Definition-Specific Familiarity, not standard memorization metrics.