Global News Digest

Technology

arXiv

Monitoring Agentic Systems Before They're Reliable

This paper proposes a structural monitoring framework for agentic systems that uses variance and failure mode and effects analysis (FMEA) to detect integration flaws. It proves that structural defects mask task-level errors, and the framework enables automated triage of 97% of issues.

arXiv

Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Tailed Recognition

This paper introduces SAMN, a hyperparameter-free method for long-tailed recognition that uses monotonic normalization to rescale class weights. It improves performance by eliminating regularization reliance, achieving state-of-the-art results across benchmarks.

arXiv

Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation

MDA eliminates flying-point artifacts by using mixture-density representations to predict multiple depth hypotheses per pixel. This approach robustly handles boundary ambiguity, transparency, and sky regions with minimal computational cost.
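To illustrate why several depth hypotheses per pixel can help at object boundaries, here is a minimal sketch. It assumes a two-component mixture with made-up weights and depths, and the helper names `mixture_mean` and `dominant_mode` are hypothetical. How MDA actually selects its final depth is not stated in this summary.

```python
def mixture_mean(weights, depths):
    """Expected depth of the mixture: a weighted average of all hypotheses."""
    return sum(w * d for w, d in zip(weights, depths))


def dominant_mode(weights, depths):
    """Depth of the single most probable hypothesis."""
    best = max(range(len(weights)), key=lambda i: weights[i])
    return depths[best]


# Hypothetical boundary pixel: foreground at 2 m, background at 10 m.
weights = [0.55, 0.45]
depths = [2.0, 10.0]

averaged = mixture_mean(weights, depths)   # lies between the two surfaces
selected = dominant_mode(weights, depths)  # lies on the foreground surface
```

The averaged value falls in empty space between the two surfaces, which is the kind of point usually called a flying point, while the selected mode stays on a real surface.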

arXiv

SimSD: Simple Speculative Decoding in Diffusion Language Models

SimSD enables speculative decoding in diffusion LLMs via a plug-and-play masking strategy, restoring token-level verification without training. This boosts inference speed while preserving parallel decoding benefits.

arXiv

From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression

SubFit compresses LLMs at the submodule level, outperforming layer-based methods in accuracy and speed. It achieves superior perplexity-accuracy trade-offs across various sparsity levels.

arXiv

Permissive Safety Through Trusted Inference: Verifiable Belief-Space Neural Safety Filters for Assured Interactive Robotics

This paper certifies high-probability safety for belief-space neural filters using conformal prediction. It addresses runtime inference errors to enable less conservative, verifiable safety in interactive robotics.
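As a rough illustration of the conformal prediction step, the sketch below computes a standard split-conformal quantile from held-out calibration errors. The function name `conformal_quantile` and the idea of treating the result as an error margin on the inference module are assumptions for illustration, not details confirmed by the summary.

```python
import math


def conformal_quantile(cal_scores, alpha):
    """Split-conformal bound on a new score.

    Given n calibration scores assumed exchangeable with the test score,
    the returned value is exceeded by a new score with probability at
    most alpha.
    """
    n = len(cal_scores)
    if n == 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need calibration scores and alpha in (0, 1)")
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this level.
        return math.inf
    return sorted(cal_scores)[k - 1]


# Hypothetical inference errors measured on 20 calibration episodes.
errors = [float(i) for i in range(1, 21)]
margin = conformal_quantile(errors, alpha=0.25)
```

A filter built this way could inflate its inferred belief by `margin` before checking safety; a smaller `alpha` gives a stronger guarantee but a larger, more conservative margin.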

arXiv

Algebraic anti-unification

This paper establishes an algebraic theory of anti-unification within universal algebra, extending the field beyond syntactic term representations. It defines key concepts like minimally general generalizations and explores their computability in finite algebras.

arXiv

Unsupervised Cognition

This study introduces a novel, primitive-driven unsupervised learning framework that outperforms state-of-the-art methods. The model exhibits cognitive-like behaviors and surpasses both unsupervised and supervised baselines.

arXiv

AdaCodec: A Predictive Visual Code for Video MLLMs

AdaCodec optimizes video MLLMs by transmitting only reference frames or compact change descriptions, reducing token usage to 1/7 of baselines. This approach boosts long-video performance and cuts latency from 9.26 to 1.62 seconds, roughly a 5.7x reduction.

arXiv

Explainable AI Through a Democratic Lens: DhondtXAI for D'Hondt-Projected Feature Attribution

DhondtXAI is a novel, SHAP-independent XAI framework using the D'Hondt method for feature attribution. It ensures completeness and achieves high accuracy, matching SHAP benchmarks on healthcare datasets.
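For readers unfamiliar with the D'Hondt method, the sketch below shows the classic highest-averages seat allocation applied to a set of scores. Treating feature importance scores as "votes" and seat shares as attributions is an assumption made here for illustration. The function names are hypothetical, and this is not presented as the DhondtXAI implementation.

```python
def dhondt_allocate(scores, seats):
    """Allocate `seats` among keys of `scores` with the D'Hondt rule."""
    if seats < 0 or any(v < 0 for v in scores.values()):
        raise ValueError("seats and scores must be non-negative")
    allocation = {name: 0 for name in scores}
    for _ in range(seats):
        # Each round, the seat goes to the largest quotient
        # score / (seats already won + 1).
        winner = max(scores, key=lambda n: scores[n] / (allocation[n] + 1))
        allocation[winner] += 1
    return allocation


def attribution_shares(scores, seats=100):
    """Convert the seat allocation into shares that sum to 1."""
    allocation = dhondt_allocate(scores, seats)
    return {name: won / seats for name, won in allocation.items()}


# Textbook example: four parties, eight seats.
votes = {"A": 100_000, "B": 80_000, "C": 30_000, "D": 20_000}
seats_won = dhondt_allocate(votes, 8)
# seats_won == {"A": 4, "B": 3, "C": 1, "D": 0}
```

Because every seat is assigned to exactly one key, the shares always sum to 1, which is one simple way a seat-based scheme can satisfy a completeness property.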

arXiv

Stop Wandering, Find the Keys: LLMs Discriminate Key States for Efficient Multi-Agent Exploration

LEMAE uses LLMs to identify key states, guiding multi-agent exploration via SHIR rewards and KSMT memory. This reduces redundant exploration and outperforms state-of-the-art methods, with up to a 10x speedup on benchmarks.

arXiv

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

This study addresses perceptual judgment bias in multimodal LLMs by introducing a perturbation-based dataset and a reward framework based on Group Relative Policy Optimization (GRPO). The method significantly improves evaluation accuracy, consistency, and alignment with human judgment.

arXiv

Learning to Reduce Search Space for Generalizable Neural Routing Solver

L2R is a learning-based framework that dynamically reduces the search space for neural routing solvers. It enables scalable, high-quality solutions for vehicle routing problems (VRPs) with up to 10 million nodes.

arXiv

Safety Must Precede the Deployment of Open-Ended AI

Open-ended AI’s autonomy creates unique safety risks like emergent misalignment. This paper urges prioritizing safety and proactive research before large-scale deployment.

arXiv

Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning

Spurious correlations undermine VLM safety fine-tuning, enabling attacks and over-caution. Machine unlearning mitigates this by removing biased mappings, reducing attack success by 60% and unnecessary rejections by 84%.

arXiv

Finding the Minimal Parameter Budget for Implicit Reasoning: A Data Complexity Driven Scaling Law for Language Models

This study establishes a scaling law linking minimal parameter budgets for implicit reasoning to data complexity. It finds models can reason over ~0.008 bits per parameter, guiding efficient language model sizing.

arXiv

Language Model Networks: Supervision-Efficient Learning through Dense Communication

LMNet enables supervision-efficient learning by replacing discrete language communication with dense, differentiable vector exchanges between LLM nodes. This approach facilitates end-to-end optimization and significant performance gains with minimal training overhead.

arXiv

Formally Solving Answer-Construction Problems in Lean

The ECP framework addresses Lean answer-construction gaps by combining general LLMs for candidate generation with prover LLMs for verification. It outperforms baselines on PutnamBench and MathArena, ensuring admissible, formally verified solutions.

arXiv

EMoE: Training-Free Expert Disagreement for Uncertainty-Aware Text-to-Image Diffusion

EMoE estimates epistemic uncertainty in text-to-image diffusion via training-free expert disagreement. It outperforms baselines in ranking prompt alignment and reveals language-specific biases.

arXiv

Agent Guide: A Simple Agent Behavioral Watermarking Framework

Agent Guide embeds watermarks via behavioral probability biases, enabling reliable detection without disrupting agent actions. It offers a robust, low-false-positive solution for tracing and securing intelligent agents in digital environments.
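One common way to detect a probability bias with a controlled false-positive rate is a one-sided z-test on how often the favored behaviors occur. The sketch below assumes that detection statistic, a 50% base rate, and a threshold of 4; none of these are confirmed by the summary, and the function names are hypothetical.

```python
import math


def watermark_z_score(favored_hits, total, base_rate=0.5):
    """Z-score of the observed count of favored behaviors.

    Under the null hypothesis (no watermark), each step picks a favored
    behavior with probability `base_rate`.
    """
    if total <= 0 or not 0.0 < base_rate < 1.0:
        raise ValueError("need total > 0 and base_rate in (0, 1)")
    expected = total * base_rate
    std = math.sqrt(total * base_rate * (1.0 - base_rate))
    return (favored_hits - expected) / std


def is_watermarked(favored_hits, total, base_rate=0.5, threshold=4.0):
    """Flag the trace only when the bias is far above chance."""
    return watermark_z_score(favored_hits, total, base_rate) > threshold


# Hypothetical traces of 500 logged actions each.
flagged = is_watermarked(350, 500)  # 70% favored, strong bias
clean = is_watermarked(260, 500)    # 52% favored, within chance
```

Raising the threshold lowers the false-positive rate at the cost of needing longer traces before a watermark is detected.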