Technology
Monitoring Agentic Systems Before They're Reliable
This paper proposes a structural monitoring framework for agentic systems that uses variance and failure mode and effects analysis (FMEA) to detect integration flaws. It proves that structural defects mask task-level errors, and it enables automated triage of 97% of issues.
Why Not Hyperparameter-Friendly Optimisation? A Monotonic Adaptive Norm Rescaling Approach For Long-Tailed Recognition
This paper introduces SAMN, a hyperparameter-free method for long-tailed recognition that uses monotonic normalization to rescale class weights. It improves performance by eliminating reliance on regularization, achieving state-of-the-art results across benchmarks.
Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation
MDA eliminates flying-point artifacts by using mixture-density representations to predict multiple depth hypotheses per pixel. This approach robustly handles boundary ambiguity, transparency, and sky regions with minimal computational cost.
SimSD: Simple Speculative Decoding in Diffusion Language Models
SimSD enables speculative decoding in diffusion LLMs via a plug-and-play masking strategy, restoring token-level verification without training. This boosts inference speed while preserving parallel decoding benefits.
From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression
SubFit compresses LLMs at the submodule level, outperforming layer-based methods in accuracy and speed. It achieves superior perplexity-accuracy trade-offs across various sparsity levels.
Permissive Safety Through Trusted Inference: Verifiable Belief-Space Neural Safety Filters for Assured Interactive Robotics
This paper certifies high-probability safety for belief-space neural filters using conformal prediction. It addresses runtime inference errors to enable less conservative, verifiable safety in interactive robotics.
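As a rough illustration of the conformal prediction step mentioned above, here is a minimal sketch of a generic split-conformal threshold. It is not the paper's procedure: the function name, the calibration scores, and the miscoverage level `alpha` are all hypothetical, chosen only to show how a high-probability bound on inference error can be calibrated from held-out data.

```python
import math

def conformal_quantile(scores, alpha):
    """Split-conformal threshold for miscoverage level alpha.

    scores: nonconformity scores from a held-out calibration set
            (hypothetical inference errors in this sketch).
    """
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this alpha.
        return math.inf
    return sorted(scores)[k - 1]

# Hypothetical calibration errors, for illustration only.
calibration_errors = [0.1, 0.4, 0.2, 0.3, 0.5, 0.05, 0.25, 0.35, 0.15]
threshold = conformal_quantile(calibration_errors, alpha=0.2)
```

Under the usual exchangeability assumption, a new error falls at or below `threshold` with probability at least `1 - alpha`; a safety filter could then treat that value as a worst-case inference error margin.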
Algebraic anti-unification
This paper establishes an algebraic theory of anti-unification within universal algebra, extending the field beyond syntactic term representations. It defines key concepts like minimally general generalizations and explores their computability in finite algebras.
Unsupervised Cognition
This study introduces a primitive-driven unsupervised learning framework that exhibits cognitive-like behaviors. The model outperforms state-of-the-art methods, both unsupervised and supervised.
AdaCodec: A Predictive Visual Code for Video MLLMs
AdaCodec optimizes video MLLMs by transmitting only reference frames or compact change descriptions, reducing token usage to 1/7 of baselines. This approach boosts long-video performance and cuts latency from 9.26 to 1.62 seconds, roughly a 5.7x reduction.
Explainable AI Through a Democratic Lens: DhondtXAI for D'Hondt-Projected Feature Attribution
DhondtXAI is a novel, SHAP-independent XAI framework using the D'Hondt method for feature attribution. It ensures completeness and achieves high accuracy, matching SHAP benchmarks on healthcare datasets.
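For readers unfamiliar with the D'Hondt method, the sketch below shows the standard highest-quotient seat allocation applied to feature scores. How DhondtXAI actually maps features to "votes" and "seats" is not stated here, so the function name, the feature names, and the score values are hypothetical and serve only to illustrate the allocation rule.

```python
def dhondt_allocate(scores, seats):
    """Allocate `seats` among keys of `scores` using D'Hondt quotients.

    scores: dict mapping a name to a nonnegative score (its "votes").
    Each round, the seat goes to the name with the largest
    score / (seats_already_won + 1).
    """
    allocation = {name: 0 for name in scores}
    for _ in range(seats):
        winner = max(
            scores,
            key=lambda name: scores[name] / (allocation[name] + 1),
        )
        allocation[winner] += 1
    return allocation

# Hypothetical feature importance scores, for illustration only.
importances = {"age": 340.0, "bmi": 280.0, "glucose": 160.0, "smoker": 60.0}
shares = dhondt_allocate(importances, seats=7)
print(shares)  # {'age': 3, 'bmi': 3, 'glucose': 1, 'smoker': 0}
```

The allocated seats always sum to the number of seats requested, which is one way to read the completeness property the summary mentions.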
Stop Wandering, Find the Keys: LLMs Discriminate Key States for Efficient Multi-Agent Exploration
LEMAE uses LLMs to identify key states, guiding multi-agent exploration via Subspace-based Hindsight Intrinsic Reward (SHIR) and Key State Memory Tree (KSMT). This reduces redundancy, outperforming SOTA methods with up to 10x speedup on benchmarks.
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling
This study addresses perceptual judgment bias in multimodal LLMs by introducing a perturbation-based dataset and a GRPO reward framework. The method significantly improves evaluation accuracy, consistency, and alignment with human judgment.
Learning to Reduce Search Space for Generalizable Neural Routing Solver
L2R is a learning-based framework that dynamically reduces the search space of neural routing solvers. It enables scalable, high-quality solutions for vehicle routing problems (VRPs) with up to 10 million nodes.
Safety Must Precede the Deployment of Open-Ended AI
Open-ended AI’s autonomy creates unique safety risks like emergent misalignment. This paper urges prioritizing safety and proactive research before large-scale deployment.
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
Spurious correlations undermine VLM safety fine-tuning, enabling attacks and over-caution. Machine unlearning mitigates this by removing biased mappings, reducing attack success by 60% and unnecessary rejections by 84%.
Finding the Minimal Parameter Budget for Implicit Reasoning: A Data Complexity Driven Scaling Law for Language Models
This study establishes a scaling law linking minimal parameter budgets for implicit reasoning to data complexity. It finds models can reason over ~0.008 bits per parameter, guiding efficient LM sizing.
Language Model Networks: Supervision-Efficient Learning through Dense Communication
LMNet enables supervision-efficient learning by replacing discrete language communication with dense, differentiable vector exchanges between LLM nodes. This approach facilitates end-to-end optimization and significant performance gains with minimal training overhead.
Formally Solving Answer-Construction Problems in Lean
The ECP framework addresses Lean answer-construction gaps by combining general LLMs for candidate generation with prover LLMs for verification. It outperforms baselines on PutnamBench and MathArena, ensuring admissible, formally verified solutions.
EMoE: Training-Free Expert Disagreement for Uncertainty-Aware Text-to-Image Diffusion
EMoE estimates epistemic uncertainty in text-to-image diffusion via training-free expert disagreement. It outperforms baselines in ranking prompt alignment and reveals language-specific biases.
Agent Guide: A Simple Agent Behavioral Watermarking Framework
Agent Guide embeds watermarks via behavioral probability biases, enabling reliable detection without disrupting agent actions. It offers a robust, low-false-positive solution for tracing and securing intelligent agents in digital environments.