Technology
Agentic Transformers Provably Learn to Search via Reinforcement Learning
This study proves that agentic transformers trained with RL learn randomized depth-first search, using specialized heads for action tracking and backtracking. The mechanism emerges from sparse feedback and enables both generalization to greater search depths and optimized search under imbalanced goals.
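For reference, the classical algorithm the transformers are shown to learn fits in a few lines. The Python below is a minimal, hypothetical illustration: the `randomized_dfs` function and the adjacency-dictionary graph format are assumptions, not taken from the paper. It explores neighbors in random order, tracks the current path, and backtracks on dead ends. It does not model the transformer's heads.

```python
import random
from typing import Dict, Hashable, List, Optional, Sequence


def _shuffled(items: Sequence[Hashable], rng: random.Random) -> List[Hashable]:
    """Return a shuffled copy so the caller's adjacency list is left untouched."""
    out = list(items)
    rng.shuffle(out)
    return out


def randomized_dfs(
    graph: Dict[Hashable, List[Hashable]],
    start: Hashable,
    goal: Hashable,
    rng: Optional[random.Random] = None,
) -> Optional[List[Hashable]]:
    """Return a start-to-goal path found by randomized DFS, or None if unreachable."""
    rng = rng if rng is not None else random.Random()
    visited = {start}
    path = [start]
    # frontier[i] holds the untried neighbors of path[i], in random order.
    frontier = [_shuffled(graph.get(start, []), rng)]
    while path:
        if path[-1] == goal:
            return list(path)
        if frontier[-1]:
            nxt = frontier[-1].pop()
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                frontier.append(_shuffled(graph.get(nxt, []), rng))
        else:
            # Dead end: backtrack by discarding the current node.
            path.pop()
            frontier.pop()
    return None
```

On a tree the path is unique, so with `g = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}` the call `randomized_dfs(g, "a", "e")` always returns `["a", "c", "e"]`; on graphs with cycles, different seeds can return different paths.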
Learning to Construct Practical Agentic Systems
This paper introduces a modular framework for practical agentic systems that balances simplicity and cost against performance. It combines hand-engineered fixed workflows with novel learning techniques to optimize both accuracy and inference cost.
BAGEN: Are LLM Agents Budget-Aware?
The BAGEN study finds that LLM agents lack inherent budget awareness and, being over-optimistic, often waste resources. Training improves budget alerting and reduces costs, but calibrating agents to precise budget intervals remains challenging.
From Rashomon Theory to PRAXIS: Efficient Decision Tree Rashomon Sets
PRAXIS efficiently approximates decision tree Rashomon sets, drastically reducing runtime and memory usage. This makes it practical to explore many robust, interpretable models on real-world data at scale.
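The Rashomon set is commonly defined as every model in a hypothesis class whose loss is within a tolerance ε of the best model's loss. The brute-force Python sketch below makes that definition concrete for hypothetical one-feature decision stumps under 0-1 loss with an additive tolerance; all names are illustrative. It is not PRAXIS, and its exhaustive enumeration is the kind of cost that approximation methods aim to avoid for deeper trees.

```python
from typing import List, Sequence, Tuple

# A hypothetical depth-1 tree (stump) on one feature:
# (threshold, label predicted when x <= threshold); the other side gets 1 - label.
Stump = Tuple[float, int]


def stump_predict(stump: Stump, x: float) -> int:
    threshold, left_label = stump
    return left_label if x <= threshold else 1 - left_label


def zero_one_loss(stump: Stump, xs: Sequence[float], ys: Sequence[int]) -> float:
    errors = sum(stump_predict(stump, x) != y for x, y in zip(xs, ys))
    return errors / len(xs)


def rashomon_set(xs: Sequence[float], ys: Sequence[int], epsilon: float) -> List[Stump]:
    """All stumps whose loss is within `epsilon` of the best stump's loss."""
    # Exhaustively enumerate candidates: thresholds at observed values, both labelings.
    candidates = [(t, label) for t in sorted(set(xs)) for label in (0, 1)]
    losses = {c: zero_one_loss(c, xs, ys) for c in candidates}
    best = min(losses.values())
    return sorted(c for c, loss in losses.items() if loss <= best + epsilon)
```

With `xs = [1, 2, 3, 4]` and `ys = [0, 0, 1, 1]`, ε = 0 keeps only the perfect stump `(2, 0)`, while ε = 0.25 also admits `(1, 0)` and `(3, 0)`, each of which misclassifies one point.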
The New Social Image: How AI Competency and AI Proactivity Influence Self- and Peer-Perceptions in the Workplace
Low AI competency and proactivity boost workers' sense of ownership and satisfaction, while high AI performance may undermine them. Workplace AI design must prioritize human perceptions over pure performance metrics to preserve job meaningfulness.
Continuous Reasoning for Vision-Language-Action
This paper introduces Continuous Reasoning for VLA, which replaces text with a shared Gaussian latent interface to enable fine-grained control. It employs self-verification to make action prediction robust and generalizable.
Civilizational Metamaterials: Engineering Coordination Under Capability Gradients and Structural Turbulence
This paper proposes a metamaterials-based framework to quantify governance, addressing AGI-induced "Freezing Equilibrium" through a constitutive law for institutional coordination. It outlines a three-tier provenance taxonomy and a trial to test hypotheses on preventing structural turbulence.
InfoAtlas: A Foundation Model for Zero-Shot Statistical Dependence Estimate
InfoAtlas is a foundation model that estimates mutual information zero-shot in a single forward pass. It matches state-of-the-art precision while running 100x faster and generalizing robustly.
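As a reference point for the quantity InfoAtlas estimates, mutual information between two discrete variables is I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]. The Python sketch below is the classical plug-in estimator built from empirical counts, a textbook baseline rather than InfoAtlas's learned single-forward-pass estimator; the function name is hypothetical and the inputs are assumed to be discrete.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def plugin_mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in mutual information estimate (in nats) from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need equal-length, non-empty samples")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), rewritten with raw counts:
        # (c / n) * log(c * n / (count_x * count_y))
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi
```

For example, `plugin_mutual_information([0, 1, 0, 1], [0, 1, 0, 1])` returns log 2 (about 0.693 nats), while `plugin_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])` returns 0.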
SEMBridge: Tagless-Final Program Semantics with Weakest-Precondition and Bounded-Checking Interpretations
SEMBridge is a Python framework that generates weakest-precondition and bounded-checking interpretations from a single tagless-final program. It keeps executable semantics synchronized with verification artifacts for rigorous program validation.
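The tagless-final idea is that a program is written once against an abstract interpreter and can then be run under several semantics. The Python sketch below illustrates this for a hypothetical toy language of integer assignments: the same `swap_program` is interpreted as an executable state transformer (`Eval`), as a weakest-precondition transformer (`WP`), and through a bounded checker that enumerates small states. None of these names are SEMBridge's API, and the weakest precondition here is evaluated on concrete states rather than computed symbolically.

```python
from itertools import product
from typing import Any, Callable, Dict, Sequence

State = Dict[str, int]
Pred = Callable[[State], bool]


class Eval:
    """Executable semantics: expressions map State -> int, commands map State -> State."""

    def lit(self, n: int) -> Callable[[State], int]:
        return lambda s: n

    def var(self, x: str) -> Callable[[State], int]:
        return lambda s: s[x]

    def add(self, a, b) -> Callable[[State], int]:
        return lambda s: a(s) + b(s)

    def assign(self, x: str, e) -> Callable[[State], State]:
        return lambda s: {**s, x: e(s)}

    def seq(self, c1, c2) -> Callable[[State], State]:
        return lambda s: c2(c1(s))


class WP(Eval):
    """Weakest-precondition semantics: a command maps a postcondition to a precondition.

    Expressions are inherited from Eval; predicates are evaluated on concrete
    states rather than manipulated symbolically.
    """

    def assign(self, x: str, e) -> Callable[[Pred], Pred]:
        # wp(x := e, Q) = Q[e/x]
        return lambda post: (lambda s: post({**s, x: e(s)}))

    def seq(self, c1, c2) -> Callable[[Pred], Pred]:
        # wp(c1; c2, Q) = wp(c1, wp(c2, Q))
        return lambda post: c1(c2(post))


def swap_program(sem: Any):
    """t := x; x := y; y := t, written once against any interpreter `sem`."""
    return sem.seq(
        sem.assign("t", sem.var("x")),
        sem.seq(sem.assign("x", sem.var("y")), sem.assign("y", sem.var("t"))),
    )


def bounded_check(program, pre: Pred, post: Pred, names: Sequence[str], bound: int) -> bool:
    """Check {pre} program {post} on every state with values in [-bound, bound]."""
    run = program(Eval())
    precondition = program(WP())(post)
    for values in product(range(-bound, bound + 1), repeat=len(names)):
        s = dict(zip(names, values))
        if not pre(s):
            continue
        # Cross-check: both interpretations of the same program must agree.
        if post(run(s)) != precondition(s):
            raise AssertionError(f"interpretations disagree at {s}")
        if not post(run(s)):
            return False
    return True
```

Ghost variables `a` and `b` record the initial values, so checking `swap_program` with precondition `x == a and y == b` and postcondition `x == b and y == a` over `["x", "y", "t", "a", "b"]` with bound 1 returns `True`, while a buggy `x := y; y := x` fails the same check.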
Effects of Varying LLM Access on Essay Writing Behavior
Unrestricted LLM access reduced student authorship and creativity, while restricted use fostered strategic revision and ownership without compromising essay quality.
When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE
WEINCE corrects InfoNCE's softmax limitations using extreme value theory, blending logits with batch statistics. It improves frozen-feature performance on vision benchmarks without adding parameters.
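For reference, the baseline WEINCE corrects is the standard InfoNCE loss: a softmax cross-entropy in which each anchor's matching positive is the target and the other in-batch samples act as negatives. The NumPy sketch below implements only that baseline, assuming cosine similarity and a temperature `tau`; the summary does not specify the extreme-value correction, so it is not shown, and `info_nce` is a hypothetical name.

```python
import numpy as np


def info_nce(anchors: np.ndarray, positives: np.ndarray, tau: float = 0.1) -> float:
    """Mean InfoNCE loss; row i of `positives` is the positive for anchor i,
    and every other row in the batch serves as a negative."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / tau  # (batch, batch) cosine similarities scaled by 1/tau
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching (diagonal) entry as the target class.
    return float(-np.mean(np.diag(log_probs)))
```

With identical one-hot anchors and positives the loss is close to 0; if every embedding is the same, the softmax is uniform and the loss equals log(batch size).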
StressDream: Steering Video World Models for Robust Policy Evaluation and Improvement
StressDream steers video world models toward high-impact, plausible outcomes by optimizing initial noise. This enables robust policy evaluation and improvement by identifying actions leading to undesirable results.
Hyperbolic and Evidence-Prioritized Experts for Large Vision-Language Models
AsyMoE uses hyperbolic geometry and evidence-prioritized experts to address modality asymmetry in LVLMs. It outperforms baselines by up to 3.8% while activating 25.45% fewer parameters.
Synthetic Data from Cross-Domain Events for Large-Scale Recommendation Systems
SCALR addresses data sparsity in recommendation systems by translating source-domain events into synthetic user-item interactions. This model-agnostic approach significantly improves performance in industrial A/B tests.
ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate
ARCA mitigates token signal degeneration in LoRA-based RL by measuring adapter residuals for credit assignment. It achieves competitive MATH performance without learned reward models or value heads.
Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance
TOPD improves on-policy distillation by using near-future guidance to target true reasoning divergences, boosting accuracy to 52.2% and outperforming standard methods on AIME benchmarks.
Real2SAM2Real: Generative 3D Caches as Complementary Context for Video Diffusion
Real2SAM2Real enhances video diffusion with generative 3D caches for precise camera and object control. This approach ensures robust spatiotemporal consistency during complex motions and occlusions.
Rethinking the Role of Temperature in Large Language Model Distillation
This study reveals temperature's asymmetric impact on forward and reverse KL divergence, showing that forward KL (FKL) outperforms reverse KL (RKL) at higher temperatures. This overturns standard distillation practice and lets simple KL-based methods compete with state-of-the-art approaches.
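Forward and reverse KL differ only in which distribution the log-ratio is averaged under, and temperature enters through softmax(z / τ) applied to both teacher and student logits. The NumPy sketch below computes both for a single vocabulary-sized logit vector per model; the function names are hypothetical, and it illustrates the quantities being compared, not the paper's experimental result. Many distillation setups also multiply the loss by τ²; that factor is omitted here.

```python
import numpy as np


def log_softmax(logits: np.ndarray, tau: float) -> np.ndarray:
    """Log-probabilities of softmax(logits / tau), computed stably."""
    z = np.asarray(logits, dtype=float) / tau
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


def _kl(log_p: np.ndarray, log_q: np.ndarray) -> float:
    """KL(p || q) from log-probabilities."""
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))


def forward_kl(teacher_logits, student_logits, tau: float = 1.0) -> float:
    # FKL = KL(teacher || student): expectation under the teacher, so it is
    # mass-covering (the student is penalized for missing teacher mass).
    return _kl(log_softmax(teacher_logits, tau), log_softmax(student_logits, tau))


def reverse_kl(teacher_logits, student_logits, tau: float = 1.0) -> float:
    # RKL = KL(student || teacher): expectation under the student, so it is
    # mode-seeking (the student may concentrate on a few teacher modes).
    return _kl(log_softmax(student_logits, tau), log_softmax(teacher_logits, tau))
```

Identical logits give 0 for both divergences; for different logits the two values generally disagree, which is the asymmetry the study examines as τ varies.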
DRL-Based Pose Control for Double-Ackermann Robots Under Actuation Uncertainties
This study improves pose control for double-Ackermann robots using deep reinforcement learning (DRL) and a sim-to-sim-to-real approach to handle actuation uncertainties. The method transfers to physical hardware with a high success rate and without further tuning.
LLMs Need Encoders for Semantic IDs Too
The paper introduces PrefixMem, a lightweight encoder for Semantic IDs in LLMs, demonstrating significant accuracy and recall improvements. This confirms that dedicated encoders, like those for vision, are essential for handling context-dependent non-textual modalities effectively.