Technology
MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models
MENTIS quantifies alignment-induced geometric changes in LLMs and finds selective, depth-specific shifts in latent torsion. These expose structural effects of alignment that behavioral metrics alone do not capture.
Plausibility Is Not Prediction: Contrastive Evidence for LLM-Based Cellular Perturbation Reasoning
LLM-based reasoning about cellular perturbations produces biologically plausible answers that nonetheless fail as predictions. The proposed CORE method improves accuracy by drawing on contrastive evidence from related perturbations.
3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code
3DCodeBench benchmarks how well VLM agents perform procedural 3D modeling through code, revealing API mismatches and geometric flaws. It provides a public dataset and a human-preference ranking system to guide future model improvements.
Leyline: KV Cache Directives for Agentic Inference
Leyline introduces KV cache directives for agentic LLMs, enabling policy-driven edits to the cache without a full re-prefill. It targets the inefficiency of dynamic workflows whose context is not append-only.
Test-Time Training for Zero-Resource Dense Retrieval Reranking
DART improves zero-resource dense retrieval reranking by adapting scoring functions through test-time training, achieving a +2.1% NDCG@10 gain with under 10 ms of latency.
MViewRouter: Internalizing Geometric Equivariance via Multi-view Alternating Attention for Combinatorial Routing
MViewRouter embeds geometric equivariance via Multi-view Alternating Attention for robust combinatorial routing. It achieves competitive performance and strong zero-shot generalization on TSP and CVRP benchmarks.
Strong Stochastic Flow Maps
Strong Stochastic Flow Maps learn strong solution maps for SDEs; trained without simulation, they enable few-step sampling. The framework outperforms prior methods on image generation and molecular systems.
ThinkSwitch: Context Distillation with LoRA and Weight Interpolation for Specific-Purpose Reasoning Tasks
ThinkSwitch uses LoRA and weight interpolation to distill reasoning into lightweight instruct models. This co-training method boosts performance on specialized tasks while reducing latency and costs.
Implicit Drifting Policy: One-Step Action Generation via Conditional Expert Geometry
Implicit Drifting Policy (IDP) enables fast, one-step robotic action generation by leveraging conditional expert geometry. It outperforms explicit drifting methods while keeping generated actions precisely on the action manifold.
A Fiber Criterion for Representation Identifiability in Supervised Learning
This paper establishes a fiber-based criterion for representation identifiability, showing that augmentations which preserve the predictor rule out unique identification of the representation. Claims about learned representations therefore require explicit assumptions beyond supervised performance.
MiCU: End-to-End Smart Home Command Understanding with Large Language Model
MiCU is an LLM for understanding smart-home commands that uses automated data generation and token compression to raise accuracy by 20% and reduce latency. Deployed in Xiaomi Home, it significantly improves user experience and operational efficiency.
Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA
This study compares WAMs and vision-language-action models (VLAs) using behavioral and representational diagnostics, finding that WAMs improve object-level behavior, although the gains vary by architecture. Sequential WAMs show distinct predictive structures, while other WAMs compress information about the future, highlighting trade-offs that task success alone does not reveal.
Soft-NBCE: Entropy-Weighted Chunk Fusion for Long-Context
Soft-NBCE replaces hard chunk selection with entropy-weighted fusion and consistency distillation to address semantic fragmentation. It outperforms baselines on LongBench while keeping memory cost at O(L²/n).
STARFISH: faST Accuracy Recovery in pruned networks From Internal State Healing
STARFISH restores the accuracy of pruned networks by aligning their internal states using a small amount of unlabeled data. It outperforms state-of-the-art methods, recovering 82% of the original accuracy at 75% pruning with only 0.4% of the training data.
HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces
HASTE uses group-shared, fixed fan-in sparsity to accelerate extreme multi-label classification. It achieves up to 25× speedups on GPUs while matching the accuracy of dense baselines.
ASE-26: a curriculum for agentic software engineering as a discipline
ASE-26 proposes a curriculum to formalize agentic software engineering, addressing the shift toward AI-directed coding. It introduces an "evolutionary spiral" model and a 21-module structure to define this emerging academic discipline.
AMP: A Vendor-Neutral Wire Format for Agent Memory Operations
MemoryWire standardizes agent memory operations through a vendor-neutral JSON protocol, enabling interoperability across diverse systems. It supports five core operations and includes governance features; evaluations report high recall, and cross-adapter conformance tests validate interoperability.
When Data Is Scarce: Scaling Sparse Language Models with Repeated Training
This study shows that sparse training delays saturation on repeated data, allowing models to keep learning effectively when data is scarce. It establishes new scaling laws and identifies the sparsity levels that best balance performance and efficiency.
AI From the Margins (AIM): Rethinking Participatory AI Design Through the Lived Experience of Minoritized Communities
AIM centers minoritized communities’ lived experiences in AI design, shifting participation from late-stage feedback to foundational goal-setting. Tested in the Netherlands, it reshapes AI objectives through narrative elicitation and co-constructed rules.
Physics-Informed Deep Learning for Entropy Prediction in Heterogeneous Systems: Thermodynamic and Information-Theoretic Case Studies
This study proposes a unified Physics-Informed Deep Learning framework for predicting entropy in thermodynamic and information-theoretic systems. It enforces strict adherence to the Second Law of Thermodynamics and achieves over 90% accuracy with only 30% of the data.