Global News Digest

Technology

arXiv

MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models

MENTIS quantifies the geometric changes that alignment induces in LLMs, revealing selective, depth-specific shifts in latent torsion. These shifts expose structural effects on internal representations that go beyond what behavioral metrics capture.

arXiv

Plausibility Is Not Prediction: Contrastive Evidence for LLM-Based Cellular Perturbation Reasoning

LLM-based reasoning about cellular perturbations can be biologically plausible yet still fail to predict perturbation outcomes. The proposed CORE method improves accuracy by drawing on contrastive evidence from related perturbations.

arXiv

3DCodeBench: Benchmarking Agentic Procedural 3D Modeling Via Code

3DCodeBench benchmarks how well VLM agents perform procedural 3D modeling through code, revealing API mismatches and geometric flaws. It provides a public dataset and a human-preference ranking system to guide future model improvements.

arXiv

Leyline: KV Cache Directives for Agentic Inference

Leyline introduces KV cache directives for agentic LLM inference, enabling policy-driven cache edits without full re-prefills. This addresses inefficiencies in dynamic workflows whose context is not append-only.

arXiv

Test-Time Training for Zero-Resource Dense Retrieval Reranking

DART improves zero-resource dense retrieval reranking by adapting its scoring function through test-time training, achieving a +2.1% NDCG@10 gain while keeping latency under 10 ms.
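For readers unfamiliar with the metric, NDCG@10 scores the top ten results of a ranking by discounted graded relevance, normalized by the best achievable ordering. The sketch below is a generic implementation, not DART's evaluation code; it assumes linear gains (some toolkits use 2^rel - 1 instead) and that the input list already contains every judged document for the query, so the ideal ranking can be computed from it.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the top-k graded relevance labels."""
    # The item at 0-based index i sits at rank i + 1 and is discounted by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the system ranking divided by DCG of the ideal (sorted) ranking."""
    # Assumption: `relevances` holds every judged document for the query.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of a reranked list, best-scored document first.
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2]), 3))
```

A perfectly ordered list scores 1.0, so reranking gains show up as movement toward that ceiling.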

arXiv

MViewRouter: Internalizing Geometric Equivariance via Multi-view Alternating Attention for Combinatorial Routing

MViewRouter embeds geometric equivariance via Multi-view Alternating Attention for robust combinatorial routing. It achieves competitive performance and strong zero-shot generalization on TSP and CVRP benchmarks.

arXiv

Strong Stochastic Flow Maps

Strong Stochastic Flow Maps learn strong solution maps for SDEs, enabling few-step sampling via simulation-free training. This framework outperforms prior methods in image generation and molecular systems.

arXiv

ThinkSwitch: Context Distillation with LoRA and Weight Interpolation for Specific-Purpose Reasoning Tasks

ThinkSwitch uses LoRA and weight interpolation to distill reasoning into lightweight instruct models. This co-training method boosts performance on specialized tasks while reducing latency and costs.
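Weight interpolation with a LoRA adapter can be pictured as sliding between the base weights W and the adapted weights W + BA. The sketch below assumes a simple linear blend controlled by a hypothetical coefficient `alpha`; how ThinkSwitch actually sets the coefficient or combines interpolation with co-training is not described in this summary, and the function names are illustrative only.

```python
def matmul(a, b):
    """Nested-list matrix product: (m x r) @ (r x n) -> (m x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def interpolate_lora(w_base, lora_b, lora_a, alpha, scale=1.0):
    """Return W_base + alpha * scale * (B @ A) (hypothetical helper).

    alpha = 0 gives the base weights; alpha = 1 applies the full adapter.
    """
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as w_base
    return [
        [w + alpha * scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_base, delta)
    ]


# Hypothetical 2x2 base weight with a rank-1 adapter (B: 2x1, A: 1x2).
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[3.0, 4.0]]
halfway = interpolate_lora(w, b, a, alpha=0.5)
```

Because the update is low-rank, only B and A need to be stored per task; the blend is computed on demand.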

arXiv

Implicit Drifting Policy: One-Step Action Generation via Conditional Expert Geometry

Implicit Drifting Policy (IDP) enables fast, one-step robotic action generation by leveraging conditional expert geometry. It outperforms explicit drifting methods while maintaining precise action manifold adherence.

arXiv

A Fiber Criterion for Representation Identifiability in Supervised Learning

This paper establishes a fiber-based criterion for representation identifiability, showing that predictor-preserving augmentations prevent unique identification. Consequently, representation claims require specific assumptions beyond supervised performance.
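A standard reparameterization argument illustrates why supervised performance alone cannot pin down a representation. This is a generic example, not the paper's fiber criterion: assume a predictor factors as a head f on top of a representation h, and let T be any invertible map on the representation space.

```latex
\hat{y}(x) = f\big(h(x)\big) = \big(f \circ T^{-1}\big)\big((T \circ h)(x)\big) \qquad \text{for all } x
```

The pairs (h, f) and (T ∘ h, f ∘ T⁻¹) produce identical predictions, so any claim that a specific h was learned needs assumptions that exclude such T.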

arXiv

MiCU: End-to-End Smart Home Command Understanding with Large Language Model

MiCU is an LLM for smart home commands, using automated data and token compression to boost accuracy by 20% and reduce latency. Deployed in Xiaomi Home, it significantly improves user experience and operational efficiency.

arXiv

Beyond Task Success: Behavioral and Representational Diagnostics for WAM and VLA

This study compares WAMs and VLAs using behavioral and representational diagnostics, finding that WAMs improve object-level behavior, though the effect varies by architecture. Sequential WAMs show a distinct predictive structure, while other designs compress information about the future, highlighting trade-offs that task success alone does not reveal.

arXiv

Soft-NBCE: Entropy-Weighted Chunk Fusion for Long-Context

Soft-NBCE replaces hard chunk selection with entropy-weighted fusion and consistency distillation to resolve semantic fragmentation. It outperforms baselines on LongBench while maintaining O(L^2/n) memory efficiency.
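To make "entropy-weighted fusion" concrete, the sketch below fuses per-chunk next-token distributions with weights softmax(-H_i / temperature), so lower-entropy (more confident) chunks count more, and combines them as a weighted log-linear mixture. Both the weighting scheme and the mixture form are assumptions for illustration; Soft-NBCE's exact formulation, the NBCE prior correction, and its consistency distillation are not shown.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def soft_fuse(chunk_dists, temperature=1.0, eps=1e-12):
    """Fuse per-chunk next-token distributions (hypothetical sketch).

    Weights are softmax(-H_i / temperature), so low-entropy chunks dominate.
    The fused distribution is a weighted log-linear (geometric) mixture.
    """
    scores = [-entropy(d) / temperature for d in chunk_dists]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]

    vocab_size = len(chunk_dists[0])
    log_mix = [
        sum(w * math.log(max(d[v], eps)) for w, d in zip(weights, chunk_dists))
        for v in range(vocab_size)
    ]
    peak = max(log_mix)
    unnorm = [math.exp(x - peak) for x in log_mix]
    norm = sum(unnorm)
    return weights, [u / norm for u in unnorm]


# A confident chunk and an uninformative one over a 3-token vocabulary.
weights, fused = soft_fuse([[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]])
```

As the temperature approaches zero, the weights collapse onto the lowest-entropy chunk, which recovers hard selection as a limiting case.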

arXiv

STARFISH: faST Accuracy Recovery in pruned networks From Internal State Healing

STARFISH restores pruned network accuracy by aligning internal states with minimal unlabeled data. It outperforms SOTA methods, recovering 82% of original accuracy at 75% pruning using only 0.4% of training data.

arXiv

HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces

HASTE uses group-shared fixed fan-in sparsity to accelerate extreme multi-label classification. It achieves up to 25x speedups on GPUs while maintaining dense baseline accuracy.
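Fixed fan-in sparsity means every output unit reads exactly the same number of inputs, so the layer can be stored as two rectangular arrays (input indices and weights) rather than a general sparse matrix. The sketch below assumes that "group-shared" means consecutive output units share one index set; the group layout, random index choice, and names are hypothetical and do not reflect HASTE's GPU kernels or its dynamic rewiring.

```python
import random


def make_group_shared_fanin(num_inputs, num_outputs, fan_in, group_size, seed=0):
    """Every output reads exactly `fan_in` inputs; each group of outputs shares indices."""
    rng = random.Random(seed)
    indices = []
    for start in range(0, num_outputs, group_size):
        shared = sorted(rng.sample(range(num_inputs), fan_in))
        # The last group may be smaller if num_outputs is not a multiple of group_size.
        indices.extend([list(shared) for _ in range(min(group_size, num_outputs - start))])
    weights = [[rng.gauss(0.0, 0.1) for _ in range(fan_in)] for _ in range(num_outputs)]
    return indices, weights


def sparse_forward(x, indices, weights):
    """Each output is a dot product of its weight row with its fixed slice of x."""
    return [sum(w * x[j] for w, j in zip(w_row, idx)) for idx, w_row in zip(indices, weights)]


# 10 outputs over 16 inputs, fan-in 4, groups of 4 outputs sharing an index set.
idx, w = make_group_shared_fanin(num_inputs=16, num_outputs=10, fan_in=4, group_size=4)
y = sparse_forward([float(i) for i in range(16)], idx, w)
```

Sharing indices within a group means one gather of the input slice can serve several outputs, which is one plausible source of the reported speedups.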

arXiv

ASE-26: a curriculum for agentic software engineering as a discipline

ASE-26 proposes a curriculum to formalize agentic software engineering, addressing the shift toward AI-directed coding. It introduces an "evolutionary spiral" model and a 21-module structure to define this emerging academic discipline.

arXiv

AMP: A Vendor-Neutral Wire Format for Agent Memory Operations

MemoryWire standardizes agent memory operations via a vendor-neutral JSON protocol, enabling seamless interoperability across diverse systems. It supports five core operations and includes governance features, validated by high recall and cross-adapter conformance tests.

arXiv

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

This study reveals that sparse training delays data saturation, enabling effective learning under scarcity. It establishes new scaling laws and identifies optimal sparsity levels for balancing performance and efficiency.

arXiv

AI From the Margins (AIM): Rethinking Participatory AI Design Through the Lived Experience of Minoritized Communities

AIM centers minoritized communities’ lived experiences in AI design, shifting participation from late-stage feedback to foundational goal-setting. Tested in the Netherlands, it reshapes AI objectives through narrative elicitation and co-constructed rules.

arXiv

Physics-Informed Deep Learning for Entropy Prediction in Heterogeneous Systems: Thermodynamic and Information-Theoretic Case Studies

This study proposes a unified Physics-Informed Deep Learning framework to predict entropy in thermodynamic and information-theoretic systems. It ensures strict Second Law adherence and achieves over 90% accuracy with only 30% of the data.
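One common way to inject a thermodynamic constraint into training is a penalty term alongside the data loss. The sketch below assumes a time-ordered entropy trajectory for an isolated system, where the Second Law requires S[t+1] >= S[t], and penalizes only predicted decreases. A soft penalty like this discourages violations but does not strictly rule them out, so it should not be read as the mechanism behind the study's strict-adherence claim; the function name and the weight `lam` are hypothetical.

```python
def physics_informed_loss(pred_entropy, target_entropy, lam=1.0):
    """MSE data loss plus a penalty on predicted entropy decreases (hypothetical sketch).

    Assumes both sequences are time-ordered entropies of an isolated system,
    for which the Second Law requires S[t+1] >= S[t].
    """
    n = len(pred_entropy)
    mse = sum((p - t) ** 2 for p, t in zip(pred_entropy, target_entropy)) / n
    # Hinge on negative increments: only steps where S decreases are penalized.
    drops = [max(0.0, pred_entropy[i] - pred_entropy[i + 1]) for i in range(n - 1)]
    penalty = sum(d ** 2 for d in drops) / max(len(drops), 1)
    return mse + lam * penalty


# The predicted trajectory dips at the last step, so the physics term is nonzero.
loss = physics_informed_loss([1.0, 1.2, 1.1], [1.0, 1.2, 1.3], lam=10.0)
```

Raising `lam` trades data fit for physical consistency; a hard guarantee would instead require building monotonicity into the model's output.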