Technology
Coupling Language Models with Physics-based Simulation for Synthesis of Inorganic Materials
This study integrates LLMs with physics-based simulations to plan inorganic material synthesis. Results show that LLMs, drawing on their implicit knowledge, generate more feasible synthesis strategies than traditional algorithms.
The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary
Extended reasoning fails on deterministic tasks because of architectural limits, creating a "Deterministic Horizon" beyond which tool delegation is essential. Hybrid approaches significantly outperform pure neural methods, confirming an inherent capability ceiling.
From Noise to Control: Parameterized Diffusion Policies
Parameterized Diffusion Policy (PDP) conditions diffusion on a learned behavior manifold, enabling precise control and smooth interpolation. It outperforms standard policies by synthesizing novel behaviors without weight updates.
From "Weak" Signals to Strong Models: Preference Delta Aggregation with LoRA Merging
Preference Delta Aggregation merges LoRA adapters from weak model pairs to boost strong LLMs. Combined with Geometric Alignment Merging, it significantly outperforms baselines in reasoning and agentic search.
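For readers new to LoRA merging: a LoRA adapter stores a low-rank weight update, and several adapters can be combined by mixing their updates before adding them to the base weights. The sketch below shows only a plain weighted average of LoRA deltas, a common merging baseline; it is not the paper's Preference Delta Aggregation or Geometric Alignment Merging, and the helper names `lora_delta` and `merge_deltas` are hypothetical.

```python
import numpy as np


def lora_delta(A, B, alpha, rank):
    """Dense LoRA update: Delta W = (alpha / rank) * B @ A.

    A has shape (rank, d_in), B has shape (d_out, rank).
    """
    return (alpha / rank) * (B @ A)


def merge_deltas(deltas, weights=None):
    """Weighted average of dense LoRA deltas (a simple baseline, not the paper's method)."""
    deltas = list(deltas)
    if weights is None:
        weights = [1.0 / len(deltas)] * len(deltas)  # uniform averaging
    if len(weights) != len(deltas):
        raise ValueError("need one weight per delta")
    return sum(w * d for w, d in zip(weights, deltas))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, r = 4, 3, 2
    W0 = rng.normal(size=(d_out, d_in))  # hypothetical base weight
    d1 = lora_delta(rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)), alpha=4, rank=r)
    d2 = lora_delta(rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)), alpha=4, rank=r)
    W_merged = W0 + merge_deltas([d1, d2])
    print(W_merged.shape)
```

Averaging dense deltas (rather than the A and B factors separately) keeps the merge linear in the actual weight updates, since B1A1 + B2A2 is generally not equal to (B1 + B2)(A1 + A2).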
Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture
This paper proposes ICAM, a six-layer framework for model-native computing that clarifies the roles LLMs play through a dual-plane perspective. It introduces three design laws to address cache, context, and agent efficiency challenges in future system architectures.
Evaluating Bivariate Causal Statements Based on Mutual Compatibility
This study proposes compatibility and incompatibility scores to evaluate bivariate causal statements without relying on ground truth. These metrics effectively distinguish accurate from erroneous claims, aiding validation of insights from humans or AI.
On Wednesdays, We Ask Questions: Optimizing "Active Listening" in Automated Legal Triage and Referral
The study finds that inexpensive LLMs classify cases well, but high-cost models such as GPT-5 are needed to ask effective triage questions. Inconsistent performance in specific areas, such as domestic violence, highlights the need for specialized screening modules.
Robust Shielding for Safe Reinforcement Learning
This paper introduces robust shielding for safe reinforcement learning in unknown environments. It guarantees safety under worst-case transitions while allowing optimal agent behavior.
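To make "safe under worst-case transitions" concrete, here is a minimal sketch of a classic robust shield on a finite, set-valued transition model, where each state-action pair maps to every successor the uncertain environment might produce. The model, the names `robust_safe_set` and `shield`, and the toy corridor are assumptions for illustration; the paper's construction for unknown environments may differ.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

State = Hashable
Action = Hashable
# Hypothetical set-valued model: (state, action) -> all possible successors.
Post = Dict[Tuple[State, Action], FrozenSet[State]]


def _robustly_safe(post: Post, s: State, a: Action, safe: Set[State]) -> bool:
    """True if action a is available in s and EVERY possible successor stays in `safe`."""
    succ = post.get((s, a))
    return bool(succ) and succ <= safe


def robust_safe_set(states: Iterable[State], actions, post: Post, unsafe) -> Set[State]:
    """Largest set of states from which the agent can stay safe forever,
    whatever successor the environment picks (greatest fixed point)."""
    safe = set(states) - set(unsafe)
    changed = True
    while changed:
        changed = False
        for s in list(safe):
            if not any(_robustly_safe(post, s, a, safe) for a in actions):
                safe.discard(s)  # no action keeps s safe in the worst case
                changed = True
    return safe


def shield(s: State, proposed: Action, actions, post: Post, safe: Set[State]) -> Action:
    """Pass the agent's action through unless it could leave the safe set."""
    if _robustly_safe(post, s, proposed, safe):
        return proposed
    for a in actions:
        if _robustly_safe(post, s, a, safe):
            return a
    raise RuntimeError("state is outside the robust safe set")


if __name__ == "__main__":
    # Toy corridor 0-1-2-3 where state 3 is unsafe and "right" may slip one extra cell.
    states, actions = [0, 1, 2, 3], ["left", "stay", "right"]
    post: Post = {}
    for s in states:
        post[(s, "left")] = frozenset({max(s - 1, 0)})
        post[(s, "stay")] = frozenset({s})
        post[(s, "right")] = frozenset({min(s + 1, 3), min(s + 2, 3)})
    safe = robust_safe_set(states, actions, post, unsafe={3})
    print(sorted(safe), shield(1, "right", actions, post, safe))
```

Because the shield only overrides an action when some possible outcome would leave the safe set, the agent keeps full freedom everywhere else, which is how shielding can guarantee safety without ruling out optimal behavior.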
MindZero: Learning Online Mental Reasoning With Zero Annotations
MindZero enables efficient online mental reasoning in MLLMs via self-supervised reinforcement learning, eliminating annotation needs. It outperforms model-based methods in speed and accuracy for real-time AI assistance.
Capability Self-Assessment: Teaching LLMs to Know Their Limits
This study shows that LLMs overestimate their own competence, but reinforcement learning effectively teaches Capability Self-Assessment (CSA) without degrading performance. CSA generalizes well and improves decision-making and training data selection.
Closed-Loop Neural Activation Control in Vision-Language-Action Models
CTRL-STEER introduces a closed-loop framework for VLA models, using adaptive control signals to replace static steering. This approach enhances task success and stability by dynamically adjusting interventions based on real-time feedback.
Geodesic Flow Matching for Denoising High-Dimensional Structured Representations
Geodesic Flow Matching denoises Spatial Semantic Pointers on toroidal manifolds, avoiding the shortcomings of Euclidean treatments. It reduces SLAM tracking error by 72% and boosts neural efficiency by 40%.
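Why geodesics matter on a torus: if a representation is a vector of phase angles, a straight Euclidean line between two points can go "the long way around" each circle. The sketch below assumes points live on the flat torus [0, 2π)^d and shows the geodesic interpolant and constant velocity target used in a conditional flow-matching loss. It is a generic illustration, not the paper's model; `wrap`, `geodesic_interpolate`, and `flow_matching_loss` are hypothetical names.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def wrap(delta):
    """Map angle differences to [-pi, pi): the shortest signed displacement on each circle."""
    return (delta + np.pi) % TWO_PI - np.pi


def geodesic_interpolate(x0, x1, t):
    """Point at time t on the torus geodesic from x0 to x1, plus its constant velocity."""
    d = wrap(np.asarray(x1, float) - np.asarray(x0, float))
    xt = np.mod(np.asarray(x0, float) + t * d, TWO_PI)  # stay on [0, 2*pi)
    return xt, d


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities (batch on axis 0)."""
    return float(np.mean(np.sum((np.asarray(v_pred) - np.asarray(v_target)) ** 2, axis=-1)))


if __name__ == "__main__":
    x0 = np.array([1.0])
    x1 = np.array([TWO_PI - 0.5])
    xt, v = geodesic_interpolate(x0, x1, 0.5)
    print(v, xt)  # the geodesic step is -1.5 rad, not the Euclidean +4.78 rad
```

A velocity network trained against this target learns to move noisy phases along the short arc, which is the behavior a Euclidean interpolant cannot guarantee.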
TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation
TIGER mitigates multimodal hallucinations by routing evidence via graph-based risk scoring for targeted, parameter-free fact repair. It geometrically reduces risk while maintaining task quality across diverse cross-modal pathways.
CAST: Non-Privileged Clipped Asymmetric Self-Teaching with Advantage Flipping for GRPO
CAST enhances GRPO via non-privileged, clipped asymmetric self-teaching. It uses advantage flipping to correct token-level signals without reference answers.
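As background for the summary above, GRPO scores each sampled response relative to the other responses in its group and then applies a PPO-style clipped objective. The sketch below shows only those two standard pieces; it does not implement CAST's asymmetric self-teaching or advantage flipping, omits GRPO's KL penalty, and uses the population standard deviation, which is an assumption since implementations vary.

```python
import numpy as np


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantage: standardize each reward within its group of samples."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective averaged over tokens (to be maximized)."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return float(np.minimum(unclipped, clipped).mean())


if __name__ == "__main__":
    # Four sampled answers to one prompt, rewarded 1 if correct and 0 otherwise.
    adv = group_advantages([1.0, 0.0, 0.0, 1.0])
    print(adv)  # correct answers get positive advantage, incorrect ones negative
    # Each token of a response shares that response's advantage.
    print(clipped_surrogate(np.log([2.0, 1.0]), np.zeros(2), adv[0]))
```

Because every token in a response inherits the same sequence-level advantage, methods like CAST that aim to correct token-level signals are working against this uniform credit assignment.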
Grokers: Bottom-Up Inductive Comprehension and Write-Time Intelligence over Typed Knowledge Graphs
Grokers shifts comprehension to the write phase via bottom-up inductive traversal, eliminating per-query LM costs. It ensures high KV-cache hits and zero fallback rates through deterministic, theorem-backed indexing.
A Multi-AI-agent Framework Enabling End-to-end Finite Element Analysis for Solid Mechanics Problems
AbaqusAgent uses six LLM-based agents to automate end-to-end Finite Element Analysis, achieving an 86% success rate across 50 solid mechanics problems. This framework simplifies FEA workflows and lowers entry barriers for users.
Product-Aware Deep Autoencoders for Robust Process Monitoring in Multi-Product Cyber-Physical Systems
Product-aware autoencoders outperform global models in multi-product CPS, achieving 100% attack detection versus 22.2% for the global baseline. This approach eliminates blind spots caused by aggregated operational variance.
Evaluating Interactive Reasoning in Large Language Models: A Hierarchical Benchmark with Executable Games
This paper introduces a hierarchical benchmark using 474 executable games to evaluate LLMs' interactive reasoning, evidence gathering, and metacognitive adaptation. Results show significant performance disparities, particularly in counterfactual revision tasks.
On the evolution of the concept of probability as a mirror of the evolution of reason
This paper traces probability’s evolution as a mirror of reason, identifying its limits in handling conceptual ambiguity. It positions fuzzy logic and deep learning as complementary tools for modern scientific rationality.
Optimal Transport-based Permutation-Invariant Bayesian Optimization of Offshore Wind Farm Layouts
This study introduces Permutation-Invariant Bayesian Optimization (PIBO) using Optimal Transport to optimize offshore wind farm layouts. PIBO outperforms traditional methods, reducing computational time by 50% while generating superior configurations.
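Why optimal transport gives permutation invariance: a layout is a set of turbine positions, so swapping two turbines' labels should not change how "far" two layouts are, yet a flattened coordinate vector does change. With equal turbine counts and uniform weights, the 2-Wasserstein distance reduces to an optimal assignment between turbines. The sketch below brute-forces that assignment for tiny layouts; for realistic farm sizes a Hungarian solver such as `scipy.optimize.linear_sum_assignment` would replace the loop. The function name `layout_w2` is hypothetical, and the paper's exact OT formulation and kernel are not shown.

```python
import itertools

import numpy as np


def layout_w2(a, b):
    """2-Wasserstein distance between two equal-size turbine layouts with uniform weights.

    With n points each weighted 1/n, optimal transport reduces to the best
    one-to-one assignment, so the result ignores turbine ordering.
    Brute force over permutations: only suitable for very small n.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("layouts must have the same number of turbines")
    n = len(a)
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (n, n) squared distances
    rows = np.arange(n)
    best = min(cost[rows, list(p)].sum() for p in itertools.permutations(range(n)))
    return float(np.sqrt(best / n))


if __name__ == "__main__":
    layout = [[0, 0], [10, 0], [0, 10]]
    relabeled = [[0, 10], [0, 0], [10, 0]]  # same turbines, different order
    shifted = [[3, 4], [13, 4], [3, 14]]    # whole farm moved by (3, 4)
    print(layout_w2(layout, relabeled))     # 0.0
    print(layout_w2(layout, shifted))       # 5.0
```

A Gaussian-process kernel could then be defined on this distance, for example exp(-d^2 / (2 l^2)), although such kernels on Wasserstein distances are not guaranteed to be positive definite in general.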