Global News Digest

arXiv

AXIOM: A Trust-First Neuro-Symbolic Execution Architecture for Verifiable Mathematical Reasoning


Abstract

This paper introduces AXIOM, a neuro-symbolic execution architecture designed with a "trust-first" approach to natural-language mathematical reasoning. Within this framework, the large language model (LLM) serves exclusively as a canonicalizer, transforming informal problem descriptions into a constrained schema that is processed by a deterministic Computer-Algebra-System (CAS) pipeline. This pipeline is responsible for deriving and verifying solutions, with the option to abstain from answering treated as a primary, first-class output rather than a failure mode.
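The derive-then-verify flow with abstention as an ordinary return value can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `LinearEquation` schema, the `Answer` and `Abstain` types, and `solve_and_verify` are hypothetical names, and exact rational arithmetic from the Python standard library stands in for a CAS, since the abstract does not name the schema or the CAS it uses.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union


@dataclass(frozen=True)
class LinearEquation:
    """Hypothetical constrained schema for a*x + b = c (an assumption)."""
    a: Fraction
    b: Fraction
    c: Fraction


@dataclass(frozen=True)
class Answer:
    value: Fraction


@dataclass(frozen=True)
class Abstain:
    reason: str


# Abstention is one of the two normal result types, not an exception.
Result = Union[Answer, Abstain]


def solve_and_verify(eq: LinearEquation) -> Result:
    """Derive a solution deterministically, then verify it by substitution."""
    if eq.a == 0:
        # No unique solution exists, so abstain instead of guessing.
        return Abstain("coefficient a is zero; no unique solution")
    x = (eq.c - eq.b) / eq.a
    # Verification: substitute back and require exact equality.
    if eq.a * x + eq.b != eq.c:
        return Abstain("verification failed")
    return Answer(x)
```

For example, `solve_and_verify(LinearEquation(Fraction(2), Fraction(3), Fraction(11)))` returns an `Answer` whose value is 4, while a zero leading coefficient yields an `Abstain` carrying the reason.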

The system’s routing mechanism relies on a strict 1:1:1 correspondence between problem-shape regular expressions, schema-specific prompts, and closed-form CAS handlers. To date, the system has deployed over 3,100 distinct routes, maintaining zero LOST_CORRECT regressions across more than 250 consecutive software releases.
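One way to picture the 1:1:1 correspondence is a registry in which each route bundles exactly one shape regex, one prompt, and one handler. The sketch below is an assumption-laden toy: the `Route` type, the two example routes, the placeholder prompt strings, and the rule "abstain unless exactly one route matches" are all hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Route:
    """One route: one shape regex, one prompt, one handler (hypothetical)."""
    name: str
    shape: re.Pattern
    prompt: str  # placeholder for a schema-specific prompt
    handler: Callable[[re.Match], Optional[Fraction]]


def _add(m: re.Match) -> Optional[Fraction]:
    return Fraction(m["x"]) + Fraction(m["y"])


def _divide(m: re.Match) -> Optional[Fraction]:
    divisor = Fraction(m["y"])
    if divisor == 0:
        return None  # the handler itself abstains
    return Fraction(m["x"]) / divisor


ROUTES: List[Route] = [
    Route("add", re.compile(r"what is (?P<x>-?\d+) plus (?P<y>-?\d+)\?"),
          "<add prompt>", _add),
    Route("divide", re.compile(r"what is (?P<x>-?\d+) divided by (?P<y>-?\d+)\?"),
          "<divide prompt>", _divide),
]


def check_one_to_one(routes: List[Route]) -> bool:
    """True if no regex, prompt, or handler is shared between routes."""
    n = len(routes)
    return (
        len({r.shape.pattern for r in routes}) == n
        and len({r.prompt for r in routes}) == n
        and len({r.handler for r in routes}) == n
    )


def dispatch(text: str, routes: List[Route] = ROUTES) -> Optional[Fraction]:
    """Answer only when exactly one route matches; otherwise abstain (None)."""
    normalized = text.strip().lower()
    matches = [(r, m) for r in routes if (m := r.shape.fullmatch(normalized))]
    if len(matches) != 1:
        return None
    route, match = matches[0]
    return route.handler(match)
```

Under these assumptions, `dispatch("What is 3 plus 4?")` yields `Fraction(7, 1)`, and both an unmatched input and a division by zero yield `None`. Running `check_one_to_one(ROUTES)` whenever a route is added is one simple way to keep the correspondence strict as the registry grows.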

Evaluated on four MATH categories, the system answers 94.36% of cases correctly (2,592 of 2,747) at 100.00% trust, with zero confident-wrong answers across the entire 2,747-record benchmark. All four domains exceed the per-domain floor of 70/90/70, and per-domain trust is 100.0% in each. Rule-only handlers have a median latency of 1 ms and cover 88% of records on the 20,000-record lm-eval arithmetic benchmark. A public deployment has processed approximately 30,000 production queries.
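The headline figures can be reproduced from the reported counts. The sketch below rests on two assumptions not stated in the abstract: that every record is exactly one of correct, confident-wrong, or abstained, and that trust is the share of records that are not confident-wrong. The function name `summarize` is hypothetical.

```python
def summarize(total: int, correct: int, confident_wrong: int) -> dict:
    """Compute correctness, trust, and abstention count from raw counts.

    Assumes each record is correct, confident-wrong, or abstained, and that
    trust is the share of records that are not confident-wrong.
    """
    abstained = total - correct - confident_wrong
    return {
        "abstained": abstained,
        "correctness_pct": round(100 * correct / total, 2),
        "trust_pct": round(100 * (total - confident_wrong) / total, 2),
    }


print(summarize(total=2747, correct=2592, confident_wrong=0))
# {'abstained': 155, 'correctness_pct': 94.36, 'trust_pct': 100.0}
```

Under these assumptions, the 155 records that were not answered correctly are all abstentions, which is what lets correctness sit below 100% while trust stays at 100%.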

Rather than focusing solely on static accuracy metrics, we highlight the forward dynamic this architecture establishes: every abstention logged in production becomes a candidate for a correct answer after a single ship cycle, because new routes can be added without causing regressions in the existing registry. The operational discipline behind this reliability (math-template bucketing, LOST_CORRECT scans as regression oracles, parseable-first onboarding protocols, and abstention as a first-class output) forms a transferable framework for building trustworthy neuro-symbolic systems in fields beyond mathematics.
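A LOST_CORRECT scan can be read as a diff between two releases' per-record outcomes. The abstract does not define the scan, so the sketch below assumes one plausible interpretation: a record is LOST_CORRECT if it was correct in the previous release and is anything else (abstained, wrong, or missing) in the candidate release. The outcome labels and the function name `lost_correct` are hypothetical.

```python
from typing import Dict, List


def lost_correct(previous: Dict[str, str], candidate: Dict[str, str]) -> List[str]:
    """Return IDs of records that were correct before and are not correct now.

    Outcomes are assumed to be strings such as "correct" or "abstain"; a
    record missing from the candidate run also counts as lost.
    """
    return sorted(
        record_id
        for record_id, outcome in previous.items()
        if outcome == "correct" and candidate.get(record_id) != "correct"
    )


previous = {"q1": "correct", "q2": "abstain", "q3": "correct"}
candidate = {"q1": "correct", "q2": "correct", "q3": "abstain"}
print(lost_correct(previous, candidate))  # ['q3']
```

In this toy run, `q2` moves from abstention to correct, which is the intended forward dynamic, while `q3` regresses and would block the release under a zero-LOST_CORRECT policy.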


Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC
