arXiv

Imbuing Large Language Models with Bidirectional Logic for Robust Chain Repair

Abstract:

Standard autoregressive chain-of-thought (CoT) reasoning in large language models (LLMs) is inherently unidirectional, as each generated step depends exclusively on preceding tokens. This one-way inductive bias makes even highly capable models vulnerable to error snowballing; a single logical or arithmetic mistake in an early stage can irreversibly compromise the entire reasoning process. To address this, we present Teleological Reasoning Infilling (TRI), a training framework that equips decoder-only transformers with a native goal-conditioned bridging ability. The core concept involves treating flawed reasoning segments as fill-in-the-middle (FIM) problems. Specifically, given a verified prefix premise ($P$), a verified downstream milestone ($S$), and the initial query ($Q$), the model is tasked with synthesizing the logical bridge ($M$) that rigorously and completely links $P$ to $S$.

To facilitate this within standard causal architectures, we propose a Prefix-Suffix-Middle (PSM) sequence rearrangement technique utilizing three distinct, non-overlapping sentinel tokens. This approach allows $M$ to attend to both $P$ and $S$ without requiring any structural changes to the self-attention mechanism. Our training strategy consists of two phases: (i) Supervised Fine-Tuning (SFT) using symbolically verified $(P, S, M)$ triples derived from formal mathematics datasets, and (ii) Direct Preference Optimisation (DPO) where a deterministic symbolic verifier (such as Lean 4 or Python) serves as the exclusive reward oracle, thereby removing the risk of LLM-judge sycophancy.
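As a rough illustration of the PSM rearrangement described above, the following sketch serialises $(Q, P, S, M)$ so that $M$ comes last and can therefore attend to both $P$ and $S$ under an ordinary causal mask. The sentinel strings, the placement of $Q$ at the front, and the function name `build_psm_sequence` are assumptions made for this example; the abstract does not specify them.

```python
# Hypothetical sentinel strings: the abstract says three distinct sentinels
# are used but does not name them.
PRE = "<|fim_prefix|>"
SUF = "<|fim_suffix|>"
MID = "<|fim_middle|>"


def build_psm_sequence(query, prefix, suffix, middle=None):
    """Serialise (Q, P, S[, M]) in Prefix-Suffix-Middle order.

    Placing the query before the first sentinel is an assumption of this
    sketch. With middle=None the result is an inference prompt that ends at
    the middle sentinel; otherwise it is a full training sequence.
    """
    for part in (query, prefix, suffix, middle or ""):
        # Sentinels must not occur inside the content, or the segment
        # boundaries become ambiguous.
        if any(tok in part for tok in (PRE, SUF, MID)):
            raise ValueError("content must not contain sentinel tokens")
    prompt = f"{query}\n{PRE}{prefix}{SUF}{suffix}{MID}"
    return prompt if middle is None else prompt + middle


example = build_psm_sequence(
    query="Compute (3 + 4) * 2.",
    prefix="3 + 4 = 7",
    suffix="So the answer is 14.",
    middle="7 * 2 = 14",
)
```

Because the bridge is emitted after the suffix in token order, left-to-right decoding already conditions it on both ends, which is why no change to the attention mask is needed.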

During inference, TRI functions as a precise repair module within a dual-system loop. A causal draft model first produces an initial reasoning trace, after which a verifier identifies any failures. TRI then infills only the corrupted segment, preserving verified portions of the chain. Extensive experiments across three benchmarks reveal that TRI delivers state-of-the-art results on all tasks while cutting token consumption per problem by 31.2%.
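The draft, verify, and infill loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: steps are verified one at a time by a toy addition checker (a real verifier such as Lean 4 would check steps in context), the infiller is passed in as a plain callable, and the names `first_failed_span`, `repair_chain`, and `check_addition` are hypothetical.

```python
def check_addition(step):
    """Toy stand-in for a symbolic verifier: accepts steps like '3 + 4 = 7'."""
    lhs, rhs = step.split("=")
    a, b = lhs.split("+")
    return int(a) + int(b) == int(rhs)


def first_failed_span(steps, verify):
    """Return (start, end) of the first run of failing steps, or None."""
    start = next((i for i, s in enumerate(steps) if not verify(s)), None)
    if start is None:
        return None
    end = start
    while end < len(steps) and not verify(steps[end]):
        end += 1
    return start, end


def repair_chain(query, steps, verify, infill, max_rounds=4):
    """Replace only failing spans; verified steps are left untouched.

    `infill(query, prefix, milestone)` is a hypothetical callable standing in
    for the TRI model. It receives the verified prefix P and the first
    verified step after the failure as milestone S, and returns bridge steps.
    If the failure runs to the end of the chain, the milestone is empty;
    how the paper handles that case is not stated in the abstract.
    """
    steps = list(steps)
    for _ in range(max_rounds):
        span = first_failed_span(steps, verify)
        if span is None:
            return steps
        start, end = span
        prefix, milestone = steps[:start], steps[end:end + 1]
        bridge = infill(query, prefix, milestone)
        steps = prefix + list(bridge) + steps[end:]
    return steps


draft = ["1 + 2 = 3", "3 + 4 = 8", "7 + 1 = 8"]
repaired = repair_chain(
    "Add step by step.",
    draft,
    check_addition,
    infill=lambda q, p, s: ["3 + 4 = 7"],  # stub infiller for the demo
)
```

In this toy run only the middle step is regenerated, which mirrors the token-saving argument in the text: the verified prefix and milestone are reused rather than re-decoded.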


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
