arXiv

Imbuing Large Language Models with Bidirectional Logic for Robust Chain Repair

June 4, 2026 · Zehua Cheng, Wei Dai, Jiahao Sun, Thomas Lukasiewicz

Title: Empowering Large Language Models with Bidirectional Logic for Resilient Chain Repair

Abstract:

Standard autoregressive chain-of-thought (CoT) reasoning in large language models (LLMs) is inherently unidirectional, as each generated step depends exclusively on preceding tokens. This one-way inductive bias makes even highly capable models vulnerable to error snowballing; a single logical or arithmetic mistake in an early stage can irreversibly compromise the entire reasoning process. To address this, we present Teleological Reasoning Infilling (TRI), a training framework that equips decoder-only transformers with a native goal-conditioned bridging ability. The core concept involves treating flawed reasoning segments as fill-in-the-middle (FIM) problems. Specifically, given a verified prefix premise ($P$), a verified downstream milestone ($S$), and the initial query ($Q$), the model is tasked with synthesizing the logical bridge ($M$) that rigorously and completely links $P$ to $S$.
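
The contrast between the two settings can be written out as follows. This is a sketch in assumed notation, not taken from the paper: $r_t$ denotes the $t$-th token of a reasoning trace of length $T$, and $m_j$ the $j$-th token of the bridge $M$. Standard CoT factorises the trace left to right,

$$
p_\theta(r_1, \dots, r_T \mid Q) = \prod_{t=1}^{T} p_\theta(r_t \mid Q, r_{<t}),
$$

so a mistake in some $r_t$ enters the conditioning context of every later step. The infilling view instead conditions the bridge on both of its verified endpoints,

$$
p_\theta(M \mid Q, P, S) = \prod_{j=1}^{|M|} p_\theta(m_j \mid Q, P, S, m_{<j}),
$$

so each token of $M$ is generated with the target milestone $S$ already in context.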

To facilitate this within standard causal architectures, we propose a Prefix-Suffix-Middle (PSM) sequence rearrangement technique utilizing three distinct, non-overlapping sentinel tokens. This approach allows $M$ to attend to both $P$ and $S$ without requiring any structural changes to the self-attention mechanism. Our training strategy consists of two phases: (i) Supervised Fine-Tuning (SFT) using symbolically verified $(P, S, M)$ triples derived from formal mathematics datasets, and (ii) Direct Preference Optimisation (DPO) where a deterministic symbolic verifier (such as Lean 4 or Python) serves as the exclusive reward oracle, thereby removing the risk of LLM-judge sycophancy.
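
A minimal sketch of the PSM rearrangement, under stated assumptions: the sentinel strings below are hypothetical (the abstract does not name the actual tokens), the query is assumed to come first in the sequence, and the helper names are illustrative only. The point is that once $P$ and $S$ are both placed before $M$, an ordinary causal mask already lets every token of $M$ attend to both.

```python
# Hypothetical sentinel strings; the actual tokens used by TRI are not given in the abstract.
PRE = "<|tri_prefix|>"
SUF = "<|tri_suffix|>"
MID = "<|tri_middle|>"
EOS = "<|endoftext|>"


def build_psm_prompt(query: str, prefix: str, suffix: str) -> str:
    """Arrange the context as Q, P, S so that the bridge M is generated last."""
    return f"{query}{PRE}{prefix}{SUF}{suffix}{MID}"


def build_psm_training_text(query: str, prefix: str, suffix: str, middle: str) -> str:
    """Full training sequence: the PSM prompt followed by the target bridge and EOS."""
    return build_psm_prompt(query, prefix, suffix) + middle + EOS


def reassemble_chain(prefix: str, generated: str, suffix: str) -> str:
    """Put the pieces back in logical order P, M, S after generation."""
    middle = generated.split(EOS, 1)[0]  # drop EOS and anything after it
    return prefix + middle + suffix


prompt = build_psm_prompt(
    query="Q: Compute (3 + 4) * 2.\n",
    prefix="Step 1: 3 + 4 = 7.\n",
    suffix="Step 3: The answer is 14.\n",
)
chain = reassemble_chain(
    "Step 1: 3 + 4 = 7.\n",
    "Step 2: 7 * 2 = 14.\n" + EOS,
    "Step 3: The answer is 14.\n",
)
```

At inference time the model would be prompted with `prompt` and decoding would stop at the end-of-sequence token; `reassemble_chain` then restores the reading order of the repaired chain.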

During inference, TRI functions as a precise repair module within a dual-system loop. A causal draft model first produces an initial reasoning trace, after which a verifier identifies any failures. TRI then infills only the corrupted segment, preserving verified portions of the chain. Extensive experiments across three benchmarks reveal that TRI delivers state-of-the-art results on all tasks while cutting token consumption per problem by 31.2%.
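
The draft, verify, infill loop described above can be sketched as follows. This is a toy illustration, not the authors' implementation: the verifier only checks steps of the form `a op b = c`, the `infill` callable stands in for the TRI model, and all function names are hypothetical.

```python
import operator
from typing import Callable, List, Optional, Tuple

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def step_is_valid(step: str) -> bool:
    """Toy deterministic verifier for steps written as 'a op b = c'."""
    try:
        a, op, b, eq, c = step.split()
        return eq == "=" and OPS[op](int(a), int(b)) == int(c)
    except (ValueError, KeyError):
        return False  # malformed steps count as failures


def find_corrupted_span(steps: List[str]) -> Optional[Tuple[int, int]]:
    """Return [start, end) of the first contiguous run of invalid steps, or None."""
    start = next((i for i, s in enumerate(steps) if not step_is_valid(s)), None)
    if start is None:
        return None
    end = start
    while end < len(steps) and not step_is_valid(steps[end]):
        end += 1
    return start, end


def repair_chain(
    query: str,
    steps: List[str],
    infill: Callable[[str, List[str], List[str]], List[str]],
    max_rounds: int = 3,
) -> List[str]:
    """Replace only the failing segment, keeping verified steps untouched.

    The chain may still contain invalid steps if max_rounds is exhausted.
    """
    steps = list(steps)
    for _ in range(max_rounds):
        span = find_corrupted_span(steps)
        if span is None:
            break
        start, end = span
        prefix, suffix = steps[:start], steps[end:]  # verified P and S
        steps = prefix + infill(query, prefix, suffix) + suffix
    return steps


def toy_infill(query: str, prefix: List[str], suffix: List[str]) -> List[str]:
    """Stand-in for the infilling model; returns a hard-coded bridge."""
    return ["3 + 4 = 7"]


draft = ["3 + 4 = 8", "7 * 2 = 14"]  # first step is wrong, second is verified
repaired = repair_chain("Compute (3 + 4) * 2", draft, toy_infill)
print(repaired)  # ['3 + 4 = 7', '7 * 2 = 14']
```

Only the failing segment is regenerated here, which is the mechanism the abstract credits for the reduced token consumption; the verified step `7 * 2 = 14` is reused as-is.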
