arXiv

Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

June 4, 2026 · Xizi Luo, Changhong He, Dongdong Geng, Chenggong Shi, Yu Mei

Title: Enhancing LLM-Based Optimization for Vehicle Routing: A Constraint Injection Approach Beyond Objective Equivalence

Abstract:

While large language models (LLMs) are increasingly utilized to convert natural-language descriptions of optimization problems into executable solver code, significant challenges remain for constraint-heavy operations research (OR) tasks. Current data-filtering and training methodologies predominantly depend on objective-equivalence metrics, such as answer agreement and differential testing. However, these signals are insufficient; a generated program may pass these tests by introducing irrelevant constraints or silently dropping essential ones, provided that such errors do not affect the objective value on the specific test instance.

To address this, we introduce "constraint injection," a technique that employs feasible probes to detect spurious over-constraints and one-constraint-violating probes to uncover silent omissions. When integrated with differential testing, this method creates a robust dual verification system. We demonstrate the efficacy of this approach on vehicle routing problems (VRPs), a complex combinatorial optimization domain characterized by coupled operational constraints.
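The probe idea can be illustrated with a toy capacitated routing instance. The following is a minimal sketch under stated assumptions, not the authors' verifier: a generated model is stood in for by a plain Python predicate that says whether it would admit a given routing solution, and every name here (`DEMAND`, `CAPACITY`, `check_by_injection`, the three example models) is hypothetical. In practice the probe would presumably be injected into the solver model itself, for example by fixing decision variables to the probe's values and checking feasibility.

```python
from typing import Callable, Dict, List

Route = List[int]                    # customer ids served by one vehicle
Solution = List[Route]               # one route per vehicle
Model = Callable[[Solution], bool]   # True if the model admits the solution

# Hypothetical toy instance: three customers, vehicle capacity 8.
DEMAND: Dict[int, int] = {1: 4, 2: 3, 3: 5}
CAPACITY = 8

def visits_each_once(sol: Solution) -> bool:
    visited = [c for route in sol for c in route]
    return sorted(visited) == sorted(DEMAND)

def within_capacity(sol: Solution) -> bool:
    return all(sum(DEMAND[c] for c in route) <= CAPACITY for route in sol)

# Three stand-ins for generated programs.
def correct_model(sol: Solution) -> bool:
    return visits_each_once(sol) and within_capacity(sol)

def omits_capacity(sol: Solution) -> bool:
    # Silent omission: the capacity constraint was dropped.
    return visits_each_once(sol)

def over_constrained(sol: Solution) -> bool:
    # Spurious constraint: at most one customer per route.
    return correct_model(sol) and all(len(route) <= 1 for route in sol)

# Feasible probes satisfy every true constraint of the instance.
FEASIBLE_PROBES: List[Solution] = [[[1, 2], [3]], [[1], [2, 3]]]

# Each violating probe breaks exactly one named constraint.
VIOLATING_PROBES: Dict[str, Solution] = {
    "capacity": [[1, 3], [2]],   # load 9 > 8, every customer still visited once
    "visit_once": [[1, 2]],      # customer 3 unserved, load within capacity
}

def check_by_injection(model: Model) -> dict:
    # A rejected feasible probe signals a spurious over-constraint.
    rejected = [p for p in FEASIBLE_PROBES if not model(p)]
    # An accepted violating probe signals that the named constraint is missing.
    omitted = [name for name, p in VIOLATING_PROBES.items() if model(p)]
    return {
        "rejected_feasible": rejected,
        "omitted": omitted,
        "passed": not rejected and not omitted,
    }
```

In this sketch, `check_by_injection(omits_capacity)` reports `"capacity"` as omitted, and `check_by_injection(over_constrained)` reports both feasible probes as rejected. Neither check depends on the objective value, which is why it can complement differential testing.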

Our work features VRPCoder, an 8-billion-parameter end-to-end model designed to translate natural-language VRP scenarios into Gurobi scripts. This model is supported by a comprehensive benchmark suite of 21 VRP variants, all verified by domain experts. We utilize the constraint injection verifier as a rejection-sampling filter during data synthesis and as a per-rollout reward signal within Group Relative Policy Optimization (GRPO).
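The two uses of the verifier can be sketched as follows. This is an illustrative sketch only: it assumes the verifier yields a binary pass/fail signal per generated program, and it uses the commonly described GRPO advantage, which standardizes each rollout's reward against the mean and standard deviation of its own group. The abstract does not specify the reward shaping, the normalization details, or the function names used here (`rejection_filter`, `group_relative_advantages`), so all of these are assumptions.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

def rejection_filter(samples: Sequence[str],
                     verifier: Callable[[str], bool]) -> List[str]:
    """Keep only the synthesized programs that the verifier accepts."""
    return [s for s in samples if verifier(s)]

def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Standardize each rollout's reward within its group.

    Assumption: population standard deviation plus a small epsilon;
    actual implementations differ in these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of four rollouts for one prompt:
# 1.0 where the verifier passed, 0.0 where it failed.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Under this formulation, a group in which every rollout receives the same reward yields zero advantage for all rollouts, so the training signal comes from prompts where the verifier separates good rollouts from bad ones.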

Experimental results across four VRP benchmarks show that VRPCoder-GRPO achieves an average Pass@1 rate of 93%. It outperforms Gemini-3.1-Pro Preview on three benchmarks, surpasses Claude-Sonnet-4.5 by an average margin of 28 points, and exceeds previous OR-LLMs by an average of 78 points.
