arXiv

R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search

June 4, 2026 · João Pedro Gandarela, Thiago Rios, Stefan Menzel, André Freitas

Abstract:

While Large Language Models (LLMs) demonstrate fluency in open-ended tasks, this capability does not guarantee reliable performance in agentic environments. In such settings, systems must plan, utilize tools, and execute actions over extended periods. We attribute this reliability gap to three interconnected structural deficiencies: the lack of error localization, the absence of evaluation for worst-case perturbations, and the failure to invalidate accumulated knowledge. We posit that these issues stem from a common source: the conflicting demands placed on shared context by abductive, counterfactual, meta-inductive, corrective, and inductive reasoning modes.

To address these challenges simultaneously, we introduce Reflective Adversarial Pareto Search (R-APS). To the best of our knowledge, this is the first approach that tackles all three failures through reasoning-mode decomposition. This method assigns a dedicated context to each reasoning mode and orchestrates their interaction across three distinct timescales: staged compositional reasoning paired with a typed validation critic for failure localization; sensitivity-guided counterfactual stress-testing as a primary Pareto objective for robustness; and meta-inductive rule extraction with explicit invalidation for persistent memory management. R-APS operates without fine-tuning, relying solely on structured protocol design to function with a frozen LLM.
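To make the idea of treating robustness as a primary Pareto objective concrete, here is a minimal sketch of non-dominated filtering over three objectives. The abstract does not give the paper's objective definitions or selection procedure, so the `Candidate` fields, the sample values, and the "lower is better" convention for all three objectives are assumptions for illustration only.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical design record; every objective is 'lower is better'."""
    name: str
    chamfer: float     # trajectory-matching error
    bars: int          # mechanism complexity (bar count)
    worst_case: float  # error under the worst perturbation tested


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    a_obj, b_obj = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up values, not results from the paper.
candidates = [
    Candidate("A", 0.10, 6, 0.30),
    Candidate("B", 0.20, 4, 0.25),
    Candidate("C", 0.25, 6, 0.40),  # dominated by A and by B
    Candidate("D", 0.15, 4, 0.50),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

Because worst-case error sits alongside accuracy and bar count as a full objective, a design such as D, which is accurate but fragile, stays on the front only as an explicit trade-off rather than being preferred outright.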

We evaluated the method on planar mechanism synthesis tasks relevant to robotics, prosthetics, and mechanical design, where every candidate design was verified by a kinematic solver. Across 32 target trajectories, R-APS achieved robustness certificates 3.5 times tighter than those from uniform-perturbation baselines. It also reached first admission 46% faster, measured in iterations, and reduced Chamfer distance by a factor of 2.1 compared to an Enum+GA baseline, while also controlling for bar count and worst-case robustness. Furthermore, small 4B reasoning-specialized models proved competitive with general-purpose 70B backbones within the protocol, indicating that structured protocols can reduce the need for large model scale.
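For readers unfamiliar with the Chamfer distance used as the trajectory-matching metric, the sketch below shows one common symmetric variant: the mean nearest-neighbour Euclidean distance, summed over both directions. The abstract does not state which variant the paper uses (squared versus unsquared, sum versus mean), so this choice and the sample points are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def chamfer_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric Chamfer distance between two non-empty 2D point sets."""
    def one_way(src: List[Point], dst: List[Point]) -> float:
        # Mean distance from each source point to its nearest point in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Made-up target trajectory and a traced curve offset by 0.5 in y.
target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
traced = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
print(chamfer_distance(target, traced))
```

Each point's nearest neighbour in this example lies directly above or below it at distance 0.5, so each direction contributes 0.5 to the total.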
