Distilling Answer-Set Programming Rules from LLMs for Neurosymbolic Visual Question Answering
Title: Extracting Answer-Set Programming Rules from Large Language Models to Enhance Neurosymbolic Visual Question Answering
Abstract
Visual Question Answering (VQA) is the task of answering questions about images, which requires processing multimodal input and complex reasoning. Modular frameworks that use logic-based representations in their reasoning component are more interpretable than end-to-end trained models, but they demand considerable engineering effort whenever the task requirements change. To address this, we introduce a method for extracting rules from Large Language Models (LLMs). The method uses an LLM to extend an existing VQA reasoning theory, written as an answer-set program, so that it meets new task requirements. Examples drawn from VQA datasets guide the process: they are used to validate candidate rules, and incorrect rules are refined using feedback from the ASP solver. We demonstrate that the approach works on several VQA datasets. Notably, the LLM needs only a few examples to produce correct rules. Our experiments indicate that distilling rules from LLMs is a promising alternative to data-driven rule learning approaches. This work is currently under consideration for publication in Theory and Practice of Logic Programming (TPLP).
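The propose, validate, and refine loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `distill_rules`, `toy_propose`, and `toy_solve` are hypothetical names, the LLM and the ASP solver are replaced by trivial stand-ins, and the fact and rule encoding is an assumption made only for this example.

```python
import re
from typing import Callable, List, Optional, Tuple

# An example pairs ASP facts (scene + question encoding) with the expected answer.
Example = Tuple[str, str]


def distill_rules(
    base_program: str,
    examples: List[Example],
    propose: Callable[[str, List[Example], Optional[str]], str],
    solve: Callable[[str, str], Tuple[Optional[str], Optional[str]]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask `propose` for rules until all examples pass or rounds run out.

    `propose` stands in for the LLM call, `solve` for the ASP solver.
    Returns the accepted rules, or None if no candidate passed.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        rules = propose(base_program, examples, feedback)
        program = base_program + "\n" + rules
        feedback = None
        for facts, expected in examples:
            answer, error = solve(program, facts)
            if error is not None:
                feedback = f"solver error: {error}"
                break
            if answer != expected:
                feedback = f"facts {facts!r}: expected {expected!r}, got {answer!r}"
                break
        if feedback is None:
            return rules  # every example produced the expected answer
    return None


GOOD_RULE = "ans(C) :- query(color), has_color(C)."


def toy_propose(base: str, examples: List[Example], feedback: Optional[str]) -> str:
    # Stand-in for an LLM: the first attempt is deliberately wrong,
    # any attempt made after receiving feedback is correct.
    if feedback is None:
        return "ans(C) :- query(colour), has_color(C)."
    return GOOD_RULE


def toy_solve(program: str, facts: str) -> Tuple[Optional[str], Optional[str]]:
    # Stand-in for an ASP solver: it only recognizes GOOD_RULE by string
    # matching and reads the answer straight from the facts.
    if GOOD_RULE not in program:
        return None, "no rule derives ans/1"
    match = re.search(r"has_color\((\w+)\)", facts)
    return (match.group(1) if match else None), None


examples = [
    ("query(color). has_color(red).", "red"),
    ("query(color). has_color(blue).", "blue"),
]
base = "% existing VQA theory (placeholder)"
result = distill_rules(base, examples, toy_propose, toy_solve)
print(result)
```

In a real setting, `solve` would call an actual ASP solver and `propose` would prompt an LLM with the existing theory, the examples, and the feedback string; passing both as callables keeps the loop itself independent of those choices.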
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



