Traceable by Design: An LLM Pipeline and Dashboard for EU Regulatory Consultation Analysis
Abstract:
Public consultations typically yield far more stakeholder submissions than can be reviewed manually. To address this challenge, we introduce an end-to-end pipeline powered by Large Language Models (LLMs), accompanied by an interactive dashboard, for the structured extraction of topics from regulatory consultation documents. We demonstrate the system using the European Commission’s Digital Fairness Act (DFA) public call for evidence as a case study.
The system ingests both raw PDF attachments and web-form responses and extracts topic annotations, anchoring every finding to a verbatim quote from the original source material. Applied to 4,322 DFA submissions, the pipeline generated 15,368 topic annotations backed by a total of 20,951 verbatim evidence quotes.
The architecture of our solution is guided by three core tenets: verbatim grounding, complete traceability, and transparency by design. The accompanying dashboard presents the entire extraction dataset through five distinct analytical perspectives, ranging from high-level topic summaries at the dataset level to granular, paragraph-level drill-downs. Crucially, every data point within the dashboard remains traceable back to its original source.
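As an illustration of what verbatim grounding can mean in practice, the following minimal Python sketch checks that each evidence quote attached to an annotation actually occurs in the source text. The names `Annotation` and `ungrounded_quotes` are hypothetical and are not taken from the released code; the whitespace normalization is an assumption made here so that line breaks in PDF-extracted text do not cause false mismatches.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: one topic label plus its supporting quotes."""
    submission_id: str
    topic: str
    quotes: tuple[str, ...]


def _normalize(text: str) -> str:
    # Assumption: collapse whitespace runs so that line breaks introduced
    # by PDF extraction do not break an otherwise exact match.
    return re.sub(r"\s+", " ", text).strip()


def ungrounded_quotes(annotation: Annotation, source_text: str) -> list[str]:
    """Return the quotes that cannot be found in the source text.

    An empty result means every quote is anchored in the source.
    Empty quotes are reported as ungrounded.
    """
    source = _normalize(source_text)
    missing = []
    for quote in annotation.quotes:
        normalized = _normalize(quote)
        if not normalized or normalized not in source:
            missing.append(quote)
    return missing
```

A check of this kind could run after extraction so that any annotation with a non-empty result is flagged or discarded before it reaches the dashboard; whether the actual pipeline does this is not stated in the abstract.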
Notably, beyond the predefined topic categories established for the DFA, the pipeline identified specific stakeholder concerns, such as Age Verification, Payment Processor Censorship, and Digital Ownership, that a fixed-taxonomy approach would likely have overlooked. The pipeline is designed to be domain-agnostic: adapting it to a new consultation requires only updating the prompt and loading a new dataset. A live demonstration of the dashboard is available at https://dfa-dashboard.thalesbertaglia.com/. The source code and processed data are publicly available at https://github.com/thalesbertaglia/dfa-dashboard.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC




