arXiv

Traceable by Design: An LLM Pipeline and Dashboard for EU Regulatory Consultation Analysis

Abstract:

Public consultations typically yield large volumes of stakeholder submissions, producing datasets that are impractical to review manually. To address this challenge, we introduce an end-to-end pipeline powered by Large Language Models (LLMs), accompanied by an interactive dashboard, for the structured extraction of topics from regulatory consultation documents. We demonstrate the system using the European Commission’s Digital Fairness Act (DFA) public call for evidence as a case study.

The proposed system ingests both raw PDF attachments and web-form responses, then extracts topic annotations while ensuring that every finding is anchored to a verbatim quote from the original source material. Applied to 4,322 DFA submissions, the pipeline generated 15,368 topic annotations, backed by a total of 20,951 verbatim evidence quotes.
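To put these totals in proportion, the short calculation below derives per-item averages from the three reported counts. The averages are computed here for illustration and are not figures stated in the abstract; the variable names are hypothetical.

```python
# Totals reported in the abstract.
submissions = 4_322
annotations = 15_368
quotes = 20_951

# Derived averages (illustrative, not reported by the authors).
annotations_per_submission = annotations / submissions
quotes_per_annotation = quotes / annotations

print(f"{annotations_per_submission:.2f} annotations per submission")  # 3.56
print(f"{quotes_per_annotation:.2f} quotes per annotation")            # 1.36
```

On average, then, each submission yields between three and four topic annotations, and each annotation is supported by slightly more than one quote.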

The architecture of our solution is guided by three core tenets: verbatim grounding, complete traceability, and transparency by design. The accompanying dashboard presents the entire extraction dataset through five distinct analytical perspectives, ranging from high-level topic summaries at the dataset level to granular, paragraph-level drill-downs. Crucially, every data point within the dashboard remains traceable back to its original source.
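One way to picture the verbatim grounding tenet is a check that every quote attached to an annotation actually occurs in the source submission. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `Annotation` structure, the function names, and the choice to collapse whitespace before matching (to tolerate line breaks from PDF extraction) are all hypothetical.

```python
import re
from dataclasses import dataclass


def normalize(text: str) -> str:
    """Collapse whitespace runs so extraction line breaks do not defeat matching.

    Assumption: whitespace-insensitive matching still counts as 'verbatim'.
    """
    return re.sub(r"\s+", " ", text).strip()


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record layout; the real pipeline's schema may differ.
    submission_id: str
    topic: str
    quotes: tuple[str, ...]


def grounded_quotes(annotation: Annotation, source_text: str) -> list[str]:
    """Return the quotes that appear as substrings of the source text."""
    src = normalize(source_text)
    return [q for q in annotation.quotes if normalize(q) and normalize(q) in src]


def is_grounded(annotation: Annotation, source_text: str) -> bool:
    """An annotation is grounded only if it has quotes and all of them match."""
    return bool(annotation.quotes) and len(
        grounded_quotes(annotation, source_text)
    ) == len(annotation.quotes)


source = "Minors should be protected.\nAge verification must  respect privacy."
ok = Annotation("s1", "Age Verification", ("Age verification must respect privacy.",))
bad = Annotation("s1", "Age Verification", ("Age checks are mandatory.",))

print(is_grounded(ok, source))   # True
print(is_grounded(bad, source))  # False
```

Keeping the `submission_id` alongside each matched quote is what would let a dashboard link a data point back to its source, in the spirit of the traceability tenet.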

Notably, beyond the predefined topic categories established for the DFA, the pipeline identified specific stakeholder concerns, such as Age Verification, Payment Processor Censorship, and Digital Ownership, that would likely have been overlooked by a rigid, fixed-taxonomy methodology. The pipeline is designed to be domain-agnostic: adapting it to a new consultation requires only updating the prompt and loading a new dataset. A live demonstration of the dashboard is available at https://dfa-dashboard.thalesbertaglia.com/. Both the source code and the processed data are publicly available at https://github.com/thalesbertaglia/dfa-dashboard.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
