HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings
Abstract: Precisely labeling earnings reports can generate substantial short-term gains for investors and other stakeholders. While public financial filings are required to use machine-readable inline eXtensible Business Reporting Language (iXBRL), the intricate and granular nature of its taxonomy hinders the portability of tagged Key Performance Indicators (KPIs) across different companies. To overcome this challenge, we present the Hierarchical Financial Key Performance Indicator (HiFi-KPI) dataset. This extensive corpus comprises 1.65 million paragraphs and 198,000 distinct labels, which are hierarchically structured and associated with iXBRL taxonomies. HiFi-KPI is designed to support various analytical tasks; we specifically evaluate its utility in KPI classification, KPI extraction, and structured KPI extraction. For quicker benchmarking, we also provide HiFi-KPI-Lite, a manually curated subset containing 8,000 paragraphs. Baseline results on HiFi-KPI-Lite indicate that encoder-based models attain a macro-F1 score exceeding 0.906 in classification tasks, whereas Large Language Models (LLMs) achieve an F1 score of 0.440 in structured extraction. Furthermore, qualitative insights suggest that most extraction errors stem from issues with dates. We have open-sourced both the code and the data at https://github.com/aaunlp/HiFi-KPI.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC




