arXiv

Position: Stop Preaching and Start Practising Data Frugality for Responsible Development of AI

June 2, 2026 · Sophia N. Wilson, Andrew Millard, Guðrún Fjóla Guðmundsdóttir, Raghavendra Selvan, Sebastian Mair

Original: arXiv:2602.19789v2

Abstract: This position paper argues that the machine learning community must move from preaching to practising data frugality for responsible artificial intelligence (AI) development. For too long, progress has been equated with ever-larger datasets, driving remarkable advances but now yielding increasingly diminishing performance gains alongside rising energy use and carbon emissions. While awareness of data frugal approaches has grown, their adoption has remained rhetorical, and data scaling continues to dominate development practice. We argue that this gap between preach and practice must be closed, as continued data scaling entails substantial and under-accounted environmental impacts. To ground our position, we provide indicative estimates of the energy use and carbon emissions associated with the downstream use of ImageNet-1K. We then present empirical evidence that data frugality is both practical and beneficial, demonstrating that subset selection methods can substantially reduce training energy consumption with little loss in accuracy, while also mitigating dataset bias. Finally, we outline actionable recommendations for moving data frugality from rhetorical preaching to concrete practice for responsible development of AI.

Rewrite:

Abstract: This position paper contends that the machine learning community must move from advocating data frugality to practising it as part of responsible artificial intelligence (AI) development. The field has long equated progress with ever-larger datasets, a strategy that drove remarkable advances but now yields diminishing performance gains alongside rising energy use and carbon emissions. Although awareness of data-frugal methods has grown, their adoption remains largely rhetorical, and data scaling still dominates development practice. We argue that this gap between rhetoric and practice must be closed, because continued data scaling carries substantial and under-accounted environmental costs. To ground our position, we offer indicative estimates of the energy use and carbon emissions associated with the downstream use of ImageNet-1K. We then provide empirical evidence that data frugality is both practical and beneficial: subset selection methods can substantially reduce training energy consumption with little loss in accuracy, while also mitigating dataset bias. The paper concludes with actionable recommendations for turning data frugality from rhetoric into concrete practice for the responsible development of AI.
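
The abstract does not say which subset selection methods the authors evaluate. As a rough illustration of the general idea only, the sketch below assumes the simplest possible baseline, class-stratified random sampling, and is not the paper's method. The function name `stratified_subset`, its parameters, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices of a class-stratified random subset.

    labels   -- sequence of class labels, one per training example
    fraction -- share of each class to keep, in (0, 1]
    seed     -- seed for the local random generator (reproducibility)

    Assumption: plain random sampling within each class; this is a
    stand-in baseline, not the selection method used in the paper.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    selected = []
    for indices in by_class.values():
        # Keep at least one example per class so no class disappears.
        keep = max(1, round(fraction * len(indices)))
        selected.extend(rng.sample(indices, keep))
    return sorted(selected)


# Toy labels (illustrative only): 80 examples of class "a", 20 of class "b".
toy_labels = ["a"] * 80 + ["b"] * 20
subset = stratified_subset(toy_labels, fraction=0.25)
print(len(subset))  # 25 (20 from class "a", 5 from class "b")
```

Training on the returned indices instead of the full dataset is what would reduce the number of examples processed per epoch. Whether accuracy holds up, and how much energy is actually saved, depends on the dataset, model, and selection method, and has to be measured rather than assumed.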

Source: arXiv · Generated at: 2026-06-02 00:00:00 UTC