Be Fair! Can Machine Learning Engineering Agents Adhere to Fairness Constraints?
Title: Ensuring Equity: Can Machine Learning Engineering Agents Satisfy Fairness Requirements?
Abstract:
Machine learning engineering (MLE) agents promise to automate the entire machine learning pipeline, turning raw data and natural language instructions into working models. This automation could democratize access to machine learning by letting non-technical domain experts build models on their own. In highly regulated or sensitive sectors, however, this level of abstraction opens a significant responsibility gap: end-users often cannot see the underlying design decisions that affect a model’s correctness, robustness, fairness, or compliance with regulatory standards. We contend that current benchmarks do not adequately establish whether MLE agents can be deployed safely in these critical contexts. To address this, we propose a set of requirements for a responsibility-focused evaluation framework and carry out an exploratory study of melanoma classification, treating fairness across skin tones as a key constraint. Our assessment of two recent MLE agents shows that the pipelines they generate exhibit substantial variance and consistently lag behind manually crafted baselines in both predictive accuracy and fairness metrics, even when fairness-specific prompts are used. These initial findings point to a pressing need for further research on reengineering MLE agents so that humans can steer the search process and the quality and regulatory compliance of generated pipelines can be reliably verified.
Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC




