Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery
Title: Focus on Structural Identification, Not Complex Models, to Advance Scientific Discovery
Abstract:
The use of modern machine learning (ML) and artificial intelligence (AI), particularly large language models (LLMs), to generate scientific hypotheses and mechanistic explanations from observational data is increasingly common. This position paper contends that mechanistic learning is inherently underdetermined in the high-dimensional proxy regimes where contemporary ML systems excel. Because many mutually incompatible mechanisms can produce identical observational relationships across the data’s support, neither predictive accuracy nor a coherent narrative is sufficient evidence that the true underlying mechanism has been discovered. This underdetermination poses a particular risk with LLMs, which tend to collapse vast equivalence classes of candidate explanations into a single, fluent story. To address this, the paper outlines specific standards for "mechanistic ML," arguing that such norms are essential to ensure that LLM-centric workflows genuinely support scientific inquiry rather than merely simulating it.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC



