arXiv

scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns

June 2, 2026 · Sergey V. Samsonau

Abstract:

Methodology bugs in scientific Python scripts often produce results that look plausible but are fundamentally wrong, and conventional static analysis tools and linters fail to detect them. Several research groups have built machine-learning-specific linters showing that such detection is feasible, but these tools are hard to sustain: they are typically pinned to particular Python or pylint versions, are poorly packaged, and require manual engineering for every new detection pattern. As AI-generated code increases the volume of scientific software, the need for automated methodology checks (covering issues such as data leakage, improper cross-validation, and missing random seeds) has grown.
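To make "data leakage" concrete, the sketch below shows one common form, preprocessing leakage, in plain Python (no scikit-learn, so it stays self-contained). A feature is standardized using statistics computed either on all rows, which lets held-out rows influence the training features, or on the training rows only. The function names `leaky_scaling` and `correct_scaling` and the toy data are hypothetical illustrations, not code from scicode-lint or the paper.

```python
from statistics import mean, pstdev


def standardize(values, mu, sigma):
    """Scale values using a given mean and (population) standard deviation."""
    return [(v - mu) / sigma for v in values]


def leaky_scaling(train, test):
    # BUG (preprocessing leakage): mean/std are fit on train + test, so the
    # held-out rows change how the training rows are transformed.
    combined = train + test
    mu, sigma = mean(combined), pstdev(combined)
    return standardize(train, mu, sigma), standardize(test, mu, sigma)


def correct_scaling(train, test):
    # Fit the scaling statistics on the training split only, then apply
    # the same statistics to both splits.
    mu, sigma = mean(train), pstdev(train)
    return standardize(train, mu, sigma), standardize(test, mu, sigma)


if __name__ == "__main__":
    train = [1.0, 2.0, 3.0, 4.0]
    test = [100.0]  # an outlier that should stay unseen while fitting
    leaky_train, _ = leaky_scaling(train, test)
    clean_train, _ = correct_scaling(train, test)
    # Changing `test` changes leaky_train but never clean_train.
    print([round(x, 3) for x in leaky_train])
    print([round(x, 3) for x in clean_train])
```

The same rule applies to any fitted transform (imputers, encoders, feature selectors): fit it on the training split, or inside each cross-validation fold, and only apply it to held-out data. Fitting it once on the full dataset before cross-validation is a typical way that improper cross-validation leaks information.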

This paper introduces scicode-lint, a tool with a two-tier architecture that separates pattern design, performed by frontier models at build time, from pattern execution, performed by a smaller local model at runtime. Because the patterns are generated by AI rather than hand-coded, adapting to new library versions costs compute tokens rather than engineering hours. On Kaggle notebooks with human-labeled ground truth, the tool achieved 100% recall at 65% precision for preprocessing-leakage detection. On 38 published scientific papers that use AI/ML methods, it reached 62% precision (as judged by an LLM), with considerable variation across pattern categories; on a separate held-out set of papers, precision fell to 54%. In controlled tests, scicode-lint achieved 97.7% accuracy across 66 patterns.
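The build-time/runtime split can be pictured with a small sketch. Here a pattern is plain data (an identifier, a category, and a yes/no detection question) of the kind a frontier model might author once at build time; at runtime, each pattern's question is posed to a local model together with the user's source code. Everything in this block is a hypothetical illustration: `Pattern`, `run_patterns`, `CODE_MARKER`, the prompt format, the pattern ID, and the `fake_local_model` stub (a crude text heuristic standing in for a real local LLM call) are assumptions, not scicode-lint's actual schema or API.

```python
from dataclasses import dataclass
from typing import Callable, List

CODE_MARKER = "\n--- CODE ---\n"  # hypothetical prompt delimiter


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern record, authored by a frontier model at build time."""
    pattern_id: str
    category: str
    question: str  # yes/no question the runtime model must answer


# Hypothetical model interface: prompt in, raw text reply out.
ModelFn = Callable[[str], str]


def run_patterns(source: str, patterns: List[Pattern], model: ModelFn) -> List[str]:
    """Runtime tier: pose each pattern's question about `source` to a local model."""
    findings = []
    for p in patterns:
        prompt = f"{p.question}\nAnswer YES or NO.{CODE_MARKER}{source}"
        if model(prompt).strip().upper().startswith("YES"):
            findings.append(p.pattern_id)
    return findings


def fake_local_model(prompt: str) -> str:
    # Stand-in for a real local LLM, for illustration only. It ignores the
    # question and answers YES when fit_transform( appears before train_test_split(.
    code = prompt.split(CODE_MARKER, 1)[1]
    fit_at = code.find("fit_transform(")
    split_at = code.find("train_test_split(")
    return "YES" if 0 <= fit_at < split_at else "NO"


PATTERNS = [
    Pattern(
        pattern_id="leakage-preprocess-before-split",  # hypothetical ID
        category="data leakage",
        question="Is a preprocessing step fit on the full dataset before the train/test split?",
    ),
]

if __name__ == "__main__":
    leaky = "X = scaler.fit_transform(X)\nX_tr, X_te = train_test_split(X)"
    clean = "X_tr, X_te = train_test_split(X)\nX_tr = scaler.fit_transform(X_tr)"
    print(run_patterns(leaky, PATTERNS, fake_local_model))  # ['leakage-preprocess-before-split']
    print(run_patterns(clean, PATTERNS, fake_local_model))  # []
```

In this sketch, supporting a new library version means regenerating the pattern data (`PATTERNS`) rather than editing the runtime loop, which mirrors the abstract's point that adaptation costs tokens rather than engineering time. Swapping the stub for a real local model would leave `run_patterns` unchanged.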

Source: arXiv · Generated at: 2026-06-02 00:00:00 UTC