arXiv

Parthenon Law: A Self-Evolving Legal-Agent Framework

June 4, 2026 · Hejia Geng, Leo Liu

Abstract:

As legal-domain large language model (LLM) agents grow more capable, they could turn document-intensive legal work into manageable, reviewable outputs. Their reliable deployment, however, is hindered by three gaps: (1) there is no large-scale empirical evidence on how today's most advanced model-and-harness combinations perform on complete legal matters; (2) agent architectures tailored to legal work are lacking, so existing systems rely on general-purpose frameworks; and (3) systems cannot learn from their own outcomes in environments where facts, authorities, and deadlines keep changing. This paper addresses each of these gaps.

First, we present a large-scale empirical study on Harvey LAB covering 12,510 agent trajectories. It shows that even frontier agents struggle to resolve matters in a single pass: per-criterion accuracy improves with stronger models, but the rate of strict matter completion stays flat.
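
The gap between the two metrics can be illustrated with a small calculation. The sketch below assumes, since the abstract does not give exact definitions, that each matter is scored against a rubric of pass/fail criteria, that per-criterion accuracy is the fraction of all criteria passed, and that a matter counts as strictly completed only when every one of its criteria passes. The scores are hypothetical and are not results from the study.

```python
from typing import Dict, List

# Hypothetical rubric outcomes: matter id -> pass/fail for each criterion.
# Illustrative numbers only, not data from the Harvey LAB study.
Scores = Dict[str, List[bool]]


def per_criterion_accuracy(scores: Scores) -> float:
    """Fraction of all rubric criteria passed, pooled across matters."""
    results = [ok for criteria in scores.values() for ok in criteria]
    return sum(results) / len(results)


def strict_completion_rate(scores: Scores) -> float:
    """Fraction of matters in which every criterion passed (assumed definition)."""
    completed = sum(all(criteria) for criteria in scores.values())
    return completed / len(scores)


weaker_model: Scores = {
    "matter_a": [True, True, False, False],
    "matter_b": [True, False, True, False],
    "matter_c": [True, True, True, True],
}
stronger_model: Scores = {
    "matter_a": [True, True, True, False],
    "matter_b": [True, True, True, False],
    "matter_c": [True, True, True, True],
}

for name, scores in [("weaker", weaker_model), ("stronger", stronger_model)]:
    acc = round(per_criterion_accuracy(scores), 3)
    strict = round(strict_completion_rate(scores), 3)
    print(name, acc, strict)
# -> weaker 0.667 0.333
# -> stronger 0.833 0.333
```

In this toy case accuracy rises from 0.667 to 0.833 while strict completion stays at 0.333, because a single missed criterion still fails the whole matter.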

To overcome these limitations, we introduce Parthenon, a self-evolving legal-agent framework. The architecture decomposes the system into six auditable components: the Model, the Harness, Agent roles, legal Knowledge, deterministic Tools, and procedural Skills. Together, these components are designed to ensure source traceability, correct grounding of dates and numbers, compliance with deliverable standards, and closure of every open issue.
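
One way to picture the decomposition is as a plain configuration record in which each component is a separate, inspectable field, with tools implemented as deterministic functions rather than model calls. The sketch below is hypothetical: the class and field names, the example skill text, and the `add_calendar_days` date tool are illustrative assumptions, not the framework's actual interfaces. The tool shows the kind of computation the abstract assigns to deterministic code, namely grounding a date so that the model does not perform calendar arithmetic itself.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, Dict, List


def add_calendar_days(start: str, days: int) -> str:
    """Hypothetical deterministic tool: an ISO date plus a number of calendar days.

    A real deadline rule would also need court holidays, weekend roll-forward,
    and service rules; those are deliberately left out of this sketch.
    """
    return (date.fromisoformat(start) + timedelta(days=days)).isoformat()


@dataclass
class ParthenonConfig:
    """Hypothetical record holding the six components named in the abstract."""
    model: str                 # identifier of the underlying LLM
    harness: str               # agent runtime that drives tool calls
    agent_roles: List[str]     # e.g. drafter, reviewer
    knowledge: Dict[str, str]  # legal knowledge entries keyed by source id (traceability)
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)
    skills: Dict[str, str] = field(default_factory=dict)  # procedural checklists


config = ParthenonConfig(
    model="example-model",
    harness="example-harness",
    agent_roles=["drafter", "reviewer"],
    knowledge={"src-001": "Example authority text with a citable source id."},
    tools={"add_calendar_days": add_calendar_days},
    skills={"deadline_check": "Compute every date with a tool; cite the rule used."},
)

print(config.tools["add_calendar_days"]("2026-01-15", 30))  # -> 2026-02-14
```

Keeping each component in its own field is what makes the system auditable in this sketch: a reviewer can inspect or swap the knowledge entries, tools, or skills without touching the model.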

Furthermore, Parthenon incorporates an anti-leakage learning loop that converts scored failures into task-agnostic updates to its skills, tools, and knowledge bases. The system thus improves with experience, much as a law firm refines its checklists and playbooks after each matter, without modifying the underlying model weights. Our experiments show that Parthenon substantially improves the performance of state-of-the-art models and harnesses on legal-matter tasks.
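
The abstract does not specify how leakage is prevented. A minimal sketch, under the assumption that "anti-leakage" means rejecting any proposed update that copies task-specific details (party names, amounts, dates) into the shared skill library, might look like the following. `ScoredFailure`, `propose_skill_update`, `leaks_task_details`, and `learning_step` are hypothetical names, and case-insensitive substring matching is a deliberately crude stand-in for whatever check the framework actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ScoredFailure:
    task_id: str
    failed_criterion: str   # rubric text that was not met
    task_entities: Set[str]  # task-specific strings: party names, amounts, dates


def propose_skill_update(failure: ScoredFailure) -> str:
    """Hypothetical stand-in for the step that turns a failure into a lesson.

    In practice this might be an LLM call; here it is a fixed template.
    """
    return f"Before finalizing, verify this requirement: {failure.failed_criterion}"


def leaks_task_details(update: str, entities: Set[str]) -> bool:
    """Flag an update that copies any task-specific string (case-insensitive)."""
    lowered = update.lower()
    return any(entity.lower() in lowered for entity in entities)


def learning_step(skills: Dict[str, List[str]], skill: str, failure: ScoredFailure) -> bool:
    """Append a lesson to a skill only if it passes the leakage check."""
    update = propose_skill_update(failure)
    if leaks_task_details(update, failure.task_entities):
        return False  # reject: the lesson would memorize this task's answer
    skills.setdefault(skill, []).append(update)
    return True


skills: Dict[str, List[str]] = {}
entities = {"Acme Corp", "2026-03-02"}
generic = ScoredFailure("t1", "every date is computed by a tool and cites its rule", entities)
specific = ScoredFailure("t2", "states that the cure period for Acme Corp ends on 2026-03-02", entities)
print(learning_step(skills, "deadline_check", generic))   # -> True
print(learning_step(skills, "deadline_check", specific))  # -> False
print(len(skills["deadline_check"]))                      # -> 1
```

Substring matching is easy to evade by paraphrase, so a stricter filter, for example an overlap check against the scored rubric, would likely be needed in practice.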
