arXiv

Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification

June 4, 2026 · Thanh Luong Tuan, Abhijit Sanyal

Title: Ensuring Enterprise AI Reliability Before Deployment: A Framework for Ontology-Based Simulation and Trust Certification

Abstract:

A gap persists between benchmarking the capabilities of large language models (LLMs) and deploying them in production: enterprise AI agents lack pre-deployment verification. Safeguards that apply once agents are live, such as post-deployment monitoring, human-in-the-loop oversight, and prompt-level guardrails, provide insufficient assurance. To address this, we introduce an ontology-grounded verification framework that integrates three components: (1) an "Agent Operational Envelope," which defines the certification boundary through permissions, domain constraints, safety properties, governance rules, and autonomy levels; (2) an automated pipeline that converts ontology data into regulatory, operational, and adversarial test scenarios; and (3) a "Trust Certificate" that carries machine-verifiable attestations and assigns one of three tiered deployment outcomes (Approved, Conditional, or Rejected).
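
To make the relationship between the envelope and the certificate concrete, the following is a minimal Python sketch, not the authors' implementation. The five envelope fields mirror the elements named in the abstract; everything else is an assumption made for illustration: the class and function names, the integer autonomy scale, and above all the decision rule (any failed safety property yields Rejected, any other failed check yields Conditional, otherwise Approved). The abstract does not state how outcomes are actually assigned.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """The three tiered deployment outcomes named in the abstract."""
    APPROVED = "Approved"
    CONDITIONAL = "Conditional"
    REJECTED = "Rejected"


@dataclass(frozen=True)
class OperationalEnvelope:
    """Certification boundary; fields mirror the five elements in the abstract."""
    permissions: frozenset
    domain_constraints: frozenset
    safety_properties: frozenset
    governance_rules: frozenset
    autonomy_level: int  # hypothetical integer scale, not from the paper


@dataclass(frozen=True)
class TrustCertificate:
    """Hypothetical certificate record: outcome plus the checks that failed."""
    agent_id: str
    outcome: Outcome
    failed_safety: tuple
    failed_other: tuple


def certify(agent_id, envelope, results):
    """Assign an outcome from per-check results (check name -> passed?).

    Assumed rule, for illustration only: a check with no recorded result
    counts as failed; safety failures reject; other failures are conditional.
    """
    def failed(names):
        return tuple(sorted(n for n in names if not results.get(n, False)))

    failed_safety = failed(envelope.safety_properties)
    failed_other = failed(envelope.governance_rules | envelope.domain_constraints)

    if failed_safety:
        outcome = Outcome.REJECTED
    elif failed_other:
        outcome = Outcome.CONDITIONAL
    else:
        outcome = Outcome.APPROVED
    return TrustCertificate(agent_id, outcome, failed_safety, failed_other)
```

Treating a missing result as a failure is a deliberately conservative choice in this sketch: an agent cannot be approved for a property that was never tested.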

We conducted a controlled pilot involving five industry-by-regulatory-regime cells across the United States and Vietnam, spanning four highly regulated sectors: Fintech, Banking, Insurance, and Healthcare. This study evaluated 1,800 scenarios against 125 primary-source regulatory requirements and 25 injected faults. The results demonstrated that ontology-grounded generation (G4) achieved a regulatory coverage of 48.3%, significantly outperforming the persona-based baseline’s 33.1% (corrected p = .0006) and yielding the highest domain specificity score of 4.77/5.0 (p = 2e-6). However, this coverage advantage over both the baseline and retrieval-augmented prompting methods did not remain statistically robust after applying Bonferroni correction. These findings were replicated through cross-validation across three LLM families—Claude Sonnet 4, Qwen 2.5 72B, and Gemma 4 26B—using a total of 5,400 scenarios, which reaffirmed the superiority of the ontology approach over persona-based methods. Ultimately, the study establishes ontology-grounded scenario generation as a reliable supplement to persona-based test suites, particularly for domains with intensive regulatory demands.
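
The two quantities behind these results, regulatory coverage and Bonferroni-corrected p-values, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the paper's evaluation code: the function names and requirement identifiers are hypothetical, coverage is assumed here to be the fraction of required items touched by at least one scenario, and the abstract does not say how coverage is aggregated across the five cells. The Bonferroni step is the standard one: multiply each raw p-value by the number of comparisons and cap the result at 1.

```python
def regulatory_coverage(required_ids, covered_ids):
    """Fraction of required regulatory items hit by at least one scenario.

    Identifiers in covered_ids that are not in required_ids are ignored.
    """
    required = set(required_ids)
    if not required:
        raise ValueError("required_ids must not be empty")
    return len(required & set(covered_ids)) / len(required)


def bonferroni(p_values):
    """Bonferroni-correct a mapping of comparison name -> raw p-value."""
    m = len(p_values)  # number of comparisons in the family
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Made-up inputs for illustration; these are not figures from the study.
coverage = regulatory_coverage(["R1", "R2", "R3", "R4"], ["R1", "R3", "X9"])
corrected = bonferroni({"vs_persona": 0.01, "vs_retrieval": 0.5})
```

Because the correction scales with the number of comparisons, an advantage that looks significant in a single pairwise test can lose significance once every comparison in the family is counted.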
