arXiv

Auditable Climate Risk Intelligence from Fragmented ESG Data: Deterministic Orchestration and Imbalance-Aware Learning for Scope 1-3 Validation

June 3, 2026 · Karan Sehgal, Khawar Naveed Bhatti

Title: Ensuring Auditable Climate Risk Insights from Disparate ESG Data: A Framework for Deterministic Orchestration and Imbalance-Aware Scope 1-3 Validation

Abstract:

ESG and climate risk data are currently fragmented across heterogeneous reporting regimes covering Scope 1, Scope 2, and Scope 3 emissions. Traditional validation workflows often lack provenance-aware auditability, miss hidden data drift, and do not enforce reproducibility-first governance. To address these gaps, this study introduces a deterministic climate risk intelligence framework that combines a single-source-of-truth orchestration model with temporal anomaly detection, imbalance-aware ensemble learning, and explainability-oriented governance structures, enabling auditable ESG validation.
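The abstract does not say how the single-source-of-truth orchestration achieves determinism. One common way to make provenance both reproducible and tamper-evident is to hash-chain each processing step, so the following is a minimal sketch under that assumption only. The record layout, the step names, and the helpers `make_record` and `verify_chain` are hypothetical and not taken from the paper; each digest is SHA-256 over canonical JSON, so identical inputs always yield identical hashes.

```python
import hashlib
import json
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """One step in a hypothetical source -> transform -> escalation chain."""
    step: str                   # e.g. "ingest", "normalize", "escalate" (illustrative)
    payload: dict               # data or decision produced at this step
    parent_hash: Optional[str]  # digest of the previous record; None for the source
    digest: str                 # SHA-256 over (step, payload, parent_hash)


def make_record(step: str, payload: dict,
                parent: Optional[ProvenanceRecord]) -> ProvenanceRecord:
    parent_hash = parent.digest if parent is not None else None
    # Canonical JSON (sorted keys, fixed separators) makes the digest
    # deterministic: identical inputs always produce the identical hash.
    canonical = json.dumps(
        {"step": step, "payload": payload, "parent": parent_hash},
        sort_keys=True, separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return ProvenanceRecord(step, payload, parent_hash, digest)


def verify_chain(chain: list[ProvenanceRecord]) -> bool:
    """Recompute each digest and check every record links to its predecessor."""
    prev: Optional[ProvenanceRecord] = None
    for rec in chain:
        expected = make_record(rec.step, rec.payload, prev)
        if rec.parent_hash != expected.parent_hash or rec.digest != expected.digest:
            return False
        prev = rec
    return True


# Usage with illustrative Scope 1 figures (not data from the paper).
src = make_record("ingest", {"entity": "A", "scope": 1, "value": 1200.0}, None)
norm = make_record("normalize", {"entity": "A", "scope": 1, "tCO2e": 1200.0}, src)
esc = make_record("escalate", {"entity": "A", "reason": "drift"}, norm)
chain = [src, norm, esc]
print(verify_chain(chain))  # True

# Silently editing an upstream payload breaks verification.
tampered = [src, replace(norm, payload={"entity": "A", "scope": 1, "tCO2e": 900.0}), esc]
print(verify_chain(tampered))  # False
```

Because every digest covers its parent's digest, editing any upstream payload invalidates every record after it, which is what would let an auditor replay a chain from source to escalation.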

To support open reproducibility, we developed and published a synthetic ESG validation benchmark, calibrated to the publicly documented characteristics of established frameworks, including the GHG Protocol, PCAF, and the ISSB standards. The methodology combines temporal drift analysis, SMOTE-based oversampling of rare events, ensemble learning, and provenance-aware orchestration, and uses TreeSHAP-based interpretability to support governance inspection and audit trail reconstruction.
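SMOTE (Synthetic Minority Over-sampling Technique) counters class imbalance by creating synthetic minority samples on the line segment between a minority sample and one of its k nearest minority neighbours. The abstract does not state which SMOTE variant or library the authors use, so the sketch below shows only the core interpolation step in plain Python, seeded for reproducibility. The function name `smote`, the toy feature vectors, and the defaults (`k=5`, `seed=0`) are illustrative assumptions; in practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import math
import random


def smote(minority: list[list[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> list[list[float]]:
    """SMOTE-style oversampling: interpolate between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducible output
    k = min(k, len(minority) - 1)      # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x by Euclidean distance (excluding x).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        x_nn = minority[rng.choice(neighbours)]
        lam = rng.random()             # interpolation weight in [0, 1)
        # The new point lies on the segment from x to x_nn.
        synthetic.append([a + lam * (b - a) for a, b in zip(x, x_nn)])
    return synthetic


# Illustrative rare-event feature vectors (hypothetical, not from the benchmark).
rare = [[0.90, 0.10], [1.00, 0.20], [0.80, 0.15], [0.95, 0.30]]
new_points = smote(rare, n_synthetic=6, k=2, seed=42)
print(len(new_points))  # 6
```

Oversampling of this kind should be applied only inside each training fold of the cross-validation, never before splitting; otherwise synthetic points derived from test-fold samples leak into training and inflate recall.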
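The abstract does not define its temporal drift analysis further. As a simple stand-in, not the authors' method, one can flag a reporting period whose value departs from the trailing-window mean by more than a chosen number of trailing standard deviations. The function `drift_flags`, the window length, the threshold, and the emissions series are all hypothetical.

```python
import statistics


def drift_flags(series: list[float], window: int = 4,
                z_threshold: float = 3.0) -> list[int]:
    """Indices t where series[t] departs from the mean of the previous
    `window` values by more than `z_threshold` sample standard deviations."""
    if window < 2:
        raise ValueError("window must be >= 2 to estimate a standard deviation")
    flagged = []
    for t in range(window, len(series)):
        history = series[t - window:t]          # trailing window, past values only
        mu = statistics.mean(history)
        sigma = statistics.stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is treated as a break.
            if series[t] != mu:
                flagged.append(t)
        elif abs(series[t] - mu) / sigma > z_threshold:
            flagged.append(t)
    return flagged


# Hypothetical annual Scope 1 totals (tCO2e) for one reporting entity.
scope1 = [1000, 1010, 995, 1005, 1002, 1500, 1008]
print(drift_flags(scope1))  # [5]
```

A trailing window uses only past periods, so the check is causal and reproducible; the trade-off is that a spike inflates the window's standard deviation for the following periods, temporarily lowering sensitivity.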

We evaluated the framework against statistical classifiers, anomaly detection algorithms, temporal forecasting baselines, and a threshold-based system. Evaluation metrics include classification measures (recall, F1 score, ROC AUC), calibration measures (Expected Calibration Error, Brier score), and a governance-specific audit trace completeness metric: the proportion of flagged anomalies for which a deterministic provenance chain, running from the source record to the escalation point, can be reconstructed. All results are reported as mean ± standard deviation over stratified five-fold cross-validation, with paired significance tests. Overall, the framework reorients ESG reporting toward a deterministic climate risk governance infrastructure that prioritizes reproducibility, explainability, and operational auditability.
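In plain terms, the audit trace completeness metric (abbreviated ATC here for convenience) is ATC = (number of flagged anomalies with a reconstructable provenance chain) / (number of flagged anomalies). The sketch below assumes, hypothetically, that provenance is stored as a mapping from record id to (record kind, parent id), and that a chain counts as reconstructable when parent links lead from the escalation record back to a record of kind "source" without a missing link or a cycle. The storage layout and the helpers `chain_is_complete` and `audit_trace_completeness` are illustrative, not the paper's implementation.

```python
from typing import Optional

# Hypothetical provenance store: record id -> (record kind, parent record id).
# Records of kind "source" are raw disclosures and have no parent.
ProvenanceStore = dict[str, tuple[str, Optional[str]]]


def chain_is_complete(store: ProvenanceStore, escalation_id: str) -> bool:
    """Follow parent links from an escalation record back to a source record."""
    seen: set[str] = set()
    current: Optional[str] = escalation_id
    while current is not None:
        if current in seen or current not in store:
            return False              # cycle or missing link: not reconstructable
        seen.add(current)
        kind, parent = store[current]
        if kind == "source":
            return True               # reached the originating disclosure
        current = parent
    return False                      # ran out of parents before reaching a source


def audit_trace_completeness(store: ProvenanceStore, flagged: list[str]) -> float:
    """Fraction of flagged anomalies whose provenance chain is complete."""
    if not flagged:
        raise ValueError("no flagged anomalies; the metric is undefined")
    complete = sum(chain_is_complete(store, a) for a in flagged)
    return complete / len(flagged)


store: ProvenanceStore = {
    "s1": ("source", None),
    "n1": ("normalize", "s1"),
    "e1": ("escalate", "n1"),        # complete: e1 -> n1 -> s1
    "e2": ("escalate", "n_lost"),    # parent record was never logged
    "n3": ("normalize", None),       # derived record with no recorded source
    "e3": ("escalate", "n3"),
}
print(audit_trace_completeness(store, ["e1", "e2", "e3"]))  # 0.3333333333333333
```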
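For reference, the two calibration measures named in the evaluation can be computed as follows for a binary anomaly classifier that outputs P(anomaly). The Brier score is the mean squared error between predicted probabilities and 0/1 labels; ECE here uses equal-width probability bins and compares each bin's mean predicted probability with its observed positive rate, weighted by bin size. The bin count (10) and this positive-class variant of ECE are assumptions, since the abstract does not specify them and several ECE variants are in use.

```python
def brier_score(probs: list[float], labels: list[int]) -> float:
    """Mean squared error between predicted P(anomaly) and 0/1 labels."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs: list[float], labels: list[int],
                               n_bins: int = 10) -> float:
    """ECE with equal-width bins on [0, 1] (positive-class variant)."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p for p, _ in b) / len(b)   # average predicted probability
        pos_rate = sum(y for _, y in b) / len(b)    # observed frequency of anomalies
        ece += (len(b) / n) * abs(pos_rate - mean_conf)
    return ece


probs = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
print(round(brier_score(probs, labels), 4))                 # 0.0375
print(round(expected_calibration_error(probs, labels), 4))  # 0.175
```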

Source: arXiv · Generated at: 2026-06-03 00:00:00 UTC