arXiv

Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization

June 2, 2026 · Devansh Lalwani, Swapnil Bhat, Maulik Shah

Harmonizing Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization

Abstract:

Attention-based multiple instance learning (ABMIL) built on foundation-model features achieves near-optimal slide-level performance on Camelyon16 for weakly-supervised whole-slide image classification, yet the resulting attention maps remain an unreliable localization tool. In clinical settings, trust is compromised when a model classifies a slide correctly but fails to highlight the actual lesion. To bridge this gap, we introduce cellular sheaves, a mathematical framework that assigns finite-dimensional vector spaces to the vertices and edges of a graph and relates them through linear restriction maps. This structure offers a principled way to identify local inconsistencies in graph-structured data.
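To make the notion of a local inconsistency concrete, the sketch below computes the standard per-edge sheaf disagreement, $d_e = \lVert F_{u \to e}\, x_u - F_{v \to e}\, x_v \rVert^2$, where $x_u$ and $x_v$ are the signals on the two endpoints of edge $e$ and $F_{u \to e}$, $F_{v \to e}$ are the restriction maps into the edge space. The abstract does not give the paper's implementation, so every name, dimension, and number here is an illustrative assumption.

```python
# Illustrative sketch only: names, shapes, and numbers are assumptions,
# not taken from the paper.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def edge_disagreement(map_u, x_u, map_v, x_v):
    """Squared distance between two vertex signals after each one is
    mapped into the shared edge space by its own restriction map."""
    image_u = matvec(map_u, x_u)
    image_v = matvec(map_v, x_v)
    return sum((a - b) ** 2 for a, b in zip(image_u, image_v))

# Three patches (vertices) with 2-D features and 1-D edge spaces.
features = {"a": [1.0, 2.0], "b": [3.0, 0.0], "c": [0.0, 1.0]}

# For each edge (u, v): the restriction map for u, then the one for v.
restriction_maps = {
    ("a", "b"): ([[1.0, 1.0]], [[1.0, 0.0]]),
    ("a", "c"): ([[1.0, 1.0]], [[1.0, 0.0]]),
}

disagreement = {
    (u, v): edge_disagreement(map_u, features[u], map_v, features[v])
    for (u, v), (map_u, map_v) in restriction_maps.items()
}
```

In this toy graph the edge between `a` and `b` has zero disagreement because both endpoints map to the same point in the edge space, while the edge between `a` and `c` does not. Summing the per-edge values over a graph gives the quadratic form of the sheaf Laplacian.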

We apply cellular sheaves to weakly-supervised tumor localization in whole-slide images by integrating a sheaf disagreement field with ABMIL. The standard training objective, which promotes consistency among similar features, produces a disagreement field that reflects tissue texture rather than diagnostic relevance. We address this with attention-conditional consistency, which uses the classifier’s attention to determine which neighboring patches should agree.
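The abstract does not state the formula behind attention-conditional consistency, so the following is only one hypothetical reading of the idea, not the paper's objective. It assumes attention scores rescaled to [0, 1] and an invented gate, `1 - |a_u - a_v|`, so that neighbors with similar attention are pushed to agree while neighbors with very different attention are largely exempt. The function names and the example values are assumptions as well.

```python
# Hypothetical sketch: the gate and the loss below are assumptions made
# for illustration; the source does not specify the actual objective.

def attention_gate(a_u, a_v):
    """Gate in [0, 1]: close to 1 when two patches receive similar
    attention, close to 0 when their attention differs strongly.
    Assumes attention scores have been rescaled to [0, 1]."""
    return 1.0 - abs(a_u - a_v)

def gated_consistency_loss(edges, attention, disagreement):
    """Mean per-edge disagreement, weighted by the attention gate."""
    total = 0.0
    for (u, v), d in zip(edges, disagreement):
        total += attention_gate(attention[u], attention[v]) * d
    return total / len(edges)

# Toy example: patches 0 and 1 both receive high attention, patch 2 low.
attention = [0.9, 0.8, 0.1]
edges = [(0, 1), (1, 2)]
edge_disagreements = [2.0, 5.0]  # precomputed per-edge disagreement values

loss = gated_consistency_loss(edges, attention, edge_disagreements)
```

Under this reading, the disagreement on edge `(1, 2)` is down-weighted because its endpoints receive very different attention, so a boundary between attended and unattended tissue is allowed to stay inconsistent.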

Jointly training the classifier and the sheaf under this objective yields a disagreement field with a patch-level AUC of 0.940 on Camelyon16 and raises the attention head’s AUC from the ABMIL-only baseline of 0.717 to 0.953. In a two-stage ablation, where the classifier is frozen at its ABMIL-trained weights, the disagreement field reaches an AUC of only 0.727 and the attention AUC stays at 0.717. This confirms that the gains stem from the projector’s co-adaptation under both objectives rather than from the loss modification alone.

The trained model also transfers to annotated slides from Camelyon17 without retraining, reaching a disagreement-field (Δ) AUC of 0.932 ± 0.083 and an attention AUC of 0.955 ± 0.099. The result is a synchronized pair of attention and sheaf-disagreement maps that activate on the same diagnostic regions, giving clinicians two complementary interpretive views of every slide-level prediction.
