CRISP -- Clustering-Based Redundancy-Reduced Instance Sampling for Pathology Case Representation and Retrieval
Title: CRISP: A Clustering-Driven Method for Reducing Redundancy in Instance Sampling for Pathological Case Representation and Retrieval
Abstract:
Digital pathology repositories increasingly hold multiple whole-slide images (WSIs) per case. Together, these slides capture spatially distinct tumor regions and expose the case's inherent morphological heterogeneity. Despite this richness, prevailing methods typically rely on a single slide chosen by a pathologist, discarding the evidence contained in the remaining WSIs. To date, no autonomous system capable of comprehensive multi-WSI case processing has been introduced. In this study, we propose an unsupervised framework for case-level analysis that synthesizes information from every available slide in a case. Rather than restricting analysis to one designated slide, our approach builds case-level representations by selectively extracting informative patches from across the WSIs.
We introduce CRISP (Clustering-Based Redundancy-Reduced Instance Sampling for Pathology), a two-stage methodology. First, it minimizes redundancy within individual WSIs, and then it employs clustering-based sampling to identify a compact yet representative collection of patches for the entire case. This resulting patch set effectively encapsulates case-level heterogeneity without the computational burden of processing gigapixel images in their entirety, while simultaneously functioning as a direct retrieval index.
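The two-stage idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the embedding model, the redundancy criterion, or the clustering algorithm. The sketch assumes that each patch already has an embedding vector, that within-slide redundancy is measured by cosine similarity against a hypothetical threshold, and that plain k-means stands in for the clustering-based sampling step. All function names and parameters are illustrative.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dedup_slide(embeddings, threshold=0.95):
    """Stage 1 (assumed criterion): greedily keep a patch only if it is
    not too similar to any patch already kept from the same slide."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; a stand-in for whatever clustering is used."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def select_case_patches(case, k=2, threshold=0.95):
    """Stage 2 (assumed): pool the de-duplicated patches of all slides,
    cluster them, and keep the patch nearest to each centroid.

    case: dict mapping slide id -> list of patch embedding vectors.
    Returns a list of (slide_id, patch_index) pairs.
    """
    pool = []
    for slide_id, embeddings in case.items():
        for i in dedup_slide(embeddings, threshold):
            pool.append((slide_id, i, embeddings[i]))
    k = min(k, len(pool))
    if k == 0:
        return []
    centroids = kmeans([entry[2] for entry in pool], k)
    chosen = []
    for centroid in centroids:
        slide_id, idx, _ = min(pool, key=lambda e: sq_dist(e[2], centroid))
        if (slide_id, idx) not in chosen:
            chosen.append((slide_id, idx))
    return chosen


# Toy case: two slides with 2-D "embeddings" (real ones would be much larger).
case = {
    "slide_a": [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]],
    "slide_b": [[0.0, 1.0], [1.0, 0.02]],
}
selected = select_case_patches(case, k=2)
```

In this toy case, the second patch of `slide_a` is dropped in stage 1 as a near-duplicate of the first, and stage 2 keeps one patch per cluster across both slides. The embeddings of the selected patches could then be stored as the case's retrieval entry, although how CRISP compares two such patch sets is not stated in the abstract.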
We evaluated CRISP using two breast cancer datasets from Mayo Clinic, focusing on diagnosis and treatment planning. Our results show that CRISP consistently performs on par with or better than the current standard practice, which combines model-based and pathologist-driven slide selection for patient and case search and retrieval. By automating the processing of cases and removing the subjectivity associated with WSI selection, CRISP offers the potential to unlock clinically significant information distributed across multiple WSIs that is currently ignored.
Source: arXiv
Generated at: 2026-06-02 00:00:00 UTC





