
Diagnosing medical conditions from histopathology data requires a thorough analysis across the various resolutions of Whole Slide Images (WSI). However, existing generative methods fail to consistently represent the hierarchical structure of WSIs due to a focus on high-fidelity patches. To tackle this, we propose Ultra-Resolution Cascaded Diffusion Models (URCDMs) which are capable of synthesising entire histopathology images at high resolutions whilst authentically capturing the details of both the underlying anatomy and pathology at all magnification levels. We evaluate our method on three separate datasets, consisting of brain, breast and kidney tissue, and surpass existing state-of-the-art multi-resolution models. Furthermore, an expert evaluation study was conducted, demonstrating that URCDMs consistently generate outputs across various resolutions that trained evaluators cannot distinguish from real images. All code and additional examples can be found on GitHub.

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0770_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0770_supp.pdf

https://www.nejm.org/doi/10.1056/NEJMp1607591 https://www.cancer.gov/ccg/research/genome-sequencing/tcga


Review #1

    This paper presents a novel method utilizing Ultra-Resolution Cascaded Diffusion Models (URCDMs) to generate high-quality, realistic histopathology images at the Whole Slide Imaging (WSI) scale, which is a pioneering achievement in the field. The approach effectively captures intricate details at various magnifications and facilitates long-range contextual understanding, overcoming the memory limitations observed in attention-based models. Importantly, it accomplishes this with significantly reduced computational resources, enabling efficient image generation, particularly in data-intensive WSI learning scenarios.

    1. The paper conducts comprehensive experiments on three diverse datasets.
    2. Addressing the topic of generating WSIs holds clinical significance.

    1. Lack of experiments utilizing the proposed methods as an augmentation technique to enhance patch classification and WSI classification.
    2. The computational requirements of the paper are significant, but there is a lack of detailed analysis on this aspect.
    3. The technique described in the paper is not depicted clearly. The meaning of the blue, red, and green boxes in Fig. 1 remains unclear.
    4. Code availability would greatly aid in understanding the proposed method.
    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

    see weakness

    Weak Reject — could be rejected, dependent on rebuttal (3)

    Although this task is interesting and significant, the technique is poorly introduced, and some important experiments are missing.

    Confident but not absolutely certain (3)

    Weak Accept — could be accepted, dependent on rebuttal (4)

    The authors have addressed my first question regarding the clarity of the method description and my second question about computation cost, promising to include the computation cost analysis in the revised manuscript. However, they did not respond to my question about the downstream task. Therefore, I change my rating to weak accept.

Review #2

    The paper introduces an innovative method called Ultra-Resolution Cascaded Diffusion Models (URCDM) to generate high-fidelity, photorealistic histopathology images across multiple magnifications of Whole Slide Images (WSI). This method addresses the challenge of representing the hierarchical structure of WSIs by synthesizing entire images, capturing detailed anatomical and pathological features. Evaluated on three distinct datasets (brain, breast, kidney tissues), URCDMs have been shown to surpass existing state-of-the-art models and produce images indistinguishable from real ones by trained evaluators.

    1. The paper employs a three-stage cascade structure to capture detailed features at multiple magnifications effectively. This novel approach allows for a fine-grained synthesis of images, enhancing the resolution and detail fidelity at each subsequent stage.
    2. Similar to cascaded super-resolution methods, each stage of the image generation process in this paper is conditioned on the output from the previous stage, ensuring continuity and consistency with the high-level structure of the original image. This method significantly enhances the coherence and realism of the synthesized histopathology images, maintaining important contextual relationships.

    1. The paper lacks a detailed and clear explanation of the URCDM image generation process, particularly the mechanics of transitioning between different stages of the model. This omission hampers the reader’s understanding of the operational framework and theoretical underpinnings of the proposed method.
    2. The presentation of experimental results is limited to a few examples, which does not sufficiently demonstrate the robustness or effectiveness of the proposed methods across varying conditions or datasets.
    3. The authenticity and applicability of the synthesized images have only been validated on one of the three datasets used in the study. This narrow scope of validation raises concerns about the generalizability of the results across different types of histopathology data.

    Very Good

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

    It is necessary to provide a detailed description of the overall framework of the proposed Ultra-Resolution Cascaded Diffusion Models. For example, how the 1024x1024 image generated in the first stage is upsampled to generate the 6400x6400 image in the second stage is not explained in detail.

    (1) The author needs to provide a detailed description of how to use the image patches from the previous stage as conditions to ensure that the generated patches are consistent with the high-level structure of the image. (2) It is necessary for the authors to explain how to ensure the authenticity of the generated pathological images in order to apply them to downstream tasks such as classification, segmentation, and recognition.

    Weak Accept — could be accepted, dependent on rebuttal (4)

    Utilizing diffusion models to generate pathological images is quite common; however, generating whole slide images using these models is a relatively rare endeavor. Therefore, this work is both intriguing and insightful.

    Very confident (4)

    Weak Accept — could be accepted, dependent on rebuttal (4)

    My decision remains unchanged; the rebuttal lacks new perspectives or results that would warrant an improved score.

Review #3

    The authors introduced a new method for generating high-quality synthetic whole slide images (WSI) in histopathology using cascaded diffusion models. The effectiveness of the method was demonstrated through a comprehensive analysis and evaluation by experts.

    1. They contributed to a new research direction for WSI in pathology, which had previously focused primarily on high-quality patch-by-patch studies.
    2. Their expert evaluation provides a balanced view, highlighting both the strengths and weaknesses of their approach.

    1. Question about the need to synthesize high-resolution WSIs: Given that current pathology analysis algorithms mainly rely on a patch-by-patch approach, I have a concern about the need to generate high-resolution whole slide images (WSIs) that differ from studies such as [1].
    2. Validity of expert assessment: If synthetic WSIs are of such a quality that their authenticity can be easily judged by experts familiar with certain shortcuts, it is questionable whether it is a meaningful experiment to assess their authenticity.
    [1] Aversa, Marco, et al. “Diffinfinite: Large mask-image synthesis via parallel random patch diffusion in histopathology.” Advances in Neural Information Processing Systems 36 (2024).

    Very Good

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

    Performing downstream analysis using real and synthetic WSIs seems like a good way to validate the usefulness of synthetic WSIs.

    Weak Accept — could be accepted, dependent on rebuttal (4)

    The experimental results and discussion of the results were a major factor in the overall score.

    Confident but not absolutely certain (3)

    Weak Accept — could be accepted, dependent on rebuttal (4)

    This work pioneers WSI generation, opening interesting research directions for the domain.

We thank the reviewers for their thoughtful feedback and time. The reviewers agree on the pioneering methodological novelty and commend the new research direction, clinical significance, and thorough evaluation.

(R1, R4) Clarity of method description: We use a cascading diffusion model approach. (1) The first of three CDMs generates a low-resolution WSI (1024x1024 pixels). (2) The results from the first stage are enhanced by using overlapping patches of the generated low-resolution image as conditioning for the second CDM. This second model uses the spatial context provided by the first to generate higher-resolution images (also 1024x1024 pixels) for each patch’s center. These patches are then stitched together, taking the overlaps into account, to form a medium-resolution image of 6400x6400 pixels. The process is repeated with the third CDM to achieve a final, high-resolution synthetic WSI of 41,344x41,344 pixels. This method allows each stage of magnification to build upon the last, refining details and expanding the image size. Inpainting ensures seamless integration of patches, avoiding artifacts and ensuring that synthetic images are useful for both computational analysis and practical clinical applications. As explained in the caption of Figure 1, the blue box is the conditioning patch, the green box signifies the center patch that is generated, and the red box shows the output of size 1024x1024 for each image. We will use coloured font to highlight this in the caption.
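
The stitching step described above can be illustrated with a small sketch. This is not the authors' code. The function names `tile_origins` and `stitch`, the `generate` callback standing in for a diffusion model, and the stride of 896 pixels (an overlap of 128) are all assumptions made for illustration; the rebuttal does not state the overlap. The stride was picked only because 1024-pixel tiles placed every 896 pixels happen to cover both 6400 and 41,344 pixels exactly. Overlaps are blended here by plain averaging, which is a simple stand-in for the inpainting the authors describe.

```python
from typing import Callable

import numpy as np


def tile_origins(canvas: int, tile: int, stride: int) -> list[int]:
    """Top-left offsets of tiles of width `tile` that cover [0, canvas).

    Tiles are placed every `stride` pixels; if the last regular tile would
    not reach the canvas edge, one extra tile is added flush with the edge.
    """
    if tile > canvas or stride <= 0:
        raise ValueError("need 0 < stride and tile <= canvas")
    origins = list(range(0, canvas - tile + 1, stride))
    if origins[-1] != canvas - tile:
        origins.append(canvas - tile)
    return origins


def stitch(
    canvas: int,
    tile: int,
    stride: int,
    generate: Callable[[int, int, int], np.ndarray],
) -> np.ndarray:
    """Assemble a canvas x canvas image from overlapping tile x tile outputs.

    `generate(y, x, tile)` is a hypothetical placeholder for a model call that
    returns the (tile, tile) patch whose top-left corner is at (y, x).
    Overlapping regions are averaged (assumption, not the paper's inpainting).
    """
    acc = np.zeros((canvas, canvas), dtype=np.float64)
    weight = np.zeros((canvas, canvas), dtype=np.float64)
    for y in tile_origins(canvas, tile, stride):
        for x in tile_origins(canvas, tile, stride):
            acc[y:y + tile, x:x + tile] += generate(y, x, tile)
            weight[y:y + tile, x:x + tile] += 1.0
    # Every pixel is covered by at least one tile, so weight > 0 everywhere.
    return acc / weight
```

Under these assumed settings, a 6400x6400 stage would need 7 tiles per side and a 41,344x41,344 stage 46 tiles per side, which gives a rough sense of how the number of model calls grows between stages.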

(R4) downstream experiments/computational costs: Our method is particularly useful for niche domains where limited or no public data is available; kidney pathology is therefore a prominent example in our work. We also integrate public datasets a) to show that our method scales to other domains and b) to foster reproducibility. To the best of our knowledge, only very few works focus on large-scale dependencies, and we are the first to synthesize at WSI scale. Using synthetic patch data as augmentation has been evaluated in the literature before. As stated in the introduction, our main aim is to make such data publicly available through synthesis and to enhance other data, e.g., for bias mitigation. This is essential when downstream tasks depend on WSI-level assessment, such as structural assessment in the kidney, in contrast to, e.g., localized cell anomalies in cancer. We will clarify computational requirements in the camera-ready version. Note that synthesis only needs to be done once, e.g., before a synthetic dataset is shared.

(R3, R4) expert validation: To the best of our knowledge, our work is the first to show synthetic data that is essentially indistinguishable from real data for experts. At present, we only have access to kidney pathologists and will expand this evaluation in the future, since human expert user studies need to be carefully balanced with the workload of rare experts. Note that the shortcut mechanism (R3) is extremely rare and cannot be used to reliably identify synthetic data at scale. We will clarify this in the paper.

(R3, R4) usefulness of synthetic WSIs: Many patch-based pathology algorithms rely on “bags of patches” sourced from complete WSIs; therefore, synthesizing the entire WSI is crucial for providing realistic and structurally accurate training data. Furthermore, the generation of high-resolution WSIs mirrors clinical practice and can also be used for human training on top of data sharing, augmentation, and bias mitigation options as discussed above. We will discuss differences of large-mask synthesis (Aversa et al.) vs. whole WSI generation in the paper.

(R1) only a few examples: we evaluate on three different datasets and the supplement shows several visual examples. We will add more visual examples to the supplement in the final version.

(R3, R4) We will publish our code together with a synthetic dataset from our kidney data with the camera-ready paper as a public GitHub repository.


Meta-review #1

Meta-review #2

    Based on the reviews and author feedback, I recommend accepting this paper. The reviewers collectively appreciate the methodological novelty of the Ultra-Resolution Cascaded Diffusion Models (URCDM) which generate high-fidelity, photorealistic histopathology images across multiple magnifications. While there are some concerns about the clarity of the method and limited experimental demonstrations, these issues are not severe enough to outweigh the benefits.

