Abstract

In the field of computational pathology, deep learning algorithms have made significant progress in tasks such as nuclei segmentation and classification. However, the potential of these advanced methods is limited by the lack of available labeled data. Although image synthesis via recent generative models has been actively explored to address this challenge, existing works have barely addressed label augmentation and are mostly limited to single-class and unconditional label generation. In this paper, we introduce a novel two-stage framework for multi-class nuclei data augmentation using text-conditional diffusion models. In the first stage, we innovate nuclei label synthesis by generating multi-class semantic labels and corresponding instance maps through a joint diffusion model conditioned by text prompts that specify the label structure information. In the second stage, we utilize a semantic and text-conditional latent diffusion model to efficiently generate high-quality pathology images that align with the generated nuclei label images. We demonstrate the effectiveness of our method on large and diverse pathology nuclei datasets, with evaluations including qualitative and quantitative analyses, as well as assessments of downstream tasks.
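
A rough picture of the two-stage pipeline described above, written as a minimal Python sketch. The class interfaces, method names, and prompt wording are hypothetical placeholders for illustration only; they are not taken from the released code linked below.

# Hypothetical sketch of the two-stage augmentation pipeline.
# `label_diffusion` and `image_ldm` stand in for the stage-1 joint diffusion
# model and the stage-2 latent diffusion model; their interfaces are
# assumptions, not the authors' API.
def augment_dataset(label_diffusion, image_ldm, prompts, samples_per_prompt=4):
    synthetic = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            # Stage 1: text-conditioned synthesis of a multi-class semantic
            # label together with its instance map.
            semantic_label, instance_map = label_diffusion.sample(prompt)
            # Stage 2: semantic- and text-conditional latent diffusion renders
            # a pathology image aligned with the generated label.
            image = image_ldm.sample(semantic_label, prompt)
            synthetic.append((semantic_label, instance_map, image))
    return synthetic

# Example prompt (hypothetical wording): "colon tissue, mostly epithelial
# nuclei with a small fraction of lymphocytes".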

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0251_paper.pdf

SharedIt Link: https://rdcu.be/dY6ic

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72083-3_4

Supplementary Material: N/A

Link to the Code Repository

https://github.com/hvcl/ConNucDA

Link to the Dataset(s)

Lizard: https://www.kaggle.com/datasets/aadimator/lizard-dataset
PanNuke: https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke
EndoNuke: https://endonuke.ispras.ru/

More information about the data splits and preprocessing used in the experiments can be found in the code repository.

BibTex

@InProceedings{Oh_Controllable_MICCAI2024,
        author = { Oh, Hyun-Jic and Jeong, Won-Ki},
        title = { { Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        pages = {36--46}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a text-guided, diffusion-based model to synthesize multi-class pathology images. The generated image-mask pairs can be used for nuclei segmentation. The model consists of two stages. The first is a label synthesis stage that generates semantic and instance labels from text prompts. The text guidance can introduce additional information such as tissue type, class type, and class proportion, which helps address class-imbalance issues. The second is an image synthesis stage that creates the corresponding image for the generated labels. The text encoder and diffusion model used for image generation are pre-trained on pathology images. Several visual and quantitative results validate the effectiveness of this model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed text-guided multi-class generation approach represents a novel contribution to the field of pathology image synthesis. The pipeline appears to be effective in generating diverse multi-class pathology images.
    2. The use of guidance that does not rely solely on pixel-wise correspondence can be advantageous for handling out-of-distribution image samples. This approach helps alleviate the challenges associated with data imbalance, contributing to improved performance and robustness of the model.
    3. The proposed method demonstrates higher generation efficiency compared to previous methods, thanks to the utilization of latent diffusion models.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed method relies heavily on references to prior work, which limits the self-containment of this paper. Additionally, some key implementation details are not clearly explained, such as how C_pr is used in the backward diffusion process during label synthesis, or how the scalable component is incorporated into the trained diffusion model.

    2. The improvement in efficiency is largely attributable to the adoption of LDMs (Latent Diffusion Models) in this study. However, the main contribution of this paper, text-guided multi-label generation, may even perform worse than the compared methods.

    3. In the field of multi-class image generation, there are more recent methods in both pathology image and natural image domains that should be considered for experimental comparison. For example: [1] Oh, Hyun-Jic, and Won-Ki Jeong. “Diffmix: Diffusion model-based data synthesis for nuclei segmentation and classification in imbalanced pathology image datasets.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023. [2] Tang, Hao, et al. “Edge guided gans with multi-scale contrastive learning for semantic image synthesis.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).

    4. The segmentation results of the Hover-net baseline on the PanNuke and Lizard datasets in this paper show a significant decrease compared to the results reported in the existing literature: [1] Doan, Tan NN, et al. “SONNET: A self-guided ordinal regression neural network for segmentation and classification of nuclei in large-scale multi-tissue histology images.” IEEE Journal of Biomedical and Health Informatics 26.7 (2022): 3218-3228. [2] Lou, Wei, et al. “Structure embedded nucleus classification for histopathology images.” IEEE Transactions on Medical Imaging (2024). [3] Wang, Huadeng, et al. “Multi-task generative adversarial learning for nuclei segmentation with dual attention and recurrent convolution.” Biomedical Signal Processing and Control 75 (2022): 103558. [4] Lou, Wei, et al. “Cell Graph Transformer for Nuclei Classification.” arXiv preprint arXiv:2402.12946 (2024). What are your implementation and data splits for these datasets? The evaluation of data augmentation effects can be greatly influenced by differences in the baseline implementation's performance.

    5. How is the efficiency calculated for the compared methods and the proposed method? Which parts/stages are included in the generation time?

    Minor:

    1. The formatting of symbols is not unified; for example, in Eq. (1) and Eq. (2), the symbol x_t appears in different formats.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    There are several important points that need clarification by the authors. Firstly, the comparison with methods like Diffmix [1] should be elaborated further. Secondly, the discrepancy between the baseline results and existing methods needs to be addressed. In addition to these concerns, since the majority of the utilized datasets focus on cell nucleus classification, the proposed method could demonstrate greater advantages in multi-class classification tasks. It would be beneficial for the authors to validate the effectiveness of their approach against existing cell nucleus classification methods such as Hover-net and SONNET. [1] Oh, Hyun-Jic, and Won-Ki Jeong. “Diffmix: Diffusion model-based data synthesis for nuclei segmentation and classification in imbalanced pathology image datasets.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023. [2] Doan, Tan NN, et al. “SONNET: A self-guided ordinal regression neural network for segmentation and classification of nuclei in large-scale multi-tissue histology images.” IEEE Journal of Biomedical and Health Informatics 26.7 (2022): 3218-3228. In addition, it is important for the authors to provide detailed descriptions of the key implementation choices in order to ensure clarity in the methodology. For instance, it should be explained how the text features are incorporated into the diffusion process during label synthesis, and how the text features and labels are combined with the trainable layers and the frozen diffusion model during the image generation process. Providing such information will aid readers in understanding and reproducing this work effectively.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lack of some comparisons; lack of detailed explanation of the proposed method; concerns about the performance of the baseline models on two datasets.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision
    1. Lack of clarification and details about the framework.

    2. The data splits and experimental settings are incorrect, which undermines the validation of the proposed data augmentation method’s effectiveness. In reality, since the generative model is trained on segmentation labels from the training set, the generated images have already learned from the distribution of a large number of annotations in the training set. However, the baseline segmentation methods being compared do not have access to these training-set labels. This comparison is therefore very unfair.

    3. The generation time is not reported; the number of sampling steps alone cannot serve as a measure of time cost. The data augmentation method can be very time-consuming, which limits its practical applicability.



Review #2

  • Please describe the contribution of the paper

    1. Proposing a novel two-stage framework for multi-class pathology nuclei data augmentation.
    2. Introducing text-conditioned label synthesis using a joint diffusion model to control spatial layout and class distributions.
    3. Tailoring pre-trained latent diffusion models for efficient, high-quality image synthesis conditioned on generated labels and text prompts.
    4. Demonstrating effectiveness through comprehensive evaluations across diverse datasets.
    5. Improving downstream nuclei segmentation and classification performance with the synthesized data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The novel formulation of text-conditioned diffusion models for controllable multi-class label synthesis, combined with the tailoring of latent diffusion models for efficient pathology image generation, is a key strength enabling high-quality and diverse data augmentation. The comprehensive evaluations and comparisons further highlight the method’s effectiveness and clinical feasibility.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Could you please clarify which specific portions of the Lizard, PanNuke, and EndoNuke datasets are allocated for training, validation, and testing purposes? To my understanding, the reported baseline F1 score for Hover-net on the PanNuke dataset is 0.8, as per reference [1], not 0.763 as stated in your manuscript. Could you please explain this discrepancy?

    [1] Table V in “PanNuke Dataset Extension, Insights and Baselines”

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Provide more details on experimental setup.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend this paper for acceptance based on its novel formulation of text-conditioned diffusion models for controllable multi-class nuclei label synthesis combined with tailored latent diffusion models for efficient pathology image generation. The method’s effectiveness is thoroughly demonstrated through comprehensive evaluations on diverse datasets and downstream tasks. The ability to generate high-quality, diverse, and controllable synthetic pathology data has significant potential for improving computational models for disease diagnosis and prognosis, making this a valuable contribution to the field of computational pathology and medical image synthesis.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a multi-class nuclei data generation algorithm that uses text as conditions and a diffusion model as the generator. By altering the text descriptions, the required cell annotations and corresponding images can be generated. The quality of the generated images is exceptionally high, with cell annotations accurately corresponding to the cells in the images. Additionally, this method achieves remarkable results on both H&E and IHC types of pathological images, making it a highly valuable contribution to the field.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. Utilizes textual descriptions to generate matching image and cell mask data, leveraging the descriptions as conditions.
    2. The generated images of various stains are realistic, and the generation speed significantly surpasses existing methods.
    3. Comprehensive experimental results demonstrate the effectiveness of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The writing logic in the Introduction section needs improvement.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Each paragraph should follow a single line of reasoning; for example, the first paragraph should only explain the significance of data generation, without adding descriptions of GANs, while the second paragraph should first summarize generative models and then detail the shortcomings of GANs and diffusion models, thereby introducing the authors’ work.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work realizes the generation of matched, high-quality, multi-class cell segmentation training images from text descriptions and verifies its validity on H&E and IHC data, making it a pioneering work.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    Even without a rebuttal, I believe this work must be accepted. The preparation and annotation of pathological images are time-consuming and labor-intensive tasks. Although other researchers, like those in ‘Diffusion-based Data Augmentation for Nuclei Image Segmentation, MICCAI 2023,’ have also generated synthetic training data, those methods are uncontrollable and random. However, this work provides an interactive solution by controlling the generation of images and masks through text, which is of significant importance. Researchers can obtain training data cheaply and quickly as per the requirements of their tasks, such as controlling the quantity of certain types of cells to address sample imbalance issues. This is especially meaningful for training data related to the recognition of rare pathological diseases. Therefore, despite any lack of algorithmic innovation or other deficiencies, I believe this work should be accepted. I hope that the promise of open-sourcing the code mentioned by the authors is fulfilled.




Author Feedback

We appreciate the reviewers for acknowledging the novelty and contribution of our research. We provide detailed responses to feedback below.

[R1, R3] Clarification: We will revise the minor errors and clarify the logic for better understanding.

[R1, R3, R4] Source code & data: We plan to release the code with links to the data along with a detailed explanation of the algorithm.

[R3] Details about method (C_pr and trainable layers of pretrained LDM): Regarding the implementation of the backward process with C_pr for label synthesis, as described in Section 3.2, we followed the source code of Imagen [21]. For the scalable components of the pretrained LDM, we employed a trainable copy of the encoder structure from PathLDM [29], with zero convolution layers, similar to the approach used in ControlNet [32]. While it may not be feasible to include all the implementation details due to page limitations, we will try our best to clarify this in the revised text.
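
For readers unfamiliar with the ControlNet-style design mentioned in the response above, the following PyTorch sketch illustrates the general idea of a trainable encoder copy gated by a zero-initialized convolution. The module shapes and class names are assumptions for illustration only; they are not the authors' PathLDM/ControlNet implementation.

import copy
import torch.nn as nn

class ZeroConv2d(nn.Conv2d):
    # 1x1 convolution whose weight and bias start at zero.
    def __init__(self, channels):
        super().__init__(channels, channels, kernel_size=1)
        nn.init.zeros_(self.weight)
        nn.init.zeros_(self.bias)

class ControlBranch(nn.Module):
    # Trainable copy of a pretrained encoder block; its output is injected
    # into the frozen backbone through a zero convolution.
    def __init__(self, pretrained_block, cond_channels, feat_channels):
        super().__init__()
        self.copy = copy.deepcopy(pretrained_block)  # trained; original stays frozen
        self.cond_in = nn.Conv2d(cond_channels, feat_channels, kernel_size=3, padding=1)
        self.zero_out = ZeroConv2d(feat_channels)

    def forward(self, latent, label_condition):
        # Add the encoded nuclei-label condition to the latent features,
        # run the trainable copy, and gate the result with the zero conv.
        h = latent + self.cond_in(label_condition)
        return self.zero_out(self.copy(h))

Because the zero convolution outputs zeros at initialization, adding the branch's output to the frozen model's features leaves the pretrained LDM's behavior unchanged at the start of fine-tuning, which is the property the ControlNet-style approach relies on.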

[R3] Efficiency of LDM for the task: As shown in Table 3, our method achieved comparable or better performance compared to SDM. However, it is important to note that while SDM generates synthetic images from the pixel-level label constraints, our method can generate both synthetic labels and image data from simple text constraints. Considering our goal of improving model performance through data augmentation, achieving similar performance to SDM with our LDM-based image synthesis demonstrates a reasonable contribution in terms of reducing the time cost associated with data generation.

[R3] Downstream performance comparison (e.g., DiffMix and Tang et al.): Ensuring semantic alignment at the instance level is crucial when performing data augmentation on nuclei images. We compared our method to SDM because it has demonstrated effective performance and semantic alignment capabilities for nuclei image synthesis, as evidenced by NASDM. Although we did not implement specific downstream tasks like addressing class imbalance, which DiffMix focuses on, our results can provide insights similar to DiffMix since DiffMix is a framework based on the SDM. The work by Tang et al. is not directly comparable to our study because it does not guarantee instance-specific differentiation during image synthesis, which is a key requirement for nuclei segmentation tasks.

[R3] Generation time calculation: As shown in Tables 1 and 2, we calculated the synthesis time for generating the output label or image using the optimal number of sampling steps (T). In this experiment, SDM achieved its best performance with T=1000, while our fine-tuned LDM obtained optimal performance at T=100.

[R3, R4] Details about data splits and experimental settings for downstream tasks: To train the generative model, we divided each dataset into training and test sets with the following ratios: 80:20 for PanNuke, 85:15 for EndoNuke, and the NASDM split for the Lizard dataset. The difference between the baseline (HoverNet) performance in Table 3 and the performance reported in the literature for the Lizard and PanNuke datasets is due to the different training data sizes used for the downstream task; our baseline model was trained using the patches that were not used to train the generative model, to avoid data contamination, which corresponds to the test set used in the literature. Since this set is much smaller than the original training data in the literature, the performance was lower than the reported numbers. As long as all methods use the same data setting, the relative performance comparison between methods in Table 3 remains valid.
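
The contamination-free split described in this response can be pictured with a minimal Python sketch. The 80:20 ratio matches the PanNuke setting stated above; the patch IDs and variable names are hypothetical placeholders.

import random

random.seed(0)
patches = [f"patch_{i:05d}" for i in range(1000)]  # placeholder patch IDs
random.shuffle(patches)

split = int(0.8 * len(patches))
generator_train = patches[:split]     # trains the stage-1/stage-2 generative models
downstream_train = patches[split:]    # held out; trains the HoverNet baseline
# Synthetic label/image pairs produced by the generator are then added to
# downstream_train for the augmented setting, so the downstream model never
# sees the real patches (or labels) used to train the generator.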




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents a new two-stage pipeline for text-conditioned nuclei image synthesis. The reviewers and AC acknowledge the importance of this problem setting and the reasonable results. However, the reviewer pointed out value comments on experimental description and method explanation, which should be further clarified in the final version. Moreover, as promised in the rebuttal, please provide the code and data in the final version.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


