Abstract

Pathological image analysis is a crucial field of deep learning applications. However, training effective models demands large-scale annotated data, which is difficult to obtain because samples and annotations are scarce. Rapidly developing generative models have shown potential in recent studies to generate additional training samples. However, they struggle to generate diverse samples when training data is limited, leaving them unable to produce effective samples. Inspired by pathological transitions between different stages, we propose an adaptive depth-controlled diffusion (ADD) network for effective data augmentation. This novel approach is rooted in domain migration, where a hybrid attention strategy blends local and global attention priorities. Guided by feature measurement, the adaptive depth-controlled strategy steers the bidirectional diffusion, simulating pathological feature transitions while maintaining locational similarity. Based on a tiny training set (≤ 500 samples), ADD yields cross-domain progressive images with corresponding soft labels. Experiments on two datasets suggest significant improvements in generation diversity, and downstream classification tasks highlight the effectiveness of the generated progressive samples.
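The abstract describes a bidirectional diffusion whose depth controls how far a sample moves from one pathological stage toward another. A minimal NumPy sketch of that idea follows, assuming a DDIB-style bridge (the authors' rebuttal names DDIB [18] as the backbone) and assuming "depth" means the number of deterministic noising steps: the source image is encoded with a source-domain noise predictor for `depth` DDIM steps and then decoded with a target-domain predictor. The linear beta schedule, the function names, the toy predictors, and the linear depth-to-soft-label mapping are all hypothetical placeholders; the paper derives its soft labels from feature measuring, which is not reproduced here.

```python
import numpy as np

def alpha_bars(num_steps: int) -> np.ndarray:
    """Cumulative signal levels for a hypothetical linear beta schedule.

    Index 0 is the clean image (alpha_bar = 1); index t is noise level t.
    """
    betas = np.linspace(1e-4, 0.02, num_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update (eta = 0) between two noise levels."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def depth_controlled_translate(x_src, eps_src, eps_tgt, depth, num_steps=1000):
    """Encode with the source model for `depth` steps, then decode with the target model.

    depth = 0 returns the input unchanged; depth = num_steps is a full bridge.
    Returns the translated image and a hypothetical soft label in [0, 1].
    """
    if not 0 <= depth <= num_steps:
        raise ValueError("depth must lie in [0, num_steps]")
    a = alpha_bars(num_steps)
    x = np.asarray(x_src, dtype=float)
    # Forward (inversion) ODE with the source-domain noise predictor.
    for t in range(depth):
        x = ddim_step(x, eps_src(x, t), a[t], a[t + 1])
    # Reverse (sampling) ODE with the target-domain noise predictor.
    for t in range(depth, 0, -1):
        x = ddim_step(x, eps_tgt(x, t), a[t], a[t - 1])
    # Assumption: the soft label grows linearly with the controlled depth.
    soft_label = depth / num_steps
    return x, soft_label
```

With identical source and target predictors the encode/decode pair is (approximately) an identity, so any change in the output comes from the domain gap between the two models; stopping at an intermediate depth is what yields "in-between" samples in this sketch.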

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2051_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2051_supp.pdf

Link to the Code Repository

https://github.com/Rowerliu/ADD

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{Liu_Generating_MICCAI2024,
        author = { Liu, Zeyu and Zhang, Tianyi and He, Yufang and Zhang, Guanglei},
        title = { { Generating Progressive Images from Pathological Transitions via Diffusion Model } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes an adaptive depth-controlled diffusion (ADD) network to generate pathological progressive images for effective data augmentation. The main innovations of ADD include: 1) a pathological domain migration approach using a bidirectional diffusion process to bridge different pathological states, 2) a hybrid attention strategy to maintain local and global similarities during generation, and 3) an adaptive depth-control mechanism with feature measuring to simulate the progressive transitions and generate corresponding soft labels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Significant technical novelty; I think this method can be extended to more scenarios.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The clinical relevance of generating progressive images between different stages is unclear. For instance, it’s uncertain what a generated example with 0.5 negative and 0.5 positive denotes.
    2. Table 1 illustrates that generation diversity stems from the HAS module. The authors should explain this more clearly. On the other hand, while the paper claims to improve generation diversity by generating progressive images, Table 2 shows that ADD without HAS achieves only low generation diversity.
    3. The experiments are not fully convincing. The method's effectiveness is validated only on small datasets; more extensive datasets should be used. Furthermore, additional experiments should be designed to demonstrate its effectiveness for downstream tasks, such as testing different numbers of generated samples to assess their impact on classification performance.
    4. The diversity metric is not appropriate. LPIPS only reflects diversity in pixel space, not in semantic space. The authors could use UMAP, as in related work [1], to measure the diversity of the generated images.

    [1] Latent Diffusion Models with Image-Derived Annotations for Enhanced AI-Assisted Cancer Diagnosis in Histopathology

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Please release the source code upon acceptance of the submission.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See weaknesses 1, 2, and 4.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clinical significance is unclear, and the diversity metric is also not appropriate.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    Despite the authors’ detailed response, I still maintain my rating for the following reasons:

    The clinical significance of this paper remains concerning. Although the authors use progressive samples to illustrate the transition of pathological stages, no experiments have been conducted to substantiate this point.

    The chosen metric, LPIPS, is not suitable for measuring diversity. Two images with significant pixel-level diversity may have similar semantics, while two images with minimal pixel-level diversity may have entirely different semantics.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a method for generating images using a diffusion model that takes pathological transitions into account. The idea is based on a prior work called dual diffusion models for image-to-image translation. The proposed method improves on it by adding two components: 1) a hybrid attention strategy that focuses on global and local features to ensure consistency between source and target images, and 2) an adaptive depth-controlled strategy to progressively sample from different pathological stages. The method is evaluated in terms of the quality of the generated samples and by training a classifier with these samples as augmentation. The results show that the proposed method improves on existing methods across multiple datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper improves and adapts an interesting prior work called Dual diffusion models (DDM) to generate progressive images from pathological transitions.
    • Each component introduced in the paper is evaluated on multiple datasets to show their improvement.
    • Comparisons with the existing image-to-image translation and image generation methods convincingly show the potential of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • In Section 2.1, the paper briefly describes diffusion models and the method in [18]. It is not very clear whether the paper directly applies [18] at this stage or proposes any modification.

    • The Hybrid Attention Strategy (HAS) component also deserves more elaboration. Where is this attention block placed in the U-Net used in the diffusion model? What are used as Q, K, and V? How is the prioritization between global and local attention achieved? I think these details are important for understanding the method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The Hybrid Attention Strategy (HAS) is an important component of the proposed method; however, its description is quite convoluted. A clearer description of the method is necessary for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I wrote my detailed feedback in the weaknesses section about each weakness of the method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think this is quite a strong paper with sufficient experiments, and it deserves publication. Although I am quite positive, I would like to make sure that the method is described clearly and that I understood most of the details. Therefore, I suggest conditional acceptance and expect more elaboration of HAS in the rebuttal.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for the rebuttal. The authors' responses have addressed my concerns. Therefore, I suggest acceptance for this paper.



Review #3

  • Please describe the contribution of the paper

    The paper presents the Adaptive Depth-Controlled Diffusion (ADD) network, an innovative solution for generating progressive pathological images, which tackles the issue of scarce annotated data in pathological image analysis training. By incorporating a hybrid attention mechanism and adaptive depth control, the ADD network efficiently simulates transitions between pathological stages. This novel methodology offers potential benefits such as enhanced data augmentation, improved model training, and preservation of crucial locational similarities, thereby facilitating accurate pathological analysis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The ADD network presents an innovative methodology by integrating hybrid attention and adaptive depth control, enabling the generation of progressively varied samples representing different pathological stages. This advancement enriches the training dataset, mitigating overfitting without compromising data privacy or necessitating extensive manual annotations. Moreover, the network’s ability to generate images that transition between pathological stages enhances model generalization and diagnostic accuracy in medical image analysis. The paper demonstrates, through quantitative metrics, that the ADD network produces higher quality and more diverse images compared to existing methods, which is essential for training robust pathological analysis models.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The complexity and computational demand of the ADD network may limit its accessibility in certain research and clinical environments due to significant resource requirements. While the paper provides thorough validation, questions remain regarding the generalizability of the proposed method across diverse pathological conditions and datasets. Further clarification on how the ADD network’s performance might vary across different pathological contexts is needed to enhance confidence in its applicability beyond specific scenarios presented in the paper.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    To strengthen its impact, further exploration into the generalizability of the ADD network across diverse pathological conditions and datasets is recommended. Additionally, providing detailed information on the computational resources required by the ADD network would enhance its practical applicability in research and clinical settings. Furthermore, clarification on the operation and contribution of the hybrid attention mechanism and assessment of the ADD network’s robustness to input variability are suggested to enrich the paper’s technical depth and applicability. Moreover, it would be beneficial to delve deeper into how this mechanism operates and its specific contribution to the generation process. Providing more details on the design choices and architectural components related to attention mechanisms could aid in better understanding the network’s inner workings.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper offers a strong contribution to the field of medical image analysis by addressing critical issues related to data scarcity and model overfitting through an innovative generative approach. However, addressing concerns regarding computational demands, generalizability, and providing insights into the hybrid attention mechanism and robustness to input variability would further strengthen the paper’s contribution.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I maintain my previous decision of accepting the paper regardless of the rebuttal.




Author Feedback

We sincerely appreciate all reviewers for their dedicated comments. We are encouraged by the consensus on novelty (R1-5), paper quality (R1-5) and adequate evaluation (R1, R5). We are revising the manuscript accordingly and this rebuttal focuses on clarifying certain details.

To Reviewer #1: We deeply thank you for your recognition and constructive comments and will revise our manuscript accordingly.

To Reviewer #3: Thank you very much for your recognition of our method and your constructive comments on our manuscript.

R3.1 The constructive comment on clinical relevance is well taken. This paper aims to improve downstream performance through effective sample generation, specifically by generating progressive samples that illustrate the transition between pathological stages. According to senior pathologists, taking ROSE as an example, positive samples mutate from negative ones, with positive features appearing gradually during the transition. We therefore use the information change as the measurement (0.5 meaning roughly 50% support for a positive identification) for cells that undergo changes in nucleus-cytoplasm ratio or morphology. We agree on the importance of evaluating against actual progressing samples. Accordingly, we had included some ROSE samples, which further confirmed clinical relevance by showing that the generated images match the spatio-temporal evolutionary properties. We will add more details in the revision.

R3.2 We apologize for the confusing typo in “The numerical performance in Table 2 confirms the effectiveness of HAS (ADD vs U-BDP)”, which should read “The numerical performance in Table 1 confirms the effectiveness of HAS (ADD vs ADD(no-HAS))”. ADD with HAS performs better and is used in all other experiments. We will fix the typo and add more details.

R3.3 We appreciate the comment on dataset scale. A larger dataset would indeed help evaluate the further value of this work. Still, ADD is proposed to enhance DL methods in most pathological scenarios where sampling is difficult. Regarding the ablation suggestion, we note that experiments with multiple generated-sample sizes were already included in the original supplementary material. Following this valuable comment, we will improve this point and move it into the main text.

R3.4 We appreciate the suggestion to evaluate the method in semantic space with UMAP. We referred to the provided article and performed a UMAP comparison. In the UMAP analysis, ADD samples are more consistent with the real data in the distribution space. We will improve the description of the measurements with more details. However, our manuscript uses LPIPS because it is the most widely applied metric in few-shot generation tasks [15,22,23]. Low-level and repetitive information such as texture, shape, and structure can be comprehensively captured in feature space to measure diversity.
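The disagreement between Review #1 and the authors concerns how diversity is quantified. In few-shot generation work, diversity is often reported as an average pairwise LPIPS distance among generated samples. The sketch below shows only that averaging scheme, using plain Euclidean distance on hypothetical per-sample feature vectors as a stand-in; real LPIPS instead compares normalized activations from a pretrained network with learned per-channel weights, and a semantic check of the kind Review #1 suggests would embed samples with a domain encoder and inspect them with UMAP. The function name is hypothetical.

```python
import itertools

import numpy as np

def mean_pairwise_distance(features) -> float:
    """Average distance over all unordered pairs of samples.

    `features` is an (N, D) array of per-sample feature vectors (hypothetical
    stand-ins for deep features); Euclidean distance replaces LPIPS here.
    """
    feats = np.asarray(features, dtype=float)
    if feats.ndim != 2 or len(feats) < 2:
        raise ValueError("expected an (N, D) array with N >= 2")
    dists = [np.linalg.norm(feats[i] - feats[j])
             for i, j in itertools.combinations(range(len(feats)), 2)]
    return float(np.mean(dists))
```

A set of near-identical samples scores close to zero, which is the mode collapse such metrics are meant to expose; as Review #1 points out, a high score in a low-level feature space does not by itself guarantee semantic diversity.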

To Reviewer #5: We deeply appreciate your recognition and are improving the description of the method. Our main innovation lies in the depth-control strategy and the aim of generating progressive samples. The proposed HAS is nevertheless designed as an improvement on the DDIB [18] backbone. Two attention-based U-Nets are used for the diffusion and denoising processes, respectively, and we modify their attention modules with global and local priorities. Specifically, in each block of [18], the features are processed by two CNN blocks (C) and one attention block (A), in the order CCA or ACC. The A module does not change the shape of the feature X, while its MLP and attention calculation prioritize the features. In the diffusion process, we first apply the MLP (with a global view) to obtain the corresponding Q, K, and V, and then split them into multiple heads for attention. Conversely, in the denoising process, the features are first split into multiple local heads, and then Q, K, V and the attention are computed with a local MLP. We will revise the text and provide more details in the revision.
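The two attention orders described in this response can be made concrete with a small NumPy sketch. It assumes tokens of shape (n, c), uses a single linear map as the "MLP" projection, and omits the output projection, normalization, and residual connection a U-Net attention block would normally have; the function names are hypothetical and this is not the ADD code. `global_first_attention` projects all channels to Q, K, V before splitting heads (the diffusion branch as described), while `local_first_attention` splits channels into heads first and gives each head its own projection (the denoising branch).

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_attention(q, k, v):
    """Per-head attention; q, k, v have shape (heads, tokens, d_head)."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def global_first_attention(x, w_qkv, heads):
    """Diffusion-branch order: one projection over all c channels, then split heads.

    x: (n, c) tokens; w_qkv: (c, 3c), so every Q/K/V entry mixes all channels.
    """
    n, c = x.shape
    if c % heads:
        raise ValueError("channels must be divisible by heads")
    d = c // heads
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)  # each (n, c)

    def to_heads(t):
        return t.reshape(n, heads, d).transpose(1, 0, 2)  # (heads, n, d)

    out = scaled_dot_attention(to_heads(q), to_heads(k), to_heads(v))
    return out.transpose(1, 0, 2).reshape(n, c)  # same shape as x

def local_first_attention(x, w_local, heads):
    """Denoising-branch order: split channels into heads, then project each head.

    x: (n, c) tokens; w_local: (heads, d, 3d), one small projection per head.
    """
    n, c = x.shape
    if c % heads:
        raise ValueError("channels must be divisible by heads")
    d = c // heads
    xh = x.reshape(n, heads, d).transpose(1, 0, 2)  # (heads, n, d)
    qkv = np.einsum("hnd,hde->hne", xh, w_local)    # (heads, n, 3d)
    q, k, v = np.split(qkv, 3, axis=-1)             # each (heads, n, d)
    out = scaled_dot_attention(q, k, v)
    return out.transpose(1, 0, 2).reshape(n, c)
```

Both functions preserve the (n, c) shape, consistent with the statement that the A module does not change the feature shape. In this sketch the local-first variant equals the global-first one with a block-diagonal projection, so each head only sees its own channel slice; that is one possible reading of the "local priority".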




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents the Adaptive Depth-Controlled Diffusion (ADD) network, a novel approach for generating progressive pathological images that addresses the challenge of scarce annotated data in pathological image analysis. The ADD network incorporates a hybrid attention mechanism and adaptive depth control, enabling efficient simulation of transitions between pathological stages. The paper received two strong accepts and one reject. The main criticism seems to be the evaluation metric for diversity. During the rebuttal, the authors sufficiently addressed the main concerns raised by the reviewers, highlighting the effectiveness of the ADD network for downstream tasks, which is the main focus. I would also encourage the authors to take the constructive comments into account in future work to strengthen the paper's impact and applicability.



