Abstract

Recently, Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the image features and the diffusion model features poses a great challenge to prostate segmentation. In this paper, we propose CriDiff, a two-stage feature injecting framework with a Criss-cross Injection Strategy (CIS) and a Generative Pre-train (GP) approach for prostate segmentation. The CIS maximizes the use of multi-level features by efficiently harnessing the complementarity of high and low-level features. To effectively learn multi-level edge features and non-edge features, we propose two parallel conditioners in the CIS: the Boundary Enhance Conditioner (BEC) and the Core Enhance Conditioner (CEC), which discriminatively model the image edge regions and non-edge regions. Moreover, the GP approach eases the inconsistency between the image features and the diffusion model without adding parameters. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on four evaluation metrics.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0339_paper.pdf

SharedIt Link: https://rdcu.be/dZxdd

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72111-3_10

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0339_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Liu_CriDiff_MICCAI2024,
        author = { Liu, Tingwei and Zhang, Miao and Liu, Leiye and Zhong, Jialong and Wang, Shuyao and Piao, Yongri and Lu, Huchuan},
        title = { { CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {102--112}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposed a diffusion model-based method for prostate segmentation. The main contribution of the paper comes from the design of the individual boundary and non-boundary feature extraction (using the boundary enhance conditioner (BEC) and core enhance conditioner (CEC), respectively) to separately guide the denoising process of the diffusion model for prostate segmentation. Four public prostate datasets (three in MR and one in ultrasound modality) were used to evaluate the proposed methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of using decomposed boundary and non-boundary features to guide the diffusion model for image segmentation is novel. The general (or mainstream) paradigm of using diffusion models to do the image segmentation task is to condition the diffusion model on the raw image features extracted from the image. This work tried to condition the diffusion model on some refined features (i.e., the decomposed boundary and non-boundary features separately extracted from the raw image) to achieve better performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The technical novelty of the proposed generative pre-train (GP) approach is limited. The GP approach seems more like a training trick (well known and often used when training diffusion models) to accelerate convergence. It is hard to see any innovation behind this proposed design.

    • The motivation or significance of the proposed design of BEC and CEC is unclear or questionable. It is unclear why the BEC and CEC are necessary and important to the proposed method. If the BEC and CEC work as expected, why not directly use the extracted boundary and non-boundary features to infer the prostate segmentation mask? Why do we need to feed the extracted features into a diffusion model, which requires much more computational resources and time than a regular CNN like U-Net?

    • Many typos or expression issues are found. Please see my detailed comments for more details.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Page 1, Abstract, “… which discriminatively model the image edge regions and non-edge regions.”: Insert “respectively” at the end of this sentence.

    • Page 1, Section 1, “… they use the softmax in the cross-entropy loss overemphasizes the highest logit, leading to deterministic predictions.”: This statement is not true. Actually, the output of a CNN often suffers from over-smoothing due to its large receptive field.

    • Page 2, Section 1, “limiting early-stage accuracy in object localization or shape …”: “shape” -> “shaping.”

    • Page 2, Section 1, “Consequently, it is essential to intrudce an efficient method”: Typo “introduce.”

    • Page 3, Section 1, “(2) We introduced a Generative Pertrain (GP) method for …”: Typo “pre-train.”

    • Page 6, Table 1: Fix the citation issue of “SegDiff [?]” method.

    • Page 6, Section 3: There is no statistical analysis to demonstrate the significance of improvement margins.

    • Page 6, Section 3, “We conducted 25 ensemble runs with T=500.”: Applying ensemble strategy to the proposed method is unfair to other competing methods since they just have one trial.

    • Page 7, Section 3.2: Fix the citation issue of “SegDiff [?]” method.
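
For context on the ensemble setting raised in the comments above, a common way to combine several stochastic diffusion samples is to average the per-pixel probabilities and threshold the mean; the per-pixel variance then gives a rough uncertainty map. The sketch below is illustrative only: the function name `ensemble_masks`, the 0.5 threshold, and the toy inputs are assumptions, not details taken from the paper.

```python
from statistics import mean, pvariance


def ensemble_masks(samples, threshold=0.5):
    """Average per-pixel probabilities over several sampled masks.

    samples: list of 2-D lists with identical shape, values in [0, 1].
    Returns (binary_mask, variance_map). Hypothetical helper for illustration.
    """
    if not samples:
        raise ValueError("need at least one sample")
    rows, cols = len(samples[0]), len(samples[0][0])
    binary, variance = [], []
    for r in range(rows):
        bin_row, var_row = [], []
        for c in range(cols):
            values = [s[r][c] for s in samples]
            # Threshold the mean probability to get the final label.
            bin_row.append(1 if mean(values) >= threshold else 0)
            # Population variance across runs as a simple disagreement score.
            var_row.append(pvariance(values))
        binary.append(bin_row)
        variance.append(var_row)
    return binary, variance


# Three toy "runs" standing in for independently sampled segmentation masks.
runs = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 0.0]],
]
mask, var = ensemble_masks(runs)
```

Pixels on which all runs agree get zero variance, while pixels with disagreement get a positive value, which is the kind of signal the uncertainty discussion refers to.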

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Major flaws (such as limited technical novelty and unconvincing motivation of the proposed methodology) are found in this paper, which requires a substantial revision to fix. Considering the limited time and space for rebuttal, I recommend a rejection of this paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Thanks for the authors’ efforts in addressing my concerns and comments. Most of my concerns are addressed or clarified in the rebuttal except for the one regarding the motivation and significance of the proposed design of BEC and CEC. The authors’ answer to my question is still not quite convincing: If the BEC and CEC work well, why do we need a subsequent diffusion model to do the segmentation task? If the authors want to add uncertainty estimation to the model, test-time augmentation can do that much more cheaply than a diffusion model. The inference of the diffusion model is very expensive in terms of time, especially when it has to be run multiple times. After another round of review and considering other reviewers’ comments, I have decided to raise my score from 2 (Reject) to 3 (Weak reject).



Review #2

  • Please describe the contribution of the paper

    This paper introduces a method based on Diffusion Probabilistic Models for prostate segmentation. It aims to overcome the limitations of previous methods, which fail to allow diffusion models to effectively learn both edge and non-edge features, by introducing a Criss-cross Injection Strategy (CIS). Additionally, the paper employs a generative pre-training approach to establish a robust backbone for the model. The effectiveness of the proposed method is evaluated using four distinct datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method shows superior segmentation performance across four datasets than other competing methods.
    2. This paper is well-written and well-organized.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed method lacks novelty. The strategy of treating the boundary and core of the target with different supervision signals is commonly employed in this field. Furthermore, the use of a pretext task within a diffusion framework has been extensively explored in prior research.
    2. The ablation studies suggest that model P* yields stronger results compared to model P. However, the authors have not provided a reason for choosing P* over P.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Refer to weaknesses.
    2. In Figure 2, could you please elaborate on how the use of reversed module connection structures for the boundary and core conditioners enhances the model’s performance?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Given the limited novelty and the unconvincing evidence of the proposed module’s effectiveness, “Weak Reject” is suggested.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors addressed most of my concerns. So I raised the score to “5. Accept”.



Review #3

  • Please describe the contribution of the paper

    The authors proposed a Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    overall a good paper, novel method, easy to read and follow, valid experiments

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    authors could provide more motivation of their method design

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    overall a good paper, novel method, easy to read and follow, valid experiments

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    overall a good paper, novel method, easy to read and follow, valid experiments

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper
    • A framework named CriDiff is proposed based on diffusion models, targeting the semantic segmentation of the prostate in MRI and ultrasound images on publicly available datasets.
    • The proposed framework is equipped with two novel conditioners which account for edge and non-edge (core) prostate features, namely: Boundary Enhance Conditioner (BEC) and Core Enhance Conditioner (CEC).
    • A cross-attention mechanism is applied to facilitate the fusion of boundary and core features before integration into the diffusion model.
    • The model is trained following a two stage procedure. First, the diffusion model undergoes unconditional training to generate synthetic prostate images. Second, the pre-trained model is fine-tuned in conjunction with the novel conditioner blocks and real prostate images to generate the corresponding semantic segmentation masks.
    • When using the synthetic generated images with real samples, the authors show that the semantic segmentation results improve.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors demonstrated the advantages of incorporating prior information into the model through the integration of boundary and core blocks as supplementary conditioning information during the training of diffusion models for semantic segmentation of prostates in MRIs and ultrasounds.
    • The authors designed two novel conditioners Boundary Enhance Conditioner (BEC) and Core Enhance Conditioner (CEC) to provide complementary information about the boundary and core information of prostate, supplementing the conditioning information based on real prostate images.
    • Quantitative results and ablation studies show that using the generated images as an additional source of training samples enhances the performance of prostate segmentation models.
    • Qualitative results depicting both the generated segmentation masks and prostate images show the benefits of the model in capturing anatomical information of the prostate.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Despite the statistically validated results, it would have been interesting to have the results validated by clinicians. This could provide valuable insights into the practical applicability of the proposed methodology.
    • While the reported metrics (Dice, IoU, HSD, ASD) are indicative of improvements in segmentation results using the generated synthetic samples, reporting additional metrics focusing on synthetic data diversity or fidelity, such as FID or KID, would offer a more comprehensive evaluation.
    • It seems that some equations need to be reviewed: Section 2.1 appears to lack the representation of p_theta(I_{T}). Additionally, Equation #5 does not incorporate the Dice loss.
    • The captions accompanying the Figures appear to lack completeness, with Figure 1’s caption lacking sufficient descriptive detail. Similar concerns are raised regarding the caption for Figure 2.
    • Figure 1 can be improved to make it even more clear that the prostate images are used as conditioning information within the Criss-Cross Injection Strategy.
    • The notation of equations 2, 3, 4, and 6 can be improved to facilitate better comprehension and interpretation of the mathematical expressions.
    • Finally, an exploration of potential future avenues for addressing current limitations would enrich the discussion in Section 4.
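
As a reference point for the overlap metrics named in the list above, the following is a minimal sketch of how Dice and IoU are usually computed for a pair of binary masks. The function name `dice_and_iou` and the convention that two empty masks score 1.0 are assumptions made for this example; the paper's own evaluation code is not available here.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two binary masks given as 2-D lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for a, b in zip(p, t) if a and b)
    p_sum = sum(1 for a in p if a)
    t_sum = sum(1 for b in t if b)
    union = p_sum + t_sum - inter
    # Assumed convention: two empty masks count as a perfect match.
    dice = 1.0 if p_sum + t_sum == 0 else 2.0 * inter / (p_sum + t_sum)
    iou = 1.0 if union == 0 else inter / union
    return dice, iou


# Toy example: 3 predicted pixels, 3 target pixels, 2 of them shared.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

Both metrics only measure region overlap, which is why the review also asks for distribution-level measures such as FID or KID for the generated images.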
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • The method and training setting are well described. The release of the paper’s source code could be beneficial to the research community.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The paper shows interesting results regarding the benefits of using synthetic generated samples for improving segmentation tasks. Nevertheless, an assessment of the fidelity and heterogeneity of the generated samples could be further enhanced by incorporating additional metrics, such as FID and KID.
    • Enhancement of the captions accompanying the Figures is recommended to provide a more comprehensive understanding of the architectural intricacies depicted.
    • Refinements to the notation of the equations can facilitate a clearer comprehension of the proposed modules.
    • The notation of the gradient I’ can be improved.
    • Future work and current limitations are not mentioned in the Conclusion.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The method holds a certain level of technical contribution and its validity is thoroughly established through different experimental results on different prostate datasets.
    • The presented results are convincing. In addition, the model relies on novel generative networks, such as diffusion-based approaches which are on par with the SOTA.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Thanks to the reviewers for their insightful comments toward improving our paper. We respond to all comments point by point.

To R1: About the novelty: The mentioned strategy with different supervision signals is not listed as our contribution. Our primary contribution lies in the Criss-cross Injection Strategy (CIS) with BEC and CEC, which enables the diffusion model to utilize multi-level features of the boundary and core areas. Regarding the pre-training method, we apologize for the unclear description. Unlike other pre-training methods, our method uses the results from the second stage as weighting coefficients for the GP loss, making the model focus on the areas that are more challenging in the second stage. This interactive pre-training not only enhances the fidelity of the generated images but also narrows the domain gap between the image conditioning network and the diffusion backbone. We will add more details about our GP. Choosing P* over P: There seems to be a misunderstanding. We choose the structure of P* over P. P is our baseline, which injects only prostate features. P* is stronger because it injects more features, namely prostate, core, and boundary features generated from a simple FPN. P* validates the effectiveness of boundary and core feature injection. Therefore, we design the more powerful BEC and CEC in place of the FPN ((5) in Tab. 2), which on average reduces ASD by 16%. The reversed structure of BEC and CEC: BEC and CEC both retain a top-down structure. BEC stacks more shallow convolutions, focusing on learning edge texture features. Conversely, CEC focuses on learning semantic features by using deeper convolutions.

To R3: More validation and evaluation: Thank you for your valuable comments. This work is supported by a local hospital, where a set of prostate images for validation is currently under ethical review. We also expect to extend our method to practical use. The FID of our GP on the four datasets is 4.25, 6.73, 5.21 and 8.63, respectively. As per your advice, we will add these metrics in the final version. About figures: As per your advice, we will provide more details for Fig. 1 and Fig. 2 and revise Fig. 1 to make it clearer. Future work and limitations: As per your advice, we will move the future work and limitations from the supplementary material to the conclusion in the final version. About the minor issues: We will carefully correct them.

To R4: Thank you for your insightful comments. In medical diagnosis, aggregating interpretations from multiple experts reduces the frequency of misdiagnosis. Therefore, we propose a multi-guided diffusion model that includes the guidance of boundary, core and generative pre-training to simulate multi-expert predictions. Our model not only enhances multi-level feature utilization but also narrows the domain gap between the image conditioning network and the diffusion backbone.

To R5: The motivation of GP: Please refer to the response (novelty) to R1. The motivation of BEC and CEC: Most diffusion-based segmentation methods overlook boundary information and treat all regions equally. Directly predicting boundary maps from hard labels ignores the context of surrounding pixels. In our method, we decouple the original label into soft boundary labels and core labels. This allows BEC and CEC to learn contextual coherence between boundary and core regions, helping the diffusion model better capture the shape and location of target objects. Injection into the diffusion model: The outputs of independent BEC and CEC are deterministic and tend to be overly confident even on wrong predictions. This does not fit the uncertainty estimates required by clinical diagnosis. Feeding in the extracted features incorporates multi-level feature utilization into the diffusion model. Unfair experiment setting: We would like to clarify that all diffusion-based methods are run with 25 ensemble runs, as described in Section 3.2. About the minor issues: We will correct them as per your advice.
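
The rebuttal mentions decoupling the original label into soft boundary labels and core labels but does not say how this is done. One plausible way to do it, shown purely as an illustration, is to weight each foreground pixel by its distance to the nearest background pixel. Everything in this sketch is an assumption: the function name `decouple_label`, the 4-connected city-block distance, the exponential falloff, and the `sigma` parameter are not taken from the paper.

```python
from collections import deque
import math


def decouple_label(mask, sigma=1.0):
    """Split a binary mask into soft boundary and core maps (illustrative)."""
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Multi-source BFS seeded at every background pixel (distance 0).
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    boundary = [[0.0] * cols for _ in range(rows)]
    core = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                d = dist[r][c]
                # With no background anywhere, treat the pixel as pure core.
                w = 0.0 if d is None else math.exp(-(d - 1) / sigma)
                boundary[r][c] = w
                core[r][c] = 1.0 - w
    return boundary, core


# Toy 5x5 mask with a 3x3 foreground block in the centre.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
        for r in range(5)]
boundary, core = decouple_label(mask)
```

In this toy case the outer ring of the block gets boundary weight 1.0 and the centre pixel gets a smaller weight, so the two maps sum to the original mask on every foreground pixel.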




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    the authors made rebuttal successfully.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    the authors made rebuttal successfully.


