Abstract

Semi-supervised learning (SSL) has achieved notable progress in medical image segmentation. To achieve effective SSL, a model needs to learn efficiently from limited labeled data and effectively exploit knowledge from abundant unlabeled data. Recent developments in visual foundation models, such as the Segment Anything Model (SAM), have demonstrated remarkable adaptability with improved sample efficiency. To harness the power of foundation models for SSL, we propose a cross prompting consistency method with segment anything model (CPC-SAM) for semi-supervised medical image segmentation. Our method employs SAM’s unique prompt design and introduces a cross-prompting strategy within a dual-branch framework to automatically generate prompts and supervision across two decoder branches, enabling effective learning from both scarce labeled and valuable unlabeled data. We further design a novel prompt consistency regularization to reduce the prompt position sensitivity and to enhance the output invariance under different prompts. We validate our method on two medical image segmentation tasks. Extensive experiments with different labeled-data ratios and modalities demonstrate the superiority of our proposed method over state-of-the-art SSL methods, with more than 9% Dice improvement on the breast cancer segmentation task.
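The cross-prompting idea can be illustrated with a small sketch: the unprompted mask of each branch is turned into a point prompt for the other branch. This is only an illustration under stated assumptions. The function names, the list-of-lists mask type, and the choice of sampling a random foreground pixel are hypothetical and are not taken from the paper or its code repository.

```python
import random
from typing import Dict, List, Optional, Tuple

Mask = List[List[int]]          # binary mask as rows of 0/1 (assumed representation)
Point = Tuple[int, int]         # (row, col) of a point prompt


def sample_point_prompt(mask: Mask, rng: random.Random) -> Optional[Point]:
    """Pick one foreground pixel to act as a point prompt.

    Assumption: a uniformly random foreground pixel is used. The actual
    prompt-selection rule of CPC-SAM is not described on this page.
    """
    foreground = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value == 1
    ]
    if not foreground:
        return None  # nothing predicted, so no prompt can be derived
    return rng.choice(foreground)


def cross_prompt(
    unprompted_a: Mask, unprompted_b: Mask, rng: random.Random
) -> Dict[str, Optional[Point]]:
    """Derive each branch's prompt from the other branch's unprompted mask."""
    return {
        "prompt_for_b": sample_point_prompt(unprompted_a, rng),
        "prompt_for_a": sample_point_prompt(unprompted_b, rng),
    }
```

In a real pipeline the returned points would be passed to a promptable decoder; that step is omitted here because it depends on the model code.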

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0321_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/JuzhengMiao/CPC-SAM

Link to the Dataset(s)

https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html
https://scholar.cu.edu.eg/?q=afahmy/pages/dataset

BibTex

@InProceedings{Mia_Cross_MICCAI2024,
        author = { Miao, Juzheng and Chen, Cheng and Zhang, Keli and Chuai, Jie and Li, Quanzheng and Heng, Pheng-Ann},
        title = { { Cross Prompting Consistency with Segment Anything Model for Semi-supervised Medical Image Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a new consistency regularisation and a new cross pseudo labelling, both on different prompts, with SAM for semi-supervised segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea of combining the classical consistency regularisation on different prompts is interesting.
    2. The results look good: the proposed method outperformed several popular previous methods. Some relevant state-of-the-art methods are missing, but I think the experimental results are sufficient here.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In both datasets, BUSI and ACDC, the objects of interest are regularly shaped and compact. This might imply that the proposed method has limitations on sparser objects of interest (e.g., lung vessels in axial CT views) or small objects. Could the authors address this question in the rebuttal?
    2. The design of PCR is counterintuitive to me, and the performance gain from that component seems marginal (less than 1% according to the last row of Table 2). I find it counterintuitive because PCR seems to restrict the variance, whereas consistency regularisation intuitively benefits from larger variance, as shown in some self-supervised literature [1]. Can the authors elaborate on the motivation for this design in the rebuttal? [1] On the importance of asymmetry for siamese representation learning (CVPR 2022)
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The descriptions of the ablation studies are not very clear, especially for the experiments included in Figure 3. Can the authors elaborate on what that ablation study entails?
    2. The writing on PCR is a bit convoluted in section 2.3. The authors should improve it during the rebuttal.
    3. Figure 1 should also be improved to reflect how PCR works, or PCR could be briefly summarised in the caption; the current state is a bit misleading.
    4. Applying consistency to dual-branch outputs derived from differently augmented features has become popular, and the proposed prompt-based approach follows this pattern. In a future version of this work (as MICCAI doesn’t allow extra results in the rebuttal), the authors could include more up-to-date baselines such as: [2] Learning morphological feature perturbations for calibrated semi-supervised segmentation (MIDL 2022) [3] Semi-supervised medical image segmentation via cross teaching between CNN and Transformer (MIDL 2022)
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The new consistency on different prompts is very interesting; it shows a possible direction for utilising the latest developments in foundation models for semi-supervised segmentation.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose an interesting cross prompting consistency with segment anything model, named CPC-SAM, to fulfill the semi-supervised medical image segmentation task. Specifically, the whole network follows the classical semi-supervised architecture CPS with two branches and the unprompted output from one branch is used to generate prompts for the other branch. The extensive experiments on two public clinical datasets for breast cancer segmentation and cardiac structure segmentation have validated its superiority compared to other methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The topic is interesting, and the paper is well written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main concern is the novelty of this paper: cross pseudo supervision [1] has become a widely used semi-supervised framework, and such cross regularizations are not original enough. Beyond that, considering the well-defined task and motivation of the paper, I did not find any other major weaknesses. [1] Chen, X., Yuan, Y., Zeng, G., & Wang, J. (2021). Semi-supervised semantic segmentation with cross pseudo supervision. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2613-2622).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors should better describe their special contributions compared to existing CPS or SAM methods.
    2. An analysis of the computational burden is necessary for a comprehensive comparison.
    3. Available source code is beneficial for the related communities.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Interesting problem but the novelties are limited.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper titled “Cross Prompting Consistency with Segment Anything Model for Semi-supervised Medical Image Segmentation” proposes a novel framework that employs prompt-based foundation models in a semi-supervised context. Specifically, the authors focus on the segmentation task and the SAM model. The proposed methodology utilizes a cross-prompting strategy within a student-teacher-like architecture, where the output of each model serves to guide the point prompt of the alternate model. Additionally, the authors introduce a new regularization term in the loss function to enhance the stability of the training process. The efficacy of the model is validated using two publicly available datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The manuscript provides extensive baseline comparisons, ablation studies, and a proper discussion of the results. The manuscript is well organized, and the language is clear and scientific.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The claim of “significant” improvement compared to SOTA should have been supported by a statistical hypothesis test.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The assertion that the improvement is significant must be substantiated by a statistical significance test, which necessitates multiple runs with different random seeds or a cross-validation approach. Please provide further details.

    • There are medical-specific foundation models such as SAM-Med2D, which are fine-tuned on a substantial amount of labeled medical data. It would be beneficial to assess the effectiveness of your dual-branch strategy using these models instead of the original SAM. Please include a discussion on this topic in the paper (experimental validation is not required).

    • Please clarify how the “supervised” component of the semi-supervised approach is implemented. Currently, Figure 1 suggests a method akin to self-supervision, as the masks and prompts are derived from the model outputs each epoch. Please specify where and how the labeled portion of the training data is incorporated into the workflow.

    • In addition to the previous comment, it would be intriguing to examine the performance of the proposed two-branch model in a purely self-supervised setting where no labeled training data is used (this is a suggestion for future research, not for inclusion in the current paper).

    • Table 2 contains four rows; the heading should not be counted as the first row when referencing the table in the text.

    • To ensure your document adheres to the required standards, please review the template guidelines concerning the placement of figures and tables. It is important to verify whether having a figure and a table in the same row is permissible under these guidelines.
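The significance check requested in the first comment of this list could be sketched as a paired sign-flip permutation test on per-case Dice scores of two methods evaluated on the same test cases. This is a generic illustration, not an analysis from the paper: the function name, the resample count, and the choice of a permutation test (rather than, say, a Wilcoxon signed-rank test) are assumptions.

```python
import random
from typing import Sequence


def paired_permutation_test(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 10000,
    seed: int = 0,
) -> float:
    """Two-sided p-value for the mean paired difference (sign-flip test).

    Under the null hypothesis the sign of each paired difference is
    arbitrary, so signs are flipped at random and the resulting mean
    differences are compared with the observed one.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / n) >= observed:
            hits += 1

    # The +1 terms count the observed labelling itself and keep p above zero.
    return (hits + 1) / (n_resamples + 1)
```

The same function can be applied to per-seed mean Dice scores when multiple training runs are available, with the caveat that a handful of seeds gives very little statistical power.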

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to parts 4-10.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the AC and reviewers for their time and valuable feedback. We would like to clarify several questions raised by the reviewers.

  1. The design of PCR is counterintuitive according to some self-supervised literature (R1).

     (1) Consistency with intuition. In semi-supervised segmentation, the key is to generate reliable predictions for unlabeled data. However, because SAM is highly sensitive to prompt positions, prompts with similar semantic contexts but different positions may produce predictions of differing quality and reliability, as indicated by [1,2]. Without ground truth for the unlabeled inputs, the prompts generated from unprompted outputs can be inherently unreliable and noisy. The prompted output is thus likely to be unreliable as well, failing to provide meaningful guidance. To address this issue, we design a novel PCR strategy to reduce SAM’s sensitivity to prompt positions and enhance the invariance of the output, thus ensuring more reliable and stable results.

     (2) Consistency with the self-supervised literature. As stated in the reference mentioned by the reviewer, representation learning benefits from a lower variance in target encodings and a higher variance in source encodings. Our proposed PCR method is similar to providing a lower variance on the target side, since we aim to provide a more stable and reliable learning target, which is expected to improve performance.

     In response to the constructive suggestions of R1, we will improve the related text, such as the introduction, Section 2.3, and the caption of Fig. 1, in our final version to clarify the motivation of this design and how it works. Additionally, we will elaborate on our ablation studies to make the effectiveness of each component of our proposed method easier to follow.

     [1] Samaug: Point prompt augmentation for segment anything model. arXiv’23
     [2] Desam: Decoupling segment anything model for generalizable medical image segmentation. arXiv’23
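The goal stated in this response, making the output invariant to the prompt position, can be pictured as a penalty on the disagreement between predictions obtained for the same image under two different prompts. The sketch below is a minimal illustration under assumptions: the mean squared difference, the function name, and the list-of-lists probability maps are hypothetical, since the exact form of the PCR loss is not given on this page.

```python
from typing import List

ProbMap = List[List[float]]  # per-pixel foreground probabilities (assumed representation)


def prompt_consistency_loss(pred_prompt_1: ProbMap, pred_prompt_2: ProbMap) -> float:
    """Mean squared difference between two prompted predictions.

    Both maps are assumed to come from the same image and the same branch,
    differing only in the prompt that was supplied. A value of 0 means the
    output did not change with the prompt.
    """
    if len(pred_prompt_1) != len(pred_prompt_2) or any(
        len(r1) != len(r2) for r1, r2 in zip(pred_prompt_1, pred_prompt_2)
    ):
        raise ValueError("probability maps must have the same shape")

    total = 0.0
    count = 0
    for row_1, row_2 in zip(pred_prompt_1, pred_prompt_2):
        for p1, p2 in zip(row_1, row_2):
            total += (p1 - p2) ** 2
            count += 1
    if count == 0:
        raise ValueError("probability maps must not be empty")
    return total / count
```

Minimising such a term pulls the two prompted outputs toward each other, which matches the "lower variance on the target side" argument made in point (2).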

  2. Effectiveness on medical-specific foundation models such as SAM-Med2D (R3). Our proposed dual-branch strategy can be easily extended to medical-specific foundation models like SAM-Med2D without any modifications. With more domain knowledge, we anticipate enhanced performance in semi-supervised segmentation. We also note that the generalization performance of SAM-Med2D remains limited, as indicated in its original paper, especially when using a single-point prompt. Our proposed method has the potential to serve as a label-efficient way to help SAM-Med2D adapt to a new dataset with minimal labeled data and a plentiful supply of easily obtainable unlabeled data. Moreover, our proposed method enables automatic segmentation and eliminates the need for expert prompts during inference, which cannot be achieved using SAM-Med2D alone. Meanwhile, the promptable nature of these models is fully utilized in our training process. We will include this discussion in our final version.

  3. Clarification of the supervised part (R3). Thanks for the valuable suggestion. In Fig. 1, we only illustrate the use of unlabeled data. The use of labeled images is described in Eq. 3, where we use the annotations to supervise all the prompted and unprompted outputs for the labeled data. Notably, our proposed cross-prompting and PCR strategies are not applied to the labeled data. We will improve Fig. 1 or its caption in our final version.
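As a rough picture of the supervised part described above, the annotation of a labeled image can be compared against every output produced for it, prompted and unprompted alike. Eq. 3 itself is not reproduced on this page, so the sketch below rests on assumptions: a soft Dice loss is used as the per-output loss, the outputs are simply averaged, and all names are hypothetical.

```python
from typing import List, Sequence

ProbMap = List[List[float]]  # predicted foreground probabilities (assumed representation)
Mask = List[List[int]]       # ground-truth binary annotation


def dice_loss(pred: ProbMap, target: Mask, eps: float = 1e-6) -> float:
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|), smoothed by eps."""
    intersection = sum(
        p * t for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)
    )
    pred_sum = sum(p for row in pred for p in row)
    target_sum = sum(t for row in target for t in row)
    return 1.0 - (2.0 * intersection + eps) / (pred_sum + target_sum + eps)


def supervised_loss(outputs: Sequence[ProbMap], target: Mask) -> float:
    """Average the loss over all outputs of one labeled image.

    `outputs` is assumed to hold the prompted and unprompted predictions
    of both branches; the ground truth supervises each of them.
    """
    if not outputs:
        raise ValueError("at least one output is required")
    return sum(dice_loss(out, target) for out in outputs) / len(outputs)
```

The unlabeled-data terms (cross-prompting and PCR) would be added to this quantity during training; how they are weighted is not stated here.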

  4. Open access to the source code (R4). We will release the source code upon acceptance as we have mentioned at the end of the abstract in our original manuscript.




Meta-Review

Meta-review not available, early accepted paper.


