Abstract

Semi-supervised learning (SSL) has significantly advanced 3D medical image segmentation by effectively reducing the need for laborious dense labeling from radiologists. Although SSL research has traditionally focused on model-centric advancements, we anticipate that the SSL landscape will shift due to the emergence of open-source generalist foundation models, e.g., the Segment Anything Model (SAM). These generalists have shown remarkable zero-shot segmentation capabilities with manual prompts, enabling a promising data-centric perspective for future SSL, particularly in pseudo and expert labeling strategies for enhancing the data pool. To this end, we propose the Foundation Model-driven Active Barely Supervised (FM-ABS) learning paradigm for developing customized 3D specialist segmentation models with shoestring annotation budgets, i.e., merely labeling three slices per scan. Specifically, building upon the basic mean-teacher framework, FM-ABS accounts for the intrinsic characteristics of 3D imaging and modernizes the SSL paradigm with two key data-centric designs: (i) specialist-generalist collaboration, where the in-training specialist model delivers class-specific prompts to interact with the frozen class-agnostic generalist model across multiple views to acquire noisy-yet-effective pseudo labels, and (ii) expert-model collaboration, which advocates active cross-labeling with notably low annotation efforts to progressively provide the specialist model with informative and efficient supervision in a human-in-the-loop manner, which in turn benefits the automatic object-specific prompt generation. Extensive experiments on two benchmark datasets show the promising results of our approach over recent SSL methods under extremely limited (barely) labeling budgets.
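For readers unfamiliar with the mean-teacher framework that the abstract builds on, the sketch below illustrates its defining step: the teacher's weights are an exponential moving average (EMA) of the student's weights. This is a minimal illustration, not the authors' code. The function name `ema_update`, the scalar weights stored in a dict, and the decay value `alpha=0.99` are all assumptions made for the example.

```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight toward the matching student weight.

    teacher, student: dicts mapping a parameter name to a float
    (scalars stand in for real tensors in this sketch).
    alpha: EMA decay; 0.99 is an illustrative value, not taken from the paper.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


# Usage: after each student optimization step, refresh the teacher.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, alpha=0.9)
```

In a real training loop the same update would be applied to every parameter tensor after each student step, with the teacher never receiving gradients.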

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0050_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Xu_FMABS_MICCAI2024,
        author = { Xu, Zhe and Chen, Cheng and Lu, Donghuan and Sun, Jinghan and Wei, Dong and Zheng, Yefeng and Li, Quanzheng and Tong, Raymond Kai-yu},
        title = { { FM-ABS: Promptable Foundation Model Drives Active Barely Supervised Learning for 3D Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a SAM-based semi-supervised learning method with two data-centric designs: specialist-generalist collaboration to interact with SAM and expert-model collaboration to provide the specialist with informative and efficient supervision.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Expert-model collaboration with an active function, including least confidence, classical highest entropy and highest entropy ratio, to grow the cross-labeled set.
    • Superior performance on two benchmarks.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The comparison does not seem fair. The authors should compare the proposed method against SAM-based ones such as [1] and [2].
      [1] Li, Ning, et al. “Segment anything model for semi-supervised medical image segmentation via selecting reliable pseudo-labels.” International Conference on Neural Information Processing. Singapore: Springer Nature Singapore, 2023.
      [2] Zhang, Yichi, Yuan Cheng, and Yuan Qi. “Semisam: Exploring sam for enhancing semi-supervised medical image segmentation with extremely limited annotations.” arXiv preprint arXiv:2312.06316 (2023).
    • The semi-supervised framework is quite simple. The authors could adopt a more recent scheme in this area, such as [3].
      [3] Pham, Hieu, et al. “Meta pseudo labels.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors should compare with SAM-based methods such as [1] and [2].
    [1] Li, Ning, et al. “Segment anything model for semi-supervised medical image segmentation via selecting reliable pseudo-labels.” International Conference on Neural Information Processing. Singapore: Springer Nature Singapore, 2023.
    [2] Zhang, Yichi, Yuan Cheng, and Yuan Qi. “Semisam: Exploring sam for enhancing semi-supervised medical image segmentation with extremely limited annotations.” arXiv preprint arXiv:2312.06316 (2023).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The paper employs the original mean-teacher method without any novelty
    • The comparison does not seem fair
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed all concerns raised previously. The contributions of the paper are solid and clear now.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a Foundation Model-driven Active Barely Supervised (FM-ABS) learning paradigm inspired by SAM for developing customized 3D specialist segmentation models. The proposed method was evaluated on two publicly available datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper reads well, and the motivation behind the study is clear. The proposed active learning method for semi-supervised learning with SAM is novel. Promising empirical performance against a few other algorithms.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors compare the proposed method with mostly the mean teacher and a few other algorithms. They miss many state-of-the-art methods in semi-supervised learning (e.g., SASSNet, MC-Net/+, Co-BioNet). Since this work is inspired by SAM, I think it’s fair to compare your method against SAM and other SAM-inspired works in the medical domain, such as SAMMed 2D, SAMMed 3D, SAM3D, and MedSAM without active learning components. The authors mentioned that they outperformed supervised baselines, but it seems that their supervised baseline is the proposed work trained with 100% labeled data. The authors missed nnUNet and other supervised models during their evaluation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The manuscript is missing class-wise performance on the brain tumor segmentation task (TC, ET, WT) and details on which tumor class’s performance has been reported.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Fig. 2 presents a discrepancy that requires clarification. It appears to depict the same ground truth for all three axes, which is puzzling. Pseudo-masks from three views should be distinct, yet all three appear remarkably similar, as shown in the Figure.

    A few questions for the authors: How exactly does the fusion take place? Why did you choose 2D generalist models over 3D generalist models? The authors mentioned that the 2D generalist model has low inference efficiency for 3D scans. Will this work for multi-class segmentation, for example, the tumor classes in brain tumor segmentation? If not, how can you extend your method for multi-class segmentation?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Even though the paper proposes a novel approach, it lacks evaluations with state-of-the-art methods. The authors could make this paper stronger by designing other baselines that are exposed to the same amount of data and information.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a 3D medical image segmentation method, Foundation Model-driven Active Barely Supervised (FM-ABS), focusing on applying Segment Anything Models (SAM) to the semi-supervised learning paradigm. FM-ABS includes two data-centric strategies: 1) Specialist-Generalist Collaboration and 2) Expert-Model Efficient Collaboration, to respectively incorporate knowledge from general models and human experts. The authors conducted extensive experiments on two public medical image segmentation datasets, i.e., Left Atrial (LA) and Brain Tumor Segmentation (BraTS), to evaluate FM-ABS and demonstrate its superior performance compared to other approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of leveraging knowledge from both SAM and experts in the training of semi-supervised segmentation models is promising. The proposed method was extensively evaluated, and the results are convincing. Additionally, the sensitivity of design choices has also been analyzed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The relationship between the two data-centric schemes (i.e., Specialist-Generalist and Expert-Model) is unclear: are they independent or interactive? Please explain. Fig. 3 is visually unclear. It is recommended to improve the visual quality of Fig. 3 with vector graphics.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is recommended to make the code publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the weaknesses mentioned above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of combining knowledge from both SAM and experts in the training of semi-supervised segmentation models is new overall, despite potential confusion in the interaction between the two paradigms (i.e., Specialist-Generalist and Expert-Model).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We are glad that reviewers find our work “well-written” (R1), “clear motivation” (R1), “novel and promising method” (All), “extensive and convincing evaluation with sensitivity analysis” (R3) and “promising performance” (All). The only negative recommendation came from R4 (weak reject). Thanks for the constructive comments. Our responses to major concerns are as follows.

Q1 (R1): (1) Add more SOTA semi-supervised methods like SASSNet, MC-Net and Co-BioNet. (2) Compare with SAM and its variants w/o active learning components. (3) Are the presented supervised baselines trained with 100% labeled data? Other supervised models are also missing. A1: (1) We plan to add these baselines in a future extension. Notably, CAML (MICCAI’23) [7], as shown in Table 1, is an improved version of MC-Net, and ACMT (MIA’23) [22] also set the recent SSL SOTA on the datasets we used. (2) Direct comparison with SAM may be confusing as it requires manual prompting, whereas our goal is to develop a fully automated specialist. We have included several SAM variants to analyze our framework’s sensitivity, as shown in Fig.3 (c). (3) Training with 100% labeled data is the upper bound. The baselines are supervised models trained with limited cross-labeled data. We keep the backbone and training protocols consistent with previous works to ensure fairness.

Q2 (R1): (1) Pseudo masks from 3 views should be distinct, yet they look similar in Fig.2. (2) Why 2D generalist models rather than 3D generalist models? (3) Applicability to multi-class segmentation? A2: (1) Sorry for any confusion. Fig.2 shows different labels of the same slice for clear comparison, and we intended to say “pseudo masks generated via prompting from the three views”. We will revise the caption of Fig.2. (2) We chose 2D generalists for their robustness and prevalence in the AI industry, coupled with the flexibility in model choices. Besides, automatically generating precise 3D prompts with on-the-fly coarse masks is more challenging. (3) Yes. For now, we follow recent SSL work [22] that segments WT only. We consider multi-class tasks for future work, where our methods, including SAM, can accommodate prompts with predefined class numbers.
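To make the idea of prompting a 2D generalist from three views more concrete, here is a minimal sketch that turns a coarse 3D binary mask into per-slice bounding boxes along each axis. It is an illustration only. The rebuttal does not state the prompt type on this page, so the use of box prompts is an assumption, and the function name `box_prompts` and the `(row_min, col_min, row_max, col_max)` layout are hypothetical choices for the example.

```python
def box_prompts(mask):
    """Derive per-slice bounding boxes from a coarse 3D binary mask.

    mask: nested lists indexed as mask[z][y][x] with values 0 or 1.
    Returns {axis: {slice_index: (row_min, col_min, row_max, col_max)}},
    where axis 0, 1, 2 slice along z, y, x respectively and rows/cols are
    the two remaining axes in their original order.
    Slices without any foreground voxel get no prompt.
    """
    prompts = {0: {}, 1: {}, 2: {}}
    for z, plane in enumerate(mask):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if not value:
                    continue
                coords = (z, y, x)
                for axis in range(3):
                    idx = coords[axis]
                    # In-plane coordinates are the two axes other than `axis`.
                    r, c = [coords[a] for a in range(3) if a != axis]
                    box = prompts[axis].get(idx)
                    if box is None:
                        prompts[axis][idx] = (r, c, r, c)
                    else:
                        prompts[axis][idx] = (
                            min(box[0], r), min(box[1], c),
                            max(box[2], r), max(box[3], c),
                        )
    return prompts


# Usage on a tiny 2 x 3 x 3 volume with three foreground voxels.
coarse = [[[0, 0, 0], [0, 1, 0], [0, 0, 1]],
          [[1, 0, 0], [0, 0, 0], [0, 0, 0]]]
views = box_prompts(coarse)
```

Each box would then be converted to whatever coordinate convention the chosen generalist expects before being passed as a prompt for the corresponding 2D slice.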

Q3 (R3): (1) The relationship between the two data-centric schemes, i.e., independent or interactive? A3: They interact cooperatively. The expert-model collaboration enriches the model with more expert supervision within the specialist-generalist collaboration, providing better object-specific prompts for higher-quality generalist-based labels. An improved model, in turn, delivers more precise informativeness signals for active selection.
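The "informativeness signals" mentioned above can be illustrated with two of the active functions that Review #1 names, least confidence and classical entropy. The sketch below scores each slice from its per-pixel foreground probabilities and picks the top-scoring slices for expert labeling. It assumes binary segmentation and averages the score over pixels, both of which are choices made for this example. The "entropy ratio" variant is not defined on this page and is therefore not shown, and the names `mean_entropy`, `least_confidence`, and `select_slices` are hypothetical.

```python
import math


def mean_entropy(probs):
    """Average binary entropy (in nats) over a slice's foreground probabilities."""
    eps = 1e-12
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
    return total / len(probs)


def least_confidence(probs):
    """Average of 1 - max(p, 1 - p); higher means the model is less confident."""
    return sum(1.0 - max(p, 1.0 - p) for p in probs) / len(probs)


def select_slices(slice_probs, k, score=mean_entropy):
    """Return the k slice indices with the highest informativeness score.

    slice_probs: dict mapping slice index -> flat list of foreground probabilities.
    """
    ranked = sorted(slice_probs, key=lambda i: score(slice_probs[i]), reverse=True)
    return ranked[:k]


# Usage: slice 1 is the most uncertain, slice 0 the most confident.
slice_probs = {0: [0.99, 0.01], 1: [0.5, 0.6], 2: [0.9, 0.2]}
to_label = select_slices(slice_probs, k=2)
```

Swapping `score=least_confidence` into `select_slices` changes the criterion without touching the selection logic.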

Q4 (R4): Should compare with SAM-based method such as [r1] and [r2]. A4: Thanks for the recommendation. [r1] uses SAM to select reliable pseudo labels, while [r2] enforces consistency between point-prompted SAM outputs and student outputs. We reimplemented them and found that [r1, r2] only achieve {78.35%, 77.49%} Dice in our LA setting (60 labeled slices), which may be due to lack of high-quality label generation and calibration designs. We consider adding them in our future extension.
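For reference, the Dice score quoted above measures the overlap between a predicted and a reference mask as 2|A ∩ B| / (|A| + |B|). A minimal sketch on flattened binary masks follows. The function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions for this example, not details taken from the paper.

```python
def dice(pred, target):
    """Dice coefficient between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Usage: one overlapping voxel out of two foreground voxels in each mask.
score = dice([1, 1, 0, 0], [1, 0, 1, 0])
```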

Q5 (R4): The paper employs the original mean-teacher (MT) method without any novelty. A5: As highlighted in the paper, our novelty lies in pioneering a shift towards more data-centric designs in future SSL, driven by the generalist foundation models. This shift reduces the emphasis on model-centric advancements (e.g., more advanced methods than MT). Excitingly, incorporating the simple MT model, we have already achieved promising results, better showing the efficacy of our two proposed data-centric designs.

Other minor concerns will be carefully revised. Code will also be released to help follow the details.

Ref:
[r1] Segment anything model for semi-supervised medical image segmentation via selecting reliable pseudo-labels. ICONIP’23
[r2] SemiSAM: Exploring SAM for enhancing semi-supervised medical image segmentation with extremely limited annotations. arXiv’23




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers have agreed to accept this paper after rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    All reviewers have agreed to accept this paper after rebuttal.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    After rebuttal, all three reviewers suggested Weak Accept.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    After rebuttal, all three reviewers suggested Weak Accept.


