Abstract

Deep learning-based 3D medical image segmentation typically demands extensive densely labeled data. Yet, voxel-wise annotation is laborious and costly to obtain. Cross-annotation, which involves annotating only a few slices from different orientations, has recently become an attractive strategy for labeling 3D images. Compared to previous weak labeling methods like bounding boxes and scribbles, it can efficiently preserve the 3D object’s shape and precise boundaries. However, learning from such sparse supervision signals (a.k.a. barely supervised learning, BSL) still poses great challenges, including less fine-grained object perception, less compact class features and inferior generalizability. To this end, we present a Multi-Faceted ConSistency (MF-ConS) learning framework for the BSL scenario. Our approach starts with an active cross-annotation strategy that requires only three orthogonal labeled slices per scan, optimizing the usage of a limited annotation budget through a human-in-the-loop process. Building on the popular teacher-student model, MF-ConS is equipped with three types of consistency regularization to tackle the aforementioned challenges of BSL: (i) neighbor-informed object prediction consistency, which improves fine-grained object perception by encouraging the student model to infer complete segmentation from partial visual cues; (ii) non-parametric prototype-driven consistency for more discriminative and compact intra-class features; (iii) a stability constraint under mild perturbations to enhance the model’s robustness. Our method is evaluated on the task of brain tumor segmentation from T2-FLAIR MRI, and the promising results show the superiority of our approach over relevant state-of-the-art methods.
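As a rough illustration of the teacher-student setup and the three consistency terms described above (this is not the authors' code; the loss weights, the MSE form of each term, and all function names are assumptions for the sketch), the combined objective might look like:

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.99):
    """Mean-Teacher-style update: the teacher is an exponential moving
    average of the student's weights (alpha is a hypothetical decay)."""
    return [alpha * t + (1 - alpha) * s for t, s in zip(teacher_w, student_w)]

def mse(a, b):
    """Mean squared error between two prediction/feature arrays."""
    return float(np.mean((a - b) ** 2))

def multi_faceted_consistency(student_masked, student_feats, student_perturbed,
                              teacher_full, prototypes, teacher_clean,
                              w_niop=1.0, w_proto=0.5, w_stab=0.5):
    """Sketch of the three consistency terms from the abstract:
    (i)   NIOP: the student's prediction on a partially masked input should
          match the teacher's prediction on the full input;
    (ii)  prototype-driven: student features should stay close to their
          (non-parametric) class prototypes;
    (iii) stability: the prediction under mild perturbation should match
          the clean teacher prediction."""
    l_niop = mse(student_masked, teacher_full)
    l_proto = mse(student_feats, prototypes)
    l_stab = mse(student_perturbed, teacher_clean)
    return w_niop * l_niop + w_proto * l_proto + w_stab * l_stab
```

In a real pipeline these terms would be added to the supervised loss on the three labeled slices, with the teacher refreshed by `ema_update` after each student step.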

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0102_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wu_Few_MICCAI2024,
        author = { Wu, Xinyao and Xu, Zhe and Tong, Raymond Kai-yu},
        title = { { Few Slices Suffice: Multi-Faceted Consistency Learning with Active Cross-Annotation for Barely-supervised 3D Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a Multi-Faceted ConSistency (MF-ConS) learning framework for label-efficient learning in 3D medical images, especially for sparse supervision signals. Three types of consistency regularization are used to tackle the challenges. Overall, the method is technically sound and the results on the tumor segmentation benchmark demonstrate its effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Well-written paper.
    2. The proposed method is technically sound.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This paper proposes a Multi-Faceted ConSistency (MF-ConS) learning framework for label-efficient learning in 3D medical images, especially for sparse supervision signals. Three types of consistency regularization are used to tackle the challenges. Overall, the method is technically sound and the results on the tumor segmentation benchmark demonstrate its effectiveness. However, there are still some issues that need to be addressed, and I would be very glad to raise my grading if the authors can resolve the problems I raise. Comments:

    1. The techniques used in the proposed method have been well explored in the field of natural images, e.g., consistency regularization [1] and perturbation [2]. In addition, sparsely-annotated segmentation has also been explored in some recent works [3, 4]. It would be better if the authors could discuss or compare them in the revision, making a more thorough study.
    2. My biggest concern is that this paper only presents results on MRI brain tumor segmentation. Recent advances [5, 6] have demonstrated results on several different datasets. I think it would be better if the authors could add one or two more datasets in the experiments, and compare with recent SOTA 3D medical image analysis methods [5, 6, 7].
    References:
    [1] Querying Labeled for Unlabeled: Cross-Image Semantic Consistency Guided Semi-Supervised Semantic Segmentation. TPAMI 2023
    [2] Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation. CVPR 2023
    [3] Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures. CVPR 2023
    [4] Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation. arXiv 2024
    [5] VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis. CVPR 2024
    [6] Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis. CVPR 2023
    [7] nnU-Net: A Self-Configuring Method for Deep Learning-Based Biomedical Image Segmentation. Nature Methods 2021
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The code should be available upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Add more thorough discussions and comparisons with the mentioned works;
    2. Add one or two more datasets for evaluation. I would raise my grading if the authors could solve these two problems.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the weakness. If the authors could solve my two raised problems in the rebuttal, I would be very glad to accept this paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    The authors have solved all my problems.



Review #2

  • Please describe the contribution of the paper

    The paper presents an MF-ConS learning framework for the barely supervised learning scenario with three orthogonal labeled slices per scan. MF-ConS is built on the teacher-student model and equipped with three types of consistency regularization to tackle three challenges.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Three types of consistency regularization are designed to tackle the three challenges that sparse supervision poses to traditional SSL methods, i.e., less fine-grained object perception, less compact class features, and inferior generalizability.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The new method has only been evaluated on one dataset, which is not sufficient to demonstrate its superiority.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Open access to the source code would be helpful.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    More segmentation tasks are suggested to evaluate the new method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My recommendation is weak accept. The three consistency regularizations have some innovation, but the experiments are not sufficient.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have solved my problems.



Review #3

  • Please describe the contribution of the paper

    To alleviate the annotation burden for 3D medical image segmentation, this paper introduces an active cross-annotation strategy that requires only three orthogonal labeled slices per scan and selects informative samples to label on-the-fly. The paper then builds on consistency learning and proposes the MF-ConS framework. The method is validated on brain tumor segmentation and shows promising results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Paper presentation: The paper is well-written and the organization is clear.
    2. Practical setting: The typical semi-supervised setting randomly pre-selects the limited scans with dense labels, which may limit the diversity. Instead, the paper uses a cross-annotation strategy equipped with active learning, which helps better budget allocation while retaining boundary and inter-slice information.
    3. Well-supported design decisions: The components in the MF-ConS framework seem well-motivated (with three identified challenges) and supported by the ablation study. The visualization in Fig. 2 helps understand the multi-faceted consistency.
    4. Results: the results are promising with significantly reduced human efforts.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I have several questions on the papers:

    1. The masking probability transitions from 0.25 to 0.5; why do you choose this design?
    2. The authors are suggested to revise the title to brain tumor segmentation as the current experiment is limited to this application.
    3. Why do you choose the design of only labeling three orthogonal slices per scan?
    4. Regarding the prototype-driven consistency, what are the differences between [23] and this paper?
    5. Why do authors only choose the entropy-based sampling as active selection?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the weaknesses listed above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper addresses an interesting setting which can scale up typical semi-supervised 3D medical image segmentation. The presentation and motivation are good.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The rebuttal addresses the main concerns, so I keep my score.




Author Feedback

We are encouraged that reviewers find our work “well-written and clear organization” (R3,R4), “technically sound” (R4), “practical setting” (R3), “good motivation and well-supported design decisions” (R1,R3) and “promising performance” (R3,R4). Thanks for the constructive comments. Our responses to major concerns are as follows.

Q1: (R4) It would be better if the authors could discuss or compare [1,2,3,4] in the revision, making a more thorough study. A1: Thanks for suggesting these references. While our focus was more on methods specific to the medical field, we acknowledge the importance of discussing broader techniques. We will include and discuss these studies, noting how [1] utilizes labeled prototypes to rectify pseudo labels, [2] introduces feature perturbation, and [3,4] innovate with adaptive Gaussian mixtures for a new type of consistency.

Q2: (R1,R4) It would be better if the authors could add more datasets in the experiments; (R4) and compare recent SOTA 3D medical image analysis methods [5,6,7]. A2: (1) We agree that including more datasets can enhance the study. Due to MICCAI’s rules, we may not directly add datasets to the current version, but we plan to include them in our extension. We further validated our method on a left atrium benchmark, following the data split in [8]. With only 24 cross-labeled scans (72 labeled slices) and 56 unlabeled scans, the Dice of Sup(cross)/CPS/AC-MT/CAML/UPCoL/DeSCO is 0.772/0.789/0.800/0.720/0.809/0.798. Our MF-ConS achieves 0.829 (+AL: 0.842), while the upper bound is 0.916. (2) [5,6] focus on self-supervised learning. While we will discuss them in the final version, including them in our current experimental design is challenging as they are beyond our scope. As for [7], while it is an excellent study, we currently ensure fair comparison by using the same backbone across all models.

Q3 (R3): (1) The masking probability transitions from 0.25 to 0.5; why choose this design? (2) Why only label three orthogonal slices per scan? (3) For the prototype-driven consistency, what are the differences between [23] and this paper? (4) Why choose entropy-based sampling for active selection? A3: (1) As the model’s perceptual capabilities become stronger during training, we gradually increase the difficulty of the NIOP consistency regularization. (2) As mentioned in Sec. 2.1, while our study uses the baseline that labels one slice per plane, the strategy can be adapted to label additional slices per plane for more challenging tasks or when fewer training samples are available. (3) As mentioned in Sec. 1, considering the scarcity of reliable labels in both sparsely-labeled and unlabeled data for prototype generation, our strategy fuses prototypes derived from both data types to comprehensively represent the feature-space distribution, rather than using them separately as in [23], which assumes relatively adequate ground truth. (4) This active function has proven robust and effective for identifying informative volumes. While our paper does not focus on new active functions, more advanced functions could potentially yield further benefits.
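One plausible reading of the schedule in A3(1) (the rebuttal does not give the exact formula; the linear ramp and the function name below are assumptions) is a curriculum-style increase of the masking probability over training:

```python
def masking_probability(step, total_steps, p_start=0.25, p_end=0.5):
    """Linearly ramp the masking probability from p_start to p_end as
    training progresses, so the NIOP task gets harder as the model's
    perception improves."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return p_start + (p_end - p_start) * frac
```

Other monotone schedules (e.g., a step or cosine ramp) would fit the same description; the key design choice is starting easy and ending hard.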

Other minor concerns will be carefully revised.

Ref:
[1] Querying Labeled for Unlabeled: Cross-Image Semantic Consistency Guided Semi-Supervised Semantic Segmentation. TPAMI’23
[2] Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation. CVPR’23
[3] Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures. CVPR’23
[4] Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation. arXiv’24
[5] VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis. CVPR’24
[6] Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis. CVPR’23
[7] nnU-Net. Nature Methods’21
[8] Uncertainty-Aware Self-Ensembling Model for Semi-Supervised 3D Left Atrium Segmentation. MICCAI’19




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper’s proposed MF-ConS framework for barely supervised learning is interesting to the MICCAI community.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have addressed the concerns raised by the reviewers, and all reviewers have now reached an agreement to accept the paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



