Abstract

Existing barely-supervised medical image segmentation (BSS) methods adopt a registration-segmentation paradigm to learn from data with very few annotations and thereby mitigate the extreme label scarcity problem. However, this paradigm poses a challenge: pseudo labels generated by image registration carry significant noise. To address this issue, we propose a self-paced sample selection framework (SPSS) for BSS. Specifically, SPSS comprises two main components: 1) self-paced uncertainty sample selection (SU), which explicitly improves the quality of pseudo labels in the image space, and 2) self-paced bidirectional feature contrastive learning (SC), which implicitly improves the quality of pseudo labels by enhancing the separability between class semantics in the feature space. SU and SC are trained collaboratively in a self-paced learning manner, ensuring that SPSS learns from high-quality pseudo labels for BSS. Extensive experiments on two public medical image segmentation datasets demonstrate the effectiveness and superiority of SPSS over the state of the art.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3156_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3156_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Su_SelfPaced_MICCAI2024,
        author = { Su, Junming and Shen, Zhiqiang and Cao, Peng and Yang, Jinzhu and Zaiane, Osmar R.},
        title = { { Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a sample selection framework for barely supervised 3D medical image segmentation. The proposed framework exploits uncertainty estimation in a self-paced learning curriculum and a contrastive learning scheme to improve the quality of pseudo-labels in the context of a registration-segmentation paradigm.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper presents uncertainty-based, self-paced learning as an approach to handle noisy pseudo labels in a barely supervised regime
    • Experimental comparison to several models from prior work
    • Experimental results show that the proposed method reaches slightly better performance when compared to prior work
    • Ablation results showcase the contribution of each component of the proposed model
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Methodology section is incomplete and unclear. For example: (1) the task is not properly described (what is the desired output of the model?), (2) Section 2.1 starts by commenting on the drawbacks of registration-segmentation approaches without formally introducing them in the context of the problem formulation (which is critical since the whole framework is based on it), (3) the registration pseudo labels, segmentation pseudo labels, the teacher and student models and their predictions are never formally described, (4) mentions of “weakly-perturbed” and “strongly-perturbed” images without proper description, to name a few.
    • The novelty is limited as it is the application of previously existing techniques to a new problem
    • Some statements are misleading. Describing “during self-paced learning only a course of difficulty and an age parameter $\lambda$ are required” is misleading, since additional hyperparameters are introduced such as the temperature coefficient, the warm-up function, and the self-paced weight.
    • Writing clarity and formality should be improved. The contribution list starts with “considering the issue of the registration-segmentation paradigm…”.
    • It does not evaluate the performance of the proposed method for different levels of annotated data. This is a relevant experiment since the amount of annotated data is at the essence of the problem addressed in the paper
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The method description is unclear and incomplete, therefore it would be difficult for readers to reproduce the model and framework.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The framework seems to be beneficial for the proposed task, given the improvement in performance metrics with respect to prior work. However, improving its description and evaluation (particularly an ablation with respect to the amount of annotated data) would significantly improve the quality of the submission.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed framework addresses an interesting problem and seems to yield good results. However, a proper, complete description of the method is imperative for publication. Moreover, additional evaluation regarding the sensitivity with respect to the number of annotated samples would significantly strengthen the submission.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I decided to maintain my score because some of my concerns were not resolved in the rebuttal, and therefore I am not confident that this paper is ready for publication.



Review #2

  • Please describe the contribution of the paper

    This paper presents an approach for barely-supervised volumetric segmentation motivated by self-paced learning. It employs a self-distillation framework to train a segmentation model guided by pseudo labels generated by a registration module. Here, only those voxels that are confidently segmented by the model are considered in the loss calculation. Confidence (uncertainty) estimation is done in a self-paced manner. The approach also employs a contrastive loss to increase the discriminative ability of the model.
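
    As a hypothetical illustration of the masked loss this summary describes (only confidently segmented voxels contribute), the sketch below restricts a voxel-wise cross-entropy to a boolean selection mask; the names and tensor shapes are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def masked_voxel_loss(logits: torch.Tensor,
                      pseudo: torch.Tensor,
                      keep: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against pseudo labels, restricted to selected voxels.
    logits: (B, C, D, H, W); pseudo: (B, D, H, W) integer labels;
    keep: (B, D, H, W) boolean mask of confidently segmented voxels."""
    loss = F.cross_entropy(logits, pseudo, reduction="none")  # (B, D, H, W)
    return (loss * keep).sum() / keep.sum().clamp(min=1)
```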

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of self-paced voxel selection in a self-distillation segmentation framework is interesting.
    • The presented approach is able to utilize unlabelled images along with barely-labelled ones for learning.
    • Paper is easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Main weakness of the paper is the superficial treatment of certain components, e.g., in Section 2.4 it is mentioned that the ground truth for the supervised loss is obtained using a weighted fusion of the registration pseudo labels and the segmentation pseudo labels; however, the weighting scheme is not described. Similarly, in Equation 4 the cosine similarity is calculated between z^w1 and z^w2, which appear to be scalars.
    • Implementation details about the existing methods considered for comparison are missing. It is unclear whether the numbers reported for those methods are their best numbers or were obtained after a fixed amount of training.
    • The ablation study does not discuss the impact of the values set for the age parameter which governs the self-paced voxel selection.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The details of the weighting scheme for the weighted fusion of the registration pseudo labels and the segmentation pseudo labels are missing.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The ablation study should be more comprehensive, considering the impact of choices such as the age parameter.

    • The results shown in the Introduction section can be removed to create space for the implementation details of the existing approaches.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has merit; however, the missing details about the results obtained from the existing approaches raise concerns about the experimental results.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Some of my concerns are addressed in the rebuttal, while many things are promised for the appendix or the revised version. I believe that the paper has merit and keep my rating unchanged.



Review #3

  • Please describe the contribution of the paper

    Authors propose a 2-step method to improve segmentation model training when annotations are very scarce. They learn using pseudo-labels from image registration and predictions from a student-teacher model. The steps are (1) pseudo-label selection based on prediction uncertainty (Monte Carlo Dropout) and (2) separation of class-specific features by a contrastive loss. While the steps are not novel, Authors propose applying them with a ‘self-pacing’ routine. Rather than manually setting a threshold for high-quality predictions, Authors employ a weighting based on the ratio of the unlabelled loss to the training epoch. The weight determines the top fraction of pseudo-labels used in learning. The downsampled mask is also used to select class-specific features for contrastive learning in the student model. Contrastive samples are created by applying image augmentations. Authors show self-paced sample selection (SPSS) and feature separation allow the model to continue improving for longer.
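
    For readers unfamiliar with the uncertainty estimate mentioned here, the following is a minimal, hypothetical sketch of Monte Carlo Dropout for voxel-wise uncertainty; the model interface and sample count are assumptions, not details taken from the paper.

```python
import torch

def mc_dropout_uncertainty(model: torch.nn.Module,
                           x: torch.Tensor,
                           n_samples: int = 8) -> torch.Tensor:
    """Voxel-wise uncertainty via Monte Carlo Dropout: keep dropout active
    at inference and measure the variance across stochastic forward passes."""
    model.train()  # keeps dropout layers stochastic (note: also affects batch norm)
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=1) for _ in range(n_samples)])
    # Variance over samples, averaged over classes -> (B, D, H, W) uncertainty map
    return probs.var(dim=0).mean(dim=1)
```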

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Authors apply a novel training regime to alleviate the problem of extremely scarce segmentation annotations. Authors have defined the use-case for their work well, and provided source code to give others a head start in implementing the method in their own work. Authors perform a useful ablation study to demonstrate the benefit of each of their proposed sample selection and feature separation steps.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks clarity on the construction of the positive and negative samples that are used in the feature-level contrastive loss. Authors should also make note of other barely-supervised medical image segmentation methods.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The source code was bundled with the supplementary material. However, the reader should not have to resort to the code to learn implementation details such as model architecture.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Authors refer only to the Parasitic-like Network (PLN) as the SOTA barely-supervised segmentation method; however, other methods like Compete-to-Win (ComWin) have also shown promise in tackling this issue in the medical imaging domain. ComWin also relies on an uncertainty prediction from multiple networks (rather than using Dropout as in the Authors’ work). Can the Authors comment on the efficacy of their method in comparison with other BSS methods, and whether their method could be generalised to be applicable to other BSS methods?

    2. The positive and negative features selected for the contrastive loss are generated by applying weak and strong ‘perturbations’. Authors define their “strong” perturbations as the result of applying CutMix on the weakly perturbed images. It would be more appropriate to call this an augmentation rather than a mere ‘perturbation’. How did the Authors decide on this operation, rather than any other linear or non-linear augmentation strategy? Are there hyperparameters associated with this augmentation, or is the patch size fixed? There is also no discussion of what ‘weak’ perturbations are applied to the images. Does the strength of the augmentation contribute at all to the success or failure of their self-paced methodology? A more thorough explanation of this step is important given that it is the basis on which their contrastive samples are selected.

    3. Authors show results comparing models in a barely-supervised regime. However, we see only results for a consistent number of annotated slices (16 in one experiment, and 38 in another). Given that the Authors have built their method around the barely-supervised regime, it would be useful to know when the method breaks, or becomes comparable with other methods. Could the Authors comment on (1) the performance of SPSS compared with other methods as the number of annotations increases, and (2) how few annotations are needed for SPSS to continue to outperform other methods?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of self-paced sample selection is good, and clearly applicable to the problem the Authors have described. Their results appear to demonstrate its efficacy; however, more detail on its limitations would be useful.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Independent of the rebuttal, the Authors’ work demonstrates a novel approach to this issue with interesting results. The rebuttal reinforces these claims and justifies their work.




Author Feedback

We thank all the reviewers for their constructive comments. They found our work novel (R1, R5), well-organized (R1, R5), and effective (R1, R4, R5).

Problem formulation and contributions (R1, R4, R5) The state-of-the-art BSS approach, i.e., PLN, builds upon the registration-segmentation paradigm, which comprises a registration module and a teacher-student segmentation model. For a volumetric image with a single labeled slice: 1) the registration module iteratively transfers the label of the annotated slice to its two adjacent slices to produce the registration pseudo label Y_r; 2) the teacher model predicts a segmentation pseudo label Y_s; 3) the student model is trained on the weighted pseudo label Y = w * Y_s + (1 - w) * Y_r, with w gradually increasing to 1 according to a time-dependent Gaussian function.

1. New problem. We pinpoint the issue of the registration-segmentation paradigm in existing BSS methods: the noisy pseudo labels produced by registration seriously degrade the training of the segmentation model.

2. New method. We explore a new registration-segmentation-selection paradigm for BSS. Based on the weighted pseudo labels, we propose the Self-Paced Sample Selection (SPSS) mechanism to select pixels with high-quality pseudo labels in both the image and feature spaces. Additionally, we introduce a new contrastive-learning sample construction strategy based on strong-weak augmentation to increase the diversity of positive-negative sample pairs, thereby enhancing the discriminative ability of the model.

3. New findings. We validate the impact of pseudo label quality on BSS and show that our self-paced selection strategy alleviates the noise issue caused by registration. Moreover, the results indicate that only 25% of the pseudo labels are beneficial to the training of the segmentation model (cf. Fig. 1(a)).
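
The fusion step above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the exact ramp-up form is an assumption (the common exp(-5(1 - t)^2) schedule used in mean-teacher methods), and the function names are hypothetical.

```python
import numpy as np

def gaussian_rampup(epoch: int, total_epochs: int) -> float:
    """Time-dependent Gaussian ramp-up: w rises from ~0 to 1 over training.
    The exp(-5 * (1 - t)^2) form is an assumed (commonly used) schedule."""
    t = np.clip(epoch / max(total_epochs, 1), 0.0, 1.0)
    return float(np.exp(-5.0 * (1.0 - t) ** 2))

def fuse_pseudo_labels(y_seg: np.ndarray, y_reg: np.ndarray, w: float) -> np.ndarray:
    """Weighted fusion Y = w * Y_s + (1 - w) * Y_r of soft pseudo-label maps."""
    return w * y_seg + (1.0 - w) * y_reg
```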

Compared to ComWin (R1) Our BSS setting, where each labeled image has only one labeled slice, is more challenging than the BSS setting in ComWin, which is, in fact, a semi-supervised regime where all slices of each labeled image are annotated.

Self-paced uncertainty sample selection (SU) (R4) SU requires two sets of hyperparameters: 1) the age parameter, which has been investigated in the supplementary material, and 2) the self-paced course, which includes the temperature coefficient and the warm-up function. We set the temperature coefficient to 0.8; more detailed experiments will be provided in the appendix. Following PLN, we adopt the Gaussian function as the warm-up function. The self-paced weight v is not a hyperparameter; it is computed as v = 1 - L_u/λ.
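
A minimal sketch of how the self-paced weight could drive voxel selection, under assumed shapes and names (this is not the authors' implementation): v is clamped to [0, 1] and interpreted as the fraction of lowest-uncertainty voxels kept for the loss.

```python
import torch

def self_paced_weight(loss_u: torch.Tensor, lam: float) -> float:
    """Self-paced weight v = 1 - L_u / lambda, clamped to [0, 1]."""
    return float(torch.clamp(1.0 - loss_u.detach() / lam, 0.0, 1.0))

def select_confident_voxels(uncertainty: torch.Tensor, v: float) -> torch.Tensor:
    """Boolean mask keeping the fraction v of voxels with the lowest uncertainty."""
    flat = uncertainty.flatten()
    k = max(1, int(v * flat.numel()))           # number of voxels to keep
    threshold = torch.kthvalue(flat, k).values  # k-th smallest uncertainty
    return uncertainty <= threshold
```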

Self-paced bidirectional feature contrastive learning (SC) (R1, R4, R5) Weak augmentations include random geometric and pixel-level image transformations. We leverage CutMix as the strong augmentation due to its efficacy in image segmentation (CPS, CVPR 2021). We will explore the impact of different strong augmentations, e.g., MixUp and Cutout, in the appendix. Positive samples are defined as pixels with the same pseudo labels at corresponding positions in the feature maps of the two weakly-perturbed views; negative samples are defined as pixels in the strongly-perturbed feature map whose pseudo labels differ from those of the positive samples.
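
To make the pairing rule concrete, here is a hypothetical sketch that builds positive/negative masks from flattened pseudo-label maps of the two weak views and the strong view; the shapes, names, and flattening convention are illustrative assumptions.

```python
import torch

def contrastive_pair_masks(pl_w1: torch.Tensor,
                           pl_w2: torch.Tensor,
                           pl_s: torch.Tensor):
    """pl_w1, pl_w2: flattened pseudo labels of the two weakly-perturbed views, shape (N,).
    pl_s: flattened pseudo labels of the strongly-perturbed view, shape (N,).
    Positives: same pseudo label at the same position in the two weak views.
    Negatives: strong-view positions whose pseudo label differs from the anchor's."""
    pos_mask = pl_w1 == pl_w2                           # (N,) anchors forming positive pairs
    neg_mask = pl_w1.unsqueeze(1) != pl_s.unsqueeze(0)  # (N, N) negatives per anchor
    return pos_mask, neg_mask
```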

Number of labeled slices (R1, R4) We have conducted experiments on the LA and KiTS datasets with varying numbers of barely-labeled 3D images (each with a single labeled slice): 5%, 10%, 20%, 50%, and 100%. The performance increases with the number of labeled images, and our method consistently outperforms the compared methods. Due to limited space, only the results for 20% barely-labeled data are reported in the manuscript. Furthermore, we will investigate different numbers of labeled slices per barely-labeled image in the appendix.

Details of the compared methods (R5) The compared methods were implemented using their official code and following the optimal hyperparameters reported in their papers. We will specify the details in the final version.

Age parameter (R5) An ablation study on the age parameter has been reported in the appendix.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have addressed the major concerns raised by the reviewers and promised to revise the paper accordingly. Positive reviews outweigh negative opinions.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


