Abstract

Semi-supervised medical image segmentation (SSMIS) has demonstrated the potential to mitigate the issue of limited labeled medical data. However, confirmation and cognitive biases may affect the prevalent teacher-student based SSMIS methods due to erroneous pseudo-labels. To tackle this challenge, we improve the mean teacher approach and propose the Students Discrepancy-Informed Correction Learning (SDCL) framework, which includes two students and one non-trainable teacher and utilizes the segmentation difference between the two students to guide self-correcting learning. The essence of SDCL is to identify the areas of segmentation discrepancy as potential bias areas, and then encourage the model to review the correct cognition and rectify its own biases in these areas. To facilitate bias correction learning with continuous review and rectification, two correction loss functions are employed to minimize the correct segmentation voxel distance and maximize the erroneous segmentation voxel entropy. We conducted experiments on three public medical image datasets: two 3D datasets (CT and MRI) and one 2D dataset (MRI). The results show that SDCL surpasses the current state-of-the-art (SOTA) methods by 2.57%, 3.04%, and 2.34% in Dice score on the Pancreas, LA, and ACDC datasets, respectively. In addition, the accuracy of our method is very close to that of the fully supervised method on the ACDC dataset, and even exceeds that of the fully supervised method on the Pancreas and LA datasets. (Code available at https://github.com/pascalcpp/SDCL)
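
The discrepancy-driven correction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal pure-Python illustration under stated assumptions: each voxel carries a list of class probabilities, the "reference" labels stand in for whatever supervision is available (assumed here to be a pseudo-label or ground truth), the distance is a squared error to the one-hot reference, and the helper names `argmax`, `discrepancy_mask`, `correction_terms` and the weight `lam` are hypothetical.

```python
import math


def argmax(values):
    """Index of the largest entry (the hard class label for one voxel)."""
    return max(range(len(values)), key=values.__getitem__)


def discrepancy_mask(probs_a, probs_b):
    """True where the two students' hard predictions disagree."""
    return [argmax(a) != argmax(b) for a, b in zip(probs_a, probs_b)]


def correction_terms(probs, reference, mask, eps=1e-8):
    """Return (mean distance, mean entropy) for one student inside the mask.

    Distance is averaged over masked voxels the student gets right;
    entropy is averaged over masked voxels the student gets wrong.
    """
    distances, entropies = [], []
    for p, ref, in_mask in zip(probs, reference, mask):
        if not in_mask:
            continue  # only discrepancy voxels contribute
        if argmax(p) == ref:
            # Correct voxel: squared distance to the one-hot reference.
            one_hot = [1.0 if c == ref else 0.0 for c in range(len(p))]
            distances.append(sum((pc - oc) ** 2 for pc, oc in zip(p, one_hot)))
        else:
            # Erroneous voxel: Shannon entropy of the predicted distribution.
            entropies.append(-sum(pc * math.log(pc + eps) for pc in p))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(distances), mean(entropies)


# Three voxels, two classes; all numbers are made up for illustration.
probs_a = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
probs_b = [[0.8, 0.2], [0.7, 0.3], [0.3, 0.7]]
reference = [0, 1, 1]  # hypothetical reference labels

mask = discrepancy_mask(probs_a, probs_b)
distance, entropy = correction_terms(probs_a, reference, mask)
lam = 0.5  # hypothetical weight, not taken from the paper
loss_a = distance - lam * entropy  # minimizing this raises entropy on wrong voxels
```

In this toy case the students agree on the first voxel, so only the last two voxels enter the correction terms. Subtracting the entropy term is one simple way to express "maximize entropy" inside a loss that is minimized; the paper may formulate it differently.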

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0821_paper.pdf

SharedIt Link: https://rdcu.be/dZxej

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72111-3_53

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0821_supp.pdf

Link to the Code Repository

https://github.com/pascalcpp/SDCL

Link to the Dataset(s)

Pancreas dataset: https://wiki.cancerimagingarchive.net/display/Public/Pancreas-CT

Left atrium dataset: http://atriaseg2018.cardiacatlas.org

ACDC dataset: https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html

BibTex

@InProceedings{Son_SDCL_MICCAI2024,
        author = { Song, Bentao and Wang, Qingfeng},
        title = { { SDCL: Students Discrepancy-Informed Correction Learning for Semi-supervised Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {567--577}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper presents a new framework based on the classical mean teacher framework for semi-supervised medical image segmentation. The proposed framework has one teacher network and two student networks. Two correction loss functions are employed to minimize the correct segmentation voxel distance and maximize the erroneous segmentation voxel entropy. Promising results have been achieved on three publicly available datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The results are very promising. The proposed semi-supervised method can even surpass the fully supervised counterparts.
2. The design of the two loss functions is reasonable.
3. Three datasets were employed to validate the effectiveness of the proposed framework.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The framework has two student networks, which definitely increases the complexity.
2. All three datasets utilized are for organ segmentation. There is only one large target in each input image. The effectiveness of the proposed method for multiple small targets is questionable.
3. There are many hyper-parameters involved. The supplementary file shows that gamma and mu only slightly affect the results, but what about alpha and beta?
4. The authors kept stating that diversity (using different structures for the two students) is important. However, no experiments were conducted to verify this point.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Do you have any additional comments regarding the paper’s reproducibility?

No.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
1. The authors should include complexity metrics (FLOPs and parameter counts) for the different methods when reporting the results.
2. More experiments should be supplemented to properly validate the proposed framework. For example, ablation studies with different alpha and beta values and using the same structures for the two students.
3. The suitability of the proposed framework for segmentation tasks other than organ segmentation (e.g., brain tissue, brain tumor, etc.) should be discussed.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall, the results of the proposed framework are very promising. The loss designs are reasonable. With additional validations and results, the paper can be accepted.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The manuscript presents a significant contribution to semi-supervised medical image segmentation with the introduction of the Students Discrepancy-Informed Correction Learning (SDCL) framework, which effectively addresses the issue of biased pseudo-labels by utilizing student discrepancies for self-correction. The proposed method demonstrates substantial improvements over State-of-the-Art techniques and approaches the accuracy of fully supervised models, offering a promising solution for applications where labeled data is limited.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) The method design with prediction bias areas is based on the property of SSL, so that it is well motivated.
2) The experiments conducted are very thorough, with ten comparative methods and three public datasets, which can demonstrate the effectiveness of the proposed method.
3) The authors propose a method with multiple student networks and use different student networks to promote diversity.
4) The authors have designed an innovative method to optimize bias correction learning, which effectively reviews correct cognition and rectifies error biases by examining the discrepancies in the predictions from two distinct student networks.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) The related work should be reviewed more thoroughly. For instance, the use of different architectures has also been explored in the semi-supervised domain, as seen in [1]. It is necessary to clarify how the current approach differs from these existing works.
2) This paper is based on BCP; however, it would be beneficial to conduct an ablation study without BCP, since the core of this work is Discrepancy-Informed Correction, which is independent of BCP.
3) The reason why the proposed method can outperform fully supervised approaches needs to be explained, especially since it shows a significant improvement of over 2 points on the Pancreas-CT dataset.
4) The text in the formulas within Figure 1 is too small and needs to be enlarged for better readability.

[1] Luo, Xiangde, et al. “Semi-supervised medical image segmentation via cross teaching between CNN and Transformer.” International Conference on Medical Imaging with Deep Learning. PMLR, 2022.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

The overall framework is relatively novel, and the experiments are very comprehensive, but there are some issues that need to be addressed, such as the related literature should be better reviewed, and the ablation study could be more thorough.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Weak accept. The overall framework is relatively novel, and the experiments are very comprehensive.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The proposed SSL method, called SDCL, aims to tackle the confirmation and cognitive biases that affect mean-teacher-based approaches. The framework includes two students and one non-trainable teacher, and uses the segmentation difference between the two students to guide self-correcting learning. In addition, two correction loss functions are used to minimize the correct segmentation voxel distance and maximize the erroneous segmentation voxel entropy.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. SDCL addresses confirmation and cognitive biases through two student networks and two correction loss functions.
2. The experiment was reasonable and sufficient.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

For mean-teacher-based methods updated by EMA, one reason for erroneous pseudo-labels is that the teacher network accrues bias from the student network. BCP has already addressed this issue by reducing the distribution gap between labeled and unlabeled data through a localized copy-and-paste between them, which may reduce the biases that occur in the student network. Does SDCL improve on this further?
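
The EMA update the reviewer refers to can be sketched as follows. This is the generic mean-teacher rule, not code from the SDCL repository; the weights are plain floats standing in for tensors, and the function name `ema_update` and the decay values are illustrative assumptions. Because the teacher is only a running average of student weights, any systematic student error flows into it over time.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight.

    teacher, student: dicts mapping a parameter name to a float
                      (stand-ins for real weight tensors).
    decay:            EMA momentum; 0.99 is an illustrative value only.
    """
    for name, s_val in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * s_val
    return teacher


# With decay 0.5 the teacher closes half of the remaining gap per step.
teacher = {"w": 0.0}
student = {"w": 1.0}
history = []
for _ in range(3):
    ema_update(teacher, student, decay=0.5)
    history.append(teacher["w"])
```

The teacher receives no gradient of its own in this scheme, which is why it cannot correct a bias the student has already absorbed.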
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

Please provide more details for the questions mentioned in limitations (section 6).
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is generally understandable, the research topic is valuable, and the experimental results are convincing. However, its technical aspects still need to be debated.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

We thank the meta-reviewer for the early accept decision. We are deeply thankful to the reviewers for their valuable feedback and constructive criticism. Their positive assessments of our paper as reasonable (R1), well motivated (R3), and Very Good (R4) are truly encouraging. We are pleased that our approach is recognized as very promising (R1), our experiments as very comprehensive (R3), and the research topic as valuable (R4). We now present detailed responses addressing the main concerns raised.

Q1: Should include the complexity metrics of different methods. (R1)
A1: Due to space limitations, we could not discuss the complexity metrics in detail. We will evaluate the model from a more comprehensive perspective in future work.

Q2: Ablation studies with different alpha and beta values. Experiments using the same structure for the two students, and a discussion of diversity. (R1)
A2: These two parameters are thoroughly discussed in BCP, so we did not repeat the study. Regarding model diversity, current semi-supervised methods typically rely on consistency learning under various perturbations. Network perturbations usually involve building the networks with different architectures or initializations. Here, we used different model structures, which essentially represent different perturbations. Our approach focuses on correction learning based on prediction differences.

Q3: Discuss the performance on other segmentation tasks, such as multiple small target segmentation tasks. (R1)
A3: Evaluating our framework on tasks such as brain tumor segmentation and other small target segmentation tasks is highly valuable. However, due to space limitations, we chose mainstream semi-supervised task datasets for evaluation. We will extend our framework to more diverse datasets in future work.

Q4: Review related work, such as [1], and conduct ablation studies without BCP. (R3)
A4: We will add a discussion of [1] in the final version. The method in [1] uses differences in the learning paradigms of different network structures to achieve perturbation effects, followed by consistency learning through cross teaching. Our approach is based on correction learning from the students' prediction differences. Due to space and conference limitations, we will improve our framework in future work, demonstrating its effectiveness under the Mean Teacher framework and in combination with other methods.

Q5: Reasons for outperforming fully supervised methods and issues regarding Figure 1. (R3)
A5: We believe that reducing the empirical distribution gap is crucial for semi-supervised learning. However, the inherent bias in BCP can severely affect performance, so we chose BCP to demonstrate our framework's performance. By effectively utilizing unlabeled data combined with correction learning, we achieve superior performance. When drawing the figure, we structured it to help readers understand the method. Increasing the size of the formula text might significantly alter Figure 1 and make it harder to understand.

Q6: About the bias issue addressed by BCP and improvements with SDCL. (R4)
A6: The advantage of BCP lies in reducing the empirical distribution gap, so that the knowledge learned from labeled data is well preserved. However, BCP has issues with learning local attributes and with low-contrast target segmentation, which are potential bias areas; this is a drawback of BCP. SDCL efficiently solves these bias problems through correction learning based on prediction differences.

We sincerely appreciate the reviewers' valuable feedback and will carefully integrate their suggestions to improve our work. We highly value and are grateful for their time and effort.
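
The localized copy-and-paste between labeled and unlabeled data that the reviews and A6 attribute to BCP can be pictured with a toy mask-based mix. This is only a sketch under assumptions: a 2D image as nested lists, a single rectangular box, and hypothetical helper names `box_mask` and `copy_paste`. The actual mask shape, pairing of images, and loss weighting used by BCP are not described in this text.

```python
def box_mask(height, width, top, left, box_h, box_w):
    """Binary mask that is 0 inside the box and 1 everywhere else."""
    return [
        [0 if top <= r < top + box_h and left <= c < left + box_w else 1
         for c in range(width)]
        for r in range(height)
    ]


def copy_paste(background, foreground, mask):
    """Keep `background` where mask is 1 and paste `foreground` where mask is 0."""
    return [
        [b if m else f for b, f, m in zip(b_row, f_row, m_row)]
        for b_row, f_row, m_row in zip(background, foreground, mask)
    ]


# Toy 4x4 "images": 1 marks labeled pixels, 0 marks unlabeled pixels.
labeled = [[1] * 4 for _ in range(4)]
unlabeled = [[0] * 4 for _ in range(4)]
mask = box_mask(4, 4, top=1, left=1, box_h=2, box_w=2)

# Mixing in both directions gives two training images from one pair.
unlabeled_into_labeled = copy_paste(labeled, unlabeled, mask)
labeled_into_unlabeled = copy_paste(unlabeled, labeled, mask)
```

Each mixed image contains both labeled and unlabeled content, which is the sense in which such mixing narrows the gap between the two data distributions.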

Meta-Review

Meta-review not available, early accepted paper.

SDCL: Students Discrepancy-Informed Correction Learning for Semi-supervised Medical Image Segmentation

Author(s):