Abstract

While semi-supervised learning (SSL) has demonstrated remarkable success in natural image segmentation, tackling medical image segmentation with limited annotated data remains a highly relevant and challenging research problem. Many existing approaches rely on a shared network for learning from both labeled and unlabeled data, and face difficulties in fully exploiting the labeled data due to interference from unreliable pseudo-labels. Additionally, they suffer from degradation in model quality resulting from training with unreliable pseudo-labels. To address these challenges, we propose a novel training strategy that uses two distinct decoders: one for labeled data and another for unlabeled data. This decoupling enhances the model's ability to fully leverage the knowledge embedded within the labeled data. Moreover, we introduce an additional decoder, referred to as the "worst-case-aware decoder," which indirectly assesses the potential worst-case scenario that might emerge from pseudo-label training. We train the encoder adversarially to learn features that avoid this worst-case scenario. Our experimental results on three medical image segmentation datasets demonstrate that our method improves over state-of-the-art techniques by 5.6% to 28.10% in terms of Dice score. The source code is available at \url{https://github.com/thesupermanreturns/decoupled}.
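As a rough illustration of the decoupled design described in the abstract, the sketch below mimics a forward pass with a shared encoder and three decoders in NumPy. Everything here is an assumption for illustration: the linear maps, shapes, cross-entropy losses, and the way pseudo-labels are produced are not taken from the paper, which uses full segmentation networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "networks": a shared encoder and three decoders, each a plain linear
# map. These stand in for the paper's actual segmentation networks.
enc = rng.standard_normal((8, 4))        # shared encoder
dec_lab = rng.standard_normal((4, 2))    # decoder trained on labeled data
dec_unlab = rng.standard_normal((4, 2))  # decoder trained on pseudo-labeled data
dec_worst = rng.standard_normal((4, 2))  # worst-case-aware decoder

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

x_lab = rng.standard_normal((3, 8))    # labeled batch
y_lab = np.array([0, 1, 0])
x_unlab = rng.standard_normal((5, 8))  # unlabeled batch

z_lab, z_unlab = x_lab @ enc, x_unlab @ enc  # shared features

# Decoupling: the labeled loss only flows through dec_lab, and the
# pseudo-label loss only through dec_unlab, so noisy pseudo-labels never
# touch the decoder that learns from clean annotations.
loss_lab = cross_entropy(softmax(z_lab @ dec_lab), y_lab)
pseudo = softmax(z_unlab @ dec_lab).argmax(axis=1)  # pseudo-labels from the "clean" decoder
loss_unlab = cross_entropy(softmax(z_unlab @ dec_unlab), pseudo)

# Worst-case-aware adversarial term: the worst-case decoder would ascend
# this loss (seeking the worst outcome on pseudo-labels) while the shared
# encoder descends it, e.g. via a gradient-reversal-style update.
loss_worst = cross_entropy(softmax(z_unlab @ dec_worst), pseudo)
```

The key point the sketch shows is the routing: pseudo-label noise is confined to `dec_unlab` and the adversarial `dec_worst`, while `dec_lab` sees only ground-truth labels.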

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2247_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2247_supp.pdf

Link to the Code Repository

https://github.com/thesupermanreturns/decoupled

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Das_Decoupled_MICCAI2024,
        author = { Das, Ankit and Gautam, Chandan and Cholakkal, Hisham and Agrawal, Pritee and Yang, Feng and Savitha, Ramasamy and Liu, Yong},
        title = { { Decoupled Training for Semi-supervised Medical Image Segmentation with Worst-Case-Aware Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    A method is proposed for semi-supervised medical image segmentation. The paper has two main contributions. One, in order to prevent the unreliability of pseudo-labels from deteriorating model performance, the paper proposes to use a shared encoder but two different decoders: one for labeled and another for unlabeled images. Two, in order to avoid a so-called worst-case scenario where the model learns only from labeled data while misclassifying unlabeled data, a third decoder is trained adversarially; that is, the decoder seeks to misclassify unlabeled images, while the shared encoder seeks to classify them correctly to match the pseudo-labels. Experiments are done on two prostate MRI datasets and one abdominal MRI dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Semi-supervised learning continues to be a popular problem in medical image segmentation, and this paper seeks to tackle an important issue of unreliability of pseudo-labels in this setting.

    2. The idea of using a different decoder for the weak and strong augmentations is an interesting approach for preventing instability arising from unreliability of predictions for strong augmentations.

    3. The paper has experiments on multiple datasets, and several ablation studies are provided.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Several aspects of the adversarial learning “worst-case decoder” are unclear: a. What are the distributions that the adversarial learning is seeking to match? b. In equation 4, should it be that the encoder tries to minimize both loss terms, while the decoder tries to minimize one and maximize the other? If so, the signs of both loss terms should be positive for the encoder’s optimization, and positive for one loss term and negative for the other loss term for the decoder’s optimization.

    2. It is unclear why two strong augmentations are used instead of just one. a. It looks like the two strong augmentations do not interact with one another in any of the loss terms. Is this correct? b. Is there any difference between the two strong augmentations? c. Along the same lines, why does the model performance drop when either of the strong-augmentation losses is dropped (results in Table 2)? d. If the two strong augmentations are sampled from the same distribution of transformations and do not interact with one another in any single loss term, I do not see why this should be the case. This seems to indicate that the optimization details (weights for the different loss terms, optimizer details, etc.) could be tweaked to get the same performance from just one strong-augmentation loss. For the same reasons, I am not surprised by the results in Table 4, which show that adding more strong-augmentation terms makes no difference to the performance.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be helpful if the code and details of the training / test splits (subject IDs) were provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. What is the value of the threshold eta?

    2. Following details of experimental results should be provided: a. Results over multiple runs with different initializations. b. Box plots / violin plots over all test subjects rather than just mean results. c. Statistical significance tests.

    3. Why are different percentages of training data used for different datasets?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Methodological novelty is limited
    2. The role of adversarial learning in the proposed setting is not well motivated, nor clearly described.
    3. Experimental details are missing (statistical significance tests, standard deviations / box plots for results).
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I have increased my rating from 2 to 3, in view of the promise to add standard deviations of results and statistical significance tests.

    However, I still believe that the methodological novelty in the paper is highly limited. In particular, I am still unconvinced about the need for multiple strong augmentations that do not interact with one another. Multiple augmentations are used in contrastive learning because they explicitly interact with one another in the loss function. This is not the case in the loss function used in this paper.



Review #2

  • Please describe the contribution of the paper

    This paper solves two issues in semi-supervised segmentation. The paper is easy to follow.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper solves two issues in semi-supervised segmentation. The paper is easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Why does the proposed idea work? Please give more details. How about comparing the proposed method with SAM?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The reproducibility cannot be verified.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Positive: This paper solves two issues in semi-supervised segmentation. The paper is easy to follow. Negative: Why does the proposed idea work? Please give more details. How about comparing the proposed method with SAM?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is easy to follow. The novelty is not enough. Moreover, the source code was not provided. Hence, the paper is suggested as borderline.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a novel SSL method that integrates three learning strategies, utilizing labeled data, unlabeled data, and an encoder to mitigate potential misclassification issues associated with the use of unlabeled data. This hybrid model, equipped with a worst-case-aware module, demonstrates outstanding performance compared to several existing methods. These results are relevant in clinical practice because of the well-known problem of limited amounts of labeled data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed architecture is coherent with the problems of working with both kinds of data, labeled and unlabeled. The method was evaluated on different datasets, leading to a clear improvement across all these scenarios.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method is difficult to follow, starting from the figure, which could be improved. Also, the formulation of the model in the text should be better stated: the equations are too large and could easily be simplified with different mathematical notation, for example for the output of each encoder.

    The choice of weak and strong augmentations is not fully elucidated: a rotation of a prostate image is actually a strong augmentation because of the anatomical relevance in designing CAD tools from prostate imaging; even considering the entire MRI, such an augmentation will be more challenging than considering only the gland, for example.

    In the results, it is not clearly stated which of the SOTA methods are the ones that have proposed the use of pseudo-labels.

    The presentation of Figure 3 and Tables 2-4 is not clear; for instance, what do the headers 3 and 7 mean? Maybe it would be better to use the percentages that 3 and 7 refer to.

    Actually, for this kind of organ segmentation problem, the amount of data the authors studied represents very few samples (just 20 samples for 13%), whereas current datasets have more than one thousand images. It is therefore important to mention the robustness of the method to data from multiple sources, for example.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    A few suggestions regarding the way the data is presented in the tables: choose more appropriate header names. Also, the discussion of the relevance of the work should be more than just working with a very small amount of data. The equations can be simplified by reducing the number of parentheses, which make it hard to see which parameters the functions are applied to.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel and appropriate for the formulated problem, and the results show the significant impact of the method when compared with other SOTA approaches.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the Reviewers R3, R4, and R5 for their valuable feedback. All suggested changes and additional details will be included in the final draft.

1. Reproducibility (R3, R4, R5): Well-documented code, data splits, and trained models will be made publicly available within one week of paper acceptance.

2. Why the method works (R3): Our method works due to a novel decoupled training approach: (i) improving the quality of pseudo-labels by separating pseudo-label generation from training, and (ii) increasing tolerance to incorrect pseudo-labels with our worst-case-aware decoder.

3. Comparison with SAM (R3): For a fair comparison, we implemented SAM for medical imaging in three settings: (i) fine-tuned SAM solely on the limited labeled data available in our semi-supervised setting from the target dataset, (ii) fine-tuned SAM on both labeled and unlabeled images using a pseudo-labeling semi-supervised approach, and (iii) combined SAM with our proposed decoupled training, worst-case-aware decoder, and adversarial training. While (ii) improved performance over (i), it still lagged behind our method. Finally, (iii) achieved the best performance, surpassing our method alone, highlighting the complementary nature of our method and SAM.

4. Distribution match (R4): The adversarial decoder's output distribution matches the pseudo-labels for unlabeled data and the actual labels for labeled data.

5. Adversarial learning loss (R4): The loss term operates in the same manner as highlighted by R4.

6. On strong augmentations (R4): a) We enforce the strong augmentations to be close to a shared weak view, minimizing their distance and promoting interaction; the strong decoder's loss integrates the losses from both augmentations. b) The two augmentations differ due to the non-deterministic (random) sampling of transformations, which ensures they are not identical. c), d) Using two augmentations enhances feature-space exploration, and dropping one leads to a performance decline. This aligns with the principles of contrastive learning, fostering discriminative representations. Continued addition of views saturates performance without introducing new information.
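The consistency scheme the authors describe in point 6, combined with the confidence threshold from point 7, can be sketched as follows. This is a minimal NumPy illustration of a confidence-masked consistency loss with a shared weak target, not the authors' actual loss: the logit values, masking rule, and per-sample (rather than per-pixel) structure are all assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Weak-view logits: three confident predictions and one uncertain one.
logits_weak = np.array([[6., 0., 0.],
                        [0., 6., 0.],
                        [0., 0., 6.],
                        [1., 1., 1.]])
rng = np.random.default_rng(1)
logits_s1 = rng.standard_normal((4, 3))  # predictions on strong view 1
logits_s2 = rng.standard_normal((4, 3))  # predictions on strong view 2

p_weak = softmax(logits_weak)
pseudo = p_weak.argmax(axis=1)           # shared pseudo-label target
mask = p_weak.max(axis=1) > 0.95         # eta = 0.95, as stated in the rebuttal

def masked_ce(logits, labels, mask):
    # Cross-entropy averaged over the confident (masked) samples only.
    p = softmax(logits)
    ce = -np.log(p[np.arange(len(labels)), labels])
    return (ce * mask).sum() / max(mask.sum(), 1)

# Each strong view is pulled toward the same weak pseudo-labels; the two
# strong-view terms are summed, so the views interact only through the
# shared weak target rather than directly with each other.
loss = masked_ce(logits_s1, pseudo, mask) + masked_ce(logits_s2, pseudo, mask)
```

This makes R4's concern concrete: the two strong terms are structurally symmetric and coupled only via the weak pseudo-labels, unlike contrastive losses where views appear jointly inside a single term.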

7. On experiment details (R4): We included a threshold ablation in Fig. 3, with \eta set to 0.95. The results provided are the average of 3 runs. We will add the Friedman test in the final version.

8. Different % of training data (R4): We varied the training-data percentage following common practice in the semi-supervised literature, where it typically depends on the dataset size.

9. Novelty (R4): Most methods use shared architectures, leading to lower-quality pseudo-labels and decreased model quality. In contrast, we introduce a novel decoupled approach, separating pseudo-label generation from training to preserve pseudo-label quality, and propose a worst-case-aware decoder to address unreliable pseudo-labels. This results in higher-quality pseudo-labels, enhanced tolerance to inaccuracies, and better performance.

10. Comments on figures, equations, and tables (R5): We will address these in the final draft.

11. On weak and strong augmentations (R5): We agree that rotation may introduce a strong change in some of our datasets due to their specific nature (anatomy). However, compared to stronger augmentations such as the CutMix we employ, rotation can be considered a relatively weak augmentation, since the corresponding ground truth/pseudo-label also goes through the equivalent transformation. Note that we do NOT tweak augmentations or training parameters for individual datasets, and use identical hyperparameters across the different datasets.

12. Existing pseudo-label-based methods (R5): SSNet (MICCAI 2022) and DCNet (MICCAI 2023) are pseudo-label-based SOTA methods.

13. On the small number of training samples (R5): We use ProstateX and PROMISE12. ProstateX has very few images, while PROMISE12 has ~1000 images. In line with works in the semi-supervised literature, which demonstrate the efficacy of models with less supervised data, we also use a reduced percentage of data for our experiments.

14. Minor comments on tables, equations, and discussion (R5): These will be addressed in the final draft.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


