Abstract

Semi-supervised learning has received considerable attention for its potential to leverage abundant unlabeled data to enhance model robustness. Despite the widespread adoption of pseudo labeling in semi-supervised learning, existing methods often suffer from noise contamination, which can undermine the robustness of the model. To tackle this challenge, we introduce a novel Synergy-Guided Regional Supervision of Pseudo Labels (SGRS-Net) framework. Built upon the mean teacher network, we employ a Mix Augmentation module to enhance the unlabeled data. By evaluating the synergy before and after augmentation, we strategically partition the pseudo labels into distinct regions. Additionally, we introduce a Region Loss Evaluation module to assess the loss across each delineated area. Extensive experiments conducted on the LA, Pancreas-CT, and BraTS2019 datasets have demonstrated superior performance over current state-of-the-art techniques, underscoring the efficiency and practicality of our framework. The code is available in the Supplementary Material.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1721_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/ortonwang/SGRS-Net

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WanTao_SynergyGuided_MICCAI2025,
        author = { Wang, Tao and Zhang, Xinlin and Chen, Yuanbin and Zhou, Yuanbo and Zhao, Longxuan and Tan, Tao and Tong, Tong},
        title = { { Synergy-Guided Regional Supervision of Pseudo Labels for Semi-Supervised Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {535 -- 545}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a synergy-guided regional supervision of pseudo labels framework, named SGRS-Net, for semi-supervised medical image segmentation. SGRS-Net is built upon a mean-teacher paradigm, with a main focus on improving pseudo labels by partitioning them into three regions, i.e., disregarded, consistent, and inconsistent. The proposed method was evaluated on the LA, Pancreas-CT, and BraTS datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The region-based pseudo label partition seems new compared with previous methods that only consider high-confidence and low-confidence regions.

    2. The method is easy to follow and effective according to the results.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The motivation could more precisely target specific limitations in current semi-supervised approaches, rather than the task-wise challenge (i.e., the noisy pseudo labels).

    2. While the regional pseudo-label partitioning is an interesting contribution, the overall framework relies heavily on existing techniques (e.g., the mean-teacher paradigm and mix-up perturbation), which may limit the scope of novelty.

    3. Regarding the technical implementation, the formulation in Eq. (2) appears problematic. Directly adding images with modifying (pseudo) labels may violate segmentation equivalence, as the pixel-label correspondence could be compromised during the transformation.

    4. Regarding Eq. (13) and Eq. (14), simply introducing smoothing parameters does not necessarily guarantee robust loss functions.

    5. The brain tumor segmentation task is a four-class segmentation problem, yet the authors appear to have simplified it to binary classification?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Did the authors reduce the brain tumor segmentation to a binary classification task?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed framework depends substantially on existing techniques without a sufficiently clear motivation. The manuscript would benefit from focusing more explicitly on the specific limitations of previous methods. Please refer to the weaknesses for more details.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    1) Motivation of the paper: The quality of pseudo labels is a general issue in semi-supervised learning (semi-supervised medical image segmentation). As mentioned in my previous comment, the paper should clearly specify the limitations of prior studies in handling noisy pseudo labels.

    2) Issue in Eq. (2): The authors may refer to MixUp, where the operation is applied to both the image and the corresponding label. Transformation equivalence should be considered in semantic segmentation.

    3) Experiment on BraTS: Simplifying the dataset reduces the persuasiveness of the work.



Review #2

  • Please describe the contribution of the paper

    This paper proposed a Synergy-Guided Regional Supervision of Pseudo Labels to address noise contamination of pseudo labelling methods in semi-supervised learning.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors introduced an adaptation of the mean teacher network and mix augmentation to maintain data consistency and proposed a region loss evaluation module to assess loss across delineated areas to mitigate the impact of noise while fully capitalising on the supervisory signal provided by pseudo labels. The authors conducted extensive experiments on three datasets covering two medical imaging modalities and showed superior performance over state-of-the-art. The paper is well organized in terms of the paper structure.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The authors have not explained how their mix-up augmentation differs from recent works, e.g., PolypMixNet, or from other augmentation methods, e.g., CutMix and Puzzle Mix. Why the authors chose mix-up is unclear and not well explained. Would mixing up cause hallucinated details in input images? One of the major claims of this work is that the proposed SGRS-Net is introduced to overcome noise contamination, which can undermine the model’s robustness, as per the abstract. However, there’s no empirical evaluation of the proposed model’s robustness against recent works. It would have been better if the proposed method had been compared with the most recent (2024 & 2025) augmentation-based semi-supervised learning methods. BraTS 2019 is a multi-class segmentation task, so showing tumour class-wise performance would be better.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method’s novelty is limited, yet it is superior performance-wise. The authors’ major claims, such as how the model reacts to noise artefacts and model robustness, are not discussed and evaluated in the experimental section. If the authors can address why this claim hasn’t been addressed, I will gladly upgrade my overall score.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The question about the model’s robustness remains the same even though the authors mentioned a few references. Since robustness is one of their major claims (mentioned in the abstract as well), I believe this needs to be properly addressed and explained in the manuscript with some results. Therefore I keep my overall score as Reject.



Review #3

  • Please describe the contribution of the paper

    They propose SGRS-Net, a novel semi-supervised medical image segmentation framework featuring Pseudo Label Generation, Mix Augmentation, Synergy Evolution (SE), and Regional Loss Evaluation (RLE) modules, which significantly improves segmentation accuracy on public datasets with limited labeled data, outperforming state-of-the-art methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes the Synergy Evolution (SE) module, which leverages information entropy to distinguish between regions, followed by the Regional Loss Evaluation (RLE) module to impose loss constraints on regional partitions. Extensive experiments demonstrate its superior performance.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The manuscript lacks benchmarking against state-of-the-art methods (particularly 2024 publications), which weakens the persuasiveness of its claimed performance advantages.

    2. The paper provides no theoretical or empirical basis for selecting the τ (tau) parameter within the narrow 0.29-0.30 range. Three critical questions remain unaddressed: (i) Would different network architectures require adjusted τ values? (ii) What methodology was used to initially determine this specific range? (iii) Are there systematic strategies to efficiently identify optimal τ ranges when adapting the framework to new architectures?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed semi-supervised framework with region-wise constraints demonstrates innovation, but its persuasiveness is somewhat limited by insufficient comparative methods and inadequate justification for parameter selection.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors compared their method with the latest SOTA and addressed the questions regarding Tau-related settings. I believe the paper is now acceptable for publication.




Author Feedback

To R1: Q1: Mix-Up vs PolypMixNet. Our motivation is to adopt a perturbation strategy and assess prediction consistency, which forms the basis of our SE&RLE design. We also tried flipping and rotation, but Mix-Up performed slightly better. Its simplicity and global operation make it well suited to our method. In contrast, CutMix modifies partial regions and PolypMixNet focuses on lesion areas, which may disrupt global pixel-level consistency during synergy evaluation. A discussion of the differences from PolypMixNet will be added in the revision.

Q2: Hallucinated details? Although Mix-Up may introduce mild artifacts, their impact is minimal and can even improve performance. With Beta-distributed coefficients (λ) typically near 1, the dominant image retains its semantics, and the secondary image contributes only slight perturbations. These perturbations act as implicit regularization, mitigating overfitting and enhancing robustness. A prior study [23] shows that moderate noise artifacts can improve both adversarial robustness and generalization. Mix-Up preserves global semantic integrity while improving adaptability. Moreover, the SE&RLE modules further reduce the impact of extreme artifacts by selectively evaluating regional loss. The ablation results in Table 3 confirm that Mix-Up improves performance. This issue was omitted due to space constraints and will be addressed in the revision.
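
The argument that a λ close to 1 leaves the dominant image largely intact can be illustrated with a minimal NumPy sketch. The function name `mix_up`, the Beta parameter `alpha=0.4`, and the `max(lam, 1 - lam)` step that forces λ to be at least 0.5 are illustrative assumptions; they are not taken from the paper, its Eq. (2), or the released code.

```python
import numpy as np

def mix_up(x_main, x_aux, alpha=0.4, rng=None):
    """Blend two same-shaped image volumes and return (mixed, lam)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)      # Beta-distributed mixing coefficient
    lam = max(lam, 1.0 - lam)         # assumption: keep x_main dominant
    mixed = lam * x_main + (1.0 - lam) * x_aux
    return mixed, lam

# Toy volumes standing in for two unlabeled scans (random data, not real images).
rng = np.random.default_rng(0)
x_a = rng.random((4, 8, 8))
x_b = rng.random((4, 8, 8))
mixed, lam = mix_up(x_a, x_b, rng=rng)
```

Under these assumptions the mixed volume is a convex combination of the two inputs, so every voxel stays between the corresponding voxels of `x_a` and `x_b`, and the secondary image contributes at most half of the signal.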

Q3: Additional comparison. We reference results in “Mutual Learning with Reliable Pseudo Labels for Semi-Supervised Medical Image Segmentation” (MIA2024), and “Uncertainty Co-estimator for Improving Semi-Supervised Medical Image Segmentation” (TMI2025). Under the 10% labeled data, comparisons of Dice scores on LA, Pancreas-CT (PA) and BraTS2019 (Br) datasets are summarized:

Method    LA       PA       Br
MIA       89.86    75.93    84.29
TMI       90.37    78.53    85.09
Ours      90.76    80.55    85.67

Our method shows improved results. These will be included in the revision.

Q4: BraTS2019. For consistency, the dataset was converted to a binary segmentation task using the preprocessed data provided by URPC; all comparative experiments were performed on this version.

To R2: Q1: Pseudo-labeling is a widely used strategy, but the noise it introduces arises from the method itself rather than the task-wise challenge. Our motivation is to mitigate the adverse effects of this noise.

Q2: Our core innovations are the SE&RLE modules, which build upon existing techniques: mean-teacher and Mix-Up. To the best of our knowledge, we are the first to apply this strategy to mitigate pseudo-label noise.

Q3: See R1 Q2

Q4: Smoothing parameters help curb overfitting, and we do not rely on them alone. Instead, they are integrated with our SE&RLE modules, which are specifically designed to mitigate the impact of noise in pseudo-labels. Comparative experiments and ablation studies further validate the effectiveness of our approach.

Q5: See R1 Q4

To R3: Q1: See R1 Q3

Q2: Although the variation in tau is small, it leads to a more substantial change in confidence values due to the non-linear relationship between entropy and confidence. For instance, tau=0.290 corresponds to a confidence interval of 0.39–0.61, whereas tau=0.296 results in a range of 0.42–0.58.
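
The correspondence between tau and the confidence interval quoted above can be checked numerically. The sketch below assumes binary entropy computed with a base-10 logarithm; the rebuttal does not state the base, and this is only an inference from the fact that a confidence of 0.4 maps to roughly 0.292 in that base. The helper names `binary_entropy10` and `confidence_interval` are hypothetical.

```python
import math

def binary_entropy10(p):
    """Binary entropy of foreground probability p, using base-10 logs (assumed)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log10(p) + (1.0 - p) * math.log10(1.0 - p))

def confidence_interval(tau, tol=1e-9):
    """Return (lo, hi) such that the entropy exceeds tau for lo < p < hi."""
    if not 0.0 < tau < math.log10(2.0):
        raise ValueError("tau must lie in (0, log10(2))")
    lo, hi = 0.0, 0.5  # entropy increases monotonically on [0, 0.5]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_entropy10(mid) < tau:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    return p, 1.0 - p  # the interval is symmetric around 0.5

for tau in (0.290, 0.292, 0.296):
    low, high = confidence_interval(tau)
    print(f"tau={tau:.3f} -> confidence {low:.2f} to {high:.2f}")
```

Because the entropy curve is nearly flat around p = 0.5, a change of 0.006 in tau moves the interval boundary by several hundredths in confidence, which is the non-linear effect the response describes.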

(i) Different network architectures do not require adjusted tau values, as tau operates solely on the predictions.

(ii) We initially selected tau from a confidence range near the decision boundary (0.4-0.6, which maps to tau = 0.292) by converting confidence to entropy using Eq. (4). Based on the ablation study on the LA dataset, we selected tau = 0.296 for the experiments on all three datasets. According to the same ablation study, tau = 0.292 also shows excellent performance.

(iii) The function H(P) = Entropy(P)/log(n) normalizes entropy to [0,1] for consistent interpretation across tasks. For an n-class task, tau(n) can be derived from the relation tau(n)/log(n) = 0.296/log(2). Accordingly, an initial threshold for n-class segmentation can be set as tau(n) = 0.296/log(2)*log(n).
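
The scaling rule in (iii) can be written out as a short sketch. The function names `tau_for_classes` and `normalized_entropy` are hypothetical, and the default `tau_binary=0.296` is the value quoted in the response. The ratio log(n)/log(2) does not depend on the logarithm base, so the natural logarithm is used here for convenience.

```python
import math

def tau_for_classes(n_classes, tau_binary=0.296):
    """Scale the binary threshold so that tau(n)/log(n) == tau_binary/log(2)."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    return tau_binary * math.log(n_classes) / math.log(2)

def normalized_entropy(probs):
    """H(P) = Entropy(P) / log(n), which lies in [0, 1] for n classes."""
    n = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(n)

# Initial thresholds for a binary task and a hypothetical four-class task.
tau_2 = tau_for_classes(2)
tau_4 = tau_for_classes(4)
```

For a four-class problem this rule doubles the binary threshold, since log(4)/log(2) = 2; whether that starting point remains a good choice in practice would still need an ablation on the target dataset.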




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Many concerns raised by the reviewers are not well addressed, including the model’s robustness and the unclear motivation.


