Abstract
Semi-supervised learning effectively addresses the lack of annotations in medical images, but its performance remains inadequate for complex backgrounds and challenging tasks. Multi-modal fusion methods can greatly enhance the accuracy of medical image segmentation by offering complementary information; however, they struggle to achieve notable gains under semi-supervised conditions because they cannot effectively exploit unlabeled data. To address these issues, we propose a novel semi-supervised multi-modal medical image segmentation approach, which leverages complementary multi-modal information to enhance performance even with scarce labeled data. Our framework employs a multi-stage multi-modal fusion and enhancement strategy to fully utilize complementary multi-modal information while reducing feature discrepancies and enhancing feature sharing and alignment. Furthermore, our framework introduces contrastive mutual learning to constrain prediction consistency across modalities, thereby improving the robustness of segmentation results in semi-supervised tasks. Experimental results on multi-modal datasets demonstrate the superior performance and robustness of the proposed framework, establishing its valuable potential for solving medical image segmentation tasks in complex scenarios. The code is available at: https://github.com/DongdongMeng/SMMS.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3439_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/DongdongMeng/SMMS
Link to the Dataset(s)
BraTS2019 dataset: https://www.med.upenn.edu/cbica/brats-2019/
BibTeX
@InProceedings{MenDon_SemiSupervised_MICCAI2025,
author = { Meng, Dongdong and Li, Sheng and Wu, Hao and Wang, Guoping and Yan, Xueqing},
title = { { Semi-Supervised Multi-Modal Medical Image Segmentation for Complex Situations } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15967},
month = {September},
pages = {491 -- 501}
}
Reviews
Review #1
- Please describe the contribution of the paper
The authors present a semi-supervised multi-modal medical image segmentation approach. The proposed design introduces three innovations: (1) a multi-stage feature fusion strategy, (2) a modality-aware feature enhancement, and (3) a mutual learning objective. The feature fusion consists of multi-scale layers in which features from each modality go through concatenation, convolution, and activation. The modality-aware enhancement applies additional weights to each modality independently after all the fusion layers, with the aim of adaptively adjusting each modality's contribution. Finally, the mutual learning objective consists of a supervised objective that combines the loss from each modality's prediction and an unsupervised objective that compares pseudo-labels from one modality to the other.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Overall, the method is sound; the three enhancements to better fuse and align modalities are sensible, albeit not particularly innovative. The results do show an improvement in all but one experiment, and the ablation study supports the usefulness of all three components.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The main weakness of this paper lies in its limited innovation and somewhat lacking detail. All three components make sense, but they are adaptations of standard techniques. The fusion layer is described as “a concatenation operation, two convolutional layers, and a non-linear activation function”. What are the convolution parameters, and what is the activation function? How many fusion layers are used? For the modality-aware enhancement, what are the transformations ψ1 and ψ2? For the contrastive learning objective, why is backpropagation not performed between some of the outputs?
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Overall, the method is sensible, but not very innovative and the paper lacks important details.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I had two major critiques: (1) lack of novelty and (2) lack of details. The authors have fully addressed the lack of details in the rebuttal, and I hope they will modify the article accordingly. The problem of insufficient novelty remains, and the rebuttal instead justifies the importance of the paper through its results, which are a clear improvement over the SOTA. Overall, I think the qualities of this paper outweigh its weaknesses, and I recommend acceptance.
Review #2
- Please describe the contribution of the paper
The authors propose a multi-modal semi-supervised medical image segmentation method. As part of a dual-branch network, they adopt multi-stage multi-modal feature fusion, aligning features extracted from the two modalities at different stages of the neural architecture. The aligned features are then fed into an “enhancement” module to choose effective features, where the weights of the modalities are adaptively adjusted (in an attention-like approach). Finally, they adopt multi-modal contrastive learning with supervised and unsupervised losses. In the latter, the segmentation of one modality is used as pseudo-labels to optimize the segmenter of the other modality.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is generally well written and well organized.
- It tackles image segmentation, proposing or adopting architectural design configurations and losses.
- They tested their method on two datasets: the public BraTS2019 brain tumour dataset (MRI) and a private nasopharyngeal carcinoma dataset (CT).
- They present quantitative results using Dice similarity coefficient (DSC) and Average Surface Distance (ASD) comparing their method to several competing approaches with reasonable results (~1% improvement in accuracy).
- They conduct an ablation study on the fusion, enhancement, and the contrastive steps, showing drops of ~2-3% when certain proposed components are dropped.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The paper does not offer computationally novel or unique aspects, as it basically borrows from existing literature.
- The authors do not motivate why segmentation is necessary for a specific clinical application and what range of accuracy is regarded as clinically acceptable.
- The authors do not describe the different steps in sufficient detail. Most of the description of the method (aside from the contrastive learning step) is high level and lacks the low-level details that would allow the reader to understand exactly how the method works or to appreciate how one might implement it. For example, they mention multi-stage multi-modal alignment several times but do not provide any equation for how this is done. In other words, what is the equation that covers what happens in the “Fusion Layer” of Fig. 1?
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
The authors write that the “modality-aware attention weights” are calculated using Eqn. (2). Does this mean these are not learned weights but rather calculated directly? Otherwise, clarify which weights are being optimized.
The equations refer to two modalities, a and b, but the experiments are run on four MR pulse sequences (FLAIR, T1, T1ce, and T2). Does this mean that all n-choose-2 combinations of pairs are passed into an architecture like the one in Fig. 1? Please clarify how more than two modalities are handled.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Good writing, organization, sound method, good evaluation, etc. However, the paper does not provide an exciting novel method, and the results are positive but not particularly exciting either.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
My weak accept persists, but the paper will likely be further improved in the revision by addressing my and the other reviewers’ questions.
Review #3
- Please describe the contribution of the paper
The paper addresses the problem of multi-modal fusion for improving medical image segmentation in the presence of two modalities and semi-supervised learning with limited labeled data. They introduce a modality-aware feature enhancement module and also, design a multimodal contrastive mutual learning. They achieved improved performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is easy to follow. The motivation for the components is properly explained (e.g., “To reduce the disparity between modalities, we introduce a multi-stage feature fusion strategy to adequately align and fuse low-level visual features. Additionally, we introduce a modality-aware feature enhancement module to emphasize important modality-specific features while ignoring irrelevant information.”)
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The main contribution of the paper is the introduction of a “contrastive mutual learning loss”, but one of the SOTA works also introduces multi-modal contrastive mutual learning (MMCML [18]). How the proposed method differs from this specific method and other similar methods should be clearly illustrated.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper proposes a novel semi-supervised multimodal medical image segmentation approach. It motivates its proposed components. The evaluation shows the effectiveness of each component. Also, the paper is well-structured. Nevertheless, a clearer explanation of the novel contributions as compared to SOTA should be provided.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors properly addressed my concerns about how their method differs from prior work, especially [18]. Their explanation of the methodological differences, along with stronger experimental support, helps clarify the novelty of their approach. This strengthens my confidence in the paper and supports its acceptance.
Author Feedback
We sincerely thank the reviewers for their constructive feedback. We will revise the paper accordingly and provide more details, as well as release our code upon acceptance.
To Reviewer1:
- Comparison with [18] and others. We highlight the following points: 1) Multi-modal Fusion Strategy: We employ feature fusion and alignment to reduce feature discrepancies, whereas [18] does not incorporate such mechanisms, which may exacerbate these differences. 2) Contrastive Mutual Learning Loss: To avoid the risk of the networks producing consistent but incorrect predictions [22], we use cross-contrastive mutual learning instead of explicit constraints as in [18]. These issues also exist in another multi-modal method [21]. 3) Our method improves DSC by 9.3% over [18] and 4.5% over [21] on the 5% annotated BraTS dataset. Other experiments further confirm its effectiveness. [22] Luo, X., et al. PMLR, 2022.
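For concreteness, below is a minimal PyTorch sketch of the cross pseudo-labeling idea described in this reply, incorporating the gradient stopping explained in the reply to Reviewer2 further down. It substitutes a plain cross-entropy term for the paper's contrastive mutual learning loss, whose exact form is not reproduced here; all names and the hard pseudo-labeling choice are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def cross_mutual_learning_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Cross pseudo-label loss between two modality branches.

    Each branch is supervised by the (detached) hard pseudo-labels of the other
    branch, so gradients never flow through a branch's own pseudo-labels. The
    cross-entropy terms stand in for the paper's contrastive mutual learning loss.
    """
    # Hard pseudo-labels from each branch; detach() stops the gradient flow,
    # which the authors note prevents overfitting to a branch's own predictions.
    pseudo_a = torch.argmax(logits_a.detach(), dim=1)
    pseudo_b = torch.argmax(logits_b.detach(), dim=1)

    # Branch a learns from branch b's pseudo-labels, and vice versa.
    loss_a = F.cross_entropy(logits_a, pseudo_b)
    loss_b = F.cross_entropy(logits_b, pseudo_a)
    return loss_a + loss_b
```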
To Reviewer2:
- Innovations. We propose the first method to accurately segment complex structures with limited labels by combining three effective strategies: MMF for feature alignment, MAE for adaptive adjustment, and MCML for robust learning. We highlight that the MMF, MAE, and MCML strategies improve the DSC for the T1ce modality by 13.5%, 1.9%, and 2.0%, respectively, and for the T2 modality by 3.7%, 0.8%, and 1.5%.
- We will supplement the following technical details in the final version. 1) Fusion layer details: each fusion layer uses a 3×3×3 convolution with padding of 1 and stride of 1, followed by PReLU and batch normalization. The activation function is the sigmoid. We employ 4 fusion layers in the encoder and 1 in the bottleneck (a hedged code sketch follows this reply). 2) Definition of transformations $\psi_1$ and $\psi_2$: as proposed in [4], $\psi_1$ and $\psi_2$ map input features $X \in \mathbb{R}^{H' \times W' \times D'}$ to $U \in \mathbb{R}^{H \times W \times D}$ through a sequence of convolution, batch normalization, and ReLU activation operations. 3) Stopping the gradient flow: using pseudo-labels generated by the same branch for backpropagation may lead to overfitting, causing the model to converge in the wrong direction and degrading overall performance.
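The following is a minimal PyTorch sketch of one fusion layer, assembled from the details above and from the fusion equation given to Reviewer3 below. Channel widths, the inner block structure, and class/variable names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """Fusion layer sketch: f_fused = sigmoid(conv2(conv1(cat(f_a, f_b)))).

    Each convolution is 3x3x3 with stride 1 and padding 1, followed by PReLU
    and batch normalization, as described in the rebuttal; the exact channel
    widths are assumptions.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()

        def conv_block(cin: int, cout: int) -> nn.Sequential:
            # 3x3x3 convolution, stride 1, padding 1, then PReLU and BatchNorm.
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, stride=1, padding=1),
                nn.PReLU(),
                nn.BatchNorm3d(cout),
            )

        # Concatenating the two modality feature maps doubles the channels.
        self.conv1 = conv_block(2 * in_channels, out_channels)
        self.conv2 = conv_block(out_channels, out_channels)

    def forward(self, f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([f_a, f_b], dim=1)   # F_cat
        fused = self.conv2(self.conv1(fused))  # F_conv1, F_conv2
        return torch.sigmoid(fused)            # F_sigmoid
```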
To Reviewer3:
- Unique aspects. Our approach better exploits multi-modal information and reduces modality discrepancies through multi-stage fusion. In contrast, existing methods rely on consistency losses at the bottleneck [21] or output layers [18], which are less effective. Our contrastive mutual learning loss also helps reduce overfitting and error accumulation. As [18] points out, studies in semi-supervised multi-modal medical image segmentation remain limited. Our approach addresses this gap and is validated by strong experimental results.
- Motivation and clinical feasibility. Tumor segmentation is harder than organ segmentation because of ambiguous boundaries and high heterogeneity [12]. We selected two representative and challenging tumor segmentation tasks where existing methods achieve limited accuracy. [10] indicated that a Dice score above 80% is often considered a practical benchmark for whole-tumor segmentation in brain tumor datasets. [23] reported a DSC of 79% and an ASD of 2.0 mm for NPC, which meets the clinically acceptable 3-mm error margin in head and neck radiotherapy, and 89% of the results were rated as satisfactory by experts. Our semi-supervised method performs comparably to the above fully supervised results. [23] Lin, L., et al. Radiology 291.3 (2019): 677-686.
- Fusion layer equation. It is defined as $f_{s}^{\mathrm{fused}} = F_{\mathrm{sigmoid}}(F_{\mathrm{conv2}}(F_{\mathrm{conv1}}(F_{\mathrm{cat}}(f_{s}^{a}, f_{s}^{b}))))$. For more details, please refer to Question 2 from Reviewer 2.
- Learnable parameters in Eq. (2). The modality-aware attention weights are optimized indirectly, by updating the parameters (e.g., weight matrices and biases) of the transformations $\psi_1$ and $\psi_2$ (see the sketch at the end of this reply).
- Input modalities. While we used only T1ce and T2 in the BraTS experiments, the framework can potentially handle more modalities through pairwise processing.
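For illustration, here is a minimal PyTorch sketch of a modality-aware enhancement step consistent with the replies above: $\psi_1$ and $\psi_2$ are convolution + BatchNorm + ReLU blocks whose parameters indirectly determine the attention weights. Since Eq. (2) is not reproduced in the rebuttal, the softmax used to turn the $\psi$ responses into per-modality weights, and the way the weighted features are combined, are assumptions; all names are hypothetical.

```python
import torch
import torch.nn as nn

class ModalityAwareEnhancement(nn.Module):
    """Sketch of modality-aware feature enhancement with learned psi transforms.

    psi1/psi2 follow the rebuttal's description (convolution, BatchNorm, ReLU);
    the softmax normalization standing in for Eq. (2) is an assumption. Spatial
    size is preserved here for simplicity, although the rebuttal allows a
    resolution change (H', W', D' -> H, W, D).
    """

    def __init__(self, channels: int):
        super().__init__()

        def psi() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv3d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm3d(channels),
                nn.ReLU(inplace=True),
            )

        self.psi1, self.psi2 = psi(), psi()

    def forward(self, f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
        # Per-modality responses; the attention weights are optimized only
        # indirectly, through the parameters of psi1 and psi2.
        u_a, u_b = self.psi1(f_a), self.psi2(f_b)
        weights = torch.softmax(torch.stack([u_a, u_b], dim=0), dim=0)
        # Re-weight and combine the two modality feature maps.
        return weights[0] * f_a + weights[1] * f_b
```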
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
After reviewing this manuscript alongside the diverse evaluations provided by the expert reviewers during the double-blind review process, I note there are varying perspectives on the paper’s contributions. Having carefully read the paper in light of these comments, I believe the authors should be allowed to address these concerns during the rebuttal phase. I recommend that the authors carefully consider the reviewers’ feedback and attempt to resolve these issues in their response for further consideration.
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A