Abstract
Accurate segmentation of brain images typically requires the integration of complementary information from multiple image modalities. However, clinical data for all modalities may not be available for every patient, creating a significant challenge. To address this, previous studies encode multiple modalities into a shared latent space. While somewhat effective, this remains suboptimal, as each modality contains distinct and valuable information. In this study, we propose DC-Seg (Disentangled Contrastive Learning for Segmentation), a new method that explicitly disentangles images into a modality-invariant anatomical representation and a modality-specific representation, using anatomical contrastive learning and modality contrastive learning, respectively. This approach improves the separation of anatomical and modality-specific features by accounting for modality gaps, leading to more robust representations. Furthermore, we introduce a segmentation-based regularizer that enhances the model's robustness to missing modalities. Extensive experiments on BraTS 2020 and a private white matter hyperintensity (WMH) segmentation dataset demonstrate that DC-Seg outperforms state-of-the-art methods in handling incomplete multimodal brain tumor segmentation tasks with varying missing modalities, while also demonstrating strong generalizability in WMH segmentation. The code is available at https://github.com/CuCl-2/DC-Seg.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0653_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/CuCl-2/DC-Seg
Link to the Dataset(s)
BraTS 2020 dataset: https://www.med.upenn.edu/cbica/brats2020/data.html
BibTex
@InProceedings{LiHai_DCSeg_MICCAI2025,
author = { Li, Haitao and Li, Ziyu and Mao, Yiheng and Ding, Zhengyao and Huang, Zhengxing},
title = { { DC-Seg: Disentangled Contrastive Learning for Brain Tumor Segmentation with Missing Modalities } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15967},
month = {September},
pages = {140--150}
}
Reviews
Review #1
- Please describe the contribution of the paper
- The authors propose a contrastive learning-based method for handling missing modalities in brain tumor segmentation. The method encodes each modality independently with anatomical and modality encoders, then disentangles modality from anatomy by means of contrastive learning. The anatomical and modality representations are combined to restore the original image. For the anatomical features, segmentation-guided regularization is applied.
- The method is validated on two datasets, showing superior performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Clear definition of the method; the method is easy to follow.
- Novel combination of losses for the given task: a reconstruction task for regularizing the contrastive learning objective, and the introduction of a dropout term in the fusion operation of the reconstruction loss.
- Introduction of a segmentation-based regularizer for learning anatomical features.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- It is not clear why the method is described as bidirectional. What are the forward and reverse directions in this method? Is the described approach not a type of autoencoder?
- Which networks have been utilized as encoders and decoders in the experiments?
- How are the anatomical encodings fused?
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Typo in Section 2.1: “Leanring” → “Learning”.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I believe this is an interesting paper introducing a unique solution for segmentation with missing modalities. However, it misses some key details in the text.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This manuscript proposes a multimodal brain tumor segmentation method that handles missing modalities. The main contribution is that the model disentangles modality-invariant anatomical representations and modality-specific representations by leveraging contrastive learning. The proposed method is evaluated on the public BraTS dataset and a private WMH dataset, demonstrating the effectiveness of the method.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Support for multiple modalities as well as missing modalities.
- Comparison experiments and ablation experiments are sufficient.
- Good visualization of the feature representations.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- There are at least five components in the final loss (namely ana, mod, rec, reg, and seg), but how to balance them and how to make the training stable are not mentioned.
- The detailed model architecture is not discussed. For example, what are the architectures for E^{ana}, E^{mod}, Fusion, D^{fuse} and D^{rec}, separately?
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed disentangle method is effective and experimental validation is sufficient.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper proposes a new deep learning model for multimodal brain tumor segmentation and addresses the challenging scenario of missing image modalities. The framework (DC-Seg) encodes the multimodal inputs into modality-specific appearance codes and modality-invariant content (i.e., anatomical) codes. The anatomical representations from different modalities are then fused into a unique feature code for tumor segmentation. In addition, the modality-specific representations are disentangled by reconstructing the original images from the modality-specific code paired with the fused anatomical code. Compared to other multimodal segmentation learning approaches with feature disentanglement, the authors propose to further disentangle the anatomical and modality representations using two separate contrastive losses. The authors validate their method on the BraTS 2020 dataset and an in-house white matter hyperintensity (WMH) dataset, and demonstrate improved performance compared to state-of-the-art multimodal segmentation models.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The main strength of this work is the inclusion of two contrastive losses for feature disentanglement, which are novel in the context of multimodal segmentation. More generally, multimodal learning is also highly clinically relevant as different modalities typically encode complementary information. The proposed model can be easily applied to other regions of interest (not only brain) and modalities (not only different MRI sequences, but also CT or PET). Finally, the paper presents a solid evaluation of the method: comparison with state-of-the-art on two datasets, ablation study and visualisation of the anatomical and modality representations.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The novelty of the proposed method is limited/incremental: the model is a simple extension of the work of Chen et al. [3], with a weight-shared decoder to segment each modality separately (from Ding et al. [5]) and the addition of the proposed contrastive losses. The performance improvements of the method (especially over RFNet) are limited, and the lack of statistical analysis does not allow the reader to conclude that the gains are significant. It is also not clear from the ablation study what effect each component of the loss has on the segmentation predictions.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Methodology:
- It is not clear why using an 8-bit vector is the best strategy to encode the modality-specific representation. The authors should expand on this.
- It would be beneficial to include the segmentation decoder D^{Sep} in Figure 1 for the reader’s understanding.
Experiments:
- The authors should provide the following hyperparameter values and settings for reproducibility: temperature of contrastive loss, probability associated with Bernoulli indicators, architecture of encoder and decoder networks, and GPU used.
- The authors should indicate which algorithm and associated hyperparameters they used to obtain the visualization of the anatomical and modality representations (Figure 3).
- Does the computation of the contrastive loss increase training time? By how much?
- Why is M3AE not included in the WMH dataset experiment (Table 3)?
Results:
- It would be beneficial to compare the results of the multimodal models to unimodal approaches, such as simple nnU-Nets trained on each modality separately (Isensee, F., Jaeger, P.F., Kohl, S.A.A., et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 18, 203–211 (2021). https://doi.org/10.1038/s41592-020-01008-z), to assess the performance gains of multimodal learning.
- The results should include other validation metrics (e.g., Hausdorff distance) to ensure the robustness of the method and provide a more complete description of the model performance.
- The authors should comment on which image modality is the most useful for segmenting the different tissues of interest in both datasets. For instance, in Table 3, the FLAIR modality leads to the best performance for WMH segmentation. Is this consistent with what is expected?
- The authors should comment on what is the effect of each component of the loss on the segmentation predictions. It is not clear from the ablation study what the effects of the anatomical and modality contrastive losses are.
- The authors should perform a statistical analysis (paired t-tests) of the results (including the ablation study) to ensure the superiority of the proposed method.
- Figure 3, it would be beneficial to compare the anatomical and modality representations with and without anatomical and modality contrastive losses.
Recommendation for future work:
- Explore the performance on other multimodal segmentation problems, such as the MICCAI HECKTOR challenge (https://hecktor.grand-challenge.org/): head and neck tumor segmentation in CT and PET imaging.
Misc:
- Spelling error in Section 2.1 title: Bidirectional Contrastive Leanring → Learning
- In section 3, “Figure 3 illustrates that our method effectively segments brain tumors across various missing modality scenarios”, should it be Figure 2?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(6) Strong Accept — must be accepted due to excellence
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Multimodal learning and missing modality approaches are highly relevant, and the inclusion of modality contrastive loss in such an approach is of interest for future works. The paper proposes a novel framework with solid evaluation on two datasets.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We sincerely thank the reviewers for their time and valuable feedback. Your positive response greatly encourages the medical multimodal research community. Below, we address the key reviewer comments; we will incorporate the necessary revisions and correct all typos in the camera-ready version. Given the general question regarding the architecture, we clarify that a 3D U-Net is used as the backbone for both the encoder and decoder components.
Reviewer 1: Due to MICCAI’s policies, we cannot include additional experimental results. While we agree that adding other validation metrics, statistical analyses, unimodal methods, and computation comparisons would strengthen the work, we are unable to provide them at this stage.
- Clarifying the novelty of DC-Seg and the performance gains: Traditional disentanglement methods relying on reconstruction only ensure that the fused anatomical representation is well learned; they do not guarantee modality invariance for each modality’s anatomical representation, which can degrade performance when modalities are missing. Our bidirectional contrastive learning addresses this by aligning the anatomical representations across all modalities, ensuring robustness. Compared to RFNet, our method achieves notable performance improvements on the enhancing tumor core in the BraTS dataset.
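For illustration, a minimal PyTorch sketch of such an alignment objective, assuming pooled per-modality anatomical embeddings and an InfoNCE-style formulation (the function name and shapes are hypothetical, not the released implementation):

```python
# Hypothetical sketch: anatomical embeddings of the SAME subject from
# DIFFERENT modalities are pulled together, different subjects pushed apart.
import torch
import torch.nn.functional as F

def anatomical_contrastive_loss(z, temperature=0.07):
    """z: [n_subjects, n_modalities, dim] pooled anatomical embeddings."""
    z = F.normalize(z, dim=-1)
    # For brevity, contrast modality 0 against modality 1; the paper aligns
    # all modalities of a subject and makes the temperature learnable.
    anchors, positives = z[:, 0], z[:, 1]            # each [n, dim]
    logits = anchors @ positives.t() / temperature   # [n, n] similarity matrix
    targets = torch.arange(z.shape[0])               # diagonal = same subject
    return F.cross_entropy(logits, targets)

loss = anatomical_contrastive_loss(torch.randn(8, 4, 128))
```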
- Clarifying the role of each loss component:
The reconstruction loss preserves critical anatomical features by ensuring that the fused anatomical representation can reconstruct the original images. Anatomical and modality contrastive learning align representations across modalities, maintaining completeness and consistency even with missing modalities. The regularizer loss prevents over-reliance on specific modalities that are sensitive to tumor regions.
- Clarifying the rationale for using an 8-bit vector for the modality-specific representation:
Compared to anatomical representations, which contain diverse and detailed information unique to each individual, modality-specific representations are less complex. An 8-bit vector is therefore a suitable and widely adopted choice [3, 10].
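As a rough sketch of how such a compact code might be produced (layer sizes and names are illustrative assumptions, not the released architecture):

```python
# Hypothetical sketch: a small encoder that compresses a 3D volume into an
# 8-dim modality code, later paired with the fused anatomical features for
# reconstruction. All layer sizes are illustrative only.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    def __init__(self, in_ch=1, code_dim=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),       # global pooling -> [B, 32, 1, 1, 1]
        )
        self.fc = nn.Linear(32, code_dim)  # project to the 8-dim modality code

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))  # [B, 8]

code = ModalityEncoder()(torch.randn(2, 1, 32, 32, 32))  # -> [2, 8]
```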
- Supplementary information on hyperparameters and visualization:
We have included detailed descriptions of these hyperparameters in the publicly available code. Specifically, the temperature of the contrastive loss is a learnable parameter. We use Bernoulli indicators to ensure that each missing-modality scenario has an equal probability during training; in total there are 2^4 - 1 = 15 scenarios, each occurring with equal likelihood (see the sketch below). The encoder and decoder architectures are based on 3D U-Net. The experiments were conducted on an A800 80GB GPU. We use t-SNE with random seed 42 to obtain the visualization in Figure 3.
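A minimal sketch of this uniform scenario sampling, assuming the four BraTS modalities (helper names are hypothetical):

```python
# Hypothetical sketch: draw each of the 2^4 - 1 = 15 non-empty modality
# subsets with equal probability, yielding a binary availability mask.
import itertools
import random

MODALITIES = ["T1", "T1ce", "T2", "FLAIR"]
SCENARIOS = [m for m in itertools.product([0, 1], repeat=4) if any(m)]  # 15 masks

def sample_modality_mask():
    return random.choice(SCENARIOS)  # uniform over the 15 scenarios

mask = sample_modality_mask()                                   # e.g. (1, 0, 1, 1)
available = [mod for mod, keep in zip(MODALITIES, mask) if keep]
```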
- Why is M3AE not included in the WMH dataset experiment?
M3AE is a self-pretraining method that requires a large amount of data; the WMH dataset is not large enough to meet this requirement.
- Thank you for your recommendation for future work. It is straightforward to adapt our approach to other organs and modalities, and we are confident that our method will perform effectively in such scenarios.
Reviewer 2:
- Explanation of bidirectional:
The term bidirectional does not refer to forward and reverse directions. Instead, it signifies two distinct types of contrastive learning: anatomical contrastive learning and modality contrastive learning. In practice, given n samples × m modalities, the bidirectional nature arises from these two types of contrastive learning treating the data along two different directions as positive pairs.
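To make the two directions concrete, a small illustrative snippet (hypothetical tensors, not the released code): arranging embeddings on an n × m grid, the anatomical loss takes positives along rows (same subject, different modalities), while the modality loss takes positives along columns (same modality, different subjects):

```python
# Hypothetical illustration of the two contrastive "directions" on an
# n-samples x m-modalities grid of embeddings.
import torch

n, m, dim = 8, 4, 128
anatomical = torch.randn(n, m, dim)  # one anatomical embedding per (subject, modality)
modality = torch.randn(n, m, dim)    # one modality embedding per (subject, modality)

# Anatomical direction: positives share a ROW (same subject, all modalities).
row_positives = anatomical[0]        # [m, dim] -> pulled together
# Modality direction: positives share a COLUMN (same modality, all subjects).
col_positives = modality[:, 0]       # [n, dim] -> pulled together
```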
- Fusion of anatomical encodings:
The fusion of anatomical representations is not the primary contribution of our work; for this, we follow the approach described in [5].
Reviewer 3:
- Balancing different loss components to ensure stable training:
The training process is highly stable. We use only one hyperparameter, alpha (0.4), to balance the different loss components, as mentioned in Equation 9.
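A minimal sketch of this single-weight balancing (the grouping of terms below is an assumption; the exact form of Equation 9 is not reproduced on this page):

```python
# Hypothetical sketch: one weight alpha = 0.4 balances the five loss terms
# (seg, reg, ana, mod, rec); the actual grouping in Equation 9 may differ.
def total_loss(l_seg, l_reg, l_ana, l_mod, l_rec, alpha=0.4):
    return (l_seg + l_reg) + alpha * (l_ana + l_mod + l_rec)

loss = total_loss(0.8, 0.1, 0.5, 0.3, 0.2)  # dummy scalar values
```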
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A