Abstract

Self-supervised learning (SSL) has emerged as a powerful paradigm to mitigate neuroimaging analysis algorithms’ reliance on annotated data. However, existing SSL methods for brain MRI often fail to incorporate the anatomical priors inherent in these images, limiting their effectiveness. Here, we present Masked Contrastive Language-Image Modeling (MCLIM), a novel SSL framework that integrates knowledge from brain atlases through text-guided representation learning. We first generate structure-specific textual descriptors based on brain atlases, with no need for manually collecting image-text pairs. MCLIM then employs (1) an image restoration branch that reconstructs randomly masked image patches through an encoder-decoder network, and (2) a cross-modal alignment module that establishes semantic correspondences between image features and atlas-derived text embeddings. These two learning objectives enable the simultaneous capture of fine-grained intensity patterns and whole-brain topological relationships. The proposed method is fine-tuned and evaluated on three brain parcellation datasets with varying granularities and a brain lesion segmentation dataset. Experimental results demonstrate that MCLIM outperforms state-of-the-art SSL methods and reduces annotation effort by at least 40%. Code and pre-trained models will be available at https://github.com/CRazorback/MCLIM.
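The two learning objectives described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact loss functions, so the sketch assumes a mean-squared-error reconstruction term over masked patches and an InfoNCE-style contrastive term for the image-text alignment. The function names, the `temperature` value, and the `weight` parameter are all hypothetical.

```python
import math


def masked_mse(original, reconstructed, mask):
    """Mean squared error computed only over patches that were masked.

    original, reconstructed: lists of patches (each patch a list of floats).
    mask: list of booleans, True where the patch was hidden from the encoder.
    """
    per_patch = [
        sum((o - r) ** 2 for o, r in zip(orig, rec)) / len(orig)
        for orig, rec, hidden in zip(original, reconstructed, mask)
        if hidden
    ]
    return sum(per_patch) / len(per_patch) if per_patch else 0.0


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(image_feats, text_embs, temperature=0.07):
    """InfoNCE-style loss (an assumed stand-in for the alignment objective).

    Image feature i is treated as matching text embedding i; every other
    text embedding in the batch acts as a negative.
    """
    losses = []
    for i, feat in enumerate(image_feats):
        logits = [cosine(feat, emb) / temperature for emb in text_embs]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


def combined_loss(original, reconstructed, mask, image_feats, text_embs, weight=1.0):
    """Sum of the restoration term and the weighted alignment term."""
    return masked_mse(original, reconstructed, mask) + weight * info_nce(image_feats, text_embs)
```

In this sketch the alignment term is small when each image feature points in the same direction as its own text embedding and large when the pairs are mismatched, which is the behavior a cross-modal alignment objective is meant to encourage.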

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1175_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/CRazorback/MCLIM

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiaJia_Masked_MICCAI2025,
        author = { Liang, Jianwen and Lyu, Junyan and Yuan, Yixuan and Tang, Xiaoying},
        title = { { Masked Contrastive Language-Image Modeling For Brain Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {310--319}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The main contribution of this paper lies in the proposal of a novel self-supervised learning (SSL) framework, termed MCLIM, for brain MRI segmentation. The method integrates textual information derived from brain atlases into the visual representation learning process, enhancing the model’s ability to capture semantic context.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) A notable strength of MCLIM is its capability to incorporate text priors and achieve image-text alignment without relying on paired image-text training data, which significantly broadens its applicability in medical imaging scenarios where such paired data is scarce. (2) The strategy of leveraging priors from established brain atlases to guide representation learning, thereby obviating the need for manual annotations, is both innovative and intuitive in the neuroimaging community.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The methodology and experimental setup are not described with sufficient clarity and detail. Specifically, in Section 3.2, the authors state: “For a fair comparison purpose, all evaluated methods are pre-trained from scratch according to their published code with the same network architectures and datasets.” This statement is ambiguous. Does “the same network architectures and datasets” refer to those used in this paper, or those originally used in the publicly available code of the compared methods? For instance, Swin-UNETR introduces architectural modifications that clearly differ from the image encoder used in this paper, making it unlikely that the architectures are the same. Conversely, if the comparison uses each method’s original architecture and dataset, then Swin-UNETR would have been pre-trained on abdominal CT images, which differs significantly from the brain MRI data used in this study, leading to an inherently unfair comparison. It is recommended that the authors clarify the network architectures and pre-training datasets used for each baseline. Without this clarification, the fairness of the results presented in Table I remains questionable.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    (1) It is recommended that the authors provide a more thorough and clearer description of the methodology and experimental setup. For example, the components and symbols in Figure 2 should be explained in the figure caption; the procedure for constructing the Atlas Descriptions Bank (ADB) should be described in greater detail; and the experimental setup for comparison with state-of-the-art methods should be clearly specified. (2) Certain presentation details lack professionalism and should be addressed. For instance, the authors display axial slices with a 180-degree rotation of the head across all figures, which is unconventional and may lead to confusion.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is well-motivated, intuitive and inspiring. However, the experimental results seem to be insufficient to illustrate the superiority of the MCLIM method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces a self-supervised learning framework called Masked Contrastive Language-Image Modeling (MCLIM). It makes the first attempt to combine brain atlas text priors with self-supervised learning. By automatically generating structural description texts (ADB) from the brain atlas and integrating image inpainting and cross-modal alignment tasks, it enables the model to learn both local intensity patterns and global anatomical semantics. In experiments, MCLIM surpasses existing methods in brain region and lesion segmentation tasks (with an average DSC improvement of 0.8%) and notably reduces the annotation requirements by more than 40%.
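The DSC figure quoted in this summary refers to the Dice similarity coefficient, the standard overlap measure for segmentation. As a reminder of what is being compared, here is a minimal sketch for two binary masks given as flat sequences of 0/1 values. The function name is hypothetical, and returning 1.0 when both masks are empty is one common convention assumed here, not something taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient: 2 * |A ∩ B| / (|A| + |B|) for binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For multi-label parcellation, a score of this kind is typically computed per structure and then averaged.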

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novel combination: MCLIM innovatively combines brain atlas text priors with self-supervised learning, addressing the long-standing issue of anatomical prior utilization in brain MRI segmentation. The integration of related tasks enables the model to capture diverse brain information, leading to better segmentation performance. Performance gains: MCLIM achieves superior performance in segmentation tasks.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Patch-based training: Training with only image patches may ignore brain region topological dependencies, causing global segmentation consistency issues. Text description accuracy: Aggregating text descriptions for partial-region image patches can lead to semantic confusion, affecting model performance. Reproducibility: Dependence on a large model for text generation raises reproducibility concerns. Text extraction flaw: Prototype-based text extraction may weaken fine-grained semantic representation. Validation set absence: Lack of validation set design increases the risk of overfitting and limits model generalization. Computational analysis missing: There is no comparison of computational cost and complexity with other methods, hindering comprehensive evaluation.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a self-supervised learning framework based on Masked Contrastive Language-Image Modeling (MCLIM). For the first time, it combines the prior knowledge of brain atlas text with self-supervised learning, aiming to address the problem of insufficient utilization of anatomical priors in brain MRI segmentation. Its core innovation lies in automatically generating structural description texts (ADB) through the brain atlas, and combining the tasks of image inpainting and cross-modal alignment, enabling the model to learn both local intensity patterns and global anatomical semantics simultaneously. Experiments show that MCLIM outperforms existing methods in both brain region and lesion segmentation tasks (with an average DSC improvement of 0.8%), and significantly reduces the annotation requirements (by more than 40%).

    Question 1: Is it reasonable to train using only image patches? Image patches may ignore the topological dependencies of brain regions (such as cross-regional anatomical constraints), leading to insufficient global consistency in segmentation.

    Question 2: How do the authors ensure the accuracy of the text descriptions of the image patches? The paper mentions that the ADB text descriptions are based on the complete brain structure, while the patches may cover only part of the regions. Directly aggregating the descriptions is likely to cause semantic confusion (such as a mismatch between local features and global descriptions).

    Question 3: The text generation depends on a large model. How do the authors ensure the reproducibility of the paper?

    Question 4: Is it possible that the method of extracting texts through prototype aggregation weakens the fine-grained semantic expression of multi-structural texts?

    Question 5: The design of the validation set is not mentioned in the paper. Training on the training set and evaluating directly on the test set, without a validation set, makes the model prone to overfitting and leaves its generalization ability in doubt.

    Question 6: There is a lack of analysis and comparison of the computational cost and computational complexity with other methods.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose Masked Contrastive Language-Image Modeling (MCLIM), a novel self-supervised learning (SSL) framework that integrates knowledge from brain atlases through text-guided representation learning for brain segmentation. MCLIM consists of an image restoration branch and a cross-modal alignment module that together capture fine-grained intensity patterns and whole-brain topological relationships. Unlike existing SSL methods for brain MRI, MCLIM can incorporate the anatomical priors inherent in brain MRI. MCLIM is evaluated on three brain parcellation datasets with varying granularities and a brain lesion segmentation dataset, and the experimental results demonstrate that it outperforms other SOTA SSL methods and reduces annotation effort by at least 40%.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novelty: MCLIM is the first SSL approach that systematically incorporates neuroanatomical text priors without requiring paired image-text training datasets.
    • Data-Efficient learning: MCLIM demonstrates the ability to reduce annotation requirements by at least 40% compared to training from scratch, which is a crucial advantage in domains with limited labeled data.
    • Writing: The paper is logically organized and easy for readers to follow.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Limitation of the method figure: The reconstructed image patch in the method figure is identical to the original input image patch. Please provide the actual reconstructed image patch so that the effectiveness of the image restoration branch can be evaluated.
    • No statistical evaluation of results: paired t-tests would give statistical weight to the argument of “superiority” of the proposed method.
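The paired test requested here can be sketched as follows, assuming per-subject scores (for example, Dice values) are available for two methods evaluated on the same test cases. The function name and inputs are hypothetical, and the sketch returns only the t statistic and degrees of freedom. Obtaining a p-value requires the Student t distribution, for instance via `scipy.stats.ttest_rel`, which performs the whole test in one call.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two methods scored on the same subjects.

    Returns (t, degrees_of_freedom). A positive t means method A scored
    higher on average than method B.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must be scored on the same subjects")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1
```

Because the test works on per-subject differences, it accounts for the fact that some subjects are harder to segment for every method, which an unpaired comparison of mean scores would not.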
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • Please perform paired statistical tests to support the claim of “superiority” of the proposed method.
    • Please provide the reconstructed image patch in your method figure.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The qualitative comparisons show no significant improvement, and the reconstruction results are not shown. I therefore question the effectiveness of this approach.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Thank you for approving our work. We are deeply grateful for your constructive feedback and will refine the article according to your insightful suggestions.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


