Abstract

Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation model is developed for a particular lesion type and imaging modality. However, the use of task-specific models requires predetermination of the lesion type and imaging modality, which complicates their deployment in real-world scenarios. In this work, we propose a universal foundation model for 3D brain lesion segmentation, which can automatically segment different types of brain lesions for input data of various imaging modalities. We formulate a novel Mixture of Modality Experts (MoME) framework with multiple expert networks attending to different imaging modalities. A hierarchical gating network combines the expert predictions and fosters expertise collaboration. Furthermore, we introduce a curriculum learning strategy during training to avoid the degeneration of each expert network and preserve their specialization. We evaluated the proposed method on nine brain lesion datasets, encompassing five imaging modalities and eight lesion types. The results show that our model outperforms state-of-the-art universal models and provides promising generalization to unseen datasets.
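The gating idea in the abstract can be illustrated with a minimal sketch. It assumes a single-level, voxel-wise softmax gate over per-expert foreground probabilities; the paper's gating network is hierarchical, so this is a simplification, and the names `softmax`, `combine_experts`, `expert_probs`, and `gate_logits` are hypothetical and not taken from the MoME repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(expert_probs, gate_logits):
    """Fuse expert predictions with voxel-wise gating weights.

    expert_probs[e][v]: foreground probability from expert e at voxel v.
    gate_logits[v][e]:  gating score for expert e at voxel v (assumed layout).
    """
    fused = []
    for v, logits in enumerate(gate_logits):
        weights = softmax(logits)
        fused.append(sum(w * expert_probs[e][v] for e, w in enumerate(weights)))
    return fused


# Two experts, two voxels (toy numbers, not results from the paper).
expert_probs = [[0.9, 0.2], [0.5, 0.4]]
gate_logits = [[0.0, 0.0], [2.0, 0.0]]
fused = combine_experts(expert_probs, gate_logits)
```

With equal gate scores the fused value is the plain average of the experts; when one expert receives a higher score, the fused value moves towards that expert's prediction.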

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0143_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0143_supp.pdf

Link to the Code Repository

https://github.com/ZhangxinruBIT/MoME

Link to the Dataset(s)

https://fcon_1000.projects.nitrc.org/indi/retro/atlas.html
https://www.isles-challenge.org/
http://www.brainTumoursegmentation.org/
https://wmh.isi.uu.nl/#_Toc122355653
https://portal.fli-iam.irisa.fr/msseg-challenge/
https://www.oasis-brains.org/

BibTex

@InProceedings{Zha_AFoundation_MICCAI2024,
        author = { Zhang, Xinru and Ou, Ni and Basaran, Berke Doga and Visentin, Marco and Qiao, Mengyun and Gu, Renyang and Ouyang, Cheng and Liu, Yaou and Matthews, Paul M. and Ye, Chuyang and Bai, Wenjia},
        title = { { A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript introduces a novel universal foundation model named MoME (Mixture of Modality Experts) for 3D brain lesion segmentation. This model is designed to handle diverse imaging modalities and various types of brain lesions automatically. MoME leverages a set of expert networks, each specializing in a different imaging modality, coordinated by a hierarchical gating network to enhance collaborative decision-making. Additionally, the model incorporates a curriculum learning strategy to prevent the degeneration of expert networks and maintain their modality-specific focus.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The integration of modality-specific expert networks with a hierarchical gating mechanism is a significant advancement, enabling effective handling of diverse imaging inputs.
    • MoME demonstrates superior performance over existing foundation models and task-specific models on multiple brain lesion datasets, suggesting strong generalizability and effectiveness.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The sophisticated architecture involving multiple expert networks and a gating network might present challenges in practical implementation and tuning.
    • The model does not explore routing mechanisms between experts, which could potentially lead to further efficiency and performance improvements.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The details of the gating network are missing. It is highly recommended to provide the code for easy reproduction. The complexity of the model and the specific tuning required for optimal performance could pose challenges for less experienced practitioners or those without access to similar computational resources.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The model does not explore routing mechanisms between experts, which could potentially lead to further efficiency and performance improvements.
    • It would be beneficial to compare MoME with other recent developments in the field of medical imaging and segmentation, particularly those using similar expert-based architectures.
    • As the authors mention in the Introduction, “universal brain lesion segmentation requires the capability of handling the challenge of input modality diversity”, so related work on handling diverse modalities should be properly included [1] [2] [3]. [1] Zhang Y, He N, Yang J, et al. mmformer: Multimodal medical transformer for incomplete multimodal learning of brain tumor segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2022: 107-117. [2] Zhang Y, Yang J, Tian J, et al. Modality-aware mutual learning for multi-modal medical image segmentation[C]//Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24. Springer International Publishing, 2021: 589-599. [3] Chen C, Dou Q, Jin Y, et al. Learning with privileged multimodal knowledge for unimodal segmentation[J]. IEEE transactions on medical imaging, 2021, 41(3): 621-632.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend weak accept based on its significant contributions to brain lesion segmentation through the innovative MoME model, which adeptly handles diverse imaging modalities with a unique combination of modality-specific expert networks and a hierarchical gating mechanism. The manuscript is technically sound, well-presented, and extensively validated across multiple datasets, demonstrating superior performance and generalization compared to existing models.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a universal foundation model for the segmentation of different types of brain lesions. The foundational model utilizes a Mixture of Modality Experts (MoME) framework with multiple expert nn-UNets attending to different imaging modalities. Training is carried out through a curriculum learning approach, initially emphasizing the predictions of individual experts and gradually shifting focus towards the combined final mask towards the end of the training. The proposed approach was validated on nine datasets (six public and three private) featuring brain lesions. These datasets collectively include five different imaging modalities and eight types of lesions. Authors propose a validation at image, task and dataset level, where the proposed approach outperformed other state of the art models.
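    The curriculum described above, which starts by emphasizing the individual experts and gradually shifts towards the combined mask, can be sketched as a weighted loss. This is only an illustration: the linear schedule and the convex combination are assumptions made here, not the paper's definition of L_MoME, and `curriculum_weight` and `total_loss` are hypothetical names.

```python
def curriculum_weight(epoch, total_epochs):
    """Assumed linear schedule: 1.0 at the first epoch, 0.0 at the last."""
    if total_epochs <= 1:
        return 0.0
    return 1.0 - epoch / (total_epochs - 1)


def total_loss(fused_loss, expert_losses, epoch, total_epochs):
    """Blend the fused-prediction loss with the mean per-expert loss.

    Early epochs weight the experts' own losses (keeping them specialized);
    late epochs weight the loss of the gated, combined prediction.
    """
    lam = curriculum_weight(epoch, total_epochs)
    expert_term = sum(expert_losses) / len(expert_losses)
    return (1.0 - lam) * fused_loss + lam * expert_term


# Toy values: at epoch 0 only the experts count, at the last epoch only the fusion.
start = total_loss(0.2, [0.4, 0.6], epoch=0, total_epochs=11)
end = total_loss(0.2, [0.4, 0.6], epoch=10, total_epochs=11)
```

    Any monotonically decreasing schedule would express the same idea; the linear form is chosen here only because it is the simplest to read.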

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Original way to use the data: includes the most relevant data sources that include multiple imaging modalities and brain lesion types
    • The use of a MoME framework is a clever approach to handle variability within a specific domain.
    • Plenty of comparisons with state-of-the-art approaches.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lack of experiments to justify the Foundational model: Foundational models generally serve as the base architecture for developing task-specific models. The authors could have demonstrated the effectiveness of their method by fine-tuning the model on the unseen datasets with limited training samples and assessing its performance on the test set to see if it matches or exceeds that of the task-specific nnUNets.
    • Although the proposed approach achieves better results in most of the comparisons, the differences are low when compared to task-specific nnUNets. The paper would benefit from including tests to determine whether there exists significant differences between approaches.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The method’s usage would benefit from authors releasing the source code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The authors did not provide a reason for excluding the FLAIR modality (present in other used datasets) from the ISLES2022 dataset.
    • In the conclusions, I think the paper would benefit from a statement of how the nature of each lesion and the modalities used affected the performance of the proposed approach.
    • In Figure 1c, it is unclear why there is only one input image yet multiple expert outputs are shown. Shouldn’t there be just one prediction when only one image is available, specifically from the expert specialized in that particular modality?
    • nnUNet automatically configures itself, however, providing details such as the patch size for the 3D setting and the type of normalization applied would be useful.
    • Why did the authors train 14 task-specific nn-Unets when the current standard favors networks that handle multimodal inputs? Indeed, relying solely on a single modality for delineating brain lesions can result in false positives (e.g. DWI’s susceptibility to the T2 shine-through effect). A multimodal approach would likely serve as a more robust baseline. The paper would benefit from a discussion of this.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The application of the MoME appears to be a promising approach that could benefit from an expanded dataset. The authors made a great effort with the data collection and the evaluation. However, the comparisons with task-specific nnUnets reveal no significant differences. Furthermore, the use of “Foundational” in the paper’s title may lead to confusion. The authors should consider conducting additional experiments to substantiate the model’s effectiveness or revising the title to more accurately reflect the scope of their research.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces a universal foundation model for 3D brain lesion segmentation utilizing the Mixture of Modality Experts (MoME) framework, capable of handling multiple modalities. It employs a gating network to merge expert predictions, promoting expertise collaboration, and incorporates a curriculum learning strategy during training to prevent the deterioration of individual experts.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The MoME model is designed to accommodate multiple modalities of brain MRI.
    2. The experiments encompass a wide range, involving nine brain lesion datasets that cover five imaging modalities and eight lesion types.
    3. The results highlight the effectiveness of MoME, showcasing its superiority over other state-of-the-art (SoTA) methods.
    4. The paper provides an ablation study to show the effectiveness of each component.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Figure 1c lacks clarity in depicting the hierarchical nature of the gating network.
    2. The loss function L_MoME requires further explanation, particularly regarding how it facilitates collaboration among experts and its underlying rationale.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Please refer to the comments in weaknesses.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to the comments in weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper’s writing quality is commendable, and its proposed method’s ability to effectively handle multiple modalities within a single framework, resulting in strong performance, justifies my recommendation for acceptance. Furthermore, the comprehensive experiments conducted contribute to the reliability of the study’s findings. However, some points could still be improved.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Reviewer 1 (R1), Reviewer 3 (R3), and Reviewer 4 (R4) highly recommend that we provide code for reproducibility. We have prepared our code for release and updated the GitHub link in the paper, and we will make the repository public before the MICCAI proceedings come out.

R1 is also concerned that the model does not explore routing mechanisms between experts. We would like to clarify that the experts here are designed for 3D brain image segmentation, distinguishing them from sparse Mixture of Experts (MoE) architectures for 2D natural images, where a higher number of experts can be afforded and routing is necessary for expert selection. Most of these route selection methods are implemented using top-K strategies, but are actually non-differentiable (Puigcerver, Joan, et al., ICLR 2023). Due to this and the smaller number of experts in 3D-based MoME, we do not explore expert routing in this work.
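The contrast drawn above between top-K routing and dense weighting of all experts can be illustrated with a small sketch. It is a generic illustration of the two schemes under simple assumptions, not code from the MoME repository or from the cited work; `soft_weights` and `top_k_weights` are hypothetical names.

```python
import math


def soft_weights(logits):
    """Dense gating: every expert receives a non-zero softmax weight."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_weights(logits, k):
    """Sparse routing: keep the k highest-scoring experts, zero out the rest."""
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    kept = soft_weights([logits[i] for i in keep])
    weights = [0.0] * len(logits)
    for i, w in zip(keep, kept):
        weights[i] = w
    return weights


# A tiny change in the gate scores swaps the ranking of experts 0 and 1.
a = [1.00, 0.99, 0.0]
b = [0.99, 1.00, 0.0]

hard_a, hard_b = top_k_weights(a, 1), top_k_weights(b, 1)
soft_a, soft_b = soft_weights(a), soft_weights(b)
```

Under top-1 routing the selected expert switches abruptly between `a` and `b`, while the dense weights change only slightly. The hard selection step is the part that has no useful gradient with respect to the gate scores.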

R3 wonders whether MoME can demonstrate its effectiveness by fine-tuning the model on unseen datasets with limited training samples to achieve comparable performance with task-specific nnUNets. We would like to clarify that the results in Table 2 demonstrate MoME’s ability to infer directly under unsupervised settings without any fine-tuning, achieving performance that exceeds that of supervised task-specific nnUNets on 2 out of 3 datasets.

R3 expresses concern that the proposed approach achieves better results in most comparisons with other foundation models, but the differences are low when compared to task-specific nnUNets. We would like to clarify that the foundation model will benefit more from more training datasets. Although in the current experimental settings, MoME’s performance does not surpass that of numerous task-specific models with significant differences, when the number of datasets increases in the future, MoME can potentially show superiority. Besides, MoME, being only one foundation model, is comparable to 14 task-specific nnUNets. It is more convenient for clinicians to use MoME in real clinical settings, where managing and applying too many task-specific models can be challenging due to patient bias. We will clarify this better in the paper.

R3 indicates that it is unclear why there is only one input image yet multiple expert outputs are shown. Please note that different imaging modalities can be correlated and provide complementary information. For example, FLAIR appears similar to T2 but with fluid being attenuated. They may both contribute to lesion segmentation with different features, and the outputs of an expert that is not specialized in the input modality may still provide useful complementary information (also shown in Supplementary Figure I.b). We will better clarify this in the paper.

R4 suggests describing the hierarchical nature of the gating network in Figure 1c. We’d like to clarify that it’s challenging to present complex details within limited space, but readers can refer to the specifics in Section 2.2 “Hierarchical Gating Network”. We will better clarify this in the paper.

We will also address any other minor concerns raised by the reviewers.




Meta-Review

Meta-review not available, early accepted paper.


