Abstract

Automatic brain tumor segmentation using multimodal MRI images is a critical task in medical imaging. A complete set of multimodal MRI images for a subject offers comprehensive views of brain tumors, thus providing ideal tumor segmentation performance. However, acquiring such modality-complete data for every subject is frequently impractical in clinical practice, which requires a segmentation model to be able to 1) flexibly leverage both modality-complete and modality-incomplete data for model training, and 2) prevent significant performance degradation in inference if certain modalities are missing. To meet these two demands, in this paper, we propose M$^3$FeCon (\textbf{M}issing as \textbf{M}asking: arbitrary cross-\textbf{M}odal \textbf{Fe}ature Re\textbf{Con}struction) for incomplete multimodal brain tumor segmentation, which can learn approximate modality-complete feature representations from modality-incomplete data. Specifically, we treat missing modalities also as masked modalities, and employ a strategy similar to Masked Autoencoder (MAE) to learn feature-to-feature reconstruction across arbitrary modality combinations. The reconstructed features for missing modalities act as supplements to form approximate modality-complete feature representations. Extensive evaluations on the BraTS18 dataset demonstrate that our method achieves state-of-the-art performance in brain tumor segmentation with incomplete modalities, especially on enhancing tumor, with a 4.61\% improvement in terms of Dice score.
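To make the core idea concrete, the following is a minimal, hypothetical NumPy sketch (not the authors' code) of the "missing as masking" step: features of absent modalities are replaced by a shared mask token and then reconstructed from the available modalities. The mask token, feature dimension, and the mean-based stand-in for the learned cross-modal reconstruction network are all illustrative assumptions.

```python
import numpy as np

def approx_complete(features, available, mask_token):
    """Toy 'missing as masking' step for one subject.

    features:   (M, D) per-modality feature vectors (rows for absent
                modalities are ignored); available: (M,) boolean mask;
    mask_token: (D,) shared token (here just zeros)."""
    # 1) Treat missing modalities as masked: substitute the mask token.
    masked = np.where(available[:, None], features, mask_token)
    # 2) Stand-in for the learned cross-modal reconstruction network:
    #    predict missing-modality features from the available ones
    #    (here simply their mean; the paper learns this mapping).
    recon = masked[available].mean(axis=0)
    # 3) Approximate modality-complete representation: real features
    #    where present, reconstructed ones elsewhere.
    return np.where(available[:, None], features, recon)

M, D = 4, 8                                        # 4 MRI modalities, toy feature dim
rng = np.random.default_rng(0)
features = rng.normal(size=(M, D))
available = np.array([True, False, True, False])   # e.g. T1 and T2 missing
complete = approx_complete(features, available, np.zeros(D))
```

In the actual method the reconstruction is learned end-to-end across arbitrary modality combinations; the mean here only marks where that network would sit.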

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0067_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0067_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

https://www.med.upenn.edu/sbia/brats2018/data.html

BibTex

@InProceedings{Zen_Missing_MICCAI2024,
        author = { Zeng, Zhilin and Peng, Zelin and Yang, Xiaokang and Shen, Wei},
        title = { { Missing as Masking: Arbitrary Cross-modal Feature Reconstruction for Incomplete Multimodal Brain Tumor Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a model for incomplete multi-modal brain tumor segmentation, which can learn approximate modality-complete feature representations from modality-incomplete data. It employs a strategy similar to Masked Autoencoder and is evaluated on the BraTS18 dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    It is the first attempt to learn modality-complete feature representations from the training with modality-incomplete data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The model is evaluated on a single dataset, BraTS18. The BraTS challenge has been releasing a new dataset every year, so it is not clear why the authors chose an old one.
    • The improvements reported are marginal in most cases (Table 2), and no statistical analysis was conducted to check whether the observed differences are statistically significant. The performance difference is particularly small compared to the M3AE model, which has fewer parameters (64M). Also, only the Dice score was reported, whereas other metrics (e.g., Hausdorff distance) are usually reported to evaluate segmentation performance.
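For context, a paired significance test of the kind requested is straightforward to run on per-subject Dice scores. The sketch below uses a sign-flip permutation test on fabricated-for-illustration scores (the subject count, means, and spreads are invented, not from the paper):

```python
import numpy as np

def paired_permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on the mean difference:
    randomly flip the sign of each per-subject difference and count
    how often the permuted mean is at least as extreme as observed."""
    rng = np.random.default_rng(seed)
    d = np.asarray(a) - np.asarray(b)
    observed = abs(d.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    perm_means = np.abs((signs * d).mean(axis=1))
    return (1 + (perm_means >= observed).sum()) / (n_perm + 1)

rng = np.random.default_rng(1)
# Hypothetical per-subject Dice scores for two methods, paired by subject.
dice_ours = np.clip(0.80 + rng.normal(0, 0.04, 40), 0, 1)
dice_base = np.clip(dice_ours - 0.03 + rng.normal(0, 0.02, 40), 0, 1)
p = paired_permutation_pvalue(dice_ours, dice_base)
```

The permutation test avoids assuming normally distributed differences, which is a common concern with per-case segmentation metrics.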
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is well written and clear, with some small typos and mistakes that can be easily corrected, e.g.: “We The backbone network is a 3D UNet [3] architecture” (Section 3, second paragraph).

    As a minor comment on Table 2: since the highest scores are highlighted in bold, in the last line (all modalities) the mmF Dice for tumor core is the highest and should be bold (86.23).

    Scales in Fig. 3 are misleading, giving the impression that the differences in performance are greater than they really are (center and right plots).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the paper presents some novelty in the method, the experiments use a single dataset and report only Dice as a metric, with marginal improvements. Evaluating the model on other datasets, using other metrics, and running a statistical analysis would be necessary to draw conclusions regarding the superiority of the method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors added statistical analysis, and improved the discussion on the performance of the method.



Review #2

  • Please describe the contribution of the paper

    M3FeCon Framework: The paper proposes M3FeCon, a method that stands for “Missing as Masking: arbitrary cross-Modal Feature ReConstruction.” This framework is designed to handle the issue of missing modalities in multimodal MRI data, which is a common scenario in clinical practice. M3FeCon treats missing modalities as masked modalities and reconstructs their features from the available ones.

    Cross-modal Feature Reconstruction: The authors introduce a strategy akin to Masked Autoencoder (MAE) to perform feature-to-feature reconstruction across arbitrary modality combinations. This allows the model to learn and approximate the complete feature representation even when some modalities are missing.

    State-of-the-art Performance: The paper claims that M3FeCon achieves state-of-the-art performance on the BraTS18 dataset, particularly in enhancing tumor segmentation, with a 4.61% improvement in Dice score.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Improved Segmentation: The paper demonstrates a significant improvement in segmentation accuracy, especially for enhancing tumor areas, which is crucial for accurate diagnosis and treatment planning.

    Fewer Parameters: M3FeCon has fewer parameters compared to some other methods, which could be an advantage in terms of computational efficiency and the potential for faster training times.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Dataset issue: only a single, relatively old dataset is used; the experiments rely on the BraTS 2018 dataset rather than one of the releases from 2020 onward.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The last line of #parameters in Table 1 looks odd; its formatting should be reconsidered.

    • Uncompared paper RFNet: Region-aware Fusion Network for Incomplete Multi-modal Brain Tumor Segmentation

    • Typo: “We The backbone network is a 3D UNet [3] architecture.’’

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental dataset is relatively old, and the validation and comparisons are insufficient.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper
    • Reconstruction of missing multimodality in the feature level rather than the image level.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Model is robust to the number of missing modalities
    • Random masking is effective and simple
    • Good ablation study
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • More SOTA methods could be compared using all modalities as the upper bound. (e.g. nnUnet or transformer models).
    • No statistical tests when comparing results of methods.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The dataset used is public: good for reproducibility

    • No code is available, and details of the backbone model (3D U-Net) and the feature reconstruction network are lacking, hence the results can’t be reproduced.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Last row of Table 1 highlighted the wrong top performer.
    • Add details of the feature reconstruction network.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is simple and effective. However, the paper lacks details of the feature reconstruction network and provides no statistical tests to support the result analysis.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    @Area Chair: this reviewer did not change the score after the rebuttal; I made a mistake and edited the wrong review. Unfortunately, it cannot be undone.




Author Feedback

We thank all reviewers for their constructive comments, and appreciate that they all agree this is a well-written paper that presents novelty. We notice that all their major concerns are related to experiments. However, we strictly followed the experimental settings used in previous related works published at MICCAI, i.e., the same dataset and the same metric for evaluation.

1. R1/R4 - experiments only on the BraTS2018 dataset are insufficient: (1) BraTS2018 is the most widely used benchmark in recent related works on this task [2,4,9,11,12,14,15], including the latest SOTA work M3AE [9] that we compare to. More importantly, the recent related works published at MICCAI (KD-Net [6], ACN [12], MFI [15], mmFormer [14]) all conducted their experiments on this single dataset. We therefore use the same benchmark to make a direct and fair comparison; this also lets readers reference the results reported in those papers directly. (2) We acknowledge the suggestions for other available datasets and will take them into account in future work. Meanwhile, on BraTS2023 (Adult Glioma Segmentation) our model achieves 4.29%, 2.16%, and 1.47% Dice gains (averaged over all missing-modality cases) over the previous SOTA M3AE, with p-values below 0.01 in all cases. Nevertheless, BraTS2018 should suffice for evaluation: it is still widely used despite being released earlier, it shares the same acquisition and annotation protocol with the later BraTS datasets, and the image and annotation quality are the same.

2. R1 - marginal improvement: Our improvement over M3AE is by a significant margin. We achieve 4.61% average Dice gains over M3AE for enhancing tumor segmentation, with more than 2% Dice gains on 14 out of 15 missing-modality cases. Moreover, on the harder cases with more modalities missing, or with the most informative modality (T1c) missing, our method shows even larger improvements: for example, over 10% gains with only {FLAIR} or {FLAIR, T2} for enhancing tumor, and over 3% gains with {FLAIR} or {T2} for tumor core. These significant improvements on the more challenging tasks and cases demonstrate our method’s superiority. In addition, M3AE employs a two-stage training process that takes 15 h longer to train than ours under the same setting.

3. R1/R3 - statistical tests and other metrics: As most previous related MICCAI papers used the averaged Dice score as the only metric, we followed their setting. Per the reviewers’ request, we provide more results. For statistical significance, the p-values against mmF, MD, and M3AE, tested separately under each missing-modality case, are all below 0.01, with averages of 6.10e-5, 1.27e-5, and 7.42e-4, respectively. Our average Hausdorff distances are 14.12, 12.46, and 15.28 on the three tumor types, clearly lower (better) than those of the previous SOTA M3AE (18.26, 15.41, 16.74).

4. R4 - uncompared paper RFNet: RFNet was proposed in 2021 and was shown to be outperformed by M3AE (2023), according to [9]. Since our method outperforms M3AE, it should also outperform RFNet. Per the reviewer’s request, we compared our method with RFNet: ours achieves 7.12%, 2.04%, and 1.58% average Dice gains over RFNet, with all p-values below 0.01.

5. R3 - network details and reproducibility: We will make the code public once the paper is accepted. The feature reconstruction network is a standard transformer with a depth of 4, each block consisting of a self-attention layer followed by a feed-forward network; the hidden dimension is 512.

6. R3 - upper bound: The upper bound using all modalities with nnU-Net is 78.17%, 86.69%, 91.33%. Our results are close to this upper bound, which shows robustness to missing modalities.

7. R1 - Figure 3: We have labeled the y-axis values on the different plots and will adjust the scales to avoid being misleading.

8. Minor mistakes: We will fix them.
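Based on the rebuttal’s description of the feature reconstruction network (a standard transformer of depth 4, each block a self-attention layer followed by a feed-forward network, hidden dimension 512), a minimal NumPy sketch of such a stack might look as follows. The head count, pre-norm placement, 4x feed-forward expansion, and random initialization are assumptions not stated in the rebuttal:

```python
import numpy as np

rng = np.random.default_rng(0)
D, DEPTH, HEADS = 512, 4, 8   # hidden dim and depth from the rebuttal; head count assumed

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attention(x, Wq, Wk, Wv, Wo):
    """Multi-head self-attention over the modality tokens."""
    n, d = x.shape
    hd = d // HEADS
    q = (x @ Wq).reshape(n, HEADS, hd).transpose(1, 0, 2)
    k = (x @ Wk).reshape(n, HEADS, hd).transpose(1, 0, 2)
    v = (x @ Wv).reshape(n, HEADS, hd).transpose(1, 0, 2)
    a = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd)) @ v
    return a.transpose(1, 0, 2).reshape(n, d) @ Wo

def block(x, p):
    """One transformer block: self-attention then feed-forward, with residuals."""
    x = x + attention(layer_norm(x), *p["attn"])
    h = np.maximum(layer_norm(x) @ p["W1"], 0) @ p["W2"]   # ReLU feed-forward
    return x + h

params = [
    {"attn": [rng.normal(0, 0.02, (D, D)) for _ in range(4)],
     "W1": rng.normal(0, 0.02, (D, 4 * D)),
     "W2": rng.normal(0, 0.02, (4 * D, D))}
    for _ in range(DEPTH)
]

tokens = rng.normal(size=(4, D))   # one token per MRI modality (missing ones as mask tokens)
out = tokens
for p in params:
    out = block(out, p)
```

A trained implementation would add learned norm/bias parameters and train the stack so that tokens at masked positions regress the features of the missing modalities; this sketch only fixes the shapes and data flow.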




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper introduces a model for incomplete multi-modal brain tumor segmentation, capable of learning approximate modality-complete feature representations from modality-incomplete data. While some concerns have been raised and partially addressed in the rebuttal, the overall technical contribution is sound, and it addresses a clinically important question.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers largely agree with the novelty of handling missing modalities and the simplicity of the proposed masking strategy.

    The authors generally did a good job in clarifying raised issues (incl. statistical analysis) in the rebuttal. There is one argument from R4 that the data used is a bit too old. I agree with this but at the same time understand the significant additional complexity needed for new benchmark.

    I vote for accept and encourage the authors to include the new clarification in the final version.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



