Abstract

Accurate quantification of multiple sclerosis (MS) lesions using multi-contrast magnetic resonance imaging (MRI) plays a crucial role in disease assessment. While many methods for automatic MS lesion segmentation in MRI are available, these methods typically require a fixed set of MRI modalities as inputs. Such full multi-contrast inputs are not always acquired, limiting their utility in practice. To address this issue, a training strategy known as modality dropout (MD) has been widely adopted in the literature. However, models trained via MD still underperform compared to dedicated models trained for particular modality configurations. In this work, we hypothesize that the poor performance of MD results from an overly constrained multi-task optimization problem. First, to reduce harmful task interference, we propose to incorporate task-conditional mixture-of-expert layers into our segmentation model, allowing different tasks to leverage different parameter subsets. Second, we propose a novel online self-distillation loss to help regularize the model and to explicitly promote model invariance to the input modality configuration. Compared to standard MD training, our method demonstrates improved results on a large proprietary clinical trial dataset as well as on a small publicly available dataset of T2 lesions.
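For readers unfamiliar with modality dropout, the following is a minimal sketch of the training-time augmentation in PyTorch (hypothetical code; the tensor shapes and the guard against dropping all modalities are illustrative assumptions, not the authors' implementation):

```python
import torch

def modality_dropout(x: torch.Tensor, p_drop: float = 0.5) -> torch.Tensor:
    """Randomly zero out whole input modalities (channels) during training.

    x: batch of multi-contrast volumes of shape (B, M, D, H, W), with one
    channel per MRI modality. Each modality is dropped independently with
    probability p_drop, but at least one modality is always kept.
    """
    B, M = x.shape[0], x.shape[1]
    keep = torch.rand(B, M, device=x.device) > p_drop  # (B, M) keep-mask
    # Guarantee at least one surviving modality per sample.
    empty = keep.sum(dim=1) == 0
    n_empty = int(empty.sum())
    if n_empty > 0:
        keep[empty, torch.randint(M, (n_empty,), device=x.device)] = True
    return x * keep.view(B, M, 1, 1, 1).float()
```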

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2509_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2509_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Nov_Ataskconditional_MICCAI2024,
        author = { Novosad, Philip and Carano, Richard A. D. and Krishnan, Anitha Priya},
        title = { { A task-conditional mixture-of-experts model for missing modality segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors use multiple magnetic resonance (MR) scans of different modalities (T1w pre-contrast, T1w post-contrast, T2w, FLAIR) to automatically segment T2 brain lesions in multiple sclerosis (MS) patients. To perform this task, they use a custom 3D-UNet architecture with mixture-of-experts layers and a self-distillation loss. During training, they also investigate a data augmentation strategy called modality dropout to improve the robustness of the model when not all modalities are available.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well written. MS lesion segmentation is a challenging task.
    • By relying on multiple modalities they enhance the contextual information of their model.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This article lacks essential information about the methodology:

    • The main issue in multi-contrast segmentation is mis-registration between contrasts. This paper does not address this issue; indeed, the problem is not even mentioned. Were all contrasts already co-registered before model training? Without knowing this crucial information, the entire study cannot be properly assessed.
    • Regarding the network, a diagram of the architecture with the Mixture-of-Experts (MoE) blocks should be provided to help the reader understand it.
    • The number of epochs used for training is also missing.

    Results:

    • Segmentation examples should be added to the results section.
    • The best Dice scores should be written in bold to make them more readable.

    Evaluation:

    • The evaluation could have been done on a different dataset.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The type of registration used to pair the different modalities should be described in the methodology.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Some important methodological details are missing, which limits the reproducibility of the paper.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See #6

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The reviewers’ comments are appropriately addressed.



Review #2

  • Please describe the contribution of the paper

    The aim of this work is to make MS lesion segmentation work on a range of contrasts in a flexible way by framing the problem as a multi-task problem. A method is created that allows for different mixtures of results from different input modalities within each block, governed by a controlling weight vector. Results are compared to modality dropout and one other method (dynamic stem), using two different datasets and a set of three commonly used modalities.
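    One plausible reading of such a block, sketched in PyTorch (illustrative only; the layer sizes, gating network, and class names are assumptions, not the authors' exact architecture): a small gating network maps a binary modality-availability code to softmax weights over N parallel expert convolutions, and the block output is the weighted sum of the expert outputs.

```python
import torch
import torch.nn as nn

class TaskConditionalMoEConv(nn.Module):
    """Sketch of a task-conditional mixture-of-experts convolution block."""

    def __init__(self, in_ch: int, out_ch: int, n_experts: int, n_modalities: int):
        super().__init__()
        # N parallel "expert" 3D convolutions.
        self.experts = nn.ModuleList(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
            for _ in range(n_experts)
        )
        # Gating MLP: modality-availability code -> weights over experts.
        self.gate = nn.Sequential(
            nn.Linear(n_modalities, 32), nn.ReLU(),
            nn.Linear(32, n_experts),
        )

    def forward(self, x: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        # code: (B, n_modalities) availability vector, e.g. [1, 0, 1].
        w = torch.softmax(self.gate(code), dim=-1)               # (B, N)
        out = torch.stack([e(x) for e in self.experts], dim=1)   # (B, N, C, D, H, W)
        return (w.view(*w.shape, 1, 1, 1, 1) * out).sum(dim=1)  # (B, C, D, H, W)
```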

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength is the novel approach of considering the problem in a multi-task context, which helps to provide a different viewpoint and opens up options for dealing with multiple modalities. This is an important topic, as there are good improvement gains to be made when combining modalities, as shown by the DSC results in Tables 2 and 3.

    Another strength is in explicitly encouraging invariance to input modality configuration through a self-distillation loss, which is a useful contribution.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of the work is the limited improvement in the results and the lack of some other commonly used performance measures, such as F1. Although two other methods were used for comparison at different points, it is not shown how these compare with SOTA results in this area. Also, although some summary statistics are provided, no statistical tests are performed. Reproducibility is also limited: some parts of the description are unclear, there is no promise of source code availability, and the only sizeable dataset used is proprietary.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The exact configuration of the network is not described well enough to be sure that someone could reimplement something that is exactly the same. It would certainly be possible to get close, but some details would need to be guessed. For instance, it is unclear from the main text or Figure 1 what exactly E1 … EN represent - they are described as “standard convolutional layers”, but the size in terms of the number of channels/filters is not clear, nor is how the stem layer fits with the other blocks and any associated activation or normalisation layers in between. An overall network diagram or numerical/code description would have helped, as would access to source code, which is not mentioned.

    In addition, one dataset is proprietary, without indication of whether it might be available to other researchers in any way, now or in the future.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The improvement in DSC in Table 1 appears to be statistically significant, although no statistical tests were performed. However, the magnitude of the improvement is quite limited, and it is difficult to assess whether it is of real benefit to any downstream use. In addition, other commonly reported metrics for such tasks (e.g. F1) are not reported; these would have helped to assess whether there is benefit beyond this small DSC change.

    Comparison with different methods is limited to Dynamic Stem in Table 1 and in-house implementations of MD and DM in Tables 2 and 3. SOTA methods and their performance on these datasets are neither provided nor referenced, although such results may only be available for the second dataset, as the proprietary dataset may not have been used before. Hence it is hard to know whether the Dynamic Stem, MD, and DM models are competitive with the SOTA. It is not necessary for these methods to equal or beat the SOTA; as long as their results are roughly in the same range, this would not be a negative factor. However, if their results are a long way from the SOTA, that limits the impact of demonstrating the effectiveness of this novel method.

    The statement that it is too impractical to train a model for each possible input modality configuration is made without support. Since there are only a few modalities that are commonly used, such as the ones present in this work (T1, T1+contrast, T2, FLAIR, MT), it does not seem that onerous to train for each one, as the set is not that big. Maybe the authors meant something a little different with this statement, but I do not find it convincing.

    Hyper-parameters alpha and beta are set to fixed values, and no explanation is given of how these values were obtained beyond that they were “selected heuristically”, or of what effect they have when varied. It would be good to know a bit more about how they were chosen and whether the authors have any information, even anecdotal, on whether these exact values are crucial or whether the dependence on them is relatively weak.

    It is unclear why random noise was added to modality codes and how important this was.

    It would have been helpful to see the explicit mathematical loss function that combines cross-entropy and Dice, or to have a reference for it.
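    For reference, a standard combined Dice + cross-entropy segmentation loss takes the following form (a common formulation, not necessarily the authors' exact weighting or smoothing):

```latex
\mathcal{L}_{\mathrm{seg}}
  = \underbrace{-\frac{1}{|\Omega|}\sum_{v\in\Omega}\sum_{c} y_{v,c}\,\log p_{v,c}}_{\text{cross-entropy}}
  \;+\;
  \underbrace{\left(1-\frac{2\sum_{v\in\Omega} p_{v,1}\,y_{v,1}+\epsilon}
                           {\sum_{v\in\Omega} p_{v,1}+\sum_{v\in\Omega} y_{v,1}+\epsilon}\right)}_{\text{soft Dice}}
```

    where v indexes voxels, c indexes classes, p are softmax probabilities, y are one-hot labels, and epsilon is a small smoothing constant.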

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main improvements in the results that are reported relate mainly to combining modalities rather than the novelties introduced in this work. Since the benefits are relatively limited, and no statistical tests are provided to confirm these small benefits, the interest in this method is likely to be limited.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The rebuttal helpfully addresses many of my queries, such as statistical tests, metrics, SOTA comparison, network details and stability with respect to hyper-parameter variations. This gives me more confidence in the work and has resulted in an improved rating.



Review #3

  • Please describe the contribution of the paper

    This paper addresses the problem of medical image segmentation with missing MRI modalities for accurate quantification of Multiple Sclerosis (MS) lesions. The authors proposed a method where task-conditional mixture-of-expert (MoE) layers are incorporated into the segmentation model to leverage different model parameter subsets for different modality configurations. The model also utilizes an online self-distillation scheme to improve the T2 lesion segmentation performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The incorporation of task-conditional MoE layers for learning different parameter subsets for different modality configurations, along with an online self-distillation loss is novel.
    • The paper establishes an explicit connection between modality dropout and multi-task learning, thereby mitigating the disadvantage of multi-task learning that stems from sharing all model parameters across tasks.
    • Each expert in MoE learns parameters specific for a modality configuration.
    • The proposed method outperforms the model with modality dropout training strategy.
    • The performance of the proposed method is very close to that of dedicated models specifically trained for a given modality configuration. This indicates that the proposed model could be used for many clinical segmentation tasks where the input MRI has certain missing modalities. Hence, this could be a clinically useful model for several disease diagnosis tasks.
    • The paper is well written and easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Visualizations/Qualitative comparisons of the predictions from the proposed model with Ground-Truth, MD and DM are missing.
    • More details should be provided about which expert learns which modality configuration. In other words, there are “N” experts and “M” different modality configurations. What is the correspondence between these? Is one expert learning one configuration or multiple configurations? Or is one modality configuration learned by more than one expert, or just a single expert? Unless N=M, one of the above cases must be true. This should be explained in more detail.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors must publicly release the code after the review period.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Please expand the MS abbreviation when first used in the abstract.
    • Just above Equation 7, in the modality-dropped version, the subscript should be “b” and not “i”.
    • The online self-distillation loss could also be addressed by minimizing the KL-divergence between the full-modality and modality-dropped outputs (see the sketch after this list). The authors could try this in the future and compare it with the proposed method.
    • In Implementation (Section 3.2), last paragraph, third-to-last sentence: “Parameters alpha and beta in Equation (9) …”. There is no Equation (9); replace it with (8).
    • Please refer to the weaknesses section for more detailed comments.
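    A minimal sketch of the KL-divergence variant suggested above, in PyTorch (hypothetical code for illustration; the authors' actual self-distillation loss is the one defined in their Equation (7)):

```python
import torch
import torch.nn.functional as F

def kl_self_distillation(logits_full: torch.Tensor,
                         logits_dropped: torch.Tensor) -> torch.Tensor:
    """KL divergence pushing the modality-dropped prediction toward the
    full-modality prediction; the full branch is detached so it acts as
    a fixed online teacher."""
    teacher = F.softmax(logits_full.detach(), dim=1)  # (B, C, ...) probabilities
    student = F.log_softmax(logits_dropped, dim=1)    # log-probs, as kl_div expects
    return F.kl_div(student, teacher, reduction="batchmean")
```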
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The strengths of this paper outweigh its weaknesses. The authors of this paper are solving an important and difficult problem. The proposed method could have good clinical usage and hence a good addition to the research community.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed all my questions and concerns and have also adequately addressed the concerns of other reviewers. This makes it a good paper in the challenging missing modality segmentation research area. I would rank this paper 2 out of 2 papers in my review stack (excluding early accept/reject papers). The authors should correct the minor errors in the camera-ready version according to my comments in the original review.




Author Feedback

We thank the reviewers for their valuable comments, and appreciate that they found our method interesting and novel.

Methods (improved clarity of implementation):

  • (R1, R4) We will include a much-improved Figure 1 depicting the exact UNet architecture (including order and type of activation and normalization functions used) and the exact placement of MoE layers.
  • (R1) The number of model training steps was originally indicated in the results section 3.3, but for improved clarity we will move this information to the implementation section 3.2.
  • (R4) We will mention that performance was found to be insensitive to the parameters alpha and tau within the ranges [0.01, 1] and [1, 4], respectively. We will provide a reference for the Dice+CE training loss function.
  • (R4) Comparison with SOTA methods: we are not aware of previously published missing-modality results on the MSSEG dataset. [1] reports that with T1w, T2w, and FLR inputs the SOTA nnUNet model achieves a median Dice coefficient of 0.76 on a specific train/test split, whereas our corresponding dedicated model achieves a median Dice coefficient of 0.77 using 3-fold cross-validation. While this comparison isn’t perfect, we hope that we have convinced the reviewer that our baseline UNet model is competitive.
  • (R1) Pre-processing/co-registration: we acknowledge the essential role of co-registration between contrasts as a pre-processing step. Both datasets were previously pre-processed by the respective dataset owners. For MSSEG-2016, all pre-processing steps are described in the in-text citation. For OPERA, the pre-processing pipeline included N3 bias field correction and intra-patient MI-based rigid registration using the MINC package. We thank the reviewer for pointing out that these details were absent, and we will clarify them in the text. We would also like to emphasize that all compared models use the same pre-processed data for training and evaluation, so any differences in performance cannot be attributed to differences in pre-processing.
  • (R3) Mixture-of-experts specialization: we observe that experts tend to be sparsely activated as a function of the task (1 or 2 experts receive most of the weight), with the other experts contributing small but non-zero weights. We also observe that single experts may have large weights for more than one task. We will include in the supplementary file an additional figure with example expert weight distributions as a function of the input configuration.

Experiments & results:

  • (R1, R3) We will include visual/qualitative segmentation results in the main text.
  • (R4) Metrics: we acknowledge that the Dice coefficient captures only one aspect of performance. In response, in the camera-ready version we will include in Tables 2 and 3 results using a previously validated [2] multi-metric composite score (including lesion-wise true/false positive rates and volume difference). We found that the composite score confirms the findings of the paper, with the proposed method outperforming dedicated models on 4/7 modality configurations in both datasets.
  • (R4) Statistical tests: we will include Wilcoxon p-values (< 4e-4 for DSC, < 5e-5 for the composite score, comparing MD vs. the proposed method in both datasets).
  • (R4) It is stated that “the main improvements in the results relate mainly to combining modalities rather than the novelties introduced in this work”. We are not certain what the reviewer means here. While it is true that the performance of all models remains sensitive to the choice of input modalities (particularly when segmenting subtle MS lesions), our method shows clear improvements over modality dropout across all modality configurations in both datasets.
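As context for the statistical tests mentioned above, paired Wilcoxon signed-rank tests on per-subject scores can be computed with SciPy; the arrays below are placeholder values for illustration only, not the authors' data:

```python
from scipy.stats import wilcoxon

# Paired per-subject Dice scores (placeholder values, for illustration).
dsc_md       = [0.71, 0.68, 0.74, 0.70, 0.66]  # modality-dropout baseline
dsc_proposed = [0.73, 0.70, 0.75, 0.72, 0.69]  # proposed MoE model

stat, p = wilcoxon(dsc_proposed, dsc_md)
print(f"Wilcoxon signed-rank: W={stat}, p={p:.4g}")
```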

References

[1] Gentile, G., et al. “BIANCA-MS: An optimized tool for automated multiple sclerosis lesion segmentation.” Human Brain Mapping (2023).
[2] Carass, A., et al. “Longitudinal multiple sclerosis lesion segmentation: resource and challenge.” NeuroImage (2017).




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper tackles the issue of medical image segmentation with missing MRI modalities in order to accurately quantify MS lesions. In the initial review, the reviewers raised several concerns regarding implementation clarity, experiments, and other aspects. However, these concerns were effectively addressed in the rebuttal. Therefore, the submission is considered acceptable for publication.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers acknowledged the paper’s novelty in handling missing MRI modalities using a mixture-of-experts model. However, they raised significant concerns about methodology details, evaluation rigor, and the clarity and completeness of the reported results. After the rebuttal, they were mostly satisfied with the authors’ responses and raised their scores.

    I vote to accept this paper considering its potential clinical applicability and novel methodological approach.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



