Abstract

Multiple Sclerosis (MS) is a chronic and severe inflammatory disease of the central nervous system. In MS, the myelin sheath covering nerve fibres is attacked by the body's own immune system, disrupting communication between the brain and the rest of the body. Image-based biomarkers, such as lesions seen on Magnetic Resonance Imaging (MRI), are essential for MS diagnosis and monitoring. Furthermore, detecting newly formed lesions provides crucial information for assessing disease progression and treatment outcomes. However, annotating changes between MRI scans is time-consuming and subject to inter-expert variability. Methods proposed for new lesion segmentation have been trained on the limited data available, failing to harness the full capacity of the models and resulting in limited generalizability. To enhance the performance of new MS lesion segmentation models, we propose a self-supervised pre-training scheme based on image masking that is used to initialize the weights of the model, which is then trained for the new lesion segmentation task on a mix of real and synthetic data created by a synthetic lesion data augmentation method that we also propose. Experiments on the MSSEG-2 challenge dataset demonstrate that self-supervised pre-training and adding synthetic lesions during training improve the model's performance. We achieved a Dice score of 56.15±7.06% and an F1 score of 56.69±9.12%, which are 2.06 and 3.3 percentage points higher, respectively, than the previous best existing method. Code is available at: https://github.com/PeymanTahghighi/SSLMRI.
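To make the masking-based pre-training concrete, below is a minimal sketch of the kind of masked-input construction the abstract describes. All function names, patch sizes, and noise parameters here are illustrative assumptions rather than the authors' actual implementation (see the linked repository for that).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_pretext_input(volume, n_patches=8, patch_size=16, rng=None):
    """Illustrative pretext-task input builder: zero out random cuboid
    patches of a 3D MRI volume and apply mild degradations, so that a
    model can be trained to reconstruct, or highlight, the altered
    regions. Names and parameter values are assumptions, not the
    authors' exact implementation."""
    rng = rng or np.random.default_rng()
    masked = volume.astype(np.float32)
    mask = np.zeros(volume.shape, dtype=bool)
    for _ in range(n_patches):
        # random corner for a cuboid patch that fits inside the volume
        corner = [rng.integers(0, max(s - patch_size, 1)) for s in volume.shape]
        sl = tuple(slice(c, c + patch_size) for c in corner)
        masked[sl] = 0.0
        mask[sl] = True
    # mild global degradations (the paper also mentions Gibbs noise;
    # plain Gaussian blur and noise are shown here)
    masked = gaussian_filter(masked, sigma=0.5)
    masked = masked + rng.normal(0.0, 0.01, size=masked.shape)
    return masked, mask  # mask is the "what changed" supervision target
```

A model pre-trained to reconstruct or highlight these altered regions can then be fine-tuned on the new-lesion segmentation task.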

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0920_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0920_supp.pdf

Link to the Code Repository

https://github.com/PeymanTahghighi/SSLMRI

Link to the Dataset(s)

https://portal.fli-iam.irisa.fr/msseg-2/
https://portal.fli-iam.irisa.fr/msseg-challenge/

BibTex

@InProceedings{Tah_Enhancing_MICCAI2024,
        author = { Tahghighi, Peyman and Zhang, Yunyan and Souza, Roberto and Komeili, Amin},
        title = { { Enhancing New Multiple Sclerosis Lesion Segmentation via Self-supervised Pre-training and Synthetic Lesion Integration } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    Summary: This paper introduces an approach that integrates a self-supervised learning model based on image inpainting with synthetic lesion augmentation to train a segmentation model for multiple sclerosis lesions. The authors conducted experiments on a publicly available dataset and reported a 2-percentage-point improvement in Dice score. However, the description of the method lacks clarity, and the comparative experiments appear to have been conducted unfairly. Despite these issues, the experimental code and models have been made publicly available.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This is a new method composed of two modules, applied to the problem of Multiple Sclerosis lesion segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The novelty of combining a self-supervised learning model based on image inpainting with synthetic lesion augmentation is limited; both modules already exist. Merely combining these methods does not necessarily create a novel or innovative approach without demonstrating significant new insights or improvements.
    2. The details of the experimental setup are unclear. Additionally, there is concern about potential overlap between the dataset used for training the self-supervised model (MSSEG) and the experimental dataset used for training and testing (MSSEG-2). This overlap could influence the validity of the results.
    3. The manuscript, particularly the Methods section, is poorly written and difficult to understand.
    4. The comparisons of results appear to be biased. The performance metrics for the pre-activation UNet are cited from its original publication, whereas the performances of Coact and SNAC do not align with the metrics reported in their respective papers. This difference suggests that the data splits between this study and the original studies are not consistent, potentially leading to unfair comparisons and conclusions.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors provided the relevant code, but the model checkpoints are not available for download.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. When referring to improvements of 2.06% and 3.3%, the authors should describe them as increases of 2.06 and 3.3 percentage points, respectively, in Dice and F1 score.
    2. The study utilizes part of the MSSEG-2 dataset for testing and the MSSEG dataset for self-supervised pre-training. However, potential dataset overlap could exist, which poses a risk of data leakage. According to the official MSSEG-2 website, all images from GE scanners were excluded from the MSSEG-2 training set. It is crucial to address whether similar exclusions or precautions were taken to prevent data leakage in this study.
    3. The manuscript states that the MSSEG-2 dataset includes 40 MRI scans from 40 patients, each having scans from two time points. It is essential for the authors to clarify how these scans were utilized in the experiments. Specifically, were both time points used in the training and testing processes, or was only one time point considered?
    4. The Methods section of the manuscript is poorly written and difficult to understand. It needs better organization and clearer explanation. For instance, the role of self-supervised learning appears to involve reconstructing the original image from a blurred image, with operations op1 and op2 suggesting some form of cycle consistency. If this is accurate, a more straightforward explanation could greatly enhance readability and comprehension. Please revise it to ensure it is accessible and understandable to readers.
    5. Figure 2 in the manuscript is not very clear due to its complex line connections. It may be more beneficial to use cleaner, simpler diagrams to better convey the intended information.
    6. The structure of the paper could be improved by streamlining the Methods section and expanding the sections devoted to result analysis and discussion.
    7. The authors report the performance metrics for pre-activation UNet from its original publication, but the performance of the Coact model does not align with its originally reported metrics. This difference suggests a potentially unfair comparison, likely due to differences in data distributions used in this study compared to the original studies. It is important to ensure that all models are evaluated under comparable conditions to provide a fair and accurate comparison.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The Methods section of the study lacks clarity, and the comparison experiments do not seem to be conducted fairly. Therefore, the purported improvements are questionable.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thank you for the authors' responses. Regarding the potentially unfair comparison: in principle, the authors should have trained PreUNet on their own training split rather than using the results from the original paper, since the data splits would differ from their experiments. However, since the results are derived from cross-validation, this is acceptable. Regarding the potential overlap between the MSSEG and MSSEG-2 datasets, I agree with the authors' explanation, and it should be fine. I suggest the authors revise their manuscript to enhance clarity.



Review #2

  • Please describe the contribution of the paper

    (1) A self-supervised pre-training paradigm is proposed to enhance the performance of the lesion segmentation model. (2) A new augmentation strategy is proposed for creating synthetic white matter lesions. (3) It achieves state-of-the-art results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The method proposed in this paper achieves good performance. (2) The self-supervised pre-training scheme based on image inpainting is proposed for the new lesion segmentation task. (3) The code is available publicly.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) It lacks innovative methods and appears more like an engineering work. (2) The language is unclear. (3) The method framework diagram is difficult to understand; it does not explain the meaning of each formula or variable.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The self-supervised pre-training method should be more innovative.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See main strengths and main weaknesses.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this work, a new pre-training scheme for detecting the appearance of new multiple sclerosis (MS) brain lesions in follow-up FLAIR MRI is proposed. The pre-training scheme consists of a self-supervised method that uses image inpainting as the pretext task. Additionally, a data augmentation strategy is used to add new white matter lesions, using both synthetically generated lesions and lesions drawn from the dataset. In the lesion segmentation model, a difference map is used for longitudinal detection of MS lesions, and a new consistency loss between symmetric lesions is introduced. The MSSEG dataset was used for model pre-training, and five-fold cross-validation was performed on the MSSEG-2 dataset. Results show improvements of 2.06 percentage points in Dice and 3.3 in F1 for detection of new MS lesions compared to previous methods (CoactSeg and SNAC).
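The composite objective described in this review and the next item can be pictured with a short sketch. It assumes "symmetric" refers to predictions under the two input orderings; the term forms and weights here are hypothetical, and the paper's own equations (Eqs. 3-6) should be consulted for the real definitions.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(recon, target, inpaint_mask, pred_ab, pred_ba,
                     w_inpaint=2.0, w_consist=1.0):
    """Illustrative composite of the three loss terms the review lists:
    (1) global image reconstruction error, (2) reconstruction error
    measured explicitly in the inpainted areas, and (3) a consistency
    term between the two symmetric (swapped-input) predictions.
    Term forms and weights are assumptions, not the paper's equations."""
    l_recon = F.l1_loss(recon, target)                       # whole image
    l_inpaint = F.l1_loss(recon[inpaint_mask], target[inpaint_mask])
    l_consist = F.l1_loss(pred_ab, pred_ba)                  # symmetry
    return l_recon + w_inpaint * l_inpaint + w_consist * l_consist
```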

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors propose a new self-supervised pre-training method in the context of longitudinal MS lesion segmentation. They introduce a composite loss function that considers: image reconstruction error, consistency between symmetric lesions and reconstruction performance explicitly in the inpainted areas.
    • The ablation study demonstrated the contribution of the pretraining and the loss optimization terms.
    • The method and implementation are clearly explained, making the work readable and reproducible. The loss functions are defined intuitively.
    • The aim of the work, literature review, and discussion are written with clarity.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Given the standard deviations and the magnitude of the improvements in comparison with previous methods (Table 1), the authors should consider showing the performance distribution across folds or across patients to give a better overview of the results. Statistical analysis would also be encouraged.
    • The data augmentation method is not compared with other approaches for synthetic lesion generation. Previous methods should be mentioned in the references (e.g. https://doi.org/10.1117/12.2613283).
    • Lack of clarity: the authors should mention how the alternative methods were implemented and evaluated; it is not clear whether they used the same cross-validation scheme as their method.
    • The performance of the model without synthetic lesions is not presented.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    A good description of the dataset preparation, pre-training methods, and evaluation scheme is provided, as well as a code repository for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • For future work, as mentioned by the authors, the application of the proposed scheme could be extended for detection of changes/disappearance of already existing lesions.
    • Future hyperparameter optimization is recommended to evaluate optimal weighting of the multiple loss terms.
    • Evaluating multiple configurations for downstream training the pretrained model (e.g. freezing weights of certain modules) could be explored for better performance.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors present a solid development of a new approach for segmentation of MS lesions in longitudinal brain MRI. They address the relevant issue of data scarcity and leverage self-supervised deep learning to improve on SOTA results. The method is promising for future evaluation in other, larger longitudinal MS lesion MRI studies.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    My concerns about the evaluation against SOTA methods were clarified. I agree with the other reviewers that Figure 2 could be simplified. The proposed pre-training and data augmentation tasks for longitudinal datasets are interesting for the community.




Author Feedback

We thank all the reviewers for their constructive feedback. We have grouped the concerns raised by the reviewers and provide answers to each below.

Innovation of the proposed methods (R1, R3): While R4 commented positively on the novelty of our work, R1 and R3 raised concerns. Masking part of the input and asking the model to reconstruct the masked parts exists as a pretext task [23,3] targeting single 2D or 3D image classification or segmentation as the downstream task, with the loss usually being a simple L1 loss. The novelty of our method is in formulating this pretext task for finding the changes between the baseline and follow-up MRIs. Here, instead of a single masked input whose masked parts the model had to reconstruct, we had two inputs. For one, we masked random parts (Eq. 1) and randomly added Gaussian noise, blur, and Gibbs noise. The task of the model was then to highlight these masked regions, which indicate the regions that changed (op1 and op2 in Eq. 2). This is done with a novel combination of loss functions (Eqs. 3-6) and training procedure (Sec. 2.1). To our knowledge, this has not been studied before. Our downstream task then utilized the two-timepoint MRI scans available in the MSSEG-2 training set to train and evaluate our model to detect new lesions (Sec. 2.3).

There are several works on synthetic lesion augmentation in MRI scans. LesionMix [10] and CarveMix [4] used lesions already available in the dataset to add synthetic lesions, and [5] utilized a GAN to generate synthetic lesions, which is not feasible to do inside the training loop. Our method, in contrast, does not depend on the lesions already available in the dataset: it generates random lesion shapes and textures on its own and places them randomly on white matter. Since we utilized basic image-processing operations (Sec. 2.2), we could apply this method inside the training loop. Perhaps the most similar work to our method is "Label-Free Liver Tumor Segmentation" by Hu et al., proposed for synthetic tumor generation in CT scans, whose method we modified to generate synthetic white matter lesions.

Fairness in comparison with other methods (R3 and R4): Since we only had access to the MSSEG-2 training dataset, we evaluated our method using 5-fold cross-validation on this dataset. In the experiments, we compared our method with three different research studies, all of which have published their code. However, only PreUNet has published 5-fold cross-validation results on the MSSEG-2 training set; Coact reported results on a single fold of the MSSEG-2 training set, and SNAC utilized the MSSEG-2 test set, to which we did not have access. Hence, to avoid retraining PreUNet, we used the same fold split, which was created with the MONAI CrossValidation function with seed=42. We used this fold split to train our model, Coact, and SNAC. We therefore assure R3 and R4 that all comparisons are based on an identical fold split. We will add these details in the final version and thank R3 and R4 for pointing out this matter.

Potential overlap between the MSSEG-2 and MSSEG datasets (R3): The reviewer's concern is valid, since an overlap between the pre-training and training datasets could invalidate the results. We considered this in our experiments and emailed the organizers of the MSSEG-2 challenge (emails are available on the challenge website), who confirmed that there is no overlap between the MSSEG-2 and MSSEG datasets. We thank R3 for pointing out this matter and will include this detail in the final version.

Clarity: While R4 praised the clarity of the writing, R1 and R3 expressed concerns. We thank R1, R3, and R4 and assure all reviewers and area chairs that we will make detailed revisions, focusing on the issues raised by R1 and R3, to simplify the Methods section and diagram in the final version, enhancing both clarity and quality.
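To illustrate the rebuttal's description of on-the-fly lesion synthesis with basic image-processing operations, here is a rough sketch in the spirit of that description (and of Hu et al.'s label-free tumor synthesis). The shape, texture, and blending choices are assumptions, not the paper's exact procedure (Sec. 2.2).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synth_lesion(volume, wm_mask, radius=6, rng=None):
    """Illustrative on-the-fly lesion synthesis: threshold smoothed random
    noise into an irregular blob, give it a noisy hyperintense texture,
    and alpha-blend it at a random white-matter location. A sketch of the
    idea only; all parameter values are assumptions."""
    rng = rng or np.random.default_rng()
    d = 2 * radius + 1
    # irregular blob shape from thresholded, smoothed random noise
    blob = gaussian_filter(rng.random((d, d, d)), sigma=2.0) > 0.5
    # pick a random voxel inside the white-matter mask as the blob corner
    zs, ys, xs = np.nonzero(wm_mask)
    i = rng.integers(len(zs))
    lo = np.maximum(np.array([zs[i], ys[i], xs[i]]) - radius, 0)
    hi = np.minimum(lo + d, np.array(volume.shape))
    region = tuple(slice(l, h) for l, h in zip(lo, hi))
    blob = blob[tuple(slice(0, h - l) for l, h in zip(lo, hi))]
    # hyperintense textured lesion, blended softly into the volume
    texture = volume.max() * (0.8 + 0.2 * rng.random(blob.shape))
    alpha = gaussian_filter(blob.astype(float), sigma=1.0)
    out = volume.copy()
    out[region] = (1 - alpha) * out[region] + alpha * texture
    label = np.zeros(volume.shape, dtype=np.uint8)
    label[region][blob] = 1  # basic slicing returns a view, so this writes through
    return out, label
```

Because everything above is cheap array arithmetic, such a generator can run inside the training loop, which is the property the rebuttal contrasts with GAN-based approaches.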




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper addresses a significant task - the longitudinal lesion segmentation problem. The authors did a good job of clarifying the concerns from R3 and R4. The technical novelty of the pre-training and data augmentation tasks is somewhat limited, but I agree with R4 that, for longitudinal datasets, they are interesting for the community. I vote for accept considering the relevance of the application and the reasonable techniques.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I have two major concerns about this work. First, I agree that there is some level of novelty, but the novelty is moderate. The idea of pretraining is not new, as acknowledged by the authors. The authors have developed a new but heuristic pretext task for the longitudinal segmentation task. Similarly for the synthetic lesion generation, image mixing is also not a new idea, and the authors again develop a heuristic strategy for the generation. Second, I have some concerns about the ablation results. Without the consistency loss, the Dice score is only 51.25, which is worse than both Coact and SNAC. Does this suggest that the other three contributions are not effective? Would combining Coact or SNAC with the consistency loss yield even better results?

    Despite these concerns, the paper addresses the longitudinal lesion segmentation problem with an interesting perspective and some level of evidence of efficacy. I am slightly more inclined to recommend acceptance.



