Abstract

Existing machine learning methods for brain image analysis are mostly based on supervised training. They require large labeled datasets, which can be costly or impossible to obtain. Moreover, the trained models are useful only for the narrow task defined by the labels. In this work, we developed a new method, based on the concept of foundation models, to overcome these limitations. Our model is an attention-based neural network that is trained using a novel self-supervised approach. Specifically, the model is trained to generate brain images in a patch-wise manner, thereby learning the brain structure. To facilitate learning of image details, we propose a new method that encodes high-frequency information using convolutional kernels with random weights. We trained our model on a pool of 10 public datasets. We then applied the model to five independent datasets to perform segmentation, lesion detection, denoising, and brain age estimation. Results showed that the foundation model achieved competitive or better results on all tasks, while significantly reducing the required amount of labeled training data. Our method enables leveraging large unlabeled neuroimaging datasets to effectively address diverse brain image analysis tasks and reduce the time and cost requirements of acquiring labels.
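
The abstract's idea of encoding high-frequency information with random-weight convolutional kernels can be illustrated with a toy sketch. This is not the paper's implementation: the 1D patches, the zero-mean kernels, the kernel size of 3, and the plain summation are all assumptions made for illustration, and the function names are hypothetical. The value n = 8 is the kernel count quoted in Review #3, and the summation mirrors Review #2's reading that the input stays the size of one patch. The positional encoding P1 discussed in the reviews is omitted.

```python
import random


def random_zero_mean_kernel(size, rng):
    """Draw random weights and subtract their mean, so the kernel sums to ~0.

    A zero-sum kernel ignores constant regions and responds to local changes
    (an assumption here; the paper only states that the weights are random).
    """
    weights = [rng.gauss(0.0, 1.0) for _ in range(size)]
    mean = sum(weights) / size
    return [w - mean for w in weights]


def convolve_same(signal, kernel):
    """Same-length 1D correlation with zero padding at both ends."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):
                acc += k * signal[idx]
        out.append(acc)
    return out


def encode_patch(patch, kernels):
    """Add every random-kernel response to the patch itself.

    The result has the same length as the patch, rather than
    (n + 1) times its length as concatenation would give.
    """
    encoded = list(patch)
    for kernel in kernels:
        response = convolve_same(patch, kernel)
        encoded = [e + r for e, r in zip(encoded, response)]
    return encoded


rng = random.Random(0)
kernels = [random_zero_mean_kernel(3, rng) for _ in range(8)]  # n = 8 (Review #3)

flat_patch = [5.0] * 16               # constant intensity, no detail
edge_patch = [0.0] * 8 + [5.0] * 8    # one sharp intensity edge

flat_code = encode_patch(flat_patch, kernels)
edge_code = encode_patch(edge_patch, kernels)
```

In this sketch the interior of `flat_code` stays at the original intensity because zero-sum kernels cancel on constant regions, while `edge_code` deviates from the raw patch around the edge, which is the kind of detail the encoding is meant to emphasize.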

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3820_paper.pdf

SharedIt Link: https://rdcu.be/dY6f9

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_40

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3820_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{Kar_An_MICCAI2024,
        author = { Karimi, Davood},
        title = { { An approach to building foundation models for brain image analysis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {421--431}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This is an exciting step towards developing foundation models for medical image analysis to enable more label-efficient and generalizable AI solutions. The proposed innovations are well-motivated and the empirical results make a convincing case for the approach. Some open questions remain around scaling and theoretically grounding certain components, but overall this work moves the field forward in an important direction. Further refining the methodology and scaling up the models could be very impactful.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Innovative approach to building foundation models for brain image analysis by leveraging large unlabeled datasets and self-supervised learning. Has potential to significantly reduce labeled data requirements.
    2. Proposes a novel high-frequency feature encoding method using random convolutional kernels to improve the transformer’s ability to model fine image details. Ablation studies demonstrate the importance of this component.
    3. Impressive extrinsic evaluation on five diverse downstream tasks spanning segmentation, denoising, lesion detection, and brain age prediction. Shows the foundation model can match or outperform task-specific supervised models while using much less labeled data.
    4. Training and evaluation methodology is thorough and well-designed. Model exhibits strong generalization.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While the results are promising, the model size and training data used are still relatively modest compared to cutting-edge foundation models in NLP/vision. Scaling up could yield further gains but will require significant compute.
    2. Some key implementation details are missing, like the exact architecture of the transformer blocks. This could hamper reproducibility.
    3. More theoretical analysis of why the high-frequency feature encoding approach works would strengthen the paper. Insights may help refine the technique.
    4. Downstream tasks cover important applications but even more diverse tasks would further demonstrate the power of the foundation model approach for neuroimaging.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This is an exciting step towards developing foundation models for medical image analysis to enable more label-efficient and generalizable AI solutions. The proposed innovations are well-motivated and the empirical results make a convincing case for the approach. Some open questions remain around scaling and theoretically grounding certain components, but overall this work moves the field forward in an important direction. Further refining the methodology and scaling up the models could be very impactful.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    see above

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper builds a transformer that can perform different tasks after fine-tuning. The neural network is trained in two steps: self-supervised learning to capture the image information, followed by fine-tuning to adapt to specific tasks. The paper also discusses preserving high-frequency features using random convolutional kernels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The idea of building the transformer to do different jobs is quite interesting. (2) The transformer can also learn a specific job and achieve better accuracy than other published methods under limited data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The lowercase “n” is used for two different things: the number of small cubes and the number of random kernels. (2) The positional encoding P1 is confusing. First, the purpose of introducing a “position” is unclear, since these random kernels are generated to minimize their mutual relationship; my understanding is that, ideally, they should be interchangeable. Second, as the author mentioned, the idea is to reduce the dimension of the neural network input. The dimension appears to be reduced from (n+1)*d^3 to d^3, so if P1 is a fixed variable, what does “positional” mean here? A large P1 means that certain kernels contribute more to the input. (3) The author tries to reduce the effect of brain boundaries by applying the model in different directions. What does “different directions” mean? Different ways of constructing the sequences, such as changing from “abcdef…” to “cbafed…”? Or rotating the 3D image, as in data augmentation? Consider a boundary effect where only a small portion of the cubes lies inside the brain. Why does applying the model in different directions help?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Except for the weaknesses 2 and 3, I don’t have any other comments on the paper’s reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Beyond the weaknesses listed above, I have no other comments.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is well written. The method of creating a general model for multiple tasks is interesting and successful, though the paper contains some unclear statements. Compared with previously published MICCAI papers that each focus on a single job, such as “Multiple Prompt Fusion for Zero-Shot Lesion Detection”, “Few Shot Medical Image Segmentation with Cross Attention Transformer”, or “Text-Guided Foundation Model Adaptation for Pathological Image Classification”, this work deserves to be published because it combines the ability to handle multiple jobs.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces a foundation model for brain MRI image analysis, which was built upon 10 public datasets and evaluated on 5 various downstream tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The writing and organization of the paper are generally good.
    2. Training a foundation model for brain image analysis benefits multiple downstream tasks, as demonstrated in the extensive experiments in the paper.
    3. The proposed method and the experiments in the paper are solid.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The description of some parts of the proposed method and experiments remains unclear.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    For a better understanding of the paper, the authors might want to clarify the following points:

    1. The implementation details of high-frequency encoding - how was the set of candidates selected and encoded? How were the hyperparameters (n=8, N=100) chosen? Convolutional layers also play a role as patch embedding layers in some transformer models. In the ablations, what was the setting of ‘disabling high-frequency’ - was another model with pure patch embedding trained?
    2. In task 5 (the age prediction task), i.e., an image understanding task, the output of the model should be a scalar instead of image-shaped tensors. What modification was applied to the last layer of the foundation model (I do not quite understand “For Task 5: a scalar to represent the brain age” in the paper)? For instance, is it using a token? Or a global average of all the output tokens?
    3. The symbols describing the high-frequency encoding should be replaced - both ‘n’ and ‘N’ have appeared before and already have meanings. The meaning of the convolution symbol should be specified. The equation in the second-to-last line of the paragraph lacks ‘n’ above the summation symbol.
    4. Is the model also working on CT images? Under the current experimental settings, the authors should consider narrowing the title to include “brain MRI image analysis”.
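
The two options raised in point 2 above (a dedicated token versus a global average of all output tokens) can be contrasted in a small sketch. The paper does not say which one it uses, so both are shown purely as hypothetical candidates; the function names, the toy token values, and the linear regression head are assumptions made for illustration.

```python
def mean_pool(tokens):
    """Global average over all output tokens, one value per feature dimension."""
    dim = len(tokens[0])
    return [sum(token[i] for token in tokens) / len(tokens) for i in range(dim)]


def linear_head(features, weights, bias):
    """Map a feature vector to a single scalar, e.g. a predicted brain age."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy transformer output: 3 tokens with 2 features each (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [0.5, 0.25]  # hypothetical head parameters
bias = 1.0

# Option A: average all output tokens, then regress a scalar.
age_from_pool = linear_head(mean_pool(tokens), weights, bias)

# Option B: regress from one dedicated token (here simply the first one).
age_from_token = linear_head(tokens[0], weights, bias)
```

Either choice turns an image-shaped token sequence into one scalar; they differ in whether every patch contributes equally or a single token must aggregate the information through attention.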
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a foundation model for brain MRI image analysis. The experiments are thorough and solid. However, some details of the paper are not clear enough for the readers.

    For the reasons above, I suggest ‘Accept’ for the paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

N/A




Meta-Review

Meta-review not available, early accepted paper.


