Abstract

Emerging evidence from advanced neuroimaging studies suggests shared neurological bases across different brain disorders (BDs) throughout the human lifespan. Researchers therefore aim to build a general neuroimaging-based diagnosis model for population-scale screening of multiple BDs. Existing models predominantly follow the transfer learning paradigm, based either on out-of-domain models pre-trained with large-scale but less-related data and tasks, or on in-domain models pre-trained on brain data from healthy populations with auxiliary tasks such as age prediction. The former approach captures little of the inter-individual variation and BD-related features in population-scale brain data, while the latter relies on a weak implicit association between the proxy and BD tasks. In this work, we propose a two-stage vision-language model adaptation strategy that incorporates novel knowledge into a well pre-trained out-of-domain model (e.g., BLIP) by aligning basic cognition with brain structural features for accurate diagnosis of multiple BDs. First, using lifespan Human Connectome Project data, we textualize the demographic and psychometric records and construct knowledge-injecting textual prompts enriched with important cognitive science context. The model is expected to learn the alignment between brain structure from images and cognitive knowledge from texts. Second, we customize knowledge-reactivating instructions and further tune the model to accommodate the cognitive symptoms in each BD diagnosis task. Experimental results show that our framework outperforms other state-of-the-art methods on three BD diagnosis tasks across different age groups, demonstrating a promising and feasible learning paradigm for adapting large foundation models to the cognitive neuroscience and neurology fields.
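
A minimal sketch of this textualization step in Python (the record fields, score names, and prompt template below are illustrative assumptions, not the paper's exact format):

    def build_knowledge_prompt(record):
        # Hypothetical cognitive-science context sentence; the paper injects
        # context drawn from the cognitive science literature.
        context = (
            "Fluid cognition reflects processing speed and executive function; "
            "crystallized cognition reflects acquired verbal knowledge."
        )
        # Textualize demographics and psychometrics into a caption-style prompt
        # to be paired with the subject's T1-weighted image.
        return (
            f"{context} This is a T1-weighted brain MRI of a "
            f"{record['age']}-year-old {record['sex']} participant with a "
            f"fluid cognition score of {record['fluid']} and a crystallized "
            f"cognition score of {record['crystallized']}."
        )

    # Example usage with a hypothetical HCP-style record.
    print(build_knowledge_prompt(
        {"age": 28, "sex": "female", "fluid": 112, "crystallized": 105}
    ))

Prompts like this, paired with the corresponding T1w images, would serve as the image-text training pairs for the alignment stage.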

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1779_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1779_supp.pdf

Link to the Code Repository

https://github.com/openmedlab/BrainSCK

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wan_BrainSCK_MICCAI2024,
        author = { Wang, Lilong and Liu, Mianxin and Zhang, Shaoting and Wang, Xiaosong},
        title = { { BrainSCK: Brain Structure and Cognition Alignment via Knowledge Injection and Reactivation for Diagnosing Brain Disorders } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This manuscript proposes a two-stage vision-language model adaptation strategy for incorporating cognitive science context into pre-trained models, consisting of knowledge-injecting prompts for pre-training and knowledge-reactivating instructions for downstream task fine-tuning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Using textual prompts for injecting cognitive science context into pre-trained models is a creative and reasonable way to leverage neuroscience knowledge.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I have two major concerns about this manuscript:

    1. There are no references or details provided for selecting subjects from the ABIDE, ADHD-200, and ADNI datasets. This lack of information could pose a problem for reproducibility.
    2. The significance of the results is questionable. The proposed method did not consistently outperform the comparative methods on the three classification tasks, and the improvements are not significant, particularly in terms of the F1 score, which matters more than accuracy for imbalanced classification problems. Additionally, this work used 70% of the samples from the target datasets for training, and therefore it should be compared with methods in traditional settings, such as [1], a linear and interpretable method that achieved over 72% accuracy for autism classification on ABIDE. Moreover, it is unclear whether the train-test split was randomized and repeated; my understanding is that it was not, as no standard deviation was reported. Therefore, the results are less convincing.

    [1] Kunda, M., Zhou, S., Gong, G., & Lu, H. (2022). Improving multi-site autism classification via site-dependence minimization and second-order functional connectivity. IEEE Transactions on Medical Imaging, 42(1), 55-65.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    See point 1 of my comments on the weaknesses.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Compare with simple classification methods in a traditional setting.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is interesting, but the experiments are questionable.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a paradigm of brain disorder classification that involves vision-language models (VLM). T1w MRI is encoded via an image encoder, while demographic information and cognitive test scores / psychometrics are converted into text to train the VLM. The VLM was pre-trained on multiple HCP datasets (development, young adults, aging) and evaluated on multiple datasets of brain disorders (ADHD, Autism, Mild cognitive impairment).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Model pre-trained on large datasets, evaluated on multiple datasets of brain disorder
    • Fresh approach for brain disorder classification, few/none have used VLMs for this task
    • Introduction is nicely written, considering that they are proposing an approach that deviates from the norm
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Technical novelty is arguably limited (in the sense of it being an application of existing VLM to a new task)
    • Technical details are not presented fully (e.g., what exactly do L_{ITC} and L_{LM} involve? The standard BLIP forms are sketched after this list.)
    • Result analysis could have been more robust (see details below).
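
    For reference, if the model follows BLIP's original training objectives (an assumption; the paper may define these losses differently), the standard forms are an image-text contrastive loss over temperature-scaled similarities and an autoregressive language-modeling loss:

        L_{ITC} = 1/2 E_{(I,T)} [ H(y^{i2t}(I), p^{i2t}(I)) + H(y^{t2i}(T), p^{t2i}(T)) ],
        with p_m^{i2t}(I) = exp(s(I, T_m)/τ) / Σ_{m'} exp(s(I, T_{m'})/τ), and symmetrically for t2i;

        L_{LM} = − E_{(I,T)} Σ_t log p_θ(w_t | w_{<t}, I),

    i.e., cross-entropy H against the (soft) match labels y for image-to-text and text-to-image retrieval, and next-token prediction of the paired text conditioned on the image.
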
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    -

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The point that BrainSCK can outperform SOTA models even with 30% of the training data seems good, but conversely it also makes one wonder why model performance did not improve much when more than 30% of the training data was used.
    • The authors mentioned that “absolute levels of metrics are not high” and attributed this to site effects, the “difficulty of MCI diagnosis”, and the limited pre-training data size. These claims could have been backed with further experiments. Specifically, (i) if site effects are thought to be an issue, it is straightforward to perform the experiment on only the largest site in the dataset; (ii) it is understandable that MCI is difficult to predict from T1w, but it should be noted that existing papers using T1w data from ADNI could achieve slightly better results [1].

    Typos

    • Abstract: “predominately” -> predominantly
    • Section 2.3, paragraph 2: “When inference” -> During inference
    • Section 3.3 “Evalidation metrics” -> Evaluation metrics
    • Page 7, paragraph 2 “remarkable promotion” -> notable improvement

    [1] Wen, Junhao, et al. “Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation.” Medical image analysis 63 (2020): 101694.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the design of the architecture is sound, experiments are sufficiently robust and the proposed approach is fresh - it is an interesting departure from the usual approach in the existing literature. I would think that it will inspire numerous extensions.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces LLMs and multimodal text and image data for diagnosing brain disorders.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. Proposes a novel knowledge-injecting and knowledge-reactivating strategy to enhance model performance on out-of-domain data.
    2. Incorporates textual data from psychometric records to construct prompts.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Since the paper does not provide the experimental settings for the comparison methods, such as UniFormer and Med3D, the comparison may not be fair. It is unclear whether text information was used when implementing these methods and, if so, how it was utilized.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    None

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall, the idea of utilizing multimodality data and LLM is inspiring and interesting. The experimental results also demonstrate the effectiveness of the proposed method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The trend of using LLM and multimodality for medical image analysis is undeniable, and the idea and method proposed in this paper could be inspiring for the field. I will update the rating according to the rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the efforts and positive evaluations of the AC and the three reviewers. We have read all the constructive suggestions from the reviewers and will make the following revisions in the final version of this paper.

  1. We will correct the typos and other writing issues.
  2. The descriptions of the data, subject selection, and technical details will be expanded. In addition, we will include the GitHub link to the code repository, with more detailed reproduction guidance, in the camera-ready paper.
  3. We will add more discussion of the limited performance of our method. The proposed method is still preliminary and leaves room for further improvement and more comprehensive evaluation.
  4. Taking this chance, we would like to respond to some specific comments. Review #1: (i) We include all data from ADHD-200. For ABIDE, we select all “Autism” and “Typical development” subjects. For ADNI, we select the latest scan of the “HC” and “MCI” subjects. We will release the subject list on GitHub. (ii) The F1 score is a comprehensive metric but emphasizes the positive class, as it is based on recall and precision. Here, we consider Cohen’s κ, which measures inter-rater consistency for categorical items and is expected to be more informative (see the definition sketched below). In terms of Cohen’s κ, our method achieves top performance on all tasks. (iii) Ref. [1] identifies autism from functional connectivity, whereas our work uses T1 images. Information about brain dynamics from functional connectivity may be more sensitive to autism, which has long been classified as a psychiatric disorder, than structural changes are. Therefore, the modality could be the origin of the performance gap. We will also try to extend our framework to fMRI/functional connectivity, which is interesting and promising.
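
For reference, Cohen’s κ compares the observed agreement p_o (i.e., plain accuracy) with the agreement p_e expected by chance from the class marginals:

    κ = (p_o − p_e) / (1 − p_e)

On imbalanced data, p_e is already high for a majority-class predictor, so κ discounts chance-level performance: κ = 0 corresponds to chance agreement and κ = 1 to perfect agreement.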

Reviewer #3: (i) The performance is actually slightly improved when more than 30% of the training data is used. However, we will locate the bottleneck that prevents larger improvements beyond 30% of the training data and revise our method to achieve significant and continuous gains. This future work will likely appear in an extended journal version. (ii) Using only the largest site of each dataset could avoid site effects, but it would also significantly reduce the sample size for testing, which is not in line with our aim of evaluating model generalization. (iii) The main focus of this study is to improve the transfer of a general foundation model to multiple brain tasks. In Ref. [2], most of the methods are specifically designed for AD. As a result, we may face a trade-off between generalizability and specificity, which demands further study. We also note that Ref. [2] found that more than half of the surveyed papers may have suffered from data leakage and thus reported biased performance. Considering other deviations, such as data pre-processing and validation schemes, the baseline performance of MCI identification in our dataset and setting may not be as high as one might expect. This is indeed a significant issue in method comparison and development. We appreciate the pointer to Ref. [2] and will try its validation scheme for a reasonable comparison.

Reviewer #5: UniFormer and Med3D use only vision information (T1), while BLIP-2 uses both vision and text information (identical to BrainSCK). We will supplement more details about the comparison methods in the camera-ready version.

[1] Kunda, M., Zhou, S., Gong, G., & Lu, H. (2022). Improving multi-site autism classification via site-dependence minimization and second-order functional connectivity. IEEE Transactions on Medical Imaging, 42(1), 55-65.

[2] Wen, Junhao, et al. “Convolutional neural networks for classification of Alzheimer’s disease: Overview and reproducible evaluation.” Medical Image Analysis 63 (2020): 101694.




Meta-Review

Meta-review not available, early accepted paper.


