Abstract

Using functional Magnetic Resonance Imaging (fMRI) to construct the functional connectivity is a well-established paradigm for deep learning-based brain analysis. Recently, benefiting from the remarkable effectiveness and generalization brought by large-scale multi-modal pre-training data, Vision-Language (V-L) models have achieved excellent performance in numerous medical tasks. However, applying the pre-trained V-L model to brain analysis presents two significant challenges: (1) The lack of paired fMRI-text data; (2) The construction of functional connectivity from multi-modal data. To tackle these challenges, we propose a fMRI-Text Synergistic Prompt Learning (fTSPL) pipeline, which utilizes the pre-trained V-L model to enhance brain analysis for the first time. In fTSPL, we first propose an Activation-driven Brain-region Text Generation (ABTG) scheme that can automatically generate instance-level texts describing each fMRI, and then leverage the V-L model to learn multi-modal fMRI and text representations. We also propose a Prompt-boosted Multi-modal Functional Connectivity Construction (PMFCC) scheme by establishing the correlations between fMRI-text representations and brain-region embeddings. This scheme serves as a plug-and-play preliminary that can connect with various Graph Neural Networks (GNNs) for brain analysis. Experiments on ABIDE and HCP datasets demonstrate that our pipeline outperforms state-of-the-art methods on brain classification and prediction tasks. The code is available at https://github.com/CUHK-AIM-Group/fTSPL.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2709_paper.pdf

SharedIt Link: https://rdcu.be/dY6kR

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_53

Supplementary Material: N/A

Link to the Code Repository

https://github.com/CUHK-AIM-Group/fTSPL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wan_fTSPL_MICCAI2024,
        author = { Wang, Pengyu and Zhang, Huaqi and He, Zhibin and Peng, Zhihao and Yuan, Yixuan},
        title = { { fTSPL: Enhancing Brain Analysis with fMRI-Text Synergistic Prompt Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {564--574}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposed a method to generate matched texts from a given fMRI BOLD signal and thus allows the application of the recently proposed vision-language (VL) model for fMRI analysis. Texts were generated using two-tiered thresholding of BOLD intensity and degree centrality of the conventional functional connectivity (FC) matrix. There is a growing interest in applying VL models in neuroimaging analysis and this paper tackles an important research topic. With the aligned text–imaging features, the authors constructed multimodal functional connectivity for two downstream tasks of disease classification (autism vs. normal) and prediction of cognitive score. The proposed approach fared better than existing methods.
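
The two-tiered thresholding summarized above can be illustrated with a minimal sketch. It is not the authors' ABTG implementation: the function name `describe_activation`, the quantile thresholds `act_q` and `deg_q`, the connectivity cutoff `fc_thresh`, and the sentence template are all assumptions made for illustration. Tier 1 thresholds the mean absolute BOLD intensity per region, and tier 2 thresholds the degree centrality of a binarized Pearson-correlation FC matrix.

```python
import numpy as np

def describe_activation(bold, region_names, act_q=0.75, deg_q=0.75, fc_thresh=0.3):
    """Build a hypothetical text description from a (regions x time) BOLD array.

    All thresholds and the sentence template are illustrative assumptions.
    """
    bold = np.asarray(bold, dtype=float)

    # Tier 1: keep regions whose mean absolute BOLD intensity is in the top quantile.
    intensity = np.abs(bold).mean(axis=1)
    active = intensity >= np.quantile(intensity, act_q)

    # Conventional FC: Pearson correlation between regional time series.
    fc = np.corrcoef(bold)
    np.fill_diagonal(fc, 0.0)

    # Tier 2: degree centrality = number of supra-threshold connections per region.
    degree = (np.abs(fc) > fc_thresh).sum(axis=1)
    hub = degree >= np.quantile(degree, deg_q)

    # A region is mentioned in the text only if it passes both tiers.
    selected = [name for name, a, h in zip(region_names, active, hub) if a and h]
    if selected:
        text = ("An fMRI scan with strongly activated, highly connected regions: "
                + ", ".join(selected) + ".")
    else:
        text = "An fMRI scan with no strongly activated, highly connected regions."
    return text, selected
```

Quantile-based thresholds are used here only so that the sketch adapts to the scale of each scan; the paper may select its threshold coefficients differently.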
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The idea of generating matching text for the fMRI BOLD signal is intuitive and clever. Constructing the multimodal FC has an impact as well.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) Details regarding the image encoder and text decoder are largely missing. They appear to be based on BiomedCLIP [24]. Since space in the main text is limited, the authors could add these details to the supplement.
2) The multimodal FC has two additional columns (image modal supplement F_I and text modal supplement F_T) compared to conventional FC. Since the authors have computed rich multimodal information, there may be a better way to use this information than adding two columns.
3) Autism classification is heavily affected by the covariates of age, sex, and site. These covariates need to be regressed out, or some other form of correction needs to be applied.
4) The performance measure in Table 1 is unclear. Is it the mean across cross-validation folds?
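
One common way to apply the covariate correction requested in point 3 is linear residualization. The sketch below is an assumption about how such a correction could look, not something the paper describes: the helper `regress_out` is hypothetical, covariates are expected as a numeric matrix (for example age, sex coded 0/1, and one-hot site indicators), and each feature is replaced by its residual after an ordinary least-squares fit.

```python
import numpy as np

def regress_out(features, covariates):
    """Remove the linear effect of covariates from each feature column.

    features:   (subjects, n_features) array, e.g. vectorized FC entries.
    covariates: (subjects, n_covariates) numeric array (hypothetical encoding).
    """
    features = np.asarray(features, dtype=float)
    covariates = np.asarray(covariates, dtype=float)

    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(len(covariates)), covariates])

    # Ordinary least squares fit of every feature on the covariates.
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)

    # Residuals are the covariate-adjusted features.
    return features - design @ beta
```

In a cross-validated experiment, the coefficients would normally be estimated on the training folds only and then applied to the held-out fold, to avoid leaking test information.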
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

The authors did not implement some of the comparison methods (BNT and Com-BrainTF) and instead reused results reported in existing papers. Those two methods used a train:validation:test split of 7:1:2, whereas the proposed approach (and the other compared methods) used cross-validation. Data splits need to be consistent for a fair comparison.
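
To make the difference between the two protocols concrete, the following sketch generates both kinds of split from subject indices. The helper names `kfold_indices` and `holdout_indices` are hypothetical, the seed and fold count are arbitrary, and the splits are not stratified by label or site, which a real comparison would likely need.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index arrays for k-fold cross-validation."""
    perm = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(perm, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def holdout_indices(n, ratios=(0.7, 0.1, 0.2), seed=0):
    """Return one fixed train/validation/test partition, e.g. 7:1:2."""
    perm = np.random.default_rng(seed).permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    # The test set takes whatever remains, so the three parts always cover n.
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]
```

Under k-fold cross-validation every subject is tested exactly once, while the fixed split evaluates on a single 20% subset, so scores obtained under the two protocols are not directly comparable.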
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Please see my comments above.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

This study represents the pioneering application of the Vision-Language (V-L) model to multi-modal brain analysis. The authors have introduced Activation-driven Brain-region Text Generation (ABTG), which provides text descriptions at the instance level for fMRI data. Additionally, they have proposed Prompt-boosted Multi-modal Functional Connectivity Construction (PMFCC) as a novel method for constructing multi-modal functional connectivity.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1. This paper innovatively applies the V-L model to multi-modal brain analysis, extending its scope to the medical domain.
2. It specifically explores brain activation assessment, processing fMRI data to extract image and text modalities, and integrating them into the V-L model.
3. The study demonstrates excellent performance through the experimental results, showcasing the effectiveness of the proposed approach.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The experimental results of the paper perform well, especially on the ABIDE dataset with 1035 subjects. However, the reproducibility of this paper needs improvement. In particular, there are no pseudo-codes or codes provided, so there are doubts about the results at present. Therefore, a Weak Accept rating is temporarily given. The score will be modified based on the author’s response regarding reproducibility in the future.
2. Much of this paper is presented in insufficient detail, such as the selection of threshold coefficients for brain regions with different activation states, how the adjacency matrices are obtained in the GNN, and the specific form of the loss function.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

The experimental results of the paper perform well, especially on the ABIDE dataset with 1035 subjects. However, the reproducibility of this paper needs improvement. In particular, there are no pseudo-codes or codes provided.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
1. The experimental results of the paper perform well, especially on the ABIDE dataset with 1035 subjects. However, the reproducibility of this paper needs improvement. In particular, there are no pseudo-codes or codes provided, so there are doubts about the results at present. Therefore, a Weak Accept rating is temporarily given. The score will be modified based on the author’s response regarding reproducibility in the future.
2. Much of this paper is presented in insufficient detail, such as the selection of threshold coefficients for brain regions with different activation states, how the adjacency matrices are obtained in the GNN, and the specific form of the loss function.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is somewhat innovative, and it is the first paper to apply the V-L model to multi-modal brain analysis. However, the current version lacks some important details, and reproducibility needs to be improved.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

Designs a template to describe the global state of individual fMRI data as text. Proposes constructing multimodal functional connectivity with the help of a pre-trained V-L model and multi-modal prompts to improve ASD diagnosis and reading ability prediction.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Proposes a framework to leverage the power of a pre-trained V-L model to improve functional network analysis. The paper is well written.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The method comparison can be improved.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

The “fMRI original” in Fig. 2 are all T1 images.

What is the dimension of F_m? N by N+2 or N+2 by N+2?

How were the data preprocessed?

Comparisons to more conventional and learning-based functional network construction methods are suggested. The currently compared methods are relevant but do not mainly focus on functional network construction.

The current text generation is quite simple. The text y essentially only describes the activation status of the brain regions. Would a “y” obtained by directly encoding this limited “global status” (which can be represented as a feature vector) achieve a similar result?
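
The two readings of the F_m dimension raised above can be contrasted in a short sketch. This is not the authors' PMFCC construction: the functions `augment_columns` and `augment_nodes` are hypothetical, F_I and F_T are assumed to be length-N vectors (one value per brain region), and the 2 x 2 identity corner in the second variant is an arbitrary placeholder.

```python
import numpy as np

def augment_columns(fc, f_i, f_t):
    """Reading 1: append F_I and F_T as two extra columns, giving (N, N+2)."""
    return np.column_stack([fc, f_i, f_t])

def augment_nodes(fc, f_i, f_t):
    """Reading 2: treat the two modalities as extra nodes, giving (N+2, N+2)."""
    extra = np.column_stack([f_i, f_t])   # (N, 2) region-to-modality weights
    corner = np.eye(2)                    # placeholder modality-to-modality block
    return np.block([[fc, extra], [extra.T, corner]])
```

The first variant keeps N graph nodes and widens each node's feature vector, while the second adds two nodes and stays square and symmetric, so it could also serve as an adjacency matrix for a GNN.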
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The clarity of the writing. The novelty of adopting a V-L model.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

N/A

Meta-Review

Meta-review not available, early accepted paper.


fTSPL: Enhancing Brain Analysis with fMRI-Text Synergistic Prompt Learning

Author(s):