Abstract

We introduce Scaffold Prompt Tuning (ScaPT), a novel prompt-based framework for adapting large-scale functional magnetic resonance imaging (fMRI) pre-trained models to downstream tasks with high parameter efficiency and improved performance over fine-tuning and prompt-tuning baselines. Full fine-tuning updates all pre-trained parameters, which may distort the learned feature space and lead to overfitting on the limited training data that is common in fMRI studies. In contrast, we design a hierarchical prompt structure that transfers knowledge learned from high-resource tasks to low-resource ones. This structure, equipped with a Deeply-conditioned Input-Prompt (DIP) mapping module, enables efficient adaptation by updating only 2% of the trainable parameters. The framework enhances semantic interpretability through attention mechanisms between inputs and prompts, and it clusters prompts in the latent space in alignment with prior knowledge. Experiments on public resting-state fMRI datasets show that ScaPT outperforms fine-tuning and multitask-based prompt tuning in neurodegenerative disease diagnosis/prognosis and personality trait prediction, even with fewer than 20 participants. This highlights ScaPT's efficiency in adapting pre-trained fMRI models to low-resource tasks.
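
As a rough illustration of the parameter-efficiency idea described above, the sketch below shows generic soft prompt tuning: the pre-trained backbone is frozen and only a small set of prompt embeddings (plus a task head) is trained. All names and shapes are hypothetical; this is not the authors' implementation of ScaPT.

    import torch
    import torch.nn as nn

    class SoftPromptWrapper(nn.Module):
        """Generic soft-prompt tuning sketch: freeze the backbone, train only prompts."""

        def __init__(self, backbone: nn.Module, embed_dim: int,
                     n_prompts: int = 16, n_classes: int = 2):
            super().__init__()
            self.backbone = backbone
            for p in self.backbone.parameters():      # freeze all pre-trained weights
                p.requires_grad = False
            # Trainable prompt embeddings: a small fraction of the backbone's parameters
            self.prompt = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)
            self.head = nn.Linear(embed_dim, n_classes)

        def forward(self, x_embed: torch.Tensor) -> torch.Tensor:
            # x_embed: (batch, seq_len, embed_dim) token embeddings of an fMRI sequence
            prompts = self.prompt.unsqueeze(0).expand(x_embed.size(0), -1, -1)
            h = self.backbone(torch.cat([prompts, x_embed], dim=1))
            return self.head(h[:, 0])                 # read out from the first prompt position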

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2127_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2127_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

https://www.humanconnectome.org/study/hcp-lifespan-aging
https://adni.loni.usc.edu/data-samples/access-data/
https://www.ukbiobank.ac.uk/

BibTex

@InProceedings{Don_Prompt_MICCAI2024,
        author = { Dong, Zijian and Wu, Yilei and Chen, Zijiao and Zhang, Yichi and Jin, Yueming and Zhou, Juan Helen},
        title = { { Prompt Your Brain: Scaffold Prompt Tuning for Efficient Adaptation of fMRI Pre-trained Model } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    1) It designs a hierarchical prompt structure that evolves soft prompts through three levels. 2) It proposes a shared Deeply-conditioned Input-Prompt (DIP) mapping module for both source and target training. 3) The proposed attention mechanism between inputs and prompts offers significant semantic interpretability. 4) It achieves superior results with high parameter efficiency.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors introduce Scaffold Prompt Tuning (ScaPT), the first prompt-based adaptive framework for fMRI pre-trained models, with remarkable parameter efficiency and superior performance using limited downstream training data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The writing of the article is not clear or easy to understand. 2) The article claims that soft prompts lack clinical interpretability, but the experimental results do not demonstrate any fundamental improvement in clinical interpretability from the proposed solution. 3) Key terms lack explanations, such as modular prompt, phenotype prompt, and vertex prompt. 4) The reproducibility details of the comparison methods are unclear.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    None

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) Please clearly describe your research motivation in the article. 2) If the research focuses on clinical issues related to fMRI, please elaborate on how this work contributes to current clinical protocols. If the research focuses on addressing the inherent flaws of soft prompts, please extend the DIP method to a more general solution rather than validating it solely on fMRI datasets.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clarity and coherence of the research methodology and written argumentation are lacking.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Based on the authors' response, I have decided to raise my rating.



Review #2

  • Please describe the contribution of the paper

    The authors propose a prompt-based framework to adapt pre-trained models to downstream tasks. The novel methodology presents a promising approach to addressing the challenges associated with fine-tuning. The proposed approach outperforms the baseline methods across different sizes of training datasets, showing scalability with the volume of training data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper has a clear contribution and motivation. The authors propose a novel framework for tackling the issue of learning from small datasets. The proposed modules address problems with fine-tuning and catastrophic forgetting. The models build on the current largest fMRI foundation model. The proposed approach is compared to multiple baselines, and ablation studies are performed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Preprocessing is an important aspect of the study, and including more detailed information about it would greatly enhance the clarity and reproducibility of the results. The manuscript lacks important details on the selection of datasets and downstream tasks.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Preprocessing details are missing. It has been shown that even different versions of the same software make a huge difference in fMRI analyses (Bowring et al.); it is important to include these details in the manuscript.

    Bowring, Alexander, Camille Maumet, and Thomas E. Nichols. “Exploring the impact of analysis software on task fMRI results.” Human brain mapping 40.11 (2019): 3362-3384.

    Dataset:
    A motivation is given for why the experiments are conducted on the selected datasets. (no)
    All datasets drawn from the existing literature are accompanied by appropriate citations. (yes)
    All datasets drawn from the existing literature are publicly available. (yes)

    Computational experiments:
    This paper specifies the computing infrastructure used for running experiments (hardware and software), including GPU/CPU models, amount of memory, operating system, and names and versions of relevant software libraries and frameworks. (partial)
    This paper states the number of algorithm runs used to compute each reported result. (no)
    Analysis of experiments goes beyond single-dimensional summaries of performance (e.g., average, median) to include measures of variation, confidence, or other distributional information. (yes)
    The significance of any improvement or decrease in performance is judged using appropriate statistical tests. (yes)
    This paper lists all final (hyper-)parameters used for each model/algorithm in the paper's experiments. (yes)
    This paper states the number and range of values tried per (hyper-)parameter during development of the paper, along with the criterion used for selecting the final parameter setting. (yes)

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Given there is a large battery of behavioral measures in any of the datasets used, why choose neuroticism as the regression task?

    Typo in title adaption → adaptation

    It is unclear why the authors call this a language model; a clarification could be helpful. Aligned with this question, what are the prompts?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is an interesting contribution, leveraging recent advancements in the field and of interest to the community.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I think the contribution is novel and of interest to the community. I am satisfied with the authors’ rebuttal.



Review #3

  • Please describe the contribution of the paper

    This manuscript proposes a method for adapting large-scale pre-trained functional magnetic resonance imaging (fMRI) language models to downstream tasks with a hierarchical prompt structure and a Deeply-conditioned Input-Prompt (DIP) mapping module. The experimental results on public datasets demonstrate its effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A new DIP module for matching prompts with the input embedding space.

    Using attention scores for interpreting prompts provides insights into the model and prompts.

    The improvement over baselines is significant, especially for the classification tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The criteria and details for selecting the number of training data from ADNI and UKB are unclear.

    Missing citations, e.g., Nilearn.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The prompt interpretation needs to be further validated.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This manuscript proposes a novel framework for adapting pre-trained fMRI language models, and the improvement over the selected baselines is significant.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I have reviewed the comments from other reviewers and the authors’ feedback. While some concerns have been addressed in the rebuttal, I agree with another reviewer that many details are missing and the writing needs improvement. Therefore, my score remains unchanged.




Author Feedback

We appreciate the insightful feedback from the reviewers and are encouraged by the positive comments: “novel framework” with “clear motivation” (R1), “improvement is significant” with “insights provided” (R3), “remarkable parameter efficiency and superior performance” (R5).

Q1 Preprocessing details (R1). We adhered to the preprocessing pipeline of the pre-trained fMRI model [15]. Data were preprocessed using fMRIPrep (20.2.3) with the default settings, followed by spatial smoothing, detrending, high-pass filtering, and regression to remove confounds. The data were parcellated using Nilearn (0.10.4) with the function "nilearn.datasets.fetch_atlas_difumo".
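
For context on the Nilearn step mentioned above, a minimal parcellation sketch is shown below. The file names, TR, and filter settings are illustrative assumptions, not necessarily those of the pipeline in [15].

    # Minimal DiFuMo parcellation sketch with Nilearn (illustrative settings only).
    from nilearn.datasets import fetch_atlas_difumo
    from nilearn.maskers import NiftiMapsMasker

    difumo = fetch_atlas_difumo(dimension=64)        # probabilistic DiFuMo atlas
    masker = NiftiMapsMasker(
        maps_img=difumo.maps,
        smoothing_fwhm=6,                            # spatial smoothing
        detrend=True,                                # detrending
        high_pass=0.01, t_r=0.8,                     # high-pass filtering; TR is dataset-specific
        standardize="zscore_sample",
    )

    # "preprocessed_bold.nii.gz" and "confounds.tsv" stand in for fMRIPrep outputs;
    # passing confounds regresses them out during signal extraction.
    time_series = masker.fit_transform("preprocessed_bold.nii.gz",
                                       confounds="confounds.tsv")
    print(time_series.shape)                         # (n_timepoints, 64) regional time series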

Q2 Selection of the datasets and tasks (R1&R3). The established brain-behavior phenotypes cover three domains: cognition, personality, and social emotion. We hypothesized that ScaPT would perform well on tasks relevant to these domains. We first tested whether our model can perform NC/MCI and amyloid-/+ classification (related to cognition), the most challenging tasks for early diagnosis and prognosis of AD. We selected ADNI because it is the most widely used fMRI dataset for AD. To assess the model on both personality and social emotion, we selected Neuroticism as our third task and chose UKB, the largest public fMRI dataset. Regarding the amount of training data, we followed a setting similar to [15], using limited data to showcase adaptation performance and varying the size to demonstrate how performance scales.

Q3 Clarification on language model and prompts (R1&R5). We call the fMRI model a "language model" because it is a GPT-2 architecture, originally developed for NLP. It learns to understand brain dynamics by modeling sequences of activity, similar to how text is processed. A soft prompt is a trainable embedding that directs the model's responses without changing its architecture. We developed a hierarchical prompt structure with three levels: 1) Modular Prompts (MoP) encode abstract information and fundamental skills required for fMRI understanding; 2) Phenotype Prompts (Phep) combine MoPs, each tailored to a specific phenotype; 3) Vertex Prompts direct the model toward downstream tasks by integrating Phep with a target prompt.
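
To make the three levels concrete, the sketch below composes them in a minimal way: phenotype prompts as learned mixtures of a shared modular-prompt bank, and a vertex prompt that concatenates a phenotype mixture with a task-specific target prompt before being prepended to the frozen model's input. Names and shapes are hypothetical assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class HierarchicalPrompt(nn.Module):
        """Illustrative three-level soft-prompt hierarchy (hypothetical sketch)."""

        def __init__(self, n_modular: int = 8, prompt_len: int = 10,
                     dim: int = 768, n_phenotypes: int = 3):
            super().__init__()
            # Level 1: Modular Prompts (MoP), a shared bank of basic prompt "skills"
            self.mop = nn.Parameter(torch.randn(n_modular, prompt_len, dim) * 0.02)
            # Level 2: Phenotype Prompts (Phep), each phenotype mixes the MoP bank
            self.phenotype_weights = nn.Parameter(torch.randn(n_phenotypes, n_modular))
            # Level 3: a target prompt specific to the downstream task
            self.target_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

        def forward(self, phenotype_id: int) -> torch.Tensor:
            weights = torch.softmax(self.phenotype_weights[phenotype_id], dim=0)
            phep = torch.einsum("m,mld->ld", weights, self.mop)    # phenotype prompt
            vertex = torch.cat([phep, self.target_prompt], dim=0)  # vertex prompt
            return vertex  # (2 * prompt_len, dim), prepended to the frozen model's input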

Q4 Typo and citation (R1&R3). We will revise “adaption” in title to “adaptation” (R1), and add the citation of Nilearn to the revised version (R3).

Q5 Prompt interpretations (R3&R5). We are the first to explore the prompt space in fMRI analysis, where we observed phenotype-based clustering of prompts (Fig. 2(1)). Furthermore, ScaPT introduces "semantic" interpretability to fMRI prediction through the input-prompt attention, which intuitively highlights the important aspects of different tasks (Fig. 2(2)). While ScaPT is not designed as an interpretation model, it mitigates the black-box nature of prompts.
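
As a hypothetical sketch of how such input-prompt attention scores could be computed and inspected (this is not the authors' exact formulation):

    import torch

    def input_prompt_attention(inputs: torch.Tensor, prompts: torch.Tensor) -> torch.Tensor:
        """Scaled dot-product attention scores between inputs and prompts (sketch).

        inputs:  (seq_len, dim) fMRI token embeddings
        prompts: (n_prompts, dim) soft-prompt embeddings
        Returns a (seq_len, n_prompts) matrix whose rows show which prompts each
        input token attends to; inspecting it per task gives a semantic readout.
        """
        scores = inputs @ prompts.T / prompts.size(-1) ** 0.5
        return torch.softmax(scores, dim=-1)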

Q6 Implementation of other methods (R5). We implemented the baselines using their open-source codebases in the same environment as ours. Classification batch sizes match ours. The other key hyperparameters (batch size for regression, chosen from 8/16/32 for the best performance; learning rate; dropout) were: SPoT: (8, 1e-4, 0.3); MP2: (16, 1e-4 for ST and 1e-3 for TT, 0.2 for both ST and TT); ATTEMPT: (16, 1e-4 for both ST and TT, 0.1 for ST and 0.2 for TT).
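
For readability, the settings above can be laid out as a table-like structure (values copied from the rebuttal; the dictionary layout itself is only illustrative, with ST = source training and TT = target training):

    # Baseline hyperparameters as reported above; structure is illustrative only.
    baseline_hparams = {
        "SPoT":    {"batch_size_regression": 8,  "lr": 1e-4,
                    "dropout": 0.3},
        "MP2":     {"batch_size_regression": 16, "lr": {"ST": 1e-4, "TT": 1e-3},
                    "dropout": 0.2},
        "ATTEMPT": {"batch_size_regression": 16, "lr": 1e-4,
                    "dropout": {"ST": 0.1, "TT": 0.2}},
    }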

Q7 Clarification on motivation and clinical contribution (R5). Our paper presents "a clear contribution and motivation" (R1) by "proposing a novel framework for adapting fMRI language models" (R3). We focus on fMRI because we model its time series akin to a text representation. We have demonstrated the efficacy of ScaPT on the two most challenging tasks in AD, i.e., early diagnosis and prognosis with limited training data and computing resources. Moreover, our work enhances understanding of brain-behavior mapping for cognition and social and emotional functions, which is crucial for clinical practice. ScaPT could pave the way for disease prognosis and treatment planning in neuropsychiatric disorders.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Post-rebuttal, the reviewers agree that the work should be accepted, noting the interest in the new prompt-tuning method for fMRI and the impressive improvement with limited data in downstream tasks. While I agree with acceptance, one particular concern is the lack of clarity in the writing: please address the reviewers' comments on clarifying the key ideas (e.g., the types of prompts). I also find the "language model" naming rather confusing; as the authors explain, they use a GPT-2-style architecture pre-trained with NLP-inspired techniques, but calling it an "fMRI language model" makes it sound like there is language involved in the model, and there is not. This naming needs to be addressed in the final version.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


