Abstract

Traditional neural decoding methods heavily rely on fully annotated brain data, which are both expensive to produce and scarce in availability. This limitation hinders the development of accurate and generalizable decoding models. Drawing inspiration from the success of foundational AI models in reducing dependency on annotated data in fields such as natural language processing, we introduce a novel foundation model that leverages the inherent spatiotemporal covariation of functional brain networks, which enables effective neural decoding with minimal annotation requirements. Our framework incorporates three key innovations: 1) A spatiotemporal importance-guided augmentation strategy is designed to capture the synergistic relationships between brain regions and their dynamic changes; 2) A progressive spatiotemporal-aware encoder is proposed to learn local-to-global brain interaction information; 3) A fine-grained consistency optimization technique is developed to enhance the representations of overall brain function. Evaluations on publicly available fMRI datasets demonstrate that our proposed framework not only achieves superior decoding performance but also exhibits strong generalizability and reveals patterns of nervous activity. Our research advances brain representation learning and provides an innovative solution for universal neural decoding models.
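The abstract names a fine-grained consistency optimization but does not give its loss. As a generic illustration only, and not the authors' formulation, the sketch below shows one common way to enforce consistency between two augmented views at the level of individual brain regions without using negative samples: the average of one minus the cosine similarity between matching region embeddings. The function names, the embedding sizes, and the noise-based stand-in for augmentation are all assumptions made for this example.

```python
import math
import random


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def consistency_loss(view_a, view_b):
    """Mean of (1 - cosine similarity) over row-aligned region embeddings.

    view_a and view_b are lists of per-region embedding vectors taken from
    two augmented views of the same scan (hypothetical setup). No negative
    pairs are involved; the loss is 0 when the views agree exactly.
    """
    per_region = [1.0 - cosine(u, v) for u, v in zip(view_a, view_b)]
    return sum(per_region) / len(per_region)


random.seed(0)
# Hypothetical sizes: 4 regions, 16-dimensional embeddings.
view_a = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
# Small additive noise stands in for a second augmented view.
view_b = [[x + 0.1 * random.gauss(0.0, 1.0) for x in row] for row in view_a]

print(consistency_loss(view_a, view_a))  # identical views
print(consistency_loss(view_a, view_b))  # slightly perturbed view
```

Averaging per region rather than over one pooled whole-brain vector is what makes this sketch "fine-grained" in a loose sense; whether the paper's method works this way is not stated in the abstract.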

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2630_paper.pdf

SharedIt Link: https://rdcu.be/eHwRZ

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04947-6_58

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiZiy_Spatiotemporal_MICCAI2025,
        author = { Li, Ziyu AND Zhu, Zhiyuan AND Bai, Yang AND Li, Qing AND Wu, Xia},
        title = { { Spatio-temporal Pre-trained Foundation Model for Neural Decoding with Fine-grained Optimization } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {609--618}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposed a spatiotemporal pre-training foundation model for neural decoding. This model designs an efficient spatiotemporal representation extractor to capture more comprehensive and flexible brain representations. The author validated the effectiveness of the method on the HCP dataset.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Considering spatiotemporal information in neural decoding is an interesting idea.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

(1) This paper is hard to follow. The framework diagram of this paper is also very difficult to understand. (2) Some details of the method are missing. How was data augmentation implemented? How was fine-tuning carried out?
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(2) Reject — should be rejected, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The presentation of this paper is very poor. The formulas lack necessary explanations, and the flowcharts are also hard to understand.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

a. The authors propose a Spatio-Temporal Importance Guided Augmentation (STIGA) strategy to capture the synergistic relationships and dynamic changes of brain regions, which provide a reference in the design of Spatio-Temporal augmented views. b. The authors develop a Progressive Spatio-Temporal Aware encoder (PSTA) as an efficient extractor to obtain local-to-global brain representations. c. The authors introduce Fine-Grained Consistency Optimization (FGCO) as an efficient method to mitigate the suboptimal representation problem caused by negative samples. d. Based on the above techniques, the authors propose a Spatio-Temporal Pre-Training foundation model (STPTF) with state-of-the-art performance and strong generalizability in neural decoding, which is validated on publicly available fMRI datasets.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

a. The proposed STPTF integrates some advanced techniques to better leverage the inherent spatio-temporal covariation of fMRI time series, offering novel insights in addressing main challenges in graph self-supervised decoding. b. Effectiveness of STPTF and its modules are thoroughly assessed through extensive experiments. c. Compared with existing graph self-supervised decoding approaches, STPTF shows state-of-the-art performance and strong generalizability.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

a. The authors sometimes use multiple terms for the same concept, for example in Section 3.2 (“Comparative Results”), which introduces ambiguity and hinders readability. Some abbreviations (e.g., “UL”) are not defined at first use, contrary to academic writing conventions. b. Some results lack discussion and analysis, e.g., “most graph self-supervised methods exhibit a substantial performance gap compared to spatio-temporal supervised methods”. c. The ablation results seem to indicate that STPTF with the proposed STIGA achieves only small improvements over STPTF with RSA+RTA.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

a. The proposed STPTF framework introduces novel components (STIGA, PSTA, and FGCO) to model spatio-temporal dependencies in fMRI data, addressing key challenges in graph self-supervised learning with clear technical advancements. b. The method design is technically rigorous, with modules like PSTA effectively combining local-global feature extraction. However, ablation studies show small gains from STIGA versus baseline augmentations (RSA+RTA). c. The discussion of some experimental results is insufficient.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The rebuttal was effective, and I have no additional questions.

Review #3

Please describe the contribution of the paper

This paper targets effective neural decoding with minimal annotation requirements. To achieve that, the paper proposes an importance-guided augmentation strategy, a progressive spatiotemporal-aware encoder, and a fine-grained consistency optimization technique. Experiments demonstrate the superiority of their decoding performance.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

++ The motivation of this paper is interesting, which focuses on the spatio-temporal features for brain information. This is motivated by previous evidence that the temporal dependencies between ROIs play a pivotal role in neural activities.

++ The first method to integrate graph foundation models with spatio-temporal brain information.

++ The performance improvement compared with previous methods is significant.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

– The framework part of this paper mainly utilizes existing GNN-based methods for brain representation learning. As a result, the technical contribution / novelty of this work is a bit limited. It is suggested to better highlight the technical contributions of this paper to brain representation learning.

– No discussion of related work.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper’s motivation on spatio-temporal brain representation is interesting, and the improvement over previous methods is significant. However, the technical contribution is a bit limited and the paper is missing a related work section. Therefore, the reviewer is leaning towards a weak accept and will make the final decision based on the authors’ rebuttal and feedback from other reviewers.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #4

Please describe the contribution of the paper

The paper proposes a new spatiotemporal pre-training Foundation Model for neural decoding based on functional MRI data. The proposed method leverages unlabeled data for pre-training and can be fine-tuned using small amounts of labeled data. The pre-training strategy is evaluated on both resting-state and task-based data from the Human Connectome Project (HCP). Authors demonstrate the impact of each component, the robustness and transferability of the pre-trained model. The proposed method achieves significant improvements in unsupervised brain decoding.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper proposes an innovative pre-training strategy by leveraging the spatio-temporal characteristics of fMRI data. It incorporates three key innovations: a spatiotemporal guided data augmentation to capture the synergistic relationships between brain regions and their dynamic changes; a progressive spatiotemporal-aware encoder to learn local-to-global brain interaction information; and an optimization technique to enhance the representations of overall brain function.
2. The paper tackles two important challenges: the lack of labeled fMRI data for brain decoding, which requires expert knowledge for precise labeling, and the fact that GCNs overlook the temporal dynamics within brain networks.
3. The decoding performance obtained by the model is satisfactory, and a thorough evaluation is performed to better understand the impact of each component and the impact of different sample sizes.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The description of the framework can be hard to follow for non-experts. The authors provide a schematic representation of their framework but never refer to it when presenting the different components of their framework.
2. The description contains a large number of abbreviations, which makes it very hard to follow.
3. The authors claim that they evaluated their framework on multiple publicly available datasets, but only the data from the Human Connectome Project are used. This dataset is highly standardized and homogeneous, and thus might fail to represent the variability of fMRI data.
4. The authors compare their framework with different competitors, but do not explain the rationale behind this choice of competitors. It would be good to present these competitors to better understand the differences with the proposed framework.
5. Moreover, the reported metrics are not explained. Which task is evaluated?
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
I advise the authors to re-work the Method and Experiments parts of their paper:
- Make use of the workflow description figure to enhance clarity of method description;
- Remove the abbreviations, or simplify them;
- Better explain the decoding tasks and competitors.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper provides an innovative framework for pre-training of neural decoding models in an unsupervised fashion. The proposed methodology tackles relevant challenges related to the lack of annotated fMRI data and the neglect of temporal dynamics by GNNs. However, the paper lacks clarity on the proposed methodology and experiments performed. The authors should better explain the overall framework, helping the reader to understand the relationship between the different components of the framework. Moreover, the choice of competitors must be justified.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank all reviewers for their insightful feedback. We are pleased that the reviewers have rated the clarity and organization of our paper as “Satisfactory” (R1, R2, R5), acknowledged the motivation behind our work (R2), recognized its innovation (R1, R5), highlighted its superior performance (R1, R2, R5), appreciated the comprehensive evaluation (R5), and emphasized the novel insights provided (R1). Besides, we have carefully considered the concerns raised by the reviewers:

Technical Contributions and Innovation (R2) Our STPTF does not merely adopt existing GNNs. Instead, it optimizes the pre-training strategy to address two critical challenges: the scarcity of fully labeled fMRI data and the complex spatiotemporal dynamics within brain networks. Specifically, STPTF introduces three key innovations: i) Strategy innovation through STIGA; ii) Architectural innovation via PSTA; iii) Optimization innovation using FGCO. These innovations have been validated by R1 and R5.

Discussion of Related Works (R2) and Competitor Selection Rationale (R5)
Our STPTF falls under graph self-supervised learning (GSSL), which can be mainly classified into generative-based and contrastive-based approaches. A representative example of the generative-based approach is BrainGSL, designed for diagnosing brain diseases. For contrastive-based approaches, GATE is one of the pioneering methods for disease diagnosis. Recently, HSGPL with hierarchical signed graph pooling was proposed for gender classification. However, because HSGPL is difficult to reproduce due to the lack of details about its Balanced and Unbalanced Embedded Modules, we finally chose BrainGSL and GATE as our competitors.
As for graph supervised methods, BrainGNN and STGCN share the same goal as ours. They specialize in spatial and spatiotemporal brain representation mining, respectively.

Discussion and Analysis of Experimental Results (R1) Regarding the performance gap, it may stem from the neglect of temporal dynamics and the limitations inherent in the backbone (GNNs), which restrict the effectiveness of GSSL. Regarding the ablation results, although the improvements introduced by STIGA might appear modest, they are consistent across metrics and datasets, underscoring STIGA’s robustness. Furthermore, even slight improvements can bring it closer to practical applications.

Clarity of Framework Diagram (R4, R5) The framework diagram in Figure 1 outlines the overall architecture of STPTF, where (a) depicts the three-stage process and (b) elaborates on the components of PSTA. In the final version, we will enhance the captions and accompanying text to ensure better clarity.

Scalability of STPTF and Explanation of Metrics and Tasks (R5) We utilized both resting- and task-state data (all 7 tasks were evaluated) from the HCP dataset. Despite originating from the same source, they represent distinct paradigms and cognitive states, providing a diverse evaluation. We apologize for omitting the introduction of the reported metrics. We will provide detailed explanations of these in the final version.

We recognize the importance of managing abbreviations carefully to enhance readability. In the final version, we will minimize the use of abbreviations and ensure consistency throughout the paper. (R1, R5)

It seems that R4 has misunderstood our paper. First, R4’s classification and description of our work differ significantly from those of the other reviewers. Second, while other reviewers praised the clarity and organization of our paper (Satisfactory), R4 rated it as “Poor”. Third, with respect to methodological details, such as data augmentation and fine-tuning, we have provided detailed descriptions in Section 2 that are sufficient to achieve reproducibility, as affirmed by R2. Regarding the framework diagram, please refer to our response on “Clarity of Framework Diagram” above. In the final version, we will try our best to offer more clarifications to assist R4 in better understanding our work.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

Based on the reviewers’ comments, I advise the authors to improve discussion on related works and better highlight the methodological contributions of the paper.
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Reject
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The paper lacks clarity on the proposed methodology and experiments performed. Only the HCP dataset is used in the paper. A foundation model should be trained and validated using multiple large datasets and with diverse tasks.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The authors have addressed key reviewer concerns, particularly clarifying their methodological contributions. The empirical evaluations across diverse resting- and task-state fMRI datasets demonstrate meaningful and consistent performance gains over established baselines, supporting their practical relevance. Overall, given the paper’s solid execution, clearly articulated innovation, and responsiveness to reviewers, acceptance is warranted.

Spatio-temporal Pre-trained Foundation Model for Neural Decoding with Fine-grained Optimization

Author(s):