Abstract

Understanding brain dynamics through functional Magnetic Resonance Imaging (fMRI) remains a fundamental challenge in neuroscience, particularly in capturing how the brain transitions between various functional states. Recently, metastability, which refers to temporarily stable brain states, has offered a promising paradigm to quantify complex brain signals into interpretable, discretized representations. In particular, compared to cluster-based machine learning approaches, tokenization approaches leveraging vector quantization have shown promise in representation learning with powerful reconstruction and predictive capabilities. However, most existing methods ignore brain transition dependencies and lack a quantification of brain dynamics into representative and stable embeddings. In this study, we propose a Hierarchical State space-based Tokenizer, termed HST, which quantizes brain states and transitions in a hierarchical structure based on a state space-based model. We introduce a refined clustered Vector-Quantization Variational AutoEncoder (VQ-VAE) that incorporates quantization error feedback and clustering to improve quantization performance while facilitating metastability with representative and stable token representations. We validate our HST on two public fMRI datasets, demonstrating its effectiveness in quantifying the hierarchical dynamics of the brain and its potential in disease diagnosis and reconstruction performance. Our method offers a promising framework for the characterization of brain dynamics, facilitating and improving current models of metastability.
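As a rough illustration of the vector quantization idea the abstract refers to, the sketch below maps a latent vector to its nearest codebook entry and then quantizes the leftover error with a second codebook. This is a generic residual-style quantizer under stated assumptions, not the authors' HST implementation: the function names, the toy codebooks, and the two-stage layout are all hypothetical, and a real VQ-VAE would learn its codebooks jointly with an encoder and decoder.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Vector, codebook: Sequence[Vector]) -> Tuple[int, List[float]]:
    """Return the index of the nearest codebook entry and the residual z - e_k."""
    index = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
    residual = [x - e for x, e in zip(z, codebook[index])]
    return index, residual


def quantize_with_feedback(
    z: Vector, state_codebook: Sequence[Vector], error_codebook: Sequence[Vector]
) -> Tuple[int, int, List[float]]:
    """Two-stage quantization: a state token, then a token for the leftover error."""
    state_token, residual = quantize(z, state_codebook)
    error_token, _ = quantize(residual, error_codebook)
    reconstruction = [
        s + e for s, e in zip(state_codebook[state_token], error_codebook[error_token])
    ]
    return state_token, error_token, reconstruction


# Toy, hand-picked codebooks (hypothetical values, not learned from fMRI data).
STATE_CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]
ERROR_CODEBOOK = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]]

if __name__ == "__main__":
    print(quantize_with_feedback([1.2, 0.9], STATE_CODEBOOK, ERROR_CODEBOOK))
```

In this toy setup the second stage moves the reconstruction closer to the input without enlarging the state codebook, which is the kind of trade-off between precision and a small, stable token set that the abstract describes.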

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0118_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

ADHD: https://neurobureau.projects.nitrc.org/ADHD200/Introduction.html

BibTex

@InProceedings{YanYan_Hierarchical_MICCAI2025,
        author = { Yang, Yanwu and Wolfers, Thomas},
        title = { { Hierarchical Characterization of Brain Dynamics via State Space-based Vector Quantization } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
        pages = {394 -- 404}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a hierarchical framework that jointly models brain states and transitions. The hierarchical approach separately quantizes brain states and transitions (in discrete space), which differs from traditional methods that may ignore transition dependencies between neural states.

    The method provides interpretable quantization of brain dynamics.

    The refined cluster VQ-VAE with error feedback addresses the trade-off between quantization precision and stability.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main novelty in this work is that the authors use discrete tokenization rather than continuous space modeling for brain dynamics.

    Furthermore, the model explicitly considers brain states and transitions, building upon the concept of metastability.

    Each token in the codebook can represent a specific brain state or functional configuration.

    The paper claims that its hierarchical approach achieves better reconstruction with fewer embeddings.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper’s approach to modeling brain dynamics through discrete representations is noteworthy; however, it overlooks significant foundation models for brain dynamics in the field, particularly BrainJEPA (Dong et al., NeurIPS 2024) and BrainLM (Caro et al., ICLR 2024). While the discrete representation in HST does align well with the neurobiological concept of metastable brain states, the authors fail to discuss or compare their approach with established models that successfully utilize continuous representation spaces. A thorough discussion and comparative analysis are extremely important to position this work within the existing literature and to avoid potentially misleading insights.

    VQ-VAEs typically benefit from a two-stage process: pretraining on large unlabeled datasets with self-supervision, followed by fine-tuning for specific downstream tasks. The authors' choice to train their model end-to-end on a single dataset without prior pretraining needs further justification. Furthermore, the sample size (~200) is substantially smaller than those used in previous foundation models (~50K), raising concerns about whether this dataset is sufficient to demonstrate the proposed advantages of a discrete representation space.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper fails to address significant related work in brain dynamics foundation models. These omissions undermine the authors’ ability to properly contextualize their contribution and establish its conclusions. An in-depth discussion and thorough comparison with these established models is essential to validate the claimed advantages of discrete representation over continuous approaches.

    The small sample size may be insufficient to convincingly demonstrate the superiority of discrete representation spaces. This limitation fundamentally compromises the paper’s central claims.

    Given these significant shortcomings, the reviewer recommends rejection of the paper in its current form.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors are strongly encouraged to include the additional discussion of related work in a later version to better position this work.



Review #2

  • Please describe the contribution of the paper

    The authors propose a variation of a vector-quantized VAE to capture brain states and state transitions.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem addressed by the authors is a significant challenge, making the paper’s focus valuable for advancing the field. The paper demonstrates rigor through comprehensive experiments that consistently outperform baseline methods. The findings reveal meaningful connections between persistent brain states and hyperactivity symptoms, adding clinical relevance beyond technical improvements.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Some citations do not support the claims, e.g., [18] and [22] in the following sentence: “Increasing the size of the codebook tends to improve reconstruction performance, but it can lead to unstable and fragmented token representations that are sensitive to noise in neural signals [18,22]”

    Data preprocessing details are missing. The authors should provide either more details or a citation for the parameter selection in preprocessing to improve the reproducibility of their paper.

    The discussion lacks a thorough examination of the method’s limitations, which would strengthen the paper’s scholarly contribution and help guide future research directions.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    While it makes sense, I’m intrigued by your classification of RNNs, LSTMs, and GRUs as state-space models, as this framing isn’t commonly used in the literature I’m familiar with. Including a reference to an explanatory paper (such as ‘Learning Latent Dynamics for Planning from Pixels’) would be helpful for readers to better understand this interesting perspective.

    RQ-VAE abbreviation appears without assignment.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite minor presentational issues, the core innovation and potential impact on both technical and clinical fronts justify acceptance. However, some important details (see weaknesses) are missing from the manuscript, requiring a rebuttal.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    After careful consideration, reading all other reviews and the rebuttal in full, I recommend rejection. The authors have not addressed my primary concerns regarding data processing and model limitations. Another significant issue is the absence of code availability. Without access to these details, the community will face challenges in validating and building upon this work. While the proposed methodological contribution shows promise conceptually, the lack of code and information sharing, combined with the unaddressed technical questions, prevents a thorough assessment of the method’s capabilities and contributions. For these reasons, I cannot recommend this manuscript for publication at MICCAI in its current state.



Review #3

  • Please describe the contribution of the paper

    The authors proposed a Hierarchical State space-based Tokenization network (HST) to quantize brain states and transitions in a hierarchical structure. They also introduced a refined clustered Vector-Quantization Variational AutoEncoder (VQ-VAE) that incorporates quantization error feedback and clustering to improve quantization performance while facilitating metastability with representative and stable token representations. Such an approach offers a promising framework for the characterization of brain dynamics, which could facilitate the analysis of metastability.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors proposed a state space-based tokenization method for the quantization of time series fMRI data that enables hierarchical mapping of brain states and transitions simultaneously. Also, they introduced a refined cluster VQ-VAE that incorporates quantization error feedback and clustering to improve quantization performance while facilitating metastability with stable discretized embeddings. Lastly, they evaluated their methods on two public datasets for disease diagnosis and reconstruction performance which supports the advantages of their approaches.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    One major weakness is that for the ADHD-200 evaluation, it seems the performance improvement of HST over RQ-VAE is quite small, especially when the number of embeddings is greater than 32 (Fig. 2A). Also, the highest accuracy for ADHD classification is 66.28%, which is improved compared to other approaches but still not very high.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    1. It seems there is no correction for multiple comparisons in the statistical analysis.
    2. The references are not cited in order.
    3. Please clarify what the brain maps in Fig. 3 actually show. Do they show brain activation in HC or ADHD? Why use two different color schemes for the upper and lower figures?
    4. No information is given about the total number of brain states and transitions.
    5. What does brain transition 3 mean? Does it refer to the transition from state 2 to 3 or from state 3 to 4?
    6. Section 4 of the Results: It is stated that “brain state 4 is associated with reduced activation of brain function. This suggests that patients with ADHD are more likely to experience activated brain function states, …” If patients with ADHD are associated with reduced brain activation, how can the authors conclude that ADHD patients are more prone to activated brain states? This is very confusing.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the weakness, I think it represents a good contribution to the field (see major contribution and major strength).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have sufficiently addressed my concerns, so I recommend acceptance.




Author Feedback

We thank the reviewers for their valuable and thoughtful comments.

#Reviewer-1: Q1: Wrong citations. We apologize for the incorrect citations. We will carefully correct them by referencing works such as [28] and “Representation Collapsing Problems in Vector Quantization”.

Q2: Inaccurate or missing description. We will incorporate the recommended corrections, including the data preprocessing details, the limitations, and typo fixes. Regarding the term “state-space models,” we used it for recurrent models because of their latent dynamics and hidden state transitions. Thank you for your suggestion. It would be helpful to include the suggested paper to clarify this framing.
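For readers unfamiliar with the framing discussed above, the following equations sketch why a recurrent network can be read as a state-space model. The symbols (`h_t` for the hidden state, `x_t` for the input, `y_t` for the output, and the weight names) are generic textbook notation chosen here for illustration; they are not taken from the paper.

```latex
\begin{align}
  % General discrete-time state-space model: state update and readout
  h_t &= f(h_{t-1}, x_t), & y_t &= g(h_t) \\
  % Linear special case
  h_t &= A h_{t-1} + B x_t, & y_t &= C h_t \\
  % Elman-style RNN: the same structure with a nonlinear state update
  h_t &= \tanh(W_h h_{t-1} + W_x x_t + b), & y_t &= W_y h_t + c
\end{align}
```

In each case the output depends on the input history only through the hidden state, which is the sense in which LSTMs and GRUs, with gated versions of the update, fit the same template.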

#Reviewer-2: Q1: Performance. This is partly due to the lossy transformation from continuous signals to discrete representations, which sacrifices some accuracy but improves interpretability, which is important in fMRI analysis. Despite this trade-off, our model remains competitive with strong baselines.

Q2: No correction. After applying FDR correction, the reported states remain significant, except transition state 3. We will revise the corresponding description accordingly.
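The rebuttal does not say which FDR procedure was applied. As a minimal sketch, assuming the widely used Benjamini-Hochberg procedure, the correction could look like the code below. The function name and the p-values are hypothetical and chosen only to show how a result that passes an uncorrected 0.05 threshold can fail after correction.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis under the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= k / m * alpha.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for eight state/transition tests (not from the paper).
    example = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60, 0.012, 0.03]
    print(benjamini_hochberg(example))
```

With these illustrative values, the tests at 0.03, 0.039, and 0.041 would count as significant without correction but are not rejected after it.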

Q3: Fig. 3, brain transition 3, and brain state 4. 1) The brain states are not specific to any group. Our goal is to discretize recurrent brain activity into identifiable states and analyze their occurrence and functional roles across conditions. The top panel shows average activation at a single TR, while the bottom panel shows functional changes across time points, with colors showing functional activations and changes. 2) Unlike prior work on transition probabilities, our method explicitly models the functional transformation between two time points. These continuous transitions are also discretized. 3) The total number of tokens in our model is 16. Among these, half are reserved for error feedback. As a result, we obtain 4 tokens for brain states and 4 tokens for transition states. 4) Brain state 4 reflects reduced activation, and ADHD patients show lower occupancy in this state, suggesting a shift toward more active states and supporting the link to hyperactive dynamics. We will revise the manuscript to clarify these interpretations.
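To make the notion of state occupancy concrete, the sketch below computes two common summary measures from a sequence of discrete state tokens: fractional occupancy (share of time points spent in a state) and mean dwell time (average length of an uninterrupted visit). The rebuttal does not define how occupancy was computed, so these definitions, the function names, and the toy sequences are assumptions made for illustration; only the count of 4 brain-state tokens comes from the text above.

```python
from collections import Counter
from typing import Dict, List, Sequence

N_STATES = 4  # number of brain-state tokens stated in the rebuttal


def fractional_occupancy(tokens: Sequence[int], n_states: int = N_STATES) -> Dict[int, float]:
    """Fraction of time points assigned to each state, with states labeled 1..n_states."""
    counts = Counter(tokens)
    total = len(tokens)
    return {state: counts.get(state, 0) / total for state in range(1, n_states + 1)}


def mean_dwell_time(tokens: Sequence[int], state: int) -> float:
    """Average length in TRs of uninterrupted runs of `state`; 0.0 if never visited."""
    runs: List[int] = []
    current = 0
    for token in tokens:
        if token == state:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    # Hypothetical token sequences, one label per TR (not real data).
    control = [4, 4, 4, 1, 2, 4, 4, 3, 4, 4]
    patient = [1, 1, 2, 4, 3, 3, 2, 1, 4, 2]
    print(fractional_occupancy(control)[4], fractional_occupancy(patient)[4])
    print(mean_dwell_time(control, 4), mean_dwell_time(patient, 4))
```

A group comparison would then test per-subject occupancy values between groups, which is where a multiple-comparison correction across states and transitions becomes relevant.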

#Reviewer-3: Q1: Comparison with other works. BrainJEPA and BrainLM are well-established models for continuous brain dynamics learning. Our study takes a different direction, i.e., discretizing brain activity into interpretable states. Since foundation models are trained on large-scale data for general-purpose use, it would be unfair to compare them with our model. Our work is not positioned as a foundation model. Though tokenization is a step toward that, our work serves as an early-stage evaluation of discrete brain state modeling.

Regarding metastable states, our work is inspired by metastability but follows a data-driven approach. Related methods include autoencoder+KMeans[1], KMeans[2], and HMM[3]. To maintain conceptual clarity, we use the term “brain states” rather than “metastable states”. For the comparison, due to space limitations, we were unable to include a comprehensive comparison across all methods. Instead, we showed the practical utility of our model through significant group-level differences in a clinical application (e.g., ADHD).

[1] Modes of cognition: Evidence from metastable brain dynamics. Neuroimage. [2] Dynamical exploration of the repertoire of brain networks at rest is modulated by psilocybin. NeuroImage. [3] Brain network dynamics are hierarchically organized in time. PNAS.

Q2: Pretraining and data size. For a fair comparison, we pretrain and fine-tune the model only on the training set. Regarding the data size, sequential slicing could mitigate overfitting to some extent. We test both the ADHD (large) and SchizoConnect (small) datasets to show generalizability. Cross-site generalization to datasets with varying TRs is a limitation of our study. As noted in the paper, future work will build a large model for brain dynamics quantization on large-scale datasets.
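The rebuttal does not spell out what "sequential slicing" involves. One plausible reading, sketched below purely as an assumption, is cutting each subject's time series into overlapping windows so that a single scan yields several training samples. The function name, window length, and stride are hypothetical.

```python
from typing import List, Sequence


def slice_windows(
    series: Sequence[Sequence[float]], window: int, stride: int
) -> List[List[Sequence[float]]]:
    """Cut a (time x regions) series into windows of `window` TRs taken every `stride` TRs."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [
        list(series[start:start + window])
        for start in range(0, len(series) - window + 1, stride)
    ]


if __name__ == "__main__":
    # Toy scan: 10 time points, 2 regions (hypothetical values).
    scan = [[float(t), float(-t)] for t in range(10)]
    windows = slice_windows(scan, window=4, stride=2)
    print(len(windows), [w[0][0] for w in windows])
```

Under this reading, windows cut from the same subject are strongly correlated, so they would normally be kept in the same train or test split to avoid leakage.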




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Reviewers noted the strengths of this work included the proposed methodology incorporating brain state and transition modeling and the comprehensive experiments with two public datasets. However, there were also concerns regarding missing preprocessing details and limitation discussion, moderate performance improvement, and lack of comparison to large foundation models for brain dynamics. Many concerns were sufficiently addressed in the rebuttal, e.g., I agree that the large brain foundation models are a complementary direction and their lack of inclusion here does not detract from the methods and experiments presented in this paper. However, some concerns still remain regarding needed clarifications and lack of code sharing. Still, I believe the presented model, which incorporates interpretability through modeling of brain states and transitions, would be a solid contribution to MICCAI and would be of great interest to the neuroimaging community. Thus, I recommend this paper for acceptance.

    The authors should please include the requested information regarding implementation details and limitations, and share the code for reproducibility if possible.


