

Abstract

Functional Magnetic Resonance Imaging (fMRI) is an advanced neuroimaging method that enables in-depth analysis of brain activity by measuring dynamic changes in the blood oxygenation level-dependent (BOLD) signals. However, the resource-intensive nature of fMRI data acquisition limits the availability of high-fidelity samples required for data-driven brain analysis models. While modern generative models can synthesize fMRI data, they often underperform because they overlook the complex non-stationarity and nonlinear BOLD dynamics. To address these challenges, we introduce T2I-Diff, an fMRI generation framework that leverages a time-frequency representation of BOLD signals and classifier-free denoising diffusion. Specifically, our framework first converts BOLD signals into windowed spectrograms via a time-dependent Fourier transform, capturing both the underlying temporal dynamics and spectral evolution. Subsequently, a classifier-free diffusion model is trained to generate class-conditioned frequency spectrograms, which are then converted back to BOLD signals via inverse Fourier transforms. Finally, we validate the efficacy of our approach by demonstrating improved accuracy and generalization in downstream fMRI-based brain network classification. The code is available at https://github.com/htew0001/T2I-Diff.git.
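The transform-and-invert pipeline described in the abstract can be illustrated with a minimal NumPy sketch of a windowed Fourier transform and its inverse. The periodic Hann window, `n_fft=16`, `hop=4`, the reflection padding, and the helper names `wft`/`iwft` are illustrative assumptions rather than T2I-Diff's settings or code. The inverse uses weighted overlap-add (OLA), the reconstruction method the authors name in their rebuttal, and it is exact here only because the spectrogram is left unmodified.

```python
import numpy as np

def periodic_hann(n):
    # Periodic Hann window, the usual choice for STFT analysis/synthesis
    return np.hanning(n + 1)[:-1]

def wft(x, n_fft=16, hop=4):
    """Windowed Fourier transform of a 1-D signal -> complex spectrogram (freq, time)."""
    pad = n_fft // 2
    xp = np.pad(x, pad, mode="reflect")           # centre frames on the signal edges
    win = periodic_hann(n_fft)
    n_frames = 1 + (len(xp) - n_fft) // hop
    frames = np.stack([xp[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T          # shape: (n_fft // 2 + 1, n_frames)

def iwft(S, length, n_fft=16, hop=4):
    """Inverse via weighted overlap-add, normalised by the summed squared window."""
    pad = n_fft // 2
    win = periodic_hann(n_fft)
    frames = np.fft.irfft(S.T, n=n_fft, axis=1)   # back to windowed time frames
    out_len = n_fft + hop * (frames.shape[0] - 1)
    y, norm = np.zeros(out_len), np.zeros(out_len)
    for i, frame in enumerate(frames):
        sl = slice(i * hop, i * hop + n_fft)
        y[sl] += frame * win
        norm[sl] += win ** 2
    y = np.divide(y, norm, out=np.zeros_like(y), where=norm > 1e-10)
    return y[pad:pad + length]                    # drop the reflection padding

rng = np.random.default_rng(0)
bold = rng.standard_normal(232)   # toy stand-in for one region's BOLD series (T=232)
S = wft(bold)
recon = iwft(S, length=len(bold))
print(S.shape, np.allclose(recon, bold))
```

A generated spectrogram, unlike `S` here, need not be the transform of any real signal, so OLA then returns only a least-squares compromise between overlapping frames (the Griffin-Lim result); this is one route by which reconstruction artifacts can enter time-domain metrics.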

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3042_paper.pdf

SharedIt Link: https://rdcu.be/eHwR2

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04947-6_61

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/htew0001/T2I-Diff.git

Link to the Dataset(s)

N/A

BibTex

@InProceedings{TewHwa_T2IDiff_MICCAI2025,
        author = { Tew, Hwa Hui AND Loo, Junn Yong AND Tan, Yee-Fan AND Tang, Xinyu AND Ombao, Hernando AND Noman, Fuad AND Phan, Raphaël C.-W. AND Ting, Chee-Ming},
        title = { { T2I-Diff: fMRI Signal Generation via Time-Frequency Image Transform and Classifier-Free Denoising Diffusion Models } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {640 -- 650}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper introduces a framework named T2I-Diff, which employs time-frequency image transformation and a classifier-free denoising diffusion model to generate functional magnetic resonance imaging (fMRI) signals. The proposed method aims to address the challenges of resource-intensive fMRI data acquisition, small sample sizes, and the inability of existing generative models to effectively capture the complex non-stationarity and nonlinear temporal dynamics of BOLD signals.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper presents an innovative T2I-Diff framework that effectively captures the complex dynamics and spectral characteristics of fMRI signals through time-frequency image transformation and a classifier-free denoising diffusion model. The proposed method generates high-quality synthetic data, improving downstream task performance while demonstrating some generalizability and biological plausibility.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The idea of leveraging time-frequency image representations for fMRI and modeling them via diffusion models is interesting. However, as shown in Table 1, the proposed method does not outperform baseline generative models in terms of fMRI signal quality, which significantly undermines the claimed contributions of this work.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While the proposed method demonstrates some novelty, its failure to outperform baseline models in key performance metrics raises concerns about the fundamental motivation and justification for this approach.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper presents T2I-Diff, a novel framework for functional MRI (fMRI) BOLD signal generation based on a time-frequency image transformation and a classifier-free diffusion model. The key innovation lies in transforming BOLD signals into time-frequency spectrograms using a windowed Fourier transform (WFT), allowing effective modeling of temporal and spectral dynamics in fMRI data. The authors further employ a classifier-free denoising diffusion probabilistic model and EDM sampling to generate class-conditioned spectrograms that are then inverted back to time-domain BOLD signals. Experiments on the REST-meta-MDD dataset demonstrate improved performance in synthetic data generation and downstream MDD classification.
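Review #2 mentions EDM sampling. For readers unfamiliar with it, the following NumPy sketch shows the noise-level schedule and deterministic Heun sampler of Karras et al. (2022), using that paper's default `sigma_min=0.002`, `sigma_max=80`, `rho=7` and 18 noise levels. The closed-form Gaussian denoiser is a toy stand-in for a trained class-conditional network; the function names are hypothetical, and nothing here is taken from the T2I-Diff code.

```python
import numpy as np

def karras_sigmas(n=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM noise levels (Karras et al., 2022), decreasing, with a final 0 appended."""
    i = np.arange(n)
    inv = sigma_max ** (1 / rho) + i / (n - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))
    return np.append(inv ** rho, 0.0)

def heun_sample(denoise, shape, sigmas, rng):
    """Deterministic 2nd-order (Heun) EDM sampler; denoise(x, sigma) estimates clean data."""
    x = rng.standard_normal(shape) * sigmas[0]
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, s_cur)) / s_cur            # ODE slope dx/dsigma
        x_next = x + (s_next - s_cur) * d               # Euler step
        if s_next > 0:                                  # Heun correction, skipped at sigma=0
            d_next = (x_next - denoise(x_next, s_next)) / s_next
            x_next = x + (s_next - s_cur) * 0.5 * (d + d_next)
        x = x_next
    return x

def gauss_denoiser(x, sigma):
    # Optimal denoiser for toy data ~ N(0, 1): E[x0 | x] = x / (1 + sigma^2)
    return x / (1.0 + sigma ** 2)

samples = heun_sample(gauss_denoiser, (10000,), karras_sigmas(), np.random.default_rng(0))
print(samples.std())   # should be close to 1, the std of the toy data
```

With a real model, `denoise` would be the network's (possibly guided) estimate of the clean spectrogram at noise level `sigma`.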
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Novel methodology: The transformation of BOLD signals to spectrograms for diffusion modeling is original and offers a fresh perspective compared to traditional time-domain approaches.

Classifier-free diffusion: Eliminating the need for external guidance (e.g., a classifier) during the diffusion process reduces training complexity and potentially improves generalization.

Biological plausibility: The paper includes a functional connectivity analysis showing that generated signals preserve disease-related connectivity differences, which supports the clinical relevance of the approach.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Generalizability across datasets: The study is based solely on the REST-meta-MDD dataset. The method’s performance on other datasets or brain disorders remains unclear.

Lack of code/data release at submission: Although the authors plan to release code upon acceptance, currently there is no link provided for reproducibility verification.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

Including example spectrograms or visualizations comparing real and synthetic data per class (e.g., HC vs. MDD) might strengthen interpretability. A comparison with models that leverage graph-based representations (e.g., GraphVAE, BrainGNN) could provide further context, especially given the structured nature of brain data.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper demonstrates technical novelty, practical relevance, and solid empirical validation. The idea of translating temporal fMRI signals into a 2D image domain for diffusion modeling is promising and well-supported. Despite some areas that could be clarified or extended, the current version of the paper already makes a valuable contribution.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

After carefully reviewing the authors’ rebuttal, I find that they have addressed the key concerns raised by the reviewers in a clear and technically sound manner.

Review #3

Please describe the contribution of the paper

The paper introduces T2I-Diff, a generative framework for fMRI BOLD signals that integrates a time-to-frequency image transform with a denoising diffusion probabilistic model (DDPM) framework. By converting fMRI time series into spectrogram images, T2I-Diff captures both spectral and temporal dynamics, thereby reducing model complexity and training overhead. This method aims to provide high-quality synthetic data and enhance downstream brain disorder classification tasks.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The use of windowed Fourier transforms to convert BOLD signals into spectrogram images is an innovative approach that leverages both spectral (frequency) and temporal information for generative modeling of fMRI signals.
2. By employing a classifier-free diffusion process, the framework avoids reliance on an additional classifier for conditional guidance.
3. Empirical results show competitive performance in generating time-frequency images, and the original data augmented with diffusion-generated data show superior performance in downstream brain disorder classification.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Lack of Reference to Related Prior Work: The paper does not cite or compare against a closely related recent work (NeurIPS 2024), “Utilizing Image Transforms and Diffusion Models for Generative Modeling of Short and Long Time Series” by Naiman et al. It is unclear why this study was not cited or used as a baseline. A quick review suggests that the methodologies in both papers are quite similar, with the main difference lying in their application.
2. Use of LDW Estimator for Connectivity: While the paper estimates the connectivity matrix using the LDW estimator, standard fMRI analyses commonly employ Pearson correlation. The authors do not explain why LDW was chosen over more conventional correlation-based methods, leaving the estimator’s specific advantages for fMRI connectivity analysis unclear.
3. Effect of TR: How does this work account for different repetition times (TRs) across fMRI protocols, given that TR directly affects the total number of acquired time points? While the ablation experiments show the model’s performance at sequence lengths of 24, 64, 128, and 256, it remains unclear whether the framework incorporates or adjusts for varying TR values.
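The TR question can be made concrete with the standard sampling relations: TR fixes the Nyquist frequency 1/(2·TR) and, for a window of `n_fft` samples, the spacing 1/(n_fft·TR) between spectrogram bins. The short Python sketch below compares two TR values; the TRs, `n_fft=16`, the 0.01-0.08 Hz band (a commonly used resting-state band), and the helper names are illustrative assumptions, not settings reported for T2I-Diff.

```python
def spectral_grid(tr, n_fft):
    """Nyquist frequency, bin spacing and bin centres (Hz) of a WFT with n_fft samples."""
    nyquist = 1.0 / (2.0 * tr)           # highest resolvable frequency
    df = 1.0 / (n_fft * tr)              # spacing between frequency bins
    freqs = [k * df for k in range(n_fft // 2 + 1)]
    return nyquist, df, freqs

def bins_in_band(freqs, lo=0.01, hi=0.08):
    """Number of frequency bins that fall inside [lo, hi] Hz."""
    return sum(lo <= f <= hi for f in freqs)

for tr in (2.0, 0.72):
    nyq, df, freqs = spectral_grid(tr, n_fft=16)
    print(f"TR={tr}s  Nyquist={nyq:.3f} Hz  bin spacing={df:.4f} Hz  "
          f"bins in 0.01-0.08 Hz: {bins_in_band(freqs)}")
```

With the same 16-sample window, the shorter TR leaves no bin inside the band, which illustrates why window length and hop size would need retuning for each acquisition protocol.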
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The idea of converting BOLD signals into time-frequency spectrogram images, together with the classifier-free diffusion process, is innovative and yields promising empirical results. However, several unresolved issues remain that the paper would benefit from addressing. It is unclear why this work was not compared against other time-frequency approaches. Furthermore, the authors should clarify why the LDW estimator was chosen over the more standard Pearson correlation and how different repetition times (TRs) across fMRI protocols are handled.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Thank you to the authors for addressing my comments. I therefore recommend accepting the paper. Great work!

Author Feedback

R1 Reproducibility & Generalizability: The code and preprocessed data will be made available after acceptance. We believe the current experiments on a large major depressive disorder fMRI dataset are sufficient to show the methodological advantages and applicability of our method. Further experiments on other brain disease datasets to evaluate generalizability are reserved for future work.

R2 Lack of Reference to Related Prior Work: Thanks for pointing out the related work by Naiman et al., which we will compare against and cite in our paper. First, Naiman et al. did not investigate the synthesis of fMRI signals as in our work. Second, there are major methodological differences between our work and theirs. We build upon their unconditional generative framework by introducing class-conditioned generation, which enables the synthesis of class-specific samples essential for brain disorder analysis. In particular, we incorporate classifier-free guidance into the existing EDM pipeline to learn both unconditional and conditional scores, which are combined during sampling using a guidance weight to balance diversity and class alignment. Additionally, we empirically select key WFT hyperparameters (signal length, hop size, and number of frequency bins) and apply frequency-wise normalization to better align with the temporal resolution and spectral features of BOLD signals relevant to brain activity. These adaptations for class-conditioned fMRI BOLD generation are not explicitly addressed in the work of Naiman et al.
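The rebuttal describes learning conditional and unconditional scores in one network and mixing them with a guidance weight at sampling time. The standard classifier-free guidance recipe of Ho & Salimans (2022) can be sketched as follows: labels are randomly replaced by a "null" token during training, and at sampling time the unconditional output is extrapolated toward the conditional one. The null-label value, dropout probability, toy model, and helper names are assumptions for illustration, not T2I-Diff's code; the weight is parameterised so that `w=1` recovers the purely conditional model (Ho & Salimans write the same rule with `1 + w`).

```python
import numpy as np

NULL_LABEL = -1  # hypothetical token meaning "no class" (unconditional)

def drop_labels(labels, p_uncond, rng):
    """Training-time label dropout: replace a random fraction of labels by NULL_LABEL."""
    labels = np.asarray(labels).copy()
    mask = rng.random(labels.shape) < p_uncond
    labels[mask] = NULL_LABEL
    return labels

def guided_denoise(model, x, sigma, label, w):
    """Classifier-free guidance: move from the unconditional toward the conditional output."""
    cond = model(x, sigma, label)
    uncond = model(x, sigma, NULL_LABEL)
    return uncond + w * (cond - uncond)   # w=0: unconditional, w=1: conditional, w>1: stronger

def toy_model(x, sigma, label):
    # Hypothetical stand-in network: class c shifts the output by c, the null label by 0
    return x + (0.0 if label == NULL_LABEL else float(label))

x = np.zeros(3)
for w in (0.0, 1.0, 3.0):
    print(w, guided_denoise(toy_model, x, sigma=1.0, label=1, w=w))
```

Larger `w` pushes samples toward class-typical outputs at the cost of diversity, which is the diversity/class-alignment trade-off mentioned in the rebuttal.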
R2 LDW Connectivity Estimator: We clarify that the LDW shrinkage method is a regularized variant of standard correlation-based methods that offers improved estimation stability. We apply LDW to estimate high-dimensional covariance or correlation matrices. This is particularly important when the number of brain regions or network nodes is comparable to the number of timepoints (N=116, T=232 in our case), where the sample covariance matrix becomes ill-conditioned. LDW addresses this by shrinking the sample covariance toward a well-conditioned target, thereby minimizing mean squared error and yielding stable and reliable connectivity estimates.

R2 Effect of TR: Our diffusion model operates in the time-frequency domain, making the repetition time (TR), or equivalently the sampling frequency, an important hyperparameter of our framework. Since TR determines the maximum resolvable frequency according to the Nyquist theorem, it directly influences the spectral content of fMRI signals. Our framework can be tuned to match the frequency characteristics associated with a given TR, allowing compatibility with varying fMRI acquisition protocols. In contrast, time-domain diffusion models typically overlook TR and its impact on the spectral properties essential for accurately modeling fMRI dynamics.
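Assuming LDW refers to the Ledoit-Wolf shrinkage estimator (shrinkage toward a scaled identity, which matches the description in the rebuttal), the following NumPy sketch implements the 2004 Ledoit-Wolf formula and compares its conditioning with the plain sample covariance on random toy data of the quoted size (T=232, N=116). The function names are hypothetical and the data are not fMRI; scikit-learn's `sklearn.covariance.LedoitWolf` provides a maintained implementation of the same estimator.

```python
import numpy as np

def ledoit_wolf(X):
    """Ledoit-Wolf (2004) shrinkage of the sample covariance toward mu * I.

    X: array of shape (T, N), T timepoints by N brain regions.
    Returns (shrunk covariance, shrinkage intensity in [0, 1]).
    """
    X = X - X.mean(axis=0)                         # centre each region's time series
    n, p = X.shape
    S = X.T @ X / n                                # sample covariance (1/n convention)
    mu = np.trace(S) / p                           # scale of the identity target
    delta2 = np.sum((S - mu * np.eye(p)) ** 2) / p # squared distance of S from the target
    # Estimated sampling noise in S: mean squared deviation of x_k x_k^T from S
    beta2 = (np.sum(np.sum(X ** 2, axis=1) ** 2) / n - np.sum(S ** 2)) / (n * p)
    shrink = min(beta2, delta2) / delta2
    return (1 - shrink) * S + shrink * mu * np.eye(p), shrink

def cov_to_corr(C):
    """Normalise a covariance matrix to unit diagonal (a correlation matrix)."""
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)

rng = np.random.default_rng(0)
X = rng.standard_normal((232, 116))                # toy data: T=232, N=116
S = np.cov(X, rowvar=False, bias=True)             # plain sample covariance
C, a = ledoit_wolf(X)
R = cov_to_corr(C)                                 # shrinkage-regularised connectivity
print(np.linalg.cond(S), np.linalg.cond(C), a)
```

Because the Pearson correlation matrix is the sample covariance normalised to unit diagonal, applying `cov_to_corr` to the shrunk covariance gives a regularised counterpart of Pearson connectivity; any positive shrinkage pulls the extreme eigenvalues toward `mu` and so lowers the condition number.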
R3 Performance on fMRI Signal Reconstruction: We clarify that Context-FID and Correlational are similarity metrics computed on reconstructed time-domain signals and are therefore sensitive to artifacts introduced during the spectrogram-to-signal reconstruction process. In particular, the overlap-add (OLA) method employed for reconstruction can introduce phase shifts and spectral biases, to which these metrics are especially susceptible. Consequently, models that generate representative spectrograms in the frequency domain may still yield degraded time-domain evaluation scores. Nevertheless, our method consistently outperforms state-of-the-art baselines on the downstream task-based discriminative and predictive metrics. Unlike signal reconstruction metrics such as Context-FID and Correlational, these task-based metrics evaluate the overall preservation of high-level temporal dynamics and structural features, and are more robust to low-level reconstruction artifacts that do not compromise overall semantic fidelity. Notably, despite the absence of explicit time-domain loss supervision, our method still achieves competitive performance on time-domain metrics.
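The argument that time-domain similarity metrics penalise phase shifts can be illustrated with a simplified correlation-matrix discrepancy in the spirit of the Correlational score. The metric definition, helper name, and toy two-region data are assumptions for illustration, not the paper's evaluation protocol. A circular lag of three samples in one region leaves that region's magnitude spectrum unchanged but largely destroys its correlation with the other region.

```python
import numpy as np

def corr_score(real, synth):
    """Mean absolute difference between average region-by-region correlation matrices.

    real, synth: arrays of shape (samples, T, N). A simplified stand-in for a
    correlational metric; not necessarily the definition used in the paper.
    """
    def mean_corr(data):
        return np.mean([np.corrcoef(x, rowvar=False) for x in data], axis=0)
    return np.abs(mean_corr(real) - mean_corr(synth)).mean()

rng = np.random.default_rng(0)
base = rng.standard_normal((20, 232, 1))
# Two strongly correlated toy "regions": the second is the first plus a little noise
real = np.concatenate([base, base + 0.3 * rng.standard_normal(base.shape)], axis=2)
# "Synthetic" copy whose second region is circularly shifted by 3 samples
synth = real.copy()
synth[:, :, 1] = np.roll(real[:, :, 1], 3, axis=1)
print(corr_score(real, real), corr_score(real, synth))
```

Since the shift changes only phase, a purely spectral comparison of the second region would see no difference, whereas the correlation-based score rises sharply.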

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

Reviewers note the relevance and interest in the novelty of the proposed approach and the strength of the empirical validation of using the synthesized data in downstream tasks. There were concerns regarding lack of comparison to related work, unclear method choices/descriptions, and relatively poor performance for some reported metrics. The strongest concern regarding performance relates to reconstruction metrics, but downstream analysis showed improvement and synthesized functional connectivity showed biological plausibility. As such, I believe the rebuttal addressed most of the concerns, and that the interesting approach and solid experimental analysis make this paper a good contribution to MICCAI.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A


T2I-Diff: fMRI Signal Generation via Time-Frequency Image Transform and Classifier-Free Denoising Diffusion Models

Author(s):