Abstract

Generating realistic images to accurately predict changes in the structure of brain MRI can be a crucial tool for clinicians. Such applications can help assess patients’ outcomes and analyze how diseases progress at the individual level. However, existing methods developed for this task present some limitations. Some approaches attempt to model the distribution of MRI scans directly by conditioning the model on patients’ ages, but they fail to explicitly capture the relationship between structural changes in the brain and time intervals, especially on age-unbalanced datasets. Other approaches simply rely on interpolation between scans, which limits their clinical application as they do not predict future MRIs. To address these challenges, we propose a Temporally-Aware Diffusion Model (TADM), which introduces a novel approach to accurately infer progression in brain MRIs. TADM learns the distribution of structural changes in terms of intensity differences between scans and combines the prediction of these changes with the initial baseline scans to generate future MRIs. Furthermore, during training, we propose to leverage a pre-trained Brain-Age Estimator (BAE) to refine the model’s training process, enhancing its ability to produce accurate MRIs that match the expected age gap between baseline and generated scans. Our assessment, conducted on 634 subjects from the OASIS-3 dataset, uses similarity metrics and region sizes computed by comparing predicted and real follow-up scans on 3 relevant brain regions. TADM achieves large improvements over existing approaches, with an average decrease of 24% in region size error and an improvement of 4% in similarity metrics. These evaluations demonstrate the improvement of our model in mimicking temporal brain neurodegenerative progression compared to existing methods. We believe that our approach will significantly benefit clinical applications, such as predicting patient outcomes or improving treatments for patients.
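
The two ideas in the abstract (predicting an intensity residual that is added to the baseline scan, and penalising a mismatch between the requested and the estimated age gap) can be illustrated with a toy sketch. This is not the authors' implementation: `predict_followup`, `age_gap_penalty`, and the `toy_bae` estimator are hypothetical names, images are flattened lists with intensities in [0, 1], and the absolute-error form of the penalty is an assumption.

```python
from typing import Callable, List

Image = List[float]  # a flattened image with intensities in [0, 1]


def predict_followup(baseline: Image, residual: Image) -> Image:
    """Add a predicted intensity residual to the baseline and clip to [0, 1]."""
    return [min(1.0, max(0.0, b + r)) for b, r in zip(baseline, residual)]


def age_gap_penalty(bae: Callable[[Image], float], baseline: Image,
                    predicted: Image, target_gap: float) -> float:
    """Absolute mismatch between estimated and requested age gap (assumed form)."""
    estimated_gap = bae(predicted) - bae(baseline)
    return abs(estimated_gap - target_gap)


def toy_bae(image: Image) -> float:
    """Hypothetical stand-in for a brain-age estimator: darker image -> older."""
    mean = sum(image) / len(image)
    return 100.0 * (1.0 - mean)


baseline = [0.5, 0.5, 0.5, 0.5]
residual = [-0.05, -0.05, -0.05, -0.05]  # would come from the diffusion model
followup = predict_followup(baseline, residual)
penalty = age_gap_penalty(toy_bae, baseline, followup, target_gap=5.0)
```

In a real training loop the residual would be sampled from the diffusion model and the estimator would be a pre-trained network; the sketch only shows how the two quantities relate.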

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2281_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/MattiaLitrico/TADM-Temporally-Aware-Diffusion-Model-for-Neurodegenerative-Progression-on-Brain-MRI

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Lit_TADM_MICCAI2024,
        author = { Litrico, Mattia and Guarnera, Francesco and Giuffrida, Mario Valerio and Ravì, Daniele and Battiato, Sebastiano},
        title = { { TADM: Temporally-Aware Diffusion Model for Neurodegenerative Progression on Brain MRI } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces the Temporally-Aware Diffusion Model (TADM) to accurately predict future brain MRI by learning structural changes over time, significantly outperforming existing methods in mimicking neurodegenerative progression.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is novel. Its main strength is its ability to predict brain MRI changes over time with higher accuracy by using a comprehensive conditioning strategy that includes the latent representation of baseline scans, the specific age gap between scans, and patient-specific data such as cognitive status and age at baseline. This allows for a more nuanced understanding of disease progression across different age groups without the need for age-balanced datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A weakness of this model could be its dependency on the quality and sophistication of the pretrained encoder and the correctness of patient-specific data. Errors or biases in these inputs could significantly affect the model’s predictions. Additionally, while conditioning on the age gap improves generalizability across different ages, it may not fully capture the complex interactions between age-related changes and disease progression in individual patients.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The description of reproducibility is poor.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The contributions are not emphasized in Section 1. It would have been beneficial to verify the method’s clinical usability.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is difficult to expect reproducibility of the experiments in the paper, and clinical validation is challenging.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper details a new AI model, the Temporally-Aware Diffusion Model (TADM), designed to predict the progression of brain MRIs by learning intensity differences and incorporating age-related changes. TADM shows promising results, with significant improvements in accuracy when compared to existing methods, as demonstrated by its performance on the OASIS-3 dataset. The contributions of the paper include the introduction of TADM, a method for capturing intensity variations between MRIs, the use of age gap conditioning to reflect temporal changes, and the integration of a Brain-Age Estimator to produce age-consistent MRI predictions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper’s introduction of a diffusion model-based image prediction pipeline is an innovative approach that has shown promising results in performance evaluations.
    2. The authors demonstrated that conditioning the diffusion model on the age gap and baseline image, rather than on chronological age, improves prediction performance, which may be of interest to the community.
    3. The paper is well written, well organized, and clear.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper’s primary weakness lies in its evaluation methodology, which currently does not provide adequate evidence to substantiate all the claims made about the proposed work.
    2. The paper lacks crucial information regarding the training of the Bayesian Autoencoder (BAE) and specifics about the study population, which are essential for replicability and validation of the results.

    These issues will be further elaborated upon in the detailed comments section.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. A significant concern is the lack of evidence supporting the claim that the proposed pipeline aids in early disease detection and clinical intervention. Merely predicting future scans does not confirm sensitivity to disease progression. A suggested experiment could involve comparing structural changes between baseline and predicted future scans across different groups, such as MCI versus control or AD versus control, to validate the pipeline’s clinical utility.

    2. The use of global quantitative metrics like SSIM and PSNR to evaluate image consistency does not directly assess the quality of predictions across specific brain regions. The regional size error measures offer only an indirect evaluation. Error maps in Figure 2 suggest suboptimal performance in posterior brain regions over an 8-year span compared to DDM. Providing SSIM and PSNR metrics for distinct brain tissues, such as gray matter, white matter, and CSF, would offer a more nuanced understanding of the pipeline’s efficacy across various brain regions.

    3. While the qualitative evaluation in Figure 2, showcasing a single case, is helpful for visualization, it provides limited evidence of overall performance. A more robust evaluation could include average error maps for all subjects, leveraging their alignment in MNI space, to illustrate the mean performance of different methods.

    4. The paper omits essential details about the methodology and study population: (a) There is no thorough explanation of the BAE training and evaluation process. (b) The OASIS-3 dataset lacks basic demographic and clinical information, such as the number of subjects per group, age, cognitive scores, and follow-up duration. Additionally, it is unclear if the training, validation, and test splits are representative of the various diagnostic groups.
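
    The tissue-specific evaluation suggested in point 2 could be sketched as a PSNR computed only over the voxels of one tissue mask. This is an illustrative sketch rather than part of the paper's evaluation code: `masked_psnr`, the flat-list image representation, the example masks, and `max_value=1.0` are assumptions.

```python
import math
from typing import Sequence


def masked_psnr(real: Sequence[float], pred: Sequence[float],
                mask: Sequence[bool], max_value: float = 1.0) -> float:
    """PSNR in dB restricted to the voxels where mask is True."""
    squared_errors = [(r - p) ** 2 for r, p, m in zip(real, pred, mask) if m]
    if not squared_errors:
        raise ValueError("mask selects no voxels")
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0.0:
        return math.inf  # prediction is identical to the target inside the mask
    return 10.0 * math.log10(max_value ** 2 / mse)


real = [0.2, 0.4, 0.6, 0.8]
pred = [0.2, 0.5, 0.6, 0.4]
gm_mask = [True, True, False, False]   # hypothetical gray-matter voxels
wm_mask = [False, False, True, True]   # hypothetical white-matter voxels
gm_psnr = masked_psnr(real, pred, gm_mask)
wm_psnr = masked_psnr(real, pred, wm_mask)
```

    Reporting one value per tissue in this way exposes regional differences that a single whole-image score averages out.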

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The methodology is novel and interesting, but the evaluation needs to be improved to support the claims.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of my comments. This paper is methodologically sound and will be of interest to the community. Clinical evaluation in the journal paper as follow-up work is appropriate.



Review #3

  • Please describe the contribution of the paper

    The authors propose the Temporally-Aware Diffusion Model (TADM), a diffusion model based on DDPM, with the objective of generating residual maps for future MRIs. TADM is constrained by three conditions: i) age differences; ii) baseline age; iii) baseline feature (extracted using a pre-trained encoder based on RRDB [15]). Age differences are explicitly constrained using loss based on the output of a pre-trained Brain-Age Estimator (BAE) trained on the same dataset as DDPM. The authors validated the proposed method using SSIM, PSNR, and region size errors on three brain regions and the whole brain, outperforming three tested methods from the literature. Additional analysis demonstrates the utility of the three conditions and the integration of BAE.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Flexibility of included conditions: The framework can incorporate arbitrary relevant conditions to guide DDPM learning.
    • Sufficient validation: The authors compared the proposed method with three other methods using image-level similarity metrics (SSIM and PSNR) and volumetric metrics (region size error). An ablation study was also conducted to assess each element of the method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lack of mention of reproducibility: The authors did not mention releasing the source code.
    • Minor issues: DML is not defined on page 6.
    • Limited to 2D: The method is currently restricted to 2D, although the authors suggest it is easily extendable.
    • Limited discussion of separate training: Given the involvement of three models (Encoder, BAE, DDPM), it would be beneficial to explain the rationale behind separate training and explore the potential for an end-to-end approach.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    They utilized a publicly available dataset; however, they didn’t mention releasing the code. Moreover, insufficient information is provided for others to reproduce the method, such as the steps for implementing DDPM.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    ● Summary: “The authors propose the Temporally-Aware Diffusion Model (TADM), a diffusion model based on DDPM, with the objective of generating residual maps for future MRIs. TADM is constrained by three conditions: i) age differences; ii) baseline age; iii) baseline feature (extracted using a pre-trained encoder based on RRDB [15]). Age differences are explicitly constrained using loss based on the output of a pre-trained Brain-Age Estimator (BAE) trained on the same dataset as DDPM. The authors validated the proposed method using SSIM, PSNR, and region size errors on three brain regions and the whole brain, outperforming three tested methods from the literature. Additional analysis demonstrates the utility of the three conditions and the integration of BAE.”

    ● Strength: “- Flexibility of included conditions: The framework can incorporate arbitrary relevant conditions to guide DDPM learning. - Sufficient validation: The authors compared the proposed method with three other methods using image-level similarity metrics (SSIM and PSNR) and volumetric metrics (region size error). An ablation study was also conducted to assess each element of the method.”

    ● Weaknesses: “- Lack of mention of reproducibility: The authors did not mention releasing the source code. - Minor issues: DML is not defined on page 6. - Limited to 2D: The method is currently restricted to 2D, although the authors suggest it is easily extendable. - Limited discussion of separate training: Given the involvement of three models (Encoder, BAE, DDPM), it would be beneficial to explain the rationale behind separate training and explore the potential for an end-to-end approach.”

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    They have introduced a compelling diffusion-based approach for synthesizing future MRIs, incorporating individualization through baseline MRI representation condition. Additionally, they integrated BAE to explicitly minimize age gap differences in loss. Their validation process also involved comparisons with recently developed methods.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their useful comments. We are glad to see that all the reviewers highlighted the novelty of our method. However, we respectfully disagree with R4, who criticized the reproducibility and the lack of clinical usability. We provide below a point-by-point response to the main concerns.

R3, R4: [“The authors did not mention releasing the source code” and “The description of reproducibility is poor”] We respectfully disagree with this, as we provided all the required details in the paper and, as specified in the introduction, we will publicly release the code upon acceptance.

R1, R4: [Details on the training of BAE and the OASIS dataset] We believe R1 misunderstood the acronym BAE, which stands for Brain-Age Estimator, as introduced in the abstract. The BAE was introduced in previous works [17], and the training details can be found there. Regarding the OASIS dataset, some statistics are already specified in Section 4. However, we will add all the requested data in the revised version of the paper.

R1, R4: [“lack of evidence supporting the claim that the proposed pipeline aids in early disease detection and clinical intervention” and clinical usability] The clinical usability of brain progression simulators is well known in the literature. They can be used to recover missing images in longitudinal data, as a potential virtual placebo, or for patient stratification [11,13,18]. However, their clinical validation is difficult to evaluate and is outside the scope of our work. For these reasons, we will replace this claim with a discussion of the possible clinical applications.

R1.1: [Computing SSIM and PSNR metrics for distinct brain tissues] Thank you for your suggestion. Although we believe this can improve the evaluation, we highlight that we used metrics that are the standard evaluation protocol for this task [13,16].

R1.2: [Evaluate across different groups] Although the suggested evaluation provides insight into clinical usability, we believe it is more suitable for a journal paper.

R1.3: [Suboptimal performance in posterior brain regions and evaluation with average error maps] We agree with the comment, but we point out that our method performs better in the rest of the brain and in all the other cases. We understand the difficulty of evaluating a single image result; for this reason, as you suggested, we have computed the average error maps and will insert them in the camera-ready version to show the average overall performance.
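
An average error map of the kind discussed here could be computed as the voxel-wise mean absolute difference across subjects, assuming all scans are already aligned to a common space (MNI, as the reviewer notes). The following is a minimal sketch with hypothetical names (`average_error_map`, `Image2D`), using nested lists in place of image arrays; it is not the authors' code.

```python
from typing import List

Image2D = List[List[float]]


def average_error_map(real_scans: List[Image2D],
                      pred_scans: List[Image2D]) -> Image2D:
    """Voxel-wise mean absolute error over subjects; scans must share one shape."""
    if not real_scans or len(real_scans) != len(pred_scans):
        raise ValueError("need the same non-zero number of real and predicted scans")
    rows, cols = len(real_scans[0]), len(real_scans[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for real, pred in zip(real_scans, pred_scans):
        for i in range(rows):
            for j in range(cols):
                total[i][j] += abs(real[i][j] - pred[i][j])
    n = len(real_scans)
    return [[value / n for value in row] for row in total]
```

Computing one such map per method on the same test subjects would allow a side-by-side comparison of where each method tends to err.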

R3.1: [Explore end-to-end training] Thank you for the interesting suggestion. We will explore the potential of an end-to-end training in the journal version of the work.

R4.1: [Contributions not emphasized] Please see the last paragraph of the introduction as we have provided a detailed list of contributions there.

R4.2: [“it is unclear if the training, validation, and test splits are representative of the various diagnostic groups”] Yes, they are representative of the diagnostic groups. The dataset was carefully split to ensure that each diagnostic group was proportionately represented in the training, validation, and test sets.
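
A split in which each diagnostic group is proportionately represented can be sketched as a stratified, subject-level split. The snippet below is a generic illustration and not the authors' procedure: the function name `stratified_split`, the 80/10/10 ratios, the fixed seed, and the group labels are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(subject_groups: Dict[str, str],
                     ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                     seed: int = 0) -> Dict[str, List[str]]:
    """Split subject IDs into train/val/test, stratifying by diagnostic group."""
    rng = random.Random(seed)
    by_group: Dict[str, List[str]] = defaultdict(list)
    for subject, group in sorted(subject_groups.items()):
        by_group[group].append(subject)
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for group in sorted(by_group):
        subjects = by_group[group]
        rng.shuffle(subjects)  # random order within each diagnostic group
        n_train = round(ratios[0] * len(subjects))
        n_val = round(ratios[1] * len(subjects))
        splits["train"] += subjects[:n_train]
        splits["val"] += subjects[n_train:n_train + n_val]
        splits["test"] += subjects[n_train + n_val:]  # remainder goes to test
    return splits
```

Splitting by subject rather than by scan keeps all scans of one person in the same partition, which avoids leakage between training and test data in longitudinal datasets.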

R4.3: [“This model could be dependent on the quality and sophistication of the pretrained encoder and the correctness of patient-specific data”] We agree with the reviewer. However, we would like to point out that this should not be considered a weakness of our method since it is a common characteristic of any deep learning model, as explored by previous literature. 

R4.4: [Individualisation of predictions] The individualization of predictions is achieved by adding patient-specific data, such as the patient’s cognitive status and age at baseline.

[17] Jónsson et al. “Brain age prediction using deep learning uncovers associated sequence variants”. Nature Communications, 2019.
[18] Young et al. “Data-driven modelling of neurodegenerative disease progression: thinking outside the black box”. Nature Reviews Neuroscience, 2024.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors’ rebuttal has comprehensively addressed the reviewers’ concerns within the limited space. The authors’ rebuttal to the negative score from the reviewer is reasonable.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors’ rebuttal has comprehensively addressed the reviewers’ concerns within the limited space. The authors’ rebuttal to the negative score from the reviewer is reasonable.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


