Abstract

By exploiting longitudinal pairs of brain 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) images, disease progression characterized by a generative model can assist baseline-visit prediction of early Alzheimer's disease. However, most existing methods diagnose disease from single-timepoint scans or a simple stacking of sequential images, which ignores disease progression and does not reflect actual clinical scenarios. Moreover, decoupling low-level disease representations is challenging because normal aging and neurodegeneration produce similar changes. In this paper, we propose a classifier-induced generative model that generates the next-timepoint brain image. We then design a statistical-prior-knowledge vision transformer to extract features from the generated next-timepoint images for disease diagnosis. The main contribution is a disease progression model that effectively improves diagnostic performance from single-timepoint images. We also provide pixel-level disease representations for explanation. Experiments on the ADNI dataset demonstrate that our method outperforms other state-of-the-art techniques.
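
As a rough illustration of the two-stage idea in the abstract (generate the next-timepoint scan, then classify it), the following PyTorch-style sketch couples a generator with a classifier whose loss is back-propagated through the generated image. All modules here are minimal hypothetical stand-ins, not the authors' architecture.

    # Sketch only: hypothetical stand-ins for the paper's generator and
    # statistical-prior vision transformer, not the authors' code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Generator(nn.Module):
        """Predicts the next-timepoint PET volume from a baseline scan."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(      # placeholder 3D encoder-decoder
                nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                nn.Conv3d(8, 1, 3, padding=1))

        def forward(self, baseline):
            return self.net(baseline)

    class Classifier(nn.Module):
        """Diagnoses from the generated follow-up image (ViT stand-in)."""
        def __init__(self):
            super().__init__()
            self.head = nn.Sequential(nn.AdaptiveAvgPool3d(1),
                                      nn.Flatten(), nn.Linear(1, 2))

        def forward(self, x):
            return self.head(x)

    gen, clf = Generator(), Classifier()
    baseline = torch.randn(2, 1, 76, 94, 76)   # baseline PET volumes
    followup = torch.randn(2, 1, 76, 94, 76)   # paired 36-month scans
    labels = torch.tensor([0, 1])              # CN vs. pMCI

    pred = gen(baseline)
    # Classifier-induced training: reconstruction (L1) loss plus a
    # diagnostic loss that flows back through the generator.
    loss = F.l1_loss(pred, followup) + F.cross_entropy(clf(pred), labels)
    loss.backward()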

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2728_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{GaoXin_ProgressionAware_MICCAI2025,
        author = { Gao, Xingyu and Guo, Runyi and Zhao, Zhe},
        title = { { Progression-Aware Generative Model Enhancing Baseline Visit Prediction of Early Alzheimer’s Disease } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {498--507}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a prior-knowledge vision transformer to learn longitudinal progression patterns from FDG-PET and use them to synthesize future timepoints, enabling classification tasks from baseline data based on the predicted future timepoint. The performance of the method is compared against existing techniques for classifying disease stages.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The major strength of the paper is the development of a longitudinal model that can be used for classification from a single timepoint and evaluation against alternative methods.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The main weaknesses of this paper are the lack of a completely held-out test set, the evaluation on just one task, and the very limited information about how the alternative methods were implemented.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach appears to be interesting but the evaluation was quite limited.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The paper should be accepted as the method is novel and interesting and the authors answered most of the reviewer concerns in the rebuttal.

    In future work it would be interesting to explore R2 further by testing the performance of the first approach in Table 2 on synthetic data. For R8, further experiments could include CN-CN vs. MCI-AD classification over time windows other than 3 years (e.g. 1, 2, 4, 5 years), and a simple demonstration that the model can also perform well on the simpler task of CN-CN vs. AD-AD classification. A held-out test set, e.g. from a newer phase of ADNI (ADNI3 or ADNI4), would also be beneficial to demonstrate generalisability, as would testing on an external dataset.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a classifier-induced generative model that synthesizes future brain PET scans to better capture Alzheimer's disease (AD) progression. By incorporating a vision transformer with statistical priors, the method extracts features from generated future images to enhance AD diagnosis. Results suggest that image generation benefits the differentiation between patients and healthy subjects (above baseline scans), offering an interesting introduction of the time domain into disease prediction.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The topic is highly clinically relevant, addressing a critical need for early diagnosis and treatment planning in Alzheimer’s Disease (AD) patients.
    • The study uses the well-validated ADNI dataset for both training and testing.
    • The integration of temporal domain into image generation for diagnostic prediction is both innovative and crucial for progressive diseases, and it could be expanded to other conditions (e.g., MS, ALS) or even neurodevelopmental outcomes in early life, incorporating brain maturation trajectories. It would improve the impact of the study if the authors discussed these possibilities in more detail and explored potential applications in other areas.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Given the focus on incorporating typical aging from disease progression, it would be helpful to better understand the age distribution across both typical and patient subjects. Are there significant age-related differences, and is image generation consistent across these groups, especially considering different ages at the time of scanning?
    • In Section 4.3: "These methods are trained and tested using the baseline visit images. Specifically, the first approach is trained and tested on original images, while the last three are trained and tested on synthetically generated 36-month images from original images." The first and second sentences appear to contradict each other. Does this imply that the first row in Table 2 is trained on original images, while the others use synthetic 36-month images? If so, this distinction should be clarified in the table. Moreover, why not test the first model with synthetic images as well? Without this, the claim that "methods trained on generated 36-month images perform better than those trained on original images" is misleading, as it only shows that the three models outperform Ashtari-Majla et al., 2022.
    • While the introduction is thorough, the paper would benefit from a more balanced discussion, particularly in the results and limitations sections. The discussion of the results should delve deeper into failure cases and diagnostic uncertainty. How does the model perform with borderline or low-quality images, and are there biases in misclassifications? Additionally, how does image generation quality impact diagnostic performance?
    • Using a single dataset—especially one as commonly used as ADNI—raises questions about the generalizability of the proposed method.
    • The manuscript would benefit from another round of proofreading to address some minor grammatical issues.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Incorporating a generative module to model disease prediction for enhancing AD diagnosis is both innovative and promising. The presentation would be strengthened by more extensive results and discussion, as well as validation on additional datasets beyond ADNI.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The proposed method looks promising and would be interesting to the audience.



Review #3

  • Please describe the contribution of the paper

    The main contribution of the paper is the development of a classifier-induced generative model for predicting the next-timepoint brain PET image in early Alzheimer’s Disease, based on a baseline scan. This predicted progression (generated image) is then used by a statistical prior knowledge vision transformer to improve disease diagnosis. Additionally, it provides pixel-level disease representations, offering interpretability.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method introduces a novel approach by generating the next-timepoint brain PET image from a single baseline scan, enabling the modeling of disease progression without requiring longitudinal input at inference time. This aligns closely with clinical practice, where only baseline scans are often available. The use of a classifier to guide the generative model ensures that the generated images are not only realistic but also informative for diagnosis. Incorporating prior statistical knowledge into the Vision Transformer architecture is a novel design that enhances feature extraction from the generated images, potentially improving generalization and interpretability. The model provides pixel-level outputs that highlight disease-relevant regions, contributing to interpretability and aligning with the growing demand for explainable AI in medical imaging.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The study relies solely on the ADNI dataset without any mention of external clinical datasets, expert evaluation, or potential integration into real clinical workflows. This limits the assessment of its clinical feasibility and robustness in real-world settings. There is no analysis of how the model performs across different disease stages, scanner types, or imaging protocols; without such tests, claims of generalization should be made with caution. The paper does not include dedicated evaluation metrics for the quality and realism of the generated next-timepoint PET images (e.g., SSIM, PSNR). The contribution of each component (e.g., the generative model, classifier guidance, the statistical prior in the transformer) is not quantified through ablation studies, which limits understanding of what drives the performance gains.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The paper is well written, well organized and presents an interesting approach to modeling Alzheimer’s disease progression by generating future PET images from a single baseline scan. The integration with a classifier and Vision Transformer is promising and shows potential. However, in my opinion, there are some aspects that need further clarifications and refinements.

    • The title is somewhat confusing and could be clarified to better reflect the core contribution of the paper
    • It is unclear what the original resolution of the PET images is, but the input size used in the model was heavily downsampled (76x94x76). The authors should clarify the original image dimensions and justify the extent of downsampling, as this may lead to loss of relevant clinical detail and affect the model’s diagnostic performance.
    • The authors don’t specify how the dataset was split between training and test sets. This information is essential to assess the reliability and generalizability of the reported results.
    • In the classification task, the negative class (CN) has significantly fewer samples (88) than the positive class (pMCI) (192), yet this class imbalance is not addressed or discussed. This imbalance can strongly affect performance metrics such as accuracy. The authors should report class distribution explicitly and clarify whether any strategies (e.g., class weighting, resampling) were used to mitigate its impact.
    • In Section 4.2, when referring to “From the first two results” in Fig 3, the authors should be more explicit about which cases they are referring to (128 and 072 or 128 and 037?)
    • Drawing conclusions such as “the residual images from each patient are similar” based solely on visual inspection should be avoided. Visual analysis can be subjective and insufficient to support such claims.
    • In Section 4.3, it is unclear why the first approach is trained and tested on different data compared to the other three methods.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a promising approach for early Alzheimer’s disease diagnosis by integrating a generative model with a classifier and a Vision Transformer, using single-timepoint PET scans to model disease progression. The method is well-motivated and the results on the ADNI dataset are competitive. However, there are some issues that limit the strength of the contribution, including the exclusive use of heavily downsampled data, lack of detail on data splits and class imbalance. Despite these weaknesses, the proposed framework introduces a novel integration of components and shows potential. With clearer experimental design and more rigorous evaluation, it could be a valuable contribution to the field.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

R1: Are there significant age-related differences, and is image generation consistent across these groups, especially considering different ages at the time of scanning? As shown in Fig. 3, the residuals between ID 128_S_0227 (80 years old) and ID 072_S_1211 (87 years old) are different. The residuals between ID 037_S_0327 (70 years old) and ID 035_S_0048 (78 years old) are also different. In general, residuals differ even within the same age cohort. Image generation is quantified by the L1 (Eq. 3) and SSIM (Eq. 4) metrics between longitudinal images, and generation consistency is qualitatively evaluated by comparing real and generated residuals.
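
A minimal sketch of those two checks, using scikit-image for SSIM (matching these calls to the paper's Eq. 3 and Eq. 4 is an assumption, and the arrays below are random placeholders):

    # L1 and SSIM between a real and a generated follow-up volume.
    import numpy as np
    from skimage.metrics import structural_similarity

    real = np.random.rand(76, 94, 76).astype(np.float32)       # real 36-month scan
    generated = np.random.rand(76, 94, 76).astype(np.float32)  # generated scan

    l1 = np.mean(np.abs(real - generated))    # mean absolute error (L1)
    ssim = structural_similarity(real, generated,
                                 data_range=float(real.max() - real.min()))
    print(f"L1 = {l1:.4f}, SSIM = {ssim:.4f}")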

R2: Is the first row in Table 2 trained on original images, while the others use synthetic 36-month images? Yes. Our initial idea was to compare our methods (the last two models in Table 2) with one competitive method on original images and another on synthetic 36-month images. This is unfair to the first method; its results on synthetic images should be added.

R3: How does the model perform with borderline or low-quality images, and are there biases in misclassifications? And how does image generation quality impact diagnostic performance? Compared with borderline or low-quality PET images, consistency of image quality is the major factor affecting model performance. For paired longitudinal images, variations in image quality degrade the target image representation and lead to misclassification. If all images are of uniformly low (or high) quality, the performance of the generative model is not affected.

R4: What is the original resolution of the PET images? Does down-sampling affect the model's performance? The original resolution is 182x218x182. In our experience, this down-sampling operation accelerates model training without compromising performance.
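
For reference, one way such a down-sampling could be performed (the authors do not state their exact resampling method, so the scipy trilinear interpolation here is an assumption):

    # Down-sample a 182x218x182 volume to 76x94x76.
    import numpy as np
    from scipy.ndimage import zoom

    vol = np.random.rand(182, 218, 182).astype(np.float32)  # original resolution
    target = (76, 94, 76)
    factors = [t / s for t, s in zip(target, vol.shape)]
    small = zoom(vol, factors, order=1)   # order=1: trilinear interpolation
    print(small.shape)                    # (76, 94, 76)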

R5: How was the dataset split between training and test sets? 5-fold cross-validation was used to evaluate our method. We collected all subjects with CN-to-CN and MCI-to-AD trajectories over 36 months from the ADNI dataset.
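
An illustrative subject-level 5-fold split (a scikit-learn sketch; the authors' exact splitting code is not given, and the class counts below are taken from Review #3):

    # Stratified 5-fold split over the 88 CN-to-CN and 192 MCI-to-AD subjects.
    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    labels = np.array([0] * 88 + [1] * 192)   # 0: CN-to-CN, 1: MCI-to-AD
    subjects = np.arange(len(labels))

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for fold, (train_idx, test_idx) in enumerate(skf.split(subjects, labels)):
        print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")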

R6: How is the class imbalance of the training and test sets handled? As stated in Section 3, we randomly selected an equal number of samples from the two classes for the test set. For the training set, we adopted a resampling strategy on the CN samples to maintain class balance.
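
One way such balancing could look in code (a sketch under assumed counts; the test-set size of 20 per class is hypothetical, not from the paper):

    # Equal-sized test classes, then oversample CN in the training set.
    import numpy as np

    rng = np.random.default_rng(0)
    cn_idx = np.arange(88)            # minority class (CN)
    pmci_idx = np.arange(88, 280)     # majority class (pMCI)

    # Test set: equal random draws from both classes (size is hypothetical).
    test_cn = rng.choice(cn_idx, 20, replace=False)
    test_pmci = rng.choice(pmci_idx, 20, replace=False)

    # Training set: resample remaining CN subjects to match the pMCI count.
    train_cn = np.setdiff1d(cn_idx, test_cn)
    train_pmci = np.setdiff1d(pmci_idx, test_pmci)
    train_cn_balanced = rng.choice(train_cn, len(train_pmci), replace=True)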

R7: "From the first two results" in Fig. 3 should be more explicit. The first two results refer to the subjects with IDs 128_S_0227 and 072_S_1211, which will be clarified in the manuscript.

R8: This paper suffers from two limitations: the lack of a completely held-out test set and the evaluation on just one task. This review is crucial and gets straight to the point. We worked hard to collect all available paired PET images from the ADNI dataset. There are two reasons for not including additional data or tasks: 1) longitudinal MRI scans suffer from poor image consistency, which has not yet been resolved; 2) the CN-to-MCI and CN-to-AD groups contain few paired samples. Our model and paradigm will be validated on other diseases in future work.

R9: The title is somewhat confusing and could be clarified to better reflect the core contribution of the paper. The core contribution is the use of longitudinal paired images to train a generative model that represents disease progression. We then apply this model to improve prediction from the cross-sectional baseline-visit image.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


