Abstract

Longitudinal medical records offer crucial insights into disease progression, including structural changes and dynamic evolution, essential for clinicians in treatment planning. However, existing disease forecasting methods are hindered by irregular data collection intervals, neglect of inter-patient relationships, and a lack of case-reference capabilities. We introduce tHPM-LDM, a glaucoma forecasting framework leveraging continuous-time attention within a historical condition module to capture disease progression from irregularly acquired records. Notably, our approach integrates population memory, enabling personalized forecasting through relevant population patterns. Empirical evaluations on the SIGF glaucoma longitudinal dataset demonstrate the significant improvements of our approach in image prediction and category consistency compared to state-of-the-art methods. Furthermore, our approach provides interpretable individual-population patterns and showcases robust performance despite missing visits.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1410_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/yhf42/tHPM-LDM

Link to the Dataset(s)

https://github.com/XiaofeiWang2018/DeepGF

BibTex

@InProceedings{FanYuh_tHPMLDM_MICCAI2025,
        author = { Fan, Yuheng and Xie, Jianyang and Luo, Yimin and Meng, Yanda and Madhusudhan, Savita and Lip, Gregory Y.H. and Cheng, Li and Zheng, Yalin and Zhao, He},
        title = { { tHPM-LDM: Integrating Individual Historical Record with Population Memory in Latent Diffusion-based Glaucoma Forecasting } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {622--632}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work introduces a diffusion model-based framework for sequential fundus image generation and glaucoma forecasting. The authors propose two novel modules: the Continuous-Time Multi-Scale Historical Feature module (t-MSHF) and the Population Memory Query Module (PMQM). The t-MSHF incorporates an ODE solver for continuous temporal modeling, while the PMQM builds a memory pool to store evolution patterns from the training set, enabling case retrieval during the testing phase. All experiments are conducted on the SIGF database, and the model’s performance is evaluated on both image prediction and category forecasting tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Glaucoma forecasting is a critical topic in disease prognosis. Decoupling future state generation from prediction enhances the interpretability of this task.
    2. Incorporating ODEs into the feature representation to transform discrete time points into a continuous time range is an interesting approach for modeling longitudinal fundus images.
    3. The authors perform a detailed ablation study on the proposed modules, particularly focusing on case studies of memory retrieval results, which emphasize the effectiveness of the PMQM.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. My primary concern lies in the authorization and accessibility of the SIGF database. According to the latest information provided by the authors of DeepGF, which is the first work utilizing the SIGF dataset for glaucoma forecasting, the dataset is currently undergoing an ethics review and is not publicly available. I am curious whether this database has since become publicly accessible. (https://github.com/XiaofeiWang2018/DeepGF)
    2. My second question pertains to the input sequence embedding, which includes sequential images, time information, and corresponding labels. Including ground truth labels for each accessible visit introduces strong prior knowledge, which may significantly impact the predictive model’s performance. Notably, prior work like C2FLDM did not use such labels as input. To ensure a fair comparison, I suggest either removing this component or adding single-visit labels during the reimplementation of C2FLDM for consistency.
    3. The quantitative results reported in Table 2 differ significantly from those presented in the C2FLDM paper. Specifically, the AUC for C2FLDM in the original paper is 95.5, whereas it is reported as 82.06 in this paper. Given that both studies utilize the same dataset and split, such a large discrepancy appears unusual and warrants further investigation.
    4. The proposed tHPM-LDM framework requires capturing temporal variations across sequential fundus images. However, based on the observations in Fig. 3(a), the center of the optic disc varies significantly across visits, showing noticeable displacement. I am curious whether the authors applied preprocessing steps like rigid registration to align the images in the sequence. Directly inputting unregistered images may introduce noise that adversely impacts the image generation process.
    5. My final concern pertains to the imbalanced distribution of the SIGF dataset. Did the authors implement any specific strategies, such as weighted loss functions, balanced sampling, or data augmentation, to address this issue during the training stage?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The primary concern is the availability of the SIGF database. Additionally, some results presented in this paper differ significantly from those reported in the referenced paper. However, the authors have proposed a promising solution for the glaucoma forecasting task. Addressing the aforementioned concerns would further enhance the quality and transparency of this work.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed my concerns.



Review #2

  • Please describe the contribution of the paper

    This paper aims to enhance glaucoma forecasting by integrating individual historical medical records with broader population data. It addresses the challenges of irregular data collection inherent in longitudinal studies, which complicate the modeling of disease progression. While traditional methods have made strides, they often do not fully exploit longitudinal data or accurately predict visual outcomes. The proposed tHPM-LDM employs a continuous-time attention mechanism to capture the dynamic evolution of glaucoma from these irregular records, improving personalized forecasting and image prediction accuracy. By referring to population patterns during forecasts, the framework allows for more individualized forecasts. Evaluations using the SIGF glaucoma dataset show significant improvements over existing methods in image prediction and consistency across categories. Furthermore, the model demonstrates robustness against missing data and provides interpretable insights into disease patterns at both individual and population levels. The paper emphasizes the importance of incorporating historical and population information in forecasting models to enhance the early detection and treatment of glaucoma.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1: Utilizes continuous-time attention mechanisms to address irregular data collection issues inherent in longitudinal studies. 2: Effectively integrates individual historical data with broader population memory, allowing for more personalized predictions.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1: The SIGF dataset is too small and the image data is imbalanced, with only 30 glaucoma patients. Evaluating the model on a larger and more balanced dataset would more effectively demonstrate its capabilities. 2: The C2FLDM achieved 49.86% accuracy in experiments, significantly lower than the 94.4±0.5% reported in the original paper. What accounts for this discrepancy in accuracy, and how might the calculation of accuracy differ between the two studies?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a well-written paper. Could you please provide further details regarding the time embedding technique employed and the specifics of the training process? Additionally, what GPU was utilized for training, and what was the approximate total GPU hours required for convergence?

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Accept



Review #3

  • Please describe the contribution of the paper

    This paper introduced tHPM-LDM, a conditional latent diffusion model that can encode longitudinal glaucoma images and generate future glaucoma images. The authors proposed two new modules, t-MSHF and PMQM, to aid information fusion. The t-MSHF module is a transformer that encodes multi-scale features from multi-visit fundus images. The PMQM module is a group memory pool, trained with SwAV objectives, that encodes population conditions.

    The authors demonstrated promising experimental results in using the trained model for image synthesis and glaucoma forecasting. The authors also conducted ablation studies covering the choice of the population pool, the effectiveness of each module, and the model's robustness to missing visits.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper is well written and self-contained. The figure demonstration looks intuitive and easy to understand.
    2. The proposed methodology looks interesting and novel - the way the authors designed to encode time information is inspiring.
    3. The experimental results look promising. The ablation studies reasonably validated the effectiveness of the proposed design.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are also some limitations of this paper:

    1. Although the authors claim they will release the code and checkpoints upon acceptance, this remains a slight concern.
    2. The Conv & Norm module in PMQM, as far as I understand, was not covered in detail in the paper. The authors should clarify this.
    3. It would be great if the authors could provide more details about the training procedure.
    4. I am curious: is the cross-attention in PMQM only a one-layer design? Did the authors try increasing the number of parameters here?
    5. The authors should discuss the limitations of their model.
    6. Strictly speaking, the irregularity hidden in the data is not yet solved: the model embeds time-varying trajectories into a unified step-wise representation space. The authors might discuss this and propose how the learnt representation could be further quantified.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    I think the authors could make an effort to publish an anonymous link to provide more evidence for the work.

    I would also like to hear the authors' thoughts on the limitations of this work.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is easy to read, the idea is interesting, the results look promising, the ablation studies looks reasonable, and the conclusion is self-contained.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed most of my concerns. I am glad to see the release of the related / code / checkpoint, and potentially the release of the dataset upon request.




Author Feedback

We sincerely appreciate the reviewers and ACs. Responses are as follows (R: Reviewer, Q: Question):

Data concerns: [Availability R3Q1] We obtained the SIGF dataset before the ethics review, and SIGF is currently available again upon request. [Imbalance R1Q1, R3Q5] For the imbalance of SIGF, we followed the conventional SIGF processing method of C2FLDM (MICCAI 2024) and GLIM-Net (TMI 2023), which augments records into clips. Our classifier is trained with image augmentations (rotation, blurring, etc.) and a balanced BCE loss (ratio = 0.64:0.36). These measures partially mitigate the imbalance concern for generation and classification. Future work will focus on evaluation using large-scale datasets. [Registration R3Q4] We recognize the value of registration and will incorporate it in future work. Currently, we follow the same approach as C2FLDM, using the cropped images provided by SIGF, which are primarily aligned with the OC/OD regions and achieve satisfactory quality.
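The balanced BCE loss mentioned in the rebuttal can be sketched as a class-weighted binary cross-entropy. This is a minimal illustration, not the authors' implementation: the 0.64:0.36 ratio comes from the rebuttal, but which class receives which weight, and the function/argument names (`balanced_bce`, `w_pos`, `w_neg`), are assumptions.

```python
import math

def balanced_bce(y_true, y_prob, w_pos=0.64, w_neg=0.36, eps=1e-7):
    """Class-weighted binary cross-entropy (illustrative sketch).

    The positive/negative weights mirror the 0.64:0.36 ratio from the
    rebuttal; the assignment of weights to classes is an assumption.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        # weight the positive and negative log-likelihood terms separately
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

Down-weighting the majority-class term this way keeps gradients from being dominated by the more frequent (non-progressing) class.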

Results: [Discrepancy R3Q3, R1Q2] The discrepancy comes from the different classifier settings of the two papers and from our corrections of issues in the reimplementation. 1) Our category forecasting is based on the generated images, whereas C2FLDM predicts from features generated by its diffusion model. Beyond this input difference, the structure and training of the classifiers may vary, which also accounts for the discrepancy. We also report our classifier on real test images, which serves as a shared upper-bound performance for all compared methods. 2) Moreover, the source code of C2FLDM may leak the “encoded target image” (not available at inference) during the fine stage (openaimodel1.py line 779). We replaced the “encoded target image” with the “coarse prediction” as described in their paper. This setting may also enlarge the discrepancy in our reimplementation. [Input label R3Q2] We note that the C2FLDM code also uses the labels in the coarse stage (train_vqldm.py line 238), and our reimplementation keeps this setting; thus, the comparison is fair. The variant of our approach without labels as input also achieves a notable result (AUC 81.34, PSNR 18.97).

Implementation Details: We apologize for not providing more details. Our code will provide more information; we have sent the anonymous link to the ACs but cannot share it here due to the Rebuttal Guide. [R1, R2Q3] Experiments were conducted on an NVIDIA A4500 Ada 24G GPU. The LDM was optimized by AdamW with batch size 20 and initial learning rate 5e-5. Training time: LDM 24.98 h (700 epochs), VQGAN 21.80 h (300 epochs), classifier 2.01 h (300 epochs). [R1] We add sinusoidal encodings of visit times to the image patches. [R2Q2] The purpose of Conv & Norm in PMQM is to project sub-embeddings of shape T×(N/2)×d_m into two 1×d_m vectors and ensure comparability, so we do not focus on its design. We first reduce the (N/2)-dim using 3 layers of Conv1d with kernel sizes of 5, 3, and 1, then use learnable weights to sum over the T-dim, and finally apply L2 normalization. [R2Q4] Since the cross-attention in PMQM primarily focuses on retrieving individual-related population memory, we adopted only a one-layer design, though exploring its scaling capacity on large-scale datasets would be interesting.
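The Conv & Norm pipeline described above (three Conv1d layers over the (N/2)-dim, a learnable weighted sum over the T-dim, then L2 normalization) can be sketched as follows. This is a shape-level sketch under stated assumptions, not the authors' code: padding, activations, the mean-pooling that collapses the residual (N/2)-dim, and the class name `ConvNorm` are all guesses; the rebuttal specifies only the kernel sizes, the weighted T-sum, and the final normalization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNorm(nn.Module):
    """Illustrative sketch of PMQM's Conv & Norm projection.

    Maps one sub-embedding half of shape (T, N/2, d_m) to a single
    L2-normalized d_m vector. Strides/padding/pooling are assumptions.
    """
    def __init__(self, d_m, T):
        super().__init__()
        # three Conv1d layers with kernel sizes 5, 3, 1 (per the rebuttal);
        # 'same' padding is an assumption
        self.convs = nn.Sequential(
            nn.Conv1d(d_m, d_m, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(d_m, d_m, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_m, d_m, kernel_size=1),
        )
        # learnable weights for summing over the T-dim
        self.t_weights = nn.Parameter(torch.ones(T) / T)

    def forward(self, x):           # x: (T, N/2, d_m)
        x = x.permute(0, 2, 1)      # (T, d_m, N/2) for Conv1d over N/2
        x = self.convs(x)           # (T, d_m, N/2)
        x = x.mean(dim=-1)          # collapse the (N/2)-dim (assumed pooling)
        x = (self.t_weights[:, None] * x).sum(dim=0)  # weighted sum over T
        return F.normalize(x, dim=0)                  # L2 normalization
```

Running each half of the token sequence through such a module would yield the two comparable 1×d_m vectors the rebuttal describes.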

[Limitation R2Q5] 1) Our work is primarily limited by the lack of larger-scale data validation. 2) We currently focus only on a single disease and have not yet addressed multi-disease cases. 3) The current category prediction results are limited by the pre-trained image classifier. [Discussion R2Q6] To address data irregularity, we compute the dynamic attention via continuous functions fitted from discretely observed features. While this approach may not be the optimal way to resolve the challenge, it offers a new technical perspective compared with previous methods, such as generative longitudinal completion (e.g., LDGAN, MLMI 2020) and Transformer-based modeling (e.g., GLIM-Net, TMI 2023). One possible qualitative evaluation of its effectiveness is visualizing the fitted results in the feature space.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal has convinced the one reviewer who rejected the paper in the first round.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


