Abstract

Accurately predicting visual field progression is critical for early intervention and personalized treatment of glaucoma. However, existing methods struggle with both predictive accuracy and reliable uncertainty quantification. This paper introduces a framework that leverages diffusion models and conformal risk control to generate robust and interpretable forecasts of visual field deterioration. We first train a diffusion model to predict future visual fields based on a patient’s past examinations. To ensure trustworthy predictions, we design a novel archetypal-based conformal risk control method, which provides finite-sample coverage guarantees on intervals of archetypal contributions to the prediction uncertainty. This framework captures the underlying structures within the uncertainty, enabling clinicians to interpret a range of potential progression patterns rather than a single deterministic outcome. Experimental results illustrate that our method achieves the target archetypal contribution coverage while providing tighter prediction intervals than baselines. Visualizations show how archetypal visual field patterns contribute to prediction uncertainty, offering interpretable insights into disease progression. By combining diffusion models with conformal methods, our framework enhances the reliability of AI-assisted visual field forecasting, ultimately supporting improved clinical decision-making.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1065_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/averysi224/abci.git

Link to the Dataset(s)

N/A

BibTex

@InProceedings{SiWen_Reliable_MICCAI2025,
        author = { Si, Wenwen and Lin, Vivian and Sun, Bo and Jang, Kuk Jin and Xing, Rubo and Saeed, Almiqdad and Nagatani, Rina and Sokolsky, Oleg and Lee, Insup and Al-Aswad, Lama},
        title = { { Reliable and Interpretable Visual Field Progression Prediction with Diffusion Models and Conformal Risk Control } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {550 -- 559}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a diffusion model to predict visual field progression in glaucoma and reports good results.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel idea of using a generative model for visual field prediction.
    2. Good validation of the proposed method.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. How can you guarantee that the diffusion model has learned the distribution of the desired data, given that visual field patterns are hard to model, the underlying medical mechanisms are unclear, and the available data are limited in scale?
    2. A comparison with related papers, especially those using fundus images for glaucoma forecasting, e.g., ‘DeepGF: Glaucoma Forecast Using the Sequential Fundus Images’, would make the contribution and novelty clearer.
    3. An ablation study is needed.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See above for weakness clarification

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    According to their response to my major concern, the authors have limited understanding of the generation task itself. The problem is better suited to a deterministic prediction task using sequential input images than to a generative model, especially a diffusion model. Also, the lack of a proper literature review of medical applications that use sequential VF/fundus images for timely and early diagnosis makes the novelty unclear. From both perspectives, I maintain a clear reject.



Review #2

  • Please describe the contribution of the paper

    The authors propose ABCI, a novel uncertainty quantification framework for visual field prediction that integrates archetype-based modeling with conformal risk control. Unlike existing approaches, ABCI constructs archetypal-based intervals that simultaneously ensure the selection of clinically meaningful visual field loss patterns and the valid coverage of their contributions. This enables more interpretable and reasonable uncertainty quantification. The method is evaluated on two large-scale glaucoma VF datasets, demonstrating consistent coverage control and clinically relevant uncertainty representation across various settings.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clinical Significance: The use of recognized VF archetypes to represent uncertainty enhances the interpretability of the predicted outcomes. This approach has the potential to improve the practical utility of AI models in clinical decision-making, even though the manuscript does not explicitly discuss this benefit in detail.
    • Methodology: ABCI is grounded in conformal risk control, and its adaptation to archetypal representation is particularly suitable in the context of VF data, which inherently exhibit structured and compressible patterns. The authors provide a meaningful contribution by extending statistical archetypal analysis to the conformal risk control domain.
    • Evaluation: ABCI is validated on two large-scale glaucoma datasets, both in within- and cross-dataset scenarios, and stratified by disease severity and different α/β configurations. ABCI consistently maintains acceptable risk levels and robust performance, supporting its generalizability and clinical applicability.
    • Visualization: The visualizations included in the manuscript are clear and informative.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The motivation for using a diffusion model as the backbone for VF prediction is not clearly explained. It remains unclear whether the accuracy of the VF prediction model substantially influences the quality or reliability of the uncertainty quantification.
    • Key implementation details, particularly regarding the archetype selection mask and how archetypes are selected, are omitted. An improved description is essential for understanding the approach.
    • In Table 2, under Moderate stage with fixed β = 0.15, the reported pixel interval size under ABCI increases from 0.258 to 0.292 as α increases from 0.25 to 0.30. This appears to contradict standard expectations in conformal control, where a higher α generally allows for smaller or comparable intervals. The authors are encouraged to clarify whether this reflects a property of the archetype-based design.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite some gaps in methodological clarity, this paper addresses an important problem in clinical/VF AI. The proposed method is well-motivated, and the evaluations are comprehensive.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors’ feedback has adequately addressed my concerns. I agree to accept this paper for presentation at MICCAI 2025.



Review #3

  • Please describe the contribution of the paper

    This paper proposes ABCI, a novel pipeline based on a diffusion model and Conformal Risk Control (CRC) to predict future visual fields for glaucoma. The paper introduces clinical archetypes as a prior to decompose the uncertainty of the predicted VFs, which is more practical and interpretable. Furthermore, the CRC process is introduced and improved in order to quantify the uncertainty of VF progression.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper presents a novel integration of the risk control framework and archetype analysis in glaucoma, enabling it to be used in a plug-and-play manner to evaluate the outputs of a generative model.
    2. This paper identifies and focuses on uncertainty in VF prediction. A CRC framework is employed to predict an interval that takes uncertainty into account.
    3. Prior clinical knowledge is introduced through archetypal analysis, which helps to improve interpretability and accuracy.
    4. Structural information is maintained through archetypal analysis; losing it can be a major concern in risk-controlling-triggered image pattern analysis.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. “Archetype” may be a little unfamiliar to the MICCAI community. A brief introduction to this concept would help.
    2. Further description and analysis can be made on horizon conditioning.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper shows novelty in model design and shows great application potential.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We propose ABCI, an archetype-based conformal method built on diffusion models (DM) for VF progression. Unlike pointwise predictions, it guarantees coverage of both archetypal contributions to the ground truth and archetype selection. Reviewers agree that we tackle an important clinical problem with a novel, interpretable uncertainty quantification framework (R1, R2). They emphasize the use of known VF archetypes and clinical priors as key to improving interpretability and practical value (R1, R2). They recognize our methodological extension of Conformal Risk Control (CRC) to archetypal representations, which is well suited to VF data (R1, R2), and our enhancements to CRC, which enable structured, clinically meaningful uncertainty decomposition (R2). The validation covers two large-scale glaucoma datasets with consistent results across settings, supporting generalizability and clinical utility (R1, R3).

  1. How to verify that a DM learned the true data distribution, despite… (R3) A: We do not assume the DM captures the ground-truth distribution. Instead, CRC [2] guarantees coverage of the ground truth by scaling the uncertainty predicted by the diffusion model. Also, noisy data indicate high uncertainty, not unlearnability, which motivates our use of uncertainty quantification for reliable predictions.
  2. Comparison with DeepGF (R3) A: We carefully reviewed DeepGF and found it proposes a deep learning method for glaucoma diagnosis–a binary classification task. In contrast, VF progression is image-to-image. The methods are not comparable.
  3. Motivation for diffusion backbone (R1) A: VF progression prediction is an image-to-image generation task. DMs excel at modeling images, surpassing VAEs and GANs[25]. Being probabilistic, they also capture uncertainties inherent in VF data. Thus, DMs are ideal for our task. [25] Dhariwal, P. & Nichol, A. (2021). DMs Beat GANs on Image Synthesis.
  4. VF Model Performance vs. UQ & ablation study (R1 & R3) A: CRC [2] is model-agnostic, thus its coverage guarantee (i.e., reliability) holds regardless of the backbone. Yet, backbone performance affects the quality of UQ intervals: a weaker model typically yields larger intervals across conformal methods, including baselines (pixelwise and PUQ). Further ablations are limited by space but are a promising extension.
  5. Details on archetype selection (R1) A: Archetype selection is guided by reconstruction risk (Eq. 4). For each input, the DM generates multiple samples (n = 200), yielding the predicted average and centered uncertainty y_c_hat (line 11, p. 5). y_c_hat is projected onto the archetypal space and normalized, producing weights w as scores (indicating the predicted significance). On calibration, threshold λ1 is set to select the top-K archetypes. We compute the centered ground truth y_c and include top archetypes that reconstruct y_c within a target similarity (Eq. 4). For instance, beta = 0.15, q=0.9 means 90% of pixels differ by <0.15. We select the largest valid λ1.
  6. Interval size trend in Table 2 (R1) A: We jointly control two risks via three parameters: α for coefficient coverage, and β and q for archetype selection (Eq. 6). A larger α does yield tighter coefficient intervals, but the pixel-wise interval size also depends on which archetypes are selected. In experiments, we set q = 0.9–0.95 (line 6, p. 6). Since the CRC guarantee is marginal, a large α may result in no coverage for some samples. We therefore adjust q to select more archetypes, which offsets the tighter coefficient intervals and results in larger final intervals.
  7. Introduction to AA (R2) A: Archetypal analysis [6] identifies extremal patterns in the data, forming base patterns. Clinically, the 16 VF archetypes [6] are widely used for diagnosis, supported by strong biological relevance. We’ll include this in Section 2.1.
  8. Horizon conditioning (R2) A: The horizon is encoded via MLP and fed to the DM. We jointly sample start/end VFs with their horizon from the dataset, so that it aligns with clinical distributions. We’ll add this in the revision.
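
The two mechanisms described in points 1 and 5 above can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the synthetic data, the least-squares projection, the absolute-value normalization, and the miscoverage loss are all hypothetical stand-ins. The calibration rule itself (pick the smallest scaling λ whose inflated empirical risk stays at or below α) is the standard conformal risk control rule for a bounded loss that is non-increasing in λ.

```python
import numpy as np

def archetype_scores(y_c_hat, archetypes):
    """Score archetypes for one centered uncertainty map.

    Assumption: plain least-squares projection onto the archetypes
    (rows of `archetypes`), then absolute weights normalized to sum to 1.
    The paper's actual projection and normalization may differ.
    """
    coef, *_ = np.linalg.lstsq(archetypes.T, y_c_hat, rcond=None)
    w = np.abs(coef)
    total = w.sum()
    return w / total if total > 0 else w

def miscoverage(y, center, half_width, lam):
    """Per-sample fraction of entries outside center +/- lam * half_width."""
    lo = center - lam * half_width
    hi = center + lam * half_width
    return np.mean((y < lo) | (y > hi), axis=-1)

def calibrate_lambda(y_cal, center_cal, hw_cal, alpha, grid, bound=1.0):
    """Smallest lam on the grid with n/(n+1) * risk + bound/(n+1) <= alpha.

    `bound` is the upper bound on the loss (1.0 for a miscoverage fraction).
    """
    n = y_cal.shape[0]
    for lam in np.sort(grid):
        risk = miscoverage(y_cal, center_cal, hw_cal, lam).mean()
        if n / (n + 1) * risk + bound / (n + 1) <= alpha:
            return float(lam)
    # No grid value qualifies: return the widest option (guarantee not met).
    return float(np.max(grid))

# Synthetic stand-ins: 16 archetypes over a hypothetical 54-point field.
rng = np.random.default_rng(0)
K, P, n = 16, 54, 500
archetypes = rng.normal(size=(K, P))
scores = archetype_scores(2.0 * archetypes[3], archetypes)

# Synthetic calibration set of archetype coefficients and predicted spreads.
center = rng.normal(size=(n, K))
half_width = np.full((n, K), 0.5)
truth = center + rng.normal(size=(n, K))
grid = np.linspace(0.0, 10.0, 201)
lam_hat = calibrate_lambda(truth, center, half_width, alpha=0.1, grid=grid)
```

In this toy setup, a larger α admits a smaller λ and hence tighter coefficient intervals, which matches the first half of the explanation in point 6. The effect of selecting more archetypes on the final pixel-wise interval is not modeled here.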




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While Reviewer #3 initially maintains a ‘weak-reject’ stance, I find the authors’ rebuttal to have comprehensively addressed these points. Regarding the first new objection, that a deterministic task design would be preferable to a generative (diffusion) approach, I find the authors’ methodology enhances interpretability in a way that meaningfully addresses this critique. For the second new concern (citation gaps), while additional references could strengthen context, this does not invalidate the paper’s novel contribution. Given the convincing resolution of pre-rebuttal concerns, the paper’s demonstrated strengths in [e.g., innovation/impact/practicality], and alignment with the other reviewers’ post-rebuttal support for acceptance, I override the reject recommendation. Given the appropriate response to critiques, alongside strong alignment with the other reviewers’ post-rebuttal acceptance recommendations, I recommend acceptance.


