Abstract
Glaucoma is a leading cause of irreversible blindness, and early diagnosis is crucial for effective treatment. However, AI-assisted glaucoma diagnosis faces challenges in fairness and data scarcity, because model biases can lead to disparities across demographic groups. To address this, we propose GlaucoDiff, a diffusion-based generative model that synthesizes SLO images with precise control over the vertical cup-to-disc ratio (vCDR). Unlike previous methods, GlaucoDiff enables bidirectional synthesis, generating both healthy and glaucomatous samples of varying severity and thus enhancing dataset diversity. To ensure anatomical fidelity, GlaucoDiff leverages real fundus backgrounds while generating the optic nerve head regions. We also introduce a sample selection strategy that filters generated images by their alignment agreement with the target optic structures, ensuring the high quality of the synthetic data. Experiments on two public ophthalmic datasets demonstrate that GlaucoDiff outperforms state-of-the-art approaches in both diagnosis and fairness evaluation settings. Evaluations by two independent ophthalmologists confirm the clinical relevance of the generated images, highlighting GlaucoDiff’s potential for improving AI-driven glaucoma diagnosis. Our code is available.
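For readers unfamiliar with the conditioning signal, the vertical cup-to-disc ratio is conventionally defined as the vertical extent of the optic cup divided by the vertical extent of the optic disc, both measured from segmentation masks; larger values are commonly used as a glaucoma screening indicator. The snippet below is a minimal sketch of that standard definition on hypothetical NumPy masks, not code taken from GlaucoDiff.

```python
import numpy as np

def vertical_cdr(cup_mask: np.ndarray, disc_mask: np.ndarray) -> float:
    """Standard vertical cup-to-disc ratio from binary masks of shape (H, W).

    Hypothetical helper for illustration: the vertical extent of each
    structure is the number of image rows its mask occupies.
    """
    cup_rows = np.where(cup_mask.any(axis=1))[0]
    disc_rows = np.where(disc_mask.any(axis=1))[0]
    if disc_rows.size == 0:
        raise ValueError("Empty optic disc mask")
    cup_height = 0 if cup_rows.size == 0 else cup_rows.max() - cup_rows.min() + 1
    disc_height = disc_rows.max() - disc_rows.min() + 1
    return cup_height / disc_height
```

Controlling this single scalar lets the generator sweep from clearly healthy to clearly glaucomatous optic nerve head appearances, which is the bidirectional synthesis described in the abstract.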
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1939_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/WANG-ZIHENG/GlaucoDiff
Link to the Dataset(s)
Harvard-FairSeg Dataset: https://github.com/Harvard-Ophthalmology-AI-Lab/FairSeg
Harvard-FairVLMed dataset: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
BibTex
@InProceedings{WanZih_FairnessAware_MICCAI2025,
author = { Wang, Ziheng and Yang, Shuran and Chen, Wen and Zhang, Zhen and Wang, Mengyu and Zhou, Feixiang and Tian, Yu and Wang, Meng and Zhao, Yitian and Zheng, Yalin and Meng, Yanda},
title = { { Fairness-Aware vCDR-Controlled Generation for Glaucoma Diagnosis } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15968},
month = {September},
page = {255 -- 265}
}
Reviews
Review #1
- Please describe the contribution of the paper
The authors propose a novel diffusion-based generative model, GlaucoDiff, to synthesize scanning laser ophthalmoscope (SLO) images with precise control over the vertical cup-to-disc ratio (vCDR). The model is capable of generating both healthy and glaucomatous images. Additionally, the authors introduce a sample selection strategy to automatically filter and retain high-quality generated images. In their evaluation, GlaucoDiff outperforms existing approaches in classification accuracy and fairness metrics.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The work addresses two important areas in medical imaging AI: synthetic image generation and fairness in model performance, both of which are highly relevant for improving glaucoma diagnosis.
- GlaucoDiff demonstrates the ability to generate realistic SLO images with precise conditioning on vCDR.
- The authors employ an automated process to retain reliable synthetic images, enhancing the overall quality of the generated dataset.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The clinical utility of the study is limited by its exclusive focus on vCDR, neglecting other important glaucoma-related features such as disc hemorrhage, neuroretinal rim thinning, and retinal nerve fiber layer (RNFL) defects. Notably, the same vCDR can have different clinical implications depending on disc size.
- The ground truth labeling process is inadequately described. The manuscript states that “two independent ophthalmologists annotated FairSeg with healthy and suspected glaucoma labels,” but does not clarify the labeling criteria, how inter-rater disagreements were resolved, or the accuracy of the human annotations.
- Cup segmentation is known to be challenging, especially in cases with poorly defined boundaries. The authors should clarify how they handled ambiguous cases during segmentation.
- The Methods section lists both ControlNet and GlaucoDiff, but their relationship remains unclear.
- The claim that “ophthalmological expert evaluations confirm the clinical relevance of the generated images” lacks definition: how was clinical relevance assessed?
- Key demographic details, including race, gender, and age distributions of the datasets, are missing and should be reported.
- The assumptions underlying the evaluated methods are not explicitly stated. It is important to know whether those assumptions were met in the evaluation setting.
- Both evaluation datasets (FairSeg and FairVLMed10k) are from the same institution, which limits generalizability. Validation on datasets from other institutions would strengthen the findings.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Relying solely on vCDR to guide synthetic image generation risks introducing bias into glaucoma-related deep learning models. Glaucoma diagnosis often involves multiple structural indicators beyond vCDR, including disc hemorrhage, neuroretinal rim features, and RNFL defects. The authors are encouraged to incorporate a more comprehensive set of glaucoma features when designing synthetic image generation frameworks.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors answered my questions successfully.
Review #2
- Please describe the contribution of the paper
This paper proposes a Stable Diffusion-based image generation algorithm that can precisely control the vCDR of the generated SLO fundus images. It also provides a selection strategy to choose high-quality images from the synthetic image set. By adding the synthetic images to the training set, the overall classification precision and fairness are improved.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The method is simple but effective.
- The motivation is clear.
- The application is novel and the topic of fairness is important.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Some of the references are not up to date; for example, [7] FairSeg was accepted at ICLR 2024, and [13] FairDiff was accepted at MICCAI 2024.
- Sec 3.1: is the SAM model pretrained on SLO fundus images? The cited reference reads more like a survey than a methodology paper.
- I am curious whether the comparison is fair, as the proposed method uses more data (by generating additional training data) than its competitors. I am also curious about the ratio among different attributes for the synthetic images (e.g., Male : Female = 1 : 1?).
- The demographic parity metric requires that the number of samples predicted as positive be equal across subgroups. Did the authors construct the test set following this rule?
- What are the units in Tables 1 and 2? Are DEOdds and group-wise AUC percentage values?
- Some ablation studies are missing, which makes it hard to assess the effectiveness of the sample selection strategy.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The comparison between the proposed method and its competitors is not fair. Some key experiments are missing, so I cannot verify the effectiveness of each component.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
The comparison between the proposed method and its competitors is not fair. The authors’ response did not fully address my concerns, so I decline to accept this article.
Review #3
- Please describe the contribution of the paper
This study proposes a diffusion-based generative model (GlaucoDiff) together with a sample selection mechanism to generate synthetic data for glaucoma. The proposed method builds on an existing model, ControlNet [14], and constructs a generative model that synthesizes optic cup (OC) and optic disc (OD) regions by controlling the vertical cup-to-disc ratio (vCDR) in SLO images. Unlike the existing literature, generation is bidirectional (healthy/glaucomatous), which gives full control over the synthesis procedure for the classification task. A sample selection mechanism is also proposed to filter the synthetic images based on alignment with the target OC and OD structures. The experiments are carried out on two datasets, FairSeg and FairVLMed10k, to demonstrate improvements in accuracy and in fairness metrics such as DPD and EOdds.
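As general background on the ControlNet-guided generation the reviewer refers to (and not the authors’ training code), conditioning Stable Diffusion on a structure map with the off-the-shelf diffusers pipeline looks roughly like the sketch below. The checkpoint names, prompt, and blank placeholder condition image are illustrative assumptions; GlaucoDiff would instead use a trained ControlNet and a condition encoding the target optic disc/cup layout (and hence the desired vCDR).

```python
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Placeholder conditioning image; a real condition would encode OC/OD structure.
condition = Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))

# Generic segmentation-conditioned ControlNet, NOT the paper's trained weights.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

image = pipe(
    prompt="SLO fundus image with a glaucomatous optic nerve head",
    image=condition,
    num_inference_steps=30,
).images[0]
image.save("synthetic_slo.png")
```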
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The study is the first to generate OC and OD regions in SLO images with controlled vCDR. The generated images are also bidirectional, making them applicable to modeling glaucoma. The proposed sample selection mechanism helps select images consistent with the target OC and OD structures. The experimental results show a significant improvement in classification performance and fairness over existing methods. The evaluation is also supported by two ophthalmologists.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The methodology is not explained in detail; it is relatively short and needs to be presented more comprehensively. The major differences from ControlNet should also be highlighted so that the contribution can be better understood. How ControlNet is leveraged is not obvious either: there is a preliminary subsection introducing ControlNet and one statement in the conclusion highlighting ‘ControlNet-guided’. The use of ControlNet should be explained more comprehensively around Fig. 1 (the training subsection in 2.2 can be enriched). There are also some unclear points in the experimentation, such as how the compared methods were constructed using the datasets and the proposed pipeline.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper has novelty in synthetic medical data generation, and this may help to address bias issues in situations where data imbalance exists. The study is a good demonstration of synthetic data generation in the medical domain. The benefits of the study are also verified by performance improvements on two different medical datasets.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Since the authors promise to address the concerns regarding the model and evaluation, my final opinion is switched to ‘Accept’. It is important that these concerns be fixed so that the paper can be clarified and followed/reproduced by future readers.
Author Feedback
We thank reviewers R1, R2, and R4 for their reviews and their recognition of our work’s novelty and strengths. Notably, we have provided an anonymized link to the source code, dataset, and other dependencies.
Reply to R1#Q1: Our focus is on data imbalance and AI fairness. vCDR, widely used in previous works [17,18,24] as a sole biomarker for glaucoma diagnosis, is adopted for its efficiency in large-scale screening rather than for precise personalized diagnosis (Gao et al., 2024).
Reply to R1#Q2,#Q5: Two ophthalmologists annotated the FairSeg dataset, labeling healthy and suspected glaucoma cases based on clinical guidelines. They assessed vCDR, disc margin clarity, morphology, and disc cupping. The Intraclass Correlation Coefficient (ICC) was calculated to evaluate inter-rater reliability (Koo & Li, 2016); it exceeded 90%, confirming strong agreement between raters. Additionally, the ophthalmologists evaluated the optic cup and disc regions of the generated images, assessing margin clarity and vascular structure. The vCDR values of the synthetic images also aligned with the target values and with the clinical features of glaucomatous and healthy cases. As shown in Fig. 2, the gradual increase in vCDR reflects glaucoma progression, reinforcing the clinical relevance of the generated images.
Reply to R1#Q3: To mitigate this challenge, we use ensemble agreement across pre-trained models (UNet [21], SAM [22], TransUNet [23]). The average prediction mask from these models is compared with the vCDR-scaled masked image, and the top 50% of samples with the highest Dice and HD95 scores are selected as the most reliable. Notably, our primary focus is on data imbalance and AI fairness, not segmentation.
Reply to R1#Q4 & R2: Fig. 1 shows the detailed structure of GlaucoDiff, introduced in Sec. 2.1, including Stable Diffusion, the condition input, and zero convolution. We will revise Sec. 2.2 to emphasize the link between the two sections.
Reply to R1#Q6: Detailed demographic information for both FairSeg and FairVLMed10k is provided in the shared anonymized link, including “data_summary.csv” and “filter_file.txt”.
Reply to R1#Q7 & R2: We reimplement the compared methods using their open-source code, train them on the same dataset, and evaluate them under the same metrics to ensure a fair comparison.
Reply to R1#Q8: The two datasets are distinct, with different patient cohorts, time periods, and labeling strategies [6,7], and their demographic attributes (such as race, gender, etc.) are distributed differently, which supports our method’s generalizability.
Reply to R4#Q1,#Q2: Thank you for pointing this out; we will revise references [7,13,22] in the camera-ready version. For Sec. 3.1, we pretrained SAM on the FairSeg [7] training set.
Reply to R4#Q3,#Q4: The total number of training samples is consistent across methods. While the compared methods use traditional augmentation, ours applies synthetic augmentation, ensuring demographic balance (Male : Female = 1 : 1). Instead of Demographic Parity (DP), we use DPD and DEOdds. DPD quantifies deviations in positive prediction rates across subgroups, while DEOdds compares TPR and FPR differences without enforcing DP or a balanced test set. Following prior test set construction [6,7], we performed data cleaning, focusing on quantifying deviations with DPD and comparing error rates with DEOdds.
Reply to R4#Q5: All metrics in Tables 1 and 2 are reported as percentages. We will make this clear in the camera-ready version; thank you for asking.
Reply to R4#Q6: Space constraints prevented inclusion in the manuscript, but our selection strategy improves AUC and F1 by nearly 2% on both datasets compared with not using it. This will be added in the camera-ready version.
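As a concrete reading of the sample selection described in the reply to R1#Q3, the sketch below ranks generated images by the Dice agreement between the averaged ensemble prediction (e.g., UNet/SAM/TransUNet masks) and the target vCDR-scaled mask, then keeps the top 50%. It is a simplified, assumption-based illustration: the function names are hypothetical, and the paper additionally uses HD95 in the ranking.

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Dice overlap between two binary masks."""
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def select_reliable_samples(ensemble_masks, target_masks, keep_ratio=0.5):
    """Rank synthetic images by agreement between the averaged ensemble
    prediction and the target (vCDR-scaled) mask; keep the top fraction.

    ensemble_masks: list of lists, one inner list of model masks per image.
    target_masks:   list of target binary masks, one per image.
    """
    scores = []
    for model_masks, target in zip(ensemble_masks, target_masks):
        # Average the predictions of the three segmentation models, binarize.
        avg = np.mean(np.stack(model_masks).astype(float), axis=0) >= 0.5
        scores.append(dice(avg, target.astype(bool)))
    order = np.argsort(scores)[::-1]          # best agreement first
    n_keep = int(len(order) * keep_ratio)
    return sorted(order[:n_keep].tolist())    # indices of kept images
```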
References: Gao X R, Wu F, Yuhas P T, et al. Automated vertical cup-to-disc ratio determination from fundus images for glaucoma detection[J]. Scientific Reports, 2024, 14(1): 4494. Koo T K, Li M Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research[J]. Journal of chiropractic medicine, 2016, 15(2): 155-163.
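For context on the fairness metrics discussed above, the following sketch computes a demographic parity difference (DPD) and an equalized-odds difference from binary predictions and a sensitive attribute, following common textbook definitions (maximum subgroup gaps). It is illustrative only and may differ in detail from the DPD/DEOdds implementations used in the paper.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Max gap in positive prediction rate across subgroups (DPD)."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equalized_odds_difference(y_true, y_pred, group):
    """Max gap in TPR or FPR across subgroups (a common reading of DEOdds)."""
    tprs, fprs = [], []
    for g in np.unique(group):
        yt, yp = y_true[group == g], y_pred[group == g]
        tprs.append(yp[yt == 1].mean() if (yt == 1).any() else 0.0)
        fprs.append(yp[yt == 0].mean() if (yt == 0).any() else 0.0)
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Hypothetical usage with binary labels/predictions and a gender attribute:
y_true = np.array([1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1])
gender = np.array(["F", "F", "F", "M", "M", "M"])
print(demographic_parity_difference(y_pred, gender))
print(equalized_odds_difference(y_true, y_pred, gender))
```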
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper proposes a new vCDR-conditioned diffusion model (GlaucoDiff) for synthesizing SLO images for glaucoma diagnosis and fair evaluation. In the post-rebuttal phase, two reviewers shifted to ‘accept’, as they were satisfied with the explanations provided about the technical aspects and the evaluation strategy. The rebuttal addressed most of the concerns relating to clinical grounding (e.g., the vCDR ranges labeled by an ophthalmologist and the ICC agreement), the fairness metrics, and the demographic composition of the data. Outstanding concerns remain about the overall depth and clarity of the method description, the missing selection-mechanism ablation, and how ControlNet was integrated, but I found that the clarifications on the synthetic augmentation strategy and the fairness framing improved the paper overall. Given the scope and relevance of the research, as well as the improvement following the rebuttal, I take the majority position and recommend acceptance.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper introduces a novel and practical framework for fairness-aware glaucoma image generation with controlled vCDR, addressing an important problem with clear clinical relevance. While some methodological clarifications are needed, the overall idea is strong, and I personally find the direction promising; I recommend acceptance.