Abstract
Optical coherence tomography (OCT) enables detailed visualization and critical segmentation of retinal layers, which is essential for ophthalmological diagnosis. However, the development of automatic segmentation methods has been hindered by limited annotated datasets due to time-consuming manual labeling processes. Therefore, we propose RetiDiff, a three-stage diffusion model-based framework to synthesize realistic annotated OCT retinal images for enhancing segmentation performance. By leveraging the diffusion model, RetiDiff can synthesize diverse and realistic images guided by segmentation masks. To improve synthesis quality and accuracy in pathological regions, we introduce dynamic region masking (DRM), which selectively modifies pathological areas during training. To align the continuous outputs from mask sampling in the diffusion model with discrete segmentation labels, we propose discrete mask clustering (DMC), which converts these outputs into discrete values consistent with the labels. Experimental results show that RetiDiff effectively mitigates data scarcity by synthesizing realistic and diverse annotated OCT retinal images, which substantially enhance retinal layer segmentation performance. Compared to state-of-the-art methods, RetiDiff-synthesized datasets improve the average Dice score by 8.7% across all retinal layers, with a particularly notable increase of up to 53.8% in pathological regions. The code and dataset are publicly available at: https://github.com/MaybeRichard/RetiDiff
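The abstract does not spell out how DMC is implemented; as a rough illustration of the general idea (converting a continuously valued sampled mask into the discrete values used for annotation), a nearest-value quantization sketch in Python is given below. The function name, the [0, 1] label encoding, and the nine-class example are all assumptions, and the actual DMC may, as its name suggests, cluster the sampled intensities (e.g., with k-means) rather than snap them to fixed values.

```python
import numpy as np

def discretize_mask(continuous_mask: np.ndarray, label_values: np.ndarray) -> np.ndarray:
    """Map each pixel of a continuous mask to the nearest discrete label value.

    continuous_mask: float array in [0, 1], e.g. a mask sampled by the diffusion model.
    label_values:    1-D array of the discrete intensities used to encode layers,
                     e.g. np.linspace(0, 1, num_classes).
    """
    # Distance from every pixel to every candidate label value: shape (H, W, C)
    dists = np.abs(continuous_mask[..., None] - label_values[None, None, :])
    # Index of the closest label per pixel, then look up its value
    nearest = np.argmin(dists, axis=-1)
    return label_values[nearest]

# Example: quantize a noisy sampled mask back onto nine hypothetical layer labels
sampled = np.clip(np.random.rand(256, 256), 0.0, 1.0)  # stand-in for a diffusion sample
labels = np.linspace(0.0, 1.0, 9)                      # hypothetical label encoding
discrete = discretize_mask(sampled, labels)
```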
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1518_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/MaybeRichard/RetiDiff
Link to the Dataset(s)
N/A
BibTex
@InProceedings{LiSic_RetiDiff_MICCAI2025,
author = { Li, Sicheng and Dan, Mai and Chu, Yuhui and Yu, Jiahui and Zhao, Yunpeng and Zhao, Pengpeng},
title = { { RetiDiff: Diffusion-based Synthesis of Retinal OCT Images for Enhanced Segmentation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15961},
month = {September},
pages = {515--524}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a novel three-stage conditional OCT image synthesis framework, introducing two key components: Dynamic Region Masking (DRM) to handle pathological regions during generation, and Discrete Mask Clustering (DMC) to transform continuous masks into discrete condition labels. The effectiveness of the proposed method is demonstrated through significant improvements in downstream segmentation performance using the synthesized data.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Model design: The introduction of DRM and DMC is well-motivated and thoughtfully integrated into the overall architecture. These components effectively address the challenges of pathological region representation and mask discretization in conditional generation.
- Comprehensive experiments: The experimental setup is thorough, including comparisons with external methods, ablation studies to validate component effectiveness, and evaluation on downstream segmentation tasks. The design of the experiments is sound and aligns with the paper’s objectives.
- Convincing results: The results are compelling, particularly in downstream segmentation, where substantial improvements in Dice scores are observed. Visualizations are clear and detailed, supporting the quantitative findings effectively.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- In the external comparisons, the authors benchmark against Retree and LDM and claim they are SOTA models. However, Retree is primarily designed for fundus image synthesis, and LDM is a general-purpose image generation model. It is unclear why these are considered SOTA for OCT synthesis. The rationale behind selecting these models as baselines should be clearly justified. Are there no stronger or OCT-specific generative models available for comparison?
- Unfair comparison in downstream segmentation: In evaluating downstream segmentation performance, comparisons such as R/S(0/1000) vs. R/S(0/60) or R/S(60/0) are conducted using the same number of training epochs. However, this introduces a confound: the apparent advantage of R/S(0/1000) may stem from its significantly larger training dataset. A fairer comparison would be between R/S(0/1000) and R/S(1000/0), e.g., by oversampling the real data to match the synthetic data quantity (a minimal sketch of such an oversampling setup is given after this list). This adjustment is necessary to isolate the true effect of synthetic data quality.
- In Figure 2, the positioning of subfigure titles (A and B) is inconsistent and somewhat awkward. It is recommended to standardize their placement, preferably in the top-left corner of each subfigure for clarity and consistency.
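The oversampling setup suggested in the second weakness above could be realized by drawing from the small real annotated set with replacement so that each training epoch sees as many examples as the synthetic arm. The sketch below is a minimal PyTorch illustration under assumed placeholders: the tensor shapes, class count, and batch size are not values from the paper.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Hypothetical stand-in for the ~60-image real annotated set (images, masks).
real_images = torch.randn(60, 1, 224, 224)
real_masks = torch.randint(0, 9, (60, 224, 224))
real_dataset = TensorDataset(real_images, real_masks)

# Draw 1000 samples per epoch with replacement so the "R/S(1000/0)" arm sees the
# same number of training examples per epoch as the synthetic "R/S(0/1000)" arm.
oversampler = RandomSampler(real_dataset, replacement=True, num_samples=1000)
loader = DataLoader(real_dataset, batch_size=16, sampler=oversampler)

for images, masks in loader:
    pass  # feed each batch into the segmentation model's training step
```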
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper introduces a conditional OCT synthesis method with well-reasoned innovations and strong empirical support. The proposed DRM and DMC components address key limitations in conditional generation and contribute meaningfully to improved downstream segmentation, though some concerns remain, particularly regarding the fairness of the experimental comparisons and the choice of baseline models.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper proposes RetiDiff, a novel three-stage diffusion model framework for synthesizing realistic and diverse annotated OCT retinal images to improve retinal layer segmentation.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A diffusion-based image synthesis pipeline guided by segmentation masks addresses the scarcity of annotated OCT datasets and can help avoid the need for manual labeling of OCT images.
- The proposed Dynamic Region Masking (DRM) enhances synthesis quality in pathological regions by selectively modifying them during training, and the proposed Discrete Mask Clustering (DMC) converts continuous diffusion outputs into discrete segmentation labels for consistency.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- No comparison with real human-annotated images. It is unclear how closely the generated masks match real annotations.
- For stages 2 and 3, the DME dataset is too small; the sample size in this testing setting is not sufficient.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Additional datasets would be great to validate the effectiveness and generalizability of the synthesized images.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
To address the limitation of annotated datasets in medical image segmentation tasks, the authors propose RetiDiff, which introduces two key modules: Dynamic Region Masking (DRM), which selectively modifies pathological regions, and Discrete Mask Clustering (DMC), which transforms the output into discrete values consistent with ground truth annotations. In the application of retinal layer segmentation, RetiDiff achieves an average Dice score improvement of 8.7% on the synthesized dataset compared to existing state-of-the-art methods, with a particularly significant improvement of 53.8% in the segmentation of IRF pathological regions. These results demonstrate the effectiveness of the method in addressing the scarcity of annotated data and confirm its utility in segmentation tasks.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The proposed RetiDiff is structured into three distinct stages: pre-training, mask training, and image fine-tuning. These stages progressively enhance the model’s representational capacity and generative performance. The design is logically organized, making it intuitive to understand and implement.
- In RetiDiff, two key modules, Dynamic Region Masking (DRM) and Discrete Mask Clustering (DMC), are specifically designed to address distinct challenges: DRM focuses on the generation of fluid regions during synthesis, while DMC ensures alignment between continuous diffusion outputs and discrete segmentation labels. These components collectively enhance the performance of the generative model.
- RetiDiff enables the synthesis of a large number of diverse and realistic annotated OCT images, which in turn enhances the performance of segmentation models. This approach effectively addresses the performance limitations in OCT image segmentation caused by the scarcity of annotated data.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Is there a possibility that the generated images may introduce artifacts or anatomically implausible structures? How do the authors ensure the clinical reliability of the generated images?
- The authors report relative improvements in Table 2. It is recommended to explicitly clarify this in the manuscript.
- Please include clear legends and annotations in Figure 1 to indicate the meaning of different colors and symbols.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is well motivated and proposes a compelling method for retinal OCT image generation. Apart from some issues pointed out, the method is described in sufficient detail, and the effectiveness of the generated images in enhancing segmentation performance is validated on the DUKE dataset.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank the editors and reviewers for their valuable comments, time, and effort.
R1-W1: Synthetic images may introduce artifacts or anatomically implausible structures. To ensure clinical reliability, we employ two complementary approaches: (I) quantitative quality assessment using established metrics (Fréchet Inception Distance, FID; and Learned Perceptual Image Patch Similarity, LPIPS) to compare synthetic and real images; (II) downstream task validation demonstrating that segmentation models trained on RetiDiff-synthesized data achieve excellent performance on real clinical test images. This dual validation confirms that our synthetic images maintain clinically acceptable anatomical fidelity and that the knowledge derived from synthetic data transfers to real-world clinical applications.
R2-W1: We selected LDM and Retree as baselines for three key reasons: (I) LDM provides a robust and reproducible diffusion-model benchmark across medical imaging tasks. (II) Current OCT-specific generative models are limited: existing methods either focus on volumetric synthesis rather than B-scans [1-2] or lack open-sourced implementations [3]. (III) While Retree was developed for fundus imaging, it specifically addresses mask-guided image generation, making it relevant to our task despite the modality difference. Our selection balances task relevance, implementation reproducibility, and comparative value in the absence of directly comparable OCT-specific models.
R2-W2: Addressing data scarcity is precisely our paper’s core contribution. The DUKE DME dataset contains only 110 annotated images in total (66 for training), reflecting the real-world challenge of obtaining expert annotations. To address this challenge fairly within these constraints, our experimental design employs: (I) R/S(66/0) vs. R/S(0/66), a controlled comparison with identical mask distributions to directly evaluate synthetic image quality; (II) R/S(0/1000), demonstrating our method’s key benefit of scaling beyond manual annotation limitations. The suggested R/S(1000/0) comparison is challenging to implement, as obtaining 1000 expert-annotated OCT images would be prohibitively expensive and time-consuming; this fundamental limitation is exactly what our method aims to overcome.
R3-W1: We evaluate generated mask quality indirectly through downstream task performance. Segmentation models trained exclusively on our synthetic data perform well on real expert-annotated test data, suggesting strong alignment between synthetic and real annotations. In future work, we plan to conduct direct evaluations through expert assessment studies comparing synthetic and real data quality.
R3-W2: The scarcity of annotated data is a core challenge addressed by our work. In OCT retinal segmentation, even publicly available datasets typically contain only approximately 100 annotated images, which is insufficient for robust model training and testing. To overcome this limitation, our three-stage approach incorporates pretraining on large-scale unannotated data (84,484 images) to establish robust feature representations before fine-tuning; this transfers knowledge from large-scale unannotated data to the labeled domain, effectively addressing the annotation bottleneck while maintaining applicability.
R1-W2&W3, R2-W3: We will refine the manuscript based on these suggestions in the final version.
[1] Danesh, Hajar, et al. “Synthetic OCT data in challenging conditions: three-dimensional OCT and presence of abnormalities.” Medical & Biological Engineering & Computing, 2022.
[2] Huang, Kun, et al. “Memory-efficient high-resolution OCT volume synthesis with cascaded amortized latent diffusion models.” International Conference on Medical Image Computing and Computer-Assisted Intervention, 2024.
[3] Wu, Yuli, et al. “Retinal OCT Synthesis with Denoising Diffusion Probabilistic Models for Layer Segmentation.” IEEE International Symposium on Biomedical Imaging, 2024.
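As a rough illustration of the quantitative quality assessment mentioned in R1-W1 (I), the sketch below computes FID and LPIPS with torchmetrics. The batch sizes, image resolution, channel handling, and the pairing of real and synthetic images for LPIPS are assumptions for the sake of a runnable example, not details taken from the paper.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

# Hypothetical stand-ins for batches of grayscale OCT B-scans in [0, 1],
# repeated to 3 channels because both metrics expect RGB-like inputs.
real = torch.rand(32, 1, 256, 256).repeat(1, 3, 1, 1)
fake = torch.rand(32, 1, 256, 256).repeat(1, 3, 1, 1)

# FID: compares Inception feature statistics of the two image sets (lower is better).
fid = FrechetInceptionDistance(feature=2048, normalize=True)
fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:", fid.compute().item())

# LPIPS: perceptual distance between paired images (lower means more similar);
# how real and synthetic images are paired here is an assumption.
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)
print("LPIPS:", lpips(fake, real).item())
```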
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A