

Abstract

Automated diabetic retinopathy (DR) lesion segmentation helps improve the efficiency of DR detection. However, obtaining lesion annotations for model training relies heavily on domain expertise and is labor-intensive. Beyond classical approaches to label scarcity, such as self-supervised and semi-supervised learning, several studies have indicated that, with the rapid development of generative models, using synthetic image-mask pairs for data augmentation is promising. However, because too little labeled data is available to train powerful generative models, synthetic fundus data suffers from two drawbacks: 1) unrealistic anatomical structures and 2) limited lesion diversity. In this paper, we propose a novel framework to synthesize fundus images with DR lesion masks under limited labels. To increase lesion variation, we design a learnable module that generates anatomically plausible masks as the condition, rather than directly using lesion masks from the limited dataset. To reduce the difficulty of learning intricate structures, we avoid generating images solely from lesion-mask conditions. Instead, we develop an inpainting strategy that enables the model to generate lesions only within the mask area of easily accessible healthy fundus images. Subjective evaluations indicate that our approach generates more realistic fundus images with lesions than other generative methods. Downstream lesion segmentation experiments demonstrate that our synthetic data yields the largest improvement across multiple network architectures, surpassing state-of-the-art methods.
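The abstract's inpainting strategy restricts lesion generation to the mask area of a healthy image. One common way to enforce such a constraint in diffusion inpainting (RePaint-style sampling) is to keep generated pixels inside the mask and copy the known image outside it after every denoising step. The sketch below shows only that compositing rule on toy 2-D grids; the helper name `composite_step`, the plain-list image representation and the RePaint-style rule itself are assumptions for illustration, not the authors' implementation.

```python
from typing import List

Grid = List[List[float]]


def composite_step(generated: Grid, known: Grid, mask: List[List[int]]) -> Grid:
    """Hypothetical helper: keep generated pixels where mask == 1 (lesion area)
    and the known healthy-image pixels where mask == 0.

    In a RePaint-style sampler this rule would be applied after each denoising
    step, with `known` being the healthy image noised to the current timestep.
    """
    if not (len(generated) == len(known) == len(mask)):
        raise ValueError("inputs must have the same height")
    out: Grid = []
    for g_row, k_row, m_row in zip(generated, known, mask):
        if not (len(g_row) == len(k_row) == len(m_row)):
            raise ValueError("inputs must have the same width")
        # Binary-mask equivalent of m * generated + (1 - m) * known.
        out.append([g if m else k for g, k, m in zip(g_row, k_row, m_row)])
    return out


# Toy 2x3 example: only the masked pixels take the generated values.
healthy = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
generated = [[0.9, 0.9, 0.9], [0.9, 0.9, 0.9]]
lesion_mask = [[0, 1, 0], [0, 0, 1]]
print(composite_step(generated, healthy, lesion_mask))
# [[0.1, 0.9, 0.3], [0.4, 0.5, 0.9]]
```

Because pixels outside the mask are always overwritten with the healthy image, the anatomy there (vessels, optic disc) cannot drift, which is the motivation the abstract gives for inpainting instead of generating whole images.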

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/4059_paper.pdf

SharedIt Link: https://rdcu.be/dY6fC

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_8

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/4059_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Fen_Diversified_MICCAI2024,
        author = { Feng, Xiaoyi and Zhang, Minqing and He, Mengxian and Gao, Mengdi and Wei, Hao and Yuan, Wu},
        title = { { Diversified and Structure-realistic Fundus Image Synthesis for Diabetic Retinopathy Lesion Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {77--86}
}

Reviews

Review #1

Please describe the contribution of the paper

This manuscript proposes a framework that incorporates a learnable module to generate anatomically plausible masks for lesion inpainting on retinal fundus images, rather than directly transferring lesions without regard to whether they appear in physiologically reasonable locations. A conditional diffusion model first synthesizes the lesion mask, after which the lesions are inpainted with an encoder-decoder model. Experiments were performed on two public diabetic retinopathy (DR) segmentation datasets, comparing against various other lesion augmentation methods on a Dense UNet base model.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Learnable module for anatomically coherent lesion masks
- Excellent qualitative and quantitative image quality of the generated images
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Lack of (clinical) justification/explanation for plausible lesion positions
- Lack of detail on how the synthetically generated data is used to train the final segmentation model
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
1. In Section 2.1, it is stated that “The distribution and structure of DR lesions on fundus images are closely related to their structures, namely the ROI, OD, and VE”. However, it is not clear how the conditional diffusion model is trained to generate plausible lesion maps; in particular, were fully annotated (pixel-level) OD/VE/lesion masks of the retinal images used as training data?
2. In Section 3.2, details on the subjective evaluation might be expanded - how many ophthalmologists were involved? Were they informed that the dataset was evenly split between real and generated images?
3. In Section 3.4, it is unclear how the generated images were used in the training of the Dense UNet model, for the various methods. Was all of the training data used generated, or were real images also included in some proportion? An analysis of the effect of the ratio of real to generated images would be informative.
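The ratio analysis suggested in point 3 amounts to sweeping the fraction of synthetic pairs in the training set. The helper below is a hypothetical sketch of how such a sweep could be set up (the name `mix_training_set`, the definition of the ratio as synthetic / total, and the string tags are assumptions); it says nothing about how the paper actually combined real and generated data.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def mix_training_set(real: Sequence[T], synthetic: Sequence[T],
                     synthetic_ratio: float, seed: int = 0) -> List[Tuple[str, T]]:
    """Hypothetical helper: keep all real samples and add enough synthetic ones
    that they make up `synthetic_ratio` of the combined set, capped by how many
    synthetic samples exist.

    With ratio = n_syn / (n_real + n_syn), solving gives
    n_syn = ratio * n_real / (1 - ratio).
    """
    if not 0.0 <= synthetic_ratio < 1.0:
        raise ValueError("synthetic_ratio must be in [0, 1)")
    n_syn = round(synthetic_ratio * len(real) / (1.0 - synthetic_ratio))
    n_syn = min(n_syn, len(synthetic))
    rng = random.Random(seed)
    chosen = rng.sample(list(synthetic), n_syn)
    # Tag each sample with its origin so results can be analysed per source.
    mixed = [("real", x) for x in real] + [("synthetic", x) for x in chosen]
    rng.shuffle(mixed)
    return mixed


real = [f"real_{i}" for i in range(60)]
synthetic = [f"syn_{i}" for i in range(200)]
for ratio in (0.0, 0.25, 0.5):
    mixed = mix_training_set(real, synthetic, ratio)
    n_syn = sum(1 for tag, _ in mixed if tag == "synthetic")
    print(ratio, len(mixed), n_syn)
# 0.0 60 0
# 0.25 80 20
# 0.5 120 60
```

Keeping every real image and only varying the synthetic share isolates the effect of the generated data; a "synthetic only" condition would be a separate experiment.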
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The strong performance of the diffusion method is somewhat undercut by the lack of detail on how the generated images were used in training the final segmentation model.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The authors propose a data augmentation method in the context of DR lesion segmentation. The method first generates anatomically plausible masks that are subsequently used as a condition to generate targeted pathology that is “inpainted” into otherwise healthy images. Image generation is realised with a conditional diffusion model. The method is well evaluated on two distinct datasets, against several appropriate reference methods and in the context of two distinct segmentation approaches. The proposed method achieves consistent improvements over the reference methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- the method is evaluated against two rather recent reference methods (RetinaGAN [8], SDM [15]) in terms of realism of generated images by human review. The method compares favorably (Table 1).
- the proposed augmentation is superior to a comprehensive set of alternative data augmentation approaches when combined with two different (Dense U-Net / CNN, Trans2U-Net / Transformer-based) segmentation approaches
- an ablation study confirms the benefit of LTA
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The method description, in particular the interaction and training of the two stages, is challenging to follow since it lacks clarity and rigor. E.g., symbols such as x_t are not formally introduced but need to be looked up in Fig. 2. Also, what is part of the training data (e.g., the structure segmentation “s”, I assume) is only conveyed implicitly.
- Proposing a learnable training augmentation (LTA) module that drops connected components seems intuitive. However, this section is also hard to follow since symbols are not formally introduced. E.g., are we dropping components from y_T or y~? Also, if we aim to minimize LPIPS, wouldn’t it be best to drop all pathologies/regions and have an empty y, such that the mask m becomes trivial?
- There are no statistical tests to confirm the significance of the findings.
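To make the reviewer's question about dropping connected components concrete, the sketch below labels the 4-connected lesion components of a binary mask and removes each one independently with probability `drop_prob`. This is an assumed, non-learnable illustration of component-level dropout (the names `connected_components` and `drop_components` and the 4-connectivity choice are hypothetical); it does not model the LTA module's learned behaviour or its LPIPS-based objective.

```python
import random
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]


def connected_components(mask: Mask) -> List[List[Tuple[int, int]]]:
    """Return the 4-connected foreground components of a binary mask (BFS)."""
    h = len(mask)
    w = len(mask[0]) if mask else 0
    seen = [[False] * w for _ in range(h)]
    comps: List[List[Tuple[int, int]]] = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                comp: List[Tuple[int, int]] = []
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps


def drop_components(mask: Mask, drop_prob: float, seed: int = 0) -> Mask:
    """Hypothetical augmentation: zero out each lesion component with probability drop_prob."""
    rng = random.Random(seed)
    out = [row[:] for row in mask]  # copy so the input mask is not modified
    for comp in connected_components(mask):
        if rng.random() < drop_prob:
            for y, x in comp:
                out[y][x] = 0
    return out


toy = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]
print(len(connected_components(toy)))  # 3
print(drop_components(toy, 1.0))       # every pixel becomes 0
```

Setting `drop_prob=1.0` returns an empty mask, which is exactly the degenerate case the reviewer raises: any objective that rewards similarity to a healthy image would need an additional constraint to avoid it.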
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

There is no indication that code will be made available. As the paper lacks details (e.g. regarding architectures, convergence criteria) reproducibility is not ensured.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
- Several typos: e.g. p5 “architecture, This module”; “augment the for the”; “in Our training”
- Table captions should include more details on their content, e.g., Table 1 should state how many images were used for that test and that the numbers are based on human expert review, and Table 2 should state that the reported measures are Dice scores.
- See 6) for a clarification of symbols, in particular in the description of the LTA module.
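For reference alongside the Table 2 comment, the Dice score for binary masks P and T is 2|P ∩ T| / (|P| + |T|). The function below is a minimal sketch on flattened masks; the small smoothing term `eps` (which makes two empty masks score 1.0) is a common convention assumed here, not something the paper specifies.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks.

    `eps` avoids division by zero; with it, two empty masks score 1.0.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return (2.0 * intersection + eps) / (total + eps)


print(round(dice_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]), 4))  # 0.6667
```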
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall the paper is well written, the method intuitive and well motivated, evaluation is convincing and consistently confirms the approach’s potential.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

This paper proposes a framework for synthesizing fundus images with diabetic retinopathy (DR) lesions. The method, which is based on conditional diffusion models, consists of two stages: (I) a learnable mask generation module that generates anatomically plausible masks, conditioned on a set of masks from different retinal structures; and (II) an inpainting strategy that generates lesions on a healthy fundus image only within the mask area generated in the first stage, conditioned on the generated masks. Using subjective and objective evaluations on two public datasets, the authors show that the proposed method can generate more realistic fundus images than other generative methods, and that the synthetic data generated can be used as data augmentation to improve lesion segmentation performance.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The method is well motivated.
- The idea is simple and effective.
- The experiments and the results for using the method for data augmentation are convincing.
- The quality of the generated images is very good.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The authors do not discuss related work on conditional diffusion models (e.g., Saharia et al., 2022; Zbinden et al., 2023), which makes it difficult to understand the specific contribution of the paper and the proposed method.
2. The method has high data requirements. To generate an image, 4 different inputs are required: 3 masks (region of interest, retinal vessel, and optic disc) and a healthy fundus image. I find the requirement for the three segmentation masks particularly strict, as it would require either manual annotation or pre-trained models. This makes the method less practical for many applications, and increases the cost of training the model.
3. Related to the previous point, there is no mention of how the masks for the experiments were obtained. This information is important to understand the practicality of the method and its reproducibility.
4. Unclear explanation of the method. The method diagram in Figure 2 shows only the inference process, not the training process, so there is no graphical explanation of how the model is trained. This may make it difficult for readers unfamiliar with diffusion models to understand the training process. Also, I think the organization of the Method section is not adequate.
5. It is not clear to me how the LPIPS score is calculated for synthetic images. LPIPS compares the image to be evaluated against a reference image. However, if I understand correctly, no reference is available for synthetic images, since they are generated from healthy samples.
6. The subjective evaluation is well conducted, but limited, as very few samples (40) were presented to the evaluators. This makes me wonder about the significance of the results.
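Point 5 follows directly from how LPIPS is defined (Zhang et al., CVPR 2018): it is a full-reference distance between an image x and a reference x_0, so some x_0 must be chosen for every synthetic image. Here \hat{y}^{l}, \hat{y}^{l}_{0} \in \mathbb{R}^{H_l \times W_l \times C_l} are channel-normalized activations of layer l of a pretrained network for x and x_0, and w_l \in \mathbb{R}^{C_l} are learned per-channel weights:

```latex
d(x, x_0) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w}
  \left\lVert w_l \odot \left( \hat{y}^{l}_{hw} - \hat{y}^{l}_{0,hw} \right) \right\rVert_2^2
```

The formula does not say which image served as x_0 for the synthetic fundus images (for example, the healthy source image); that choice is what the reviewer asks the authors to state.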
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

The paper is largely reproducible, as most of the implementation details are provided and the datasets are publicly available. However, the lack of information on how the masks were obtained is a limitation.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
1. The authors should discuss related work on conditional diffusion models (e.g., Saharia et al., 2022; Zbinden et al., 2023) and outline the operation of these methods to better contextualize the proposed method.
2. I think the paper would benefit from including a discussion of the data requirements for the method and how they limit its practical application or generalization to other tasks or datasets.
3. A clear explanation of how the masks for the experiments were obtained should be provided.
4. The authors should provide a more complete and clear diagram of the method, including the training process. In addition, I think a reorganization of the Method section would help to make the explanation easier to follow. In particular, I think an overview of the method and the notation (including the dataset) should be presented first, followed by the two stages. Also, the description of the dataset should include the “structure” for defining pairs.
5. Clarify how the LPIPS score is calculated for synthetic images.
6. The authors should consider increasing the number of samples in the subjective evaluation to improve the significance of the results.
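Point 6 can be quantified. Assuming experts label each of the n = 40 images as real or synthetic, that judgments are independent, and that the set is evenly split (so chance accuracy is 0.5), an exact binomial test gives the probability of an accuracy at least as extreme as the observed one under pure guessing. The function below is a sketch of the standard two-sided exact test, which sums the probabilities of all outcomes no more likely than the observed count; the function name and the tolerance value are assumptions for illustration.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of every outcome that is no more likely than the
    observed one; a small relative tolerance guards against floating-point ties.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be in [0, n]")
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


# Hypothetical counts of correct real/synthetic calls out of 40 images.
for correct in (20, 26, 30):
    print(correct, round(binomial_two_sided_p(correct, 40), 4))
```

With only 40 images, moderately above-chance accuracies can still be compatible with guessing, which is why the reviewer asks for more samples.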
References:
- Saharia, Chitwan, et al. “Image super-resolution via iterative refinement.” IEEE Transactions on Pattern Analysis and Machine Intelligence 45.4 (2022): 4713-4726.
- Zbinden, Lukas, et al. “Stochastic segmentation with conditional categorical diffusion models.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a well-motivated and effective method for synthesizing fundus images with diabetic retinopathy lesions. However, the paper has some weaknesses that need to be addressed: the lack of discussion of related work on conditional diffusion models, the high data requirements, the presentation, and missing information about the data and the evaluation. Addressing these issues would improve the quality of the paper and make it more suitable for publication.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

N/A

Meta-Review

Meta-review not available; early-accepted paper.


Diversified and Structure-realistic Fundus Image Synthesis for Diabetic Retinopathy Lesion Segmentation

Author(s):