Abstract

Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and lower accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF-to-FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF images. We introduce an autoregressive GAN for smooth, memory-saving frame-by-frame FFA synthesis. To enhance the focus on dynamic lesion changes in FFA regions, we design a knowledge mask based on clinical experience. Leveraging this mask, our approach integrates innovative knowledge mask-guided techniques, including knowledge-boosted attention, knowledge-aware discriminators, and mask-enhanced PatchNCE loss, aimed at refining generation in critical areas and addressing the pixel misalignment challenge. Our method achieves the best FVD of 1503.21 and PSNR of 11.81 compared to other common video generation approaches. Human assessment by an ophthalmologist confirms its high generation quality. Notably, our knowledge mask surpasses supervised lesion segmentation masks, offering a promising non-invasive alternative to traditional FFA for research and clinical applications. The code is available at https://github.com/Michi-3000/Fundus2Video.
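As a concrete illustration of the frame-by-frame synthesis described above, here is a minimal PyTorch sketch of an autoregressive generation loop. This is not the released implementation; the interface, channel counts, and frame count (generate_ffa_video, G, cf_image, num_frames) are illustrative assumptions.

import torch

@torch.no_grad()
def generate_ffa_video(G, cf_image, num_frames=8):
    # cf_image: (B, 3, H, W) color fundus photo, values in [-1, 1].
    # Seed the recursion with the CF image itself as the "previous frame".
    frames, prev = [], cf_image
    for _ in range(num_frames):
        # Condition on the static CF image plus the last generated frame,
        # so each step keeps only one frame in memory (memory-saving).
        cond = torch.cat([cf_image, prev], dim=1)  # (B, 6, H, W)
        prev = G(cond)                             # (B, 3, H, W) next frame
        frames.append(prev)
    return torch.stack(frames, dim=1)              # (B, T, 3, H, W)

Any image-to-image generator (e.g., a pix2pixHD-style U-Net) can stand in for G here; the point is that conditioning each step on the previous output yields a temporally coherent sequence without holding the whole video in memory.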

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1347_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1347_supp.pdf

Link to the Code Repository

https://github.com/Michi-3000/Fundus2Video

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zha_Fundus2Video_MICCAI2024,
        author = { Zhang, Weiyi and Huang, Siyu and Yang, Jiancheng and Chen, Ruoyu and Ge, Zongyuan and Zheng, Yingfeng and Shi, Danli and He, Mingguang},
        title = { { Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces Fundus2Video, a novel method for generating dynamic Fundus Fluorescein Angiography (FFA) videos from static Color Fundus (CF) images. This is achieved using an autoregressive Generative Adversarial Network (GAN) that incorporates a knowledge mask derived from clinical insights to focus on regions with significant lesion changes. The method aims to provide a non-invasive alternative to traditional FFA, which is invasive and less accessible.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Innovative Approach: The paper presents a pioneering approach to dynamic FFA video generation, which is a significant advancement in ophthalmic imaging. The use of an autoregressive GAN for frame-by-frame synthesis is novel and addresses the challenge of capturing the entire FFA process.
    2. Clinical Knowledge Integration: The incorporation of a knowledge mask based on clinical experience is an innovative way to guide the model to focus on areas of clinical significance, such as lesions and blood vessels, without requiring manual labeling.
    3. Comprehensive Evaluation: The method’s performance is evaluated using a broad set of metrics (FVD, SSIM, PSNR, LPIPS) and human assessment, demonstrating its superiority over existing image-to-video translation methods; a minimal PSNR sketch follows this list.
    4. Potential Clinical Application: The non-invasive nature of the proposed method could make it a valuable tool for research and clinical applications, offering a safer alternative to traditional FFA.
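For reference, PSNR, one of the metrics listed in item 3, has a closed form and is easy to reproduce; a minimal sketch, assuming images scaled to [0, 1]:

import torch

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE).
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

FVD, by contrast, compares distributions of features from a pretrained video network (I3D) via a Fréchet distance, so it captures temporal dynamics that per-frame metrics such as PSNR and SSIM miss.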
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The generalizability appears to be limited, as the proposed method is specialized and may only be applicable to the conversion from fundus photographs to FFA videos, rendering it unsuitable for other medical images and videos. This limitation constrains the method’s influence and value.
    2. The comparison involves too few methods, only two; it would be beneficial to include additional state-of-the-art methods under the same conditions for a more comprehensive comparison.
    3. The innovation is relatively modest; although it represents a new application scenario, as there appears to be no publicly available dataset pairing FFA videos with fundus photographs, the components of the proposed method are essentially existing approaches with slight modifications.
    4. The dataset does not seem to be intended for public release, which significantly reduces the significance and contribution of this work, as it prevents replication and hinders further exploration by other researchers in the community.
    5. There are several typographical errors; for instance, the abstract mistakenly references “FID” instead of “FVD”, among other typos and errors in expression that could be refined. It is advisable for the authors’ team to further polish and revise the manuscript.
    6. There is no analysis or experimental exploration of the hyperparameters, such as the settings for various lambdas. Additional experiments and discussions on this topic are recommended.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is recommended that the dataset and code be fully disclosed to enhance reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to the weaknesses section and make targeted improvements and responses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The evaluation is based on the overall degree of methodological novelty and reproducibility, as well as the quality of the paper’s presentation.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors responded to and resolved some of my concerns; however, the novelty and reproducibility of the work remain somewhat constrained.



Review #2

  • Please describe the contribution of the paper

    This work introduces a method to generate dynamic FFA videos from static CF images using an autoregressive GAN and a knowledge mask derived from clinical expertise, achieving superior quality compared to existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. This paper introduces Fundus2Video, an autoregressive GAN architecture tailored for frame-by-frame FFA video synthesis from CF images.
    2. A knowledge mask is proposed to improve generation in areas like lesions and blood vessels.
    3. The comparison with existing methods demonstrates the effectiveness of the proposed approach.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The task of this paper is to generate videos from images, yet the introduction does not discuss the currently popular diffusion models. Why did the paper opt for traditional GAN networks instead of the well-validated video diffusion models?
    2. The method for generating the key knowledge mask relies on frame differencing to compute a binary mask. Although this approach is simple and convenient, it is prone to noise and highly sensitive to parameters. Additionally, it may only be applicable to the specific type of images mentioned in the paper. The paper should include a more comprehensive discussion of this aspect.
    3. The authors state in the introduction that the method can be extended to other images, such as MRI. However, the paper does not validate this generalization in the experiments.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper mentions that the code will be made publicly available in the future.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors should include a discussion comparing their generation model with diffusion models. Additionally, they should discuss the inevitable noise impact during the generation process of the knowledge mask.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper aims to generate FFA videos focusing on lesion changes from CF images. The proposed knowledge mask, attention mechanism based on the mask, and contrastive learning are demonstrated to be effective in experiments.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I am satisfied with the author’s reply and have no further concerns.



Review #3

  • Please describe the contribution of the paper

    The authors proposed a conditional generative adversarial network named Fundus2Video, which takes fundus images and Fundus Fluorescein Angiography (FFA) images from earlier time points to generate FFA images for future time points. To aid the generation process, the authors utilize unsupervised knowledge-mask-guided training. The mask was generated from existing FFA videos by computing the absolute difference between FFA frames. The authors argue this mask captures the pathological information typically labeled by clinical experts and is thus significant for video generation with this GAN architecture. The authors validated their work with human clinician assessments and compared the proposed techniques with other architectures.
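The frame-differencing mask described above is simple enough to sketch; a hedged NumPy illustration, assuming grayscale frames in [0, 1] and an illustrative threshold (the paper tuned its own hyperparameters):

import numpy as np

def knowledge_mask(frames, thresh=0.1):
    # frames: (T, H, W) registered FFA video frames.
    diffs = np.abs(np.diff(frames, axis=0))    # (T-1, H, W) inter-frame change
    change = diffs.max(axis=0)                 # strongest change at each pixel
    return (change > thresh).astype(np.uint8)  # binary knowledge mask

Aggregating consecutive absolute differences with a per-pixel max is one plausible reading of the description; the actual construction and thresholding in the paper may differ, and, as Review #2 notes, the result is sensitive to thresh and to registration noise.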

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A. The authors addressed the quintessential problem of generating FFA videos of the retinal vasculature using both pathological information (mask guidance) and multi-time-point context information (earlier FFA images), with an ensemble of generators and discriminators.

    B. Multiple losses were used to make the results more realistic in terms of vascular structure, anomalies, and perfusion. A core strength of this paper is utilizing contrastive mask-guided learning to capture local regions of interest.

    C. The qualitative visualization of the generated FFA videos is significant for clinical application, and it can help substitute the need for interventional fluids in the retinal subspace.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A. The authors mention in Section 3 (Implementation Details) that they split the dataset into validation and test sets. However, no evaluation is reported on the validation split; only the test split is evaluated.

    B. In Table 1, the authors report the effects of different loss functions. However, it is unclear whether the “Mask-enhanced PatchNCE loss” is supervised, unsupervised, or combined; it would be better to report both separately. Moreover, besides each loss function’s name, please include its symbol, matching equations 1 to 6, to make the entries easier to discern and correlate (a sketch of a mask-weighted PatchNCE term follows this list).

    C. In Section 3, Human Assessment, the authors report the clinicians’ conclusive assessment as a score of 2.12, indicating good overall video quality. However, a mean and standard deviation breakdown would give a better estimate. Moreover, similar results are warranted for the other two comparison methods, Seg2Vid and Med-ddpm. A boxplot would be a better representation for the final submission.

    D. The authors evaluate their work on a single private dataset and do not provide an anonymous code repository link for reproducibility.
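Regarding weakness B, a mask-weighted PatchNCE term might look like the following sketch (an assumption-laden illustration, not the paper’s exact formulation; mask_enhanced_patchnce, feat_q, feat_k, and mask_w are hypothetical names):

import torch
import torch.nn.functional as F

def mask_enhanced_patchnce(feat_q, feat_k, mask_w, tau=0.07):
    # feat_q, feat_k: (N, C) features of N corresponding patches from the
    # generated and real frames; mask_w: (N,) per-patch weights from the
    # knowledge mask (higher = clinically important region).
    feat_q = F.normalize(feat_q, dim=1)
    feat_k = F.normalize(feat_k, dim=1)
    logits = feat_q @ feat_k.t() / tau                         # (N, N)
    targets = torch.arange(feat_q.size(0), device=feat_q.device)
    nce = F.cross_entropy(logits, targets, reduction='none')   # per-patch loss
    # Up-weight patches inside the knowledge mask so lesion regions
    # dominate the contrastive objective.
    return (mask_w * nce).sum() / mask_w.sum().clamp(min=1e-8)

Separating the masked and unmasked contributions of such a term (e.g., mask_w set to all ones versus the knowledge mask) would make the corresponding ablation in Table 1 easier to interpret.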

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    See point D of Section 6 (Weakness of the paper).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Here are a few suggestions to improve the paper: A. Figures should be self-explanatory; a figure whose message requires a long caption remains unclear to the reader.

    B. The main paper could expand upon the Human assessment by providing a result table for different models (in terms of mean +/- standard deviation). This would have illustrated the model’s effectiveness in real-world applications better.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on the proposed architecture, which tries to address and solve the FFA video generation from multi-time points data, and detailed experimentations that validate the model’s performance, I recommend this paper to be accepted. A notable strength of this paper is the importance of mask-guided contrastive loss functions that focus on regions of interest in the retinal subspace. The authors are suggested to address the points mentioned in the weaknesses to strengthen their work and reinforce the paper’s argument.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I am content with the author’s response and my rating will remain the same.




Author Feedback

The authors would like to thank all reviewers (R1, R4, R5) for their valuable and constructive comments. In this rebuttal, we address the key concerns raised by the reviewers.

(R1) Human assessment. Regarding our own model, the final score is 2.12 (+/- 1.07), and we will include the standard deviation in the final version. However, we observed that the other two methods, Seg2Vid and Med-ddpm, did not perform as well visually, as they were not tailored for this specific task. Therefore, they may not provide a meaningful comparison for human evaluation.

(R4) Why not diffusion models. We have compared our model with a recently proposed diffusion model, Med-ddpm, and the results demonstrate the superiority of our approach as shown in Table 1. We will discuss diffusion models related to our studied problem further in the introduction of the revised paper.

(R4) Noise impact in knowledge mask. We conducted comparative experiments with different thresholds for knowledge masks. Due to space constraints, these results were not included in the paper. Currently, we have selected hyperparameters that ensure the best generation results. Our future work will further analyze the impact of noise in the mask. While we acknowledge the presence of noise, our simple method for obtaining masks has effectively covered scenarios containing diverse types of noise in our experiments, and the results demonstrated that introducing the proposed knowledge mask consistently enhances performance.

(R5) Application value and data release. We would like to emphasize that our main contribution lies in the application study of cross-modal translation. Specifically, we introduced a novel problem of generating dynamic FFA videos from static fundus images, which has clinical significance and represents a key innovation. As there were no existing datasets or methods for this problem, we collected our own data and designed the clinical knowledge masks and the entire network architecture. The dataset’s creation was not the focus of our task, and due to privacy concerns, it cannot be made public, but we promise to release our code in the final version. Regarding generalization ability, our model’s adaptable structure not only addresses the novel task of FFA video generation but also has the potential to extend to video generation for other similar modalities. We plan to explore these extensions in future work, building on the demonstrated feasibility of using GANs for image translation across different modalities, such as from fundus images to ICGA images.

(R5) Additional experiments. Regarding the suggestion to include experiments with additional hyperparameters, we have done experiments with different thresholds for the knowledge mask and different lambdas. Due to space limitations, we only presented the results of the best settings. As for comparing with more novel models, we have chosen Seg2Vid (2019) based on optical flow and Med-ddpm (2023) based on the diffusion model, both of which we consider sufficiently representative in the video generation field. We will include more comparison methods in the final version.

(R1&R4&R5) Expression issues & typos. We will correct and improve unclear or incorrect expressions in the final paper. R1: Regarding the validation split, we used its results during training to select the best epoch, but the fair evaluation is on the test split, hence we did not present results from the validation split. For Table 1, we will add a column indicating whether ground truth was used and symbols of different loss functions. For Figures, we will either add more legends or move unnecessary title descriptions to the main text to shorten captions and make figures self-contained. R4: For the concern on the words “extended to other images” in the introduction, considering our current experiments did not cover this, we will rectify it. R5: We will thoroughly proofread the paper and correct any typos.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    In rebuttal, authors highlight the clinical relevance and potential of the approach of generating dynamic FFA videos from static CF images using an autoregressive GAN. All reviewers agree to accept the paper after rebuttal.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Accept as highlighted by all the reviewers.



