Abstract

In the realm of medical image fusion, integrating information from multiple modalities is crucial for improving diagnosis and treatment planning, especially in retinal health, where important features manifest differently across imaging modalities. Existing deep learning-based approaches pay insufficient attention to retinal image fusion and consequently fail to preserve anatomical structure and fine vessel detail. To address this, we propose the Topology-Aware Graph Attention Network (TaGAT) for multi-modal retinal image fusion, leveraging a novel Topology-Aware Encoder (TAE) with Graph Attention Networks (GAT) to enhance spatial features with the graph topology of the retinal vasculature, which is consistent across modalities. The TAE encodes the base and detail features, extracted from retinal images via a Long-short Range (LSR) encoder, into a graph extracted from the retinal vessels. Within the TAE, the GAT-based Graph Information Update block dynamically refines and aggregates the node features to generate topology-aware graph features. The updated graph features are then combined with the base and detail features and decoded into the fused image. Our model outperforms state-of-the-art methods on Fluorescein Fundus Angiography (FFA) with Color Fundus (CF) and Optical Coherence Tomography (OCT) with confocal microscopy retinal image fusion. The source code can be accessed via https://github.com/xintian-99/TaGAT.
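As a concrete illustration of the Graph Information Update idea sketched above, here is a minimal, hypothetical example assuming PyTorch Geometric; the class name, dimensions, and graph are illustrative and do not reproduce the authors' implementation.

```python
# Illustrative sketch only (not the TaGAT code): a GAT-based block that
# refines node features defined on a retinal-vessel graph.
import torch
from torch_geometric.nn import GATConv  # assumes torch_geometric is installed

class GraphInformationUpdate(torch.nn.Module):
    """Hypothetical two-layer GAT refining vessel-graph node features."""
    def __init__(self, in_dim: int, hidden_dim: int, heads: int = 4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden_dim, heads=heads)
        self.gat2 = GATConv(hidden_dim * heads, in_dim, heads=1)

    def forward(self, x, edge_index):
        # x: (num_nodes, in_dim) image features sampled at vessel nodes;
        # edge_index: (2, num_edges) vessel-graph connectivity.
        h = torch.relu(self.gat1(x, edge_index))
        return self.gat2(h, edge_index)  # topology-aware node features

# Toy usage: 5 vessel nodes with 64-d features on a small chain graph.
x = torch.randn(5, 64)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
out = GraphInformationUpdate(64, 32)(x, edge_index)  # -> shape (5, 64)
```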

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0115_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/xintian-99/TaGAT

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Tia_TaGAT_MICCAI2024,
        author = { Tian, Xin and Anantrasirichai, Nantheera and Nicholson, Lindsay and Achim, Alin},
        title = { { TaGAT: Topology-Aware Graph Attention Network For Multi-modal Retinal Image Fusion } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper designs a Topology-Aware Graph Attention Network (TaGAT) for multi-modal retinal image fusion that preserves anatomical structures and fine vessel details better than existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. It is novel to incorporate vascular features from fundus images into the fusion method through graph networks. 2. The paper is clearly written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Given the claim that “The fusion results can not only enhance the visualization and analysis of retinal disease directly by clinicians but also a range of downstream tasks, including vessel segmentation, disease classification, and disease progression monitoring.”, an analysis of retinal disease on downstream tasks is necessary. 2. The method relies on segmentation; however, the paper contains minimal description of the segmentation part. For example, it lacks a detailed account of how the segmentation masks for the two datasets in the experiment were created: manually, or with automatic segmentation tools?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. Including some experiments to demonstrate clinical efficacy would make the findings more compelling, particularly regarding vascular segmentation tasks and the diagnosis of vascular-related diseases.

    2. Please consider providing more details about the segmentation and registration methods used for the two datasets.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method of the paper is sufficiently innovative; however, its clinical applicability still requires consideration.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors explained the segmentation and registration methods used in the two datasets in their feedback. The fusion method is sufficiently novel, but further clinical validation is still needed.



Review #2

  • Please describe the contribution of the paper

    A Topology-Aware Graph Attention Network (TaGAT) is proposed for multi-modal retinal image fusion. The key component is a Topology-Aware Encoder (TAE) that leverages Graph Attention Networks (GAT) to enhance spatial features with the consistent graph topology of retinal vasculature across modalities. The proposed method performs well on two tasks: FFA-CF and OCT-Confocal retinal image fusion.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The motivation of utilizing vessel information for cross-modality image fusion is reasonable and convincing.

    Good visualization in Fig. 2. For example, I can see the PPA region in the fused image that is not visible in the fundus image (first row). However, I suggest pointing this out in the paper for readers who lack background knowledge of fundus imaging.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed method requires registration between the two modalities. It also requires an additional vessel segmentation model. Apparently, the performance depends heavily on the quality of registration and vessel segmentation, which is neither discussed nor quantitatively verified. The authors also did not mention how they trained or acquired the vessel segmentation network.

    The narrative needs more clarification. For example, I_f is used in equation (10) without being declared. Is it the output fused image or the ground-truth fused image? Is a ground-truth fused image used in L_int^II (Eq. 10)? Is a ground-truth fused image directly obtainable from the dataset for training and evaluation?

    I acknowledge the difficulty of collecting paired cross-modality datasets. However, the dataset seems too small (only 40 for training, 19 for testing).

    Most importantly, the authors highlight the role of image fusion in downstream tasks, including disease classification, segmentation, etc. However, the experiments report only image-quality metrics, which is not direct enough. The authors are encouraged to evaluate how fused images benefit downstream tasks. For example, on DRFF (or other, larger datasets), train a simple normal/abnormal classification network (ResNet50 or similar) given CF input, FFA input, CF-FFA channel-concatenated input, and CF-FFA fusion input. We should expect better classification performance with CF-FFA fusion images as input than with the other input forms (see the sketch after this list).

    Last, is there any need to use both ViT and CNN in the LSR encoder? In the ablation study, the authors are encouraged to try CNN-only and ViT-only variants and compare them with ViT+CNN (proposed).
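    A minimal sketch of the suggested protocol is given below, assuming torchvision; the model choice, channel counts, and setup are the reviewer's illustration, not experiments from the paper.

```python
# Sketch of the suggested downstream check (not from the paper): train the
# same classifier on four input forms and compare accuracy. Assumes
# torchvision; data loading and the training loop are left out.
import torch.nn as nn
from torchvision.models import resnet50

def make_classifier(in_channels: int, num_classes: int = 2) -> nn.Module:
    model = resnet50(weights=None)
    # Adapt the stem to the input form (e.g., 6 channels for CF||FFA).
    model.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                            stride=2, padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# One classifier per input form; identical architecture and training setup.
classifiers = {
    "cf": make_classifier(3),      # color fundus only
    "ffa": make_classifier(3),     # angiography only
    "concat": make_classifier(6),  # CF and FFA stacked along channels
    "fused": make_classifier(3),   # TaGAT fusion output
}
```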

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please address the weaknesses mentioned above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is interesting, but the paper needs more clarification and experiments, as mentioned in the weaknesses.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of my comments.



Review #3

  • Please describe the contribution of the paper

    • This paper introduces an end-to-end topology-aware graph attention network framework for multi-modal retinal image fusion.
    • A GAT-based Topology-Aware Encoder is devised, which enhances feature representation and model generalization and ensures the preservation of important anatomical structures and fine vasculature details in retinal image fusion.
    • The proposed method achieves leading performance in retinal image fusion, evaluated on both the DRFF (FFA-CF) and OCT2Confocal datasets, with exceptional preservation of fine structures, details, and textures.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • This paper introduces an end-to-end topology-aware graph attention network framework for multi-modal retinal image fusion.
    • A GAT-based Topology-Aware Encoder is devised, which enhances feature representation and model generalization and ensures the preservation of important anatomical structures and fine vasculature details in retinal image fusion.
    • The proposed method achieves leading performance in retinal image fusion, evaluated on both the DRFF (FFA-CF) and OCT2Confocal datasets, with exceptional preservation of fine structures, details, and textures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • In Fig. 1, the authors should explain the abbreviations. Also, the structure of the Graph Fusion Layer should be explained.
    • The dataset is very small, so how do the authors handle the training data? How is spatial alignment of data features maintained during training? The authors should describe these issues in detail.
    • From Table 2, we can see that the GCN can improve the performance greatly; thus, I suggest that the authors compare more GCN methods to verify the effectiveness of TaGAT.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    • In Fig. 1, the authors should explain the abbreviations. Also, the structure of the Graph Fusion Layer should be explained.
    • The dataset is very small, so how do the authors handle the training data? How is spatial alignment of data features maintained during training? The authors should describe these issues in detail.
    • From Table 2, we can see that the GCN can improve the performance greatly; thus, I suggest that the authors compare more GCN methods to verify the effectiveness of TaGAT.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduces an end-to-end framework of topology-aware graph attention network for multi-modal retinal image fusion. The experimental results verify the effectiveness of the proposed module.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We would like to thank all reviewers for their constructive comments and for acknowledging the novelty and effectiveness of incorporating vascular features through graph networks into fundus image fusion.

1. Clarify vessel segmentation and registration details (R#1-2, R#3-1, R#4-2): We employed an automatic optimal-transport-based graph matching method [14] for retinal image registration (see Sec. 3.1, page 6, line 17). We implemented the 2D version of this method, which includes the wavelet-based vessel segmentation used for both the DRFF and OCT2Confocal datasets. For OCT-Confocal, due to its complexity, registration was performed manually and validated by ophthalmologists. We unfortunately omitted to cite [14] again in the Methodology section, and in hindsight these details were not clear enough; we will correct this in the revised manuscript.

2. Verify clinical efficacy and downstream tasks such as vessel segmentation and disease classification (R#1-1, R#3-4): We appreciate the suggestions for additional experiments, which align with our research goals. However, due to conference rebuttal policies, new/additional experimental results cannot be included in this rebuttal or in the camera-ready version of the manuscript upon eventual acceptance. Our statement in the Introduction that image fusion benefits downstream tasks such as vessel segmentation and disease classification is supported by numerous existing publications, such as [1] and “A review: deep learning for medical image segmentation using multi-modality fusion,” along with findings in related fusion tasks ([9], [12], [23], [24]). These are not the primary aims of this study, which focuses on enhancing fusion outcomes. Validation of downstream applications is thus beyond the scope of this submission but is nevertheless part of our ongoing work.

3. Small-dataset handling (R#3-3, R#4-2): For the DRFF dataset, we did in fact use data augmentation techniques, including flipping, rotation by ±8 degrees, and translation by ±20 pixels, to enhance diversity and representativeness.
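For illustration only, here is a minimal augmentation sketch matching the parameters above, assuming torchvision; the image size is an assumption, and for paired modalities the same sampled transform would need to be applied to both images (e.g., via functional transforms).

```python
# Illustrative sketch (assumes torchvision) of the augmentation described
# in the rebuttal: random flips, rotation within ±8 degrees, translation
# within ±20 pixels. The 512x512 image size is an assumption.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    # degrees=8 samples a rotation from [-8, +8] degrees; translate is
    # fractional, so ±20 px on a 512x512 image is roughly 20/512 ≈ 0.04.
    transforms.RandomAffine(degrees=8, translate=(20 / 512, 20 / 512)),
])
```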

4. Ablation study and model component justification (R#3-5): Regarding the use of CNN+ViT in the LSR encoder, we did explore both components individually. However, due to space limitations, and since this does not constitute our primary research focus, detailed results were not presented. The advantages of combining CNN and ViT are, however, well documented, e.g. in reference [23], including via ablation studies, and have shown significant improvements in various fusion applications including MRI-CT, MRI-PET, and Visible-Infrared.

5. Compare more GCN methods to validate TaGAT's effectiveness (clarification on the GCN vs. GAT impact in Table 2) (R#4-3): Table 2 illustrates our ablation study, specifically ‘V GAT→GCN’, where we replaced GAT [17] with GCN [7] to demonstrate the effectiveness of the additional dynamic attention mechanism. The intention was not to showcase GAT's performance improvement over GCN but to highlight the advantages of the attention mechanism by removing it from the system. The descriptions in Table 2 that may cause misunderstanding will be clarified in the final version. Exploring various forms of GCN is a good suggestion and aligns with our future work.
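As a minimal sketch of this ablation, assuming PyTorch Geometric, the swap amounts to exchanging the attention-based layer for a plain GCN layer of matching dimensions; the helper below is hypothetical.

```python
# Hypothetical helper illustrating the 'V GAT→GCN' ablation: keep the model
# fixed and swap only the graph layer. Assumes torch_geometric.
from torch_geometric.nn import GATConv, GCNConv

def make_graph_layer(in_dim: int, out_dim: int, use_attention: bool):
    if use_attention:
        # Dynamic attention over neighbours (full TaGAT variant).
        return GATConv(in_dim, out_dim, heads=1)
    # Fixed, degree-normalised aggregation (ablated variant).
    return GCNConv(in_dim, out_dim)
```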

6. Narrative clarity and technical details (R#3-2, R#4-1): For R#3-2, I_f in Eq. (10) represents the fused image produced by our model, not a ground-truth image. We will clearly state that no ground-truth fused images exist, as the fusion process inherently generates new images from the input modalities. For R#4-1, the Graph Fusion Layer (F_G) processes the graph features from the two modalities using multiple instances of an INN [3]. Each layer handles the two halves of the input tensor separately, progressively refining and then concatenating them to form the fused graph feature.
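To make the split-refine-concatenate description of the Graph Fusion Layer concrete, here is a minimal, illustrative coupling-block sketch assuming PyTorch; it is a generic INN-style layer under the stated scheme, not the authors' implementation of [3].

```python
# Illustrative INN-style coupling block (not the TaGAT code): split the
# input into two halves, refine one half conditioned on the other, then
# concatenate. All dimensions are hypothetical.
import torch
import torch.nn as nn

class CouplingLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        half = dim // 2
        self.scale = nn.Sequential(nn.Linear(half, half), nn.Tanh())
        self.shift = nn.Linear(half, half)

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)            # split into two halves
        x2 = x2 * torch.exp(self.scale(x1)) + self.shift(x1)
        return torch.cat([x1, x2], dim=-1)     # recombine refined halves

# A graph fusion layer could stack several such blocks over the
# concatenated graph features of the two modalities.
fuse = nn.Sequential(*[CouplingLayer(128) for _ in range(3)])
graph_feats = torch.cat([torch.randn(5, 64), torch.randn(5, 64)], dim=-1)
fused = fuse(graph_feats)  # -> (5, 128) fused graph feature
```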

We will ensure that all permitted corrections and improvements are included in the final version. The code and data will be made available upon acceptance, as already specified in our original submission.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers agree on the novelty of including segmentation information in the process of retina image fusion. Although the reviewers have some concerns about the size of the dataset and downstream validation, I believe the strengths outweigh the weaknesses.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


