Abstract

Accurate deep learning-based segmentation of retinal arteries and veins (A/V) enables improved diagnosis, monitoring, and management of ocular fundus diseases and systemic diseases. However, existing resized and patch-based algorithms face challenges with redundancy, overlooked thin vessels, and underperformance in low-contrast edge areas of retinal images, due to imbalanced background-to-A/V ratios and limited context. Here, we have developed a novel deep learning framework for retinal A/V segmentation, named RIP-AV, which, for the first time, integrates a Representative Instance Pre-training (RIP) task with a context-aware segmentation network. Initially, we develop a straightforward yet effective algorithm for vascular patch-pair selection (PPS) and then introduce the RIP task, formulated as a multi-label problem, to enhance the network’s capability to learn latent arteriovenous features from diverse spatial locations across vascular patches. Subsequently, in the training phase, we introduce two novel modules: a Patch Context Fusion (PCF) module and a Distance Aware (DA) module. They are designed to improve the discriminability and continuity of thin vessels, especially in low-contrast edge areas, by leveraging the relationship between vascular patches and their surrounding contexts cooperatively and complementarily. The effectiveness of RIP-AV has been validated on three publicly available retinal datasets, AV-DRIVE, LES-AV, and HRF, demonstrating remarkable accuracies of 0.970, 0.967, and 0.981, respectively, thereby outperforming existing state-of-the-art methods. Notably, our method achieves a significant 1.7% improvement in accuracy on the HRF dataset, particularly enhancing the segmentation of thin arteries and veins in edge areas.
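
For illustration, the multi-label formulation of the RIP task described above can be pictured with a minimal sketch (an assumption-based example, not the authors' implementation): a sampled vascular patch is labeled by which vessel types it contains, so a standard multi-label loss such as binary cross-entropy applies.

```python
import numpy as np

def rip_target(artery_patch: np.ndarray, vein_patch: np.ndarray) -> np.ndarray:
    """Multi-label RIP target for one vascular patch: [artery present, vein present].

    A patch containing only arteries maps to [1, 0], only veins to [0, 1],
    and both vessel types to [1, 1], so the pre-training task can be optimized
    with a standard multi-label loss such as binary cross-entropy.
    """
    return np.array([artery_patch.any(), vein_patch.any()], dtype=np.float32)

# Toy example: a 64x64 patch whose annotation contains both an artery and a vein.
artery = np.zeros((64, 64), dtype=np.uint8); artery[10:12, :] = 1
vein = np.zeros((64, 64), dtype=np.uint8); vein[40:42, :] = 1
print(rip_target(artery, vein))  # -> [1. 1.]
```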

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1711_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/weidai00/RIP-AV

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Dai_RIPAV_MICCAI2024,
        author = { Dai, Wei and Yao, Yinghao and Kong, Hengte and Chen, Zhen Ji and Wang, Sheng and Bai, Qingshi and Sun, Haojun and Yang, Yongxin and Su, Jianzhong},
        title = { { RIP-AV: Joint Representative Instance Pre-training with Context Aware Network for Retinal Artery/Vein Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The method learns the segmentation and classification of vessels into arteries/veins from fundus images, a clinically significant problem in hypertensive retinopathy. Offline, discriminative arteriovenous embeddings are learned from multi-scale patches sampled across the retina. Then, in the main process, patches representing local and wider context are input into a dual-stream ConvNeXt encoder that extracts multi-scale features, which are processed by two novel modules: the distance aware (DA) module and the patch context fusion (PCF) module. DA learns to refine vessel boundaries within a region based on the distance from vessel edges. PCF combines the outputs of the encoders with attention mechanisms to extract more contextual information. The result is decoded into a 3-channel prediction map: vessel segmentation and artery/vein. A PatchGAN discriminator refines the segmentation map. Tested on 3 public datasets with a diverse range of images, the method marginally exceeds the performance of other methods.
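
    As a point of reference for the PatchGAN component mentioned in this summary, below is a generic PatchGAN-style discriminator sketch in PyTorch; the layer count, channel widths, and normalization are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Generic PatchGAN-style discriminator: maps an input map (e.g. a 3-channel
    A/V prediction or its ground truth) to a grid of patch-wise real/fake logits."""

    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        layers, ch = [nn.Conv2d(in_ch, base, 4, stride=2, padding=1),
                      nn.LeakyReLU(0.2, inplace=True)], base
        for _ in range(2):                      # two more downsampling stages
            layers += [nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),
                       nn.InstanceNorm2d(ch * 2),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch *= 2
        layers += [nn.Conv2d(ch, 1, 4, stride=1, padding=1)]  # patch-wise scores
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# A 256x256 input yields a 31x31 grid of logits, each judging one receptive-field patch.
scores = PatchDiscriminator()(torch.randn(1, 3, 256, 256))
print(scores.shape)  # torch.Size([1, 1, 31, 31])
```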

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Methodology: The authors propose several novelties in the form of new modules for feature extraction of different categories of vessels, the combination of contextual information, and finally the refinement of the segmentation map. Their combination in a common framework is also novel. As far as I am concerned, it is the first time I have seen a pre-training module applied to learn diverse embeddings for the different categories of vessels (RIP module). At the same time, the fusion of multi-scale information with attention mechanisms, even though it is not new (see ref [1]: DA-Net in MICCAI 2022), is novel in the context of an artery/vein classification system.

    Validation: They perform an extensive validation on 3 public datasets with a diverse range of image resolutions, from low to very high. Also, the visualization with the feature projection (t-SNE) in Figure 3 helps to understand the value of including the RIP module in the training workflow.

    Reproducibility: The authors provide the necessary tools for the MICCAI community to reproduce the experiments and results. They provide the code implementation with the necessary dependencies, the evaluation of their method, and the dataset used in the study. These are also supplemented with the trained weights of the proposed model, so the community can replicate the algorithm exactly.

    Application: The proposed method can be directly applied to the extraction of clinically useful biomarkers, like the artery/vein ratio (AVR), to assess the hypertensive retinopathy grade.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The difference in segmentation performance between the proposed method and existing work is very small. The improvements probably come from the largest retinal vessels, not the smallest ones. The standard deviation is not provided, and the authors do not provide a statistical significance analysis of their results. The segmentation problem is highly imbalanced, so the contribution of the smallest retinal vessels cannot be measured with the standard Sensitivity, Specificity, and Accuracy metrics.

    2) The proposed patch context fusion (PCF) module for fusing multi-scale context is not a new idea. It has been proposed as an approach in [1].

    [1] DA-Net: Dual Branch Transformer and Adaptive Strip Upsampling for Retinal Vessels Segmentation, MICCAI 2022

    3) Even though the authors discuss the limitations of existing works for the segmentation of the smallest retinal vessels, in the experiments they do not examine in detail the performance of the method against existing methods on this category of vessels. The authors also do not discuss the limitations of their method.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) How does the authors’ method compare, in terms of methodology and segmentation performance, with the following method [1], particularly with respect to the smallest retinal vessels? The method in [1] combines local and global context with transformers, like the work presented in this paper. The authors of [1] give special consideration to the segmentation of the smallest retinal vessels by applying an adaptive strip upsampling block, whereas this work does not treat the smallest-vessel problem as significant. [1] DA-Net: Dual Branch Transformer and Adaptive Strip Upsampling for Retinal Vessels Segmentation, MICCAI 2022

    2) In the introduction, please correct the statement that the decrease in the arteriovenous width ratio is triggered by microaneurysms (MAs). The reduction in AVR is not caused by microaneurysms; it is probably related to, but not caused by, the MAs.

    3) The clarity of Figure 1 is very low; some fonts are very small. Please increase their size to make the figure more legible; the insets, for example the distance maps, are barely visible.

    4) Is the patch-pair selection in the RIP module robust to noisy examples? In diabetic retinopathy there can be cases where the area around the vessels contains other types of lesions, like haemorrhages. Also, advanced stages of DR can cause venous beading; is the method robust enough to segment diseased vessels and classify them as veins or arteries? Another case is areas with many junctions and bifurcations, where A/V classification methods usually fail to either segment the vessels or correctly classify them as veins or arteries. How does the method perform in this category of regions?

    5) Please include the names of the methods in the first column of Table 1. For example, A. Galdran [5], A. Galdran [6], L. Li [7], …, Luo [14].

    6) In Table 1, are the differences in the metrics statistically significant? Also, do the performance improvements in the results originate from better small-vessel segmentation or from better large-vessel boundary detection?

    7) What are the training and inference times of the authors’ method compared to existing work? Could the method fit within the time constraints of a clinical workflow?

    8) Please consider using UMAP for the feature projection instead of t-SNE. UMAP is faster, newer, and gives better representations/visualizations.
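
    Should the authors adopt this suggestion, a minimal example with the umap-learn package could look as follows; the feature and label arrays are placeholders standing in for the RIP encoder embeddings and their patch labels.

```python
import numpy as np
import umap  # pip install umap-learn
import matplotlib.pyplot as plt

# Placeholder data: substitute the RIP encoder embeddings and their patch labels.
features = np.random.rand(500, 128)
labels = np.random.randint(0, 3, size=500)  # e.g. 0 = artery, 1 = vein, 2 = both

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=5, cmap="viridis")
plt.title("UMAP projection of patch embeddings")
plt.show()
```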

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Even though some of the components of the method are novel, the validation demonstrated that the performance is comparable with existing works in the literature.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I understand that there are significant methodological differences compared to existing work, and that the proposed method has contributions that merit acceptance. Based on the authors’ rebuttal, I am going to increase the score. I expect to see the minor changes highlighted to the authors incorporated in the camera-ready paper.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a new method, called RIP-AV, for the segmentation and classification of retinal blood vessels in color fundus images. The contributions of the paper are twofold. First, the authors propose a new pre-training method to increase the discriminative ability of the encoder. This method is based on the classification of patches into three classes: artery, vein, and both. Second, they propose two new modules: a Patch Context Fusion (PCF) module to fuse global and local features, and a Distance Aware (DA) module (and corresponding loss function) that computes the distance transform map from shallow features to improve the delineation of vessels. The proposed method was evaluated on three public datasets, DRIVE, LES-AV, and HRF, and shows state-of-the-art performance.
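
    To make the distance-map idea concrete, the following sketch illustrates the kind of supervision signal a distance-aware branch could regress (an assumed example, not the authors' DA module): a normalized distance-transform target derived from a binary vessel mask.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_target(vessel_mask: np.ndarray) -> np.ndarray:
    """Per-pixel distance to the nearest non-vessel pixel, normalized to [0, 1].

    Values peak along vessel centerlines and drop to zero at vessel edges,
    which is the kind of signal a distance-aware branch could regress to
    sharpen vessel delineation.
    """
    dist = distance_transform_edt(vessel_mask.astype(bool))
    return dist / dist.max() if dist.max() > 0 else dist

# Toy example: a 3-pixel-wide vertical "vessel" in a 9x9 mask.
mask = np.zeros((9, 9), dtype=np.uint8)
mask[:, 3:6] = 1
print(distance_target(mask).round(2))
```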

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The problem addressed is relevant and the contributions are clearly explained.
    • The proposed methods are well motivated and straightforward.
    • The evaluation was performed on three public datasets.
    • The performance seems to be remarkable.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Incomplete literature review and comparison. Relevant work is not cited or compared to the proposed method. Morano et al. (2021) (see full references in Constructive Comments), along with Chen et al. (2022), were the first to propose a multi-label loss function for retinal artery/vein classification like the one used in the proposed method. In addition, Karlsson and Hardarson (2022) and Zhao et al. (2024) represent the state of the art in retinal artery/vein classification.
    2. Lack of cross-dataset evaluation. It is not possible to know the generalization ability of the proposed method to other datasets.
    3. Lack of details in the proposed method and in the experimental setup. For example, it is not clear to me how the patch predictions are integrated to obtain the final complete segmentation map, or which splits were used for training and testing on the LES-AV and HRF datasets, which do not have a standard split.
    4. The metrics used for evaluation are insufficient. More metrics are needed to thoroughly evaluate the performance of the proposed method. In addition, it is not clear how these metrics are computed. For all pixels? For the detected vessel pixels only? For the ground truth vessel pixels only?
    5. Not entirely appropriate ablation study. Although it is a widespread practice, the ablation study should not be performed on the same dataset used for comparison with the state-of-the-art methods. In addition, the ablation study lacks a comparison without the discriminator.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    All the details needed to reproduce the experiments are provided and all the datasets used are publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors should discuss and compare the proposed method with relevant work that is neither cited in the paper nor included in the comparison (Morano et al., 2021; Karlsson & Hardarson, 2022; Zhao et al., 2024).
    2. The lack of generalizability of the models is a well-known problem in the field of retinal vessel segmentation and classification. For this reason, it is important to evaluate the proposed method in cross-dataset settings, as was done in paper Refs. 5 and 6, where the models were trained on DRIVE and tested on LES-AV. I suggest that the authors perform the same evaluation.
    3. The authors should provide a clearer explanation of how the patch predictions are integrated to obtain the final complete segmentation map. The authors should clearly indicate which splits were used for training and testing in the LES-AV and HRF datasets, which do not have a standard split, and different splits can lead to different results. It is not clear to me whether the proposed method was evaluated on the same splits as the state-of-the-art methods. Please indicate if this is not the case.
    4. The authors should clearly describe how the metrics are computed (e.g., whether they are computed for all pixels, for detected vessel pixels only, or for ground truth vessel pixels only); a minimal sketch of the last convention is given after this list. Importantly, this should be consistent across all methods compared, or at least clearly stated otherwise. For example, in Ref. 13, the results for the ground truth vessel pixels are reported, whereas Ref. 12 only reports the results for the detected pixels. Furthermore, I suggest including more metrics typically used in the literature, such as AUROC for retinal vessel segmentation (Karlsson & Hardarson, 2022) and the metrics used for different vessel widths (Chen et al., 2022; Karlsson & Hardarson, 2022).
    5. The ablation study should be performed on a separate validation set. In addition, it should include a comparison without the discriminator to better understand the contribution of this component to the final performance of the proposed method.
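
    To make the metric-computation question in point 4 concrete, here is a minimal sketch, with assumed array names and artery treated as the positive class, of artery/vein metrics evaluated on ground-truth vessel pixels only (illustrative; not the paper's evaluation code):

```python
import numpy as np

def av_metrics_on_gt_vessels(pred_artery, gt_artery, gt_vein):
    """Artery-vs-vein metrics evaluated only at ground-truth vessel pixels.

    Inputs are binary HxW arrays; artery is treated as the positive class,
    so 'specificity' here measures how many ground-truth vein pixels are
    not called artery. Pixels outside the ground-truth vasculature are ignored.
    """
    gt_vessel = gt_artery.astype(bool) | gt_vein.astype(bool)
    a_true = gt_artery.astype(bool)[gt_vessel]
    a_pred = pred_artery.astype(bool)[gt_vessel]
    tp = np.sum(a_true & a_pred)
    tn = np.sum(~a_true & ~a_pred)
    fp = np.sum(~a_true & a_pred)
    fn = np.sum(a_true & ~a_pred)
    sens = tp / max(tp + fn, 1)   # sensitivity on ground-truth artery pixels
    spec = tn / max(tn + fp, 1)   # specificity on ground-truth vein pixels
    acc = (tp + tn) / max(gt_vessel.sum(), 1)
    return sens, spec, acc
```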

    References:

    • Chen, Wenting, et al. “TW-GAN: Topology and width aware GAN for retinal artery/vein classification.” Medical Image Analysis 77 (2022): 102340.
    • Karlsson, Robert Arnar, and Sveinn Hakon Hardarson. “Artery vein classification in fundus images using serially connected U-Nets.” Computer Methods and Programs in Biomedicine 216 (2022): 106650.
    • Morano, José, et al. “Simultaneous segmentation and classification of the retinal arteries and veins from color fundus images.” Artificial Intelligence in Medicine 118 (2021): 102116.
    • Zhao, Aidi, et al. “Optimization of retinal artery/vein classification based on vascular topology.” Biomedical Signal Processing and Control 88 (2024): 105539.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The performance of the method is remarkable, but the paper lacks a thorough comparison with the state-of-the-art methods and a cross-dataset evaluation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have convincingly addressed all my concerns regarding previous work, evaluation, and the explanation of their method.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel architecture for retinal artery/vein segmentation. The proposed architecture has three main components: representative instance pre-training for arteriovenous feature learning, a distance aware module for accurate prediction of edges, and a patch context fusion module for leveraging surrounding context.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The experimental results and the visualizations show the effectiveness of the proposed method.
    2. The proposed distance aware module and the idea of pre-training with a patch classification task are interesting.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. I doubt the novelty of the patch-pair selection algorithm. It seems that it is just used for selecting vessel-centered patches and their corresponding context images. The algorithm is too complex for such a simple task.
    2. The evaluation of the effectiveness of the discriminator is missing.
    3. The patch context fusion module seems very similar to reference [23].
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors may consider optimizing the patch-pair selection algorithm.
    2. The authors may explain in more detail the differences between the patch context fusion module and [23].
    3. Some writing problems: (1) As ‘peripheral region’ has a specific meaning in ophthalmology, the authors may choose another term to refer to the peripheral region of the image. (2) The resolution of DRIVE is 565×584, and 1 image in LES-AV has a different resolution.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although there are some drawbacks in the writing of the manuscript, the method itself is novel and effective.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I maintain the “Weak Accept” opinion. The authors have addressed my concerns regarding the PCF module, but the issue with PPS remains.




Author Feedback

We thank the reviewers for their thoughtful comments, for their appreciation of the remarkable performance of our model (R3), and for assessing the RIP task as a “first” (R1) and “interesting” (R4). Below, we address the main concerns raised:

  1. The importance of PCF in our model compared to previous methods (R1, R3, R4). We appreciate the reviewers’ reminder that the idea of PCF has been proposed before, which we fully acknowledge. Our PCF uniquely combines self-attention and cross-attention to fuse local and contextual features, different from methods like ref [23], which uses cross-attention, or DA-Net, which employs a dual-transformer approach. Also, our innovation lies in the combination of well-targeted modules, including Patch-pair Selection (PPS), Distance Aware (DA), and PCF, to achieve precise A/V segmentation. We are dedicated to overcoming the challenge of distinguishing small vessels and vessels in pathological noise regions, which cannot be addressed by a single contextual fusion strategy. Accordingly, we introduce PPS to select small A/V at edges and major vessels at centers, and DA to distinguish vascular features from noise. PCF conducts context fusion on top of PPS and DA (see Fig 1) to maintain continuity within a patch and promote interactions between patches globally, which is crucial for accurate A/V segmentation in images with dense vessels and noise. We apologize for the lack of discussion of the rationale behind the model’s improvements in the current version; this content will be added in the final version. Of note, due to the high density of small vessels and lesions in the HRF dataset used for model training and testing, we observed a significant improvement (1.7%) over existing methods, highlighting the robustness of our approach (see Table 1 and Fig 2).
  2. Performance assessment (R1). In the final version, we will perform statistical analyses and incorporate the standard deviation of SN, SP, and ACC to provide a more comprehensive assessment of the proposed model’s performance. Additionally, UMAP will be used to visualize the feature projection of RIP.
  3. Details of experimental setup (R3). We apologize for any confusion due to our unclear descriptions. Our metrics are calculated based on ground-truth vessels, consistent with Ref. 13. Regarding the experimental setup, we used patch-pair selection at different scales (AV-DRIVE [64, 128, 256]; LES-AV and HRF [96, 128, 256]) to extract multi-scale patches and their context for model training. For testing, sliding windows with an overlap of 150 pixels were used to extract patches, with each patch’s context being the entire image; the patch results were integrated using the strategies from refs [11-14] (a schematic sketch of this tile-and-stitch procedure is given below). These details will be included in Sect 3.2 of the final version.
  4. Comparison and extra evaluation metrics (R1, R3). We appreciate the suggestion for additional comparisons and will conduct a comprehensive analysis in the final version. While we initially used the standard metrics SN, SP, and ACC, we acknowledge their limitations for evaluating vessels of varying widths. In the final version, we will report AUROC, AUPRC, SN, SP, and ACC according to vessel width variations.
  5. Other concerns (R1, R3, R4). 1) Training on the high-resolution HRF dataset using a single 4090 GPU takes approximately eight hours, with an inference time of less than one second per image, making it suitable for clinical applications (R1). 2) We will add citations (Morano et al., Karlsson and Hardarson, and Zhao et al.) and discuss them in Sect. 1, along with cross-validation results, in the final version (R3). 3) An ablation study on an independent dataset will be performed, including performance without a discriminator (R3, R4).

We are grateful for the reviewers’ constructive feedback and will ensure these enhancements are reflected in the camera-ready version.
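
The following sketch outlines the sliding-window tiling and stitching referenced in point 3 above; the window size, stride handling, and averaging of overlaps are illustrative assumptions rather than the exact integration strategy of refs [11-14].

```python
import numpy as np

def predict_full_image(image, model, patch=256, overlap=150):
    """Tile the image with overlapping windows, run the patch model on each
    window, and average the predictions where windows overlap."""
    h, w = image.shape[:2]              # assumes h, w >= patch
    stride = patch - overlap
    out = np.zeros((3, h, w), dtype=np.float32)     # vessel / artery / vein maps
    weight = np.zeros((1, h, w), dtype=np.float32)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            # In RIP-AV the patch is paired with the whole image as context;
            # that second input is omitted here for brevity.
            pred = model(image[y:y + patch, x:x + patch])   # -> (3, patch, patch)
            out[:, y:y + patch, x:x + patch] += pred
            weight[:, y:y + patch, x:x + patch] += 1.0
    # Border windows that do not align with the stride are not handled here.
    return out / np.maximum(weight, 1.0)
```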




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


