Abstract

Glaucoma is an irreversible eye disease that has become a leading cause of blindness worldwide. In recent years, deep learning has shown great potential for computer-aided diagnosis in clinical practice. However, diversity in medical image quality and acquisition devices leads to distribution shifts that compromise the generalization performance of deep learning methods. To address this issue, many methods rely on deep feature learning combined with either data-level or feature-level augmentation; however, these methods suffer from a limited search space of feature styles. Previous research has indicated that introducing a diverse set of augmentations and domain randomization during training can expand this search space. In this paper, we propose a Randomized joint Data-feature augmentation and Deep-shallow feature fusion network (RDD-Net) for the automated diagnosis of glaucoma. It consists of three main components: Data/Feature-level Augmentation (DFA), Explicit/Implicit augmentation (EI), and Deep-Shallow feature fusion (DS). DFA randomly selects data/feature-level augmentation statistics from a uniform distribution. EI involves both explicit augmentation, which perturbs the style of the source-domain data, and implicit augmentation, which utilizes moment information. The randomized selection among these augmentation strategies broadens the diversity of feature styles. DS integrates deep and shallow features within the backbone. Extensive experiments show that RDD-Net achieves state-of-the-art effectiveness and generalization ability. The code is available at https://github.com/TangYilin610/RDD-Net.
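For concreteness, here is a minimal PyTorch-style sketch of the randomized joint augmentation described above, written from the abstract and the authors' rebuttal. The function names (explicit_style_perturb, implicit_moment_exchange, rdd_augment), the 0.5 threshold on λ1, and the exact form of the style perturbation are illustrative assumptions, not the authors' released implementation.

```python
import torch

def explicit_style_perturb(x, eps_range=(0.0, 1.0)):
    """Explicit augmentation (sketch): perturb the per-channel style statistics
    (mean/std) with noise sampled from a uniform distribution, then mix the
    perturbed statistics with the originals. The exact perturbation form is an
    assumption based on the rebuttal's description of Ref. [13]."""
    B, C = x.shape[0], x.shape[1]
    mu = x.mean(dim=(2, 3), keepdim=True)            # (B, C, 1, 1) channel means
    sigma = x.std(dim=(2, 3), keepdim=True) + 1e-6   # (B, C, 1, 1) channel stds
    lo, hi = eps_range
    eps_mu = torch.empty(B, C, 1, 1, device=x.device).uniform_(lo, hi)
    eps_sigma = torch.empty(B, C, 1, 1, device=x.device).uniform_(lo, hi)
    mix = torch.rand(B, 1, 1, 1, device=x.device)    # mix augmented and original stats
    mu_aug = mix * (mu + eps_mu) + (1 - mix) * mu
    sigma_aug = mix * (sigma * eps_sigma) + (1 - mix) * sigma
    return (x - mu) / sigma * sigma_aug + mu_aug

def implicit_moment_exchange(x):
    """Implicit augmentation (sketch): exchange the first/second moments of the
    learned features between training images via a random batch permutation."""
    perm = torch.randperm(x.shape[0], device=x.device)
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + 1e-6
    return (x - mu) / sigma * sigma[perm] + mu[perm]

def rdd_augment(x, shallow_block):
    """Randomized joint data/feature augmentation (sketch). lambda1 gates the
    data-level vs. feature-level input (the 0.5 threshold is an assumption),
    and lambda2 mixes the explicit and implicit branches; both ~ U(0, 1)."""
    lambda1 = torch.rand(())
    y = x if lambda1 > 0.5 else shallow_block(x)     # data-level vs. feature-level input
    lambda2 = torch.rand(())
    return lambda2 * explicit_style_perturb(y) + (1 - lambda2) * implicit_moment_exchange(y)
```

In this reading, shallow_block plays the role of S1 in Fig. 1, so the augmentation is applied either to the raw input (data level) or to the first-stage features (feature level).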

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3922_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/TangYilin610/RDD-Net

Link to the Dataset(s)

https://zenodo.org/records/12562673

BibTex

@InProceedings{Tan_RDDNet_MICCAI2024,
        author = { Tang, Yilin and Zhang, Min and Feng, Jun},
        title = { { RDD-Net: Randomized Joint Data-Feature Augmentation and Deep-Shallow Feature Fusion Networks for Automated Diagnosis of Glaucoma } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a statistical feature-based data augmentation strategy combined with a network containing ResNeSt50 and CBAM modules for glaucoma classification. The results and ablation studies show that their method improves performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The accuracy has been greatly improved based on their results.
    2. The experiments are well conducted, and the analysis of the results is sufficient.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In Section 2.1, what is the meaning of x_i ∈ R^(B×C)? The authors should explain this. The DFA in Figure 1 is ambiguous, and the flow of the figure should be clearer. For example, where is DFA implemented, and what is the meaning of R~(0,1)? The explanation of the methodology is disorganized. What is the relationship between the variable R(x) in Section 2.1 and Section 2.2?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In Figure 1, the input features to Deep-Shallow Feature Fusion (DS) are unclear; S1 and S2 appear twice. The authors need to discuss how these features are merged and draw a clearer diagram.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of using statistical features for data augmentation has been introduced in previous work [paper refs. 13, 17], namely the Explicit Styles Mixed Statistics and Implicit Moment Exchange components. The authors should precisely state their contribution and the differences from this previous research.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This work presents RDD-Net for the automated diagnosis of glaucoma. To address the challenge of distribution shifts in medical imaging data, RDD-Net employs a randomized joint data-feature augmentation strategy coupled with deep-shallow feature fusion. The experiments are comprehensive and the results are good.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. This work addresses a meaningful clinical problem, i.e., glaucoma recognition under potential distribution shifts. 2. The main idea of combining both data augmentation and feature augmentation is interesting and performs well. 3. Feature fusion across layers also benefits performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Deep-Shallow Feature Fusion (DS) is unclear. Since ResNeSt produces hierarchical features whose sizes vary at each stage, can they be concatenated directly? 2. Since the flatten operation in feature fusion would greatly increase the dimensionality, I wonder about the total params and FLOPs compared with the original ResNeSt. Besides, why not apply average pooling for feature aggregation?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. What does “DeepAll” in Table 2 mean? Is it a naive baseline? 2. Figure 3 is confusing; more description should be added. 3. More analysis of the results would be helpful.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors have conducted extensive experiments, including comparisons with other glaucoma recognition approaches and domain generalization methods, as well as ablation studies of each component. The experiments are comprehensive and the performance of the proposed approach is good.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my concerns.



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose a Randomized Joint Data-Feature Augmentation and Deep-Shallow Feature Fusion method for automated diagnosis of glaucoma.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The RDD-Net is designed to broaden the range of feature styles explored for automated glaucoma diagnosis. It consists of three main components: DFA, EI, and DS. DFA randomly selects Data-level and Feature-level Augmentation statistics from a uniform distribution. EI involves both Explicit augmentation, perturbing the style of the source domain data, and Implicit augmentation, utilizing moments information. DS integrates deep and shallow features within the backbone, facilitating deep-shallow layer feature fusion. Extensive experiments have shown that RDD-Net achieves state-of-the-art effectiveness and generalization ability on two benchmark datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The document is written in poor English; I noticed grammatical errors.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The document is clear, concise, and well organized. The strength of the contribution to the field is well represented in this article. The scientific quality of the article is good.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The document is clear, concise, and well organized. The strength of the contribution to the field is well represented in this article. The scientific quality of the article is good.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their insightful comments. We will address all the concerns.

Q1: Grammatical errors and reproducibility. (R1) We have asked a native English speaker to go through the paper thoroughly. The code will be released once the paper is accepted.

Q2: The contribution and differences between [Refs. 13, 17] and the proposed method. (R2) Ref. [17] proposed an implicit augmentation method that utilizes moment information: it swaps the moments of the learned features of training images. Ref. [13] proposed an explicit augmentation that perturbs the style of the source-domain data by randomly sampling from a uniform distribution; domain-invariant representations are learned by randomly mixing the augmented and original statistics. We propose a randomized joint data-feature augmentation and deep-shallow feature fusion network for the automated diagnosis of glaucoma. It consists of three main components: DFA, EI, and DS. DFA randomly selects data/feature-level augmentation statistics from a uniform distribution. EI involves both explicit augmentation, which perturbs the style of the source-domain data, and implicit augmentation, which utilizes moment information. The randomized selection of different augmentation strategies broadens the diversity of feature styles. DS integrates deep-shallow features within the backbone. Extensive experiments have shown that RDD-Net achieves state-of-the-art effectiveness and generalization ability.

Q3: Clarification: 1) the meaning of x_i ∈ R^(B×C); 2) the relationship between R(x) in Section 2.1 and Section 2.2. (R2) 1) x_i ∈ R^(B×C) represents the input features of the i-th batch in ResNeSt. 2) In Section 2.1, Eq. (1) indicates random sampling of the data/feature-level augmentation statistics from a uniform distribution. In Section 2.2, Eq. (2) is revised as R(y) = λ2·Explicit(y) + (1 − λ2)·Implicit(y), where y = x_i when λ1 = 1, y = S1(x_i) when λ1 = 0, and λ2 ~ U(0, 1).

Q4: In Fig. 1, the meaning of R ~ U(0,1), DFA, and DS are all unclear; S1 and S2 appear twice. The authors need to discuss this and draw a clearer diagram. (R2) We apologize for the ambiguous information in Fig. 1. We will draw a clearer diagram in the revised paper and provide more details. R ~ U(0,1) is a mistake; it is revised as λ1 ~ U(0, 1). Si (i = 1, 2, 3, 4) denotes the four layers in ResNeSt. DFA is the module that randomly selects the data or the feature as the input to EI, depending on λ1. S1 and S2, shown in different colors, represent the first two layers in ResNeSt used for data-level and feature-level augmentation, respectively. DS integrates deep-shallow features from S1 to S4.

Q5: Clarification on the concatenation in DS. (R5) We use the ‘concat’ operation to concatenate different layers along the 2nd dimension.

Q6: The comparison of params and FLOPs with ResNeSt. Why not apply average pooling for feature aggregation? (R5) We chose Refuge2 as the target domain and calculated the params and FLOPs of the two methods. The params of RDD-Net and the ResNeSt used in our paper are 32.98M and 32.96M, respectively; the FLOPs of both methods are 0.44G. We also compared the performance of Flatten versus Avgpool on the four benchmarks. The ACC (%) for Flatten vs. Avgpool is 92.8 vs. 91.65 (Refuge2), 82.25 vs. 78.37 (Harvard), 75.85 vs. 74.15 (ORIGA), 84.54 vs. 79.18 (RIMONE), and 83.86 vs. 80.84 (Avg.). Flatten yielded better results.

Q7: Explanation of DeepAll and Fig. 3. (R5) DeepAll is the implementation setup where models are trained on all source-domain data and tested on unseen domains. Fig. 3 is a scatter plot of the statistics produced by our data augmentation strategy, visualized with 2D t-SNE on the four datasets; different colors denote the image statistic features from different datasets. From Fig. 3, Refuge2 and Harvard produce better point separation. Meanwhile, as shown in Tab. 2, the ACC on Refuge2 and Harvard improves significantly compared with the SOTA methods. This indicates that broadening the feature search space is vital to improving generalization.
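As a companion to Q5 and Q6, below is a minimal sketch of the deep-shallow fusion step under the stated assumptions: each stage output S1-S4 is either flattened or globally average-pooled and then concatenated along the 2nd dimension before classification. The class name DeepShallowFusion and the use of nn.LazyLinear are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DeepShallowFusion(nn.Module):
    """Sketch of DS fusion comparing the two aggregation options discussed in Q6:
    flattening each stage's feature map ('flatten') versus global average pooling
    ('avgpool'), followed by concatenation along the 2nd dimension."""
    def __init__(self, num_classes=2, mode="flatten"):
        super().__init__()
        self.mode = mode
        self.classifier = nn.LazyLinear(num_classes)  # infers the fused dimension lazily

    def forward(self, stage_feats):
        # stage_feats: [S1, S2, S3, S4], each of shape (B, C_i, H_i, W_i)
        if self.mode == "flatten":
            parts = [torch.flatten(f, start_dim=1) for f in stage_feats]
        else:  # "avgpool"
            parts = [f.mean(dim=(2, 3)) for f in stage_feats]
        fused = torch.cat(parts, dim=1)  # 'concat' along the 2nd dimension
        return self.classifier(fused)
```

According to the rebuttal, the flatten variant outperformed average pooling on all four benchmarks while adding only about 0.02M parameters over the ResNeSt backbone, which speaks to the params/FLOPs concern raised in Review #2.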




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


