Abstract

Automatic segmentation of diabetic retinopathy (DR) lesions in retinal images has translational impact. However, collecting pixel-level annotations for supervised learning is labor-intensive. Thus, semi-supervised learning (SSL) methods that tap into the abundance of unlabeled images have been widely adopted. Still, blind application of SSL is problematic due to the confirmation bias stemming from unreliable pseudo masks and class imbalance. To address these concerns, we propose Rival Networks Collaboration with Saliency Map (RiCo) for multi-lesion segmentation in retinal images for DR. From two competing networks, we declare a victor network based on the Dice coefficient, onto which the defeated network is aligned when exploiting unlabeled images. Recognizing that this competition might overlook small lesions, we equip the rival networks with distinct weight systems for imbalanced and underperforming classes. The victor network dynamically guides the defeated network by complementing its weaknesses and having it mimic the victor’s strengths. This process fosters effective collaborative growth through meaningful knowledge exchange. Furthermore, we incorporate a saliency map, which highlights color-striking structures, into the consistency loss to significantly enhance alignment in structural and critical areas of retinal images. This approach improves reliability and stability by minimizing the influence of unreliable areas of the pseudo mask. A comprehensive comparison with state-of-the-art SSL methods demonstrates our method’s superior performance on two datasets (IDRiD and e-ophtha). Our code is available at https://github.com/eunjinkim97/SSL_DRlesion.
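The victor selection and saliency-weighted consistency described in the abstract can be sketched roughly as follows. This is an illustrative approximation only: the function names, the hard 0.5 threshold for the pseudo mask, and the L2 form of the consistency term are assumptions, not details taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-6):
    """Soft Dice between a (soft) prediction and a binary target mask."""
    inter = np.sum(pred * target)
    return (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def victor_guided_consistency(pred_a, pred_b, label, saliency, eps=1e-6):
    """Declare the victor by Dice on the labeled batch, then align the
    defeated network to the victor's pseudo mask, with the per-pixel
    saliency map weighting the consistency (here: squared-error) term."""
    dice_a = dice_coefficient(pred_a, label)
    dice_b = dice_coefficient(pred_b, label)
    victor, defeated = (pred_a, pred_b) if dice_a >= dice_b else (pred_b, pred_a)
    pseudo = (victor > 0.5).astype(np.float32)  # hard pseudo mask from the victor
    # saliency-weighted mean-squared consistency loss
    return np.sum(saliency * (defeated - pseudo) ** 2) / (np.sum(saliency) + eps)
```

In the actual method the two networks additionally carry distinct class-weighting systems (difficulty-aware vs. imbalance-aware), which this sketch omits.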

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1033_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1033_supp.pdf

Link to the Code Repository

https://github.com/eunjinkim97/SSL_DRlesion

Link to the Dataset(s)

https://ieee-dataport.org/open-access/indian-diabetic-retinopathy-image-dataset-idrid

https://www.adcis.net/en/third-party/e-ophtha/

BibTex

@InProceedings{Kim_Semisupervised_MICCAI2024,
        author = { Kim, Eunjin and Kwon, Gitaek and Kim, Jaeyoung and Park, Hyunjin},
        title = { { Semi-supervised Segmentation through Rival Networks Collaboration with Saliency Map in Diabetic Retinopathy } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a novel semi-supervised segmentation approach based on rival networks that supervise each other. It is related to [20] and [21].

    The main proposal follows the co-training semi-supervised framework of [21] with two diverse networks, in which the network that wins on labeled samples provides pseudo-labels for unlabeled samples to the other network in each batch. The supervision is also masked by the differences among network outputs. The authors incorporate ideas from [20] to add a dual-weighting system, so that each network focuses on either difficult or underrepresented samples.

    The paper also proposes modifying the pseudo-label unsupervised stream by, first, measuring “easy class” weights for the imbalance-aware network and accounting for class imbalance in the difficulty-aware network when supervising the other network; and second, replacing the difference mask of [21] with a saliency-based mask.

    Comparison and ablation studies are provided on DR lesion segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novel proposal effectively leveraging the combination of two related work strategies for semi-supervised learning of segmentations.

    Extensive experiments that include the comparison with alternative semi-supervised learning methods and ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper does not fulfill the minimum clarity requirements for publication and contains multiple mistakes in the formulas.

    The proposed modifications are not clearly justified or validated.

    The paper lacks a comparison with state-of-the-art works in DR lesion segmentation for reference.

    The relations to, and modifications of, related works are not clearly stated in the paper.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    There are several parameter values not explained in the paper that are provided in the supplementary materials. However, the paper lacks the required clarity and contains several mistakes to be able to follow all the required algorithmic details with confidence.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    There are several unclear sections in the paper. Namely:

    • Section 2 is not clear and contains multiple errors in the formulas, along with missing symbols and definitions (e.g., epsilon, tau, beta, iterations t, etc.)
    • The “sharpening process with temperature factor T” in Section 3.1 is neither explained nor referenced.
    • D in eq (2) multiplies and divides. There might be a missing sum in the denominator. Moreover, f should be clarified to denote both f_A and f_B for clarity.
    • Inconsistent symbols throughout the paper. For example, in Section 3 the supervised loss is first defined as L_x and the unsupervised loss as L_u, while later the first is referred to as L_sup and the second as L_cons.
    • The relation between W_A/B, W_easy/min, and W^diff/dist could be simplified considerably under clearer, less convoluted notation.
    • The main idea under the saliency method [13] could be provided for clarity.

    The proposed modifications, i.e., 3.2 — the modified weighting strategy on the pseudo-label unsupervised stream w.r.t. [20] — and 3.3 — the incorporation of a saliency map capturing “color-striking features” instead of mask differences w.r.t. [21] — are not clearly justified, nor validated in the provided comparison and ablation studies.

    Related to the previous comment, the actual contributions with respect to related works [20]-[21] are not clearly identified. This diverts attention from what is actually proposed to advance beyond related works, and from what should actually be validated in the experiments.

    Additionally, the discussion in the experiments section lacks the required focus in the same sense: rather than highlighting the value of the contributions, it concentrates on reporting “improved performance” without analysis. In my opinion, this could be improved in two ways.

    First, by clearly discussing the differences with the compared approaches and identifying specific evidence that confirms the guiding hypotheses of the proposal (how do you explain the obtained improvement? Is it due to the proposal?).

    Second, in the ablation study, it is not clear what the baseline is and how the different modules are integrated. Moreover, it is not clear how each of them relates to previous works. I suppose (it is not stated) that the baseline is just cross-pseudo supervision. However, if that’s the case, the provided results should match those of CPS in Table 1, which they don’t. Then, the only increment discussed is the one in the fourth row (saliency), and it is not clear whether it includes only the saliency [13] in the pseudo-labeled part, or also the masking in the supervised part [21]. Also, an alternative would be to preserve the mask of [21] in the unsupervised stream, and this should be compared (along with the BR+VW case). BR alone should be somehow related to MCF, yet the results do not match. Finally, VW should be related to DHC, with, again, results mismatching those of Table 1. Similarly to the saliency case, the flipping of weights with respect to the proposal in [20] should be accounted for as a separate ablation option.

    Overall, the writing lacks the quality needed to present and validate an otherwise interesting research direction with potentially remarkable results. Please consider my concerns and comments to improve the paper.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Reject — must be rejected due to major flaws (1)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the value, interest, and promising results of the conducted research, the paper lacks enough clarity to be published in its current form. This is a major flaw.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I’ve reconsidered my recommendation to weak accept, after pondering the rebuttal and the other reviewers’ comments.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a multi-lesion semi-supervised segmentation method using difficulty-aware and imbalance-aware rival networks and collaboration between them. The proposed method, evaluated on two diabetic retinopathy datasets, achieves better results than several semi-supervised methods in segmenting four different lesions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is interesting, particularly the combination of victor-guided weighting and saliency-based consistency strategies, in an effort to enhance semi-supervised medical image segmentation performance.

    Good sets of experiments including ablation studies are presented to convey the contributions of the paper.

    The proposed method exhibits improved performance compared to several other semi-supervised methods, as well as supervised models, in limited-labeled-data settings.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. All different components already exist in literature. There is definitely benefit in combining them.

    2. The presentation of the paper can be significantly improved, e.g., details of the parameter choices, analysis of the results, discussion of failure cases, etc.

    3. There are several inconsistencies in the reported results and the discussion in the texts. E.g., The ablation study result in Table 3 suggests the combination of the three contributions does not perform the best. But it’s claimed otherwise in the preceding paragraph. In fact, the contributions of different components are inconsistent.

    4. Lesion segmentation results for two datasets and an ablation study are presented without discussing them well.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Figs. 2 and 3 need to be better annotated. A visualization of the lesions segmented by the proposed method at different proportions of labeled data would be interesting to see. The authors might also consider adding segmentations from the e-ophtha dataset.

    2. The proposed method has consistently higher std. Can the authors explain why their method consistently has poorer scores for microaneurysms?

    3. It seems different image sizes are used for the two datasets. Why so?

    4. How did the authors select the particular values for the hyperparameters \beta, \tau, th?

    5. The authors are suggested to add a discussion comparing the #parameters and training/inference time of different methods to their proposed one.

    6. Can the authors discuss any limitations or failure cases of their method?

    7. Table 1 (supplemental) should be part of the main paper.

    8. What did the author mean by balanced performance? If it’s for different lesion types, the reported results suggest otherwise. Inter-class performance gaps are evident from the reported results.

    9. If the proposed method is used for clinical applications, how would it be used?

    10. How did the authors select \lambda_u, \lambda_cor, \alpha, T values?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Inconsistent claims, unclear or missing details, lack of proper analysis of the results.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I appreciate the authors for addressing the reviewers’ comments. I am happy to improve my final rating.



Review #3

  • Please describe the contribution of the paper

    The authors present a novel extension to a dual-objective segmentation model for semi-supervised retinal segmentation. The problem is formulated as a competition between two segmentation networks optimized using different objectives. They also propose a novel way to aggregate the weights of the defeated network. This approach showcases consistent improvement over SOTA methods on the same modality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novelty: The authors present a creative and novel improvement to a SOTA retina segmentation method. They optimize two different networks, one for difficult lesion segmentation and the other for class imbalance. To the best of my knowledge, I have not seen this kind of competitive model formulation for medical image segmentation before.

    Thorough evaluation: SSL performance is evaluated on two different datasets and compared with SOTA methods for retinal multi-lesion segmentation. Along with this, the authors present ablations over the different components of their proposed system, providing insight into and motivation behind each modular decision.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I think the paper is done really well and I have just minor issues.

    1. There is no clear portrayal of the inference path. A figure explaining model inference, or even a well-structured text block, would add a lot of value and resolve some confusion.

    2. It would also be worthwhile to report the method’s performance with 100% supervision.

    3. Figure 1 caption: the second sentence is grammatically incorrect; please rephrase.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This is a really novel idea. The paper is well motivated and thorough; however, writing clarity could be improved in places. In terms of results, the authors did a great job providing comparisons to SOTA with different datasets and settings.

    The paper could be improved considerably by adding clarification about inference, as well as an analysis of the method with 100% supervision.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents strong novelty and demonstrated good evaluation + gains over SOTA multi-class retinal lesion segmentation.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Thanks for your comments. Note that our code is open (anonymized link in the Abstract). Below we clarify the confusions. R1&R4: Differences (and ablation) with respect to related works [20-21]. We propose novel bifurcated rival (BR), victor-guided weighting (VW), and saliency-guided consistency (SC) components compared to [20: DHC, 21: MCF]. As explained in Sec. 3.1, BR has two different models with a weighting strategy that learns difficult and imbalanced classes, while MCF has two different models without a weighting strategy. As in Sec. 3.2, VW has victor selection and flipping weights, while DHC does not. We will clarify the text. Due to these differences, BR surpasses MCF by 0.86% and 2.83% in the minor and difficult classes (MA and SE), and VW surpasses DHC by 1.22% and 7.43% in the major and easy classes (HE and EX), in Table 3 and Supp. Table 2. This clearly shows our components are different from [20-21] and better. Thus, the raised “mismatched” results are expected.
R4: Baseline. This was a mistake. The baseline model consists of two differently structured models, which differs from CPS [3], which has two models of the same structure. Each incremental element is added to this base model in Table 3. We will make corrections. R4: Saliency map; alternative mask of [21]. The difference mask from [21] encourages the models to learn the discrepant regions that each model does not know, and involves many contours with high uncertainty. In contrast, our saliency map provides low-uncertainty information from the victor model in lesion-related areas, using reliable information as in UA-MT [24]. In Table 3, our SC surpasses the baseline by 5.56%, and adding SC surpasses (BR+VW) by 4.29% for the difficult SE class, because our SC distinctly represents broad bright features like SE and also decreases the negative confirmation bias of pseudo masks unrelated to SE. R1: Details. The optimal threshold for the saliency map was chosen based on the inclusion of vessels and other structures. Hyperparameters (\alpha, \beta, \tau, T, \lambda_cor) followed [20-21]. \lambda_u was empirically set to 1. All these values are given in Sec. 4 and Supp. Table 1, and the selection procedure will be clarified. Model sizes differ slightly based on the backbone (MCF, Ours: 26.8MB; Others: 26.3MB). We will add them. R1: Higher std and poorer scores for MA. Ours had the lowest std for both DSC and PRC at 10% (Table 1). Still, we admit the poor MA performance as a limitation, due to the victor model depending on Dice. However, we achieved a 0.45-1.36% performance gain in MA over [21] thanks to the weighting strategies. R1: Balanced performance. Other methods showed big gains in one or two classes, but ours improved evenly in all classes. R1: Inconsistencies between the results and text. We acknowledge reporting errors in Table 3 and will revise them. Our model was not always best in all classes, so we described the result as “combining them resulted in complementary and balanced advances” in Sec. 4. R1: Image size. Due to variations in the sizes provided, we resized images accordingly. R1: Visualization. We will update Fig. 3 to include results for the e-ophtha dataset.
R1: Clinical application. Our model can aid physicians by providing additional information for efficient follow-up. R1&R2: Inference. The inference step ensembles the two model outputs and takes about 6.1 sec per image, similar to [21]. We will update Fig. 2 to explicitly show the inference path. R2: 100% supervision. It is in the last sentence of page 7. R4: Notation errors. We will revise them. 1) We admit a missing sum in the denominator of D in Eq. (2). Subscripts of f (f_A and f_B) were omitted for compactness. 2) In Sec. 2, we briefly mention symbols and ideas used in [20] because we adapt their ideas for our purpose. 3) We denoted the supervised loss as both L_sup and L_X; we will use L_sup for consistency. 4) W_A/B are the inverse of W_easy/min and the same as W_diff/dist. This will be clarified. R4: SOTA DR lesion segmentation. As those methods do not utilize unlabeled images, we did not compare with them.
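The rebuttal states that inference ensembles the two model outputs. A minimal sketch of such an ensemble is below; the averaging rule and the 0.5 threshold are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np

def ensemble_inference(prob_a, prob_b, threshold=0.5):
    """Average the two rival networks' per-pixel lesion probabilities
    and threshold the average into a binary mask (illustrative only)."""
    avg = 0.5 * (prob_a + prob_b)
    return (avg >= threshold).astype(np.uint8)
```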




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal has successfully addressed most of the reviewers’ concerns.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The rebuttal has successfully addressed most of the reviewers’ concerns.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper proposes a multi-lesion semi-supervised segmentation method using difficulty-aware and imbalance-aware rival networks. The reviewers list the interesting novel approach and the comprehensive experiments as strengths. The reviewers expressed concerns about missing details. The rebuttal addressed these concerns, and then the reviewers upgraded their reviews to WA, A, and WA. (One reviewer scored WR but stated that he/she upgraded the score to WA in the justification.) The meta-reviewer recommends accepting this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The paper proposes a multi-lesion semi-supervised segmentation method using difficulty-aware and imbalance-aware rival networks. The reviewers list the interesting novel approach and the comprehensive experiments as strengths. The reviewers expressed concerns about missing details. The rebuttal addressed these concerns, and then the reviewers upgraded their reviews to WA, A, and WA. (One reviewer scored WR but stated that he/she upgraded the score to WA in the justification.) The meta-reviewer recommends accepting this paper.


