Abstract

Representation learning offers a conduit to elucidate distinctive features within the latent space and interpret the deep models. However, the randomness of lesion distribution and the complexity of low-quality factors in medical images pose great challenges for models to extract key lesion features. Disease diagnosis methods guided by contrastive learning (CL) have shown significant advantages in lesion feature representation. Nevertheless, the effectiveness of CL is highly dependent on the quality of the positive and negative sample pairs. In this work, we propose a clinical-oriented multi-level CL framework that aims to enhance the model’s capacity to extract lesion features and discriminate between lesion and low-quality factors, thereby enabling more accurate disease diagnosis from low-quality medical images. Specifically, we first construct multi-level positive and negative pairs to enhance the model’s comprehensive recognition capability of lesion features by integrating information from different levels and qualities of medical images. Moreover, to improve the quality of the learned lesion embeddings, we introduce a dynamic hard sample mining method based on self-paced learning. The proposed CL framework is validated on two public medical image datasets, EyeQ and Chest X-ray, demonstrating superior performance compared to other state-of-the-art disease diagnostic methods.
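To make the multi-level pair construction concrete, here is a minimal, hypothetical sketch of how the three contrasts (quality level plus two lesion levels) could be combined over patch embeddings. The InfoNCE form, the equal weighting, and all names (`hl`, `ll`, `hh`, `lh`) are our assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor with a single positive and a set of
    negatives; all embeddings are L2-normalised before comparison."""
    def norm(v):
        return v / np.linalg.norm(v)
    a, p = norm(anchor), norm(positive)
    negs = np.stack([norm(n) for n in negatives])
    pos_sim = np.exp(np.dot(a, p) / temperature)
    neg_sim = np.exp(negs @ a / temperature).sum()
    return -np.log(pos_sim / (pos_sim + neg_sim))

def multi_level_loss(hl, ll, hh, lh, temperature=0.1):
    """Combine three level-wise contrasts (illustrative only):
    hl = high-quality lesion, ll = low-quality lesion,
    hh = high-quality healthy, lh = low-quality healthy patches."""
    # Quality level: a high-quality lesion patch and its low-quality
    # counterpart share lesion semantics -> positive pair; healthy
    # patches of either quality serve as negatives.
    l_quality = info_nce(hl, ll, [hh, lh], temperature)
    # Lesion level (high quality): lesion vs. healthy -> negative pair;
    # the low-quality lesion view acts as the positive.
    l_lesion_hq = info_nce(hl, ll, [hh], temperature)
    # Lesion level (low quality): low-quality lesion vs. low-quality healthy.
    l_lesion_lq = info_nce(ll, hl, [lh], temperature)
    # Equal weighting is an assumption, not the paper's choice.
    return (l_quality + l_lesion_hq + l_lesion_lq) / 3.0
```

Because positives are chosen by shared lesion semantics rather than random augmentation, a patch that merely looks degraded is never treated as a positive for a healthy anchor, which is the false-negative reduction the abstract alludes to.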

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2364_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2364_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Hou_AClinicaloriented_MICCAI2024,
        author = { Hou, Qingshan and Cheng, Shuai and Cao, Peng and Yang, Jinzhu and Liu, Xiaoli and Tham, Yih Chung and Zaiane, Osmar R.},
        title = { { A Clinical-oriented Multi-level Contrastive Learning Method for Disease Diagnosis in Low-quality Medical Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper enhances self-supervised learning for medical images by exploiting the shared semantic information between high-/low-quality image pairs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Representation learning in low-quality contexts is a promising direction, especially given the prevalence of artifacts and other noise in medical image collection, particularly in fundus images.

    1. Self-supervised learning is a valuable method in the field of medical imaging, where the challenges of data scarcity, the need for robust and generalizable models, and the high costs of data annotation are significant.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1). The main motivation has already been addressed in similar works, which diminishes its novelty.

    Che, Haoxuan, Siyu Chen, and Hao Chen. “Image quality-aware diagnosis via meta-knowledge co-embedding.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.

    (2). The baseline for the EyeQ comparative experiment in this paper is much lower than that of the original paper. On EyeQ, even a ResNet18 can achieve a score of 0.8848 on images with a 100% 'Reject' quality label.

    Fu, Huazhu, et al. “Evaluation of retinal image quality assessment networks in different color-spaces.” Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22. Springer International Publishing, 2019.

    (3). Missing baseline comparisons with self-supervised learning methods such as MoCo v3, MoBY, DINO, MAE, and DINOv2. This paper needs a comprehensive comparison of these self-supervised methods on low-quality data.

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.

    Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9650–9660, 2021.


  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The baseline is lower than that reported in the original paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to the weaknesses section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lacking comparison experiments with the latest self-supervised learning methods.

    Low baseline performance on EyeQ.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The authors addressed some of my concerns, so I decided to change my rating to weak reject. The proposed method still lacks comprehensive comparison experiments against state-of-the-art self-supervised learning methods.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a multi-level contrastive learning method for disease diagnosis that uses image quality as a label. A self-paced learning scheme is introduced to dynamically mine hard samples during CL training. Experiments on two datasets demonstrate improved accuracy over existing baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Using image quality to guide contrastive pair generation is new. The experimental results are promising.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The method requires image quality as additional labels, which are not always available in medical datasets. (2) The hard negative mining seems similar to [5]. (3) It is hard to tell how the multi-level CL mitigates false negatives. In the high-quality lesion vs. low-quality lesion contrast, low-quality lesion patches similar to the anchor patch still exist among the negatives. Could we instead label the patches as 4 classes and use CL to cluster them into 4 different classes?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Code is given in the supplementary.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) Figures are too small; they are hard to read on paper. (2) Fig. 1(b2) is strange: why do all the low-quality lesion samples appear on the healthy sample side? The model should at least map some of them to the correct embedding place. (3) In the CL stage, do the authors assume they have the lesion label? If so, why not use supervised contrastive learning based on the lesion label? Contrasting high-quality and low-quality lesion samples seems counter-intuitive; it will make the model focus on the noise parts instead of the lesion parts. (4) What about the low-quality lesion vs. high-quality healthy contrast? It is part of the problem in Fig. 1(b2). (5) If 100% of the images are low quality, does the method shrink to basic CL? If so, where does the accuracy gain in Table 1 come from? (6) What is the value of Kt? Please provide an ablation study on this hyper-parameter.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I’m not totally convinced by the multi-level contrastive learning idea, at least by the current version of the manuscript. It would be better to provide more illustration on how basic CL deals with the problem, where are the false negatives and pinpoint how the method addresses the problem.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Still not convinced by the multi-level CL: the repelling across the three level-wise perspectives essentially pushes two parts apart and still does not consider continuity. If "not all regions in low-quality images have poor quality", how do we know the quality of every region? Self-paced learning is not new.



Review #3

  • Please describe the contribution of the paper

    Representation learning aids in understanding distinctive features in latent space and interpreting deep models, yet challenges arise due to lesion distribution randomness and complexity of low-quality factors in medical images. Contrastive learning (CL) for disease diagnosis has shown promise, but its effectiveness depends on sample pair quality. The researchers propose a clinical-oriented multi-level CL framework to enhance lesion feature extraction and discriminate between lesions and low-quality factors, improving disease diagnosis accuracy from low-quality medical images through multi-level pair construction and dynamic hard sample mining. The framework outperforms state-of-the-art methods on EyeQ and Chest X-ray datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The narrative logic of the article is very clear, with a well-structured presentation that enables readers to easily follow the flow of the paper and grasp its key concepts and steps.
    2. The method is innovative, addressing the issue of lesion feature extraction in medical images effectively by proposing a novel multi-level contrastive learning framework. Particularly, in the construction of multi-level positive and negative sample pairs and the introduction of a dynamic hard sample mining method based on self-paced learning, it brings new insights and technological breakthroughs to the field of disease diagnosis.
    3. The proposed solution not only holds academic significance but also practical relevance. Through validation on publicly available medical image datasets such as EyeQ and Chest X-ray, the approach demonstrates the potential for more accurate and effective disease diagnosis in low-quality medical images, providing clinicians with more reliable diagnostic tools and guidance to improve diagnostic accuracy and efficiency in medical practice.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    If possible, could more case analyses be provided? This would be beneficial for validating the effectiveness.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Improving the presentation and adding more case analyses would further demonstrate the effectiveness of the method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method design and experimental validation are sufficient.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the reviewers’ efforts in evaluating our work and acknowledge the paper’s contributions, including its innovative method (R1) and promising results (R3). We address each reviewer’s questions as follows:

I. METHOD (R3)
(a) Unlike [5], we propose Multi-CL and design a hard sample mining method based on self-paced learning. This strategy dynamically selects hard samples based on the model’s learning state, without requiring a pretrained teacher model, improving stability and convergence speed. (R3-3.Q2)
(b) Compared to general CL with random positives/negatives, CoMCL constructs positives and negatives based on quality and lesions, ensuring semantic consistency between positives and anchors, and differences between negatives and anchors. This reduces false negatives, better fits diagnostic needs, and provides clear interpretability. Clustering samples into four classes via CL is inspiring, but it focuses on absolute classes and thereby overlooks the relative relationships and the continuity of lesions under different qualities. (R3-3.Q3)
(c) Considering that lesion regions are small relative to the entire image, CL based on image-level lesion labels struggles to identify lesion features. Hence, our method considers three level-wise perspectives: 1) Quality: high-quality lesions (HL) vs. low-quality lesions (LL); 2) Lesion: HL vs. high-quality healthy (HH); 3) Lesion: LL vs. low-quality healthy (LH). Learning from both image quality and lesions enables the model to identify lesions in low-quality images. However, when contrasting low-quality lesion and high-quality healthy samples, both quality and lesion differences are present, so the model may not effectively distinguish quality features from lesion features, affecting downstream task performance. It should be noted that Fig. 1(b2) simplifies the complex to intuitively illustrate the impact of low-quality factors on feature learning, prioritizing problem visualization over reflecting all details. (R3-[7.Q2-7.Q4])

II. MISUNDERSTOOD (R1/R3/R4)
(a) We will provide more case analyses: 1) analyzing CoMCL’s diagnostic performance on images of varying quality; 2) comparing CoMCL with other methods in identifying key features of more diseases. (R1)
(b) Many public datasets (e.g., EyeQ/DR2/HRF) already contain quality labels. Besides, obtaining quality labels is inexpensive and can be done easily. (R3-3.Q1)
(c) Not all regions in low-quality images have poor quality [1], allowing patch construction for multi-level contrast. Thus, CoMCL does not degrade into basic CL. (R3-7.Q5) [1] Hou Q. A Collaborative …[J]. TMI, 2024.
(d) Kt adaptively varies with training steps (Sec. 2.2) and is mainly determined by the number of negatives, δ, as shown in the ablation study in the previous supplementary material. (R3-7.Q6)
(e) The baseline you mentioned is for the quality assessment task, while our work focuses on DR grading. The former aims to evaluate image quality, while the latter performs disease diagnosis. The research tasks are different, so it is inappropriate to directly compare their performance. We kindly request that you re-examine the content and objectives of our work. (R4-Q2)

III. NOVELTY (R4)
(a) Our work differs in problem definition and solution. Motivation: we address low-quality factors’ interference with disease feature extraction, while Che et al. utilize low-quality features for diagnostic stability. Method: our multi-level CL enhances lesion feature extraction through patch construction and hard sample mining, distinct from meta-knowledge joint embedding. (R4-Q1)
(b) CoMCL focuses on a design philosophy, i.e., multi-level CL on low-quality images, rather than a specific SSL method. We welcome integrating our multi-level idea into more SSL methods. In fact, incorporating multi-level CL can improve the performance of related methods (e.g., MoBY, DINO) on low-quality images. (R4-Q3)

IV. OTHERS (R1, R3)
(a) We have uploaded the code to the CMT system; the code link will be included in the final paper. (R1)
(b) We will resize the figures in the final paper. (R3-7.Q1)
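The self-paced hard sample mining described in the rebuttal (dynamic selection driven by the model's learning state, with no pretrained teacher) could be sketched roughly as follows. The fraction-based curriculum, the `start_frac` parameter, and the cosine-similarity difficulty measure are our assumptions for illustration, not the paper's exact rule.

```python
import numpy as np

def self_paced_negatives(anchor, negatives, step, total_steps, start_frac=0.3):
    """Self-paced hard negative selection (illustrative sketch).

    Negatives more similar to the anchor are 'harder'. Following the
    self-paced learning idea, training starts with only the easiest
    (least similar) negatives and gradually admits harder ones as
    training progresses; by the final step all negatives are used.
    """
    a = anchor / np.linalg.norm(anchor)
    sims = np.array([float(np.dot(a, n / np.linalg.norm(n))) for n in negatives])
    # Linearly grow the admitted fraction from start_frac to 1.0.
    progress = min(1.0, step / total_steps)
    frac = start_frac + (1.0 - start_frac) * progress
    k = max(1, int(round(frac * len(negatives))))
    order = np.argsort(sims)  # easiest (most dissimilar) negatives first
    return [negatives[i] for i in order[:k]]
```

Because the admission schedule depends only on the current similarities and the training step, no separate teacher model is needed, which matches the stability and convergence argument made in point I(a) above.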




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper received WA, WR, WR, while R4’s comment seems not very relevant. Overall, I think the paper still has merit in methodological design and experimental results.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The paper received WA, WR, WR, while R4’s comment seems not very relevant. Overall, I think the paper still has merit in methodological design and experimental results.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I would like to champion this paper. While the reviews say Weak Accept and 2xWeak Reject, the reviews actually emphasize several strengths of the paper: Clear writing, novel idea, and practical significance, whereas the listed weaknesses are fairly small. Reviewer 4 seems to have misunderstood the paper, as the suggested baselines tackle a different problem.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    I would like to champion this paper. While the reviews say Weak Accept and 2xWeak Reject, the reviews actually emphasize several strengths of the paper: Clear writing, novel idea, and practical significance, whereas the listed weaknesses are fairly small. Reviewer 4 seems to have misunderstood the paper, as the suggested baselines tackle a different problem.


