Abstract

Medical image classification is an essential medical image analysis task. However, due to the scarcity of data on rare diseases in clinical scenarios, acquired medical image datasets may exhibit long-tailed distributions. Previous works employ class re-balancing to address this issue, yet the learned representations are usually not discriminative enough. Inspired by the power of contrastive learning in representation learning, we propose and validate a contrastive learning based framework, named Balanced Parametric Contrastive learning (BPaCo), to tackle long-tailed medical image classification. BPaCo has three key components: across-batch class-averaging to balance the gradient contribution from negative classes; hybrid class-complement to have all classes appear in every mini-batch for discriminative prototypes; and cross-entropy logit compensation to formulate an end-to-end classification framework with even stronger feature representations. BPaCo shows outstanding classification performance and high computational efficiency on three highly imbalanced medical image classification datasets.
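The abstract only names its components. As a rough illustration of the first two ideas, the following minimal sketch computes a class-averaged contrastive loss for a single anchor in the style of balanced contrastive learning (BCL, Zhu et al. 2022), which the reviews cite as BPaCo's starting point. Negatives are averaged within each class before being summed over classes, keys may come from a queue of earlier batches, and one class center per class is appended so that every class appears in the denominator. The function names, the dot-product similarity on L2-normalized vectors, and the temperature `tau` are assumptions for illustration; this is not the BPaCo implementation from the linked repository.

```python
import math
from collections import defaultdict


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def class_averaged_contrastive_loss(anchor, anchor_label, keys, key_labels,
                                    prototypes, tau=0.1):
    """Class-averaged contrastive loss for one anchor (illustrative sketch).

    keys/key_labels: embeddings from the current batch and, hypothetically, a
        queue of earlier batches; the anchor itself must not be among them.
    prototypes: dict {label: class center}; it must cover every class so that
        each class appears at least once in the denominator.
    """
    # Group candidate embeddings by class: keys first, then one center per class.
    per_class = defaultdict(list)
    for k, y in zip(keys, key_labels):
        per_class[y].append(l2_normalize(k))
    for y, center in prototypes.items():
        per_class[y].append(l2_normalize(center))

    z = l2_normalize(anchor)
    # Average exp-similarities within each class, then sum over classes, so a
    # head class with many keys contributes one term, just like a tail class.
    denom = sum(
        sum(math.exp(dot(z, k) / tau) for k in members) / len(members)
        for members in per_class.values()
    )
    # Positives: all same-class keys plus the anchor's own class center.
    positives = per_class[anchor_label]
    log_probs = [dot(z, p) / tau - math.log(denom) for p in positives]
    return -sum(log_probs) / len(positives)


# Usage: class 0 is the anchor's class; class 1 supplies the negatives.
protos = {0: [1.0, 0.0], 1: [0.0, 1.0]}
loss = class_averaged_contrastive_loss(
    [1.0, 0.0], 0, [[1.0, 0.0], [0.0, 1.0]], [0, 1], protos, tau=1.0)
loss_dup = class_averaged_contrastive_loss(
    [1.0, 0.0], 0, [[1.0, 0.0]] + [[0.0, 1.0]] * 10, [0] + [1] * 10, protos, tau=1.0)
```

Because negatives are averaged within each class, adding ten identical class-1 keys leaves the loss unchanged: each class contributes exactly one averaged term to the denominator regardless of how many keys it has.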

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2020_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2020_supp.pdf

Link to the Code Repository

https://github.com/Davidczy/BPaCo

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Cai_BPaCo_MICCAI2024,
        author = { Cai, Zhiyuan and Wei, Tianyunxi and Lin, Li and Chen, Hao and Tang, Xiaoying},
        title = { { BPaCo: Balanced Parametric Contrastive Learning for Long-tailed Medical Image Classification } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes and validates a contrastive learning based framework, BPaCo, to tackle long-tailed medical image classification. BPaCo has three key components: across-batch class-averaging to balance the gradient contribution from negative classes; hybrid class-complement to have all classes appear in every mini-batch for discriminative prototypes; and cross-entropy logit compensation to formulate an end-to-end classification framework with even stronger feature representations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. BPaCo can perform contrastive learning with in-the-queue representations that are obtained from previous batches, so that near-optimal performance can be attained even with a mini-batch of a small size.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The compared methods are not the latest. There are no detailed experimental results demonstrating the improvement on the few (tail) classes. The ablation experiments are not sufficient.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors conducted experiments on three datasets and achieved some improvements. However, the compared methods are from 2022 or earlier, and researchers proposed many new methods in 2023 that should be included in the comparison; please refer to NeurIPS 2023, CVPR 2023, ICCV 2023, AAAI 2023, etc. Also, what are the classification results for the many, medium, and few classes, or for each class? The ablation experiments could be more detailed, for example covering the balance parameters of the loss and alternative losses such as the LDAM loss.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Insufficient experiments.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    The compared methods are not the latest, and the experimental results show little improvement over Focal loss [10] from 2017. Furthermore, the authors’ reply did not resolve my doubts; thus, I recommend rejection.
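Review #1 asks for results broken down into many, medium, and few classes. As a hedged illustration of how such a breakdown is commonly computed (it is not reported in the paper or the reviews), the sketch below groups classes by their training-set frequency and averages per-class recall within each group. The function name `group_accuracy` and the cut-offs (more than 100 training instances for "many", fewer than 20 for "few") are assumptions borrowed from natural-image long-tailed benchmarks; small medical datasets would likely need different thresholds.

```python
from collections import Counter, defaultdict


def group_accuracy(train_labels, y_true, y_pred, many_thr=100, few_thr=20):
    """Mean per-class recall within hypothetical many/medium/few groups."""
    train_counts = Counter(train_labels)

    # Per-class hits and totals on the evaluation set.
    hits, seen = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        seen[t] += 1
        hits[t] += int(t == p)

    # Assign each evaluated class to a group by its training-set frequency.
    groups = defaultdict(list)
    for c in seen:
        n = train_counts.get(c, 0)
        g = "many" if n > many_thr else "few" if n < few_thr else "medium"
        groups[g].append(hits[c] / seen[c])

    return {g: sum(recalls) / len(recalls) for g, recalls in groups.items()}


# Toy usage: 150 / 50 / 5 training instances for classes 0 / 1 / 2.
train = [0] * 150 + [1] * 50 + [2] * 5
result = group_accuracy(train, [0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 0])
```

Averaging per-class recall inside each group, rather than pooling instances, keeps one large class from dominating its group's score on an imbalanced test set.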



Review #2

  • Please describe the contribution of the paper

    The paper introduces the BPaCo framework, designed to tackle the long-tailed distribution of medical image data. The BPaCo framework implements across-batch class-averaging to mitigate the dominance of head classes in training. The integration of class-wise learnable parametric centers with the weights of the linear classifier enhances discriminative learning. Extensive experiments demonstrate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-motivated and it is a good idea to tackle the long-tailed distribution of medical image data.
    2. The paper is easy to read and presents a clear structure, with the goals of the research well established.
    3. The paper achieves state-of-the-art performance on three standard benchmarks.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The main idea of the proposed framework is similar to a combination of BCL and PaCo/GPaCo. Please clarify the differences; otherwise, the overall methodology may appear to be an assemblage of past techniques.
    2. The methods used for comparison are from publications that are two or more years old. Please include more recent methods, such as CoGloAT [1]. [1] Bach T, Tong A, Hy T S, et al. Global Contrastive Learning for Long-Tailed Classification[J]. Transactions on Machine Learning Research, 2023.
    3. The paper’s approach may lack substantial innovation, especially considering the current era of large-scale foundation models. It would be beneficial for the paper to more explicitly differentiate BPaCo from these methods, clearly highlighting the unique contributions.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Exploring how large-scale foundation models, including foundation models trained via self-supervision or large language models driven by massive textual data, could be adapted or integrated into long-tailed image classification presents an interesting avenue for research. This discussion could significantly enhance the paper’s relevance and foresight in leveraging cutting-edge AI technologies for medical imaging.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty of methodology, experiments, writing.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel method (BPaCo) to address the challenges caused by class imbalance in classification of medical images. The method relies on supervised contrastive learning of representations, building upon the work of Zhu et al. 2022, adapting it by introducing across-batch class-averaging, hybrid class-complement, and cross-entropy logit compensation. The paper reports that the proposed method outperforms the SOTA on three long-tailed medical datasets using different imaging modalities: ISIC2018, APTOS2019 and OCTA500.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is built upon the idea of balanced contrastive learning (Zhu et al. 2022), combining a classification branch and a contrastive learning branch. However, the adaptation of the method introduced in this paper is non-trivial as well as significant, and it achieves excellent results in the presented benchmarks. Therefore, I believe that the proposed work is a substantial contribution towards finding the solution to dealing with imbalanced datasets for medical image classification, and that it should be interesting to the MICCAI audience.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Nothing that I can think of.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This is just a minor remark. I suggest using the term “instance” or “example” instead of “sample” when addressing individual cases/data points. A sample (in statistics, primarily) is a set of instances. Moreover, “sample”/“sampling” is also used as a verb. Using the term “sample” for one “instance” makes the text more difficult to follow.

    Also, I suggest using parentheses of varying sizes in Eq.3.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    From my perspective the proposed adaptation of balanced contrastive learning for dealing with highly imbalanced datasets for classification problems is quite impressive – both in terms of ingenuity, and in terms of implementation complexity. Moreover, the presented results suggest that the method outperforms the SOTA. Finally, the overall presentation of the paper is very good.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

R1: Thank you for your encouraging comments. We will make the modifications in the final version based on your two suggestions.

R3:

  1. Comparison with some of the latest works. We appreciate the valuable feedback from the reviewer, which helps further improve our work. However, due to this year’s rebuttal constraints, we cannot provide additional experimental results in the response. We have reproduced the five recent works mentioned by the reviewer, and overall our BPaCo performs better than all of them.
  2. Improvement in few classes and more ablation studies. We apologize that, due to space limits, we were unable to include those detailed experimental results in the submitted version. We shall include those results in our journal version.

R4:

  1. Innovation of our method. The core innovation of our work lies in the across-batch class-averaging strategy, which is crucial for ensuring generalizability. The PaCo/GPaCo methods abandon class-averaging completely, while the BCL method only balances classes within a single batch. However, for data with a long-tailed distribution, tail classes commonly do not appear at all in some batches during training. In that case, BCL effectively achieves class averaging only for the Many and Medium classes and does not improve classification performance for the tail classes. In contrast, our method fully utilizes information from previous batches, allowing the tail classes to contribute to updating the model’s weights for most of the training time. Furthermore, the core motivation behind our loss formulation is to enhance the representation capability of tail classes without sacrificing that of head classes, which is achieved through diversified representations of the class centers. The classifier branch is updated primarily by Cross-Entropy, with minimal influence from across-batch class-averaging, which ensures strongly discriminative class-center representations for the head classes. Meanwhile, the Contrastive branch uses the BPaCo loss, trading off some head-class representation quality to improve the representation capability of tail classes. Combining these two branches leads to stable convergence of our model.
  2. Large-scale foundation models. We greatly appreciate the reviewer’s critical and valuable comments. Applying foundation models to long-tailed distribution problems is currently a hot yet highly debated topic. Existing algorithms for long-tailed problems explicitly measure the class distribution of the data and employ various methods to balance it. In contrast, large-scale foundation models are mostly trained by self-supervised learning on large amounts of unlabeled data, which is somewhat at odds with long-tail methods because class annotations are unavailable. From our perspective, foundation models may serve as a better starting point for long-tail algorithms: their discriminative feature representations can improve the accuracy of class-center representations in subsequent long-tail algorithms. Ultimately, the two kinds of approaches can complement each other and improve performance on downstream long-tailed tasks.
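The first point of the rebuttal hinges on the difference between class averaging within one mini-batch (BCL) and across batches (BPaCo). The minimal sketch below, assuming a fixed-size FIFO queue that stores the labels of earlier mini-batches (the `LabelQueue` class, its size of 8, and the toy labels are all hypothetical), shows only the counting side of that argument: a tail class absent from the current batch has a zero count within the batch but can still be present through the queue. It models labels only, not the embeddings or gradients of the actual method.

```python
from collections import Counter, deque


class LabelQueue:
    """Hypothetical fixed-size FIFO memory of labels from earlier mini-batches."""

    def __init__(self, size):
        self.labels = deque(maxlen=size)  # oldest labels are evicted automatically

    def push(self, batch_labels):
        self.labels.extend(batch_labels)

    def class_counts(self, batch_labels):
        # Per-class counts visible to class-averaging: current batch + queue.
        return Counter(batch_labels) + Counter(self.labels)


queue = LabelQueue(size=8)
queue.push([0, 0, 0, 2])      # an earlier batch that happened to contain tail class 2
current = [0, 0, 1, 0]        # tail class 2 is missing from the current batch

within_batch = Counter(current)              # what within-batch averaging sees
across_batch = queue.class_counts(current)   # what across-batch averaging sees
print(within_batch[2], across_batch[2])      # 0 1

# Push two more batches; the queue now keeps only the 8 most recent labels.
queue.push(current)
queue.push([0, 1, 0, 0])
print(queue.class_counts(current)[2])        # 0
```

The last lines show the limitation of any finite queue: once enough newer batches are pushed, the tail-class entry is evicted, so how often a tail class is represented depends on the queue length relative to that class's frequency.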




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Reviewers appreciated the clear presentation, the method’s effectiveness, and its contribution to handling imbalanced datasets. However, concerns were raised regarding the lack of comparison with the latest methods, insufficient detailed experimental results, and the need for more comprehensive ablation studies. Two of the experimental datasets are not “long-tail”.

    A few constructive suggestions: 1: To improve the paper, consider addressing the following points. First, include comparisons with more recent methods from 2023, such as those presented at NeurIPS, CVPR, and ICCV, to ensure the evaluation is current and comprehensive.

    2: “flexible sampling for long-tailed skin lesion classification” is a very relevant paper; please consider citing it in the camera-ready version.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper aims to tackle the critical issue of class imbalance in medical image classification. The strengths of the paper are 1) substantial technical contributions; 2) superior performance over SOTA methods; and 3) clear presentation. The major concern is that some of the latest works are not included for comparison. The AC finds that the strengths outweigh the weaknesses and recommends acceptance.

    The quality of the paper can be further improved by clarifying the differences between the proposed method and existing methods in the final version. To validate the paper’s claims of superior performance, future work should include a broader range of the most recent SOTA methods in the quantitative comparison.



