Abstract
Multi-label classification (MLC) in medical image analysis presents significant challenges due to long-tailed class distributions and disease co-occurrence. While contrastive learning (CL) has emerged as a promising solution, recent studies primarily focus on defining positive samples, overlooking the low gradient problem associated with single-disease representation and the impact of co-occurring diseases. To address these issues, we propose ws-MulSupCon, a novel weighted stratification method in CL for MLC. Our gradient analysis indicates that separating the single-disease cases can amplify their gradient contributions. Accordingly, we stratify training samples into single- and multi-disease cases to enhance the representation learning of each disease. Moreover, we design a weighted loss function based on class frequency and disease comorbidity, mitigating the dominance of prevalent diseases and improving rare disease detection. To further discriminate between healthy and diseased samples, a dedicated CL for healthy cases is introduced, improving overall classification performance and preventing false positives. Extensive experiments on NIH ChestXRay14 and MIMIC-CXR demonstrate that ws-MulSupCon outperforms SoTA methods across nearly all disease classes, showing its superiority and the effectiveness of learning long-tailed distributions in multi-label medical image classification. The code is available at https://github.com/xup6YJ/ws-MulSupCon.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1834_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/xup6YJ/ws-MulSupCon
Link to the Dataset(s)
NIH ChestXRay14: https://paperswithcode.com/dataset/chestx-ray14
MIMIC: https://www.nature.com/articles/s41597-019-0322-0
BibTex
@InProceedings{LinYin_Weighted_MICCAI2025,
author = { Lin, Ying-Chih and Chen, Yong-Sheng},
title = { { Weighted Stratification in Multi-Label Contrastive Learning for Long-Tailed Medical Image Classification } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15972},
month = {September},
pages = {678 -- 687}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a contrastive learning framework for multi-label classification in medical imaging. It addresses key challenges such as long-tailed data distributions, disease co-occurrence, and false positives in multi-label classification tasks.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper analyzes the gradients in CL for multi-label scenarios and highlights that single-disease cases contribute weaker gradients when averaged with multi-disease cases.
- A separate CL module for healthy samples improves the model’s ability to distinguish between healthy and diseased cases, reducing false positives.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Randomly splitting the dataset is problematic because of potential sample overlap across the splits.
- Despite the many MLC-specific design choices, the proposed method still performs worse than many SOTA methods.
- Why not use CheXpert as a dataset?
- The hyperparameter lambda should be discussed.
- What is the difference between the method proposed in this paper and weighted binary cross entropy?
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The writing of this paper is good but its novelty is marginal.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
I recommend rejection due to insufficient comparison and discussion.
Review #2
- Please describe the contribution of the paper
- Theoretical analysis reveals that single-disease samples suffer from diluted gradients in existing multi-label contrastive learning (e.g., MulSupCon). By stratifying samples into single- and multi-disease groups, the method amplifies gradient contributions from single-disease cases.
- Introduces two weighting strategies: prioritizing rare classes based on their occurrence frequency, and mitigating the dominance of high-comorbidity diseases via inverse mean comorbidity scores.
- Extensive experiments on NIH ChestXRay14 and MIMIC-CXR demonstrate state-of-the-art performance (e.g., 81.99% mAUC on CXR-14), particularly for rare classes in long-tailed distributions.
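The stratification and the two weighting strategies summarized above can be sketched as follows. This is a hypothetical reconstruction from the review's description, not the authors' released implementation; the exact normalization and smoothing used in ws-MulSupCon may differ.

```python
import numpy as np

def stratify_batch(labels):
    """Split sample indices into single- and multi-disease groups.
    `labels` is an (N, C) binary label matrix; healthy (all-zero) rows
    fall into neither group and would be handled by the dedicated
    healthy-case CL term."""
    labels = np.asarray(labels)
    n_diseases = labels.sum(axis=1)           # diseases per sample
    single = np.flatnonzero(n_diseases == 1)
    multi = np.flatnonzero(n_diseases > 1)
    return single, multi

def class_weights(labels, eps=1e-8):
    """Per-class weights from inverse class frequency and inverse mean
    comorbidity (hypothetical formulation). Rare classes and classes with
    low comorbidity receive larger weights."""
    labels = np.asarray(labels, dtype=float)
    n, c = labels.shape
    freq = labels.sum(axis=0) / n             # per-class frequency
    w_freq = 1.0 / (freq + eps)
    cooc = labels.T @ labels                  # class co-occurrence counts
    np.fill_diagonal(cooc, 0.0)
    mean_comorb = cooc.sum(axis=1) / (labels.sum(axis=0) + eps)
    w_comorb = 1.0 / (mean_comorb + eps)
    # normalize each weight vector to sum to the number of classes
    w_freq *= c / w_freq.sum()
    w_comorb *= c / w_comorb.sum()
    return w_freq, w_comorb
```

In a full pipeline, the single- and multi-disease indices would feed two separate contrastive loss terms, each reweighted by the class-level weights.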
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- First work to decouple single- and multi-disease samples in medical MLC, addressing gradient dilution (MulSupCon and Sim-Diss overlook this).
- Benchmarked on 14-class NIH and 13-class MIMIC datasets with 7 metrics (mAUC, macro/micro F1).
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Stratification increases computational overhead (indexing single/multi-disease groups).
- Basic augmentations (flipping, rotation) are used, while advanced medical-specific techniques are unexplored.
- Missing comparisons with state-of-the-art long-tail methods weaken claims on superiority.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper addresses core challenges in medical MLC (long-tail, comorbidity) with innovative solutions. Stratified contrast and weighted losses are clinically meaningful, supported by thorough experiments (e.g., +1.58% mAUC). While complexity and partial benchmarking are minor flaws, the contributions significantly advance the field.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper proposes a method called ws-MulSupCon, which extends supervised contrastive learning to better address the challenges of multi-label classification in medical imaging, particularly long-tailed class distributions.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) The authors identify a key limitation in the application of contrastive learning to multi-label classification under long-tailed class distributions—namely, the low gradient issue and the underutilization of disease co-occurrence.
2) They propose a novel and effective solution to address this, which yields consistent performance improvements (1–2%) over existing methods, demonstrating both practical impact and conceptual clarity.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) The evaluation could be further strengthened by including additional datasets to better demonstrate the robustness and generalizability of the proposed method. For instance, incorporating datasets such as ChestMNIST, or even synthetic non-biomedical datasets, might help validate its applicability across different domains and settings.
2) It would also be helpful to compare the proposed approach on a balanced multi-label dataset (i.e., one without a long-tailed distribution). This could help isolate and clarify the effectiveness of the proposed loss function specifically in multi-label scenarios, beyond its performance under class imbalance.
3) While Figures 1 and 2 are intuitive and help convey the motivation, further empirical evidence would strengthen the claim regarding the low gradient problem in single-disease representation. For example, it might be valuable to quantitatively analyze the average gradient norms for single- versus multi-label samples (perhaps within the MulSupCon framework). This could provide direct support for the core hypothesis and enhance the reader's understanding of why the problem is important and should be addressed.
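As a concrete illustration of the probe suggested in point 3, per-sample gradient norms could be measured under a simplified SupCon-style loss, here estimated by finite differences for clarity. This is a sketch, not MulSupCon's exact per-label-set formulation, and the loss and positive-set definition are assumptions for illustration.

```python
import numpy as np

def supcon_loss(emb, labels, i, tau=0.1):
    """Simplified supervised contrastive loss for anchor i: positives are
    samples sharing at least one label with the anchor (a stand-in for
    MulSupCon, whose per-label-set formulation differs)."""
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize
    sims = z @ z[i] / tau                                  # scaled cosine sims
    others = [j for j in range(len(emb)) if j != i]
    pos = [j for j in others if (labels[j] & labels[i]).any()]
    if not pos:
        return 0.0
    log_denom = np.log(np.exp(sims[others]).sum())
    return -np.mean([sims[j] - log_denom for j in pos])

def grad_norm(emb, labels, i, h=1e-5):
    """Central-difference estimate of ||dL/d emb[i]||, so average norms
    can be compared between single- and multi-label anchors."""
    g = np.zeros(emb.shape[1])
    for d in range(emb.shape[1]):
        hi, lo = emb.copy(), emb.copy()
        hi[i, d] += h
        lo[i, d] -= h
        g[d] = (supcon_loss(hi, labels, i) - supcon_loss(lo, labels, i)) / (2 * h)
    return np.linalg.norm(g)
```

Averaging `grad_norm` over anchors with one label versus several would quantify the dilution effect the review asks about.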
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
1) The proposed method underperforms the TWML baseline in terms of micro-F1 and recall—metrics that are especially important in clinical settings, where false negatives can be more critical than false positives. While this appears to be a broader limitation shared by many contrastive learning-based methods, it highlights a valuable direction for future work to improve clinical applicability.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a clear and relevant problem, and offers a well-motivated solution to address it. The challenge of multi-label classification under long-tailed class distributions can be relevant in many biomedical datasets.
- Reviewer confidence
Not confident (1)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We greatly appreciate the reviewers' valuable comments and suggestions. The major concerns and our responses are listed below.
Reproducibility of the paper (R1, R2, R3): We will release our source code upon acceptance of the paper.
Novelty (R1): The proposed ws-MulSupCon advances contrastive learning (CL) by jointly considering disease prevalence (single-label cases) and co-occurrence (multi-label cases). Our stratified design enhances the gradients of single-disease representations and facilitates effective learning of complex multi-disease representations. These new designs make our work a novel contribution to multi-label medical image classification.
Data splitting (R1): Regarding the data splitting concern, we randomly split the data by patient to ensure no sample overlap across the training, validation, and test sets. These splits were fixed and applied consistently in all experiments. We will clarify this in the camera-ready version.
Worse than other SOTA (R1): We appreciate the reviewer's comment and would like to clarify a potential misunderstanding. We implemented both loss-function-based and CL-pretraining-based SOTA methods using identical settings and data splits for a fair comparison. As shown in Tables 1-3, ws-MulSupCon outperforms prior methods across most metrics, including class-wise AUC. While Two-Way Multi-Label Loss achieves higher recall and F1, ws-MulSupCon offers more balanced performance with superior mAUC.
Discussion of λ (R1): We did not fully discuss λ in the paper due to space constraints. This hyperparameter balances the CL loss between diseased and healthy samples, enhancing the separation of normal and abnormal representations. As shown in Table 4, introducing the healthy-case CL guided by λ leads to consistent improvements in macro-level metrics, underscoring its role in improving the classification of minority classes. The optimal λ was selected through a comprehensive search over [0, 1], with stable performance within ±0.5.
Extra dataset (R1, R3): We did conduct experiments on the CheXpert dataset. However, due to space limits, we prioritized reporting class-wise AUC on the current two datasets to better highlight the effectiveness of ws-MulSupCon in addressing long-tailed distributions. Per the rebuttal guidelines, we cannot show new results in the rebuttal, but we will include them in the final paper.
Difference from weighted binary cross-entropy (BCE) (R1): While weighted BCE mitigates class imbalance by reweighting the loss in the output space, ws-MulSupCon addresses this problem at the representation level through a weighted, stratified CL design. Rather than simply rebalancing the entropy loss, ws-MulSupCon enables the model to discriminate disease-specific representations, leading to more robust classification performance.
Computation (R2): We appreciate the insightful comments and the recognition of the potential influence of our paper. Although our method introduces a slight increase in GPU memory and training time per epoch compared to MulSupCon, it delivers substantial performance gains. We believe this trade-off is worthwhile for the clinical benefits, and we will continue to optimize efficiency.
Augmentations (R2): Although our method achieves SOTA performance with standard augmentations, the suggestion to utilize medical-specific augmentations provides further insight for improving our work.
Comparison with SOTA long-tail methods and balanced datasets (R2, R3): In response to R2, while our initial experiments focus on loss-based and CL-pretraining-based methods, we recognize that benchmarking against recent long-tail methods would strengthen the study. For R3, our paper targets real-world imbalance, which reflects practical clinical scenarios.
Quantitative analysis of gradients (R3): We appreciate the reviewer's suggestion. While our results already show the benefits of emphasizing single-disease gradients, a quantitative analysis would add rigor and make the results more robust.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The rebuttal addresses the reviewers' concerns to some extent, but the answers to some of the questions raised are not convincing. The improvement is marginal.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper proposes a solid method to address the long-tailed classification problem. The improvements are solid, and the analysis of gradients is clear. In the rebuttal, the authors successfully addressed Reviewer 1's comments.