Abstract

Chest radiography (CXR) plays a crucial role in the diagnosis of various diseases. However, the inherent class imbalance in the distribution of clinical findings presents a significant challenge for current self-supervised deep learning models. These models often fail to accurately classify long-tailed classes. Current Vision-Language models such as Contrastive Language Image Pre-training (CLIP) models effectively model the manifold distribution of the latent space, enabling high zero-shot classification accuracies. Although CLIP performs well on most of the primary classes in the dataset, our work reveals that its effectiveness decreases significantly for classes with a long-tailed distribution. Our approach employs a class-weighting mechanism that directly aligns with the distribution of classes within the latent space. This method ensures a substantial improvement in overall classification performance, with particular emphasis on enhancing the recognition and accuracy of rarely observed classes. We accomplish this by applying Gaussian Mixture Model (GMM) clustering to the latent space. The resulting clusters are further refined by a Student's t-distribution, followed by a metric loss that utilizes the altered embeddings. Our approach facilitates stable and adaptive clustering of the features. This results in a notable average improvement of 7 percentage points in zero-shot AUC scores across 40 classes in the MIMIC-CXR-JPG dataset over previous SOTA models.
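The abstract does not spell out how the Student's t-distribution refines the cluster assignments. As a rough illustration only, the sketch below uses the Student-t kernel familiar from t-SNE and deep embedded clustering; that choice, the helper names, the toy cluster centers, and the `dof` parameter are all assumptions and not the authors' formulation. It contrasts the heavy-tailed kernel with a Gaussian one to show why a distant point keeps more membership weight under the t-distribution.

```python
import math

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def student_t_assignments(embedding, centers, dof=1.0):
    """Soft cluster memberships from a Student-t kernel (hypothetical helper)."""
    # Heavy-tailed kernel: the weight decays polynomially with distance.
    weights = [(1.0 + squared_distance(embedding, c) / dof) ** (-(dof + 1.0) / 2.0)
               for c in centers]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_assignments(embedding, centers):
    """Soft memberships from a unit-variance Gaussian kernel, for comparison."""
    # Light-tailed kernel: the weight decays exponentially with distance.
    weights = [math.exp(-0.5 * squared_distance(embedding, c)) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]

# Toy 2-D example: the point sits near the first center and far from the second.
centers = [[0.0, 0.0], [4.0, 0.0]]
point = [1.0, 0.0]
q_t = student_t_assignments(point, centers, dof=1.0)
q_g = gaussian_assignments(point, centers)
```

With these toy values the far cluster receives a membership of 1/6 under the Student-t kernel but less than 0.02 under the Gaussian kernel, which is the heavy-tail behaviour the abstract appeals to for rarely observed classes.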

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4154_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{MadRaj_CXRCML_MICCAI2025,
        author = { Madhipati, Rajesh and Bhat, Sheethal and Buess, Lukas and Maier, Andreas},
        title = { { CXR-CML: Improved zero-shot classification of long-tailed multi-label diseases in Chest X-Rays } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15966},
        month = {September},

}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a method to address class imbalance to increase diagnostic performance in chest radiography. They introduce a Gaussian Mixture Model refined with a Student’s t-distribution to cluster CLIP-extracted visual embeddings, capturing the heavy-tailed nature of underrepresented disease classes. Further, they apply triplet loss to improve intra-class compactness while using contrastive loss with pseudo-generated text labels to enhance inter-class separability. Evaluations on the MIMIC-CXR-JPG dataset, compared with other methods, demonstrate a 7% improvement in macro AUC along with enhanced performance on rare classes.
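For readers unfamiliar with the triplet loss mentioned in this summary, a minimal sketch of its standard margin-based form follows. The Euclidean distance, the `margin` value, and the toy embeddings are assumptions chosen for illustration; the review does not state which distance or margin the paper uses.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: pull the positive closer than the negative."""
    # The loss is zero once the negative is at least `margin` farther than the positive.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor = [0.0, 0.0]
same_class = [0.0, 1.0]       # distance 1.0 from the anchor
far_negative = [3.0, 4.0]     # distance 5.0 from the anchor
near_negative = [0.0, 1.5]    # distance 1.5 from the anchor

easy_loss = triplet_loss(anchor, same_class, far_negative)
hard_loss = triplet_loss(anchor, same_class, near_negative)
```

The easy triplet contributes nothing, while the hard one yields a loss of 0.5, so the gradient concentrates on samples whose class neighbourhoods still overlap.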

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper’s novel contribution lies in combining a Gaussian Mixture Model refined with a Student’s t-distribution and triplet loss; previously, these components have been used only separately. The authors not only boost classification performance but also improve the quality of the embedding space. Their method enhances both latent feature clustering and zero-shot classification, achieving about a 9% improvement for rare classes, while also benefiting the base classes. They support their findings with 5-fold cross-validation and report statistically significant results (p-value), along with ablation studies to fine-tune hyperparameters such as the degrees of freedom of the t-distribution and the batch size, showing that their approach is robust compared to the baseline. The paper is well written with clear explanations of technical terms, and the code is openly available.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    One major weakness is the limited ablation study; for example, it would be valuable to see additional experiments on the weighting of losses to verify whether the triplet loss (and the t-distribution refinement) truly drives the improvements. More analysis comparing the most represented rare classes versus the least represented ones, with a more granular breakdown in the results table, could also strengthen the evaluation. The study is conducted only on the MIMIC-CXR-JPG dataset, with no results reported on other datasets (such as PadChest), and there is no comparison with alternative approaches like class-balanced loss functions or re-sampling strategies. Moreover, while AUC is a useful metric, additional metrics such as MCC, F1 score, Precision/Recall and confidence intervals would provide a more detailed view of performance. The approach also incurs a noticeable computational cost, a 20.8% increase in FLOPs and a 24.15% increase in training time per step, which needs further discussion, including potential optimizations. The rationale behind the chosen number of degrees of freedom also needs more clarification. Finally, the paper could benefit from exploring t-SNE visualizations in more depth, particularly for the rare classes.
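To make the requested metrics concrete, the sketch below computes precision, recall, F1 and the Matthews correlation coefficient (MCC) for a single class from thresholded predictions. The function name and the toy labels are hypothetical, and the threshold that turns scores into binary predictions is assumed to have been applied already.

```python
import math

def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and MCC for one class from 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    # MCC uses all four confusion-matrix cells, so it stays informative under imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn - fp * fn) / denom) if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}

# Toy rare-class example: two positives among six samples.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

Here precision, recall and F1 are all 0.5 while MCC is 0.25, which illustrates how the metrics can diverge once true negatives are taken into account.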

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Recommendation: Weak Accept.

    The paper shows a clear technical innovation by using a refined Gaussian Mixture Model with a Student’s t-distribution, triplet loss, and pseudo-generated text labels to boost diagnostic performance. The cross-validation results on the MIMIC-CXR-JPG dataset clearly demonstrate this method’s potential.

    On the other hand, there are some limitations. The ablation study is quite narrow, the evaluation is done on just one dataset, and more performance metrics would have been helpful. Considering these issues in future work could make the method even more robust and widely applicable.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper advances zero-shot multi-label classification in chest X-ray (CXR) analysis through the novel CXR-CML method. The authors model the latent distribution manifold using a Gaussian Mixture Model (GMM), which is further refined with a Student-t distribution. Their approach significantly improves the performance of 12 long-tailed and 28 base classes in zero-shot CXR classification.
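Since the reported gains are averages over groups of classes, a short sketch of macro-averaged AUC may help. It computes each class's AUC as the fraction of correctly ordered positive/negative pairs and then takes the unweighted mean; the class names and scores are invented toy data, not results from the paper.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def macro_auc(per_class):
    """Unweighted mean of per-class AUCs, so rare classes count as much as common ones."""
    aucs = [binary_auc(labels, scores) for labels, scores in per_class.values()]
    return sum(aucs) / len(aucs)

# Hypothetical multi-label toy data: class -> (labels, scores) over four images.
per_class = {
    "common_finding": ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]),
    "rare_finding": ([0, 0, 0, 1], [0.2, 0.6, 0.1, 0.5]),
}
result = macro_auc(per_class)
```

Because the mean is unweighted, the weaker rare class pulls the macro score down as strongly as a common class would, which is why macro AUC is sensitive to long-tail performance.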

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose an innovative approach using cosine similarity loss to align images with their corresponding text captions in a shared latent space. A notable aspect of CXR-CML is the application of a Gaussian Mixture Model (GMM) to the visual-language embeddings extracted by CLIP, enhancing the alignment between modalities. Another strength is the use of a non-traditional GMM with a t-distribution, which is better suited to capture the long-tailed nature of medical data. The inclusion of triplet loss, mathematically defined to address this specific case, further strengthens the methodology. The use of a large-scale dataset (~200K images across 39 disease classes) provides a comprehensive evaluation, and the proposed method demonstrates significant improvements over baseline and other visual-language (VL) methods.
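The cosine-similarity alignment described here can be sketched as follows. The "one minus cosine" loss form, the function names, and the two-dimensional toy embeddings are assumptions for illustration; in practice the embeddings would come from the CLIP image and text encoders, and the paper's exact loss may differ.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def alignment_loss(image_embedding, text_embedding):
    """Assumed loss form: zero when the image and its caption point the same way."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)

def zero_shot_scores(image_embedding, class_text_embeddings):
    """Score each class by similarity between the image and that class's text prompt."""
    return {name: cosine_similarity(image_embedding, emb)
            for name, emb in class_text_embeddings.items()}

# Toy embeddings standing in for encoder outputs.
image = [0.6, 0.8]
prompts = {"class_a": [0.6, 0.8], "class_b": [0.8, -0.6]}
scores = zero_shot_scores(image, prompts)
loss_matched = alignment_loss(image, prompts["class_a"])
```

The matched prompt scores close to 1 and the orthogonal prompt scores 0, so per-class similarities can be thresholded independently for multi-label prediction.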

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper could benefit from the inclusion of additional evaluation metrics beyond AUC to provide a more comprehensive assessment of the model’s performance. Additionally, it would be helpful to include visual examples of correctly classified labels across different CXR categories (common, medium, and rare) to better illustrate the model’s effectiveness.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a novel and well-motivated approach to zero-shot multi-label classification in chest X-rays through the proposed CXR-CML framework. By integrating CLIP-based visual-language embeddings with a GMM refined by a Student-t distribution, the authors effectively address the long-tailed nature of medical data. The results show clear improvements over strong baselines. Despite minor limitations, such as limited evaluation metrics and lack of visual results, the overall contribution is significant, making this paper worthy of acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces CXR-CML, a new approach for improving zero-shot classification of diseases in Chest X-rays, especially for rare conditions. It tackles the problem of class imbalance using a GMM to understand the latent space of the data better, then refines it using a Student’s t-distribution. This helps the model perform better at identifying both common and rare diseases, with noticeable improvements in AUC scores.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    a) Innovative approach combining GMM with the Student’s t-distribution to address class imbalance in medical imaging, particularly for rare diseases. b) Five-fold cross-validation on a large dataset shows strong performance in zero-shot classification, especially for rare conditions, representing a major breakthrough. c) 7% improvement in AUC over previous methods, highlighting its superiority. d) The focus on a multi-label dataset with both common and rare diseases makes the approach highly relevant for real-world clinical use, where chest X-rays often show multiple pathologies.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    a) The method has higher computational costs, which could be an issue in resource-limited environments. b) It also depends on GMM to model data distribution, which may not work well for all datasets. c) More details on training and hyperparameters, like how the GMM is set up or tuned, would help others replicate the results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method, which combines GMM and a Student’s t-distribution, offers an interesting solution to the problem of long-tailed data in medical images. The strong experimental results, particularly the significant improvement in classification performance for rare diseases, make the paper convincing. However, the computational overhead and the reliance on GMM are drawbacks; if these issues can be addressed, the method has potential for use in real-world medical AI systems.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their thorough feedback and positive evaluation of our paper. Evaluation on additional datasets will be conducted in the future upon availability of long-tailed class annotations. We also plan additional evaluations of generalizability on non-medical image datasets with long-tail class distributions, including an extended set of classification metrics.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    This paper presents CXR-CML, a novel method to improve zero-shot multi-label disease classification in chest X-rays by addressing long-tailed class distributions. The authors combine CLIP-based vision–language embeddings with a Gaussian Mixture Model refined by a Student’s t-distribution and incorporate both triplet and contrastive losses to improve intra-class compactness and inter-class separability. Evaluated on the large-scale MIMIC-CXR-JPG dataset, the method achieves clear improvements in macro AUC—particularly for rare classes—over state-of-the-art baselines.

    The reviewers commended the paper for its innovative formulation and experimental validation, noting the clear benefit of the proposed modifications for zero-shot learning in clinical imaging. Some limitations were noted, including the lack of evaluation on additional datasets and increased computational cost. However, these do not undermine the overall strength of the work, which demonstrates both conceptual novelty and clinical impact.


