Abstract

Contrastive learning techniques have achieved significant success and have been widely applied in both general and medical domains. However, the two domains differ markedly in their use of negative mentions, which almost never appear in general-domain text but are pervasive in medical reports. We find that most existing medical contrastive learning methods do not effectively utilize, or even overlook, the numerous negative mentions present in the data during training, resulting in deficient multimodal feature alignment capabilities. To address this issue, we propose the Visual Entailment Based Contrastive Learning (VECL) method. By introducing a ternary visual entailment contrast relationship of entailment, neutral, and contradiction, our method effectively utilizes both positive and negative mentions to model fine-grained sample relationships, enhancing the model’s multimodal feature alignment capabilities. Experimental results show that our method achieves SOTA performance on classification, grounding, and report generation tasks. Resources are maintained at https://github.com/WVeLong/VECL.
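The ternary contrast described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' actual 3D-InfoNCE implementation: the function name, the {entailment, neutral, contradiction} encoding, and the masking scheme are all assumptions. The idea it demonstrates is that, for each image, entailed reports act as positives, contradicted reports as negatives, and neutral pairs are excluded from the contrastive denominator.

```python
import numpy as np

ENTAIL, NEUTRAL, CONTRADICT = 1, 0, -1  # assumed relation encoding

def ternary_contrastive_loss(sim, rel, tau=0.07):
    """Illustrative ternary contrastive loss (NOT the paper's exact 3D-InfoNCE).

    sim: (N, N) image-report similarity matrix (row i = image i vs. all reports).
    rel: (N, N) visual entailment relations between image i and report j.
    For each image, entailed reports are positives, contradicted reports are
    negatives, and neutral pairs are masked out of the softmax entirely.
    """
    logits = sim / tau
    losses = []
    for i in range(sim.shape[0]):
        pos = logits[i][rel[i] == ENTAIL]
        neg = logits[i][rel[i] == CONTRADICT]
        if pos.size == 0 or neg.size == 0:
            continue  # skip rows lacking both positives and negatives
        # log-sum-exp denominator over positives and negatives only
        lse = np.log(np.sum(np.exp(np.concatenate([pos, neg]))))
        losses.append(np.mean(lse - pos))  # -log softmax, averaged over positives
    return float(np.mean(losses))
```

The loss is minimized when every entailed pair scores above every contradicted pair for the same image, which is the behavior the abstract attributes to the ternary contrast.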

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3892_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/WVeLong/VECL

Link to the Dataset(s)

MIMIC-CXR dataset: https://physionet.org/content/mimic-cxr/1.0.0/

Open-I dataset: https://openi.nlm.nih.gov/faq

CheXpert dataset: https://stanfordmlgroup.github.io/competitions/chexpert/

ChestXray14 dataset: https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/37178474737

ChestXDet10 dataset: https://github.com/Deepwise-AILab/ChestX-Det10-Dataset

PadChest dataset: https://bimcv.cipf.es/bimcv-projects/padchest/

BibTex

@InProceedings{WuWei_Medical_MICCAI2025,
        author = { Wu, WeiLong and Yang, JingZhi and Zhu, Xun and Zhang, Xiao and Liu, ZiYu and Li, Miao and Wu, Ji},
        title = { { Medical Contrastive Learning of Positive and Negative Mentions } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        pages = {396 -- 405}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    1. Proposes a novel evaluation method: Introduces the Positive-Negative Contrastive (PNC) evaluation method, which assesses a model’s ability to distinguish between medical images and both positive/negative mentions of disease categories. Unlike prior methods that only evaluate semantic similarity with positive mentions, PNC provides a more comprehensive evaluation by incorporating negative mentions, which are critical in medical reports.
    2. Develops a contrastive learning approach: Presents a Visual Entailment-based Contrastive Learning (VECL) method to model entailment, contradiction, and neutral relationships between medical images and radiology reports. This explicitly captures complex semantic interactions, improving the model’s understanding of nuanced medical text-image relationships.
    3. Re-emphasizes the importance of PNC metrics: Highlights the significance of PNC evaluation in medical vision-language models, demonstrating through experiments that it complements traditional Positive-Only Similarity (POS) metrics by addressing intra-class similarities (e.g., distinguishing “disease present” vs. “disease absent”) of the same category, which is vital for real-world medical scenarios.
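    The PNC evaluation described in item 1 can be illustrated with a minimal sketch of the positive/negative-prompt scheme the review attributes to prior work such as CheXzero. This is an assumed implementation: the function name, prompt wording, and embedding inputs are hypothetical; only the idea of softmaxing image similarity over a positive/negative prompt pair comes from the text.

    ```python
    import numpy as np

    def pnc_predict(img_emb, pos_prompt_emb, neg_prompt_emb):
        """Illustrative PNC-style zero-shot scoring (assumed, not any model's exact code).

        For one disease, compare the image embedding against a positive prompt
        (e.g. "pneumonia") and a negative prompt (e.g. "no pneumonia"); a softmax
        over the two similarities gives the probability the disease is present.
        """
        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        s_pos = cos(img_emb, pos_prompt_emb)
        s_neg = cos(img_emb, neg_prompt_emb)
        e = np.exp([s_pos, s_neg])
        return e[0] / e.sum()  # P(disease present)
    ```

    Unlike POS-style scoring, which ranks only positive prompts, this pairing forces the model to separate "disease present" from "disease absent" wording of the same category, which is the intra-class distinction the review highlights.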
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Questionable novelty:
    • The PNC evaluation method is not entirely novel, as similar approaches (e.g., CheXZero) already used positive-negative prompt pairs. The paper acknowledges re-emphasizing rather than introducing PNC, which weakens its claim to originality.

    • VECL shares conceptual similarities with prior work like MedKLIP, which also modeled entailment/contradiction using triplet supervision, reducing its distinctiveness.

    2. Evaluation limitations:
    • The “zero-shot classification” using PNC is criticized for not being truly zero-shot, as it relies on in-domain labels, creating an unfair comparison with baselines that use out-domain datasets.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Limited Novelty in Evaluation: While PNC is emphasized as important, it was previously introduced in works like CheXZero, and the paper’s contribution is more of a revival rather than a groundbreaking innovation. Although this study asserts that negative mentioning uniquely differentiates medical from general image domains, no supporting evidence is provided.

    2. Labeling and Data Reliance: Relies on LLMs to extract labels from reports without verifying their accuracy, raising concerns about the reliability of training and evaluation data.

    3. Fairness in Comparison: The zero-shot classification setup may be unfair to baselines, as it uses in-domain labels or specific prompt formats not universally adopted, potentially skewing comparisons.

    4. Incomplete Experimental Details: Missing critical information on data cleaning processes, ablation study selection criteria, and clinical-level evaluation, weakening methodological rigor.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses an important problem in medical vision-language pretraining, with strengths in addressing clinical negation and providing comprehensive evaluations. However, its weaknesses significantly impact its suitability for publication.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a novel contrastive learning method (VECL) that can be widely applied to classification and report generation tasks. It makes full use of negative sample data, which is in line with practical application scenarios. The method also achieves SOTA performance on the experimental datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The experiments show SOTA-level performance on the Fine-Tuning Classification, Zero-Shot Grounding and Retrieval Based Report Generation tasks.
    2. The proposed methodology is innovative: it makes full use of previously underutilized data and has good practical application value.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. There is a lack of additional experiments to demonstrate the effectiveness of the method.
    2. In particular, there is a lack of comparative experiments on the effectiveness of the method in other models.
    3. The visualisation experiments are not convincing.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a new contrastive learning method for the Fine-Tuning Classification, Zero-Shot Grounding and Retrieval Based Report Generation tasks. The experimental setup focuses on comparing models built with the method against models built without it. This experimental approach does not effectively demonstrate the usefulness of the method. Moreover, there are corresponding problems with the visualization experiments.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.
    1. It is a novel pre-training architecture that can achieve better results than some other benchmark pre-training architectures.
    2. It has potential and broad scope for downstream tasks.



Review #3

  • Please describe the contribution of the paper

    The paper proposes Visual Entailment Based Contrastive Learning (VECL) for multimodal medical pretraining. It introduces a ternary contrastive relationship (entailment, neutral, contradiction) to leverage both positive and negative mentions in radiology reports, addressing a limitation in prior contrastive learning approaches. A novel 3D-InfoNCE loss is introduced, and the method achieves SOTA results on multiple tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Clearly identifies and addresses a gap in existing contrastive learning methods.​
    2. Novel ternary contrast framework with 3D-InfoNCE loss.​
    3. Strong empirical results across classification, grounding, and report generation.​
    4. Clear method description using public datasets and pretrained models.​
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Minor terminology issues (e.g., ‘vclassification’).​
    2. No justification or sensitivity analysis for batch size (256), important in contrastive settings.​
    3. Entailment formulation lacks theoretical comparison to alternatives.​
    4. Limited to chest X-rays; generalizability claims are speculative.​
    5. No benchmarking against newer non-contrastive or hybrid methods.​
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    VECL introduces a ternary visual entailment framework and a new 3D-InfoNCE loss to better align image-report representations. The method outperforms prior approaches on multiple tasks and shows robustness to label noise. However, minor issues (e.g., terminology, batch size justification) lower the score slightly. Still, the work is novel and valuable, meriting a Weak Accept.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely appreciate all reviewers’ constructive feedback. We acknowledge R1’s emphasis on the critical challenge of clinical negation handling and comprehensive evaluation design, and share R2 and R3’s enthusiasm about the methodological novelty of VECL, particularly its ternary contrastive paradigm and 3D-InfoNCE loss for modeling complex medical vision-language interactions. Below we respond to all technical concerns:

Note: We previously submitted this paper to ICLR and noticed that R1’s comments largely summarize or rephrase the comments and replies from one of the ICLR reviewers. However, we have addressed most of these issues in the version submitted to MICCAI, so most of the problems raised in R1’s comments no longer apply.

To R1: Questionable novelty of PNC. Re-emphasizing the importance of PNC metrics is not claimed as our contribution; we never acknowledged re-emphasizing or introducing PNC, and only mentioned in the Introduction that PNC was proposed in CheXzero. This has nothing to do with novelty.

To R1: Questionable novelty of VECL. In modeling, MedKLIP introduces binary supervision by determining the presence/absence of specific diseases in a single report (uncertain cases are discarded during training) and optimizes with a CE loss, whereas VECL employs ternary supervision of “entailment/neutral/contradiction” by assessing visual entailment relationships between arbitrary image-report pairs and optimizes with a 3D-InfoNCE loss.

To R1: PNC is not a truly zero-shot classification metric. POS is a widely used zero-shot evaluation method. Compared with POS, PNC introduces no additional information beyond negation words.

To R1: Lack of verification of LLM label accuracy. Results are in the Ablation Study, indicating that our method is robust to label noise and learns effectively.

To R1: Fairness of comparison. The baselines, including MedCLIP, MedKLIP, and KAD, all introduce in-domain labels or medical prior knowledge, so there is no unfair comparison.

To R3: Analysis of batch size. For contrastive learning, larger batch sizes are generally better. 256 is the maximum that fits on a single A800, so we did not run a sensitivity analysis.

To R3: Entailment formulation lacks theoretical comparison. Conventional visual entailment methods aim to ensure that a sample’s correct visual entailment probability exceeds its incorrect ones, while VECL additionally enforces that a positive sample’s visual entailment probability surpasses the probabilities of other negative samples. Integrating visual entailment into the contrastive learning framework constitutes one of the key contributions of this paper.

To R3: Generalizability claims are speculative. LLMs’ exceptional analysis capability can process various types of report data, our experiments show that VECL is strongly robust to label noise, and the label extraction process and framework are image-modality agnostic. We are therefore confident in the method’s generalizability.

To R2, R3: Lack of additional comparative experiments. Our work focuses specifically on medical contrastive learning pre-training methods and compares our approach against other representative methods in this category. Comparisons with non-contrastive or hybrid pre-training approaches, while potentially interesting, are beyond the current scope, which aims to isolate and study contrastive mechanisms in medical settings. Regarding R2’s comment, we would like to clarify that our method is designed as a complete pre-training framework rather than a plug-and-play module, so “comparative experiments on the effectiveness of the method in other models” would not be applicable.

To R1, R2, R3: Other details. We conducted ablation studies on all modified parts of the framework. Our experiments are comparisons on public datasets, without clinical evaluation conditions, which is consistent with the baselines.

Commitment: All experiment and visualization results are reproducible. Code and models will be available upon acceptance.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


