Abstract
Medical Visual Question Answering (Med-VQA) aims to assist in clinical diagnosis, but still faces challenges with language bias. Current approaches oversimplify the causal relationship between clinical terms and answers by treating it as a binary positive/negative effect, which can either let bias persist or reduce sensitivity to questions. To address this limitation, we propose a novel approach named DeCoCT (Debiasing Med-VQA via Counterfactual Contrastive Training). We decompose the causal relationship between clinical terms and answers into two components: (1) concept localization in medical images, and (2) prior knowledge from training data. We introduce a Key Region Capture Module (KRCM), trained with counterfactual strategies, which enhances the model’s ability to capture critical information through clinical terms. Furthermore, we employ counterfactual contrastive training to eliminate spurious correlations introduced by clinical terms while enhancing the model’s focus on relevant visual regions. In addition, we construct a new conditional prior dataset based on VQA-RAD, named VQA-RAD-CP. Extensive experiments demonstrate that our approach significantly mitigates language bias in Med-VQA. Our code and the VQA-RAD-CP dataset are available at https://github.com/YX542/DeCoCT.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2827_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/YX542/DeCoCT
Link to the Dataset(s)
VQA-RAD: https://osf.io/89kps/
SLAKE: https://www.med-vqa.com/slake/
SLAKE-CP: https://github.com/miccai20231/SLAKE-CP
BibTex
@InProceedings{WanXin_Eliminating_MICCAI2025,
author = { Wan, Xingyu and Teng, Qiaoying and Chen, Jun and Lu, Yonghan and Yuan, Deqi and Liu, Zhe},
title = { { Eliminating Language Bias for Medical Visual Question Answering with Counterfactual Contrastive Training } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15965},
month = {September},
pages = {196--206}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes DeCoCT, a novel framework for debiasing Medical Visual Question Answering (Med-VQA). The framework includes a Key Region Capture Module (KRCM) and a counterfactual contrastive training strategy. It also introduces a new bias-sensitive dataset, VQA-RAD-CP, enabling robust evaluation of bias mitigation methods. The method achieves state-of-the-art performance on multiple Med-VQA benchmarks.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A new bias-sensitive Med-VQA dataset was constructed to specifically target and evaluate language bias in medical visual question answering tasks.
- A novel training framework was proposed to effectively mitigate language bias in Med-VQA, incorporating counterfactual reasoning and contrastive learning strategies.
- Extensive comparative experiments were conducted, demonstrating the effectiveness of the proposed method with superior performance across multiple benchmarks.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The discussion of related work on language bias is relatively limited, despite the fact that many studies have already addressed language bias in VQA.
2. The method masks clinical terms to generate counterfactuals; however, the paper does not adequately discuss the potential risk that such masking may distort the original semantics of the question, which could affect model training.
3. Although the method achieves strong results on specific datasets, it lacks discussion of its potential limitations when applied to other types of Med-VQA datasets or clinical scenarios beyond radiology.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method introduces a reconstructed Med-VQA dataset and a novel training framework to address the problem of language bias. However, a potential concern lies in the data dependency of the approach—specifically, its generalizability may be limited by the linguistic characteristics of the dataset it was trained on.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The main contribution of the paper is the introduction of a new method called DeCoCT to reduce language bias in Med-VQA. Results are shown in comparison to selected existing models.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The main strength is the development of a new counterfactual dataset for evaluation of other medical VQA models along with an approach to train models.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The paper does not clearly describe whether there is test data leakage, nor how VQA-RAD-CP was constructed. Since constructing VQA-RAD-CP involves combining the test and train data into a single set, downstream training appears to leak the test data into training, making the results on VQA-RAD questionable.
- The method’s effectiveness hinges on accurately identifying “clinical terms” using scispaCy and expert review. This process might not capture all relevant nuances or could be prone to errors, impacting downstream performance.
- The details of the “radiology review” are not clear. The authors should state whether the reviewers are board certified, how many years of experience they have, how many agreements/disagreements were observed, and how they were trained for the task. These important details are missing from the paper.
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
The authors mention and use scispacy but lack the reference:
@inproceedings{neumann-etal-2019-scispacy,
  title = "{S}cispa{C}y: {F}ast and {R}obust {M}odels for {B}iomedical {N}atural {L}anguage {P}rocessing",
  author = "Neumann, Mark and King, Daniel and Beltagy, Iz and Ammar, Waleed",
  booktitle = "Proceedings of the 18th BioNLP Workshop and Shared Task",
  month = aug,
  year = "2019",
  address = "Florence, Italy",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/W19-5034",
  doi = "10.18653/v1/W19-5034",
  pages = "319--327",
  eprint = {arXiv:1902.07669},
}
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper provides a new dataset allowing for the development and testing of medical VQA 2D data models. The paper introduces an approach that provides superior results to those compared against.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
This is a weak accept but given the choice - accept after the promised changes.
Review #3
- Please describe the contribution of the paper
The paper introduces DeCoCT, a method designed to mitigate language bias in Medical Visual Question Answering (Med-VQA) by leveraging counterfactual data and contrastive training. DeCoCT builds directly on DeBCF (Zhan et al. MICCAI 2023), but argues that total removal hurts sensitivity and proposes causal decomposition + visual guidance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) A dual causal interpretation (image localization + prior bias).
2) A CLIP-based Key Region Capture Module (KRCM) guided by masked clinical terms.
3) A counterfactual contrastive loss that balances removing bias and retaining visual relevance.
4) A new evaluation dataset (VQA-RAD-CP) constructed using conditional priors.
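The counterfactual contrastive loss in item (3) can be illustrated with a generic InfoNCE-style objective. This is only a sketch under assumptions, not the paper’s actual formulation: the embedding dimension, the temperature `tau`, and the choice of the masked (counterfactual) question embedding as the negative are all illustrative; the paper additionally weights its loss terms with parameters λ and γ (Equation 6), which are omitted here.

```python
# Generic InfoNCE-style contrastive loss, sketched as one way a counterfactual
# contrastive objective could be set up. NOT the paper's exact loss.
# Anchor = original question embedding, positive = image-grounded embedding,
# negatives = embeddings of counterfactual (masked) questions.
import numpy as np

def contrastive_loss(anchor, positive, negatives, tau=0.07):
    """-log( exp(sim(a,p)/tau) / (exp(sim(a,p)/tau) + sum_n exp(sim(a,n)/tau)) )"""
    def sim(u, v):  # cosine similarity
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    pos = np.exp(sim(anchor, positive) / tau)
    neg = sum(np.exp(sim(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(0)
a = rng.standard_normal(8)
# A perfectly aligned positive should be penalized less than a random one.
loss_aligned = contrastive_loss(a, a, [rng.standard_normal(8)])
loss_random = contrastive_loss(a, rng.standard_normal(8), [a])
print(loss_aligned < loss_random)  # -> True
```

Pulling the anchor toward the image-grounded positive while pushing it away from the masked-question negative is the intuition behind removing the spurious term-to-answer shortcut while retaining visual relevance.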
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1)The authors mention the use of scispaCy for clinical term extraction but provide no further explanation or reference. If this omission is due to space constraints, it is important to include a proper citation so that interested readers can refer to the original source for more details.
2) In the Implementation Details section, the authors state, “In Equation 3, the parameters λ and γ are set to 1 and 0.5 respectively.” However, these parameters are not part of Equation 3; instead, they are introduced in Equation 6. This should be corrected to avoid confusion.
3) It would be beneficial for the community if the authors released both the VQA-RAD-CP benchmark and the counterfactual training dataset used in their experiments.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The work is clear, impactful, and offers both conceptual clarity and practical utility for the medical AI community.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
My concerns were addressed
Author Feedback
We sincerely appreciate the reviewers’ detailed feedback and suggestions for our work. It is recognized as “a novel training framework to address the problem of language bias” (R1) and “clear, impactful, and offers both conceptual clarity and practical utility for the medical AI community” (R2). All reviewers acknowledge the value of the VQA-RAD-CP dataset that we constructed for evaluating language bias in Med-VQA. In response to R2’s recommendation, we will make all relevant data and code publicly available upon acceptance. Additionally, DeCoCT demonstrates superior performance across multiple Med-VQA benchmarks, with R1 emphasizing “Extensive comparative experiments”.
Related Work R1 mentioned that “the discussion of related work on language bias is relatively limited.” This is due to space limitations, as we mainly focused on current Med-VQA related research; however, this may make it harder for readers to situate our work within the broader literature. We will add more citations to related studies in the final version and conduct a more detailed analysis of related work in the future.
Method We appreciate R1’s concern about potential semantic distortion from masking clinical terms. Our method builds on CSST and DeBCF (as cited in our paper [5] and [25]). As shown in CSST, training with masked critical words has been shown to enhance robustness against linguistic variations (via CS(k) metric improvement) without semantic distortion. Furthermore, the RoBERTa encoder we used inherits BERT’s bidirectional masking pretraining [2], which significantly improves tasks like SQuAD and GLUE, ensuring robust semantic representation. Therefore, our method reduces over-reliance on clinical terms without distorting the original semantics of the question. R1 noted that our work lacks discussion of potential limitations when applied to other Med-VQA datasets or clinical scenarios beyond radiology. Our current focus is on basic experiments on VQA-RAD and SLAKE. We believe our method is not strongly data-dependent because it does not rely on specific radiological features. We will expand our experiments to other datasets in the journal version.
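The masking step discussed above can be sketched as follows. This is a hypothetical illustration, not the paper’s pipeline: the actual work identifies clinical terms with scispaCy plus expert review, whereas here a fixed term list stands in for that pipeline, and “[MASK]” mirrors the RoBERTa-style mask token mentioned in the rebuttal.

```python
# Hypothetical sketch of counterfactual question generation by masking
# clinical terms. The fixed term list below is an assumed stand-in for the
# scispaCy + expert-review extraction used in the paper.
import re

CLINICAL_TERMS = ["pneumothorax", "cardiomegaly", "effusion"]  # assumed examples

def mask_clinical_terms(question: str, terms=CLINICAL_TERMS, mask="[MASK]") -> str:
    """Replace each clinical term in the question with a mask token."""
    for term in terms:
        question = re.sub(rf"\b{re.escape(term)}\b", mask, question,
                          flags=re.IGNORECASE)
    return question

print(mask_clinical_terms("Is there a pneumothorax in the left lung?"))
# -> Is there a [MASK] in the left lung?
```

The masked question keeps its syntactic frame (“Is there a … in the left lung?”), which is why, as argued above, masking a single clinical term need not distort the overall semantics of the question.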
Data and Experiment R3 raised two concerns regarding the accurate identification of clinical terms and the qualifications of the experts. Firstly, ScispaCy [1] achieves 97.4% accuracy in PubMed abstract sentence segmentation, with a 2% error rate in complex sentences. Additionally, the three radiology experts have 15, 17, and 30 years of clinical experience, respectively. Given these strengths, we believe the accuracy of clinical term extraction is well ensured. In fact, the results from ScispaCy and expert review are highly consistent, with only a 1.28% discrepancy. Moreover, the expert review process is as follows: Two experts independently review ScispaCy’s extraction results, and any disagreements (2.17%) are resolved by the third expert, who has 30 years of experience. More details about the “radiology review” will be included in the final version. R3 also raised concerns about potential test data leakage, which may reduce the credibility of our results. In our experiments, separate models with distinct weights were trained and tested on each dataset, preventing test data from influencing the training of other models. Thus, no cross-dataset data leakage occurred. We will include more experimental details in the final version. Finally, the construction of VQA-RAD-CP is detailed in Section 3.1.
Others We apologize for the lack of citation regarding the use of ScispaCy [1], as pointed out by R2 and R3. We also appreciate R2 pointing out a citation error in Section 3.2. The relevant citation and error will be corrected in the final version.
References [1] Neumann, Mark, et al. “ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing.” arXiv:1902.07669 (2019). [2] Devlin, Jacob, et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” NAACL-HLT 2019.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
Authors should address two critical concerns raised by the reviewers: 1. the potential leak of test data into the training set; 2. clarity on the potential risk associated with masking clinical terms during counterfactual generation.
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A