Abstract

Medical vision-language models (VLMs) have demonstrated unprecedented transfer capabilities and are being increasingly adopted for data-efficient image classification. Despite their growing popularity, their reliability remains largely unexplored. This work explores the split conformal prediction (SCP) framework to provide trustworthiness guarantees when transferring such models based on a small labeled calibration set. Despite the potential of SCP, the generalist nature of the VLMs’ pre-training could negatively affect the properties of the predicted conformal sets for specific tasks. While common practice in transfer learning for discriminative purposes involves an adaptation stage, we observe that deploying such a solution for conformal purposes is suboptimal, since adapting the model using the available calibration data breaks the rigid exchangeability assumptions for test data in SCP. To address this issue, we propose transductive split conformal adaptation (SCA-T), a novel pipeline for transfer learning in conformal scenarios, which performs an unsupervised transductive adaptation jointly on calibration and test data. We present comprehensive experiments utilizing medical VLMs across various image modalities, transfer tasks, and non-conformity scores. Our framework offers consistent gains in efficiency and conditional coverage compared to SCP, while maintaining the same empirical guarantees. The code is publicly available at https://github.com/jusiro/SCA-T.
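For readers unfamiliar with split conformal prediction, the following is a minimal sketch of the standard procedure the abstract builds on. It is an illustration and not the authors' implementation: the non-conformity score (one minus the probability of the true class), the error level `alpha`, and the function names `conformal_threshold` and `prediction_sets` are all assumptions chosen for the example.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Return the score threshold from a labeled calibration set.

    Assumed non-conformity score: s_i = 1 - p_i[y_i].
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample correction: take the k-th smallest score,
    # with k = ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: every label is kept.
        return np.inf
    return np.sort(scores)[k - 1]


def prediction_sets(test_probs, threshold):
    """Boolean mask of shape (n_test, n_classes); True marks a class in the set."""
    return (1.0 - test_probs) <= threshold
```

The coverage guarantee of this procedure relies on calibration and test scores being exchangeable, which is the assumption the abstract says is broken when the model is adapted on the calibration data alone.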

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4783_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/jusiro/SCA-T

Link to the Dataset(s)

Links are available at: https://github.com/jusiro/SCA-T

BibTex

@InProceedings{SilJul_Trustworthy_MICCAI2025,
        author = { Silva-Rodríguez, Julio and Ben Ayed, Ismail and Dolz, Jose},
        title = { { Trustworthy Few-Shot Transfer of Medical VLMs through Split Conformal Prediction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15966},
        month = {September},

}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a transductive split conformal prediction (SCP-T) framework for medical vision-language models, enabling reliable uncertainty estimates with formal coverage guarantees in few-shot settings. To support this, the authors propose $TIM_{KL}$, a transductive solver that preserves exchangeability by jointly adapting on calibration and test data without labels, improving both prediction efficiency and conditional coverage.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The paper tackles an important and timely challenge regarding quantifying uncertainty and providing reliable confidence estimates for medical vision-language models. 2) The experimental evaluation is thorough, with performance assessed across nine diverse medical datasets and 100 random seed trials, demonstrating the robustness of the findings. 3) The manuscript is clearly written and well-structured.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) The proposed transductive solver appears to be a marginal extension of TIM [4], differing only in its replacement of the Shannon entropy term with a Kullback–Leibler divergence. A clearer discussion of the novelty and motivation behind this modification would strengthen the contribution. 2) An ablation study on the sensitivity to hyperparameters would be valuable. Does the method require a separate validation set to select optimal hyperparameters, and if so, how is this reconciled with the transductive setting? 3) The paper evaluates “Adapt + SCP” using only a linear probe for adaptation, overlooking test-time adaptation methods. Would the conclusions about exchangeability still hold under more advanced adaptation strategies? 4) The proposed transductive approach assumes access to the entire test set during adaptation. How would the method generalize to real-world clinical scenarios where test data arrives sequentially and predictions must be made in real time?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    In equation (5), the summation index of the second summation operator in the sample-conditional entropy should be “c”, not “k”.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper addresses an important and timely problem in the medical vision-language domain, its methodological contribution feels incremental, lacking a compelling justification for the proposed modification over existing approaches. Additionally, there are practical concerns around the method’s applicability in real-world, sequential prediction settings and unanswered questions around hyperparameter sensitivity.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have adequately addressed my main concerns, including clarifying the novelty of their method, its applicability in sequential settings, and the rationale behind their design choices. While some limitations remain, I believe the paper makes a meaningful contribution.



Review #2

  • Please describe the contribution of the paper

    This paper introduces a novel transfer learning pipeline for conformal prediction, termed Transductive Split Conformal Adaptation (SCP-T). The method performs unsupervised transductive adaptation using both calibration and test data jointly. Experiments across three tasks demonstrate that SCP-T consistently outperforms the standard Split Conformal Prediction (SCP) approach.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The problem addressed in this paper is highly relevant and interesting, and the proposed solution is generally well-suited to medical scenarios.

    2. The experimental evaluation is comprehensive, covering diverse settings and providing strong empirical support for the method.

    3. The paper is well-written, and the methodology is clearly presented and easy to follow.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. From my understanding, this paper explores a novel setting—conformal prediction—as an alternative to conventional classification. The primary contribution appears to be the development of a transductive solver tailored for imbalanced class scenarios. However, the idea of transductive information maximization for few-shot learning has been previously introduced in [1]. I recommend that the authors clearly articulate the unique technical innovations of their method in comparison to prior work.

    2. Given that the proposed solver, $TIM_{KL}$, is specifically designed to handle class imbalance, it would be beneficial to evaluate its effectiveness on explicitly imbalanced datasets. If the results in Table 1 already reflect such settings, I encourage the authors to provide additional details on the class distribution for each task to contextualize the performance gains.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main strength of the paper lies in presenting a novel perspective on applying conformal prediction to few-shot adaptation in medical scenarios. The proposed approach is conceptually sound, and the manuscript is well-written and of high quality. However, the primary concern pertains to the lack of clarity regarding the paper’s unique technical contribution beyond existing methods.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Based on the authors’ rebuttal, I think this paper is somewhat interesting and should be valuable for the medical imaging community. Thus, I recommend accepting this paper.



Review #3

  • Please describe the contribution of the paper

    The authors propose transductive split conformal adaptation for transfer learning on conformal scenarios.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is novel. The paper is very well written with thorough experiments.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    None, given the page limit.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper provides good theoretical justification and is well presented.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their insightful and detailed comments. We appreciate that they recognized the relevance (R2,R3) and timeliness (R3) of the problem addressed, and that the solution is well suited to medical scenarios (R2). They recognized the novelty of the proposed setting (R2) and method (R1), and found the theoretical justification appropriate (R1) and the approach conceptually sound (R2). Also, reviewers acknowledged our thorough experimentation (R1,R2,R3), robust findings (R3), and empirical support (R2), with a clearly written and structured manuscript (R1,R2,R3).

  1. General comment (R2,R3): novelty of the transductive solver.
    • Scope: First, we stress that our main contribution is SCP-T, a novel setting for enhancing VLM adaptation while providing coverage guarantees through conformal prediction (CP), a non-trivial and unexplored scenario for trustworthy AI in medical image analysis, as widely acknowledged in the initial reviews. Also, we provide a comprehensive benchmark on the increasingly popular deployment of medical VLMs through CP, covering multiple tasks/modalities, which is by itself a valuable contribution.
    • Transductive solver: Our unsupervised, transductive transfer learning solver builds upon well-established knowledge in information maximization (IM). Even though TIM [4] introduced IM for (supervised) few-shot transductive learning, IM is a general framework which had enjoyed widespread adoption across various problems prior to TIM. Accordingly, novelty lies in leveraging IM as a learning objective adapted to specific problems. Thus, our solver is a novel extension of IM tailored to the proposed SCP-T, which better leverages the available label-proportion priors estimated from calibration data, concretely, using a KL divergence to regularize the expected label-marginal distribution. Such design choices are crucial for properly deploying SCP-T (Table 2), since medical image datasets are naturally imbalanced. In contrast, Shannon entropy maximization in [4] produces solutions biased towards a uniform distribution.
  2. Specific comments:
    • R2: Class imbalance. Indeed, most datasets employed for evaluation are imbalanced, e.g., NCT-CRC, SICAPv2, SkinCancer, MESSIDOR, MMAC, and NIH-LT. Hence, the results in Tables 1 and 2 already reflect robustness to this characteristic. While adding a more specific per-dataset analysis is infeasible due to length constraints (and the absence of appendices), we will highlight additional details in the dataset descriptions.
    • R3: Incoming sequential data. This scenario was explicitly evaluated through experiments in Fig. 2 (b): test data is received in incoming batches, and SCP-T is adjusted from scratch for each batch. Conformal metrics are computed using the whole test dataset predictions for fair comparisons. Note that no significant drops in set efficiency or conditional coverage were observed, which suggests that our setting is robust to such challenging scenarios, meeting the demands of real-world sequential settings.
    • R3: Test-time adaptation. First, note that exchangeability depends on cal/test data distributions and how they adjust the predicted scores and the conformal predictor, not necessarily on the solver’s specific complexity. Second, CP is an efficient framework typically deployed in black-box scenarios, i.e., exclusively accessing the model outputs. In contrast, TTA baselines, e.g., TENT or WATT, require access to internal model weights and are highly computationally expensive. Hence, we focused on black-box transductive baselines, where TransCLIP (NeurIPS’24) is the current SoTA for VLMs, which we outperform both in terms of accuracy and coverage.
    • R3: Hyperparameters. To suit real-world scenarios and practical constraints, we keep the hyperparameters for all tasks fixed (stated in Sec 4.1), showing the robustness of our method across tasks. Hence, our metrics have already assessed the robustness regarding hyperparameter selection. Again, we could not incorporate ablation studies due to space constraints.
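The contrast drawn in point 1 between Shannon entropy maximization of the label marginal and a KL term toward a calibration-estimated prior can be illustrated with a small numerical sketch. This is not the solver from the paper: the direction of the KL divergence, the smoothing constant, and the function names `label_prior`, `marginal_entropy_term`, and `marginal_kl_term` are assumptions made only for this example.

```python
import numpy as np

EPS = 1e-12  # smoothing constant to avoid log(0); an assumption of this sketch


def label_prior(cal_labels, n_classes):
    """Label proportions estimated from the labeled calibration set."""
    counts = np.bincount(cal_labels, minlength=n_classes)
    return counts / counts.sum()


def marginal_entropy_term(probs):
    """Negative Shannon entropy of the batch-averaged prediction.

    Minimizing this term pushes the marginal toward the uniform distribution.
    """
    m = probs.mean(axis=0)
    return float(np.sum(m * np.log(m + EPS)))


def marginal_kl_term(probs, prior):
    """KL(m || prior) between the batch-averaged prediction m and a label prior.

    Minimizing this term pushes the marginal toward the prior instead.
    """
    m = probs.mean(axis=0)
    return float(np.sum(m * np.log((m + EPS) / (prior + EPS))))
```

With an imbalanced prior such as (0.8, 0.2), predictions whose marginal matches the prior give a KL term of about zero, whereas the entropy term is lower for a uniform marginal. This is the bias toward a uniform distribution that the rebuttal attributes to entropy maximization.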




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


