Abstract
Vision-language models (VLMs) are gaining attention in medical image analysis. These are pre-trained on large, heterogeneous data sources, yielding rich and transferable representations. Notably, the combination of modality-specialized VLMs with few-shot adaptation has provided fruitful results, enabling the efficient deployment of high-performing solutions. However, previous works on this topic make strong assumptions about the distribution of adaptation data, which are unrealistic in the medical domain. First, prior art assumes access to a balanced support set, a condition that breaks the natural imbalance in disease prevalence found in real-world scenarios. Second, these works typically assume the presence of an additional validation set to fix critical hyper-parameters, which is highly data-inefficient. This work challenges these favorable deployment scenarios and introduces a realistic, imbalanced, validation-free adaptation setting. Our extensive benchmark across various modalities and downstream tasks demonstrates that current methods systematically compromise their performance when operating under realistic conditions, occasionally even performing worse than zero-shot inference. Also, we introduce a training-free linear probe that adaptively blends visual and textual supervision. Detailed studies demonstrate that the proposed solver is a strong, efficient baseline, enabling robust adaptation in challenging scenarios. Code is available: https://github.com/jusiro/SS-Text .
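The imbalanced, validation-free setting described in the abstract can be illustrated with a small sampling sketch. This is only an assumption about how such a support set might be drawn, not the authors' protocol: the function name `sample_imbalanced_support`, the pool, and the prevalence figures are hypothetical. The sketch draws the usual K x C budget uniformly from the labeled pool, so per-class counts follow natural prevalence and rare classes may receive zero shots. No validation split is held out.

```python
import random
from collections import Counter

def sample_imbalanced_support(labels, num_classes, shots_per_class, seed=0):
    """Draw K * C indices uniformly from the pool (hypothetical protocol).

    Class counts then follow the pool's prevalence instead of being
    forced to exactly K per class, and no validation split is reserved.
    """
    rng = random.Random(seed)
    budget = shots_per_class * num_classes
    indices = rng.sample(range(len(labels)), budget)  # sampling without replacement
    counts = Counter(labels[i] for i in indices)
    return indices, [counts.get(c, 0) for c in range(num_classes)]

# Hypothetical pool: 3 classes with 80% / 15% / 5% prevalence.
pool_labels = [0] * 800 + [1] * 150 + [2] * 50
support_idx, per_class = sample_imbalanced_support(
    pool_labels, num_classes=3, shots_per_class=4
)
print(per_class)  # shots per class; a rare class may end up with zero
```

A balanced protocol would instead sample exactly `shots_per_class` indices from each class, which requires knowing the labels of a much larger pool in advance.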
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4796_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/jusiro/SS-Text
Link to the Dataset(s)
Links are available at: https://github.com/jusiro/SS-Text
BibTex
@InProceedings{SilJul_FewShot_MICCAI2025,
author = { Silva-Rodríguez, Julio and Shakeri, Fereshteh and Bahig, Houda and Dolz, Jose and Ben Ayed, Ismail},
title = { { Few-Shot, Now for Real: Medical VLMs Adaptation without Balanced Sets or Validation } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15966},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes a training-free text-informed linear probe for adapting medical VLMs using few-shot support sets. The paper advocates for a better evaluation setup that meets realistic clinical demands where certain disease categories might be missing from the support set.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) The proposed method is novel. The solution is derived by having an exact solution for the hard-label assignment since it is linear and convex. 2) The proposed solution outperforms state-of-the-art linear probes and adapters in the standard, realistic, and relaxed few-shot settings on 3 medical datasets. 3) The paper is well-written and easy to follow.
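To make the idea of a training-free, closed-form probe concrete, here is a minimal sketch under a stated assumption: it uses a simple quadratic objective chosen for illustration, not the formulation in the paper. For each class c it minimizes sum_i ||w_c - v_i||^2 + lam * ||w_c - t_c||^2 over the support features v_i of that class, where t_c is a text prototype. Setting the gradient to zero gives w_c = (sum_i v_i + lam * t_c) / (n_c + lam). The names `blended_prototypes` and `lam` are hypothetical.

```python
def blended_prototypes(features, labels, text_protos, lam=1.0):
    """Closed-form minimizer of the illustrative quadratic objective.

    features:    list of feature vectors (lists of floats)
    labels:      list of integer class labels, one per feature vector
    text_protos: one text prototype per class
    lam:         weight of the text term; must be > 0 so empty classes are defined
    """
    num_classes = len(text_protos)
    dim = len(text_protos[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for v, y in zip(features, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += v[d]
    # w_c = (sum of class features + lam * t_c) / (n_c + lam)
    return [
        [(sums[c][d] + lam * text_protos[c][d]) / (counts[c] + lam) for d in range(dim)]
        for c in range(num_classes)
    ]

# Toy example: class 0 has two shots, class 1 has none.
protos = blended_prototypes(
    features=[[1.0, 0.0], [3.0, 0.0]],
    labels=[0, 0],
    text_protos=[[0.0, 0.0], [0.0, 1.0]],
    lam=2.0,
)
print(protos)
```

In this toy objective, a class with no shots falls back to its text prototype, and the visual term gains weight as the number of shots grows. Whether the paper's solver behaves the same way should be checked against its equations.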
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) The realistic scenario can be considered another form of base-to-novel generalization, where the method is trained on base classes and then evaluated on classes unseen during training. Base-to-novel generalization for medical VLMs has already been evaluated in previous works (1,2). 2) The method is not compared with prompt learning techniques, as done in previous work (3), which generalize better to novel categories than linear probes or adapters. 3) The paper focuses on specialist medical VLMs and overlooks generalist VLMs such as CLIP and BiomedCLIP. Would the proposed method be applicable in this situation?
(1) Cao, Qinglong, et al. “Domain-controlled prompt learning.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 2. 2024. (2) Koleilat, Taha, et al. “BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models.” arXiv preprint arXiv:2411.15232 (2024). (3) Shakeri, Fereshteh, et al. “Few-shot adaptation of medical vision-language models.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
1) The “standard scenario” in Section 3.2 is not properly indented as a bullet point. 2) There is a typo on page 8, line 3 (“repealing” –> “repelling”).
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the paper does not benchmark against prompt learning methods or evaluate its applicability to generalist VLMs, the proposed method is novel, methodologically sound, and achieves strong performance across several few-shot settings. The exact solution to the linear assignment problem is a clear technical contribution, and the empirical results show consistent improvements over existing linear probes and adapters.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This work tackles the use of vision-language models in medical applications, where the data differ fundamentally from natural images, and provides code to test the proposed solution in scarce and imbalanced data settings. The authors also explore recent Adapter strategies to tailor these models to the specifics of medical imaging applications, such as histopathology and chest X-ray. The results improve on the state of the art, and the methodology used to obtain them is original and sound.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Strong evaluation of a new methodology for assessing realistic scenarios for medical VLMs. Code is provided (on the anonymous MICCAI platform).
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
No specific weakness
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Very well documented and presented.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper identifies key limitations in existing few-shot adaptation approaches for medical vision-language models—specifically, their inadequate handling of imbalanced support sets and reliance on a validation set for hyperparameter tuning. To address these challenges, the authors propose a realistic, training-free adaptation framework. Extensive benchmarking across multiple modalities demonstrates the strong performance and broad applicability of the proposed method.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) The problem studied in this paper is both important and interesting. Notably, the authors highlight key limitations of existing few-shot adaptation methods from a medical perspective. The proposed method is reasonable.
2) The experimental results are comprehensive, covering three imaging modalities, which effectively demonstrates the robustness of the proposed approach. The accompanying analysis is thorough and insightful.
3) The paper is well-written and clearly presented.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) The authors mention that the baseline models are trained under a validation-free setting. While it may seem unfair to compare them with methods that utilize a validation set, given that SS-Text-r is inherently designed to operate without one, it would still be informative to evaluate one or two strong baselines with reasonable hyperparameter tuning. This could help estimate their potential upper-bound performance and offer a more comprehensive comparison.
2) Additionally, an ablation study on the proposed post-processing stage would be valuable in understanding its individual contribution to the overall performance.
3) Regarding the optimization formulations, if I understand correctly, the optimal solution to Eq. (2) is ideally expected to align with that of Eq. (6) if gradient descent converges perfectly. In this case, it would be helpful to further discuss the reasons behind the observed performance improvements. Based on my understanding, Eq. (2) may lead the model to converge to a local optimum, whereas Eq. (6) represents the global optimum.
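The question of whether an iterative solver and a closed-form solver should agree can be explored on a toy problem. The sketch below is an assumption-laden illustration, not a reproduction of Eq. (2) or Eq. (6): it uses a one-dimensional convex objective, sum_i (w - v_i)^2 + lam * (w - t)^2, with hypothetical names `closed_form` and `gradient_descent`. On a convex objective, gradient descent run long enough matches the closed-form minimizer, while a truncated run (for example, a step count fixed without a validation set) stops short of it.

```python
def closed_form(values, text_anchor, lam):
    """Exact minimizer of sum_i (w - v_i)^2 + lam * (w - text_anchor)^2."""
    return (sum(values) + lam * text_anchor) / (len(values) + lam)

def gradient_descent(values, text_anchor, lam, lr=0.05, steps=500, init=0.0):
    """Plain gradient descent on the same toy objective."""
    w = init
    for _ in range(steps):
        grad = 2 * sum(w - v for v in values) + 2 * lam * (w - text_anchor)
        w -= lr * grad
    return w

values, anchor, lam = [1.0, 3.0], 0.0, 2.0
w_exact = closed_form(values, anchor, lam)
w_long = gradient_descent(values, anchor, lam, steps=500)
w_short = gradient_descent(values, anchor, lam, steps=3)
print(w_exact, w_long, w_short)
```

This toy case only shows that a gap can come from the optimization budget even when the objective is convex. It does not establish why the paper's reported improvements arise.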
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper studies an important problem, few-shot adaptation of medical vision-language models, identifies the significance of the imbalanced setting in medical scenarios, and proposes a promising solution for it. The results demonstrate superior performance, especially under realistic settings.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
N/A
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A