Abstract

In trustworthy medical diagnosis systems, integrating out-of-distribution (OOD) detection aims to identify unknown diseases in samples, thereby mitigating the risk of misdiagnosis. In this study, we propose a novel OOD detection framework based on vision-language models (VLMs), which integrates hierarchical visual information to cope with challenging unknown diseases that resemble known diseases. Specifically, a cross-scale visual fusion strategy is proposed to couple visual embeddings from multiple scales. This enriches the detailed representation of medical images and thus improves the discrimination of unknown diseases. Moreover, a cross-scale hard pseudo-OOD sample generation strategy is proposed to benefit OOD detection maximally. Experimental evaluations on three public medical datasets support that the proposed framework achieves superior OOD detection performance compared to existing methods. The source code is available at https://openi.pcl.ac.cn/OpenMedIA/HVL.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0985_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://openi.pcl.ac.cn/OpenMedIA/HVL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LaiRun_Hierarchical_MICCAI2025,
        author = { Lai, Runhe and Lu, Xinhua and Chen, Kanghao and Chen, Qichao and Zheng, Wei-Shi and Wang, Ruixuan},
        title = { { Hierarchical Vision-Language Learning for Medical Out-of-Distribution Detection } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {229--238}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a novel framework for Out-of-Distribution (OOD) detection in medical imaging, leveraging a Vision-Language Model (VLM). The proposed framework captures visual features across multiple scales and fuses them hierarchically in order to extract both local and global in-distribution (ID) features. In addition, the authors introduce a pseudo-OOD sample selection strategy, focusing on the boundary of the lesion area. This sampling approach enhances the model’s ability to distinguish between ID and OOD samples, thereby improving OOD detection. The framework is evaluated against several baseline models across three distinct datasets in the fields of dermatology and histology.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is generally well-written and ideas are well presented.

    • The proposed approach is both simple and relatively novel. The pseudo-OOD sample selection (R2) does not rely on the feature space to sample OOD data, unlike methods such as VOS [1] or NPOS [2]. Additionally, it does not require an external generative model, distinguishing it from approaches like Dream-OOD [3], which depend on external models to generate OOD samples.

    • The method is evaluated against several existing OOD detection methods (unimodal and multimodal) across three different datasets, with testing conducted using multiple random seeds to ensure robustness.

    [1] Xuefeng Du, et al. VOS: Learning What You Don’t Know by Virtual Outlier Synthesis. ICLR 2022. [2] Leitian Tao, et al. Non-Parametric Outlier Synthesis. ICLR 2023. [3] Xuefeng Du, et al. Dream the Impossible: Outlier Imagination with Diffusion Models. NeurIPS 2023

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The use of scaled image patches and their fusion has been explored in prior works such as PatchCore[1], SimpleNet[2], and EfficientAD[3]. The authors should clarify what is novel about their approach and how it distinguishes itself from these existing methods.

    • The pseudo-OOD approach may struggle in cases where lesion boundaries are poorly defined due to low contrast, noise, or complex shapes. The evaluation datasets primarily consist of dermoscopic images with clearly defined, centrally located lesions, which may not represent more challenging, real-world scenarios.

    • The paper presents a range of standard deviations but not per model, making it difficult to assess performance variability. I recommend using the $\text{mean} \pm \text{std}$ format in the table. Additionally, highlighting the best results without statistical validation weakens the credibility of the claims.

    • The original image resolution is not specified. If the resolution is low, scaled images may appear blurry, potentially reducing model performance due to insufficient detail, especially in complex cases.

    • The baseline model and dataset used in the sensitivity study are not provided, complicating the interpretation of results. The negative correlation between image scale $n$ and performance should also be discussed as a potential limitation.

    • The authors did not report the memory consumption or the time required for inference, which is critical when deploying AI models in real-world scenarios. Could the authors provide some insights into this?

    • Few-shot learning is crucial in medical imaging, where labeled data is scarce. Previous VLM work in OOD detection has demonstrated good performance with limited data, an area worth exploring for future improvements in the paper.

    [1] Roth et al. Towards Total Recall in Industrial Anomaly Detection. CVPR 2022 [2] Zhikang Liu, et al. SimpleNet: A Simple Network for Image Anomaly Detection and Localization. CVPR 2023. [3] Kilian Batzner, et al. EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies. WACV 2024
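    The per-model $\text{mean} \pm \text{std}$ reporting recommended above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the method names and the scores are hypothetical placeholders, and the sample standard deviation over seeds is an assumed choice.

```python
from statistics import mean, stdev

# Hypothetical scores (%) per method over five random seeds.
# These numbers are illustrative placeholders, not results from the paper.
scores = {
    "baseline": [90.0, 91.0, 89.0, 90.5, 89.5],
    "proposed": [93.0, 94.0, 92.0, 93.5, 92.5],
}


def summarize(values):
    """Format one table cell as 'mean ± std' (sample standard deviation)."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


# One cell per method, so variability is visible for every row of the table.
table = {name: summarize(vals) for name, vals in scores.items()}

for name, cell in table.items():
    print(f"{name:10s} {cell}")
```

    Reporting the spread per row, rather than a single range for the whole table, lets a reader judge whether the gap between two methods exceeds their seed-to-seed variation.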

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The paper is well-written and presents an interesting approach. While the idea of using OOD detection for rare diseases has been explored in dermatology [1,2,3,4,5], some of these works are not referenced in the current paper. I encourage the authors also to consider more realistic, clinically relevant scenarios, such as teledermatology, where data distributions often differ from the training data. Below are a few suggestions to further strengthen the paper:

    • It would be insightful to evaluate the model on near-OOD data, such as training on the ISIC dataset and testing on HAM10000. I suspect the model may struggle with detecting these near-OOD samples.

    • Testing the model on other modalities or clinical datasets, such as Fitzpatrick17K[6] or PASSION[7], would help assess its generalization to different types of data.

    • The paper primarily compares models trained on the full dataset, but including a few-shot learning scenario would further strengthen the argument, particularly given the limited availability of labeled data in medical imaging.

    • Another potential direction is to explore cross-attention between textual and image features to localize the region of interest. Comparing this method with the current entropy-based approach for lesion boundary localization could offer valuable insights and potentially improve the model’s ability to precisely identify lesion boundaries.

    [1] M. Combalia, et al. Uncertainty estimation in deep neural networks for dermoscopic image classification. CVPRW 2020. [2] Abhijit Guha Roy, et al. Does your dermatology classifier know what it doesn’t know? detecting the long-tail of unseen conditions. [3] Deval Mehta, et al. Out-of-Distribution Detection for Long-tailed and Fine-grained Skin Lesion Images. MICCAI 2022. [4] Torop, M., et al. Unsupervised Approaches for Out-Of-Distribution Dermoscopic Lesion Detection. NeurIPS Workshop. [5] Subhranil Bagchi et al. Learning a meta-ensemble technique for skin lesion classification and novel class detection. CVPRW 2020. [6] Groh, M. et al. Evaluating Deep Neural Networks Trained on Clinical Images in Dermatology with the Fitzpatrick 17k Dataset. CVPRW 2021. [7] Gottfrois, P. et al. PASSION for Dermatology: Bridging the Diversity Gap with Pigmented Skin Images from Sub-Saharan Africa. MICCAI 2024.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses an important problem that is of interest within the MICCAI community. The proposed approach is innovative, particularly in the pseudo-OOD sample selection method, which does not rely on feature space sampling or external generative models. The evaluation against multiple baselines and datasets adds strength to the work. I encourage the authors to address the weaknesses, specifically in the experimental part, to validate the claims of the proposed method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel OOD detection framework based on VLMs. With the cross-scale visual fusion strategy and hard pseudo-OOD sample generation, the proposed method achieves superior OOD performance on different datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method is described in detail.
    • The method integrates hierarchical visual information to enhance OOD detection.
    • A novel pseudo-OOD generation strategy is proposed.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • In Eq.1, what is the difference between u_i^1 and u_j^1*?

    • Details of the experiment dataset, such as the size and the format, should be provided in the paper.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and the novel proposed method works well in experiments.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a novel hierarchical vision-language based OOD detection method leveraging cross-scale visual fusion. The paper advocates employing both global and local information so that the model effectively captures fine-grained information as well as the overall contextual information. Furthermore, to effectively delineate the boundary between the known and the unknown, it also proposes a hard pseudo-OOD generation strategy to synthesize artificial outliers.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Neither the cross-scale visual fusion strategy nor the pseudo-OOD generation proposed in this paper is entirely novel. However, applying the two techniques together to medical OOD detection can be considered a novel application. Furthermore, the proposed framework as a whole is indeed novel. This work follows an important line of work on applying powerful VLMs to OOD detection. OOD detection is a critical problem in the medical domain. The presentation of the paper is also great. The paper evaluates on three datasets; in all cases, the proposed method performs well.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper has few major weaknesses. It could benefit from demonstrating the advantage of the hard pseudo-OOD generation strategy over similar naive OOD generation strategies. Furthermore, the authors could include comparisons with more recent OOD detection postprocessors benchmarked in OpenOOD to demonstrate the empirical strength of the proposed approach.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The authors are encouraged to include recent SOTA post-hoc approaches for comparison.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper tackles the important problem of OOD detection in the medical domain. The evaluation is convincing, and the comparison is made with highly relevant works in the literature, though a few more could be added in the camera-ready version. The empirical results are convincingly significant.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

N/A




Meta-Review
