Abstract

Deep learning techniques have been widely applied to lung nodule malignancy prediction tasks. Recently, the emergence of Vision-Language Models (VLMs) has enabled the use of textual information, further improving diagnostic accuracy. Nevertheless, two key limitations persist: (1) the insufficient utilization of clinical data to enhance computer-aided diagnosis, and (2) the limited ability of existing frameworks to leverage similar cases in the diagnostic process. To address these issues, we propose a clinical data-driven, retrieval-augmented VLM framework for lung nodule malignancy prediction. The proposed framework comprises a multimodal encoder, a retrieval-augmented module, and a text encoder. Lesion classification is achieved by evaluating the similarities between the combined visual and clinical data features and the text features of predefined categories, thereby establishing a robust mechanism for malignancy prediction. Moreover, the retrieval-augmented module further refines the prediction process by incorporating similar cases retrieved using clinical data as a query, thus facilitating more informed and accurate decisions. Overall, this framework comprehensively utilizes clinical data by integrating it into CT image features and enabling cross-interaction in the retrieval-augmented module to support diagnosis with similar cases. Experimental results on the publicly available LIDC-IDRI dataset demonstrate that the proposed framework achieves significant improvements in lung nodule malignancy prediction, with an approximate 3% increase in accuracy. Our code is released on GitHub: https://github.com/chenn-clear/ClinicalRA.
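
As a rough illustration of the similarity-based classification described in the abstract, the following PyTorch-style sketch fuses image and clinical features and scores them against text embeddings of the three categories. The module names, dimensions, and fusion-by-addition choice are illustrative assumptions, not the authors' exact implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FusedVLMClassifier(nn.Module):
        """Sketch: score fused image+clinical features against class text prototypes."""
        def __init__(self, img_dim=512, clin_dim=8, text_dim=512):
            super().__init__()
            # Small MLP lifting the 8-D clinical attributes into the text-feature space
            self.clin_proj = nn.Sequential(
                nn.Linear(clin_dim, text_dim), nn.ReLU(),
                nn.Linear(text_dim, text_dim),
            )
            self.img_proj = nn.Linear(img_dim, text_dim)
            self.logit_scale = nn.Parameter(torch.tensor(1 / 0.07).log())

        def forward(self, img_feat, clin_feat, text_protos):
            # img_feat:    (B, img_dim)  from a CNN backbone such as ResNet-18
            # clin_feat:   (B, 8)        structured clinical attributes
            # text_protos: (3, text_dim) text-encoder embeddings of the class prompts
            fused = F.normalize(self.img_proj(img_feat) + self.clin_proj(clin_feat), dim=-1)
            text = F.normalize(text_protos, dim=-1)
            return self.logit_scale.exp() * fused @ text.t()   # (B, 3) similarity logits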

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0920_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/chenn-clear/ClinicalRA

Link to the Dataset(s)

LIDC-IDRI dataset: https://www.cancerimagingarchive.net/collection/lidc-idri/

BibTex

@InProceedings{HouRui_Clinical_MICCAI2025,
        author = { Hou, Ruibo and Chai, Shurong and Jain, Rahul Kumar and Li, Yinhao and Liu, Jiaqing and Teng, Shiyu and Shi, Xiaoyu and Lin, Lanfen and Chen, Yen-Wei},
        title = { { Clinical Data-Driven Retrieval-Augmented Model for Lung Nodule Malignancy Prediction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15969},
        month = {September},
        pages = {108--117}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a clinical data-driven, retrieval-augmented Vision-Language Model (VLM) framework for lung nodule malignancy prediction. It integrates clinical data with CT images and leverages similar case retrieval to enhance diagnostic accuracy.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This manuscript introduces a retrieval-augmented VLM that mimics radiologists’ diagnostic reasoning by combining clinical data with image features and retrieving similar cases for context-aware predictions.
    2. It employs clinical data both to enrich visual embeddings and as a query for retrieval, addressing limitations of image-only methods.
    3. It demonstrates significant accuracy gains (63.8% vs. 60.9% baseline) and notable improvements in classifying ambiguous “unsure” nodules, supported by rigorous ablation studies.
    4. This manuscript uses t-SNE visualizations to validate the model’s ability to separate malignant, benign, and unsure nodules spatially, enhancing clinical trust.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Although this work achieves promising results on the LIDC-IDRI dataset, further experiments on additional datasets are necessary to validate the correctness and generalizability of the proposed method.
    2. The highly structured clinical data (eight image-related attributes) may make the proposed method reliant on highly structured textual information.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The motivation is clear, but the authors should conduct experimental validation on more datasets to verify the correctness of the proposed method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    This work simulates clinical experts' diagnostic process in a diagnostic algorithm: it integrates clinical data with CT images and leverages similar-case retrieval to enhance diagnostic accuracy. It is a novel idea.



Review #2

  • Please describe the contribution of the paper

    This paper focuses on lung nodule diagnosis, which is challenging due to the progressive ground-truth labels. The authors' proposal to apply retrieval-augmented technology to integrate clinical data is interesting. The proposed framework effectively improves performance compared to baseline methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper simulates the diagnostic practice of radiologists, who rely on similar past cases. This is significant for inspiring more studies to align with clinical practice.
    2. The quantitative results surpass the previous ordinal regression methods and VLM-based models.
    3. The t-SNE results show superiority of the retrieval-augmented module.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. For the Multimodal Encoder, why did the authors apply a simple MLP? Can the ResNet-18 be replaced by other encoders such as ViT, and how would the proposed framework perform?
    2. Will the framework be affected by the class-imbalance problem, since the number of benign nodules is larger than that of malignant nodules?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See strengths and weaknesses.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contributions are:

    A novel retrieval-augmented framework designed specifically for lung nodule malignancy prediction that mimics radiologists’ diagnostic reasoning by retrieving similar cases to support decision-making.

    A unique approach to retrieving similar samples by using clinical data as a query rather than image-based retrieval. The framework utilizes clinical data in two complementary ways: integrating it with visual features to enrich image representation and using it as a query to identify semantically relevant samples.

    Significant performance improvements on the LIDC-IDRI dataset, with approximately 3% increase in accuracy, particularly in the challenging “unsure” category of nodules.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces a retrieval-augmented framework that effectively mimics the diagnostic process of radiologists, who typically reference similar past cases when making diagnoses. This is a clinically-inspired methodology that shows promise for real-world applications.

    The paper addresses the underutilization of clinical data in previous lung nodule classification approaches. By using clinical data both for enhancing visual features and for guiding the retrieval process, the framework demonstrates a comprehensive utilization of available information.

    The results show clear improvements over state-of-the-art methods, with the framework achieving the highest accuracy (63.8%) compared to both ordinal classification methods and other VLM-based approaches. The approach particularly excels in correctly classifying the challenging “unsure” category.

    The paper includes a thorough ablation study that clearly demonstrates the contribution of each component to the overall performance, supporting the design choices made in the framework.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper only evaluates the method on a single dataset (LIDC-IDRI). While this is a standard benchmark in the field, validation on multiple datasets would strengthen the claims about the method’s generalizability.

    While the paper mentions the memory requirements (approximately 4 GB of GPU memory), it doesn’t provide a comprehensive analysis of computational efficiency, training time, or inference speed compared to other methods, which are important considerations for clinical deployment.

    The paper fixes the number of retrieved samples (K) to 5 without extensive justification or exploration of how different values might affect performance.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel approach to lung nodule malignancy prediction that effectively integrates clinical data and a retrieval-augmented framework, addressing known limitations in current methods.

    The framework shows significant performance improvements over state-of-the-art methods, with convincing empirical evidence and thorough ablation studies that validate the design choices.

    The approach is well-motivated by clinical practice, mimicking how radiologists use prior similar cases in their diagnostic process, which enhances its potential for clinical translation. The paper is well-written, clearly organized, and provides sufficient details for reproducibility.

    The weaknesses identified (single dataset evaluation, limited exploration of retrieval parameters, etc.) do not significantly detract from the paper’s contributions and could be addressed in future work.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    This paper presents a practically relevant, well-motivated, and thoughtfully validated solution to an important clinical problem. It brings a novel methodology rooted in clinical workflows, offers strong empirical evidence, and the authors demonstrated good judgment and professionalism in the rebuttal.




Author Feedback

We thank the reviewers for their valuable feedback and appreciate their recognition of our work’s novelty and clarity. Below, we respond to the main concerns raised.

R#1: (A) Choice of encoders: Although structurally simple, the MLP effectively bridges the dimensional gap between low-dimensional (8D) clinical features and high-dimensional image representations. This design achieves meaningful feature alignment and yields consistent improvements over image-only baselines, as shown in Table 2. ResNet-18 was chosen for its proven stability in prior work and its suitability for our dataset scale (1,010 patients). Our framework is compatible with other encoders, such as ViT and other advanced backbones, and we will systematically investigate their performance in future work. (B) Class imbalance issue: Retrieval-augmented frameworks, such as RAC (Long et al., CVPR 2022), have been shown to mitigate class imbalance by retrieving semantically similar minority-class examples. These retrieved samples serve as informative references that guide classification, thereby reducing reliance on the global class distribution. In our case, we use clinical attributes (e.g., spiculation and margin), known indicators of malignancy, to retrieve malignant cases that are diagnostically relevant, even when such cases are underrepresented. As Table 1 shows, our model achieves balanced performance across classes, showing robustness to class imbalance. To further enhance minority-class handling, we plan to explore adaptive sampling or class-aware loss functions in future work.

R#2: (A) Generalizability and dataset adaptation (shared with R#3): We employed LIDC-IDRI because it is one of the most widely used benchmarks for lung nodule analysis. Although our experiments were conducted on this dataset, the proposed method is not dataset-specific. The retrieval mechanism based on clinical features is generalizable and can be applied to other medical imaging datasets that include clinical information. In line with this, Reviewer #3 noted that our clinically inspired framework shows promise for real-world applications. To further assess generalizability and effectiveness, we will include evaluation on additional datasets in the extended journal version. (B) Flexibility of clinical input: We clarify that our framework is not limited to structured formats. In the case of LIDC-IDRI, where structured clinical attributes are readily available, we directly input their numeric values, as described in the paper. However, the framework is inherently flexible and can also accommodate unstructured clinical text. When structured data is unavailable, free-text reports or notes can be processed using pretrained language models (e.g., BERT) to extract meaningful representations for both feature enhancement and retrieval. This flexibility allows our approach to operate effectively across diverse clinical documentation styles, including semi-structured or fully unstructured data.
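
To make the free-text fallback described in (B) concrete, here is a minimal sketch that embeds a clinical report with a pretrained language model; the model name, pooling choice, and maximum length are assumptions for illustration only, not details from the paper.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased").eval()

    def encode_report(report: str) -> torch.Tensor:
        """Embed a free-text clinical note for use in feature fusion and retrieval."""
        tokens = tokenizer(report, truncation=True, max_length=256, return_tensors="pt")
        with torch.no_grad():
            out = bert(**tokens)
        # The [CLS] token embedding stands in for the 8-D structured attribute vector
        # when structured clinical data is unavailable.
        return out.last_hidden_state[:, 0]   # shape (1, 768)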

R#3: (A) Generalizability and dataset adaptation: [Addressed above under R#2 (A)] (B) Computational efficiency: As for computational demands, our model is lightweight by design. It requires ~4 GB of GPU memory and completes retrieval, training, and inference within ~30 minutes. This efficiency supports practical deployment in real-world clinical settings, and we consider a comparative analysis with other methods an important direction for future work. (C) Retrieval sample size K: Regarding the fixed choice of K = 5 for retrieval, this value was empirically selected to balance semantic diversity with retrieval precision. Smaller K values (e.g., 1–2) provide limited context, while larger values (e.g., 7–8) risk introducing irrelevant samples. This choice was based on internal observations during model development, and we will include a comprehensive ablation in the extended journal version.
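
The retrieval step and the K trade-off discussed in (C) can be pictured with a short sketch that uses clinical attributes as the query; the cosine metric, variable names, and feature-bank layout are assumptions rather than the paper's exact design.

    import torch
    import torch.nn.functional as F

    def retrieve_similar_cases(query_clin, bank_clin, bank_img, k=5):
        """Return image features of the K training cases most similar in clinical attributes.

        query_clin: (B, 8)  clinical attributes of the query nodules
        bank_clin:  (N, 8)  clinical attributes of the training bank
        bank_img:   (N, D)  paired image features of the training bank
        """
        sim = F.normalize(query_clin, dim=-1) @ F.normalize(bank_clin, dim=-1).t()  # (B, N)
        topk = sim.topk(k, dim=-1).indices                                          # (B, K)
        return bank_img[topk]                                                        # (B, K, D)

    # Sweeping k over, e.g., 1-8 would quantify the precision/diversity trade-off
    # described above; k = 5 is the value the authors report using.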




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal addressed most of the reviewers’ concerns, and all reviewers maintained a positive evaluation, indicating that this paper should undoubtedly be accepted. However, some comments from the reviewers need to be incorporated into the final version.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers agreed to accept the manuscript, and the rebuttal addressed most reviewers’ doubts.


