Abstract

In-context learning with Large Vision Models (LVMs) presents a promising avenue for medical image segmentation by reducing reliance on extensive labeling. However, in-context learning performance depends heavily on the choice of visual prompts and suffers from domain shifts. While existing works leveraging LVMs for medical tasks have focused mainly on model-centric approaches such as fine-tuning, we study an orthogonal, data-centric perspective: how to select good visual prompts that facilitate generalization to the medical domain. In this work, we propose a label-efficient in-context medical segmentation method enabled by a novel Meta-driven Visual Prompt Selection (MVPS) mechanism, in which a prompt retriever trained within a meta-learning framework actively selects the optimal images as prompts to improve model performance and generalizability. Evaluated on 8 datasets and 4 tasks across 3 medical imaging modalities, our approach demonstrates consistent gains over existing methods across different scenarios while offering both computational and label efficiency. Finally, we show that our approach is a flexible, fine-tuning-free module that can easily be plugged into different backbones and combined with other model-centric approaches.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3444_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3444_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

All datasets used in this paper are public and can be found in the paper’s citations.

BibTex

@InProceedings{Wu_Efficient_MICCAI2024,
        author = { Wu, Chenwei and Restrepo, David and Shuai, Zitao and Liu, Zhongming and Shen, Liyue},
        title = { { Efficient In-Context Medical Segmentation with Meta-driven Visual Prompt Selection } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an in-context medical segmentation method built on a new visual prompt selection process, in which a prompt retriever obtained from a meta-learning framework actively selects images as prompts to improve model performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a meta-driven visual prompt retrieval approach, constructing a meta-learning scheme to teach a transformer-based prompt retriever which images are worth selecting as visual prompts to enhance model performance. The authors use supervised learning for prompt selection. Supervised learning for prompt selection was introduced before by Zhang, Zhou, and Liu in “What makes good examples for visual in-context learning”. The new points are the construction of a different meta-learning scheme to teach the prompt retriever and the optimization of the retriever through probability distribution estimation and reshaped-reward policy gradients.
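
    To make the selection-and-reward idea concrete, below is a minimal sketch (not the authors' code) of how a prompt retriever could be optimized with REINFORCE-style policy gradients. The class and function names, the scorer architecture, and the constant baseline are all illustrative assumptions.

```python
# Hypothetical sketch: a prompt retriever trained with policy gradients.
import torch
import torch.nn as nn

class PromptRetriever(nn.Module):
    """Scores candidate prompt embeddings conditioned on a query embedding."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, query_emb: torch.Tensor, pool_embs: torch.Tensor) -> torch.Tensor:
        # query_emb: (dim,), pool_embs: (P, dim) -> unnormalized scores (P,)
        q = query_emb.expand(pool_embs.size(0), -1)
        return self.scorer(torch.cat([q, pool_embs], dim=-1)).squeeze(-1)

def policy_gradient_step(retriever, optimizer, query_emb, pool_embs,
                         reward_fn, k: int = 3, baseline: float = 0.5):
    logits = retriever(query_emb, pool_embs)
    probs = torch.softmax(logits, dim=-1)                 # selection distribution over the pool
    idx = torch.multinomial(probs, k, replacement=False)  # sample k prompt indices
    log_prob = torch.log(probs[idx] + 1e-8).sum()
    reward = reward_fn(idx)                               # e.g., Dice of the in-context prediction
    loss = -(reward - baseline) * log_prob                # REINFORCE with a reshaped reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

    Subtracting a baseline from the reward (here a constant; a running mean would also work) is the standard variance-reduction trick behind "reshaped-reward" policy gradients.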

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The presentation is not good enough. For example, the authors say that the support set/prompt pool consists of unlabeled images, yet they still need masks to calculate the reward, and after selecting k samples as visual prompts from the prompt pool, they need the segmentation masks of the selected samples to complete the in-context segmentation tasks. The illustration in Fig. 3 also does not make the proposed method easy to understand. Some sentences are long and hard to follow.

    • The authors use supervised learning for prompt selection, but they do not compare their method with the previous supervised prompt selection method by Zhang-Zhou-Liu (op. cit.). They only compare their approach with random prompt selection and an unsupervised learning approach, even though the experiments in Zhang-Zhou-Liu’s work show that the supervised approach is better than both random prompt selection and the unsupervised one.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In my opinion, the authors should make the presentation clearer by explaining the Meta Task Construction and Prompt Retriever components in detail and by using shorter sentences. For example, in the Prompt Retriever part, the sentences are long and therefore hard to understand.

    Most importantly, the authors should compare the results of their approach with those of existing supervised learning approaches for prompt selection.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of using supervised learning for prompt selection already exists (cf. Zhang-Zhou-Liu, “What makes good examples for visual in-context learning”). In this paper, the authors also use extracted embeddings as the inputs of the prompt retriever, as in Zhang-Zhou-Liu’s paper. The main contribution of the paper is a new process within supervised learning for prompt selection. However, the authors do not compare their results with existing supervised learning approaches; they only compare against random prompt selection and an unsupervised learning approach, both of which Zhang-Zhou-Liu’s experiments show to be less effective than the supervised approach.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    I am not satisfied with the authors’ explanation. They should provide a comparison with SupPR even if their result is not as good as SupPR’s.



Review #2

  • Please describe the contribution of the paper

    The paper presents a novel approach that leverages meta-learning to select effective prompts for pre-trained Large Vision Models (LVMs), thereby eliminating the need for costly fine-tuning. This innovative method enables efficient use of existing models, reducing computational requirements and label demands. The proposed MVPS method is comprehensively evaluated across 8 datasets spanning 3 imaging modalities, demonstrating its efficacy in terms of computational and label efficiency, and its potential extension to various backbone architectures.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well-written and easy to follow, making it accessible to readers from diverse backgrounds.
    • The main challenge addressed by the authors is indeed intriguing, as the computationally demanding nature of fine-tuning Large Vision Models (LVMs) renders it impractical for many research institutions and hospitals. This makes the proposed solution particularly relevant to the MICCAI community.
    • Extensive evaluation has been conducted, and the results demonstrate that the proposal outperforms the considered baselines.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors should revisit their literature review because there are previous works that focus on prompt selection [1-2] and are not considered in the comparison.
    • The paper’s data-centric approach assumes a query set comprising 100 images. However, in the medical domain, it is often challenging to obtain sufficient images and labels due to various issues such as privacy concerns, cost, annotation subjectivity, and more. It is unclear why the authors chose this specific number of images.
    • In continuation of the previous point: how does the proposed approach perform in low-data regimes, and how do its results compare with other meta-learning approaches [3-4]?

    [1] Exploring Effective Factors for Improving Visual In-Context Learning. Yanpeng Sun, Qiang Chen, Jian Wang, Jingdong Wang, Zechao Li.
    [2] Instruct Me More! Random Prompting for Visual In-Context Learning. Jiahao Zhang, Bowen Wang, Liangzhi Li, Yuta Nakashima, Hajime Nagahara.
    [3] One-Shot Learning for Semantic Segmentation. Amirreza Shaban, Shray Bansal, Zhen Liu, Irfan Essa, Byron Boots.
    [4] Conditional Networks for Few-Shot Semantic Segmentation. Kate Rakelly, Evan Shelhamer, Trevor Darrell, Alyosha Efros, Sergey Levine.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Provide a clear justification for why 100 images were chosen as the query set size and discuss potential limitations and trade-offs associated with this choice.
    • Even if it is not possible to submit new results comparing against previous studies, it would be helpful for readers to understand the unique aspects of your work and how it builds upon or diverges from the existing literature.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel and enables efficient use of existing models, reducing computational requirements and label demands. The authors comprehensively evaluate their method across 8 datasets demonstrating its efficacy in terms of computational efficiency, label efficiency, and potential extension to various backbone architectures. However, the methodology would be strengthened by considering previous works in the field, as this would provide a more comprehensive understanding of the proposed approach’s strengths and limitations.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed all my concerns and included previously missing works and bibliography. The work is beneficial and relevant to the MICCAI community; therefore, I will raise my score to accept.



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose an in-context medical segmentation method from the perspective of visual prompt selection mechanism.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method is novel: it approaches the problem from the perspective of prompt selection and successfully designs a meta-learning-based method to select optimal prompts for in-context segmentation.
    2. The proposed method is clearly presented, and solid reasoning is provided.
    3. Extensive experimental results are provided to demonstrate the effectiveness of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The main weakness of the paper is that Fig. 3 does not clearly demonstrate the proposed idea. Please also label the symbols mentioned in the caption in the figure itself, and consider reorganizing the locations of the different objects in the figure to better reflect their relationships.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be better if the authors could provide the code upon acceptance of the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please reorganize Fig. 3.
    2. Please describe more clearly when the ground truth of the support set and the query set is used.
    3. Have the authors tried score calculations other than the Dice score to see how they affect performance?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty, description, and experimental results.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The paper has solved all my concerns and qualifies for publication.




Author Feedback

Regarding the Dice score: it was used as both our scoring function and performance metric, for the convenience of showing the scores of supervised methods as an upper bound for direct comparison. We also tested mIoU as the scoring function and observed similar patterns.
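
For reference, both scoring functions reduce to simple overlap computations on binary masks. The snippet below is a standard formulation of Dice and IoU (not the authors' exact code); averaging IoU over classes or images yields mIoU.

```python
# Standard Dice and IoU on binary masks (illustrative, not the paper's code).
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))

def iou_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over union: |A∩B| / |A∪B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((inter + eps) / (union + eps))
```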

Regarding the query size of 100: given different data availability, our framework can adapt to different query set sizes. Here, we used 100 as a simplified demonstration of real-life limited-data scenarios. We searched over query sizes of 25, 50, 75, 100, and 150 (using K-means centroids over varying K) and found that although more than 100 helps, 100 is a reasonable amount to represent the target distribution. However, a query size below 50 may hurt prompt selection quality. We acknowledge the potential limitations and trade-offs; the discussion section therefore includes a paragraph on the limitations and drawbacks of decreasing the query size.
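
As an illustration of the K-means-centroid search mentioned above, one way to draw a representative query set of a given size is sketched below; the function name and the nearest-to-centroid heuristic are assumptions, not the authors' exact procedure.

```python
# Hypothetical sketch: pick a query set near K-means centroids of embeddings.
import numpy as np
from sklearn.cluster import KMeans

def select_query_set(embeddings: np.ndarray, query_size: int, seed: int = 0) -> np.ndarray:
    """Return indices of the images whose embeddings lie closest to K-means centroids."""
    km = KMeans(n_clusters=query_size, n_init=10, random_state=seed).fit(embeddings)
    chosen = []
    for center in km.cluster_centers_:
        dists = np.linalg.norm(embeddings - center, axis=1)
        chosen.append(int(np.argmin(dists)))
    # Deduplicate: two centroids may share a nearest image, slightly shrinking the set.
    return np.unique(np.array(chosen))
```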

Regarding clarity and bibliography: We have reorganized Fig. 3 and its caption and will update them in the final version: “Meta-driven Visual Prompt Selection (MVPS) Framework. Given a training set (e.g., HAM10k) and a testing set (e.g., ISIC), the MVPS framework constructs tasks for meta-training and meta-testing. Each task consists of a pool of 1000 unlabeled images and a query set of 100 labeled image-mask pairs. The goal of meta-training is to learn a prompt retriever that selects the best prompts from the unlabeled pool and labels them as prompt pairs to enhance segmentation performance on the query/evaluation set. The retriever takes in embeddings extracted from frozen LVMs, which reduces the dimensionality of the data and the computational cost of retriever training. The retriever is optimized by a scoring function (e.g., Dice) that compares the LVM’s predicted segmentation masks against the ground-truth masks in the query set. For meta-testing, the retriever selects prompts from a simulated prompt pool of 1000 images in the test dataset. These prompts are used by the LVM to perform segmentation on new query images. Evaluation is done by comparing the segmented output against the ground truth, assessing the model’s performance in a real-world scenario.” Additionally, we will include a more comprehensive comparison with all the related works mentioned by the reviewers and revisit the text to improve clarity.
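
To make the task structure described in the caption concrete, here is a minimal, self-contained mock of one meta-task. The pool and query sizes follow the caption, but the dot-product retriever and the random reward are stand-ins for the learned retriever and the frozen-LVM-plus-Dice pipeline; none of the components below are the authors' implementation.

```python
# Mock of one MVPS meta-task: select K prompts per query, collect rewards.
import numpy as np

rng = np.random.default_rng(0)
POOL, QUERIES, DIM, K = 1000, 100, 64, 3

pool_embs = rng.normal(size=(POOL, DIM))      # embeddings of the unlabeled prompt pool
query_embs = rng.normal(size=(QUERIES, DIM))  # embeddings of the 100 labeled query images

def retriever_scores(q: np.ndarray, pool: np.ndarray) -> np.ndarray:
    """Stand-in for the learned retriever; here plain dot-product similarity."""
    return pool @ q

def mock_reward(prompt_idx: np.ndarray, query_idx: int) -> float:
    """Stand-in for frozen-LVM in-context inference followed by Dice scoring."""
    return float(rng.uniform(0.5, 0.9))

task_scores = []
for qi, q in enumerate(query_embs):
    idx = np.argsort(retriever_scores(q, pool_embs))[-K:]  # top-K prompts from the pool
    task_scores.append(mock_reward(idx, qi))               # reward signal for meta-training
print("mean task reward:", np.mean(task_scores))
```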

Regarding the comparison with methods like SupPR: we compared with SupPR but did not include the results, for the following reasons.

  • Efficiency: SupPR involves contrastive pretraining of a vision encoder with around 315 million parameters, which contradicts our goal of efficient LVM adaptation. Our method, MVPS, uses a frozen vision transformer and a lightweight prompt retriever that is only 7% the size of SupPR. This focus on efficiency is why we chose to compare with LoRA, emphasizing our method’s adaptability to medical imaging without extensive computational resources.

  • Technical contribution: MVPS differentiates itself by focusing on a data-centric approach rather than modifying the LVM architecture or prompt inputs. MVPS actively retrieves the best set of prompts from an unlabeled prompt pool. Our method provides a case study on prompt effectiveness, offers a lightweight prompt retriever for selecting and analyzing effective prompts, and can be combined with other approaches without changing the underlying architecture.

  • Domain generalization: While SupPR offers valuable insights in the natural image domain, its primary focus is not on optimizing prompt selection for medical imaging. For example, on dermatology datasets, MVPS achieves a 3.47% average gain over TopK, compared to SupPR’s 3.28%, despite SupPR having ten times as many training parameters. The significant domain shift in medical images poses challenges for SupPR, which relies on cosine-similarity matching. In contrast, MVPS uses task augmentation during meta-training, resulting in better generalization across domains.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Both R1 and R3 gave an “Accept,” but R4 gave a “Reject” due to the lack of an important baseline comparison. Based on the authors’ rebuttal, I agree that the proposed method has advantages over SupPR, and the reason for not including SupPR is acceptable. Therefore, I suggest an “Accept” but encourage the authors to include SupPR performance in the final version as suggested by R4.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This manuscript received two positive reviews and one negative review. I recommend accepting the paper, but I encourage the authors to include the performance of SupPR in the final version.


