Abstract
In-context learning (ICL) is emerging as a promising technique for achieving universal medical image segmentation, where a variety of objects of interest across imaging modalities can be segmented using a single model. Nevertheless, its performance is highly sensitive to the alignment between the query image and in-context image-mask pairs. In a clinical scenario, the scarcity of annotated medical images makes it challenging to select optimal in-context pairs, and fine-tuning foundation ICL models on contextual data is infeasible due to computational costs and the risk of catastrophic forgetting. To address this challenge, we propose Cycle Context Verification (CCV), a novel framework that enhances ICL-based medical image segmentation by enabling self-verification of predictions and accordingly enhancing contextual alignment. Specifically, CCV employs a cyclic pipeline in which the model initially generates a segmentation mask for the query image. Subsequently, the roles of the query and an in-context pair are swapped, allowing the model to validate its prediction by predicting the mask of the original in-context image. The accuracy of this secondary prediction serves as an implicit measure of the initial query segmentation. A query-specific prompt is introduced to alter the query image and is updated to improve this measure, thereby enhancing the alignment between the query and in-context pairs. We evaluated CCV on seven medical image segmentation datasets using two ICL foundation models, demonstrating its superiority over existing methods. Our results highlight CCV’s ability to enhance ICL-based segmentation, making it a robust solution for universal medical image segmentation. The code will be available at https://github.com/ShishuaiHu/CCV.
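For readers who want a concrete picture of the cycle described above, the following minimal PyTorch sketch illustrates one verification step. It is not the authors' released implementation; the icl_model interface, the additive image-shaped prompt, and the binary-cross-entropy verification loss are assumptions inferred from the abstract.

import torch
import torch.nn.functional as F

def cycle_verification_loss(icl_model, ctx_img, ctx_mask, qry_img, prompt):
    # Enhance the query with the learnable, query-specific prompt (assumed additive).
    qry_enhanced = qry_img + prompt
    # Step 1: predict the query mask from the in-context pair.
    qry_logits = icl_model(ctx_img, ctx_mask, qry_enhanced)
    # Step 2: swap roles -- the (query, predicted mask) pair becomes the new context,
    # and the original in-context image becomes the new query.
    ctx_logits = icl_model(qry_enhanced, torch.sigmoid(qry_logits), ctx_img)
    # The better the initial query prediction, the more accurately the known context
    # mask should be recovered; this recovery error is the implicit verification signal.
    return F.binary_cross_entropy_with_logits(ctx_logits, ctx_mask)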
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0254_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/ShishuaiHu/CCV
Link to the Dataset(s)
https://huggingface.co/datasets/microsoft/BiomedParseData
BibTex
@InProceedings{HuShi_Cycle_MICCAI2025,
author = { Hu, Shishuai and Liao, Zehui and Zhen, Liangli and Fu, Huazhu and Xia, Yong},
title = { { Cycle Context Verification for In-Context Medical Image Segmentation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
pages = {144 -- 154}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes Cycle Context Verification (CCV), a framework that boosts in-context learning (ICL) for medical image segmentation through a self-verification cycle. After predicting a query mask, the model reverses roles with an in-context image to indirectly assess prediction accuracy. This guides prompt optimization for better alignment, improving performance without backbone fine-tuning or large-scale annotations. CCV outperforms existing methods on seven datasets and two ICL models.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The core of the proposed CCV is a self-verification mechanism that allows the model to assess its own segmentation quality by swapping the roles of the query and in-context images. It also introduces a learnable, query-specific prompt that is optimized during inference.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Although the method is evaluated on two foundational ICL models, the paper does not discuss its generalizability to other backbones, which may limit its broader applicability. The approach also heavily relies on the quality of in-context images and the capacity of the backbone model, raising concerns about its robustness under imbalanced data distributions or weak contextual relevance. Additionally, it is recommended to present experimental results illustrating the effect of varying the number of in-context pairs on model performance.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Although the experimental results demonstrate certain advantages over other methods, the paper lacks evaluation on different backbones and under various data conditions, casting doubt on its generalizability and robustness. Moreover, key factors such as the quality and quantity of in-context pairs are insufficiently analyzed, rendering the conclusions less convincing.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors replied to my concerns.
Review #2
- Please describe the contribution of the paper
In in-context learning (ICL) for universal medical image segmentation, the alignment between the query image and the in-context image-mask pairs is crucial to performance. However, in a clinical scenario, the scarcity of suitable in-context pairs and the inability to fine-tune foundation ICL models make this alignment challenging. To overcome this limitation, this paper proposes a Cycle Context Verification (CCV) pipeline that enables the ICL model to double-check its query predictions at inference time. In addition, a learnable, query-specific prompt is introduced to improve the alignment between each query image and its in-context pairs, thereby enhancing segmentation performance. The quantitative experimental results show that CCV outperforms existing methods, and the code will be released.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
(1) Inspired by test-time scaling in the field of LLMs, this paper proposes a cycle context verification method that allows the ICL model to double-check its predicted mask, improving universal medical image segmentation. First, given a test query image and an associated in-context pair, the ICL model generates an initial segmentation mask for the query. Second, the predicted query mask and the query image are used as a new context pair, while the original in-context image becomes the new query image. By swapping the query and context pair in this second step, the ICL model generates a mask prediction for the originally given in-context image, and its accuracy serves as an implicit measure of the initial query mask prediction. (2) Directly fine-tuning the pretrained ICL model may lead to model collapse, where the model becomes overfitted to the verification step. This paper instead introduces a learnable, query-specific prompt that is spatially aligned with the query image. By adding the prompt to the query image, the alignment between the query and in-context images is expected to be enhanced, thereby leading to better query segmentation. (3) The experiments are solid, the dataset is open, and the code will be released.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
(1) The overall pipeline is formally similar to a reconstruction pipeline. (2) In the ablation experiment that removes prompt optimization, the performance drop could be attributed to other factors such as hyperparameter settings or fine-tuning strategies. The description of model collapse is too general; further analysis of the performance drop when using the cycle ICL pipeline alone would be valuable. (3) The visualization of the image prompt lacks interpretability and does not effectively support the conclusion that it enhances the alignment between the query image and the in-context pairs.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
(1) The method proposed in this paper is promising in the field of ICL-based universal medical image segmentation. (2) The dataset is open, and the code will be released. (3) Further analysis of the performance drop when using the cycle ICL pipeline alone would strengthen the paper.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Thanks for the authors’ reply. I tend to accept this paper.
Review #3
- Please describe the contribution of the paper
The paper proposes a novel, plug-and-play cycle context verification pipeline that enables ICL models to scrutinize their query predictions during inference. Instead of traditional prompt learning, which tunes a single prompt shared across the dataset, the paper introduces a learnable, query-specific prompt that is optimized to improve the alignment between each query image and its context pairs, thereby improving segmentation performance. Experiments with two ICL foundation models on multiple medical image segmentation datasets show that the method is superior to existing techniques.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The research direction of this paper is relatively novel, studying how to fine-tune the general segmentation model efficiently.
- This paper proposes a novel, plug-and-play cycle context verification pipeline that enables the ICL model to double-check its query predictions at inference time.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The authors explain the problem of requiring a large number of labels in InMeMo, but the paper provides little description of how its method takes advantage of not requiring labels.
- A comparison with the SAM model should be added.
- Since the training and inference processes are not well shown in Fig. 1, the authors may consider explaining the two processes separately.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
A relatively novel research direction
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
After reading the authors’ rebuttal, I think this is an acceptable paper.
Author Feedback
We sincerely thank the reviewers for their valuable feedback and for recognizing the novelty, potential (R1&R3), and superior performance (R1&R2&R3) of our CCV framework.
R1Q1-Similarity to reconstruction pipelines: Segmentation and reconstruction are fundamentally different tasks: the former involves pixel-wise classification, whereas the latter focuses on image generation. Although reconstruction methods like CycleGAN use cycle consistency as a regularization for unpaired image translation, our CCV introduces a novel test-time cycle context verification mechanism for ICL-based segmentation. This mechanism enables prediction validation via role-swapping and refinement via query-specific prompt optimization, which we believe is a new paradigm in the ICL domain.
R1Q2-Performance drop after removing prompt optimization: When prompt optimization is removed and the entire model is updated during the verification stage, the model tends to overfit to the prediction of the context image without improving the query prediction. This leads to a phenomenon we term model collapse, where the optimization objective is met by adjusting model parameters in a way that does not enhance the query segmentation task.
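A minimal sketch of the setup this answer describes, assuming a PyTorch-style interface: the pretrained ICL backbone stays frozen and only an image-shaped, query-specific prompt receives gradients. The toy backbone, tensor shapes, and learning rate below are illustrative assumptions, not the paper's settings.

import torch
import torch.nn as nn

class TinyICL(nn.Module):
    # Toy stand-in for an ICL segmenter such as SegGPT or UniverSeg (illustrative only);
    # it maps (context image, context mask, query image) to query-mask logits.
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(7, 1, kernel_size=3, padding=1)
    def forward(self, ctx_img, ctx_mask, qry_img):
        return self.net(torch.cat([ctx_img, ctx_mask, qry_img], dim=1))

icl_model = TinyICL()
ctx_img, ctx_mask = torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128)
qry_img = torch.rand(1, 3, 128, 128)

# Freeze every backbone parameter so the verification objective cannot distort the
# pretrained model (the "model collapse" described above).
for param in icl_model.parameters():
    param.requires_grad_(False)

# Only the image-shaped, query-specific prompt is learnable.
prompt = torch.zeros_like(qry_img, requires_grad=True)
optimizer = torch.optim.Adam([prompt], lr=1e-2)  # learning rate is an assumed value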
R1Q3-Visualization of image prompts: As shown in Fig. 2, we visualize the learned query-specific prompts and illustrate how adding the prompt changes the segmentation and corresponding Dice scores. The visualizations clearly demonstrate that the learned prompts alter the query image to yield improved segmentation accuracy, indicating enhanced query-context alignment. Higher Dice scores after adding the prompt support the conclusion that better alignment is achieved.
R2Q1-Evaluation on two ICL models: Our CCV framework is plug-and-play and compatible with existing ICL-based segmentation models. We selected SegGPT and UniverSeg as backbones because they are representative and publicly available ICL models at the time of submission. The consistent performance improvements across both architectures shown in Table 1 adequately validate the effectiveness and generalizability of CCV.
R2Q2-Reliance on in-context image quality: Our work is precisely motivated by the challenge of ICL models’ sensitivity to contextual quality, especially when optimal in-context pairs are scarce or exhibit weak contextual relevance. CCV is explicitly designed to address this by improving query-context alignment through test-time prompt optimization. As shown in Table 1, CCV consistently enhances performance with varying context inputs, indicating improved robustness to diverse data distributions and contextual relevance, rather than relying on initially perfect context.
R2Q3-Effect of context size: The primary focus of our work is on improving query-context alignment for ICL-based segmentation, not exhaustively exploring all influencing factors like context size. While our main experiments use eight context pairs for UniverSeg, we also evaluated CCV with four pairs, where it improved the average Dice score from 51.98% to 55.25%.
R3Q1-Strengths over InMeMo: Our CCV operates entirely at test time and optimizes a query-specific prompt using only the test image and its corresponding in-context pair(s). Thus, CCV offers advantages in scenarios with limited labeled data by not requiring an additional offline training phase.
R3Q2-Comparison with SAM: SAM is not based on in-context learning. Our work specifically addresses challenges unique to ICL-based segmentation, e.g., context-query alignment. Therefore, SAM is not included as a comparison method.
R3Q3-Clarification of training and inference: We clarify that our CCV operates entirely during the inference phase of a pre-trained ICL model. During inference, the query-specific prompt is iteratively optimized using the CCV pipeline. Once the prompt converges, it is used to enhance the query image, which, along with the context pairs, is then fed into the ICL model for the final prediction.
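To make this inference flow concrete, here is a hedged sketch that reuses the cycle_verification_loss function and the icl_model, ctx_img, ctx_mask, qry_img, prompt, and optimizer names from the earlier sketches; the fixed step budget and the 0.5 threshold are assumptions rather than the paper's actual stopping criterion or post-processing.

# Iteratively refine the query-specific prompt at test time.
for step in range(20):  # assumed small, fixed optimization budget per query
    loss = cycle_verification_loss(icl_model, ctx_img, ctx_mask, qry_img, prompt)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Final prediction: feed the prompt-enhanced query with the original context pair once more.
final_logits = icl_model(ctx_img, ctx_mask, qry_img + prompt)
final_mask = (torch.sigmoid(final_logits) > 0.5).float()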
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A