Abstract
Nuclei instance segmentation is crucial for biomedical research and disease diagnosis. Pathologists utilize information such as color, shape, and the surrounding tissue microenvironment to distinguish nuclei. However, existing models are limited as they rely solely on features from the current patch, neglecting contextual information from neighboring patches. This limitation impedes the model’s ability to accurately identify nuclei. To address this issue, we propose CA-SAM2, a novel framework that enhances the prompt propagation capability of the Segment Anything Model 2 (SAM2) through a Context Injection Module (CIM), integrating surrounding contextual information during segmentation. Additionally, to adapt SAM2 to the pathology image domain, we introduce a convolutional branch to extract domain-specific features from pathological images. We further design a Multi-Level Feature Refinement Block (MFRB) to refine the prior features extracted by SAM2 and integrate domain features. Finally, we incorporate a regression head and a classification head after the convolutional branch to automatically generate point prompts, eliminating the need for manual annotation. Extensive evaluations of CA-SAM2 on the MoNuSeg and CPM-17 datasets demonstrate its effectiveness and practicality in enhancing nuclei segmentation. The code is available at https://github.com/HanbinHuang123/CA-SAM2.
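To make the auto-prompting idea concrete, the following is a minimal, hypothetical sketch of how a regression head and a classification head on top of a convolutional feature map could produce point prompts. The module name, channel sizes, and thresholding are illustrative assumptions and not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class AutoPointPrompter(nn.Module):
    """Hypothetical sketch: dense regression + classification heads that
    turn a convolutional feature map into point prompts for SAM2."""

    def __init__(self, in_channels: int = 256):
        super().__init__()
        # Regression head: per-cell (dy, dx) offset toward the nearest nucleus centre.
        self.reg_head = nn.Conv2d(in_channels, 2, kernel_size=1)
        # Classification head: per-cell probability that a nucleus centre is present.
        self.cls_head = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor, score_thresh: float = 0.5):
        # feats: (B, C, H, W) features from the convolutional branch.
        offsets = self.reg_head(feats)                # (B, 2, H, W)
        scores = torch.sigmoid(self.cls_head(feats))  # (B, 1, H, W)

        b, _, h, w = feats.shape
        ys, xs = torch.meshgrid(
            torch.arange(h, device=feats.device),
            torch.arange(w, device=feats.device),
            indexing="ij",
        )
        grid = torch.stack([ys, xs], dim=0).float()   # (2, H, W) cell coordinates

        prompts = []
        for i in range(b):
            keep = scores[i, 0] > score_thresh        # cells predicted to hold a centre
            points = (grid + offsets[i])[:, keep].T   # (N, 2) refined (y, x) coordinates
            prompts.append(points)
        return prompts                                # one candidate point per predicted nucleus
```

In the paper's pipeline, each returned point would then serve as a one-point-per-nucleus prompt for SAM2's mask decoder; the exact head design and post-processing here are guessed for illustration only.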
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1881_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/HanbinHuang123/CA-SAM2
Link to the Dataset(s)
N/A
BibTex
@InProceedings{HuaHan_CASAM2_MICCAI2025,
author = { Huang, Hanbin and He, Hongliang and Xu, Liying and Zhu, Xudong and Feng, Siwei and Fu, Guohong},
title = { { CA-SAM2: SAM2-based Context-Aware Network with Auto-Prompting for Nuclei Instance Segmentation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15968},
month = {September},
page = {86 -- 95}
}
Reviews
Review #1
- Please describe the contribution of the paper
The method provides an adaptation of SAM2 without a manual prompt. The prompt is generated through a regression head. In addition, this paper proposes a context injection module and feature alignment modules which use cross attention to encode the frames.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) Automatic prompting is an important requirement for many medical applications which is addressed by the paper. 2) The results show an improvement over classical and baseline methods.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) There is no analysis of the effectiveness of the regression head, which seems to me the most critical point of failure. How does the regressor produce one point per nucleus and yet segment all nuclei? Are there a fixed number of regression heads, each producing one point prediction? How often does the head identify a point at the correct location?
2) There is a missing comparison with SAM adaptation methods. There is a lot of literature on adapting SAM to specific domains, and the paper does not compare against several of these methods, for example SAMed [1], MedSAMAdapter [2], and S-SAM [3]. [1] Customized Segment Anything Model for Medical Image Segmentation [2] Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation [3] S-SAM: SVD-based Fine-Tuning of Segment Anything Model for Medical Image Segmentation
3) There are typos in the paper that need to be corrected (minor weakness).
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper addresses an important problem for using SAM2. However, it requires more detailed analysis in this respect. While there is adequate architectural novelty, the proposed modules are basically cross-attention blocks. Thus, there needs to be more focus on why there is a performance improvement and on verifying that the regression heads are indeed producing good prompts. Please address the weaknesses in the rebuttal.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I am assuming from the rebuttal that the classification head is the major component that decides whether the location predicted by the regression head is valid or not. There should be a slightly more detailed explanation about the roles of the two heads and why the regression head is required (consider using the auto-mode of SAM2 with uniformly sampled points). However, if I am to make a binary decision of accept/reject, I am inclined towards accepting the paper.
Review #2
- Please describe the contribution of the paper
The paper introduces CA-SAM2, a context-aware framework for nucleus instance segmentation that leverages Segment Anything Model 2 (SAM2) and addresses the limitations of existing models by integrating contextual information from neighboring patches.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The dual memory design of CIM innovatively adapts SAM2’s video segmentation memory mechanism to 2D pathological images, explicitly modeling nucleus-tissue microenvironment context and addressing the critical flaw of prior models that focused solely on local patch features.
- By freezing SAM2’s encoder and adding a convolutional branch with MFRB, the framework adapts to medical images without large-scale retraining, balancing performance and computational efficiency—especially valuable for the annotation-challenged pathology domain.
- Comprehensive evaluations on the MoNuSeg and CPM17 datasets demonstrate superior performance over state-of-the-art models. Ablation studies validate the effectiveness of each module, providing robust evidence for the framework’s design.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- While the automatic point prompt generation is practical, its performance slightly lags behind specialized prompt generators (e.g., PromptNucSeg achieves 0.5% higher PQ on MoNuSeg). The prompt design relies on a “one-prompt-one-nucleus” strategy similar to prior work (e.g., PNS [14]), lacking significant innovation.
- The experiments use relatively small datasets (30 images in MoNuSeg, 64 in CPM17), potentially limiting generalizability across diverse pathological scenarios and lacking large-scale validation on whole-slide images (WSIs).
- The framework’s performance is tightly linked to SAM2’s memory and prompt propagation mechanisms, originally designed for video segmentation. While the analogy of treating WSIs as video frames is creative, it is currently difficult to disentangle performance gains from SAM2 itself versus the proposed modules.
- The dual-branch architecture in SAM, where, for example, a ViT branch captures prior features and a convolutional branch extracts domain-specific features, is not novel. The authors should compare their work against at least two existing studies: [1] Lin, Xian, et al. “Beyond adapting SAM: Towards end-to-end ultrasound image segmentation via auto prompting.” MICCAI 2024. [2] Gao, Yifan, et al. “MBA-Net: SAM-Driven Bidirectional Aggregation Network for Ovarian Tumor Segmentation.” MICCAI 2024.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The integration of context memory (CIM) and domain-adaptive feature fusion (MFRB) addresses challenges in nucleus segmentation, with consistent experimental results outperforming baselines. However, the innovation in this work appears to be incremental, consisting of the addition of two modules to an existing dual-branch architecture. The authors should compare their approach to related work utilizing SAM-based dual-branch structures and provide a stronger justification for the motivation behind these two modules.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
Thanks for the authors’ clarification; my concerns have been partially addressed. However, after reading the other reviewers’ comments, I have decided to maintain my score.
Review #3
- Please describe the contribution of the paper
They develop a pathology image segmentation model by leveraging SAM2, a foundation model trained on large-scale general video datasets for video segmentation. By incorporating various modules to utilize contextual information across patches within pathology images and addressing the domain gap between general images and pathology images, they significantly enhance segmentation performance and achieve high accuracy across diverse datasets.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
To effectively leverage Whole Slide Images (WSIs) for model training, the authors propose a novel approach that extends beyond the commonly used patch-based cropping method. They introduce a video-like data structure to process WSI data more contextually, which is then applied to the SAM2 model.
The authors propose a novel memory bank architecture to retain and utilize information from neighboring patches. Unlike traditional memory banks, their Context Injection Module has two distinct components: a Texture Memory that extracts and stores foreground (nuclei) information, and an Environment Memory that captures background tissue context.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Although the current method processes patches based on a sliding window strategy, it is unclear whether sufficient contextual information is being effectively captured. In particular, the approach does not account for the influence of input order among patches, which can introduce inconsistencies in the contextual representation. Furthermore, a structural limitation exists whereby the first patch lacks access to any previously stored memory, thereby receiving no benefit from the memory bank. This issue becomes more critical when working with datasets like CoNSeP[1], where the distribution of nuclei types is uneven. In such cases, reliance on previously aggregated memory information could introduce unintended bias, potentially degrading model performance. Therefore, a more rigorous analysis of these limitations is warranted.
The proposed MFRB module lacks clear novelty when compared to existing methods. While it leverages an adapter-based structure to enable parameter-efficient training, the architectural innovation beyond simply integrating a convolutional network is not well justified. The distinctions from conventional approaches remain marginal. Furthermore, although the Context Injection Module (CIM) may represent a new attempt within the specific domain of nuclei segmentation, its overall structure and operational mechanism closely resemble those of standard memory bank frameworks. The absence of significant differentiation limits its impact and raises questions about the originality of the contribution.
While the performance improvements demonstrated in Table 1 are promising, it appears that these gains may partially stem from the increased number of trainable parameters in the proposed model. Therefore, although the ablation study results presented in Table 3 are important for understanding the contribution of each component, a direct comparison of the number of parameters with existing methods is also necessary. Such a comparison would help clarify whether the observed performance gains are due to architectural innovations or simply the result of a larger model capacity.
[1] Graham, S., Vu, Q. D., Raza, S. E. A., Azam, A., Tsang, Y. W., Kwak, J. T., & Rajpoot, N. (2019). Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical image analysis, 58, 101563.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
In the third paragraph of the Introduction, “SAM2” should be capitalized (currently written as “sam2”). There are also minor grammatical issues, such as spacing errors, in the Texture Memory part of the Method section. Please double-check writing.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed feature processing module does not demonstrate substantial novelty. While the attempt to reinterpret pathology images as video-like sequences and leverage a powerful foundation model such as SAM2 is noteworthy, the approach lacks sufficient elaboration and analytical depth. Further exploration and rigorous analysis are necessary to validate and justify the design choices, particularly regarding how the sequential structure contributes to segmentation performance and model generalization.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all reviewers for the valuable comments. The reviewers agree that CA-SAM2 offers a “novel approach” (R1) and novel “architecture” (R2), is “especially valuable” (R3), and shows good performance (R1&R2&R3). Here we address the main points in their reviews.

(R1Q1) Input order: We considered input order during design and tested four strategies: row-wise, zigzag, spiral inward, and spiral outward, with spiral outward performing best. This was omitted due to an oversight. We will add details to the Experiments section and include an “order” parameter in the code for reproducibility. For the first patch, we store training-time texture memory in the checkpoint to provide foreground information.

(R1Q2) Handling nuclei of different categories: Our texture memory stores diverse samples and builds a maximum-similarity memory, so by controlling the size of this memory we can avoid biases caused by different categories. We agree that category-specific processing may yield better results and will explore this in future work. Thanks for your suggestion.

(R1Q3) Rationality of the MFRB design: The convolutional branch is supervised by point generation rather than segmentation, so we update the domain features first and then use them to refine the prior features, thereby leveraging domain information from the regression task.

(R1Q4) Overall structure and mechanism of the CIM: The overall structure of the CIM is modified from SAM2. Its mechanism is designed according to the task requirements: the foreground is determined by feature similarity, and the background by spatial distance.

(R1Q5) Trainable parameters and performance: Due to the guidelines, we cannot add additional experimental tables and results, but our method freezes the image encoder, so the number of trainable parameters does not increase significantly as the backbone grows. We will enrich the relevant explanations in the final version. Thanks for your feedback.

(R2Q1) Point generation: Thanks for your question. We use the common regression + classification method to generate points. The deficiencies in the explanation will be corrected in the final version.

(R2Q2) Comparison with SAM adaptation methods: Since we use adapter-based fine-tuning, we compared against MedSAMAdapter (MedSA in Table 1, used with StarDist post-processing). If the guidelines allow, we will add other comparison results.

(R2Q3 & R3Q1) Module motivation and improvement justification: SAM2 performs well on natural images but struggles with pathology due to the domain gap. We froze the encoder to reduce costs while introducing a convolutional branch and designing the MFRB to fine-tune and refine the prior features, resulting in a 2.1% improvement in AJI (Table 2). Moreover, existing nuclear instance segmentation models only consider the context within the input image, which limits segmentation performance. We therefore designed the CIM to activate SAM2’s memory components (encoder, bank, and attention, originally designed for video segmentation) in a context-aware manner to better support 2D segmentation, resulting in a 1.0% improvement in AJI (Table 2). Additionally, our texture memory enhances the memory bank with similarity and IoU filtering, resulting in an AJI of 0.736. Thus, the CIM contributes an improvement of over 0.6%.

(R3Q2) Prompt design lacks significant innovation: As stated in the Introduction, our contribution is not in prompt design. However, we believe automatic point prompt generation is necessary and thus included it as an extension.

(R3Q3) Comparison with dual-branch methods: Unlike other methods that use segmentation losses, our convolutional branch is supervised by a detection loss, which differs from the task of SAM2. Thus, we only aim to compensate for the domain features missing in SAM2, rather than performing a deep fusion of the two feature streams and merging the dual-branch outputs as other methods do.

Writing (R1Q6 & R2Q4): We will review and revise our manuscript. We sincerely thank all reviewers again for their feedback and suggestions.
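As a concrete illustration of the “order” parameter discussed above, the following is a minimal sketch of a spiral-outward traversal over a grid of patches. The function name and the layer-peeling construction are illustrative assumptions, not the released code.

```python
from typing import List, Tuple

def spiral_outward_order(rows: int, cols: int) -> List[Tuple[int, int]]:
    """Hypothetical sketch of a spiral-outward patch traversal.

    Visits the patch grid from the centre toward the borders, so early patches
    populate the memory with central tissue context before the edges are
    segmented. Built by peeling the grid layer by layer (a spiral-inward walk)
    and reversing the result.
    """
    top, bottom, left, right = 0, rows - 1, 0, cols - 1
    inward: List[Tuple[int, int]] = []
    while top <= bottom and left <= right:
        inward.extend((top, c) for c in range(left, right + 1))                  # top edge
        inward.extend((r, right) for r in range(top + 1, bottom + 1))            # right edge
        if top < bottom:
            inward.extend((bottom, c) for c in range(right - 1, left - 1, -1))   # bottom edge
        if left < right:
            inward.extend((r, left) for r in range(bottom - 1, top, -1))         # left edge
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return inward[::-1]  # reverse: centre first, borders last

# Example: the order in which the patches of a 3x4 grid would be fed to the model.
print(spiral_outward_order(3, 4))
```

Row-wise and zigzag orders could be generated analogously and compared through the same “order” switch.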
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper presents a well-motivated framework that adapts SAM2’s video segmentation memory mechanism to pathology image analysis through a dual memory design, addressing contextual modeling limitations in prior work. The proposed architecture balances efficiency and performance by leveraging frozen foundation model encoders alongside a lightweight, domain-adaptive convolutional branch. Comprehensive evaluations and ablation studies convincingly support the method’s effectiveness, warranting acceptance.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A