Abstract

Cell nuclei segmentation is crucial in digital pathology for various diagnoses and treatments. It is predominantly performed using semantic segmentation methods that focus on scalable receptive fields and multi-scale information. In such segmentation tasks, U-Net-based task-specific encoders excel at capturing fine-grained information but fall short in integrating high-level global context. Conversely, foundation models inherently grasp coarse-level features but are less proficient than task-specific models at providing fine-grained details. To this end, we propose using the foundation model to guide task-specific supervised learning by dynamically combining their global and local latent representations via our proposed X-Gated Fusion Block, which uses a gated squeeze-and-excitation block followed by cross-attention to fuse the latent representations. Through experiments across datasets and visualization analysis, we demonstrate that integrating task-specific knowledge with general insights from foundation models can drastically increase performance, even outperforming domain-specific semantic segmentation models to achieve state-of-the-art results: the Dice score and mIoU increase by approximately 12% and 17.22% on CryoNuSeg, 15.55% and 16.77% on NuInsSeg, and 9% on both metrics on the CoNIC dataset. Our code will be released at https://cvpr-kit.github.io/SAM-Guided-Enhanced-Nuclei-Segmentation/.
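The fusion idea described in the abstract (a gated squeeze-and-excitation step followed by cross-attention between global and local representations) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the token shapes, the random weights, the choice to gate the global (foundation-model) tokens, and the residual addition are all assumptions made purely for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def gated_squeeze_excite(tokens, w_down, w_up):
    """Squeeze: average each channel over all tokens.
    Excite: bottleneck MLP (ReLU, then sigmoid) giving one gate per channel."""
    n = len(tokens)
    squeezed = [sum(col) / n for col in zip(*tokens)]
    hidden = [max(0.0, h) for h in matmul([squeezed], w_down)[0]]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matmul([hidden], w_up)[0]]
    gated = [[v * g for v, g in zip(tok, gates)] for tok in tokens]
    return gated, gates

def cross_attention(queries_from, keys_values_from, w_q, w_k, w_v):
    """Scaled dot-product attention: local tokens query the global tokens."""
    q = matmul(queries_from, w_q)
    k = matmul(keys_values_from, w_k)
    v = matmul(keys_values_from, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v), weights

def x_gated_fusion(local_tokens, global_tokens, params):
    """Assumed ordering: gate the global tokens, attend, then add residually."""
    gated_global, gates = gated_squeeze_excite(
        global_tokens, params["w_down"], params["w_up"])
    attended, weights = cross_attention(
        local_tokens, gated_global, params["w_q"], params["w_k"], params["w_v"])
    fused = [[l + a for l, a in zip(loc, att)]
             for loc, att in zip(local_tokens, attended)]
    return fused, gates, weights

# Toy example: 3 local tokens and 5 global tokens, 4 channels each.
rng = random.Random(0)
channels, bottleneck = 4, 2
params = {
    "w_down": random_matrix(channels, bottleneck, rng),
    "w_up": random_matrix(bottleneck, channels, rng),
    "w_q": random_matrix(channels, channels, rng),
    "w_k": random_matrix(channels, channels, rng),
    "w_v": random_matrix(channels, channels, rng),
}
local_tokens = random_matrix(3, channels, rng)
global_tokens = random_matrix(5, channels, rng)
fused, gates, weights = x_gated_fusion(local_tokens, global_tokens, params)
```

In this sketch the fused output keeps the shape of the local tokens, so it could in principle replace the task-specific bottleneck encoding; whether the paper's block is arranged this way is not stated in the abstract.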

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3533_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://cvpr-kit.github.io/SAM-Guided-Enhanced-Nuclei-Segmentation/

Link to the Dataset(s)

https://github.com/masih4/CryoNuSeg

https://github.com/masih4/NuInsSeg

https://conic-challenge.grand-challenge.org/



BibTeX

@InProceedings{Swa_SAM_MICCAI2024,
        author = {Swain, Bishal R. and Cheoi, Kyung J. and Ko, Jaepil},
        title = {{SAM Guided Task-Specific Enhanced Nuclei Segmentation in Digital Pathology}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a novel method for cell nuclei segmentation, adapting U-Net3+ and incorporating the powerful encoder of Segment Anything Model (SAM) to add global representations of the input image.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method achieves high performance on three datasets.
    • The paper is well-written and the method is well-motivated.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lack of clarity in figure descriptions (Fig. 2 and Fig. 3).
    • Unclear insights from feature visualizations.
    • Insufficient ablation study on eU-Net3+’s encoder.
    • Typos and grammatical errors throughout the paper.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Clarify which SAM encoder is chosen for the proposed method in Fig. 2.
    • Provide more insights on what can be drawn from feature visualizations in Fig. 3 and how they relate to nuclei segmentation.
    • Add an ablation study on eU-Net3+’s encoder to demonstrate its effectiveness.
    • Carefully proofread the paper to eliminate typos and grammatical errors.
    • Improve the overall clarity and presentation of the paper.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper’s novelty and promising performance are notable, the weaknesses in clarity, presentation, and completeness of the study outweigh the strengths. The authors need to address the issues mentioned above to improve the paper’s overall quality. In its current state, the paper requires significant revisions before it can be considered for acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Some of the concerns raised have been addressed.



Review #2

  • Please describe the contribution of the paper

    The authors introduce a novel segmentation framework in which SAM provides broad global representational information to the task-specific eU-Net3+ model, which extracts detailed local features, using X-GFB. The experimental results show improvements over other methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The novelty mainly relies on the particular combination of already existing methods. The experiments are extensive and the results show significant improvements.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The innovations in the methodology are somewhat lacking, the proposed methods seem to be picking/packing existing methods. The changes appear rather minor from a deep learning/machine learning perspective.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Information on the particular settings and parameters used for all the experiments should be provided for reproducibility purposes.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The novelty of the proposed method has not been sufficiently highlighted in the description of the methodology. Although the experimental results appear quite positive, more information and details on the experimental settings and key parameters should be provided. Alternatively, the source code can be provided for the sake of reproducibility.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the proposed method has not been sufficiently highlighted, and there are not enough details on the experimental settings from which the results were derived.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper utilizes a foundation model to guide task-specific supervised learning by dynamically combining their global and local latent representations via the proposed X-Gated Fusion Block, which uses a gated squeeze-and-excitation block followed by cross-attention to fuse the latent representations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method in the article is quite novel, and the experiments are conducted in a thorough manner.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The text contains numerous spelling and grammatical errors which need to be corrected, such as in the last paragraph of the introduction “(2) mparison” and on the seventh page, second line “oweingto”.

    2. Why choose SAM as the encoder? SAM’s training data primarily consists of natural images, hence the “dark knowledge” embedded in the encoder predominantly pertains to natural phenomena. Why not opt for already proposed large models in the medical domain, such as SAM-MED or SAM-3D, which could potentially reduce the domain gap significantly?

    3. It would be beneficial to visualize the segmentation results of other methods for comparison in the experiments. Additionally, the proportion of each component in the combined loss function is not specified.

    4. There are no ablation studies in the experimental section.

    5. Can you briefly discuss potential areas for further improvement in the proposed method?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper proposes a novel image segmentation method assisted by SAM, which is quite innovative and the experiments are detailed. However, there are still several issues that need to be addressed:

    1. The text contains numerous spelling and grammatical errors which need to be corrected, such as in the last paragraph of the introduction “(2) mparison” and on the seventh page, second line “oweingto”.

    2. Why choose SAM as the encoder? SAM’s training data primarily consists of natural images, hence the “dark knowledge” embedded in the encoder predominantly pertains to natural phenomena. Why not opt for already proposed large models in the medical domain, such as SAM-MED or SAM-3D, which could potentially reduce the domain gap significantly?

    3. It would be beneficial to visualize the segmentation results of other methods for comparison in the experiments. Additionally, the proportion of each component in the combined loss function is not specified.

    4. There are no ablation studies in the experimental section.

    5. Can you briefly discuss potential areas for further improvement in the proposed method?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel image segmentation method assisted by SAM, which is quite innovative and the experiments are detailed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Some of the issues raised have been made clear




Author Feedback

We thank the reviewers for taking the time to review our work and for providing insightful comments. We have arranged our replies according to the reviewers' concerns.

  1. Typos and Grammatical Errors (Reviewer 1, 3): We have carefully reviewed and corrected all typographical and grammatical errors throughout the manuscript.

  2. Selection of SAM Encoder (Reviewer 1, 4): We opted for the base model checkpoint because it contained the more ambiguous global information required for nuclei segmentation. We use only the output of SAM's image encoder. Initially, we intended to use medical-domain vision models such as MED-SAM or I-MedSAM, but these were not viable options because saved checkpoints were unavailable at the time, and their authors had indicated that these models were not adequately trained on pathology images to ensure high performance. Consequently, we opted for SAM despite its limitations. We plan to explore these medical-domain-specific models in future work, although we believe that the enhancements offered by our proposed methods will remain beneficial even then.

  3. Clarity in Figure Descriptions (Reviewer 1): We have clarified the explanation of each figure as follows:
    1. Figure 2 explains our choice of SAM with the base model checkpoint, which provides the task-specific eU-Net3+ with more ambiguous global guidance for nuclei segmentation. This can be observed in the illustrated region of Figure 2, where the base model checkpoint has more ambiguous regions than the larger model checkpoints.
    2. Figure 3 analyses the effectiveness of incorporating global context into the task-specific encoder via X-GFB, which provides a more detailed bottleneck representation.
  4. Insights from Feature Visualizations (Reviewer 1): To provide clear insights, we have revised the description of Figure 3 to explain each of the sub-figures. We show that the encoding obtained after applying the proposed X-GFB is a better and more detailed representation at the bottleneck of the network. The bottleneck compresses all learned features into a dense encoding. It is widely recognized that a more detailed and richer representation at the bottleneck enhances the model’s performance. Figure 3 shows that fusing the SAM encoding with our task-specific eU-Net3+ encoding using X-GFB provides a better representation, leading to the better performance shown in our experimental results.

  5. Ablation Study (Reviewer 1, 3): A detailed ablation study is indeed needed to validate the proposed methodology, but we are unable to provide one due to the constraints of the rebuttal submission guidelines. However, we believe that the significant performance improvements and the SOTA results achieved by incorporating our proposed methods demonstrate the effectiveness of our approach. Additionally, we have attempted to mitigate the absence of a full ablation study by providing justifications through the intuitions from Figures 2 and 3, which illustrate the rationale behind our method.

  6. Description of Novelty (Reviewer 2): We agree that the novelty of our method could be highlighted more prominently and we have done so in Section 2.

  7. Experimental Details and Source Code (Reviewer 2, 3): We appreciate the reviewers pointing out the need for more detailed experimental settings, some of which were missing, and have made the following changes:

    The training process spans 50 epochs, with an initial learning rate of 0.0001 and a batch size of 16. We use the Adam optimizer along with a dropout of 0.3. For the loss function, we use a combination of weighted Dice and focal loss with equal weights, aiming to balance the training focus between prevalent and rare segmentation targets. We plan to make the source code publicly available upon publication to ensure the reproducibility of our results.
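The settings stated in the rebuttal can be collected into a small configuration, and the combined loss can be sketched as follows. This is a minimal illustration rather than the authors' code: the dictionary keys, the function names, the focal-loss exponent `gamma=2.0`, the smoothing constants, and the reading of "equal weights" as 0.5 each are assumptions, and the class weighting of the Dice term mentioned in the rebuttal is omitted because its form is not specified.

```python
import math

# Values taken from the rebuttal text; the key names are hypothetical.
TRAIN_CONFIG = {
    "epochs": 50,
    "learning_rate": 1e-4,
    "batch_size": 16,
    "optimizer": "Adam",
    "dropout": 0.3,
}

def dice_score(probs, targets, eps=1e-6):
    """Soft Dice coefficient over flattened foreground probabilities."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)

def dice_loss(probs, targets):
    """Dice loss is one minus the Dice coefficient."""
    return 1.0 - dice_score(probs, targets)

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Binary focal loss; gamma=2.0 is an assumed value, not from the rebuttal."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p        # probability of the true class
        p_t = min(max(p_t, eps), 1.0 - eps)   # clamp to avoid log(0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

def combined_loss(probs, targets, w_dice=0.5, w_focal=0.5):
    """Equal-weight sum of the Dice and focal terms (weights assumed to be 0.5)."""
    return w_dice * dice_loss(probs, targets) + w_focal * focal_loss(probs, targets)

# Toy pixels: a confident correct prediction versus a confident wrong one.
targets = [1, 0, 1, 0]
good = combined_loss([0.9, 0.1, 0.8, 0.2], targets)
bad = combined_loss([0.1, 0.9, 0.2, 0.8], targets)
perfect = combined_loss([1.0, 0.0, 1.0, 0.0], targets)
```

The Dice term responds to region overlap while the focal term down-weights easy pixels, which is consistent with the stated aim of balancing prevalent and rare targets.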




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors’ rebuttal has provided good responses and addressed the reviewers’ concerns regarding the implementation details and experimental studies. All reviewers are now inclined to accept this paper. The final version should incorporate those updates and further resolve the raised issues.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I don’t think R1’s and R4’s concerns are addressed by the rebuttal. We need to judge the paper by its form at submission instead of relying on promised improvement.

    • Lack of comparison with SAM-based methods: There are many papers on reusing SAM-encoder and the paper doesn’t compare with them.

    • Lack of ablation studies: There are various design choices that need to be validated to understand where the performance improvement comes from.

    • Minor: The paper has too many typos and grammatical errors. The camera ready quality may not be guaranteed. The paper in its current form is below the bar of MICCAI publication.




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While the paper does need substantial improvement in its grammar and language, the overall idea is novel and very effective. The final version should incorporate the review suggestions, including ablation tests and a comparison to other SOTA papers that use the SAM encoder.



