Abstract

Nuclei segmentation in cervical cell images is a crucial technique for the automatic diagnosis of cervical cell pathology. The current state-of-the-art (SOTA) nuclei segmentation methods often require significant time and resources to provide pixel-level annotations for training. To reduce the labor-intensive annotation costs, we propose DES-SAM, a box-supervised cervical nucleus segmentation network with strong generalization ability based on self-distillation prompting. We utilize Segment Anything Model (SAM) to generate high-quality pseudo-labels by integrating a lightweight detector. The main challenges lie in the poor generalization ability brought by small-scale training datasets and the large-scale training parameters of traditional knowledge distillation frameworks. To address these challenges, we propose leveraging the strong feature extraction ability of SAM and a self-distillation prompting strategy to maximize the performance of the downstream nuclear semantic segmentation task without compromising SAM’s generalization. Additionally, we propose an Edge-aware Enhanced Loss to improve the segmentation capability of DES-SAM. Various comparative and generalization experiments on public cervical cell nuclei datasets demonstrate the effectiveness of the proposed method.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2521_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2521_supp.pdf

Link to the Code Repository

https://github.com/CVIU-CSU/DES-SAM

Link to the Dataset(s)

https://cs.adelaide.edu.au/~carneiro/isbi14_challenge/dataset.html

https://pan.baidu.com/share/init?surl=0DpPbvYlt7urt34Mc6JOOg

BibTex

@InProceedings{Hua_DESSAM_MICCAI2024,
        author = { Huang, Lina and Liang, Yixiong and Liu, Jianfeng},
        title = { { DES-SAM: Distillation-Enhanced Semantic SAM for Cervical Nuclear Segmentation with Box Annotation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper focuses on nuclei segmentation in cervical cell images with the primary goal of reducing the resources required for annotating training images. For this purpose, the authors propose DES-SAM, a bounding-box (BB) supervised network that leverages SAM to generate pseudo-labels and a self-distillation prompting strategy to maximize the performance of the downstream semantic segmentation task. Additionally, the paper proposes an Edge-aware Enhanced Loss to boost the segmentation capabilities of the proposed DES-SAM.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper targets an open issue in different research domains, i.e., reducing the amount of pixel-level annotations required to perform automatic segmentation without losing the overall accuracy.
    • The author(s) leverage well-established existing approaches (i.e., SAM, self-knowledge distillation) and propose an extension of the edge-aware loss to achieve their goal.
    • The obtained results seem to be promising.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Although interesting in the proposed application, the paper is hard to follow, and important details are missing. For example, the multi-scale lightweight detector presented in Section 2.1 is not described at all. The authors only mention that the SAM image encoder is employed and that “we utilize simple convolutions or deconvolutions operations to address the multi scale variation of the target cell nucleus”. How are such conv/deconv operations used to achieve the goal?

    • In Section 2.1, an Ldet loss is introduced (but not explained) and never used in the remainder of the paper.

    • The “Distillation Enhancement Segmentation” is again quite unclear for the reader (at least to me). How are the feature maps fed to both the teacher and the student? What is Tori? How are the learnable tokens learned?

    • Concerning the “edge-aware enhanced loss”, which seems to be one of the major contributions of the paper, it is not clear how the authors’ proposal differs from [11].

    • The overall model optimization flow is hard to understand (even with the figure provided).

    • Are the box prompts displayed in the light blue and light red boxes of Fig. 2 the same? How are they generated?

    • It is unclear how the numbers in Table 2 are obtained for the competitors. Is the protocol employed the same one used for the proposed DES-SAM? Why Target B is not considered in such a table?

    • For the evaluation, the authors concentrated the majority of the experiments on the CNSeg dataset, which has only recently been released and on which none of the existing nuclei segmentation methods in the literature have been evaluated. This makes the comparison with the fully supervised methods (Table 3) weaker.

    • It is not clear whether tables report the average of multiple runs for each experiment or just a single run (as it seems to be). The performance improvement is relatively small and multiple runs with different initialization seeds should be performed to make the evaluation statistically relevant.

    Minor comments:

    • Double quotes are wrongly displayed in the document (e.g. in the caption of Table 1).
    • Non-explained wording is used for the annotation types in the caption of Table 1.
    • The font of Figure 2 is excessively big.
    • Small to no related work is provided in the paper.
    • In the supplementary material, what is the “Image Encoder” backbone reported in Table1?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Although not mandatory, publishing the code will give the reader a chance to reproduce the experiments. There are a lot of details involved in the implementation that cannot be condensed in 8-pages paper, making the experiments impossible to be replicated.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposal seems to be promising, but in its current form, it is impossible to understand the details of the proposed model, discern the contribution w.r.t. the existing literature, and replicate the experiments.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The author(s) have addressed some of the concerns raised in the original reviews and clarified important aspects. Even if open issues are still present, I raised my score to Weak Reject. If the paper is accepted, I strongly encourage the author(s) to include the clarifications provided in the rebuttal also in the paper and I am looking forward to seeing the source code.



Review #2

  • Please describe the contribution of the paper

    A box-supervised segmentation network is proposed to increase the generalization capability of SAM, and is based on the self-distillation prompting strategy. An Edge-aware Enhanced Loss function is also proposed to refine the segmentation boundary of the cell nucleus.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1-A self-distillation prompting strategy is proposed to enhance the generalization capability of SAM for Nuclei segmentation. 2-The method is efficient in the case of a limited-labeled data regime. 3-An Edge-aware Enhancement loss function is proposed to increase performance, specifically at nucleus boundaries.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The size of K × K window used for Edge-aware Enhanced Loss is not mentioned in the paper.
    2. The visual results are missing in the main paper.
    3. A comparison of semantic segmentation with instance segmentation is performed; is there any explanation for this?
    4. The prompt encoder and prompting strategy are not explained in the method section.
    5. The paper proposes a Distillation Enhancement Segmentation strategy; however, it is not evaluated how this method is better than using a single network, i.e., the teacher model without the student model.
    6. The effect of visual tokens as input and of the learnable token for the student model needs further explanation. How does either of them add value to the overall architecture?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The naming strategy is not consistent between Figure 2 and the text used to describe the network architecture. In the text, the paper mentions the student network D_s and the teacher network D, while in the figure they are named mask-head student and mask-head teacher.
    2. The mask-head teacher in Figure 2 is not receiving the image feature maps extracted by SAM, as mentioned in the text.
    3. The SAM reference is from arXiv; it would be better to use the original reference.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes several components; however, it is not clear what the motivation is for each newly developed module.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    After reviewing the rebuttal, I acknowledge that the authors have addressed most of my concerns. However, one significant issue still needs to be addressed: evaluating the effectiveness of the Distillation Enhancement Segmentation strategy. Specifically, it needs to be clarified how this method outperforms a single network (i.e., the teacher model without the student model or vice versa). This is a critical aspect that needs to be demonstrated to validate the proposed method’s superiority. Additionally, there are still some issues regarding the paper’s clarity, including some figures that need improvement for better understanding. Given that most concerns have been satisfactorily addressed and considering the overall contributions of the paper, I am willing to raise my score from weak accept to accept.



Review #3

  • Please describe the contribution of the paper

    This manuscript addresses a practical task of nuclei segmentation with box annotations from cervical cytology images. To achieve efficient and generalizable nuclei segmentation performance, the authors first utilize the SAM backbone to leverage its advantages in rich semantic feature extraction. They then follow with a multi-scale lightweight detection module and a distillation enhancement segmentation module. To refine the nuclei boundaries, the authors propose a combination of window-based local pairwise loss and global pairwise loss. To further enhance the model’s generalization, the authors build a teacher-student network with a self-distillation prompting strategy. This work demonstrates the model’s effectiveness through comparison, ablation, and generalization experiments on two datasets, CNSeg and ISBI2014.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Task significance. This work emphasizes the importance of box-supervised and generalizable nuclei segmentation. Box annotations are more common than pixel annotations in cervical cytology images, making a box-supervised framework adaptable to more scenarios. Building a generalizable model that performs well on unseen datasets is crucial for the computational cytology community and provides valuable insights.

    2) Method novelty. This work primarily includes three methodological innovations: introducing the SAM backbone to equip the nuclei segmentation model with powerful semantic representations, designing a self-distillation prompting strategy to address the small data problem and develop a generalizable model, and further designing the Edge-aware Enhanced Loss to refine the segmentation boundaries.

    3) Writing. The writing in this manuscript is commendable for its logical flow, use of professional terminology, and accuracy in descriptions, especially for objective analysis of experimental results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The major drawbacks of this manuscript are as follows:

    1) Comprehensiveness and rigor in data set utilization and experimental design. Firstly, I strongly suggest that the comparative and generalization experiments in this work need to be conducted separately. For example, Table 1 should not include the results of the PatchSeg train ISBI test, which is essentially a cross-dataset generalization test. Moreover, for generalization testing, Target B should also be included to avoid bias. Finally, the utilization of the ISBI dataset does not follow the official challenge [1] split, including train and test. If these major issues are addressed, I would highly recommend this work. [1] https://cs.adelaide.edu.au/~carneiro/isbi14_challenge/dataset.html

    2) Research scope. The research and dataset scope of this work are focused on specific cytology nuclei, but from the methodology and motivation, I did not see its limitations on pathology image nuclei, which have more application value and publicly available datasets. This is worth exploring in future work by the authors.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The following are high-level comments/recommendations from the reviewer:

    1) Please explain the consistency of, and the reason for choosing, a different dataset and setting in the “Comparison to Fully supervised Methods” experiment compared to the “Comparison to Weakly supervised Methods” experiment, to avoid bias.

    2) Please explain why this work only focused on nuclei segmentation and ignored the cytoplasm, which is also significant in cytology images. If the focus is only on nuclei, the authors should conduct experiments on pathology images.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see the mentioned strengths, drawbacks and recommendation of this manuscript.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I am looking forward to seeing the updates and analysis mentioned in the rebuttal in the final version, and I maintain my initial score.




Author Feedback

We highly appreciate the reviewers’ insightful feedback. We are encouraged that they appreciated our self-distillation prompting strategy (R4, R5), the clarity of our writing (R5), and the promise of our proposal (R6).

  1. Details of multi-scale lightweight detector (R6) and the L_det (R6) Our lightweight detector is borrowed from ViTDet (Li et al., 2022), which builds a simple multi-scale feature pyramid from a single-scale feature map using strided convolutions/deconvolutions, combined with the detector head from Faster R-CNN. Hence L_det uses the same regression and classification losses as Faster R-CNN.
  2. The details of Distillation Enhancement Segmentation and the overall model optimization flow (R4, R6) We initially employ SAM’s prompt encoder to encode the boxes output by the lightweight detector, thereby obtaining the corresponding box prompt tokens (the green boxes in Fig. 2) (R6). The teacher-student network, based on SAM’s mask decoder, takes image features and box prompt tokens concatenated with output tokens as inputs. The box prompts are fed into both teacher and student network (R6). Unlike the teacher network that uses a frozen mask decoder and frozen output tokens from SAM, the student network finetunes both the mask decoder and output tokens. We highlight the learnable output tokens as a visual prompt (the red box in Fig.2) (R4, R6). We apologize for the undetailed description, and we will add details and release the code for better understanding.
  3. The difference between the edge-aware enhancement loss and [11] (R6) and the size of local pairwise loss window (R4) Our Edge-aware Enhanced Loss is built upon [11] that rasterizes polygons predicted from the model to obtain masks, while our method associates proposals and masks through box prompts (R6). We calculate local pairwise loss with 3×3 window (R4).
  4. The experiments on TargetB (R5, R6), the separation of comparative/generalization experiments (R5) We conducted experiments on TargetB and the results will be included in the paper (R5, R6). We will also separate the comparative/generalization experiments and update the ISBI results following the official split (R5).
  5. The choice of CNSeg dataset (R5, R6) and the protocol of Table 2 (R6) Our current work primarily revolves around cervical nuclear segmentation, and CNSeg is the largest publicly available dataset currently accessible. We also applied our method to histology image nucleus segmentation (i.e., MoNuSeg and CoNSeP), and observed comparable performance to the SOTA methods (R5, R6). We were unable to include these results due to space constraints. All the models listed in Table 2 follow the same protocol (R6).
  6. The marginal performance improvement (R6) Our model’s experimental results are averaged over three trials. We acknowledge that our method achieves performance only comparable to SOTA methods such as BoxSnake [12]. However, it is important to note that our method significantly reduces the number of trainable parameters, amounting to only 40% of the parameters used in BoxSnake.
  7. Comparison with fully supervised semantic segmentation and instance segmentation models in Table 3 (R4, R6) Semantic segmentation and instance segmentation models both serve the purpose of nucleus segmentation (R4). We evaluate on the new CNSeg dataset, posing a challenge against fully supervised methods. Yet, our weakly supervised approach enables comparable performance without extensive labeling. Despite challenges, we’re committed to our approach’s effectiveness and reliability (R6).
  8. Reproducibility and the backbone (R6) We will release the code, learned models, and training logs for reproducibility. The backbone reported in the supplementary material is SAM-Base’s backbone.
  9. Visual results and the teacher’s performance (R4) The visual results are included in the supplementary material and the teacher’s performance, without using prompting, is listed in Table 4.
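
For readers unfamiliar with window-based pairwise terms, the following is a minimal sketch of what a local pairwise loss over a 3×3 window (the window size stated in the rebuttal) might look like. It is not the authors' implementation: the exact form of the Edge-aware Enhanced Loss is not given on this page, so the function name `local_pairwise_loss`, the intensity-based affinity, and the `sigma` parameter are all illustrative assumptions.

```python
import math

def local_pairwise_loss(prob, image, window=3, sigma=0.1):
    """Affinity-weighted disagreement between each pixel and its window neighbours.

    prob  : 2D list of predicted foreground probabilities in [0, 1]
    image : 2D list of grey-level intensities in [0, 1]
    The affinity exp(-|I_i - I_j| / sigma) is an assumed choice, not the paper's.
    """
    h, w = len(prob), len(prob[0])
    r = window // 2
    total, pairs = 0.0, 0
    for y in range(h):
        for x in range(w):
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    ny, nx = y + dy, x + dx
                    # Skip the pixel itself and neighbours outside the image.
                    if (dy == 0 and dx == 0) or not (0 <= ny < h and 0 <= nx < w):
                        continue
                    # Similar-looking neighbours get a weight close to 1,
                    # neighbours across an intensity edge a weight close to 0.
                    affinity = math.exp(-abs(image[y][x] - image[ny][nx]) / sigma)
                    total += affinity * abs(prob[y][x] - prob[ny][nx])
                    pairs += 1
    return total / pairs if pairs else 0.0

# Toy image: dark left half, bright right half (hypothetical data).
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
aligned = [row[:] for row in image]                  # mask boundary on the image edge
shifted = [[0.0, 1.0, 1.0, 1.0] for _ in range(4)]   # mask boundary one column off

print(local_pairwise_loss(aligned, image) < local_pairwise_loss(shifted, image))
```

In this toy case the mask whose boundary coincides with the intensity edge is penalized far less than the shifted one, which is the general behaviour such a term is meant to encourage.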




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal has successfully addressed most of the reviewers’ concerns. Overall, this paper presents interesting and inspiring ideas, and achieves promising empirical results. However, some minor issues still exist, including unclear methodology details, hard-to-understand figures, and lack of publicly available code repository. The final version should be revised to address those remaining concerns.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The rebuttal has successfully addressed most of the reviewers’ concerns. Overall, this paper presents interesting and inspiring ideas, and achieves promising empirical results. However, some minor issues still exist, including unclear methodology details, hard-to-understand figures, and lack of publicly available code repository. The final version should be revised to address those remaining concerns.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    accepts

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    accepts


