Abstract

Deep learning methods have demonstrated promising results in cervical lesion cell detection. Training detection models that generalize well typically requires a large amount of cell-level annotations, which are expensive and time-consuming to obtain. Instead, weak slide-level annotations, which entail assigning a single label to a gigapixel whole slide image (WSI), are easier to acquire. However, due to the significant difference in annotation scales, they cannot be directly utilized to assist in the training of cervical cell detectors. To address this challenge, we propose a Twin-memory augmented Multiple Instance Learning (Twin-MIL) framework to refine cervical lesion cell detection. Firstly, we utilize multiple instance learning to bridge the gap between cell-level and slide-level tasks. Then, we reduce false positives in conventional MIL by introducing a twin-memory module, which improves classification capability by capturing more discriminative patterns of positive and negative cells. We also propose uncertainty-regulated negative instance learning to enhance the robustness of the negative latent space against noisy instances and its separability from the positive one. Experiments indicate that our method is effective in enhancing different detection models trained on datasets with varying annotation levels.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0999_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{FeiMan_Weakly_MICCAI2025,
        author = { Fei, Manman and Song, Zhiyun and Shen, Zhenrong and Liu, Mengjun and Wang, Qian and Zhang, Lichi},
        title = { { Weakly Semi-Supervised Cervical Lesion Cell Detection via Twin-Memory Augmented Multiple Instance Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {642 -- 652}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a weakly semi-supervised learning framework for cervical lesion cell detection by using limited cell images with fine-grained bounding box labels and a large number of whole slide images with global labels. A multiple instance learning strategy using twin-memory augmentation and an uncertainty-regulated negative instance learning are proposed in the framework. The authors conducted experiments using various detectors and different dataset conditions to verify the effectiveness of the proposed method.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The way of using data (limited cell images with fine-grained labels and a large number of whole slide images with global labels) is new in the area of cervical cancer cell detection. (2) The proposed method is a plug-and-play module, which can be embedded into many advanced detection networks to improve performance.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) The technical contribution might be marginal, as most of the components of the framework come from existing methods, such as top-K ranking-based MIL [7], memory bank [3], and uncertainty [16]. (2) Many key details regarding the proposed method and model training are missing. (3) The overall clinical significance might be limited, as the work mainly focuses on the detection of cervical cells (normal and cancer); however, for general cervical cancer diagnosis, clinicians might be more interested in more fine-grained types, such as NILM, ASC-US, LSIL, ASC-H, HSIL, SCC, and AGC.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    (1) The description of the twin-memory module is not very clear. As shown in Fig. 3(a), there are two operations, “+” and “C”, which are not introduced in the main text. Please add these details.
    (2) Some key information might be missing in the following sentence: “When sending the negative embedding X^n to the positive memory bank, the positive query scores denoted as S_r^{n;p} should be 0 ∈ R^{NHW}. Similarly, the negative query scores S_r^{n;n} should be constrained to 1 ∈ R^{NHW}.” Please revise it.
    (3) Is BCE in Eq. (3) binary cross-entropy? Please specify it in the manuscript.
    (4) Some details about the augmented negative memory features and the augmented positive memory features are missing, and I am unclear about M_{aug}^{n} = SM^{n}.
    (5) The details of the network for slide classification are missing.
    (6) Are the two datasets used in this work public or private? If they are private, please specify their origin.
    (7) How the values of the hyperparameters are determined is not explained. Please explain it.
    (8) The framework is trained in a cascaded manner. In the first stage, the detection model is trained with the labeled cell-level images. The trained model is then used to select the top 8 tiles according to the detection scores. As a whole slide image is very large, inference over slides would be computationally costly. Are the top 8 tiles of each slide fixed during the subsequent fine-tuning? Besides, the selection rule is missing. The batch size is 2, including one positive case and one negative case. Do the authors mean two slides or two tiles?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work presents a new pipeline to leverage both limited labeled cell images and a large number of globally labeled whole slide images to improve the performance of cervical cancer detection. The comparison with existing methods and ablation study on key components verify the effectiveness of the proposed method. Despite the improvement, there are still many concerns, particularly the lack of important details about implementation and training. The current performance of the proposed method might be far away from the real-world deployment.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed most of my major concerns.



Review #2

  • Please describe the contribution of the paper

    This paper presents Twin-MIL, a novel framework designed to improve cervical lesion cell detection using only weak slide-level annotations instead of costly cell-level labels. The method leverages Multiple Instance Learning (MIL) to connect slide-level and cell-level tasks. It introduces a twin-memory module to reduce false positives by learning better patterns of positive and negative cells, and an uncertainty-regulated negative learning strategy to strengthen robustness against noisy data. Experiments show Twin-MIL effectively enhances various detection models across different annotation settings.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Methodology: Twin-MIL bridges the gap between cervical lesion cell detection trained on LCA and slide classification using WSA by sharing the detector with top-K ranking-based MIL. The method is simple yet effective.
    2. Robustness: Even with only 40% annotation, the model can still outperform the baselines by over 3% on different metrics.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Lack of comparison: The method is compared with powerful detection models in the general and general-medical domains, but not with cervical-specific cell detection methods, such as [1][2]. [1] Zhang, Zheng, et al. “Scac: A semi-supervised learning approach for cervical abnormal cell detection.” IEEE Journal of Biomedical and Health Informatics (2024). [2] Chai, Siyi, et al. “DPD-Net: Dual-path Proposal Discriminative Network for abnormal cell detection in cervical cytology images.” Biomedical Signal Processing and Control 89 (2024): 105887.
    2. Lack of details: For example, how many cells were annotated by cytologists? How large is the memory bank, and will it affect the training speed?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The paper uses MIL to help improve lesion cell detection in WSIs. It is expected that the authors will show how this cell detection method can be combined with MIL in cervical WSI analysis in the future.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The lack of details and comparison makes me give a 3. If the authors can address my concerns, I am willing to increase my score.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I appreciate the authors’ efforts in addressing the concerns raised in the initial review. Based on the revisions and explanations, I am willing to revise my recommendation from a weak reject to a weak accept. However, I still have reservations regarding the lack of comparison with SOTA cervical-specific cell detection methods.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel framework to enhance detection performance by introducing additional supervision. It first trains a tile detection model, and then trains a MIL classification model with a shared encoder. In the MIL model, negative features are regulated to overcome the sparsity of positive tiles. Metrics demonstrate the effectiveness of the additional MIL process.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper manages to improve the detection performance at a low additional cost (WSA). This poses application value.
    2. The proposed additional supervised training process can act as a plug-and-play module. It is applicable in any encoder-decoder detection structure.
    3. The twin-memory module and the UNIL module take the sparsity of positive objects into consideration, which is one of the most challenging issues in cytopathology.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. “Weakly Semi-Supervised” may be a little misleading. I fully understand that, from the perspective of cell detection, slide-level annotation is considered weak supervision, but the proposed framework still relies on fully annotated cell detection datasets, instead of only “weak” supervision (slide classification annotation) or “semi” supervision (partly annotated datasets). Maybe “multi-task learning” or “auxiliary learning” is more suitable to describe the application value of this paper?
    2. How is the training strategy determined? Why doesn’t the decoder in the detection model require additional fine-tuning after training of the MIL model? Why not train the detection and MIL models jointly? This paper trains the main task first and then the auxiliary task, which is an unpromising strategy in multi-task learning, mainly due to the risk of over-focusing on the auxiliary task and de-optimizing the main task.
    3. What is the relationship between the LCA and WSA datasets? I guess the tiles in the LCA dataset are cropped from the WSIs in the WSA dataset. Please clarify this, as it has a significant impact on the assessment of the application value of this article.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper has significant application value in lesion cell detection. The MIL module is effective yet costless in improving the detection performance. This paper makes no modifications to the detector, while proposing two modules in the MIL model to fit the lesion cell detection task. This allows the MIL-assisted detection improvement strategy to act as plug-and-play in any similar situation. Further elaboration could enhance the clarity of this paper’s practical value.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Reviewer1

1. About the technical contribution. The novelty of our work lies in proposing a unified framework that effectively bridges the gap between slide-level and cell-level annotations. Unlike previous studies, which primarily focus on slide or tile classification, our method directly utilizes slide-level labels to enhance cell-level detection, addressing a challenging and underexplored issue in cervical cytology. We introduce a twin-memory module that jointly models positive and negative prototypes, improving supervision and mitigating the bias commonly seen in single-memory approaches. We also incorporate uncertainty-regulated learning to enhance the robustness and separability of feature representations, helping to reduce false positives.

2. The details about implementation and training. 1) In Fig. 3(a), “+” denotes element-wise addition, and “C” denotes the concatenation operation. BCE in Eq. (3) is binary cross-entropy. 2) The augmented memory features are obtained via a read operation M_aug = SM, where S indicates how similar each instance is to the memory bank slots. M_aug^{n} = S M^{n} is obtained by passing the negative input through the negative memory bank; the augmented positive memory features are obtained analogously by passing the positive input through the positive memory bank. 3) For slide classification, as shown in Fig. 2, X_aug^{n} and X_aug^{p} are fed into an FC layer to generate the classification results. 4) We conducted a grid search for the key hyperparameters and selected the set that performed best on the validation set. 5) The two datasets are private. All samples were obtained from Shanghai Medical College Hospital, Shanghai Cancer Hospital, and Suzhou Dushu Lake Hospital. 6) In the fine-tuning stage, the top 8 tiles per slide are fixed. Each batch contains 2 slides, with 8 tiles per slide. We will make these revisions and additions in the final version.

3. About the clinical significance. Our work focuses on detecting potential lesion areas for further clinical attention. In real-world cervical cancer screening, identifying abnormal cells is crucial, as it enables early intervention and reduces missed diagnoses. While we currently focus on binary detection, our framework lays a foundation for fine-grained classification, which we plan to explore in future work.
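As an illustrative sketch of the memory read described above (M_aug = SM), the following shows one common way such a read can be realized; the cosine-similarity scoring, the softmax, and all names and shapes here are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def memory_read(x, memory):
    """Illustrative memory-bank read: compute query scores S from
    instance/memory similarity, then augmented features M_aug = S @ M.
    x: (N, D) instance embeddings; memory: (K, D) memory slots.
    Cosine similarity + softmax are assumptions, not the paper's spec."""
    # Normalize instances and memory slots to unit length.
    x_n = x / np.linalg.norm(x, axis=1, keepdims=True)
    m_n = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    sim = x_n @ m_n.T                        # (N, K) similarities
    # Softmax over memory slots gives the query scores S (rows sum to 1).
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    S = e / e.sum(axis=1, keepdims=True)     # (N, K)
    # Read: augmented memory features are a weighted sum of the slots.
    M_aug = S @ memory                       # (N, D)
    return S, M_aug
```

Feeding negative embeddings through the negative bank (and positive through the positive bank) then yields the two augmented feature sets described in item 2) above.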

Reviewer 2

1. About “Weakly Semi-Supervised”. We will consider rephrasing the title or terminology to avoid confusion in future revisions.

2. About the training strategy. Thank you for your comments. The decoder does not require additional fine-tuning because the MIL-guided refinement directly optimizes the feature representations without altering the localization head. Joint training was considered, but in our preliminary experiments it led to unstable optimization and performance degradation due to conflicting objectives. Our staged training strategy ensures that the main detection task remains the focus, while the auxiliary MIL task provides meaningful guidance without dominating the learning process.

3. About the LCA and WSA datasets. The tiles in the LCA dataset are cropped from the WSIs. We divided the datasets based on patient case IDs, ensuring there is no overlap.

Reviewer 3

1. About the comparison with cervical-specific cell detection methods. In Tab. 1, we have compared our method with Cascade RRAM and GRAM, which are cervical-specific cell detection methods. Our method serves as a plug-and-play module that can be integrated into various detectors to improve performance. In Tab. 1, we also integrate it into RetinaNet and DINO, demonstrating its effectiveness across different detectors. Additionally, we will include SCAC and DPD-Net in the Introduction and incorporate comparisons in future work.

2. About the cell annotations and memory bank size. The LCA dataset includes 5,035 tiles with 7,321 annotated cells. Considering both performance and speed, we set the size of the memory bank to 60.
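The top-K ranking-based MIL discussed in the reviews (scoring a slide by its highest-scoring tiles, and keeping those tiles fixed for fine-tuning) can be sketched as follows; k=8 follows the rebuttal's "top 8 tiles per slide", while the function names and mean-pooling of the top-K scores are illustrative assumptions:

```python
import numpy as np

def topk_mil_slide_score(tile_scores, k=8):
    """Top-K ranking-based MIL: score a slide by the mean of its
    K highest tile (instance) detection scores. k=8 follows the
    rebuttal; mean-pooling is an assumption for illustration."""
    scores = np.asarray(tile_scores, dtype=float)
    k = min(k, scores.size)
    topk = np.sort(scores)[-k:]              # K largest tile scores
    return float(topk.mean())

def select_topk_tiles(tile_scores, k=8):
    """Indices of the K highest-scoring tiles, e.g. those kept
    (fixed) per slide during the MIL fine-tuning stage."""
    scores = np.asarray(tile_scores, dtype=float)
    k = min(k, scores.size)
    return np.argsort(scores)[::-1][:k]      # descending order
```

With per-tile detector scores in hand, `select_topk_tiles` picks the fixed top-8 tiles per slide, and `topk_mil_slide_score` aggregates them into a slide-level prediction that a BCE loss against the slide label can supervise.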




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal successfully convinced the initially negative reviewers.


