Abstract
Cervical cancer is the only cancer that can be eliminated, yet it causes over 300,000 deaths annually. Early detection of its precancerous lesions can significantly reduce both incidence and mortality rates, but the screening process is labor-intensive and demands highly trained professionals. The application of artificial intelligence to cervical cell detection shows great promise but frequently encounters challenges such as limited data scale and class imbalance, stemming from the difficulty of expert annotation and the diversity of cervical cell types. To address this, current studies tend to design advanced detection models, while little attention is given to potential improvements from data augmentation. In this work, we present the first controllable image synthesis workflow with adaptive cell segmentation and style transfer to synthesize realistic cervical cell images with bounding box annotations. Specifically, an adaptive cell segmentation method is introduced to cut target cells of varying sizes and morphologies from real images. These cells are then controllably pasted onto blank backgrounds to synthesize coarse images, which are further refined into realistic ones through a style transfer approach. Extensive experiments on a private long-tailed dataset demonstrate that the proposed workflow can generate realistic cervical cell images, thereby enhancing model training and improving cervical cell detection performance, both overall and per category. The code is available at https://github.com/huyihuang/ImageSynthesisForCCD.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2740_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{HuYih_Controllable_MICCAI2025,
author = { Hu, Yihuang and Chen, Qi and Liao, Linbo and Lin, Weiping and Wu, Huisi and Wang, Liansheng},
title = { { Controllable Image Synthesis Workflow for Enhancing Cervical Cell Detection } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15972},
month = {September},
pages = {84 -- 94}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a new data augmentation method based on image synthesis. The experimental results demonstrate that the proposed method can generate realistic cervical cell images, thereby enhancing model training and improving the performance of cervical cell detection.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This paper proposed a controllable image synthesis workflow with adaptive cell segmentation and style transfer to synthesize realistic cervical cell images with bounding box annotations. The paper has a good structure and is well organized.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The authors claim this is the first approach to enhance the detection task from a data perspective. However, searching Google Scholar with the keywords ‘data augmentation’ and ‘cervical cell detection’ returns many similar works in this domain; for example, there is a review paper on this topic, ‘Data Augmentation Techniques to Detect Cervical Cancer Using Deep Learning: A Systematic Review’.
2. The main purpose of cervical cell detection is to detect abnormal cervical cells to prevent cervical cancer. But from the results in the paper, the detection performance for some kinds of abnormal cells is low (0.139 for ASC-H, 0.232 for ASC-US, and 0.147 for SCC).
3. There is a lack of comparison with other data augmentation methods.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
There is a lack of comparison with other data augmentation methods.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I agree to accept it as a poster paper.
Review #2
- Please describe the contribution of the paper
The authors propose a novel generative data augmentation pipeline for cervical cell images aimed at improving the performance of deep learning-based cell detection models, specifically YOLOv11. This approach addresses the common challenge of limited data availability in cervical cell image datasets.
The augmentation process consists of two stages:
- Coarse Approach: Single cells are extracted from training images using CellPose with an adaptive cell diameter. These cells are then arranged and pasted onto white backgrounds to generate synthetic training samples.
- Refined Approach: The synthetic images created in the coarse step are further enhanced using image-to-image style transfer techniques to create more realistic representations. Three style transfer models are evaluated: CycleGAN, FastCUT, and CUT.
The overall pipeline is inspired by the CutPaste augmentation framework (Dwibedi et al., 2017) but extends it with task-specific improvements such as dedicated cell segmentation and style transfer models.
Experiments are conducted on a private dataset comprising 11,000 images. The augmented data is shown to improve detection performance in specific scenarios compared to models trained only on real data. Among the style transfer methods, CUT leads to the best overall performance in the final detection task, suggesting it is the most suitable for this augmentation pipeline.
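The coarse synthesis stage summarized above (cut segmented cells, paste them onto a blank canvas, and derive bounding box annotations for free) can be sketched as follows. This is a minimal illustration under assumed inputs (cell crops with binary masks, e.g. as produced by a segmenter such as CellPose), not the authors' actual implementation; the function name and signature are hypothetical.

```python
import numpy as np

def paste_cells(cells, masks, canvas_size=(512, 512), rng=None):
    """Paste segmented cell crops onto a white canvas and record boxes.

    cells: list of HxWx3 uint8 crops; masks: matching HxW bool arrays
    (e.g. from a cell segmenter). Returns the coarse synthetic image and
    one (x1, y1, x2, y2) bounding box per pasted cell.
    """
    rng = rng or np.random.default_rng()
    # Blank white background, as in the described coarse stage.
    canvas = np.full((*canvas_size, 3), 255, dtype=np.uint8)
    boxes = []
    for crop, mask in zip(cells, masks):
        h, w = mask.shape
        # Random placement; a controllable layout policy could go here.
        y = int(rng.integers(0, canvas_size[0] - h))
        x = int(rng.integers(0, canvas_size[1] - w))
        region = canvas[y:y + h, x:x + w]
        region[mask] = crop[mask]           # copy only the masked cell pixels
        boxes.append((x, y, x + w, y + h))  # box annotation comes for free
    return canvas, boxes
```

Because each pasted cell's class and location are known at paste time, the synthetic image carries exact detection labels without any manual annotation, which is the key advantage over whole-image generative augmentation.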
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors provide a good empirical analysis of their generative augmentation method, evaluating the impact across multiple stages: from segmentation models used to extract individual cells to the final detection performance on both general and specific cell types using YOLOv11.
- The proposed method creatively combines several established techniques, notably CutPaste-style augmentation and image-to-image style transfer, tailored to cervical cytology.
- The authors demonstrate an impact on detection performance: the refined augmentation approach shows measurable (although small-scale) improvements in YOLOv11 performance compared to baseline training.
- The paper is generally well-written and accessible, with clear explanations and visualizations that make the methodology and findings easy to understand. The two-stage augmentation pipeline (coarse and refined) is logically structured and easy to follow.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Potential information leakage into validation set: The pipeline includes cutting cells from the validation set, which are later used to generate augmented training data for the detection model. While this does not affect the test set, the reuse of validation images during training is a form of information leakage that could inflate detection model performance estimates in the training phase. The paper does not address this issue or analyze its possible implications.
Lack of baseline comparison to standard augmentation: The effectiveness of the proposed generative augmentation strategy is not compared to conventional data augmentation techniques such as flipping, rotation, scaling, and cropping. This makes it difficult to assess whether the complex generative approach provides any benefit over simpler, widely used strategies.
Lack of related work in generative cell augmentation: The paper does not adequately acknowledge prior work on generative cell image augmentation, which has partly been shown to improve downstream deep learning tasks, e.g. SynCellFactory (Sturm et al., 2024), Mitogen (Svoboda et al., 2017), Cell Image Generator (Scalbert et al., 2019, also based on style transfer), and the broader overview by Kozubek (2020, doi:10.1002/cyto.a.23957). Also, the authors mention that ‘few studies exist’ in applying CutPaste to cervical cells, but refrain from citing any existing previous approaches.
Unclear dataset composition and mixing strategy: It remains unclear for me how real and synthetic images are mixed during training. The exact ratio of augmented to real data is typically a critical factor in generative augmentation pipelines, yet the authors do not provide details or justify their design choices in this regard.
Limited evaluation of segmentation quality: Although CellPose is a key component of the pipeline, its segmentation performance is only demonstrated on eight curated examples without any quantitative metrics. This anecdotal evidence is insufficient, especially given that segmentation quality directly affects the realism and utility of the augmented images.
Propagation of segmentation errors: As segmentation errors from CellPose are directly embedded into the synthetic images, they may negatively affect downstream detection performance. No analysis is provided on how robust the detection model is to such imperfections in cell extraction.
Weak detection results and missing analysis of failure cases: The overall detection performance remains modest and lacks detailed discussion. Metrics such as false positives, which would be particularly interesting given that style transfer models are known to produce artifacts, are not reported.
Lack of dataset availability: The dataset used in this study is private and does not appear to be released, which significantly hinders reproducibility and limits the ability of others to validate the findings or apply the method to related datasets.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
I noticed some minor typos: ‘sdvanced models’ and, in Table 3, ‘numers per images’.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While I acknowledge the authors’ effort in the design of the method and the extensive experimental evaluation, several critical limitations led to my decision:
- The approach is strongly tailored to a single private dataset with one specific cell type and visual style, raising concerns about generalizability.
- The lack of comparison to standard augmentation techniques (e.g., flipping, rotation, cropping) makes it unclear whether the proposed generative pipeline provides meaningful added value.
- The potential for error propagation from the CellPose segmentation step is not analyzed, despite being a crucial component of the pipeline.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I thank the authors for their detailed rebuttal.
They provided several missing but crucial details, which led me to revise my score to a ‘Weak Accept’. Most notably, they clarified that their data synthesis method is applied in addition to standard augmentation strategies already used in YOLOv11. They demonstrated a performance gain when their generative augmentation strategy is included, which addresses one of my main concerns.
They also promised to position their paper more clearly within the context of prior work and committed to releasing their dataset.
They also addressed key technical issues such as the mixing ratio between synthetic and real training data and provided quantitative evaluation of CellPose performance, which I consider sufficient at this stage.
However, significant limitations remain, as also noted by other reviewers: in particular, the performance gains are still moderate, and the issue of information leakage from the validation set remains unresolved. Although this does not affect the test set, it undermines the use of validation data for tuning and should be more carefully handled or acknowledged.
In conclusion, I decided on a ‘Weak Accept’.
Review #3
- Please describe the contribution of the paper
This paper introduces a methodology to augment cervical cell detection datasets by copying cells within the dataset to blank background and using a style transfer methodology to make the images look more realistic.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper proposes an innovative approach to data augmentation that is intuitive, controllable, and explainable. It is a much more interesting strategy than the typical “rotating”, “cropping”, and “Gaussian blurring” approaches, and it also doesn’t require creating a latent space of the data distribution as is typically done with diffusion models. The figures do an excellent job of communicating the methodology. It’s great to see the methodology validated for a variety of diseases as well. I also appreciate the explanation of model performance on head and tail categories, and of how tail categories improve with more images due to their rare nature.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
I would like to see how the methodology performs compared with other simple transformations (Gaussian blur, cropping, rotation, contrast) or dataset corruption used for data augmentation. Also, there are a few research works that perform cervical cell data augmentation using GANs; it would be great if there were a comparison with these in a “Related Work” section, or ideally using the same dataset for evaluation. I would also like the authors’ opinions on how well the model would perform on an external dataset: would the data augmentation help the trained model perform well on unseen data?
CellGAN: Conditional Cervical Cell Synthesis for Augmenting Cytopathological Image Classification
Cervical Cancer Single Cell Image Data Augmentation Using Residual Condition Generative Adversarial Networks
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
There are a few typos in the paper: ‘sdvanced’ on page 2, ‘cervial’ on page 9, and ‘numer per class’. The paper is well written, and the methodology is well explained, clear, and obviously improves the baseline. The only issues are the missing further comparisons to other models and techniques; those would really elevate the paper.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The idea of stitching together different “interesting” parts of various images is interesting. The “refinement” to make them more realistic, and thus make the models more generalizable, is also a nice idea. I’m not sure whether that is essentially rendered obsolete by the region-of-interest checks of a YOLO model, since it is essentially the same training data, but the results show some improvement. I’m however curious about comparisons to other, more traditional augmentation methodologies (latent distributions, rotation, Gaussian blur, flipping, etc.). The paper does decent ablation studies, but surely there can be further comparisons with SotA methodologies. Also, I’m curious to see how the model trained with the augmented dataset performs on unseen data from another dataset or population, which is the main value of data augmentation: robust and reliable results from the model.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We appreciate the reviewers’ comments on novelty and superiority. The main issues are clarified below.

Common Questions
CQ1: Comparison with regular augmentations: Regular augmentations involve complex combinations and hyperparameters, making an exhaustive comparison impractical within the space limits. Notably, default YOLOv11 already applies various augmentations (HSV, translation, scaling, etc.) with fine-tuned hyperparameters, boosting the mAP of YOLOv11x from 0.421 to 0.462 and of YOLOv11n from 0.369 to 0.433. Unlike regular augmentation (which alters existing images), our method generates new ones; the two are complementary. Combined with our method, the mAP of default YOLOv11x and YOLOv11n can be further improved to 0.483 and 0.481, respectively, showing its effectiveness.
CQ2: Existing GAN-based augmentations: These generate coarse images from random vectors and image-level labels, without the ability to generate fine-grained details or detection annotations, so they are limited to classification. To our knowledge, our method is the first to enhance cervical cell detection by data synthesis.
CQ3: Data availability: We will make it publicly available.

Response to R1
Q1: First … from a data perspective: Apologies for the inaccurate wording — it should be “data synthesis perspective”. We carefully read the provided survey, which covers regular augmentation (e.g., rotation, not data synthesis) and GAN-based augmentation (applicable only to classification, not detection; please refer to CQ2).
Q2: Comparison methods: Please refer to CQ1.
Q3: Low performance: Detecting 8 cell types under a long-tailed distribution (a real-world scenario) is highly challenging. Moreover, the scarcity of rare-class test data typically leads to low mAP. In this context, preliminary experiments show that YOLOv11x outperformed DETR and RCNN variants, yet achieved only 0.462 mAP. Even so, our method boosts YOLOv11x to 0.483 (and YOLOv11n from 0.433 to 0.481), showing effectiveness on both overall and rare-class performance.
Response to R2
Q1: Comparison methods: Please refer to CQ1.
Q2: Similar methods using GANs (including the 2 provided papers): Please refer to CQ2.
Q3: Opinions on unseen data: Our method generates entirely new images to enhance training data diversity, so it is reasonable to expect that the trained model may achieve better generalization.

Response to R3
Q1: Comparison methods: Please refer to CQ1.
Q2: Data availability: Please refer to CQ3.
Q3: Potential information leakage: Model evaluation was conducted on the test set, which is entirely independent, guaranteeing no leakage risk. We reuse the val set to enrich tailed cells, which is reasonable since the train and val sets share overlapping regions.
Q4: Unclear dataset composition and mixing strategy: The dataset comprises 7048 train, 2350 val, and 2350 test images (detailed in Table 1). We adopted a simple mixing strategy, adding the same number of synthetic images to each class. Although it may not be optimal, our method outperforms the baseline, demonstrating its effectiveness.
Q5: Evaluation of segmentation and propagation of errors: Our method achieves an average IoU of 0.758 (other methods < 0.617) between ~25,000 segmented cells and their annotated boxes, indicating high matching quality, especially considering the variability of the annotated boxes. Furthermore, at IoU > 0.5, our method achieves a 93% cell hit rate (other methods < 72%). While a lower error would be preferable, the current results sufficiently demonstrate our method’s effectiveness in segmenting cervical cells and enhancing detection.
Q6: Related work in generative cell augmentation: We carefully read the provided papers and will clarify their contributions and differences from ours. One main difference is that they rely on large numbers of real cells to generate fake cells, whose medical validity remains unverified; by contrast, our method uses only real cells. As for CutPaste applied to cervical cell detection, we conducted a thorough search and found no prior work.
Q7: Failure case analysis: We will include it if permitted by the authority.
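For context on the segmentation evaluation the rebuttal reports (average IoU between segmented cells and annotated boxes, and the cell hit rate at IoU > 0.5), the underlying metrics can be computed as in this minimal sketch. The helper names and box convention (x1, y1, x2, y2) are illustrative assumptions, not taken from the paper.

```python
def box_iou(a, b):
    """Intersection-over-union between two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def hit_rate(pred_boxes, gt_boxes, thresh=0.5):
    """Fraction of annotated boxes matched by at least one segmented cell
    with IoU above the threshold (the rebuttal's 'cell hit rate')."""
    hits = sum(any(box_iou(g, p) > thresh for p in pred_boxes) for g in gt_boxes)
    return hits / len(gt_boxes) if gt_boxes else 0.0
```

Averaging `box_iou` over matched cell/annotation pairs gives the 0.758 figure's metric; `hit_rate` corresponds to the reported 93% at IoU > 0.5.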
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This manuscript introduces a simple yet effective workflow for cervical cell image synthesis, which can be used to train a cell detection model for cervical cancer detection. Compared with other image generation approaches based on CycleGAN or CUT, the proposed method produces better cell detection performance. The rebuttal has addressed the reviewers’ concerns regarding comparisons with regular data augmentation, acknowledgement of prior related work, and technical details of data synthesis.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A