Abstract
Computer-aided diagnosis (CAD) has become an essential solution for breast ultrasound (BUS) image analysis; however, the development of CAD systems is hindered by the scarcity of high-quality data and by annotation challenges. We propose a novel clinical prior-guided tumor generation method that allows precise control over tumor characteristics, such as size, shape, and texture, using clinical knowledge from textual descriptions and structural masks. Additionally, our method enables cross-domain data generation, enhancing the adaptability of the synthetic data across different imaging conditions. Experiments on three public BUS datasets demonstrate the favorable generation quality and effective cross-domain adaptation of our method. Moreover, the improved accuracy in downstream classification and segmentation tasks further demonstrates the clinical utility and practical effectiveness of our synthetic images in supporting breast cancer diagnosis. The code is available at https://github.com/Violetphy/Clinical-Prior-Tumor-Generation.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1909_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/Violetphy/Clinical-Prior-Tumor-Generation
Link to the Dataset(s)
N/A
BibTex
@InProceedings{PanHao_Clinical_MICCAI2025,
author = { Pan, Haoyu and Mo, Junyang and Lin, Hongxin and Zhang, Chu and Wu, Zijian and Wang, Yi and Zheng, Qingqing},
title = { { Clinical Prior-Guided Tumor Generation for Breast Ultrasound with Cross Domain Adaptation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15965},
month = {September},
pages = {57--67}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a novel clinical prior-guided tumor generation method that allows precise control over tumor characteristics, such as size, shape, and texture, using textual descriptions and structural masks. The overall framework is built on a pretrained Stable Diffusion model, and the authors also enable cross-domain data generation using LoRA. Experiments on three public BUS datasets and on downstream classification and segmentation tasks demonstrate the superiority of the method.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is clear and easy to follow.
- The results are extensive and the ablation study is good.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The only weakness is limited novelty: the proposed method is based on Stable Diffusion and ControlNet. In addition, I have some questions that I would like the authors to address:
- Could the authors additionally report the F1-score for classification, since the dataset is imbalanced?
- During inference, how is the mask generated?
- Is there a reason why full fine-tuning performs worse than LoRA fine-tuning?
- For the results on STU, did the authors use STU data for fine-tuning? If not, which data did they use to generate synthetic STU data?
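The F1 request matters because accuracy can look strong on an imbalanced test set even when the minority class is never detected. The plain-Python sketch below is illustrative only: it compares accuracy with macro-averaged F1 on a hypothetical toy label set, not on the paper's data, and the helper names are our own.

```python
def f1_for_class(y_true, y_pred, label):
    """One-vs-rest F1 score for a single class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced toy example: 8 benign, 2 malignant cases, and a
# degenerate classifier that always predicts "benign".
y_true = ["benign"] * 8 + ["malignant"] * 2
y_pred = ["benign"] * 10
print(accuracy(y_true, y_pred))            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

The degenerate classifier reaches 80% accuracy while its macro-F1 drops below 0.5, because the malignant class contributes an F1 of zero.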
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
See weaknesses.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
See strengths and weaknesses.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper proposes a clinical prior-guided tumor generation framework for breast ultrasound images, which combines structural masks and clinical text descriptions to generate realistic and clinically relevant tumor images without manual annotations. The key contributions include the use of clinical text descriptions of tumor characteristics to guide tumor generation, enabling the generation of both synthetic tumor images and the corresponding segmentation labels. Moreover, the paper performs cross-domain adaptation using low-rank adaptation (LoRA), allowing the model to adapt efficiently, at minimal computational cost, to clinical data whose imaging characteristics vary due to acquisition and patient demographics. Finally, the authors show that the synthetic data generated by the proposed method improves the training and performance of downstream segmentation and classification models.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Strengths:
1) An effective combination of textual descriptions and masks that serve as a clinical prior to guide tumor generation and ensure that relevant tumor characteristics are synthesized.
2) A method to control tumor shape, texture, and boundary characteristics, which is a significant step forward in generating realistic synthetic medical data with pathologies.
3) The incorporation of LoRA is an interesting approach, as the authors claim it allows the model to adapt efficiently to variations in data without full fine-tuning, thus reducing computational requirements.
4) The method is extensively evaluated, with results demonstrating that it outperforms both traditional and newer approaches in terms of tumor realism and the generation of clinically relevant features.
5) The paper includes cross-domain generation experiments, which validate the domain adaptation strategy and show the ability to handle different datasets with varying characteristics.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) While the model is able to adapt to various datasets, the three datasets do not fully represent the broader diversity of ultrasound images, so the generalizability of the method may actually be limited.
2) The results highlight the optimal range of real-to-synthetic data ratios (1:1 to 1:4). However, there is still a risk of overfitting with excessive synthetic data. The impact of this, and how it might affect long-term deployment in clinical settings, needs further exploration. It is somewhat understandable that a 1:1 ratio is the most beneficial, but it is not entirely clear why 1:2 performs worse than 1:4 in some cases.
3) It is also evident that using synthetic data alone leads to significantly worse performance when the models are trained on BUSI data. The opposite seems to be the case when training with UDIAT, in most classification tasks and in one segmentation task. It is not entirely clear why this happens.
4) The LoRA adaptation strategy is mentioned, but its implementation and its comparative advantages over other methods (such as full fine-tuning) could have been explained in more detail.
5) While the paper uses FID and KID to evaluate the quality of generated images, these metrics are not always ideal for medical image generation. Clinical relevance and visual fidelity (for example, how well the generated tumors match actual tumors in terms of diagnosis) are subjective aspects that may not be fully captured by these metrics. The absence of more specialized medical evaluation criteria (clinical expert review, real-world validation) limits the practical interpretation of the results.
6) The method heavily relies on pre-trained diffusion models and pre-trained text encoders. This introduces a dependency on these external models, which may limit the method's adaptability if those models become outdated or unavailable. More discussion of the limitations of this dependency and of potential ways to mitigate it would be helpful.
7) The evaluation does not address how the method performs in real-world clinical settings, under noisy conditions, or in the presence of other artifacts.
8) It is not clear whether the authors trained or fine-tuned the methods used for comparison, or how the images shown in Fig. 4 were generated. In particular, how exactly did the GAN, DDPM, and other models accept a mask as input to generate the synthetic image? Given that some of these models cannot be conditioned, how was this comparison performed (in that case, the comparison would not be fair)?
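To make the concern about FID concrete: FID compares only the mean and covariance of feature embeddings of real and generated images, so it measures distribution-level similarity rather than diagnostic fidelity of individual tumors. The NumPy/SciPy sketch below computes the Frechet distance from precomputed feature arrays; it assumes features have already been extracted (the standard definition uses Inception-v3 features, which are not shown), and the function names are our own. KID is not covered here.

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        # Numerical error can introduce tiny imaginary parts; keep the real part.
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))


def fid_from_features(feats_real, feats_gen):
    """FID from (n_samples, n_features) arrays of precomputed image features."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)


# Toy usage with random stand-in "features" (not real image embeddings).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
print(fid_from_features(real, real))        # close to 0
print(fid_from_features(real, real + 1.0))  # close to 8 (squared mean shift over 8 dims)
```

Because only first- and second-order statistics enter the formula, two sets of images can score well on FID while still differing in clinically important details, which is why expert review is a complementary check.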
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a novel approach for clinical prior-guided tumor generation using textual descriptions and structural masks, which is a valuable contribution to improving synthetic data generation in medical imaging. The integration of Low-Rank Adaptation (LoRA) for cross-domain adaptation is also a solid strength, ensuring the model can generalize well to different datasets with minimal computational cost. Additionally, the extensive experimental validation and improvements in segmentation and classification tasks highlight the practical utility of the approach.
The paper could benefit from a more diverse evaluation across additional datasets and a more detailed discussion of real-world deployment. There is also potential concern about overfitting with synthetic data and the lack of expert evaluation of the generated tumors. The LoRA adaptation is well introduced, but it could be explored in more depth to better understand its comparative advantages.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This manuscript focuses on the field of breast ultrasound imaging and aims to address the challenges of limited high-quality data and annotation difficulties in computer-aided diagnosis (CAD) systems.
Methodologically, a Variational Autoencoder (VAE) is first employed to encode real ultrasound images into latent representations, to which multi-level noise is added progressively. At each denoising timestep, two types of auxiliary information are introduced: (1) clinical textual embeddings extracted using a pretrained text encoder, and (2) structural features derived from structural masks encoded by a ControlNet module. These features are integrated to guide the model in progressively predicting and removing noise, thereby reconstructing the image through the reverse process.
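For reference, the procedure summarized above corresponds to the standard latent-diffusion noise-prediction objective with text and ControlNet conditioning. The formulation below assumes the paper follows the usual Stable Diffusion/ControlNet training setup; the notation is ours, not the paper's.

```latex
z_0 = \mathcal{E}(x), \qquad
z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad
\epsilon \sim \mathcal{N}(0, I)

\mathcal{L} = \mathbb{E}_{z_0,\, t,\, \epsilon}
\left[ \left\lVert \epsilon - \epsilon_\theta\big(z_t,\, t,\, \tau(y),\, c_m\big) \right\rVert_2^2 \right]
```

Here x is a real ultrasound image, \mathcal{E} the VAE encoder, \bar{\alpha}_t the cumulative noise schedule at timestep t, \tau the pretrained text encoder applied to the clinical description y, c_m the ControlNet features of the structural mask m, and \epsilon_\theta the denoising UNet. At inference, sampling starts from Gaussian noise, iteratively removes the predicted noise, and the VAE decoder maps the final latent back to an image.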
In addition, an elastically deformed elliptical model is proposed to synthesize structural masks. These are combined with clinical text descriptions to guide the generation process, enabling the synthesis of high-quality breast ultrasound images with consistent anatomical structures and semantic alignment. To enhance the cross-domain adaptability of the generative model, the LoRA strategy is adopted for lightweight fine-tuning.
Experimental results demonstrate that the generated images significantly improve model performance in downstream classification and segmentation tasks, validating the effectiveness of the proposed approach in data augmentation and cross-modal guided image synthesis.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The manuscript introduces a unique diffusion-based controllable generative framework. It innovatively combines clinical knowledge from textual descriptions and structural masks to precisely control tumor characteristics like size, shape, and texture during breast ultrasound image generation.
- By introducing a LoRA fine-tuning strategy into the UNet module, the model can efficiently adapt to different datasets with minimal computational overhead and a reduced risk of overfitting.
- The evaluation of the proposed method is comprehensive. It is tested on three public BUS datasets, namely BUSI, UDIAT, and STU. The results demonstrate the favorable generation quality of the method, its effective cross-domain adaptation, and improved accuracy in downstream classification and segmentation tasks.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- In the experiments, the combination of real and synthetic data generally outperformed using synthetic data alone. This may suggest that real data still contains certain information that the synthetic data fails to capture or learn, resulting in a potential bias in the generated data. However, the manuscript does not discuss this issue.
- Given the ongoing debate in the clinical community regarding the use of synthetic data, it may be necessary to elaborate more on whether the generated data holds clinical significance.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Similar structures and methods have been applied in other fields, but the approach is highly novel in the context of breast ultrasound. It uniquely incorporates the BI-RADS standard specific to breast imaging and uses specially designed synthetic masks tailored to breast characteristics, along with diagnostic text descriptions, to guide dataset generation. This approach helps expand the dataset and partially mitigates the scarcity of high-quality annotated and described data. Experimental results demonstrate the effectiveness of the method across a broad range of clinical data. However, further discussion may be needed regarding the clinical significance of the generated data, as its applicability to clinical diagnosis remains somewhat controversial.
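One way to picture how BI-RADS descriptors could become text conditioning is to validate structured descriptors against the ultrasound mass lexicon and render them into a prompt. The sketch below is hypothetical: the prompt template, the `MassDescriptor` class, and `build_prompt` are our own illustration, and the paper's actual prompt wording may differ. The descriptor values follow the five BI-RADS ultrasound mass categories (shape, orientation, margin, echo pattern, posterior features).

```python
from dataclasses import dataclass

# BI-RADS ultrasound mass descriptor categories and their lexicon terms.
BIRADS_US = {
    "shape": {"oval", "round", "irregular"},
    "orientation": {"parallel", "not parallel"},
    "margin": {"circumscribed", "indistinct", "angular", "microlobulated", "spiculated"},
    "echo_pattern": {"anechoic", "hyperechoic", "complex cystic and solid",
                     "hypoechoic", "isoechoic", "heterogeneous"},
    "posterior": {"no posterior features", "enhancement", "shadowing", "combined pattern"},
}


@dataclass
class MassDescriptor:
    shape: str
    orientation: str
    margin: str
    echo_pattern: str
    posterior: str

    def validate(self):
        # Reject any value that is not a term from the lexicon above.
        for field_name, allowed in BIRADS_US.items():
            value = getattr(self, field_name)
            if value not in allowed:
                raise ValueError(f"{field_name}={value!r} is not a BI-RADS term")


def build_prompt(desc, pathology):
    """Hypothetical prompt template; not the paper's actual wording."""
    if pathology not in {"benign", "malignant"}:
        raise ValueError(f"unknown pathology {pathology!r}")
    desc.validate()
    return (f"breast ultrasound image of a {pathology} mass, {desc.shape} shape, "
            f"{desc.orientation} orientation, {desc.margin} margin, "
            f"{desc.echo_pattern} echo pattern, {desc.posterior}")


prompt = build_prompt(
    MassDescriptor("irregular", "not parallel", "spiculated", "hypoechoic", "shadowing"),
    "malignant",
)
print(prompt)
```

Constraining prompts to a controlled vocabulary keeps the text condition aligned with how radiologists describe lesions, which is the clinical-prior argument the reviewer highlights.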
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
The reviewers acknowledge the novel integration of clinical priors, such as BI-RADS standards and synthetic masks, with stable diffusion models for breast tumor generation, along with thorough validation across datasets and downstream tasks. They recognize the computational efficiency of LoRA adaptation while suggesting deeper discussion of clinical relevance, broader evaluations, and expert validation to strengthen real-world applicability. We sincerely appreciate this constructive feedback, which will help us improve our work.
Reviewer #1: On the gap between synthetic and real data: While our method prioritizes fidelity to BI-RADS standards and anatomical masks, biases may arise from incomplete modeling of low-level details or underrepresented edge cases in training. We agree that further analysis of these discrepancies would strengthen the discussion. Future work will explicitly compare latent representations of real vs. synthetic data to identify and mitigate such gaps.
Reviewer #1: On the clinical significance of synthetic data: We will clarify in the revised manuscript that our synthetic data aims to augment (not replace) real datasets, particularly in scenarios with scarce annotated examples. By embedding clinical priors (BI-RADS descriptors, structural masks), the generated tumors reflect diagnostically relevant features, but their clinical utility ultimately depends on expert validation. In future work, we plan to collaborate with clinicians to evaluate the diagnostic equivalence of synthetic data in downstream tasks.
Reviewer #3: Mask generation during inference: First, a base ellipse is drawn using random center coordinates and size parameters within clinical ranges. The tumor type (benign or malignant) determines the aspect ratio (e.g., wider-than-tall for benign, taller-than-wide for malignant). Then, we apply elastic deformation to simulate realistic irregularity. The degree of deformation is controlled by a parameter sigma, which is sampled from a type-specific range. For benign tumors, sigma is sampled from a lower range to maintain relatively smooth shapes. In contrast, for malignant tumors, sigma is sampled from a higher range to introduce more irregularity and lobulation.
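The mask-generation steps described above can be sketched in NumPy/SciPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the size, aspect-ratio, and sigma ranges in `TYPE_PARAMS` are hypothetical placeholders for the unstated "clinical ranges", and the helper names are our own. Note that in the common Simard-style elastic deformation, sigma is the Gaussian smoothing width (larger means smoother) and a separate alpha sets the amplitude; because the authors state that a larger sigma yields more irregularity, this sketch treats sigma as the displacement amplitude in pixels, which is our assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

# Hypothetical ranges (the paper's clinical ranges are not given here).
# aspect = width / height: > 1 is wider-than-tall, < 1 is taller-than-wide.
# sigma is treated as a displacement amplitude in pixels (assumption).
TYPE_PARAMS = {
    "benign": {"aspect": (1.2, 2.0), "sigma": (1.0, 3.0)},
    "malignant": {"aspect": (0.5, 0.9), "sigma": (4.0, 8.0)},
}


def ellipse_mask(h, w, cy, cx, ry, rx):
    """Binary ellipse with center (cy, cx) and semi-axes ry (vertical), rx (horizontal)."""
    yy, xx = np.mgrid[0:h, 0:w]
    return (((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0).astype(np.float64)


def elastic_deform(mask, sigma, smooth, rng):
    """Warp a mask with a smooth random displacement field of peak magnitude sigma pixels."""
    h, w = mask.shape
    dy = gaussian_filter(rng.uniform(-1.0, 1.0, (h, w)), smooth)
    dx = gaussian_filter(rng.uniform(-1.0, 1.0, (h, w)), smooth)
    peak = max(np.abs(dy).max(), np.abs(dx).max(), 1e-12)
    dy, dx = dy * (sigma / peak), dx * (sigma / peak)
    yy, xx = np.mgrid[0:h, 0:w]
    warped = map_coordinates(mask, [yy + dy, xx + dx], order=1, mode="constant", cval=0.0)
    return (warped > 0.5).astype(np.uint8)


def sample_tumor_mask(tumor_type, size=256, rng=None):
    """Sample a base ellipse for the tumor type, then elastically deform it."""
    rng = np.random.default_rng() if rng is None else rng
    params = TYPE_PARAMS[tumor_type]
    ry = rng.uniform(0.08, 0.2) * size            # vertical semi-axis
    rx = ry * rng.uniform(*params["aspect"])      # horizontal semi-axis from aspect ratio
    cy = rng.uniform(ry, size - ry)               # keep the base ellipse inside the image
    cx = rng.uniform(rx, size - rx)
    base = ellipse_mask(size, size, cy, cx, ry, rx)
    sigma = rng.uniform(*params["sigma"])         # type-specific deformation strength
    return elastic_deform(base, sigma, smooth=0.03 * size, rng=rng)


mask = sample_tumor_mask("malignant", size=128, rng=np.random.default_rng(0))
print(mask.shape, mask.dtype)  # (128, 128) uint8
```

Sampling the deformation strength from separate, non-overlapping ranges per tumor type is what lets the mask alone encode the smooth-versus-lobulated distinction before any text conditioning is applied.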
Reviewer #3: Full fine-tuning vs. LoRA: Full fine-tuning likely underperformed due to overfitting on limited medical data and catastrophic forgetting of the pretrained diffusion model's generalizable features. LoRA's low-rank adaptation preserves foundational knowledge while efficiently adapting to medical domains, achieving better generalization with fewer parameters.
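The parameter-efficiency argument can be made concrete with a NumPy sketch of a single LoRA-adapted linear layer, following the standard LoRA formulation W = W0 + (alpha / r) B A with W0 frozen, A randomly initialized, and B initialized to zero. The 320 x 320 layer size, rank 4, and alpha are hypothetical values chosen for illustration, not the paper's settings, and only the forward pass is shown (no training loop). The sketch illustrates the mechanism; it does not by itself demonstrate the overfitting and forgetting explanation, which remains the authors' hypothesis.

```python
import numpy as np


class LoRALinear:
    """Linear layer y = x W0^T + (alpha / r) * x (B A)^T with W0 frozen (LoRA)."""

    def __init__(self, w0, rank, alpha, rng):
        d_out, d_in = w0.shape
        self.w0 = w0                                  # frozen pretrained weight, never updated
        self.a = rng.normal(0.0, 0.01, (rank, d_in))  # trainable down-projection A
        self.b = np.zeros((d_out, rank))              # trainable up-projection B, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # Base path plus low-rank update; with B = 0 this equals the pretrained layer.
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def merged_weight(self):
        # After training, the update can be folded into a single weight matrix.
        return self.w0 + self.scale * (self.b @ self.a)

    def num_trainable(self):
        return self.a.size + self.b.size


rng = np.random.default_rng(0)
layer = LoRALinear(rng.normal(size=(320, 320)), rank=4, alpha=4.0, rng=rng)
print(layer.w0.size, layer.num_trainable())  # 102400 2560
```

In this toy layer, LoRA trains 2,560 parameters instead of 102,400 (2.5%), and because W0 is never modified, the pretrained behavior is exactly recoverable by setting B to zero.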
Reviewer #3: STU dataset: Because STU contains only 42 images and lacks classification labels, no STU data was used in the generation training pipeline. STU was used as an external test set in the downstream segmentation tasks.
Reviewer #4: Generalizability across datasets: We plan to test the method on additional public datasets in future work.
Reviewer #4: Real-to-synthetic ratio and overfitting risks: The performance variance across ratios likely reflects domain-specific data distributions (e.g., UDIAT’s smaller size may benefit more from synthetic augmentation). We will discuss feature diversity in synthetic vs. real data and propose regularization strategies to mitigate overfitting in long-term deployment.
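Comparisons across real-to-synthetic ratios also depend on how the mixed training sets are drawn. A minimal Python sketch, assuming the ratio counts synthetic images per real image; the rebuttal does not describe the sampling procedure, and `mix_real_synthetic` is a hypothetical helper.

```python
import random


def mix_real_synthetic(real, synthetic, ratio, seed=0):
    """Build a shuffled training list with real:synthetic = 1:ratio.

    The synthetic pool is shuffled once with a fixed seed and then truncated,
    so for the same seed the 1:2 subset is contained in the 1:4 subset.
    """
    n_syn = int(len(real) * ratio)
    if n_syn > len(synthetic):
        raise ValueError(f"need {n_syn} synthetic samples, only {len(synthetic)} available")
    rng = random.Random(seed)
    pool = list(synthetic)
    rng.shuffle(pool)
    mixed = [(x, "real") for x in real] + [(x, "synthetic") for x in pool[:n_syn]]
    rng.shuffle(mixed)
    return mixed


real = [f"real_{i}" for i in range(10)]
synthetic = [f"syn_{i}" for i in range(100)]
for k in (1, 2, 4):
    print(k, len(mix_real_synthetic(real, synthetic, k)))  # 20, 30, 50 samples
```

Using nested subsets separates the effect of the ratio itself from the effect of which synthetic images happen to be drawn, which makes non-monotonic results (such as 1:2 underperforming 1:4) easier to interpret.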
Reviewer #4: Dependency on pre-trained models: We will discuss risks (e.g., model obsolescence) and mitigation strategies, such as open-sourcing our LoRA adapters and exploring self-supervised pretraining tailored to medical data in future work.
Reviewer #4: Comparison method fairness and implementation: For fairness, we reimplemented all baselines with mask conditioning. We will clarify this in the methodology.
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A