Abstract

Fully supervised lesion recognition methods in medical imaging face challenges because they rely on large annotated datasets, which are expensive and difficult to collect. To address this, synthetic lesion generation has become a promising approach. However, existing models struggle with scalability, fine-grained control over lesion attributes, and the generation of complex structures. We propose LesionDiffusion, a text-controllable lesion synthesis framework for 3D CT imaging that generates both lesions and corresponding masks. By utilizing a structured lesion report template, our model provides greater control over lesion attributes and supports a wider variety of lesion types. We introduce a dataset of 1,505 annotated CT scans with paired lesion masks and structured reports, covering 14 lesion types across 8 organs. LesionDiffusion consists of two components: a lesion mask synthesis network (LMNet) and a lesion inpainting network (LINet), both guided by lesion attributes and image features. Extensive experiments demonstrate that LesionDiffusion significantly improves segmentation performance, with strong generalization to unseen lesion types and organs, outperforming current state-of-the-art models. Code is available at https://anonymous.4open.science/r/LesionDiffusion.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0764_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/HengruiTianSJTU/LesionDiffusion

Link to the Dataset(s)

KiTS23: https://kits-challenge.org/kits21/
MSD: http://medicaldecathlon.com/

BibTex

@InProceedings{LeiWen_LesionDiffusion_MICCAI2025,
        author = { Lei, Wenhui and Tian, Hengrui and Dai, Linrui and Chen, Hanyu and Zhang, Xiaofan},
        title = { { LesionDiffusion: Towards Text-controlled General Lesion Synthesis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {327--337}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces LesionDiffusion, a text-controlled latent diffusion framework designed to synthesize both lesions and their corresponding masks in 3D CT images. Trained on a dataset of 1,505 annotated CT scans, covering 14 lesion types across 8 organs, the framework allows fine-grained control over lesion attributes, overcoming key limitations in current lesion synthesis models. LesionDiffusion significantly improves segmentation performance, generalizes to previously unseen lesion types and organs, and outperforms existing state-of-the-art methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper introduces a diffusion model trained on 14 lesion types across 8 organs, demonstrating its ability to generalize effectively to unseen lesion types and organs. This generalization capability is a significant strength, as it allows for broader applicability in medical imaging tasks.
    2. The authors propose a two-stage approach for image generation: (1) lesion mask generation and (2) lesion image inpainting. These stages ensure that the generated images are both lesion-specific and realistic, capturing fine anatomical details while maintaining biological plausibility.
    3. Experimental results show that when segmentation networks are trained on synthetic images produced by LesionDiffusion, they achieve measurable performance gains compared to models trained solely on real data. This highlights the practical value of LesionDiffusion in enhancing downstream tasks such as lesion segmentation.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. It is unclear whether the generated lesion masks are positioned accurately. Additionally, there should be more clarification on whether the lesion attributes generated by the report LLM (Large Language Model) align with realistic medical expectations.

    2. It would be beneficial to hire multiple physicians to review the generated images and assess their clinical realism. Testing whether medical professionals can distinguish between real and generated lesion images would provide valuable insight into the quality and practical applicability of the method.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    I found the text in Fig. 2 to be quite small and difficult to read. A version with a larger font size and fewer words would improve clarity and make the figure more accessible.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend a weak accept for this paper. The proposed LesionDiffusion framework presents an innovative approach to synthetic lesion generation with strong generalization to unseen lesion types and organs. The two-stage process for lesion mask generation and image inpainting is well-structured, and the results show measurable improvements in downstream tasks like segmentation. However, there are some concerns, particularly regarding the clinical realism of the generated lesions and the limited dataset used for training. Despite these limitations, the methodology shows promising results, and with further validation, it could be highly valuable for the field.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents a robust framework for CT scan synthesis that proceeds from mask generation to lesion inpainting, guided by specific lesion-attribute reports. It benefits from the effectiveness of diffusion models and GANs, using text reports in a fine-grained process to improve the clinical relevance of the synthetic scans.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • An extensive dataset covering multiple organs, used for training and for testing on unseen organs.
    • A two-phase framework: LMNet for lesion mask synthesis, followed by LINet for lesion inpainting.
    • Generalizes not only to cancerous lesions but also to stones and cysts.
    • Lesion generation is guided by a context-based report that includes specific, lesion-related attributes.
    • A good approach using the Bbox-weighted loss function in the denoising process.
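Reviewer 2 mentions a Bbox-weighted loss in the denoising process without stating its form. One plausible reading, sketched below in Python as an assumption and not as the paper's definition, is a noise-prediction MSE in which voxels inside the lesion bounding box receive a larger weight. The half-open box format `(z0, z1, y0, y1, x0, x1)`, the `inside_weight` value, and the flat C-order volume layout are hypothetical choices made for illustration.

```python
from itertools import product
from typing import Sequence, Tuple

# Hypothetical box format: half-open ranges (z0, z1, y0, y1, x0, x1).
Box = Tuple[int, int, int, int, int, int]


def in_box(idx: Tuple[int, int, int], box: Box) -> bool:
    """True if voxel index (z, y, x) lies inside the half-open box."""
    z, y, x = idx
    z0, z1, y0, y1, x0, x1 = box
    return z0 <= z < z1 and y0 <= y < y1 and x0 <= x < x1


def bbox_weighted_mse(
    pred: Sequence[float],
    target: Sequence[float],
    shape: Tuple[int, int, int],
    box: Box,
    inside_weight: float = 2.0,
) -> float:
    """Weighted MSE between predicted and true noise on a D x H x W grid.

    `pred` and `target` are flattened in C order (x varies fastest).
    Voxels inside `box` are weighted by `inside_weight`, all others by 1.0,
    and the weighted sum is normalized by the total weight.
    """
    d, h, w = shape
    if len(pred) != d * h * w or len(target) != d * h * w:
        raise ValueError("pred and target must have D*H*W elements")
    total, norm = 0.0, 0.0
    # product() enumerates (z, y, x) in the same C order as the flat lists.
    for i, idx in enumerate(product(range(d), range(h), range(w))):
        weight = inside_weight if in_box(idx, box) else 1.0
        diff = pred[i] - target[i]
        total += weight * diff * diff
        norm += weight
    return total / norm
```

With the same squared error placed inside versus outside the box, the inside case yields the larger loss, which is the intended effect of up-weighting the lesion region. A real implementation would operate on latent tensors; the pure-Python loop only keeps the sketch dependency-free.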
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • More details are needed on how the lesion reports are generated and on what the prompt given to the LLM should include.
    • The choice of nnU-Net v2 as the only segmentation model is not justified.
    • More details are needed on how the Bbox is generated from the 4 attributes.
    • The steps and role of each part of the framework are not clear, and the figure does not match the description.
    • It is not clear whether the categorization of lesion locations was automated or done manually.
    • The model could have been compared with other generative models (e.g., PASTA-Gen) besides DiffTumor, since, as its name suggests, DiffTumor is specific to cancerous lesions.
    • Insufficient details are given on what the three examples in Fig. 4(a) represent with respect to convexity-sphericity.
    • Have you considered that performing the diffusion process solely within the lesion site may neglect the pathological changes of the organ in cancer cases? The organ shape may be affected, not only the lesion region, which may lead to clinically irrelevant synthetic scans.
    • As future work, you could automate the lesion report generation process to make the framework easier to use.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Some formatting notes:
    • Introduction, p. 2: …CT lesion synthesis models
    • Fig. 1: 11 attributes
    • Fig. 2(b): meant VQ-GAN
    • Define the abbreviation DSC (Dice score) before using it.
    • Downstream usability: “models” is plural, although only one segmentation model was tested.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an interesting and relevant direction by integrating lesion reports into CT scan synthesis. However, it lacks clarity on several methodological aspects and choices, and it does not compare the model’s performance with architectures designed for similar purposes to assess its effectiveness. While these issues limit the paper’s clarity and completeness, the core idea is promising and could contribute to the field with further refinement.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a text-controllable lesion synthesis framework, named LesionDiffusion, for 3D CT imaging. The framework uses a lesion mask synthesis network (LMNet) and a lesion inpainting network (LINet), both guided by lesion attributes and image features. It provides control over lesion attributes and supports a wider variety of lesion types. Extensive evaluation also shows the effectiveness of the proposed method in improving the performance of downstream segmentation tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes a text-controllable lesion synthesis framework, named LesionDiffusion, for 3D CT imaging. The framework incorporates a lesion mask synthesis network (LMNet) and a lesion inpainting network (LINet), both guided by lesion attributes and image features. It allows control over lesion attributes and supports a wider variety of lesion types.

    Extensive evaluations also demonstrate the effectiveness of the proposed method in improving the performance of downstream segmentation tasks.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Only one comparison method was included in the evaluation, which may limit the generalizability of the results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A novel method specifically designed for this task; extensive experimental results demonstrate the effectiveness of the proposed method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Dear Reviewers,

Thank you for taking the time to review our manuscript entitled “LesionDiffusion: Towards Text-based General Lesion Synthesis” (Paper ID: 764). We sincerely appreciate your constructive feedback and insightful suggestions. Due to constraints in length and formatting, some clarifications may have been insufficient in our initial submission. We address your concerns below.

Q1: Comparison with other methods (Reviewer 1) R: We acknowledge that using only DiffTumor as a baseline may limit the generalizability of our evaluation. At the time of our experiments, it was the most relevant and performant method for text-conditioned lesion synthesis. Our focus was to benchmark downstream segmentation performance (DSC), where DiffTumor served as a strong representative. Nonetheless, we agree that future versions should include broader comparisons, e.g., PASTA-Gen, and we plan to expand our evaluations accordingly.
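DSC, the metric used for the downstream comparison in Q1, is the Dice similarity coefficient, 2|P ∩ G| / (|P| + |G|) for a predicted mask P and a ground-truth mask G. A minimal Python sketch over sets of voxel coordinates follows; returning 1.0 when both masks are empty is a convention assumed here, not something stated in the paper or the feedback.

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def dice_score(
    pred: Iterable[Voxel], truth: Iterable[Voxel], empty_value: float = 1.0
) -> float:
    """Dice similarity coefficient between two binary masks given as voxel sets."""
    p, g = set(pred), set(truth)
    denom = len(p) + len(g)
    if denom == 0:
        # Both masks empty: treat as perfect agreement (assumed convention).
        return empty_value
    return 2.0 * len(p & g) / denom
```

For example, two three-voxel masks that share two voxels score 2 * 2 / 6 ≈ 0.667.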

Q2: Accuracy of lesion positioning and bbox generation (Reviewers 2 & 3) R: We thank the reviewers for highlighting this. Our method ensures anatomically valid lesion positioning through a rule-based pipeline guided by the structured report attributes: “organ”, “organ type”, “lesion location”, and “size”. These determine valid sampling regions on segmentation masks obtained from TotalSegmentator. For example, lesions in hollow organs are restricted to the lumen or wall margin depending on their type, and parenchymal lesions are constrained to lie entirely (or centrally, if protruding) within the organ. If the sampled position cannot accommodate the specified size, we resample; repeated failures trigger LLM-based regeneration of the report. This process is fully automated and consistently enforces spatial constraints.
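The placement rules in Q2 amount to rejection sampling with a fallback. The Python sketch below illustrates that control flow under simplifying assumptions that go beyond the feedback: the organ mask is a set of voxel coordinates (in the described pipeline it would come from TotalSegmentator), the lesion is approximated as a ball whose voxel radius stands in for the “size” attribute, and only the parenchymal rule (lesion entirely inside the organ) is checked. `regenerate_report` is a hypothetical callback standing in for LLM-based report regeneration, and the names `radius_vox`, `max_tries`, and `max_regens` are illustrative, not taken from the repository.

```python
import random
from typing import Callable, Dict, List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]
Report = Dict[str, int]  # hypothetical: only the "radius_vox" key is used


def ball_offsets(radius: int) -> List[Voxel]:
    """Integer offsets of a digital ball with the given voxel radius."""
    r2 = radius * radius
    return [
        (dz, dy, dx)
        for dz in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if dz * dz + dy * dy + dx * dx <= r2
    ]


def fits(center: Voxel, offsets: List[Voxel], organ: Set[Voxel]) -> bool:
    """Parenchymal rule: every lesion voxel must lie inside the organ mask."""
    cz, cy, cx = center
    return all((cz + dz, cy + dy, cx + dx) in organ for dz, dy, dx in offsets)


def place_lesion(
    organ: Set[Voxel],
    report: Report,
    regenerate_report: Callable[[Report], Report],
    rng: random.Random,
    max_tries: int = 50,
    max_regens: int = 3,
) -> Optional[Tuple[Voxel, Report]]:
    """Sample a lesion center; resample on failure, then regenerate the report."""
    candidates = sorted(organ)  # fixed order so a seeded rng is reproducible
    for attempt in range(max_regens + 1):
        if attempt > 0:
            # Repeated failures: ask for a new report (an LLM call in the pipeline).
            report = regenerate_report(report)
        offsets = ball_offsets(report["radius_vox"])
        for _ in range(max_tries):
            center = rng.choice(candidates)
            if fits(center, offsets, organ):
                return center, report
    return None  # caller decides how to handle a lesion that never fits
```

Hollow-organ rules (lumen versus wall margin) would swap in a different `fits` test and candidate set; the resample-then-regenerate loop would stay the same.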

Q3: On alignment with clinical realism (Reviewer 2) R: We consulted collaborating physicians who found the generated results clinically plausible. However, due to space limitations, this feedback was not included in the submission. We agree that a more systematic evaluation (e.g., a reader study) would be valuable, though challenging to implement given the diversity of lesions across 8 organs and 14 types. We are exploring how to incorporate such studies in future work.

Q4: Framework clarity and figure presentation (Reviewer 2) R: Due to MICCAI’s strict page limits, some figures were condensed, which may have caused confusion. We will refine both the figures and the accompanying descriptions to improve clarity in the camera-ready version.

Q5: Details of report generation and prompts (Reviewer 3) R: The full prompt used for LLM-based report generation is available in our open-source repository (pipeline/preprocess.py). Although it was omitted due to space constraints, we will clarify this in the final version.

Q6: On segmentation model choice (Reviewer 3) R: We chose nnUNet v2 due to its strong performance and wide adoption in the field. To avoid confounding factors, we maintained a consistent backbone for all comparisons. Nonetheless, we agree that evaluating with other segmentation models will be valuable in future extensions.

Q7: Clarification on Fig. 4 and image inpainting region (Reviewer 3) R: In Fig. 4(a), the three examples correspond to an esophageal tumor, a bladder tumor, and a liver cyst. They illustrate the model’s morphological control under different shape conditions. Regarding image inpainting, we note that lesion masks used during training cover the full pathological extent, including surrounding tissues. Additionally, attributes like “surface characteristics” guide the inpainting model to reflect contextual abnormalities beyond the lesion core.
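Reviewer 3’s concern in Q7 (whether changes outside the lesion site can be synthesized) hinges on how the inpainted output is merged back into the scan. A common convention in mask-guided inpainting, shown below as a Python sketch and not confirmed as LINet’s exact procedure, is a per-voxel blend `m * generated + (1 - m) * original`. Voxels outside the mask are copied unchanged, which is why a mask covering the full pathological extent, including surrounding tissue, is what allows those surrounding voxels to change. Flat lists of equal length stand in for volumes here.

```python
from typing import List, Sequence


def composite(
    original: Sequence[float], generated: Sequence[float], mask: Sequence[float]
) -> List[float]:
    """Blend generated voxels into the original scan under a mask in [0, 1].

    A binary mask copies generated values inside the lesion region and keeps
    original values elsewhere; fractional values give a soft transition.
    """
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("original, generated and mask must have equal length")
    out = []
    for o, g, m in zip(original, generated, mask):
        if not 0.0 <= m <= 1.0:
            raise ValueError("mask values must lie in [0, 1]")
        out.append(m * g + (1.0 - m) * o)
    return out
```

For instance, with an original row of `[10.0, 10.0, 10.0, 10.0]`, generated values `[0.0, 1.0, 2.0, 3.0]`, and mask `[0, 1, 1, 0]`, only the two masked voxels take generated values.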

Q8: Formatting and minor corrections (Reviewer 3) R: We thank the reviewers for their careful notes on abbreviation usage, attribute count, and figure labels. These will be corrected in the camera-ready version.

Sincerely,
The Authors




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


