Abstract

Accurate diagnosis of vertebral diseases is vital for preventing severe complications, but data imbalance between abundant normal and rare pathological cases poses a substantial challenge to diagnostic performance. Medical image generation offers a promising solution by synthesizing pathological samples. However, existing diffusion-based methods, pre-trained on natural images, often fall short in capturing complex pathological features due to the pre-training knowledge gap, as well as struggling to obtain precise lesion masks and ensure seamless integration between lesions and the background. To overcome these challenges, we propose a novel diffusion-based medical image generation framework called MedSoft-Diffusion, which involves leveraging detailed medical knowledge to ensure that generated images are not only semantically consistent with the specified pathological conditions but also anatomically accurate. Our framework includes a Medical Semantic Controller (MSC) designed to enhance the alignment between textual prompts and lesion characteristics, ensuring the synthesis of semantically accurate pathological images. Furthermore, the Soft Mask Inpainting Strategy (SMIS) is proposed to combine soft masks with blurring techniques to improve the realism of synthesized images. Experimental results on two vertebral disease datasets demonstrate notable improvements in both image quality and classification performance using our approach. Code is available at https://anonymous.4open.science/r/MedSoft_Diffusion-1422.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0620_paper.pdf

SharedIt Link: https://rdcu.be/eG4Dr

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05182-0_33

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HeShi_MedSoftDiffusion_MICCAI2025,
        author = { He, Shidan AND Hu, Enyuan AND Tang, Zixuan AND Chen, Bin AND Yu, Dongdong AND Hong, Yuan AND Liu, Zhenzhong AND Li, Mengtang AND Liu, Lei AND Zhao, Shen},
        title = { { MedSoft-Diffusion: Medical Semantic-Guided Diffusion Model with Soft Mask Conditioning for Vertebral Disease Diagnosis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {333--342}
}

Reviews

Review #1

Please describe the contribution of the paper

The proposed model is a text-based inpainting Latent Diffusion Model (LDM). To avoid hard inpainting borders, the authors incorporate a blending approach from vertebral body segmentation, which they refer to as “soft segmentation.” The model is trained in two stages: (1) a text/image encoder is trained using an existing image encoder and an existing text encoder; (2) a Latent Diffusion Model with text and text/image embeddings is trained on masked images, where only the masked region is backpropagated. The mask is applied at the latent level. Most of the LDM remains frozen, with only the new text/image cross-attention being optimized.
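The phrase "only the masked region is backpropagated" can be read as a mask-weighted noise-prediction loss. The following is a minimal sketch of that reading, not the authors' implementation: the function name `masked_mse`, the flat-list latent representation, and the normalization by the mask sum are all assumptions made for illustration.

```python
def masked_mse(pred_noise, true_noise, mask):
    """Mean squared error restricted to the masked (lesion) region.

    All three arguments are flat lists of equal length. `mask` holds
    weights in [0, 1]; entries with weight 0 contribute nothing to the
    loss and would therefore receive no gradient. Dividing by the mask
    sum is an assumption made here so the loss does not scale with
    lesion size.
    """
    weighted = sum(m * (p - e) ** 2 for p, e, m in zip(pred_noise, true_noise, mask))
    total_weight = sum(mask)
    return weighted / total_weight if total_weight > 0 else 0.0


# Errors outside the mask are ignored; only the last element counts here.
loss = masked_mse([0.5, 2.0], [0.0, 0.0], [0.0, 1.0])
```

With a soft mask, the same expression weights boundary positions fractionally instead of switching them fully on or off.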
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The modifications to existing literature are minimal but well-founded. The model can be extended to any general text/mask/image dataset. The evaluations are methodologically sound and include a sufficient number of baselines. However, correctness is only demonstrated through a weak proxy task: improving classifier performance using artificially generated data. To fully substantiate the claim of accurate translation, a Turing test with multiple radiologists or additional user studies would be necessary.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Fig. 2) The produced images look realistic and the examples are close to the GT. However, the textual description is unclear and ambiguous in many ways. In addition, an anatomical description, such as at the upper or lower endplate, central, etc., is missing. The round lesion is more like the outline of a circle than a filled round shape, and the hyperintense lesion at the endplate has a hypointense wall. Thus, the textual descriptions of the images do not allow a differentiation between different pathological conditions, which also limits the applicability of the synthetic images.

Text) Self-evaluative language such as “innovative” should be avoided. Additionally, the term “significantly” should not be used unless a statistical test has been performed and a p-value is reported.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper introduces adaptations to a specific use case. Novelty exists in a limited way. The evaluations and results have room for improvement, but overcome the necessary hurdles.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

A diffusion model was proposed as a method to augment abnormal image data containing lesions. The study addressed the limitations of existing diffusion models, which had been pre-trained on natural images rather than medical images, and introduced a lesion mask specification method called the Soft Mask Inpainting Strategy (SMIS) to generate images in which lesions are naturally embedded. The effectiveness of the proposed methodology was demonstrated through both quantitative and qualitative evaluations, conducted by comparing it with other similar models across two datasets.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

As illustrated in Fig. 1 and described on page 4, the utilization of both textual pathology descriptions and masked images during training can be regarded as a distinguishing feature compared to other studies. In this paper, the mathematical concepts employed during the training and inference processes were clearly expressed through Equations (1) to (8), and the key parameters required for training were described in detail. Furthermore, since the model was quantitatively evaluated using open data, it can be stated that the study ensured reproducibility.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Although it was stated that the lesion descriptions generated by GPT-4o were reviewed, the study lacked a professional evaluation of the diagnostic validity of the synthetic images, which is essential for assessing their clinical reliability. In particular, the analysis of potential limitations was insufficient in cases where the prompt descriptions of the lesions were inaccurate or the soft mask was incorrectly designated. There remains a possibility that the generated synthetic images may fail to accurately reflect the exact location or boundary of the lesions, yet the study did not provide a thorough analysis of these limitations.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The primary novelty of this study lies in the development of a diffusion model that utilizes textual information and soft masks to address the issue of abnormal and imbalanced datasets in the medical domain. However, due to the lack of a clear analysis of the study’s potential limitations, the overall evaluation was determined as a “weak accept.”
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The manuscript “MedSoft-Diffusion: Medical Semantic-Guided Diffusion Model with Soft Mask Conditioning for Vertebral Disease Diagnosis” describes an AI method to artificially transform MRI scans of a healthy column into MRI scans with certain local pathologies. The authors use a masked approach that alters only local areas of the entire scan. Special attention is paid to the transition area between altered and non-altered structure. They show improvement over alternative approaches and provide code. The imputation of pathologies is text-based.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Combination of text-based and image based AI methods.
- good dataset
- sound statistics
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- please remove bold sentences in the text.
- remove Point 4: “Improved Performance” from your list of contributions, this is more a direct result of your contribution mentioned before and not a contribution itself.
- remove min and max from Equation 1; having a mask with values between 0 and 1 and applying a Gaussian won’t produce values outside [0, 1] anyway. If your mask is not a Gaussian with sum = 1, and min and max are not introduced for confusion only, please clarify.
- 2.3 is not clear at all, are you denoising or noising? Both exist.
- Fig 2 is too small.
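
The Equation 1 remark above can be illustrated with a small sketch: blurring a binary mask with a Gaussian kernel that sums to 1 yields a convex combination of values in [0, 1], so the result stays in that range up to floating-point rounding. This is a 1D toy under stated assumptions, not the paper's code: `gaussian_kernel`, `blur_mask`, the kernel radius, the sigma value, and the edge-replication border handling are all hypothetical choices.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a 1D Gaussian kernel normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_mask(mask, sigma=1.5, radius=4):
    """Blur a 1D mask with edge replication, then clamp to [0, 1]."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(mask)
    soft = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # replicate border values
            acc += w * mask[j]
        # The clamp only guards against tiny rounding drift; mathematically
        # acc is already a weighted average of values in [0, 1].
        soft.append(min(1.0, max(0.0, acc)))
    return soft


hard_mask = [0.0] * 6 + [1.0] * 8 + [0.0] * 6
soft_mask = blur_mask(hard_mask)
```

In this sketch the clamp is a numerical safeguard rather than part of the mathematical definition, which matches the reviewer's point that min and max are not needed in the formula itself.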
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Results seem good.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

To Reviewer #1: (1) Regarding Equation 1: Thank you for your insightful comment. In numerical practice, values like -1e-7 or 1 + 1e-7 may occur due to floating-point operations and we handle this by constraining the values within the [0, 1] range in the code, but this detail does not need to be explicitly mentioned in the formula of the paper. We appreciate your feedback, and we will remove the min and max operations in the final version of the paper to avoid confusion. (2) Clarification of Section 2.3: Our approach incorporates both noise addition and denoising during the inference process, which is a common paradigm in diffusion models for inpainting, similar to the approach used in RePaint [11]. Specifically, during the inference process at time step t, the model’s task is to generate a clearer \hat{z}_{t-1} from the current \hat{z}_t. In this process, the background part of \hat{z}_{t-1} (i.e., the regions outside the mask) is obtained by adding t-1 steps of noise to the original image, while the lesion part (i.e., the regions inside the mask) of \hat{z}_{t-1} is predicted through the denoising of \hat{z}_t. In this way, we ensure that the background region remains consistent with the original image, while the lesion region is generated according to the target description. The inference process proceeds from step T to step 1. By the time we reach time step 1, \hat{z}_0’s background part will be the original image (or the original image with 0 steps of noise), while the lesion part will be the result of the denoising process after T steps. This ensures that the final generated image has a seamless integration of the lesion and background, with the background part being consistent with the original image, thus ensuring the authenticity and consistency of the generated result. (3) Formatting and Contribution List: We will address these issues in the final version by enlarging Fig. 2, removing bold formatting from sentences, and deleting “Improved Performance” from the contribution list.
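
The inference loop described in point (2) can be sketched as follows. This is a toy illustration of the RePaint-style blending only, under several assumptions that are not from the paper: the cosine-style schedule in `alpha_bar`, the stand-in `toy_denoise` (a real model would be a text-conditioned network), the flat-list latents, and the linear blend used for fractional mask values.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Toy cumulative noise schedule with alpha_bar(0) == 1 (an assumption)."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def noise_to_step(z0, t, num_steps, rng):
    """Forward-diffuse the original latent z0 to step t."""
    a = alpha_bar(t, num_steps)
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in z0]


def toy_denoise(z_t, target):
    """Hypothetical stand-in for the denoiser: moves halfway toward a target."""
    return [0.5 * (z + g) for z, g in zip(z_t, target)]


def inpaint(z0, mask, target, num_steps=50, seed=0):
    """Run the blended reverse process from step T down to step 1."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in z0]  # start from pure noise at step T
    for t in range(num_steps, 0, -1):
        lesion = toy_denoise(z, target)  # inside-mask estimate for step t-1
        background = noise_to_step(z0, t - 1, num_steps, rng)  # original at step t-1
        # mask = 1 keeps the denoised lesion, mask = 0 keeps the noised original
        z = [m * l + (1.0 - m) * b for m, l, b in zip(mask, lesion, background)]
    return z


z0 = [0.2, -0.4, 0.9, 0.1]
mask = [0.0, 0.0, 1.0, 0.5]
target = [0.0, 0.0, -1.0, -1.0]
result = inpaint(z0, mask, target)
```

At the final iteration the background is the original latent with zero steps of noise, so positions where the mask is 0 reproduce `z0`, while positions where the mask is 1 come entirely from the denoising path.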

To Reviewer #2: (1) Lesion Description and Anatomical Detail: Thank you for your valuable comments and for recognizing the realism and GT similarity of our generated images. We acknowledge that the current lesion descriptions lack precise anatomical detail, and some generated structures do not fully capture the expected radiological characteristics. This limitation arises partly due to the coarse granularity of current prompt descriptions and the restricted spatial controllability of soft masks. We are actively working on improving the anatomical precision of both text prompts and mask conditioning in our future work. This includes integrating structured anatomical priors and enhancing spatial attention mechanisms to better guide lesion generation in a radiologically faithful manner. We appreciate your insightful feedback, which will help guide our next steps. (2) Subjective Language: We have removed all subjective or self-evaluative terms, ensuring that all claims are objectively stated and statistically supported where appropriate.

To Reviewer #3: Thank you for highlighting the importance of assessing the clinical validity of the generated images. We agree that a thorough diagnostic evaluation is essential, especially in scenarios where prompt descriptions may be imprecise or the soft mask may be suboptimally defined. While our current work has focused primarily on methodological development and indirect validation (e.g., downstream classification performance), we fully recognize the need for more comprehensive clinical assessment. In our ongoing work, we are collaborating with radiologists to conduct structured evaluations of lesion realism, anatomical accuracy, and diagnostic plausibility. We also plan to explore more robust prompt engineering strategies and refine our mask conditioning to improve lesion localization and boundary control.

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A

MedSoft-Diffusion: Medical Semantic-Guided Diffusion Model with Soft Mask Conditioning for Vertebral Disease Diagnosis

Author(s):