Abstract

Image-to-Image translation models can help mitigate various challenges inherent to medical image acquisition. Latent diffusion models (LDMs) leverage efficient learning in compressed latent space and constitute the core of state-of-the-art generative image models. However, this efficiency comes with a trade-off, potentially compromising crucial pixel-level detail essential for high-fidelity medical images. This limitation becomes particularly critical when generating clinically significant structures, such as lesions, which often occupy only a small portion of the image. Failure to accurately reconstruct these regions can severely impact diagnostic reliability and clinical decision-making. To overcome this limitation, we propose a novel post-training framework for LDMs in medical image-to-image translation by incorporating lesion-aware medical pixel space objectives. This approach is essential, as it not only enhances overall image quality but also improves the precision of lesion delineation. We evaluate our framework on brain CT-to-MRI translation in acute ischemic stroke patients, where early and accurate diagnosis is critical for optimal treatment selection and improved patient outcomes. While diffusion MRI is the gold standard for stroke diagnosis, its clinical utility is often constrained by high costs and low accessibility. Using a dataset of 817 patients, we demonstrate that our framework improves overall image quality and enhances lesion delineation when synthesizing DWI and ADC images from CT perfusion scans, outperforming existing image-to-image translation models. Furthermore, our post-training strategy is easily adaptable to pre-trained LDMs and exhibits substantial potential for broader applications across diverse medical image translation tasks.
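
The lesion-aware pixel-space objective described above can be illustrated with a minimal sketch. It assumes an L1 reconstruction error and a binary lesion mask, combined with the weights λ_image and λ_lesion that the author feedback further down refers to. The function names, the choice of L1, and the normalization are illustrative assumptions, not the paper's exact formulation.

```python
from typing import Optional, Sequence

Image = Sequence[Sequence[float]]
Mask = Sequence[Sequence[int]]


def mean_abs_error(pred: Image, target: Image, mask: Optional[Mask] = None) -> float:
    """Mean |pred - target| over all pixels, or only over pixels where mask is non-zero."""
    total = 0.0
    count = 0
    for r, (p_row, t_row) in enumerate(zip(pred, target)):
        for c, (p, t) in enumerate(zip(p_row, t_row)):
            if mask is None or mask[r][c]:
                total += abs(p - t)
                count += 1
    # An empty mask (no lesion pixels) contributes no lesion penalty.
    return total / count if count else 0.0


def lesion_aware_loss(
    pred: Image,
    target: Image,
    lesion_mask: Mask,
    lambda_image: float = 1.0,   # weight of the whole-image term (assumed name)
    lambda_lesion: float = 1.0,  # weight of the lesion-only term (assumed name)
) -> float:
    """Weighted sum of a global pixel-space error and a lesion-restricted error."""
    image_term = mean_abs_error(pred, target)
    lesion_term = mean_abs_error(pred, target, lesion_mask)
    return lambda_image * image_term + lambda_lesion * lesion_term
```

For a 2x2 image whose single lesion pixel is off by 0.5, the global term is 0.125 while the lesion term is 0.5, which shows why a small lesion is easily underweighted by a global loss alone.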

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2317_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/snuh-rad-aicon/Diffusion-LAPT

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LeeJun_LesionAware_MICCAI2025,
        author = { Lee, Junhyeok and Kim, Hyunwoong and Chung, Hyungjin and Eom, Heeseong and Jang, Joon and Sohn, Chul-Ho and Choi, Kyu Sung},
        title = { { Lesion-Aware Post-Training of Latent Diffusion Models for Synthesizing Diffusion MRI from CT Perfusion } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {280--290}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    Following are the strengths of the paper:

    1. Synthesizing DWI from CT perfusion in stroke imaging is a high-impact application where time and accuracy are critical.
    2. Introducing lesion-aware pixel space objectives for post-training addresses a key limitation in LDMs.
    3. Comparisons against a wide range of baselines including GANs (Pix2Pix, CycleGAN) and recent diffusion models (cLDM, BBDM).
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The method is compared against baseline models, including recent ones.
    2. The loss-weighting problem is addressed.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Radiological validation is missing.
    2. While it’s noted that ADC-based expert annotation was used, further clarity on inter-rater agreement or lesion volume statistics would be helpful.
    3. Architecture novelty is limited.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The architecture novelty is minimal and radiologist validation is missing.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel post-training framework for LDMs in medical image-to-image translation by incorporating lesion-aware medical pixel space objectives to enhance overall image quality and improve the precision of lesion delineation.

    • Post-training framework for LDMs
    • Generating DWI-equivalent MRI images from CTP
    • Task-specific objective for ischemic lesion areas to enhance the accuracy of lesion delineation.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • For medical imaging applications where precise reconstruction of small, clinically essential regions such as ischemic lesions is required, the proposed formulation, in particular the lesion-aware image-space objectives, will be useful.
    • Usage of deep generative models to create synthetic DWI/ADC from CTP provides a clinically meaningful solution.
    • Usage of a binary mask-based region-specific loss that selectively penalizes reconstruction errors in areas with ischemic lesions.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The reason their post-training approach performs better than existing models has not been thoroughly examined. For instance, is the gain mainly due to greater lesion fidelity? Is it easier to identify certain infarct locations than others? What is the contribution of each loss term?
    • Metrics such as the Dice score for lesion segmentation and lesion volume agreement have not been evaluated to show the clinical importance of the generated images.
    • The model integrates VQGAN, latent diffusion, and a time-conditional U-Net, resulting in a complex and computationally intensive architecture. With its dual-space design, operating in both latent and image space, how is the added complexity and resource demand justified in comparison to simpler alternative models?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work can be considered for acceptance based on the relevance of the addressed area and the approach discussed. However, there are unanswered questions about the suggested architecture’s efficiency given its complexity.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a lesion-aware post-training framework for Latent Diffusion Models (LDMs), addressing the issue of critical detail loss in medical image generation caused by compressed latent spaces through the introduction of medical pixel-space objectives. In the task of converting CT images to diffusion MRI (e.g., DWI/ADC), the method combines pixel-level loss with lesion-specific optimization, improving overall image quality and the delineation accuracy of acute ischemic stroke lesions. Experiments on multimodal data from 817 patients demonstrate the framework’s strong adaptability, making it widely applicable to medical image generation tasks. By synthesizing high-quality MRI, it alleviates the challenges of high costs and limited accessibility of clinical MRI, providing technical support for early stroke diagnosis and treatment.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper incorporates a medical image spatial loss to generate more realistic medical images.
    2. This paper introduces a task-specific objective for ischemic lesion regions. These innovative contributions ensure that the generated images retain lesion-related information without loss.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The text mentions that “efficient one-step inference” is more effective but does not explain the specific reasons or provide relevant literature/experimental support. It is recommended to supplement explanations or citations.
    2. How is $M_{lesion}$ in Equation 7 obtained? How is its accuracy ensured? Does the acquisition process increase task difficulty? Additional clarification is recommended.
    3. Figure 2 does not clearly illustrate the training and testing workflow of the method. A step-by-step explanation is suggested for better reader comprehension.
    4. The data comparison in Table 2 is not intuitive. It is recommended to use line charts or other visualization methods to more clearly reflect the impact of parameter changes on performance.
    5. Although the visualization results show that the CT-generated DWI data are close to the gold standard, further clinical validation is still needed. The authors are advised to explore this in future work.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a novel method that integrates pixel-level loss functions with lesion-specific optimization, thereby significantly enhancing the conversion performance from CT images to diffusion MRI. The research content generally meets the academic standards of the MICCAI conference, but the following aspects of the methodological details require further refinement: 1) The acquisition process of lesion masks is not clearly explained; 2) The key steps in the algorithm flowchart lack sufficient elaboration. It is recommended that the authors provide focused clarification and supplementary justification for these technical details during the rebuttal.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We are deeply grateful to all reviewers for their thorough evaluation and constructive comments on our work. We particularly appreciate the positive feedback highlighting the novelty and efficacy of our proposed framework for medical image-to-image translation using latent diffusion models (LDMs). Below, we address the primary concerns and questions raised by the reviewers.

Major Weaknesses

  1. Radiological / Clinical Validation (R1W1, R2W2, R3W5): We fully agree that rigorous radiological and clinical validation is essential. In our future work, beyond standard evaluation metrics, we will report segmentation performance metrics (Dice, IoU) and also compute ASPECTS scores to assess stroke-specific clinical relevance. These additions will quantitatively demonstrate both the anatomical fidelity of our generated images and their utility for stroke assessment.

  2. Details of Lesion Statistics and Acquisition (R1W2, R3W2): A brief summary of the lesion mask acquisition process is provided in Section 3.1 (Experimental Setup - Datasets). Comprehensive details regarding the acquisition process and lesion volume statistics will be added to the final manuscript.

  3. Architectural Novelty and Complexity (R1W3, R2W3): We would like to clarify that our primary novelty is a framework for post-training latent diffusion models (LDMs) with task-specific objectives in the medical imaging domain. We build on standard LDM architectures and established backbones in state-of-the-art imaging models. By operating in a compressed latent space via a frozen VQ-GAN encoder, LDM training and post-training demand less computation than pixel-space diffusion models. Although adding image-space objectives raises per-epoch cost, the sharp reduction in epochs yields lower overall training cost than training from scratch. Importantly, inference complexity matches that of a conventionally trained LDM.

  4. Questions regarding Quantitative Results (R2W1, R3W4): Recent work shows that post-training latent diffusion models using image-space losses generate noticeably sharper and more realistic medical images. However, a global loss alone often overlooks small lesions, so we added a lesion-aware objective that concentrates on the ischemic areas. In practice, the image-space loss improves overall image quality, while the lesion-focused loss ensures that lesions are accurately reconstructed. Table 2 compares models with λ_image>0, λ_lesion=0 against those with λ_image>0, λ_lesion>0, revealing each term’s relative contribution. In addition to the tabulated results, to make the impact of λ_image and λ_lesion more intuitive (R3W4), we will include in Figure X a line chart plotting the evaluation metrics against these weight settings, thereby visually summarizing how each objective term drives performance.

  5. Other Comments:
     • One-step inference (R3W1): We will include further explanations or relevant citations regarding one-step inference in the final manuscript.
     • Questions regarding Figure 2 (R3W3): As indicated in its title and caption, Figure 2 is intended solely to illustrate the post-training process of our framework. The overall training and testing workflows follow the standard procedures for training LDMs.
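
For reference, the segmentation overlap metrics named in point 1 (Dice, IoU) follow standard definitions, sketched below for binary lesion masks. How lesion masks would be obtained from the synthesized DWI/ADC images is not specified in this discussion, so the sketch simply assumes two masks of equal shape are already available. The function name and the convention for two empty masks are assumptions.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]


def dice_and_iou(pred_mask: Mask, ref_mask: Mask) -> Tuple[float, float]:
    """Return (Dice, IoU) for two binary masks of the same shape."""
    inter = pred_sum = ref_sum = 0
    for p_row, r_row in zip(pred_mask, ref_mask):
        for p, r in zip(p_row, r_row):
            p, r = bool(p), bool(r)
            inter += p and r   # pixel marked as lesion in both masks
            pred_sum += p      # lesion pixels in the predicted mask
            ref_sum += r       # lesion pixels in the reference mask
    union = pred_sum + ref_sum - inter
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + ref_sum)
    iou = inter / union
    return dice, iou
```

Both metrics range from 0 (no overlap) to 1 (identical masks). Dice is never lower than IoU for the same pair of masks, so the two should not be compared directly across papers.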

We sincerely thank all reviewers once more for their valuable comments and suggestions. We hope these responses satisfactorily address the questions raised and will revise the final manuscript accordingly.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


