Abstract
Automatic prediction of dose distribution maps plays an important role in clinical radiotherapy treatment. Recently, deep learning-based approaches have been explored to automatically predict the dose map from structure images and have obtained promising results. However, these methods mainly focus on extracting anatomical features from CT and organ masks, ignoring the abundant visual knowledge inherent in the dose map domain. To address this limitation, we propose a visual prompt-guided dose prediction model, named ViPDose, to effectively predict radiotherapy dose distribution for cancer patients. Specifically, ViPDose consists of two key stages: 1) a prompt pre-training stage and 2) a prompt generation stage. In the pre-training stage, we train a prompt encoder to encode dose maps alongside structure images into compact prompt vectors. Then, in the prompt generation stage, we design a fast prompt generator implemented with a diffusion adversarial network (DAN) to efficiently produce prompt vectors that closely approximate those generated by the prompt encoder, thus enriching the model with abundant visual prompt information. By adopting DAN in this highly compressed latent space, our method maintains high-quality predictions at relatively low computational cost. Comprehensive experiments on a clinical rectal cancer dataset with 130 cases demonstrate the superior performance of our method over other state-of-the-art methods.
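To make the two-stage data flow described in the abstract concrete, here is a minimal Python sketch under stated assumptions. The names `encoder`, `predictor`, and `generator` are hypothetical stand-ins for the prompt encoder, the dose prediction network, and the prompt generator; images are flattened to plain lists and training loops are omitted. It is not the authors' implementation (no code was released), and stage 2 uses plain MSE only as a placeholder objective, whereas the paper trains the generator with a diffusion adversarial network.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical component signatures (assumptions, not the paper's code):
Encoder = Callable[[Sequence[float], Sequence[float]], Vector]  # (structure, dose) -> prompt
Predictor = Callable[[Sequence[float], Vector], Vector]  # (structure, prompt) -> dose
Generator = Callable[[Sequence[float]], Vector]  # structure -> prompt


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def stage1_loss(encoder: Encoder, predictor: Predictor,
                structure: Sequence[float], dose: Sequence[float]) -> float:
    """Prompt pre-training: the encoder sees the ground-truth dose map, and
    encoder and predictor are optimized jointly to reconstruct that dose map."""
    prompt = encoder(structure, dose)
    return mse(predictor(structure, prompt), dose)


def stage2_loss(encoder: Encoder, generator: Generator,
                structure: Sequence[float], dose: Sequence[float]) -> float:
    """Prompt generation: the generator sees only structure images and is pushed
    toward the (frozen) encoder's prompt. MSE is a placeholder here; the paper
    uses a diffusion adversarial network rather than direct regression."""
    target = encoder(structure, dose)
    return mse(generator(structure), target)


def infer(generator: Generator, predictor: Predictor,
          structure: Sequence[float]) -> Vector:
    """Inference: no dose map exists yet, so the generated prompt
    replaces the encoder's prompt."""
    return predictor(structure, generator(structure))
```

The point the sketch encodes is that only `stage1_loss` and `stage2_loss` touch the ground-truth dose map; `infer` needs structure images alone, which is why a generator must stand in for the encoder at test time.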
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2353_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{FenZhe_Leveraging_MICCAI2025,
author = { Feng, Zhenghao and Wen, Lu and Cui, Jiaqi and Wu, Xi and Xiao, Jianghong and Peng, Xingchen and Shen, Dinggang and Wang, Yan},
title = { { Leveraging Visual Prompt with Diffusion Adversarial Network for Radiotherapy Dose Prediction } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15974},
month = {September},
pages = {305--315}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper presents a visual prompt-guided dose prediction model for rectal cancer radiotherapy planning, with three main contributions as claimed by the authors. First, the model leverages prompt-based learning to incorporate dose distribution knowledge, improving prediction accuracy. Second, it introduces a computationally efficient prompt generator, trained adversarially, that models prompt vectors in latent space, maintaining prediction quality while reducing computational overhead. Third, experiments on an in-house clinical dataset of 130 rectal cancer patients demonstrate that the proposed method outperforms existing state-of-the-art approaches. By integrating prompt-guided learning and adversarial training, this work advances dose prediction in radiotherapy, offering both precision and efficiency. However, the article does not provide enough evidence for some of these claims.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Good Literature Review: Relevant studies have been properly reviewed, although some relevant studies, such as https://doi.org/10.1016/j.artmed.2024.102961 and doi:10.1088/1361-6560/ad209a, have been overlooked.
- Logical Structure and Coherence: The sections of the paper are well organized and easy to follow. The problem formulation is of great help in understanding the content.
- Clear and Convincing Argumentation: The argument flows logically from hypothesis to conclusion, and the challenges of previous studies are well addressed.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Limited Novelty: The proposed method is based mostly on well-known methods, although the overall model looks interesting.
- Insufficient or Weak Evidence: It is unclear whether the Modulation Convolution Blocks (MCBs) were developed by the authors. If so, the paper should provide sufficient evidence and appropriate experiments to justify this architectural choice. If not, the source should be cited and the reason for the choice explained. In addition, the combined loss function has two terms weighted by W1 and W2; ablation studies should be reported to justify the values of these weights, but the authors state only that they were chosen empirically. Finally, there is insufficient evidence for the claimed low computational cost: although the authors list the number of model parameters in Table 1, the table also includes previous models with fewer parameters.
- Lack of Reproducibility: Many details of the model implementation are not mentioned in the text, and since the authors do not explicitly state that the code will be released, it is impossible to reconstruct the model. The study also uses an in-house dataset that is likely not publicly available, which makes the results difficult to reproduce.
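The point above about W1 and W2 amounts to a small hyperparameter search. A minimal sketch, assuming a hypothetical `validate(w1, w2)` callable that would train the model with those loss weights and return a validation error (lower is better); the candidate grids are illustrative, not values from the paper.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def select_loss_weights(validate: Callable[[float, float], float],
                        w1_grid: Iterable[float],
                        w2_grid: Iterable[float]) -> Tuple[float, float, float]:
    """Exhaustive grid search over (w1, w2).

    `validate` is a hypothetical callable: train with the given weights and
    return a validation error. Returns (best_w1, best_w2, best_error).
    """
    best = None
    for w1, w2 in product(w1_grid, w2_grid):
        error = validate(w1, w2)
        if best is None or error < best[2]:
            best = (w1, w2, error)
    if best is None:
        raise ValueError("weight grids must be non-empty")
    return best
```

For example, with a toy stand-in `validate = lambda w1, w2: (w1 - 0.5) ** 2 + (w2 - 1.0) ** 2` and grids `[0.1, 0.5, 1.0]` and `[0.5, 1.0, 2.0]`, the function returns `(0.5, 1.0, 0.0)`. Reporting the error for every grid point, not just the winner, is what would turn "chosen empirically" into an ablation.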
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Using publicly available datasets can further enhance the value of the work. Therefore, I recommend that the model be tested on such datasets in the future.
- Using statistical tests to examine the significance of the results can increase the validity of the outcome.
- The authors should also explain the future direction of the study in one sentence.
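For the suggestion on statistical tests, a common option when two methods are evaluated on the same test patients is a paired test. The sketch below is a two-sided sign-flip permutation test on per-case differences in plain Python; the Wilcoxon signed-rank test (`scipy.stats.wilcoxon`) or a paired t-test (`scipy.stats.ttest_rel`) are standard alternatives. The inputs `a` and `b` are assumed to be per-patient scores (for example, a dose error) for the same patients in the same order.

```python
import random
from typing import Sequence


def paired_sign_flip_test(a: Sequence[float], b: Sequence[float],
                          n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided paired permutation test of H0: mean(a - b) == 0.

    Under H0 each per-case difference is equally likely to have either sign,
    so randomly flipping signs yields the null distribution of the mean.
    Returns a Monte Carlo p-value.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        resampled = abs(sum(d if rng.random() < 0.5 else -d for d in diffs)) / n
        if resampled >= observed - 1e-12:  # small tolerance for float round-off
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)
```

A sign-flip test makes no normality assumption about the differences, which can matter with a modest number of test cases.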
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper's readability, its use of advanced available methods, and the importance of automating radiotherapy treatment planning are the main reasons for my decision.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes ViPDose, a novel dose prediction model that integrates visual prompt learning with a diffusion adversarial network (DAN) to improve radiotherapy dose distribution prediction. Unlike previous methods that rely solely on anatomical inputs such as CT scans and organ masks, ViPDose extracts visual prompts from ground-truth dose maps during training and generates them during inference using a diffusion GAN. This approach improves both prediction accuracy and computational efficiency, particularly in preserving high-frequency details of the dose distribution.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Utilizing Visual Prompts for Dose Prediction: The approach of using prompt vectors, derived from compressed dose maps, as guidance for dose prediction is highly novel in the field of radiation therapy. Moreover, compared with conventional diffusion-based methods such as DiffDP, this method offers improved computational efficiency during inference.
- The proposed approach outperforms existing state-of-the-art methods across all evaluation metrics. Additionally, visualizations such as DVH curves and error maps effectively illustrate its advantages.
- A thorough ablation study has been conducted, validating the usefulness of each component in the proposed method.
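Since DVH curves are cited above among the visualizations, here is a minimal sketch of how a cumulative DVH and a D_x value (the highest dose received by at least x% of a structure's volume) can be computed from a dose map and a binary structure mask, both flattened to lists. It assumes equal voxel volumes and uses no interpolation; clinical tools and the paper's own evaluation code may bin and interpolate differently.

```python
import math
from typing import List, Sequence, Tuple


def _structure_doses(dose: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Doses of voxels inside the structure (mask value truthy)."""
    voxels = [d for d, m in zip(dose, mask) if m]
    if not voxels:
        raise ValueError("structure mask selects no voxels")
    return voxels


def cumulative_dvh(dose: Sequence[float], mask: Sequence[int],
                   thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """For each threshold t, the fraction of structure volume receiving >= t."""
    voxels = _structure_doses(dose, mask)
    n = len(voxels)
    return [(t, sum(1 for v in voxels if v >= t) / n) for t in thresholds]


def dose_at_volume(dose: Sequence[float], mask: Sequence[int],
                   percent: float) -> float:
    """D_x: highest dose d such that at least `percent`% of voxels get >= d."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    voxels = sorted(_structure_doses(dose, mask), reverse=True)
    k = max(1, math.ceil(percent * len(voxels) / 100))  # voxels needed
    return voxels[k - 1]
```

For example, with `dose = [10, 20, 30, 40, 50]` and `mask = [1, 1, 1, 1, 0]`, `dose_at_volume(dose, mask, 50)` is `30`, because two of the four structure voxels (30 and 40) receive at least that dose.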
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The explanation of the knowledge derived from the visual prompt is insufficient. While the authors state that dose maps are utilized as visual knowledge (prompts), the current explanation does not clearly convey why this approach is effective.
- There is no description of the learned or generated visual prompts (VP). It is important to understand what kind of visual prompts are obtained as a result of training, and what types of VPs are generated. Therefore, the authors should visualize the feature distribution of the visual prompts and evaluate the results accordingly.
- The figures suffer from poor visibility. Specifically, the text and dose maps in Figs. 2 and 3 are too small to be easily readable.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
I would encourage the authors to consider making the dataset publicly available, if possible. Doing so would be highly beneficial to the research community and would further enhance the value and impact of this work.
The small text in the figures impairs visual clarity. Improving this would contribute to a higher-quality presentation of the paper and is therefore strongly recommended.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper proposes ViPDose, a novel dose prediction model that integrates visual prompt learning with a diffusion adversarial network (DAN) to improve radiotherapy dose distribution prediction. ViPDose demonstrates both novelty and strong performance in dose prediction. However, there are several concerns (please see Comment 7) that need to be addressed.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The paper presents a novel visual prompt-guided dose prediction model by designing a fast prompt generator with adversarial training.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Fast prompting
- Achieves state-of-the-art performance compared with other recent methods.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The ablation study does not include experiments on the optimal number of sampling iterations. ViPDose uses 4 iterations; why was 4 chosen?
- In Table 1, some essential metrics do not show statistical significance. Could the authors explain this?
- Given that the number of iterations is so small, why is a diffusion process necessary in this paper?
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
It is a good paper that achieves state-of-the-art performance compared with other methods.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #4
- Please describe the contribution of the paper
a. Proposes ViPDose, a novel dose prediction framework that incorporates visual prompts extracted from ground-truth dose maps to guide prediction. b. Introduces a two-stage training pipeline: (1) a prompt pre-training stage, where a prompt encoder compresses dose maps and structural images into compact latent vectors; (2) a prompt generation stage, where a modified diffusion adversarial network (DAN) learns to generate similar prompt vectors using only anatomical inputs. c. Utilizes diffusion models in latent space combined with adversarial training, allowing for efficient generation of prompt vectors with reduced computational cost. d. Demonstrates improved dose prediction accuracy on a clinical rectal cancer dataset, showing superiority over existing dose prediction methods.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
a. Introduces an original application of visual prompts in radiotherapy dose prediction, a novel angle within the domain. b. Leverages diffusion-based generative modeling for compact prompt vector generation, aiming to balance quality and efficiency. c. Achieves strong experimental performance across various metrics on a 130-case clinical dataset, outperforming SOTA baselines. d. Ablation studies are thorough, highlighting the contribution of each component of the proposed framework.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
a. Limited novelty in components. While the integration of prompt learning and diffusion is novel in this context, most technical components (DDGAN, etc.) are adaptations rather than novel innovations. b. Evaluation dataset is limited. A single in-house dataset of 130 patients from one institution may not sufficiently validate generalizability of the algorithm or clinical applicability. c. Weak clinical interpretability. The use of latent visual prompts and compact representations makes it hard to interpret or validate the predictions clinically.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
a. Including results on a publicly available dataset or providing open-source code would strengthen impact and transparency. b. More insight into the actual benefit of prompt-based guidance over direct learning would help clarify the necessity of the proposed method.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The technical design is effective and novel, and it is validated with thorough ablation experiments. However, the proposed method may be overly complicated and lacks validation beyond a single internal dataset. A more broadly validated or simplified version with interpretability considerations could significantly increase the impact.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We sincerely thank all reviewers for their constructive and insightful comments. In the final version, we have addressed the raised concerns, including adding missed citations, improving figure clarity, and providing more in-depth analysis of experimental results. Below are our detailed responses to several key questions raised in the reviews.

1. The innovation of the visual prompt. Most existing methods attempt to learn a direct mapping from the domain of patients’ structure images to the domain of dose maps, without any prior knowledge about the underlying data distribution of dose maps. Instead of using specific textual prompts, our “visual prompt” is a compact encoding of the dose distribution conditioned on the patient’s structure images. It serves as a high-level prior “hint” sampled from the data distribution of dose maps, which guides the prediction network toward what the ideal dose map should look like. These prompts are extracted by the prompt encoder, which is trained during the prompt pre-training stage. In this stage, the encoder and the prediction network are jointly optimized, enabling the model to automatically learn and transfer visual dose knowledge for downstream prediction.

2. The superiority and iteration setting of DAN. In the prompt generation stage, our goal is to generate a high-quality visual prompt conditioned solely on the patient’s structure images, which is essentially a data generation task. Inspired by DDGAN, which effectively addresses the generative learning trilemma (i.e., high sample quality, fast sampling, and mode coverage), we designed our Diffusion Adversarial Network (DAN) as the prompt generator. We adopt 4 sampling iterations in DAN, consistent with DDGAN’s settings, which strikes a balance between efficiency and performance.
As shown in Table 3 (second and third rows), DAN outperforms direct regression-based prompt generation, demonstrating the superiority of the diffusion-based approach in generating reliable prompt vectors.
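To illustrate the iteration setting discussed in the feedback, here is a minimal sketch of DDGAN-style sampling with T = 4 steps: at each step a generator predicts the clean prompt x_0 from the noisy x_t and the structure condition, and x_{t-1} is then drawn from the Gaussian posterior q(x_{t-1} | x_t, x_0) of the forward diffusion. Assumptions: `generator` is a hypothetical callable, the beta schedule (0.1, 0.2, 0.3, 0.4) is a toy choice rather than DDGAN's or the paper's, DDGAN's extra latent noise input to the generator is omitted, and the adversarial training of generator and discriminator is not shown.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical generator signature: (x_t, condition, t) -> predicted clean prompt x_0
PromptGenerator = Callable[[Vector, Sequence[float], int], Vector]


def make_schedule(betas: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return alphas and cumulative products alpha_bar_t for a beta schedule."""
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return alphas, alpha_bars


def sample_prompt(generator: PromptGenerator, condition: Sequence[float], dim: int,
                  betas: Sequence[float] = (0.1, 0.2, 0.3, 0.4),  # toy schedule, T = 4
                  seed: int = 0) -> Vector:
    """Few-step sampling: predict x_0 with the generator, then draw x_{t-1}
    from the forward-process posterior q(x_{t-1} | x_t, x_0)."""
    rng = random.Random(seed)
    alphas, alpha_bars = make_schedule(betas)
    x_t = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in range(len(betas), 0, -1):  # t = T, ..., 1
        i = t - 1
        x0_pred = generator(x_t, condition, t)
        ab_t = alpha_bars[i]
        ab_prev = alpha_bars[i - 1] if t > 1 else 1.0  # alpha_bar_0 = 1
        # Posterior mean coefficients and standard deviation (DDPM closed form).
        c0 = betas[i] * math.sqrt(ab_prev) / (1.0 - ab_t)
        ct = (1.0 - ab_prev) * math.sqrt(alphas[i]) / (1.0 - ab_t)
        std = math.sqrt(betas[i] * (1.0 - ab_prev) / (1.0 - ab_t))
        x_t = [c0 * x0 + ct * xt + std * rng.gauss(0.0, 1.0)
               for x0, xt in zip(x0_pred, x_t)]
    return x_t  # at t = 1 the posterior collapses onto x0_pred
```

Each step costs one generator pass, so sampling cost grows linearly with the number of steps; this is the efficiency argument for 4 steps instead of the hundreds or thousands used by standard diffusion samplers. DDGAN's reason for pairing so few steps with a GAN is that with large denoising steps the true reverse distribution is poorly approximated by a Gaussian, so it is modeled with a conditional GAN instead.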
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A