Abstract

Evidence is accumulating in favour of using stereotactic ablative body radiotherapy (SABR) to treat multiple cancer lesions in the lung. Multi-lesion lung SABR plans are complex and require significant resources to create. In this work, we propose a novel two-stage latent transformer framework (LDFormer) for dose prediction of lung SABR plans with varying numbers of lesions. In the first stage, patient anatomical information and the dose distribution are encoded into a latent space. In the second stage, a transformer learns to predict the dose latent from the anatomical latents. Causal attention is modified to adapt to different numbers of lesions. LDFormer outperforms a state-of-the-art generative adversarial network on dose conformality in and around lesions, and the performance gap widens when considering overlapping lesions. LDFormer generates predictions of 3-D dose distributions in under 30 s on consumer hardware, and has the potential to assist physicians with clinical decision making, reduce resource costs, and accelerate treatment planning.
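
The abstract does not say how causal attention is modified, so the following is only an illustrative sketch of one common way to let a causally masked transformer accept sequences built from different numbers of lesions: unused token slots are padded and hidden from every query. The function name `padded_causal_mask`, the left-padded layout, and the token counts are assumptions made for illustration and are not taken from the LDFormer code.

```python
from typing import List


def padded_causal_mask(is_pad: List[bool]) -> List[List[bool]]:
    """Build an attention mask where mask[i][j] is True if token i may attend to token j.

    A token sees itself and every earlier token that is not padding.
    The diagonal is always kept so that each row has at least one valid key.
    """
    n = len(is_pad)
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            visible = j <= i and not is_pad[j]  # causal rule, padding keys hidden
            row.append(visible or i == j)       # keep the diagonal for padded rows
        mask.append(row)
    return mask


# Hypothetical layout: room for 3 lesions (2 tokens each), only 2 lesions present,
# so the first 2 slots are padding; 3 dose tokens follow.
is_pad = [True, True] + [False] * 7
mask = padded_causal_mask(is_pad)
```

With this layout the last dose token attends to the seven real tokens and never to the two padded slots, regardless of how many lesions the plan contains.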

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0380_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0380_supp.pdf

Link to the Code Repository

https://github.com/edwardwang1/LDFormer

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wan_Latent_MICCAI2024,
        author = {Wang, Edward and Au, Ryan and Lang, Pencilla and Mattonen, Sarah A.},
        title = {{Latent Spaces Enable Transformer-Based Dose Prediction in Complex Radiotherapy Plans}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors presented a novel two-stage latent transformer framework (LDFormer) designed for predicting doses in lung SABR plans, even when dealing with varying numbers of lesions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the paper is well written and divided into proper sections. The authors also present a nice approach to predicting doses in lung SABR plans.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The comparison is conducted only with the authors’ previous work. Further, very few significant results are presented in this paper. It is suggested to perform further experiments and compare the results with at least 2 to 3 state-of-the-art approaches.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall, the paper is well-written. However, there are a few grammatical mistakes present. It is suggested to enhance the quality of the paper by addressing these errors. Furthermore, conducting additional experiments and comparing the results with 2 to 3 state-of-the-art approaches would strengthen the findings.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper presents a new approach to predicting doses in complex radiotherapy plans. Yet, very few significant results are presented in this paper. Also, the comparison is conducted with their previous work. It is suggested to perform further experiments and compare the results with 2 to 3 state-of-the-art approaches.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed the concerns I raised earlier.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a transformer-based methodology for dose distribution generation from input images and contours of the target volume(s) (tumor) and organs at risk (OARs). The proposed method encodes both inputs (CTs, OARs, PTV/IGTV) and targets (dose distributions) into latent vectors via a VQVAE, which the paper suggests enables the model to accept non-uniform inputs (a variable number of tumors).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-organized and clearly written. There is logical reasoning for the proposed methodology, and the approach makes use of well-established techniques such as the use of a VQVAE to encode inputs and targets into latent vectors. The evaluation metrics are clinically inspired and allow a very thorough evaluation of model outputs.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My first concern with this paper is what I believe to be limited clinical utility. Modern radiotherapy planning is already a partially automated process but, significantly, it is highly constrained by the radiation delivery devices available to a given institution. For this reason, it is impossible to know whether the dose distributions generated by the proposed method are even feasible for a given institutional device (as the authors correctly state). Furthermore, it is unclear how the results of this paper would be incorporated into clinical workflows. The paper claims that a “real-time dose prediction tool would allow ROs to quickly compare potential treatments both visually and quantitatively to select the best one,” but I don’t believe this to be substantiated. How would one take a generated dose and recreate it in current radiotherapy treatment planning pipelines? And how does this method propose to allow the creation of “multiple potential treatments” that can be compared? The paper would be strengthened dramatically by the results of the proposed “prospective validation series to quantify the resource savings of introducing LDFormer,” as it is currently unclear to me why one should expect any resource saving from synthetic dose generation.

    Aside from the clinical relevance of the task, I have concerns about the lack of comparative experiments performed in this paper. Only one baseline is used, despite other papers (some even cited by the paper) (1,2,3) studying the dose generation task. The one baseline is also redacted because it is the authors’ own work, so it is difficult to ascertain the value of that comparison. Furthermore, there are no ablation experiments demonstrating the value of the different modules of the proposed method. For instance, each PTV is currently encoded into its own latent sequence. Why can’t all PTVs be summed into a single mask and used as input in both the latent and non-latent feature spaces, to serve as an ablation experiment? This would justify the need for the multi-target solution proposed by this paper. Additionally, conformality metrics could be provided for the ground truth doses as a further point of comparison.

    1. Kearney et al. DoseGAN: a generative adversarial network for synthetic dose prediction using attention-gated discrimination and generation

    2. Zhan et al. Multi-constraint generative adversarial network for dose prediction in radiotherapy

    3. Liu et al. A cascade 3D U‐Net for dose prediction in radiotherapy

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It appears that the dataset will not be made publicly available, making it difficult to reproduce these results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I think that most significantly this work needs to be better justified for clinical relevance. The paper suggests a compelling experiment that can be performed to provide this justification: namely a determination of whether resources are saved by use of LDFormer in the clinic. Since a full study is outside the realm of possibility for the MICCAI review period, perhaps performing a small proof of concept study on a few cases could be feasible. Even easier would be a more concrete explanation of how LDFormer could be incorporated into the clinic and why one should expect resources to be saved. Again, I am concerned that there is no obvious way to use the outputs of this model given that predicted doses have no guarantee of real-world reproducibility, and it is unclear how then a generated dose would make generating an actual feasible dose faster or easier.

    Secondly, I strongly believe that additional comparisons and baselines are required to substantiate the actual value of the proposed method. Dose conformality metrics for the ground truth doses should be a minimum requirement for the rebuttal, as this would give readers unfamiliar with dosimetric values some insight into how closely generated doses mirror real doses in those metrics. However, I believe that ablation experiments should also be reported to substantiate claims such as the need for a multi-lesion system created by embedding each PTV into a separate latent feature vector. Previous methods have combined all targets into a single mask, used this as input to GAN and CNN models, and seemed to adequately account for multiple targets.

    Additionally, there are several previously published works in this domain (see weaknesses comment), some of which have publicly available code and others which are described in great detail in their respective papers. A comparison with at least a handful of these methods would give greater weight to the results reported here as they currently stand on their own without adequate comparisons.

    Finally, there is no reason to believe that this method would be limited to working on this private SABR dataset. While the paper claims the method is particularly well suited to this scenario of multiple lesions, the paper seems to provide no reason for why this method would not be successful in other dose prediction tasks. If the authors were to provide results on a second public dataset such as (4), then this would also strengthen the claims of the paper while also enabling reproducibility of the work as it does not currently seem that the private SABR dataset will be made public.

    4. Babier et al. OpenKBP: The open-access knowledge-based planning grand challenge and dataset

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper is well written and the methodology is clearly described, I have large concerns surrounding the clinical utility of the task and questions regarding comparative experiments. I think that each of these two concerns must be addressed to some degree for me to consider recommending the paper for acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have made an effort to address each of my critiques. Specifically, I have a better appreciation for the specific way in which LDFormer will be clinically valuable, and hope that the authors include some of this information in the updated paper. I also believe their clarification of why the ML-SABR task is novel enough to prohibit a fair comparison with previous models is valuable. I still think there are some weaknesses in this paper, particularly the lack of comparisons and ablations and the fact that the clinical utility will be most demonstrable in the follow-up work, but I believe that it is worthy of a weak accept rather than my previous weak reject grade.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a transformer-based method for predicting radiation therapy doses for multiple lesions. The authors claim that, apart from one of their previous works, there have been no other studies focusing on radiation dose prediction for multiple lesions in the lungs, which sufficiently demonstrates the novelty of this work. Specifically, the paper proposes a two-stage dose prediction framework. In the first stage, patient anatomical information and the dose distribution are encoded into a latent space. In the second stage, a transformer uses the learned anatomical features for dose prediction. Due to the relatively limited research in this area, the authors only compare their method with their previous work. The results indicate that the proposed method can accomplish dose prediction within 30 seconds, and the paper also shows visualization results for dose prediction of multiple lesions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A novel application: dose prediction for stereotactic ablative radiotherapy, which can better assist doctors. Moreover, the Introduction mentions that manually designing a single plan takes doctors 7.5 hours, while the proposed method takes only 30 seconds, which can greatly improve the efficiency of doctors’ work. A particularly strong evaluation: Section 2.6 clearly describes the evaluation metrics for this study, which include multiple indicators, and Figure 2 visualizes the prediction results. Data utilization: a very detailed data processing procedure and scheme are provided, which can serve as a reference for research in this field.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Limited comparison with the latest methods: The authors only compare their work with their previous work, but in fact, the Introduction mentions some transformer-based or similar relevant works for dose prediction. Lack of clarity in the method description: The description in the Methods section is not clear enough; the connection between the two stages and the relationship between the tokens in the second stage and the first stage are not sufficiently explained.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper provides GitHub links in Section 2.5 (implementation details), and each part is described in detail, so the work has a high probability of being reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The description of the two-stage method in the Methods section is not sufficiently clear, particularly regarding the relationship between the first and second stages and why encoding is required initially. In Figure 1, it is unclear whether the green encoders are the same and whether they share parameters. Additionally, the meaning of the value 5865 in the IDE part of Figure 1B is unclear. Why are there two instances of 3 and two instances of 5 in the row 136925865749863 after merging tokens for OARs, IDE, and PTV in Figure 1B? Furthermore, it is unclear how this method is run at inference time. I would appreciate it if the authors could provide some relevant details on these points.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The entire work is innovative and interesting. Moreover, the Introduction highlights that designing a radiation dose plan manually would require 7.5 hours, whereas using the model proposed in this work takes only 30 seconds, greatly increasing the treatment efficiency. The Introduction section also discusses related work in the field. Furthermore, the paper provides clear and detailed descriptions of the model’s construction, implementation details, and evaluation metrics. However, the description of the overall scheme, especially the relationship between the two stages and how tokens from the first stage are utilized in the second stage, needs further refinement.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the reviewers’ insightful comments. Our data is available with a data-sharing agreement.

A. Clinical Utility (R3,R4): Treatment planning involves selecting prescriptions and inverse planning (IP) to calculate a dose distribution (DD) and delivery machine parameters. Prescription selection in multi-lesion SABR (ML-SABR) is challenging (Intro para. 2). If the prescription is too high, the DD does not meet constraints, requiring iterative prescription lowering and IP. If the prescription is too low, patients may receive insufficient radiation or even miss out on treatment. LDFormer allows ROs to bypass IP to compare prescriptions, and therefore IP only needs to be performed for one prescription, saving time. To illustrate, a patient at our centre was initially treated for bilateral lesions. He then developed 2 new lesions, which were treated with 35/5 each. One lesion recurred, and the RO decided against further retreatment. Retrospectively, we showed that a 3rd treatment with 35/5 to the recurrence was possible and created an acceptable plan in the Eclipse treatment planning system. We also showed that the two original progressions could have been treated to 55/5, possibly preventing the recurrence. To prospectively validate clinical utility, we are integrating LDFormer directly into Eclipse and will randomize patients to be planned with or without the model. We will collect planning times and determine whether escalated treatment was possible. We will better highlight clinical utility and provide more detail on the prospective study in the Discussion. The predicted DD can also be used as the optimization target in IP (Discussion para. 1). This will allow us to assess the deliverability of predictions.

B. Comparison to Existing Work (R1,R3,R4): To our knowledge, there are no other ML-SABR dose prediction models to compare against. Existing models were created for single-lesion (SL) plans [5,13,14,25] or same-fraction plans that do not account for the radiobiological impact of different fractionation schemes [12,28], with many created for OpenKBP. In OpenKBP [4], up to 3 PTVs were treated synchronously with 3 dose levels over 35 fractions. PTVs do not overlap because intersecting regions were relabeled to the highest dose during data curation [4]. In contrast, our ML-SABR dataset is more complex as it contains up to 5 PTVs treated with varying doses and fractions, overlap, and asynchronous treatment. A major limitation of applying existing models to ML-SABR is that using a single mask for PTVs [5,12,14,25,28] is not possible due to overlap (i.e. retreatment). We will expand on the discussion of existing work in the Introduction. While LDFormer can be used for existing datasets [4], it may not beat all SOTA models on simpler SL or same-fraction plans, as the error caused by lossy VQVAE encoding may outweigh the benefit of self-attention. Finally, to clarify, our previous work is an adaptation of DoseGAN [14] for ML-SABR.

C. Conformality Metrics (R3): The mean±SD of the HI, D1cm (Gy), and D2cm (Gy) of the test set ground truth doses are 1.58±0.35, 113±52, and 80±49 for all PTVs, and 1.67±0.32, 126±55, and 119±46 for overlapping PTVs. These will be added to Tab. 1.

D. Ablation Study (R3): The latent of a combined PTV mask is prohibitively large since the mask must be encoded in 3D to preserve relative PTV locations. During development, we experimented with removing the IDE to reduce sequence length. We found LDFormer could not correctly place hotspots without the IDE, and so did not pursue a formal ablation study.

E. Model Design (R4): Data encoding via VQVAE (first stage) is necessary because transformers operate on integer sequences. Both the dose and IDE are encoded/decoded with the same VQVAE. The two 5s in Fig. 1 are multiple copies of the same vector in the IDE latent. The first 3 is an OAR latent vector, and the second 3 is the PTV fraction. Position encoding allows for differentiation between token meanings.
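
To illustrate the point about integer sequences in the Model Design response, here is a minimal sketch of the nearest-codebook lookup that vector-quantised autoencoders generally use to turn continuous latent vectors into integer tokens. The codebook values, the 2-D latent size, and the function names `quantize` and `dequantize` are hypothetical and chosen for readability; they are not taken from the LDFormer implementation.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each latent vector by the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in latents
    ]


def dequantize(tokens: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map integer tokens back to codebook vectors (a lossy reconstruction)."""
    return [list(codebook[t]) for t in tokens]


# Hypothetical 4-entry codebook and four encoder outputs.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latents = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8], [0.9, 0.2]]

tokens = quantize(latents, codebook)    # integer sequence a transformer can consume
recon = dequantize(tokens, codebook)    # differs from `latents`: the encoding is lossy
```

In this toy example the second and fourth latents map to the same token, which mirrors how a repeated integer in a token row can simply be the same codebook vector appearing twice. The reconstruction differs from the original latents, which is the kind of quantisation error the rebuttal refers to as lossy encoding.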




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewer comments are adequately addressed.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The reviewer comments are adequately addressed.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have addressed the reviewers’ reservations and have improved the paper. With the scores of 4 4 5 (avg 4.5), I recommend acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors have addressed the reviewers’ reservations and have improved the paper. With the scores of 4 4 5 (avg 4.5), I recommend acceptance.


