Abstract

Accurate organ segmentation is crucial for prostate cancer radiotherapy, but cone-beam computed tomography (CBCT) based models are hindered by low image quality and annotation scarcity. Existing approaches rely on deformable registration, which struggles with soft-tissue deformations, or direct CBCT training, which suffers from domain shifts and low-quality labels. We propose a domain adaptation framework that enables robust prostate segmentation on CBCT using cross-modality supervision from planning CT (pCT). A cycle-consistent generative adversarial network translates pCT into synthetic CBCT, enabling segmentation models to train on high-quality pCT-derived annotations while adapting to CBCT characteristics. Additionally, anatomy-aware augmentation enhances robustness to organ deformations across diverse patient anatomies. Using a multi-center dataset, our approach achieves segmentation accuracy comparable to pCT-trained models. By eliminating the need for manual CBCT annotations, our method enables practical AI-driven segmentation for adaptive radiotherapy.
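The training strategy summarized above (degrade pCT into synCBCT, then train a segmenter on the synCBCT paired with the original pCT labels) can be illustrated with a toy sketch. The degradation function below is a crude stand-in for the paper's learned 3D CycleGAN generator, and all names, shapes, and artifact models are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade_to_syncbct(pct, rng):
    """Stand-in for the CycleGAN generator pCT -> synCBCT.

    The real method uses a learned 3D generator; here we only mimic the
    effect (low-frequency shading plus noise) so the data pairing is visible.
    """
    yy, xx = np.mgrid[0:pct.shape[0], 0:pct.shape[1]]
    shading = 0.1 * np.sin(yy / 8.0)                 # cupping-like artifact
    noise = rng.normal(0.0, 0.05, size=pct.shape)    # scatter-like noise
    return pct + shading + noise

def make_training_pairs(pcts, labels, rng):
    """Pair each degraded image with the original high-quality pCT label."""
    return [(degrade_to_syncbct(img, rng), lab) for img, lab in zip(pcts, labels)]

# Toy data: two "pCT" slices, each with a binary "prostate" mask.
pcts = [rng.random((32, 32)) for _ in range(2)]
labels = [(np.hypot(*np.mgrid[-16:16, -16:16]) < 6).astype(np.uint8)
          for _ in range(2)]

pairs = make_training_pairs(pcts, labels, rng)
```

The key property shown is that only the image side of each training pair is modified; the annotations remain the clean pCT-derived labels, which is what lets the downstream segmenter avoid manual CBCT contours.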

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2561_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/DKFZ-OpenMedPhys/3DcycleGAN
https://github.com/MIC-DKFZ/anatomy_informed_DA

Link to the Dataset(s)

Gold Atlas dataset: https://zenodo.org/records/583096
SPARK dataset: https://hdl.handle.net/2123/31090
In-house dataset: private

BibTex

@InProceedings{KovBal_CrossModality_MICCAI2025,
        author = { Kovacs, Balint and Stanic, Goran and Weykamp, Fabian and Ebert, Florian and Bounias, Dimitrios and Tawk, Bouchra and Niklas, Martin and Liermann, Jakob and Jäkel, Oliver and Maier-Hein, Klaus H. and Floca, Ralf and Giske, Kristina},
        title = { { Cross-Modality Supervised Prostate Segmentation on CBCT for Adaptive Radiotherapy } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {120 -- 130}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a novel domain adaptation framework for prostate segmentation on CBCT by training on synthetic CBCT images generated from planning CTs using a CycleGAN. The approach leverages high-quality pCT-derived annotations without requiring manual CBCT labels and incorporates anatomy-aware augmentation to improve robustness to organ deformations. The clinical use case is automated segmentation for adaptive radiotherapy.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents a novel method that reverses the typical direction of domain adaptation by degrading high-quality planning CT (pCT) images into synthetic CBCT using a CycleGAN, allowing segmentation models to be trained on realistic low-quality data with high-quality annotations. This approach circumvents the need for manual CBCT labels. The inclusion of anatomy-aware augmentation to simulate soft-tissue deformations appears to be an effective strategy to enhance robustness. I like Figure 1 as well - it nicely summarises the problem and proposed solution.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper’s main weaknesses relate to the limited scope of its validation and lack of benchmarking against existing methods. The clinical evaluation is based on only three in-house CBCT cases, annotated by a medical student. While the SPARK dataset is used, only one center is included in the independent test set. Another weakness is the omission of some structures (seminal vesicles and femoral heads) from the test set, despite being included in training, without any justification.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the methodology is novel and well-motivated, I have concerns around the limited validation and potential lack of generalizability.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Increased score post rebuttal.



Review #2

  • Please describe the contribution of the paper

    This paper presents a novel cross-modality supervised segmentation framework for prostate cancer radiotherapy using CycleGAN-based domain adaptation. It leverages planning CT (pCT) and its high-quality annotations to generate synthetic CBCT (synCBCT), which is used for training nnU-Net. The paper also integrates anatomy-informed augmentation to simulate soft-tissue deformation across diverse patient anatomies.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths are as follows: (a) use of a multi-center dataset, (b) the approach achieves segmentation accuracy comparable to pCT-trained models, and (c) by eliminating the need for manual CBCT annotations, the method enables practical AI-driven segmentation for adaptive radiotherapy. Overall, the proposed strategy is practical and clinically motivated. It intelligently uses synthetic CBCT to bridge the annotation gap and supports segmentation accuracy with anatomy-informed training.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The major weaknesses are as follows:
    (a) Lack of comparative baselines, limited validation cases, and under-discussed assumptions hold it back.
    (b) CycleGAN training is known to be unstable, especially in unpaired 3D settings like CT→CBCT.
    (c) There’s no mention of convergence monitoring, failure cases, or whether anatomical consistency is preserved (e.g., did prostates deform unrealistically in synCBCT?).
    (d) No anatomical consistency metrics (e.g., landmark overlap or shape similarity) are reported.
    (e) Performance on real CBCT may still suffer due to residual domain shift.
    (f) This assumption is fragile — CycleGANs are not pixel-wise aligned, and shape distortions can lead to label-to-image mismatch.
    (g) Only three real CBCTs with annotations (done by a medical student) are tested — but no baseline CBCT-trained model is used for fair comparison.
    (h) All improvements (e.g., 0.01–0.04 Dice gain) are small and within the reported standard deviations. There is no p-value or confidence-interval analysis, making it difficult to know whether the gains are meaningful.
    (i) The method offers no interpretability tools to inspect when and why segmentation errors occur, especially in complex CBCT scans with severe artifacts. No error heatmaps, confidence estimation, or misalignment visualizations are provided.
    (j) Domain adaptation with a 3D CycleGAN + nnU-Net + deformation is computationally expensive.
    (k) The paper does not quantify training time, GPU usage, inference speed, or deployability in time-critical radiotherapy workflows.
    Overall, these weaknesses can be addressed in the manuscript.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The abstract has major weaknesses, but they are addressable.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a novel method for segmenting structures on CBCT images for prostate cancer radiotherapy. It overcomes the limited availability of high-quality manual segmentations on CBCT images by utilizing planning CT (pCT) scans and their corresponding segmentations, which are much more readily available. They use a cycleGAN to convert pCT scans into synCBCTs, and then train a segmentation network (nnU-Net) using the synCBCTs and corresponding segmentations from the pCTs. The segmentation network is evaluated on both synCBCTs and 3 real CBCTs which have been manually segmented.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper uses an innovative approach to overcome the limited availability of manual CBCT segmentations by simulating CBCT images from pCTs. While this general idea is not entirely novel, as [1] seems to have done something similar, [1] used Monte Carlo (physics based) simulations whereas this paper uses cycleGANs (learning based). More importantly, the approach of simulating CBCTs is very far from fully explored, and certainly warrants more research exploring different methods, so the use of this approach is definitely a strength in my opinion, even if it’s not entirely novel. The proposed method is relatively straightforward and is clearly explained and easy to follow. It makes good use of established networks and methods for both the image synthesis and segmentation tasks. An existing anatomy-informed augmentation method was also incorporated for training the segmentation network. The appropriate use of suitable existing methods, and not making their method unnecessarily complicated, are major strengths of the paper in my opinion. Although the method was only evaluated on 3 patients with real CBCTs and corresponding manual segmentations, the authors deliberately selected challenging cases for the evaluation. They also presented and discussed the visual results for each individual patient, which I personally find more informative and useful for evaluating their method than tables of metrics.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    It seems to me that the key to the success of the proposed method is generating realistic synCBCTs, so that the segmentation network trained on them will also work well on real CBCTs. However, the paper does not discuss this issue, and does not evaluate or even show any of the synCBCTs. I understand that objective quantitative evaluation of the synCBCTs is very difficult (as you do not have GT CBCTs to compare them to), but some subjective qualitative evaluation of them would still be very informative and help guide future work, e.g. do the synCBCTs include realistic beam hardening artefacts as seen in the real CBCT scans? Related to the point above, I think the arguments against the main competing methods in the intro are a bit weak. I agree that Monte Carlo methods can be complex and computationally demanding, perhaps limiting how many simulations can be run, but they do not need to be device specific (or rather they can be used to simulate more than one specific device), and while they will inevitably contain modelling errors, the synCBCT will contain ‘synthesis errors’ which I suspect will have just as much if not more impact than the modelling errors. Furthermore, the cycleGAN approach will also be device specific if it is only trained with data from a single device (as I presume was the case in this paper), and the Monte Carlo approach allows more control over the simulation, which could be important for ensuring a diverse range of synCBCTs. Note, I am not saying that I think Monte Carlo (or other ‘physics based’) approaches are clearly better or should have been used instead of cycleGAN – and further research into both physics and learning based methods is required to determine which produces the best simulations for training the segmentation network – but likewise I do not think that the reasons you give in the intro mean your proposed method is clearly better than the method in [1].
And while I agree that trying to simulate MRIs from CBCTs is fundamentally limited (and in my opinion not a very good approach), I think this is less true for simulating pCTs, and there are numerous papers that claim to generate good synCTs from CBCTs, and a few papers that then proceed to use the synCTs to generate segmentations using models trained on real pCTs. To me, it is not obvious that the task of adding realistic artefacts to a pCT to generate a synCBCT is necessarily easier than the task of removing artefacts from a CBCT to generate a synCT. The only way to really establish which approach is better would be to perform a direct comparison – and your study seems to already have everything you need to do exactly that, as the other branch of your cycleGAN can produce the synCTs from the real CBCTs, and the segmentation model trained on the pCTs can then be used to segment the synCTs. I know this additional experiment and results cannot be included in your MICCAI paper now, but I strongly encourage you to include it in any follow-up papers, as I think such a direct comparison of the two approaches would be fascinating and provide potential insights for improving both approaches, and would hopefully (for you) provide hard evidence that your proposed method does indeed give better results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The paper made good use of open datasets to provide the pCT and segmentations, although the total number of patients in both the open and in-house datasets was not very large, and it would be good to try and use larger datasets in future work, as all of the data except the annotated CBCTs should be readily available clinically (although I know all too well that doesn’t mean it’s readily available for research!)

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Nice solution with promising results which could have real clinical impact.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I think the authors did a good job of rebutting the reviews. They also made a good point that their method, which can directly segment CBCTs, will be faster than a method that first converts the CBCT to a synCT and then segments the synCT - although I still think it would be very interesting to directly compare the two approaches in a follow-up journal paper.




Author Feedback

We appreciate all Reviewers’ (R) positive feedback on our work and their constructive comments. Due to space limitations and the lack of supplementary material, we could not elaborate on all of these details in depth, but we provide explanations for their questions here, hoping that these clarifications will lead the reviewers to raise their ratings.

  • R2, R3 require more clarification regarding CycleGAN training and the quality of the generated images. Quantitatively, we evaluated the network performance using structural similarity, mean squared error, and Jensen-Shannon divergence. In all cases, the generated images were closer to the target modality than the input modality. Furthermore, significant anatomical inaccuracies would have led to degraded downstream segmentation performance. The competitive results achieved (high-quality pCT vs synCBCTs, see Table 2) provide indirect evidence of the generated image quality and structural correctness. We can extend the manuscript with this aspect. Qualitatively, we visually inspected the generated images, confirming their anatomical fidelity and realism. Due to space limitations, we could not include them, but we will make synCBCT scans available in our public code repository upon acceptance.
  • R1 asks for justification of the organs selected for training and evaluation, and of the inclusion criteria for data centers. Both decisions were purely methodological, briefly mentioned in the manuscript due to space limitations. We acknowledge that a more detailed explanation of our systematic data curation would be beneficial. As prostate segmentation was the primary target of this study, we required prostate delineations, which led to the exclusion of certain centers’ scans where such annotations were unavailable. Only the prostate, bladder, and rectum were consistently annotated across all included centers, which defined the focus of our evaluation. Consequently, “The 29 patients were stratified by medical center and available GT structures”, as noted in the Methods section. These decisions ensured a consistent and reliable evaluation across centers while allowing us to maximize the use of correct labels during training. Including data with incomplete or inconsistent annotations would have limited the performance of our framework and reduced the comparability of results. We will extend our clarification in the revised manuscript.
  • R3 had a question about the deployability in time-critical radiotherapy (RT) workflows. While the training of our framework is resource-intensive (~2 days, with 1 day/network), it is a one-time offline process. During deployment, only the segmentation network operates directly on CBCTs. Inference times on an NVIDIA RTX 2080 (<10.7 GB) were consistently <30 sec per CBCT, which is substantially faster than methods that require sequentially applying CBCT→synpCT synthesis and then segmentation of the synpCT during deployment. In prostate RT delivered in 5-30 fractions, particularly for ART and SBRT, the contouring occurs with the patient on the treatment table directly after imaging, where minimizing latency is crucial to avoid anatomical changes that can quickly render images outdated. Our method aligns well with these clinical demands by providing fast and accurate predictions, thereby potentially reducing contour review and correction time. We agree that this important aspect was underexplained in the manuscript, and we can include it in the discussion.
  • We acknowledge the limitations resulting from the limited sample size, as noted by all three reviewers. Accessing large numbers of consistently annotated CBCT-pCT pairs is a common challenge in the field, which also restricts the baseline comparisons. The need for a more detailed evaluation can be mentioned in the Discussion. Despite this, we achieved performance comparable to the upper performance baseline while ensuring high clinical applicability. Our study serves as an important exploratory step demonstrating the method’s feasibility and potential.
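The quantitative checks mentioned in the first rebuttal point (structural similarity, mean squared error, and Jensen-Shannon divergence between generated and target-modality images) follow standard formulas; a minimal numpy sketch is below. It uses a simplified single-window SSIM computed from global statistics rather than the usual sliding-window variant, and is an illustration, not the authors' evaluation code:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    return float(np.mean((a - b) ** 2))

def global_ssim(a, b, data_range=1.0, k1=0.01, k2=0.03):
    """Simplified SSIM using global image statistics (no sliding window)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                 ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))

def js_divergence(a, b, bins=64, value_range=(0.0, 1.0)):
    """Jensen-Shannon divergence (in nats) between intensity histograms."""
    p, _ = np.histogram(a, bins=bins, range=value_range)
    q, _ = np.histogram(b, bins=bins, range=value_range)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        # m > 0 wherever x > 0, so the ratio is always well defined.
        mask = x > 0
        return float(np.sum(x[mask] * np.log(x[mask] / y[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

The rebuttal's claim that "the generated images were closer to the target modality than the input modality" corresponds to comparing these scores for (synCBCT, real CBCT) against (pCT, real CBCT). Note the JS divergence is bounded above by ln 2 in nats.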




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper makes an incremental contribution: using a CycleGAN with a relatively small number of cases to generate cone-beam CT scans and train an nnU-Net to segment the prostate gland and associated organ-at-risk structures for prostate cancer adaptive radiotherapy. The technical contribution of the work is limited. The idea of generating images from a modality with limited numbers or poor-quality segmentations has been explored for various disease sites, including prostate cancer, so the real conceptual novelty in comparison to prior work is unclear. If the contribution is the clinical application, then a substantial number of baseline and benchmarking experiments is needed, which again is missing. Reviewers 2 and 3 pointed out major issues that would be a concern for training (e.g., how to ensure good registration for training/evaluation using image pairs) and many others, including the lack of benchmarks (there is no comparison even to a CBCT-trained segmentation model) and the evaluation of image-generation quality, which were only partially addressed in the rebuttal. Carefully addressing such major concerns and providing details of the methodology and experimental evaluation would require a substantial revision of the paper. Taken together, despite the accept ratings provided by the reviewers, we unfortunately have to reject the paper, given that these major issues cannot be addressed within the minor revisions allowed for MICCAI.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors did a good job of rebuttal. This paper received three positive ratings from the reviewers. The AC concurs with the reviewers’ evaluations, and this paper is ready to be accepted at MICCAI’25.


