Abstract

Cone-beam computed tomography (CBCT) is gaining prominence in clinical radiology, particularly for intraoperative guidance, owing to its lower radiation dose and faster acquisition speed compared to computed tomography (CT). However, CBCT images often exhibit compromised quality, characterized by increased noise, artifacts, and diminished soft-tissue contrast, which can hinder their direct clinical application. While CBCT-to-CT translation presents a promising solution, this task faces significant challenges in multi-institutional settings where diverse imaging protocols introduce substantial domain shifts, especially when paired CBCT-CT data is scarce. Current unsupervised domain generalization (UDG) techniques often struggle to simultaneously maintain robust anatomical accuracy and preserve domain-specific characteristics—both crucial for clinical reliability. To address these limitations, we propose a novel disentangled representation learning framework for UDG-based CBCT-to-CT translation. Our method uniquely separates domain-invariant anatomical content from domain-specific styles, while leveraging learnable domain-style prototypes to dynamically capture key stylistic characteristics. To ensure high-quality translation, we implement a dual-level consistency mechanism that guarantees both anatomical fidelity and style alignment. By utilizing unpaired data for training and enabling flexible content-prototype combinations, our framework effectively generalizes to new institutions without requiring paired data. Extensive validation across three distinct institutional domains demonstrates that our method achieves superior anatomical accuracy and style fidelity compared to state-of-the-art approaches, establishing a clinically practical UDG paradigm with inherent cross-institutional interoperability.
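
Since no code repository accompanies the paper (the repository link below is N/A), the following is a minimal, hypothetical PyTorch sketch of the pipeline the abstract describes: separate content and style encoders, learnable domain-style prototypes, and a dual-level consistency loss. Every module name, layer shape, and loss weight here is an assumption made for illustration; this is not the authors' MSDG-StyleNet implementation.

```python
# Hypothetical sketch only: the authors released no code, so every name,
# shape, and weight below is an assumption, not MSDG-StyleNet itself.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContentEncoder(nn.Module):
    """Image -> domain-invariant anatomical content map."""

    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 7, 1, 3), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, 2 * ch, 4, 2, 1), nn.InstanceNorm2d(2 * ch), nn.ReLU(True))

    def forward(self, x):
        return self.net(x)


class StyleEncoder(nn.Module):
    """Image -> compact style vector; no normalization layers, so
    intensity statistics (the 'style') are preserved."""

    def __init__(self, ch=64, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 7, 1, 3), nn.ReLU(True),
            nn.Conv2d(ch, ch, 4, 2, 1), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(ch, style_dim)

    def forward(self, x):
        return self.fc(self.net(x).flatten(1))


class Decoder(nn.Module):
    """(content, style) -> image, recombined via per-channel modulation."""

    def __init__(self, ch=64, style_dim=8):
        super().__init__()
        self.mod = nn.Linear(style_dim, 2 * (2 * ch))  # scales and shifts
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(2 * ch, ch, 5, 1, 2), nn.ReLU(True),
            nn.Conv2d(ch, 1, 7, 1, 3), nn.Tanh())

    def forward(self, content, style):
        scale, shift = self.mod(style)[:, :, None, None].chunk(2, dim=1)
        return self.net(content * (1 + scale) + shift)


E_c, E_s, G = ContentEncoder(), StyleEncoder(), Decoder()
# One learnable style prototype per source institution (3 domains assumed).
prototypes = nn.Parameter(torch.randn(3, 8))


def dual_consistency_loss(x, target_domain, lam=0.5):  # lam is an assumption
    """Translate x toward another domain's prototype and apply the
    dual-level consistency described above."""
    c = E_c(x)
    s_t = prototypes[target_domain].expand(x.size(0), -1)
    x_t = G(c, s_t)                          # same anatomy, target style
    content_term = F.l1_loss(E_c(x_t), c)    # anatomy must survive
    style_term = F.l1_loss(E_s(x_t), s_t)    # style must match prototype
    return content_term + lam * style_term


loss = dual_consistency_loss(torch.randn(4, 1, 64, 64), target_domain=1)
```

A complete system would also include adversarial and reconstruction losses; the sketch keeps only the two consistency terms to show how domain-invariant content and prototype-driven style are handled separately.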

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4095_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LonXin_MSDGStyleNet_MICCAI2025,
        author = { Long, Xin and Liu, Xinrui and Gan, Fan},
        title = { { MSDG-StyleNet: Multi-source Unsupervised Domain-Generalized CBCT-to-CT Translation with Style-Consistent Disentangled Representations } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15963},
        month = {September},
        pages = {316--325}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a novel disentangled representation learning framework that explicitly separates anatomical content and image style in the feature space to improve the quality of CBCT-to-CT translation. It introduces learnable style prototypes to dynamically model style variations across different institutions, enhancing cross-domain adaptability. A dual-level consistency mechanism is designed to simultaneously preserve anatomical fidelity and style alignment, thereby improving the clinical applicability of the translated images. The proposed method is validated on multiple real-world institutional datasets, demonstrating strong generalizability and superior performance.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes a novel medical image translation framework that explicitly disentangles anatomical structures from image styles, enabling more stable and controllable cross-domain translation. Within this framework, the authors introduce a learnable style prototype mechanism and a dual-consistency loss. Extensive cross-institutional experiments demonstrate the generalizability and superiority of the proposed method, showcasing strong cross-domain transferability.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The modules and methods proposed in this paper have already been extensively explored in the field of artistic image style transfer, and many of the referenced works are relatively dated.

    The baseline methods used for comparison are primarily from the natural image style transfer domain, raising concerns about the fairness of the evaluation.

    Is there any experimental validation or theoretical justification for treating images from the same modality but different institutions as distinct domains?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a novel disentangled representation learning framework for unsupervised CBCT-to-CT translation under domain generalization settings. The introduction of learnable style prototypes and dual-level consistency losses is conceptually sound and demonstrates promising cross-domain generalization capabilities. Extensive multi-institutional evaluations further support the method’s practical value in clinical settings.

    However, several concerns limit a stronger recommendation:

    Some of the core techniques are well-studied in other domains such as artistic style transfer, and the paper lacks sufficient novelty in this regard.

    The comparisons are limited to natural image style transfer baselines, which may not be entirely fair in a medical imaging context.

    Despite these limitations, the paper presents a solid integration of known techniques with thorough experiments and demonstrates promising results, warranting a weak accept.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces a novel unsupervised and unpaired domain generalization (UDG) framework for CBCT-to-CT translation in multi-institutional settings. The method utilizes separate encoders and decoders to disentangle anatomical content from domain-specific styles and uses learnable style prototypes and a dual-level consistency mechanism to ensure both anatomical accuracy and stylistic fidelity. Experimental results suggest its superiority over state-of-the-art methods on the CBCT-to-CT translation task.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The overall paper is well organized and easy to follow.
    2. The proposed learnable style prototype captures the domain style characteristics while allowing individual image styles to be modeled.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Limited novelty: The proposed method shares notable similarities with prior works [1] and [2], which raises concerns regarding its novelty. The introduced domain-style prototype learning mechanism appears promising but lacks sufficient explanation. Section 2.2 outlines only a high-level view of the domain-style prototype framework, omitting important implementation details. For instance, if S_a and S_b are directly encoded from the input images, it is unclear how the domain-specific prototypes S^d_a and S^d_b are obtained. Moreover, Figure 2 shows only outgoing arrows from S^d_a and S^d_b, with no incoming connections, leaving readers uncertain about how these prototypes are derived from S_a and S_b. A more detailed explanation would help clarify the learning process and better support the method's contribution.

    [1] Liu, Peng, et al. "Disentangling latent space better for few-shot image-to-image translation." International Journal of Machine Learning and Cybernetics 14.2 (2023): 419-427.

    [2] Liu, Jiwei, et al. "CBCT-based synthetic CT generation using generative adversarial networks with disentangled representation." Quantitative Imaging in Medicine and Surgery 11.12 (2021): 4820-4834. doi:10.21037/qims-20-1056
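
    One plausible mechanism for the missing incoming connections, sketched below purely as a guess rather than the authors' implementation, is to treat each prototype S^d as a free parameter that is pulled toward the per-image style codes of its domain, so that gradients flow from S_a and S_b into S^d_a and S^d_b.

```python
# Purely speculative reading of the unstated mechanism: the prototypes are
# free parameters that receive gradients from the encoded per-image styles.
import torch
import torch.nn.functional as F


def prototype_alignment_loss(style_codes, prototypes, domain_ids):
    """style_codes: (B, D) per-image styles (S_a, S_b) from the style encoder.
    prototypes:  (K, D) learnable domain prototypes (S^d_a, S^d_b).
    domain_ids:  (B,)   source-domain index of each image."""
    matched = prototypes[domain_ids]  # (B, D), one prototype per image
    # MSE sends gradients into both arguments, so each prototype gets the
    # 'incoming connection' from its domain's style codes that Figure 2 omits.
    return F.mse_loss(style_codes, matched)


prototypes = torch.nn.Parameter(torch.randn(3, 8))  # 3 domains assumed
loss = prototype_alignment_loss(
    torch.randn(4, 8), prototypes, torch.tensor([0, 1, 2, 0]))
```

    An exponential-moving-average update of each prototype toward its domain's batch-mean style codes would be an equally plausible, optimizer-free alternative.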

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Minor Comment:

    The paper lacks a hyperparameter tuning experiment explaining how the lambda values are selected.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is clearly written and the proposed style prototype idea is interesting, especially in how it separates domain and image-level style information. However, the novelty is somewhat limited, as the method closely resembles existing work. Also, key implementation details are missing, which makes it hard to fully evaluate the contribution. With more clarification and comparison, the paper could be stronger—so I lean toward a weak accept, depending on the rebuttal.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces a novel framework, MSDG-StyleNet, for multi-source unsupervised domain-generalized CBCT-to-CT image translation. The proposed method effectively disentangles anatomical content and domain-specific styles, incorporating learnable domain-style prototypes and a dual-level consistency mechanism to enhance cross-institutional generalization.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The task is relatively novel and has clinical research significance.
    • The method achieves the best results on every metric, which demonstrates the effectiveness of the proposed approach.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The method has limited novelty and is quite similar to MUNIT [1]. Can the authors explain the differences?
    • Can the authors provide results on downstream tasks? For instance, showing that translating CBCT to CT improves the performance of a downstream task.

    [1] Huang X, Liu M Y, Belongie S, et al. Multimodal unsupervised image-to-image translation[C]//Proceedings of the European conference on computer vision (ECCV). 2018: 172-189.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel task, clear writing, limited methodological improvements

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

N/A




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


