Abstract

Cone-Beam Computed Tomography (CBCT) is widely used for diagnostics and treatment planning in the oral and maxillofacial field due to its low radiation dose and high spatial resolution. Still, its clinical utility is limited by low contrast and incorrect Hounsfield Unit (HU) values. In contrast, multi-detector CT (CT) provides high contrast and reliable HU measurements, but at a higher radiation dose. In this work, we present a novel two-stage framework for unpaired CBCT-to-CT synthesis that ensures the exact preservation of anatomical structure, maintains high resolution, and achieves accurate HU values. In the first stage, we generate pseudo-paired CT images. In the second stage, we utilize a UNet++ generator enhanced with Interpolation and Convolution Upsampling (ICUP), Edge-Conditioned Skip Connections (ECSC), and a dual discriminator strategy for a semi-supervised approach. Consequently, we generate realistic CT images using pseudo-paired CT images. Extensive quantitative and qualitative evaluations demonstrate that our method outperforms existing unpaired translation techniques, producing realistic CT images that closely match real CT images in HU accuracy while exactly preserving the anatomical structure of the CBCT. The code is available at https://github.com/HANJIYONG/Semi-Supervised-Deformation-Free-I2I.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2727_paper.pdf

SharedIt Link: https://rdcu.be/eHwRM

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04947-6_55

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/HANJIYONG/Semi-Supervised-Deformation-Free-I2I

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HanJi_SemiSupervised_MICCAI2025,
        author = { Han, Ji Yong AND Yang, Su AND Kim, Sujeong AND Kim, Sunjung AND Lim, Sang-Heon AND Yun, Heejin AND Kim, Dahee AND Yi, Won-Jin},
        title = { { Semi-Supervised Deformation-Free Image-to-Image Translation for Realistic CT Synthesis from CBCT } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        page = {577 -- 586}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper describes a two-stage framework for unpaired CBCT-CT translation. In stage 1, unsupervised contrastive unpaired translation (CUT) is used to generate a pseudo CT from a CBCT input. In stage 2, the pseudo CT is used to provide edge and content information to guide the final CT generation, with true unpaired CTs introduced into the loss through a discriminator to account for potential errors in the pseudo CT images. The method was tested on a CT/CBCT dataset and showed improvements over some comparison methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The two-stage approach allows unsupervised image translation with anatomical structure preservation.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. There is no ablation study on the many loss functions used in this study. Their functions and individual contributions should be justified through quantitative evaluations.
2. With many loss functions and network tricks applied, the network developed can be prone to overfitting. A robustness evaluation against domain shifts (like CT/CBCT scans of different anatomical sites) should be performed.
3. There is very little information provided on the CT/CBCT dataset used in this study. Are the images paired? And if they are, are they re-arranged to create unpaired datasets to train the corresponding models?
4. Comparison with state-of-the-art approaches on CBCT to CT conversion, especially those using the latest diffusion models, should be performed. For instance, 10.1016/j.compmedimag.2024.102344
5. In Fig. 1, it looks like the artifacts from the CBCT image were not corrected in the translated image by the two-stage approach.
6. The title ‘semi-supervised’ is a bit misleading, as semi-supervised learning involves partial paired data.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The overall methodology is OK, but the study needs clarifications and more results to justify its loss function design, demonstrate its robustness, and show its advantages over the state-of-the-art methods.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors’ responses are satisfactory.

Review #2

Please describe the contribution of the paper

An unpaired image translation model to generate pseudo-paired CT images, followed by a paired translation stage, is proposed. The CBCT-to-CT synthesis framework aims to preserve anatomical shape during translation while maintaining high resolution and accurate HU values. Results demonstrate improved qualitative and quantitative performance.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Clearly written and well-motivated research problem
- Illustrations are well-designed and effectively enhance the clarity of the work
- The method improves accurate quantitative HU analysis, not just visually appealing image-to-image translation.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The current evaluation lacks HU-specific fidelity assessment metrics like histogram divergence and Pearson correlation, making it hard to assess HU consistency accurately.
- There is no region-specific quantitative analysis across different tissues like bone, soft tissue, and air, which is essential for clinical relevance.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
1. Figure 1 is visually clear and well-illustrated, however its caption may be somewhat misleading. Both CycleGAN and CUT results exhibit broad high-frequency content (evident from strong oriented spectral lines in the Fourier-transformed images), which contradicts the caption’s implication. Additionally, UNIT’s contrast appears qualitatively comparable to the proposed method, while the target CT image looks blurred—this inconsistency requires clarification. Consider adding regional histogram comparisons to support the visual observations. Also, there is some redundancy in the text, particularly between Sections 2.1 and 2.2 regarding CUT.
2. Please provide the mathematical notation for the multi-scale discriminator to maintain consistency with the edge-conditional generator notation already presented in Section 2.2.
3. While SSIM and NCC metrics (Table 2) are useful for structural and intensity similarity, they do not directly assess HU fidelity. To better evaluate HU accuracy, a Kullback–Leibler divergence comparison between ground-truth and generated CT distributions should be included. Additionally, report variability measures (e.g., ± standard deviations) for completeness, as was done in Table 1.
4. To demonstrate clinical validity, include voxel-wise Pearson correlation between HU values of the real and synthetic CT scans to quantify global HU consistency.
5. HU accuracy should be assessed separately across clinically relevant tissue types—bone, soft tissue, and air—due to differing tolerance levels (e.g., ±100 HU for bone). Use segmentation masks derived from ground-truth CT or propagated labels to isolate these regions and enable precise region-specific HU evaluation.
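The three evaluations requested in comments 3 to 5 can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names, the histogram range and bin count, and the HU thresholds in `REGIONS` are all assumptions chosen for the example, and the Pearson and region-wise measures additionally assume a voxel-aligned real/synthetic CT pair flattened to equal-length lists.

```python
import math

def hu_histogram(values, lo=-1000.0, hi=3000.0, bins=80, eps=1e-8):
    """Normalized histogram of HU values over [lo, hi]; eps avoids empty bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        clipped = min(max(v, lo), hi)
        idx = min(int((clipped - lo) / width), bins - 1)
        counts[idx] += 1
    total = sum(counts) + eps * bins
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) between two normalized histograms with strictly positive bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def pearson(x, y):
    """Voxel-wise Pearson correlation between two equal-length HU lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Illustrative HU ranges only (an assumption, not clinical definitions);
# in practice segmentation masks would replace these thresholds.
REGIONS = {
    "air": (-math.inf, -300.0),
    "soft_tissue": (-300.0, 300.0),
    "bone": (300.0, math.inf),
}

def regional_mae(real, synth, regions=REGIONS):
    """Mean absolute HU error per region, with regions defined on the real CT."""
    result = {}
    for name, (lo, hi) in regions.items():
        errors = [abs(r - s) for r, s in zip(real, synth) if lo <= r < hi]
        result[name] = sum(errors) / len(errors) if errors else None
    return result

real = [-1000.0, -950.0, 40.0, 60.0, 1200.0, 1500.0]
synth = [-990.0, -960.0, 50.0, 30.0, 1100.0, 1600.0]
print(kl_divergence(hu_histogram(real), hu_histogram(synth)))
print(pearson(real, synth))
print(regional_mae(real, synth))
```

The histogram KL divergence compares distributions only, so it remains usable when the CT and CBCT volumes are not registered; the other two measures are meaningful only to the extent that the alignment assumption holds.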
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Paper easy to read and good illustrations; method is rational and systematic; focus on qualitative and quantitative image translation aspects.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors’ response is satisfactory - it offers few explanations and several commitments to future investigation; however, we do not expect new experiments to be provided at this stage (rebuttal).

Review #3

Please describe the contribution of the paper

In this paper, a two-stage semi-supervised pipeline for generating synthetic CT from CBCT in an unpaired environment is presented. The main contribution resides in the ability of the proposed approach to maintain high resolution while correcting the HU of the CBCTs, as demonstrated by the evaluation metric and results.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The two-step approach shows strong performance compared to the other known solutions for image translation in an unpaired data environment. The novel addition of Edge-Conditioned Skip Connections seems to significantly improve the shape consistency.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The dataset is very limited and there is a lack of suitable paired CBCT-CT cases for a fair assessment. Although the main contribution is the ability to process unpaired data, paired data should be used for the evaluation. The quality of the dataset should be improved to better evaluate the solution. In addition, several loss functions are used. Possibly some of them are redundant. Ablation studies should be performed in a next phase.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

Minor comments:

Acronyms should only be defined in the abstract if they are used more than once in the abstract.

“Moreover, since CT typically exhibits lower spatial resolution than CBCT, the high-resolution advantage of CBCT is compromised during translation”: Figure 1 only shows discrepancies in the shape, but not in the spatial resolution of CT and CBCT.

“prevents structural distortion and preserves resolution”: missing a dot.

Section 2.2: No information on the loss function is given for this part. Is it the same as in the CUT paper (reference 10)? If so, this should be stated. It is not clear that the gradient edge map is extracted from the corresponding CBCT. Also, are 3 discriminators used that serve the 3 different scales, as in the original paper (multiscale discriminator)? The clarity of this part should be increased.

Fig. 2: fl of the ECSC is not explained. The resize is also not explained. The ECSC module should be better explained. There is no clear information about how the edge maps are created. What happens if a lot of noise/artefacts are present? Also, this figure is not mentioned in the text.

Section 2.3: Clarify whether the DR is used to ensure true HU values and the DP to ensure the structure. Or are both used for both tasks? Are LPIPS, style and content losses used as in the original (referenced) papers? If so, this should be clear so that the reader can find a reference source to learn more about these losses. It should be mentioned which segmentation model was used for the air loss.

Table 2: Are the HU values for each structure the mean values?

Section 3.1: What method was used to resize the image? Padding, cropping, interpolation (which)? What was the probability of occurrence of the individual data augmentations? Rotation by 20 degrees? The initial learning rate is mentioned. Does this mean that the learning rate is changed during training? And how? These metrics assume that the corresponding ground-truth CT images are available. Is that the case? It is not clear from the explanation of the dataset.

Section 3.2: It is not clear how the ‘exact slice consistency’ is quantified.

Ablation studies of ICUP and ECSC: It is not clear whether the ICUP was replaced only by interpolation or by transposed convolution when it was removed. Transposed convolution should be used for a fair comparison since interpolation has no learnable parameters. I believe Fig. 5 should be Fig. 4.

For better readability, the tables and figures should be placed closer to where they are mentioned in the text.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although the dataset may limit the merit of this study, the pipeline appears to have great potential and should be explored further.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The ablation studies mentioned in the review should be performed and, if possible, a larger dataset should be used in future work. Also, the study of HU fidelity for each region (pointed out by the other reviewers) should be done in future experiments.

Author Feedback

We sincerely thank all reviewers for their thoughtful and constructive comments.

[HU Fidelity Metrics and Region-Specific Evaluation – Reviewer #1] We appreciate the reviewer’s suggestion on HU-specific metrics. Table 2 presents HU mean and MAE across four hard tissues (enamel, dentine, cortical, trabecular bone), offering clinically relevant evaluation. We agree that including broader metrics such as KL divergence and Pearson correlation would enhance HU fidelity analysis. Additionally, incorporating soft tissue and air in future region-specific HU analysis is a valuable direction we plan to explore.

[Clarification on Figure 1 – Reviewer #1, #2] We thank both reviewers for pointing out the potential misinterpretation in Figure 1. The Fourier-domain magnitude maps illustrate spatial frequency content objectively. CBCT shows broader frequency energy distribution, implying higher spatial resolution, while CT is concentrated in the low-frequency range, indicating smoother but lower-resolution structures. We acknowledge that some methods (e.g., CycleGAN, CUT) retain partial high-frequency content, and we will revise the caption to clarify these nuances. Additionally, while some residual artifacts may appear visually, our method significantly improves structural consistency, as reflected in the quantitative scores. We also appreciate the suggestion to provide regional histogram comparisons and will consider it for future evaluations.
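The frequency-content argument above can be turned into a single number for illustration. The sketch below is an assumption-laden example, not the analysis behind Figure 1: the function name, the mean subtraction, and the cutoff of 0.25 cycles per pixel are all hypothetical choices. It reports the share of spectral energy lying above a radial frequency cutoff, so a sharper image should score higher than a smoother one.

```python
import numpy as np

def high_frequency_energy_fraction(image, cutoff=0.25):
    """Share of (mean-removed) spectral energy above `cutoff` cycles per pixel."""
    img = np.asarray(image, dtype=np.float64)
    img = img - img.mean()                  # drop the DC term so it cannot dominate
    power = np.abs(np.fft.fft2(img)) ** 2   # 2D power spectrum
    fy = np.fft.fftfreq(img.shape[0])       # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(img.shape[1])       # horizontal frequencies in [-0.5, 0.5)
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    total = power.sum()
    if total == 0:
        return 0.0                          # constant image: no structure at all
    return float(power[radius > cutoff].sum() / total)

flat = np.ones((8, 8))                                        # no detail
checker = (np.indices((8, 8)).sum(axis=0) % 2) * 2.0 - 1.0    # finest possible detail
print(high_frequency_energy_fraction(flat))
print(high_frequency_energy_fraction(checker))
```

A scalar like this discards orientation, so it would complement rather than replace the magnitude maps; the oriented spectral lines noted by the reviewer would not be visible in it.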

[Loss Function Ablation – Reviewer #2, #3] We agree that isolating the effect of each loss would be informative. In our design, each loss serves a distinct and complementary purpose: L_style aligns texture and HU values; L_content preserves anatomical features; L_air ensures air region consistency; L_PCL stabilizes local structure via contrastive learning; and LPIPS captures perceptual similarity. While Table 3 reports ablations on ICUP and ECSC, we acknowledge that loss-level ablation is also important and will consider this in future work.

[Dataset Clarification – Reviewer #2, #3] Thank you for allowing us to clarify. Our dataset includes CBCT and CT from the same patients, forming nominally paired sets. Due to clinical variation (e.g., patient movement, timing), perfect alignment is not feasible. Thus, we use a two-stage training: unpaired translation followed by pseudo-paired refinement using Stage-1 outputs. For evaluation, we measure structural consistency relative to CBCT input and HU fidelity against CT, treating images as unpaired due to residual misalignment.

[Semi-supervised Terminology – Reviewer #2] We appreciate the reviewer’s comment. Although our method utilizes unpaired images and incorporates real CT data within a supervised loss framework, we acknowledge that the term “semi-supervised” may be misleading. We are open to revising it to “weakly supervised” or “pseudo-supervised” to improve clarity.

[Comparison with State-of-the-Art – Reviewer #2] Thank you for pointing out the potential of diffusion-based methods. Although not explicitly stated in the manuscript, our design objective was to enable real-time clinical applicability. While diffusion models are promising, they remain computationally intensive and currently impractical for real-time deployment. We plan to explore lightweight, diffusion-based approaches that can support faster inference and clinical integration in future work.

[Technical Clarifications – Reviewer #3] Regarding Figure 2 and ECSC: edge maps are derived from CBCT gradients and fused via skip connections. The “fl” operator refers to feature-level fusion. Multiscale discriminators follow [12] and operate at three scales. DR targets HU fidelity; DP promotes structural integrity. L_air uses a U-Net trained on manually segmented air regions. Table 2 shows mean HU values. Images were resized using bilinear interpolation; augmentations were applied probabilistically (0.5 per transform). The learning rate decayed linearly after epoch 25.
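The learning-rate statement leaves the endpoint of the decay open, so the following is only one plausible reading. It assumes a constant rate through epoch 25 followed by a linear ramp to zero at the final epoch; `total_epochs=50`, the base rate of 2e-4, zero-based epoch counting, and the function name are all hypothetical values introduced for illustration and are not given in the rebuttal.

```python
def linear_decay_lr(epoch, base_lr, decay_start=25, total_epochs=50):
    """Constant LR up to decay_start, then linear decay to zero at total_epochs.

    total_epochs and the decay-to-zero endpoint are assumptions; the rebuttal
    only states that the rate decays linearly after epoch 25.
    """
    if total_epochs <= decay_start:
        raise ValueError("total_epochs must exceed decay_start")
    if epoch < decay_start:
        return base_lr
    remaining = max(total_epochs - epoch, 0)
    return base_lr * remaining / (total_epochs - decay_start)

# Hypothetical base rate, printed every ten epochs.
for epoch in (0, 10, 20, 30, 40, 50):
    print(epoch, linear_decay_lr(epoch, base_lr=2e-4))
```

In a training framework the same rule would typically be supplied as a multiplicative factor to the scheduler rather than computed by hand.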

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

Reviewers raised concerns regarding lack of clarity and some concerns regarding the presentation, as well as lack of metrics to assess accuracy of CT synthesis. Major concerns were also raised regarding the thoroughness of the ablation studies and comparisons to other methods. Please address all of reviewers’ concerns in the rebuttal.
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Semi-Supervised Deformation-Free Image-to-Image Translation for Realistic CT Synthesis from CBCT
