Abstract
Computed tomography (CT) reconstruction from X-ray images offers significant advantages over direct CT imaging, including lower radiation exposure, reduced cost, and better accessibility. However, high-quality reconstruction is often held back by a shortage of effective input samples, caused either by limited data volume or by skeletal structures occluding soft tissue in X-rays. Additionally, beyond voxel-level differences, texture and structure features are important for image reconstruction. To address these challenges, this study proposes an efficient approach named Dual-branch CT Network (DCT-Net). It first integrates a conditional diffusion model for data augmentation, which mitigates data scarcity and achieves bone suppression. Subsequently, a dual-branch network in DCT-Net processes the augmented and raw data in parallel. In this framework, a perceptual loss based on high-level semantic features serves as the contrastive loss and is combined with the voxel-level and adversarial losses to optimize the generator. The discriminator, however, is optimized with the adversarial loss alone. Experimental results on two public datasets demonstrate that DCT-Net outperforms state-of-the-art methods, suggesting promising potential for clinical applications.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2669_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{ZhaZhi_DCTNet_MICCAI2025,
author = { Zhang, Zhiyu and Shen, Cong and Tang, Jijun and Liao, Zhijun},
title = { { DCT-Net: Dual-branch CT Reconstruction from Orthogonal X-rays with Diffusion Model and Contrastive Learning } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15962},
month = {September},
pages = {153 -- 163}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper introduces DCT-Net, a novel dual-branch network architecture for reconstructing 3D CT volumes from a pair of orthogonal 2D X-ray images. Its key contributions include:
- A conditional diffusion model for generating bone-suppressed X-ray images to address data scarcity and enhance organ visibility.
- A dual-branch GAN-based architecture that processes original and augmented X-rays in parallel, allowing the model to learn complementary features.
- A multi-path loss function, incorporating voxel-level losses (reconstruction + projection), adversarial loss, and a perceptual contrastive loss that improves semantic consistency.
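As a rough illustration of the multi-path loss summarized above, the sketch below combines the three generator terms as a weighted sum and leaves the discriminator with the adversarial term alone, as the abstract states. The weighted-sum form, the weight values, and the names `LossWeights`, `generator_objective`, and `discriminator_objective` are assumptions made for illustration; the review does not give the paper's actual weighting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights, chosen only for illustration.
    voxel: float = 1.0
    adversarial: float = 0.1
    perceptual: float = 1.0


def generator_objective(voxel_loss: float,
                        adversarial_loss: float,
                        perceptual_loss: float,
                        weights: LossWeights = LossWeights()) -> float:
    """Weighted sum of the three generator terms (assumed form)."""
    return (weights.voxel * voxel_loss
            + weights.adversarial * adversarial_loss
            + weights.perceptual * perceptual_loss)


def discriminator_objective(adversarial_loss: float) -> float:
    """The discriminator is driven by the adversarial term only."""
    return adversarial_loss


if __name__ == "__main__":
    print(generator_objective(voxel_loss=2.0, adversarial_loss=1.0, perceptual_loss=0.5))
```

Keeping the weights in one small immutable object makes it easy to ablate a term by setting its weight to zero without touching the rest of the training code.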
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors use a conditional diffusion model for data augmentation to synthesise bone-suppressed X-rays, which is novel. This strategy improves the visibility of soft tissues by reducing bone occlusion, which is often a bottleneck in reconstruction tasks.
- The paper introduces a dual-generator setup, where one branch processes real X-rays and the other processes bone-suppressed X-rays. This is novel in fusing complementary information and appears to help in both reconstruction quality and robustness.
- The network additionally uses perceptual contrastive loss across anatomical views, targeting semantic consistency and structural integrity of the reconstructed CT, which enhances high-level feature learning.
- Extensive evaluation on two public datasets shows consistent improvements over several state-of-the-art methods.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The method is validated on digitally reconstructed radiographs derived from CT volumes, not on real X-ray data. This limits conclusions about clinical feasibility and generalization. The paper should include validation on actual clinical X-rays, which are noisier, more variable, and possibly misaligned due to motion.
- Clinical usability details are missing. For example, no inference time is reported, which is important for clinical use because the designed model is large, combining a diffusion model and a GAN.
- The authors should clarify how the ground truth bone-suppressed X-rays are obtained.
- The authors should clarify the structure of the dual-generator reconstruction GAN, as the role of self-attention in the structure is not clear.
- In the description of the discriminator, the equation is not clear. What is D(Y)? Should it be D(G(Y)/GT)? And what is the structure of the discriminator?
- In the training configuration, the batch size is 1 and the maximum number of epochs is 100. Would this be sufficient? I am a little concerned about the stability of the model.
- A statistical significance test is needed for the quantitative results, but the authors do not provide one.
- In the ablation study, the authors say “LPIPS increases by 0.062 in Table 2”. I think this should be Table 1 (possibly a typo). Also, in the ablation study testing the effectiveness of the perceptual contrast loss, the authors replace this loss with the projection loss. This can only show that the perceptual contrast loss is better than the projection loss; it does not show the incremental effectiveness of the contrast loss, which would require ablating that component alone. Fair comparisons should always be considered in evaluation.
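The significance test requested above could, for instance, be a paired test on per-case metric values, since both methods are evaluated on the same test volumes. The sketch below is one such option, an exact two-sided sign-flip permutation test, and is not something taken from the paper. The function name `paired_sign_flip_test` and the score lists in the usage line are hypothetical placeholders, not reported results.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    Under the null hypothesis that the two methods are interchangeable,
    the sign of each per-case difference is equally likely to be + or -.
    The p-value is the fraction of sign assignments whose absolute mean
    difference is at least as large as the observed one.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    # Enumerates all 2**n sign patterns, so this is only practical for small n.
    for signs in product((1.0, -1.0), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n


if __name__ == "__main__":
    # Placeholder per-case scores for two methods on the same eight cases.
    method_a = [2.0] * 8
    method_b = [1.0] * 8
    print(paired_sign_flip_test(method_a, method_b))
```

For test sets with more than a couple of dozen cases, the exhaustive enumeration becomes too slow; sampling random sign patterns or using a Wilcoxon signed-rank test (for example `scipy.stats.wilcoxon`) would be the usual alternatives.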
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I think this paper proposes a novel pipeline for 3D CT reconstruction from orthogonal projections. However, it is based only on synthetic projections from CT data, so the feasibility of the model on real X-ray images is unknown. Some clarification and more details are also needed (see weaknesses).
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper introduces an innovative approach called DCT-Net for CT image reconstruction from X-ray images. The primary contributions include:
1) The authors propose using a conditional diffusion model to generate bone-suppressed X-rays, addressing data scarcity and improving the visibility of internal organ structures in CT reconstruction.
2) A novel dual-branch network architecture is introduced, which processes both raw and bone-suppressed X-rays in parallel, enhancing reconstruction accuracy and robustness.
3) The integration of a multi-path loss module balances voxel-level details and high-level semantic features, optimizing the reconstruction process.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) The use of a conditional diffusion model for bone suppression in X-ray images is a novel approach. While diffusion models have been explored in medical imaging, their application for generating bone-suppressed images is innovative and could potentially overcome data scarcity issues in CT reconstruction.
2) The proposed dual-branch architecture is a creative solution for simultaneously processing raw and augmented data. This design improves reconstruction accuracy and robustness, which is particularly valuable in clinical settings where data variability is common.
3) The paper demonstrates the potential clinical application of the proposed method through experiments on public datasets. The experimental results are thorough, comparing the proposed method with state-of-the-art approaches. The inclusion of multiple evaluation metrics (e.g., PSNR, SSIM, NMSE) provides a robust assessment of the method’s performance.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) The conditional diffusion model is used solely for generating bone-suppressed X-rays, but its integration into the overall training process is not well-explained. It appears to be an incremental design, which weakens its contribution compared to other diffusion methods.
2) The paper does not provide sufficient details about the training process, particularly how the multiple components (e.g., the dual-branch GAN and the conditional diffusion model) are trained together. This lack of clarity raises concerns about the practicality and efficiency of the method.
3) The introduction does not sufficiently address the relationship between the proposed method and existing diffusion models in medical imaging. This gap makes it challenging to fully assess the novelty of the approach. A more comprehensive review of diffusion models related to medical image reconstruction or transformation is necessary in the introduction section. For instance, the paper should reference the following works:
[1] Cui J, Zeng X, Zeng P, et al. MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024: 467-477.
[2] Kim J, Park H. Adaptive latent diffusion model for 3d medical image to image translation: Multi-modal magnetic resonance imaging study[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024: 7604-7613.
[3] Zeng P, Zeng X, Wang Y, et al. Multi-modal Long-Short Distance Attention-based Transformer-GAN for PET Reconstruction with Auxiliary MRI[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2025.
4) The paper refers to the perceptual loss as a “contrastive loss”, but this terminology is inconsistent with the actual implementation. A contrastive loss typically involves pairwise comparisons of positive and negative samples, which is not evident in the described method.
5) The term “bone suppression” is introduced in abstract without any medical context or explanation for readers. A brief definition would improve accessibility.
6) The abstract contains grammatical errors (e.g., false capitalization of “While” in “the generator. While the discriminator optimization only depends on the adversarial loss”).
7) In Equation (7), the variables m and n are used without clear definitions, making it difficult to understand the mathematical formulation of the loss function.
8) The dataset description only mentions the included CT images and does not provide sufficient details on how the raw and bone-suppressed X-ray images were constructed, which is crucial for ensuring reproducibility.
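To make the distinction raised in point 4 concrete, the sketch below contrasts a positive-only feature distance with a contrastive loss that uses negatives. InfoNCE is used here as a familiar example of the latter; it is swapped in for illustration and is not the paper's loss. The function names, the toy feature vectors, and the temperature value are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def positive_only_loss(anchor, positive):
    """Mean squared feature distance: pulls one pair together, no negatives."""
    return sum((a - p) ** 2 for a, p in zip(anchor, positive)) / len(anchor)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: pulls the positive close and pushes the negatives away."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    print(positive_only_loss(anchor, [1.0, 0.0]))
    print(info_nce_loss(anchor, [1.0, 0.0], negatives=[[0.0, 1.0]]))
```

The positive-only term has no way to penalize features that collapse toward unrelated samples, whereas the InfoNCE value depends on how the anchor relates to the negatives as well. That difference is what the terminology concern is about.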
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents an innovative approach to CT image reconstruction from X-ray images. However, it lacks a thorough discussion of related work, particularly with regard to diffusion models, and fails to provide adequate details on the training process and dataset construction, which reduces the overall impact of the paper. Additionally, the mischaracterization of certain terms, such as “contrastive loss,” detracts from the professionalism of the work. Therefore, I recommend a weak accept for this paper.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper proposes a dual-branch network called DCT-Net for CT reconstruction from orthogonal biplanar X-rays. The main contributions are as follows:
- A conditional DDPM is used for bone suppression in the input X-rays, aiming to enhance the model’s focus on soft tissue structures.
- A dual-branch GAN-based reconstruction framework is introduced. For the original X-rays, the generator is optimized primarily through adversarial loss and voxel-level volume and projection consistency losses. For the bone-suppressed X-rays, the model emphasizes semantic alignment between the two inputs via latent space consistency.
This method presents a novel and effective solution to the problem of CT reconstruction from biplanar X-Rays.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The strengths of this paper are as follows:
- It proposes a novel approach that incorporates conditional DDPM, dual-branch GAN with different losses and contrastive-style loss to enhance the highly limited information available from biplanar X-rays. The use of a dual-branch GAN structure allows the model to process the original and enhanced inputs separately, with carefully designed losses from different perspectives to emphasize their individual contributions to the final reconstruction. The introduction of contrastive-style losses for the enhanced input is a valuable design choice. Many aspects of this framework are potentially applicable to other scenarios involving limited input data.
- Notably, after enhancing the original X-rays using the conditional DDPM, the method applies distinct strategies to extract information from the original and enhanced inputs, rather than simply merging them. Each component is well-motivated, which strengthens the overall contribution of the work.
- Another interesting aspect of this method is the way it enforces 3D consistency in volume reconstruction by projecting along three orthogonal directions and applying losses in a 2D-like manner.
- The method is compared against PerX2CT, the current state-of-the-art for this task, and achieves superior performance, demonstrating the effectiveness of the proposed approach.
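The idea of enforcing 3D consistency through projections along three orthogonal directions, noted in the strengths above, can be sketched as follows. This is a minimal illustration assuming mean-intensity projections and an L1 distance; the review does not specify the paper's projection operator or distance, and the names `orthogonal_projections` and `projection_loss` are hypothetical.

```python
import numpy as np


def orthogonal_projections(volume: np.ndarray) -> list:
    """Mean-intensity projections of a 3D volume along each of its three axes."""
    return [volume.mean(axis=k) for k in range(3)]


def projection_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Average L1 distance between matching 2D projections of two volumes."""
    if pred.ndim != 3 or pred.shape != target.shape:
        raise ValueError("expected two 3D volumes of identical shape")
    per_axis = [
        np.abs(p - t).mean()
        for p, t in zip(orthogonal_projections(pred), orthogonal_projections(target))
    ]
    return float(np.mean(per_axis))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((8, 8, 8))
    print(projection_loss(target, target))
    print(projection_loss(np.zeros((4, 4, 4)), np.ones((4, 4, 4))))
```

Because each term compares 2D images, such a loss is cheap to compute and complements a voxel-wise reconstruction loss rather than replacing it.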
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The description of the CTSpine1K dataset in Section 3.1 is incorrect. The dataset does not consist of 784 COVID-19 CT images; rather, it comprises subsets from multiple sources. In fact, only 40 scans were selected from the COVID-19 subset. I strongly recommend that the authors carefully review and correct the dataset description to accurately reflect the data used in this study.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method demonstrates a certain level of novelty, particularly in its use of a conditional diffusion model for data enhancement, followed by distinct strategies for leveraging original and enhanced data. The use of contrastive-style losses to enforce semantic consistency is also a valuable contribution that may inspire future work in the CT reconstruction community. Furthermore, the paper includes comprehensive experiments and comparisons with state-of-the-art methods. Overall, the work meets the standards of MICCAI and is suitable for presentation at the conference.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors’ responses to all of the reviewers’ comments addressed my concerns, and I believe this paper deserves to be showcased at MICCAI. Regarding the reviewers’ suggestion to include real X‑ray images, I view the manuscript as fundamentally methodological—the simulation results already demonstrate the approach’s potential applicability in real‑world scenarios. While adding genuine X‑ray data would undoubtedly strengthen the paper, the current version is still worthy of publication.
Author Feedback
Thanks for the insightful comments from all the reviewers. The following are the responses to the comments.
Reviewer 1 Q1: The conditional diffusion model in the training process is not well-explained. A1: Thanks for your valuable suggestion. Compared to the end-to-end diffusion approach, our DCT-Net is a trade-off between usability and efficiency, with fewer parameters and faster inference speed.
Q2: Insufficient details of the GAN and the conditional diffusion in the training process. A2: Our DCT-Net employs a two-stage training strategy. The first stage involves training a conditional diffusion model to generate bone suppression X-ray images for data augmentation. The second stage focuses on training a GAN-based CT generation network.
Q3: More comprehensive review of diffusion models. A3: Considering the sufficiency of the review and the layout limitation, we have listed the most necessary references. We will review your suggested references in the proof.
Q4: The actual implementation of the perceptual loss is inconsistent with a contrastive loss. A4: Similar but different from traditional contrastive loss in the pairs of positive and negative samples, the perceptual loss is a variant of contrastive loss without negative samples.
Q5: “Bone suppression” is introduced without any medical explanation. A5: To balance the illustration and layout limitations, more details of bone suppression are not listed, which will be provided in the subsequent work.
Q6: The word “While” in the abstract part. A6: As a subordinating conjunction, “While” is misused here. We will substitute this word with “However, ” in the proof.
Q7: In Eq.7, the variables m and n are used without clear definitions. A7: m and n denote the patches of one CT horizontal and vertical slice, respectively.
Q8: The dataset description of the raw and bone-suppressed X-ray images. A8: The training data for the diffusion model was obtained from the JSRT dataset, which contains the original and the corresponding bone suppression X-ray images.
Reviewer 2 Q1: The method is not validated on real X-ray data. A1: Due to the scarcity of samples equipped with both real X-ray images and corresponding CT data, the reconstruction of radiographic images is significant.
Q2: No inference time is reported. A2: On the NVIDIA 3090Ti GPU, the average inference time per sample is about 1.2 seconds.
Q3: Clarify how the ground truth bone-suppressed X-rays are obtained. A3: The training data for the diffusion model was obtained from the JSRT dataset, which contains the original and the corresponding bone suppression X-ray images.
Q4: The authors should clarify the structure of the dual-generator reconstruction GAN. A4: In the manuscript, we have described that self-attention is incorporated into each layer of the decoder to capture long-range dependencies in the reconstructed CT features.
Q5: The equation of the discriminator is not clearly described. A5: D(Y) discriminates CTs generated from the original X-ray images Y. The input to the discriminator is G(Y) or GT. Regarding the discriminator structure, we adopt a PatchGAN-style fully convolutional discriminator.
Q6: Sufficiency of 100 epochs and a batch size of 1. A6: Due to the limitation of the GPU, a batch size of one is what is feasible in practice. Specifically, after 70 epochs of training, the model has reached asymptotic convergence.
Q7: Statistical significance test. A7: This study compared our DCT-Net with the SOTA methods in accuracy metrics.
Q8: Effectiveness of perceptual loss in the ablation study and a typo. A8: LPIPS increases by 0.062 in Table 1, which should be corrected. The ablation of perceptual loss aimed at verifying the reasonableness of introducing the semantic-level differences between G(X) and G(Y).
Reviewer 3 Q: The description of the CTSpine1K is incorrect. A: Thanks for your careful review. This study used the COLONOG subset of the CTSpine1K, which contains 784 CT images, which should be corrected.
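The rebuttal's answer on the discriminator (input is either G(Y) or GT, with a PatchGAN-style output) can be illustrated with a patch-wise adversarial loss. The sketch below assumes a least-squares GAN objective, which the rebuttal does not confirm, and represents the discriminator's patch score map as a flat list of numbers. The function name `patch_discriminator_loss` is hypothetical.

```python
def patch_discriminator_loss(real_patch_scores, fake_patch_scores):
    """Least-squares discriminator loss averaged over patch scores (assumed form).

    real_patch_scores: discriminator outputs for patches of the ground-truth CT.
    fake_patch_scores: discriminator outputs for patches of the generated CT G(Y).
    Real patches are pushed toward 1 and generated patches toward 0.
    """
    if not real_patch_scores or not fake_patch_scores:
        raise ValueError("both score lists must be non-empty")
    real_term = sum((s - 1.0) ** 2 for s in real_patch_scores) / len(real_patch_scores)
    fake_term = sum(s ** 2 for s in fake_patch_scores) / len(fake_patch_scores)
    return 0.5 * (real_term + fake_term)


if __name__ == "__main__":
    # A discriminator that separates real from generated patches perfectly.
    print(patch_discriminator_loss([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
    # A discriminator that cannot tell them apart.
    print(patch_discriminator_loss([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]))
```

Scoring patches rather than the whole volume means every local region contributes to the loss, which is the usual motivation for a PatchGAN-style design.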
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The authors reported only simulation results, which was a concern for the reviewers.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A