Abstract

Computed tomography (CT) plays a significant role in clinical practice by providing detailed three-dimensional information, aiding in accurate assessment of various diseases. However, CT imaging requires a large number of X-ray projections from different angles and exposes patients to high doses of radiation. Here we propose VolumeNeRF, based on neural radiance fields (NeRF), for reconstructing CT volumes from a single-view X-ray. During training, our network learns to generate a continuous representation of the CT scan conditioned on the input X-ray image and render an X-ray image similar to the input from the same viewpoint as the input. Considering the ill-posedness and the complexity of the single-perspective generation task, we introduce likelihood images and the average CT images to incorporate prior anatomical knowledge. A novel projection attention module is designed to help the model learn the spatial correspondence between voxels in CT images and pixels in X-ray images during the imaging process. Extensive experiments conducted on a publicly available chest CT dataset show that our VolumeNeRF achieves better performance than other state-of-the-art methods. Our code is available at https://www.github.com/Aurora132/VolumeNeRF.
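
The rendering step mentioned in the abstract (producing an X-ray image from a reconstructed volume) can be pictured with a minimal sketch. It assumes parallel rays along the depth axis and Beer-Lambert attenuation; the function name `project_parallel` and the nested-list volume layout are hypothetical and are not taken from the VolumeNeRF code, whose ray geometry and renderer may differ.

```python
import math

def project_parallel(volume, step=1.0):
    """Project a 3D attenuation volume onto a 2D image.

    volume is a nested list indexed [depth][row][col] holding attenuation
    coefficients; rays are assumed to travel along the depth axis.
    """
    depth = len(volume)
    rows = len(volume[0])
    cols = len(volume[0][0])
    image = []
    for r in range(rows):
        image_row = []
        for c in range(cols):
            # Discrete line integral of attenuation along one ray.
            line_integral = sum(volume[d][r][c] for d in range(depth)) * step
            # Beer-Lambert law: transmitted fraction of the incident beam.
            image_row.append(math.exp(-line_integral))
        image.append(image_row)
    return image

# Tiny example: two depth slices, one row, two columns.
toy_volume = [[[0.0, 0.5]], [[0.0, 0.5]]]
toy_xray = project_parallel(toy_volume)
```

Comparing such a projection with the input X-ray is one way to express a consistency term between the reconstructed volume and the measurement.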

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3061_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3061_supp.pdf

Link to the Code Repository

https://www.github.com/Aurora132/VolumeNeRF

Link to the Dataset(s)

https://www.cancerimagingarchive.net/collection/lidc-idri

BibTex

@InProceedings{Liu_VolumeNeRF_MICCAI2024,
        author = { Liu, Jiachen and Bai, Xiangzhi},
        title = { { VolumeNeRF: CT Volume Reconstruction from a Single Projection View } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes VolumeNeRF used to reconstruct CT volumes from a single-view X-ray. The likelihood images and the average CT images are employed in the model to incorporate prior anatomical knowledge. Further, the projection attention module is designed to learn the spatial correspondence between voxels and pixels in X-ray images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the paper proposes a NeRF-based method for single-view CT reconstruction. The introduced average and likelihood images provide prior knowledge, and the projection attention module seems useful for addressing the mismatch between voxels and pixels.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The average and likelihood images seem to carry dataset bias, since CT datasets collected in different ways differ in age distribution and other characteristics.
    2. Experiments with more X-ray views could be conducted to verify that the model is not limited to the single-view setting.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Improve the introduced anatomical prior. Rather than simply computing the average image and the likelihood image, clinical knowledge is needed to build a template that provides general anatomical knowledge without much bias.
    2. The framework does not seem to be limited to single-view CT reconstruction. Since a single image provides very limited information, I wonder whether the current framework can be extended to reconstruction with more views.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As mentioned, the likelihood image and the average image are computed from the current dataset. Although this differs from a template-based method, the two images carry bias from the dataset used (for example, across different age groups), which makes the method hard to apply to out-of-domain CT imaging.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns, so I will raise my score.



Review #2

  • Please describe the contribution of the paper

    The authors propose a method for reconstructing CT volumes from a single projection. The method utilizes neural radiance fields (NeRF) and incorporates prior anatomical knowledge via an average CT volume and log likelihood projection. Furthermore, it employs data consistency between the original and projected X-ray image from the generated volume. The approach outperforms other state-of-the-art methods and the authors demonstrate the validity of the method components through an ablation study.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is interesting and leveraging prior anatomical information enriches the method’s depth and context. The paper is well-structured which enhances readability. Furthermore, the authors provide accessible code and utilize a publicly available dataset.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The related work section appears to be incomplete; for instance, it lacks references for CT reconstruction with multiple views. Closely related work is also missing, such as [1] and [2]. Furthermore, the visual results of the comparison methods appear inadequate and should be improved for a fairer evaluation.
    [1] MedNeRF: Medical Neural Radiance Fields for Reconstructing 3D-aware CT-Projections from a Single X-ray https://arxiv.org/abs/2202.01020
    [2] NeAT: Neural Adaptive Tomography https://arxiv.org/abs/2202.02171

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    • Properties of Light and X-ray: The authors say: “Considering the similarity between the natural light imaging process and the Xray imaging process, migrating the NeRF model to CT reconstruction problems appears feasible.” Please also discuss the significant differences between natural light and X-ray imaging processes, and address any necessary adaptations needed for the NeRF model in the context of X-ray imaging.
    • Depth Information in X-ray: The authors speak about the “depth information in an x-ray”. Please clarify that there is no inherent depth information in a single projection X-ray image due to the image formation process, to avoid misunderstanding among readers.
    • Related Work: Include references for CT reconstruction with multiple views and provide a brief explanation of NeRFs for readers unfamiliar with the concept. Please also look at the major weaknesses for references to two closely related publications that should be mentioned.
    • NeRF in Methodology: Provide more information about NeRFs, particularly concerning Figure 1b, to help readers understand which part of the figure corresponds to the NeRF model.
    • Single projection view: The rationale behind the method’s use of a single projection view NeRF approach needs clarification. Since NeRF allows for the generation of multiple images without requiring knowledge of relative geometry, it’s important to discuss why the single projection view approach was chosen over utilizing multiple projections. Addressing this point would enhance understanding of the method’s design and its implications for the reconstruction process.
    • Figure 1: Explain Figure 1a (and b) more thoroughly and reference it in the text. Please also explain the term “modulate”.
    • Average CTs: Clarify whether the average CTs are registered before use.
    • Edges: Please explain how the “edges of the CT images” are defined and how the loss is computed.
    • Fusion Block: Define the fusion block mentioned in the text but not depicted in Figure 1.
    • Voxel Size and Cropping: Clarification is needed regarding the statement “the area with a resolution of 128 × 128 × 128 is cropped from the center of each scan.” Is the resolution specified in millimeters or voxels? Additionally, it would be beneficial to explain why only a cropped area with a resolution of 128 × 128 × 128 is used instead of utilizing the entire image.
    • Figure 3: Clarify the caption and text for Figure 3, including the purpose of the surface plot and the content of the second row. Maybe (b) and (c) could be their own figure?
    • Comparison methods: The results of the comparison methods in Figure 3 do not meet the expected level of refinement. When compared to reconstructions from original works, the results from the proposed method appear blurry. This discrepancy raises concerns about the fidelity and quality of the reconstructions produced by the proposed method.
    • Ablation Study: Please specify the “major components” included in the ablation study by naming them in the text.
    • Coordinate Offset Visualization: Please improve the clarity of the visualization of planes and provide a more detailed description of the experiment in the text. The experiment is not reproducible as is.
    • Likelihood Images: Please describe the experiment and results of the analysis of the likelihood images more clearly. Clarify the shape of the distribution observed in the likelihood images and discuss why the average image may not adequately represent the dataset. Additionally, explain the concept of the two classes (far/near) and their relevance to the likelihood images. Provide insight into how these classes are utilized in the analysis and interpretation of the likelihood images to ensure a thorough understanding of the results.
    • Table 1 and 2: Include standard deviations for PSNR and SSIM in Tables 1 and 2 for a more comprehensive presentation of the results.
    • Perception Loss: Consider incorporating a perception loss, similar to LPIPS error, for improved evaluation of the results.
    • X-ray Images: Address whether the X-ray images are consistently taken from the same view or if testing is conducted with different angles to evaluate the method’s robustness. Additionally, discuss the implications of using synthetic X-rays instead of real X-rays and any potential limitations associated with this choice. Providing this information would offer a clearer understanding of the experimental setup and the method’s applicability in real-world scenarios.
    • Clinical Applications: Provide a more extensive discussion of the clinical applications of the proposed method, including why it relies on a single projection and addressing any concerns about the use of synthetic X-rays.
    • Tumor Reconstruction: It would be interesting to address what happens with tumors in the reconstructed images, as the mean image does not seem to include them.
    • Volumetric Rendering: The term “rendering” does not accurately describe the process of a forward projection in this context. Instead, consider using the term “projection” to better describe the transformation of volumetric data into X-ray images.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an intriguing idea with promising potential. However, there are areas where organizational improvements and clarifications are necessary, particularly in the related work and experiments sections.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    After reviewing the rebuttal and the comments from the other reviewers, I am keeping my weak accept.



Review #3

  • Please describe the contribution of the paper

    The work presented VolumeNeRF, a model that adopts NeRF in the domain of X-ray imaging to reconstruct 3D CT volumes from a single projection view. During training, the network learns to generate a continuous representation of the CT scan conditioned on the input X-ray image and render an X-ray image similar to the input from the same viewpoint as the input.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Clear writing.
    2. Prior Anatomical Knowledge Incorporation is interesting.
    3. In Fig. 3, the 3D view looks good.
    4. Clear motivation statement for the paper.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. A large number of published methods learn the correspondence between measurements and the 3D volume, for example:
       [1] Lin Y, Luo Z, Zhao W, et al. Learning deep intensity field for extremely sparse-view CBCT reconstruction[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023: 13-23.
       [2] Wang Q, Wang Z, Genova K, et al. Ibrnet: Learning multi-view image-based rendering[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 4690-4699.
       This work is not compared with these methods. Because this is an MIC paper, the superiority of the method needs to be demonstrated.

    2. Is µ_p a real number or an n-dimensional vector (n greater than 2)? The paper should specifically explain how µ and σ are calculated.

    3. A comparison with other NeRF-based methods is needed:
       [3] Zha R, Zhang Y, Li H. Naf: Neural attenuation fields for sparse-view cbct reconstruction[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2022: 442-452.
       [4] Corona-Figueroa A, Frawley J, Bond-Taylor S, et al. Mednerf: Medical neural radiance fields for reconstructing 3d-aware ct-projections from a single x-ray[C]//2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2022: 3843-3848.

    4. Because this paper is a supervised learning method, it needs to be compared with the classic supervised learning method:
       [5] Jin K H, McCann M T, Froustey E, et al. Deep convolutional neural network for inverse problems in imaging[J]. IEEE transactions on image processing, 2017, 26(9): 4509-4522.

    5. Tables 1 and 2 should report the mean and variance, because neural network training depends on hyperparameters, and it is difficult to judge the effectiveness and final performance of the ablation experiment from a single reported result.
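
The request for a mean and a spread in Tables 1 and 2 could be met along the lines of the sketch below. It computes PSNR for each case (or each training run) and summarizes the scores with the sample mean and standard deviation. The function names, the flattened-sequence input, and the `data_range` default are illustrative assumptions, not details taken from the paper.

```python
import math
import statistics

def psnr(reference, reconstruction, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally long sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(data_range ** 2 / mse)

def summarize(scores):
    """Return (mean, sample standard deviation) of a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical per-case scores, used only to show the reporting format.
mean_psnr, std_psnr = summarize([20.0, 22.0, 24.0])
report = f"{mean_psnr:.2f} ± {std_psnr:.2f} dB"
```

The same `summarize` helper applies to SSIM or to scores collected over repeated training runs with different seeds.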

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It is necessary to focus on explaining Prior Anatomical Knowledge Incorporation. Please give more explanation.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Good writing and the Prior Anatomical Knowledge Incorporation is interesting, but it lacks explanation.

    2. No comparison with some existing methods.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The author’s feedback explained some of my doubts, so I raised my rating.




Author Feedback

We thank the reviewers for their insightful and positive feedback, e.g., clear writing (R1&R3&R4); useful modules (R1); intriguing idea with promising potential (R3); interesting modules, clear motivations (R4). Below, we will address four main comments.

  1. Further explanations on the anatomical prior (R3&R4). To represent the commonality of anatomical structures among various patients, we perform deformable registration to align all resampled CT volumes and calculate the average value of CT volumes in the training set. Then we use likelihood images to quantify the differences between individuals and the population for X-ray images. Specifically, we assume the intensity of each pixel obeys a one-dimensional Gaussian distribution. To compute the parameters, we apply maximum likelihood estimation to fit the training set data to the corresponding distribution. During CT reconstruction for a patient, the negative log-likelihood value for each pixel is computed and forms a likelihood image. If a pixel deviates from the prior distribution, its negative log-likelihood value increases, indicating substantial differences between the voxels (situated on the ray connecting the X-ray source and the pixel) of the patient and the average. Thus, the model can capture discrepancies between the target and the average and reconstruct CT volumes from the mean CT. To analyze the impact of our module, patients are divided into two groups based on the average of their likelihood images. PCA is used to visualize the 3D feature maps from the final layer of the 3D encoder. Based on our previous analysis, it is expected that the CT volume of patients with a low likelihood-image average should approximate the mean CT. Fig. 3(c) demonstrates that patients with a low average (near class) exhibit semantic consistency with the average CT, whereas others (far class) show inconsistency, as expected.
  2. The design of the anatomical prior (R1&R3). Thanks for the suggestions. While we acknowledge that the anatomical prior may theoretically introduce dataset bias, it is noteworthy that the dataset we used comprises 1,018 volumes, including various types of patients and thus generally representing the population. We also evaluate the performance of the model on an external validation set (iCTCF dataset containing 1,521 CT volumes) and find the results decline by only 1.8%, indicating small bias. For future work, we will collect diverse cases and integrate an anatomy atlas into our model to provide prior knowledge without introducing bias. Additionally, we will design more statistical measures to better represent the population. For example, variances can be used to characterize the variation range of each voxel and the confidence level of the corresponding mean.
  3. Multiple-view CT reconstruction (R1&R3). Given the clinical rarity of paired multiple projection view data, we designed VolumeNeRF, which can directly reconstruct CT volumes from medical X-ray machine outputs to serve clinical purposes. Our method can be further applied for multiple-view reconstruction. Specifically, we calculate distribution parameters and generate the corresponding likelihood image for each view. The X-ray and the likelihood images for each view are concatenated to form the input, and the edge and render loss are applied to all views. We find that the performance of the model improves by 7.0% and 14.3% when using 2 and 5 views, respectively.
  4. Comprehensive comparisons (R3&R4). We find that our VolumeNeRF performs on average 13.3% better than the following methods: MedNeRF [Corona-Figueroa A et al., EMBC2022], DIF-Net [Lin Y et al., MICCAI2023], NAF [Zha R et al., MICCAI2022] and FBPConvNet [Jin K H et al., TIP2017]. The first method focuses on new-view synthesis without 3D supervision, while the last three typically require multiple views, resulting in poor performance.
  5. To improve clarity, we will adopt the suggestions from the reviewers concerning both the related work and results sections.
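
The likelihood-image construction described in item 1 can be sketched as follows, assuming each pixel follows an independent one-dimensional Gaussian whose mean and standard deviation are fitted by maximum likelihood over the training X-rays. The function names, the nested-list image layout, and the `min_sigma` floor are hypothetical choices for illustration and are not taken from the authors' code.

```python
import math

def fit_pixel_gaussians(training_xrays, min_sigma=1e-6):
    """Maximum likelihood mean and std for every pixel across training images."""
    n = len(training_xrays)
    rows, cols = len(training_xrays[0]), len(training_xrays[0][0])
    mean = [[sum(img[r][c] for img in training_xrays) / n for c in range(cols)]
            for r in range(rows)]
    sigma = [[max(math.sqrt(sum((img[r][c] - mean[r][c]) ** 2
                                for img in training_xrays) / n), min_sigma)
              for c in range(cols)]
             for r in range(rows)]
    return mean, sigma

def likelihood_image(xray, mean, sigma):
    """Per-pixel negative log-likelihood of xray under N(mean, sigma^2)."""
    return [[0.5 * math.log(2 * math.pi * s * s) + (x - m) ** 2 / (2 * s * s)
             for x, m, s in zip(x_row, m_row, s_row)]
            for x_row, m_row, s_row in zip(xray, mean, sigma)]

def image_average(image):
    """Mean value of a 2D image, usable for a near/far grouping."""
    values = [v for row in image for v in row]
    return sum(values) / len(values)

# Two tiny 1x2 training images and one test image.
mean, sigma = fit_pixel_gaussians([[[0.0, 2.0]], [[2.0, 4.0]]])
nll = likelihood_image([[1.0, 5.0]], mean, sigma)
```

In this toy example the second pixel lies two standard deviations from its fitted mean, so its negative log-likelihood is larger than that of the first pixel, which sits exactly at the mean. How the near and far groups are thresholded on `image_average` is not stated in the rebuttal and would need to be chosen separately.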




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents VolumeNeRF, a novel method for reconstructing CT volumes from a single X-ray projection using neural radiance fields (NeRF). Reviewers commended the clear writing, innovative use of prior anatomical knowledge, and strong evaluation. The authors’ rebuttal effectively addressed key concerns, such as potential dataset bias and the method’s applicability to multiple views. Additionally, comprehensive comparisons demonstrated the method’s superior performance. Given the paper’s significant contributions to CT reconstruction and the authors’ thorough responses to critiques, this paper should be accepted for presentation at MICCAI 2024.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



