Abstract

Cone Beam Computed Tomography (CBCT) is widely used in medical imaging. However, the limited number and intensity of X-ray projections make reconstruction an ill-posed problem with severe artifacts. NeRF-based methods have achieved great success in this task. However, they suffer from a local-global training mismatch between their two key components: the hash encoder and the neural network. Specifically, in each training step, only a subset of the hash encoder’s parameters is used (local sparse), whereas all parameters in the neural network participate (global dense). Consequently, hash features generated in each step are highly misaligned, as they come from different subsets of the hash encoder. These misaligned features from different training steps are then fed into the neural network, causing repeated inconsistent updates, which leads to unstable training, slower convergence, and degraded reconstruction quality. Aiming to alleviate the impact of this local-global optimization mismatch, we introduce a Normalized Hash Encoder, which enhances feature consistency and mitigates the mismatch. Additionally, we propose a Mapping Consistency Initialization (MCI) strategy that initializes the neural network before training by leveraging the global mapping property from a well-trained model. The initialized neural network exhibits improved early training stability, faster convergence, and enhanced reconstruction performance. Our method is simple yet effective, requiring only a few lines of code, while substantially improving training efficiency on 128 CT cases from 4 different datasets covering 7 distinct anatomical regions. Code: https://github.com/iddifficult/NI_NeRF.
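As a rough illustration of the Normalized Hash Encoder idea described above, the sketch below applies a LayerNorm to the hash-grid features before they enter the MLP. This is only a minimal PyTorch sketch under our own assumptions; the class name, feature dimensions, and the hash-encoder module it wraps are hypothetical placeholders, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn as nn


class NormalizedHashField(nn.Module):
    """Sketch of a NeRF-style attenuation field whose hash features are
    layer-normalized before entering the MLP. Shapes and names are
    placeholders, not the paper's actual architecture."""

    def __init__(self, hash_encoder: nn.Module, feat_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.encoder = hash_encoder            # e.g. a multi-resolution hash grid
        self.norm = nn.LayerNorm(feat_dim)     # enforce a shared mean/variance per feature vector
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),              # predicted attenuation coefficient
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(xyz)              # (N, feat_dim), drawn from sparse table entries
        feats = self.norm(feats)               # align feature statistics across training steps
        return self.mlp(feats)
```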

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1077_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/iddifficult/NI_NeRF

Link to the Dataset(s)

Covid-19 dataset: https://www.nature.com/articles/s41467-022-30695-9

Pancreas-CT dataset: https://www.cancerimagingarchive.net/collection/pancreas-ct/

Han-seg dataset: https://pubmed.ncbi.nlm.nih.gov/36594372/

BibTex

@InProceedings{XuZhu_NeRFbased_MICCAI2025,
        author = { Xu, Zhuowei and Li, Han and Sun, Dai and Li, Zhicheng and Li, Yujia and Kong, Qingpeng and Cheng, Zhiwei and Navab, Nassir and Zhou, S. Kevin},
        title = { { NeRF-based CBCT Reconstruction needs Normalization and Initialization } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        page = {355 -- 365}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper describes two methods for improving CBCT reconstruction with neural fields: (1) layer normalization for the features from hash grids and (2) transfer learning using a pretrained NeRF from a different subject.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper adds some interesting components to the CBCT reconstruction pipeline that are worthy of consideration. In particular, the fact that layer normalization improves the utility of the features learned by a hash grid is insightful. However, the experiments are conducted in such a way that it is difficult to independently assess the importance of each of these components to the reconstruction pipeline.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The usefulness of the results presented by the authors is limited by the lack of a comprehensive ablation study. While I appreciate that Fig. 8 shows a single ablation study, it would have been much more useful to include these in Table 1 to see trends across multiple datasets. Additionally, a separate ablation of NAF+MCI would be illustrative.

    The description of Mapping Consistency Initialization (MCI) is insufficient. It sounds like the authors pretrain a NAF model with voxel supervision on one subject, then transfer the pretrained model to a new one. What happens if the subject used for pretraining is not representative of new patients? For example, you pretrain on an overweight man, then finetune on a new patient who is an average-sized woman. In such a situation, wouldn’t your pretrained model get trapped in a local minimum given its inductive biases?

    Furthermore, there are many papers in the broader computer vision literature that have developed more generalizable transfer learning strategies (e.g., MetaSDF [https://arxiv.org/abs/2006.09662] and pixelNeRF [https://arxiv.org/abs/2012.02190]). Specifically for CBCT reconstruction, previous papers have been published at MICCAI that also pretrain NeRFs for this task (e.g., DIF-Net [https://arxiv.org/abs/2303.06681]). If you’re going to propose a transfer learning method, you should compare against other pretrained baselines for a comprehensive evaluation.

    Also, there’s a typo in Table 1: “CMI” -> “MCI”

    In Fig. 8, it looks like the SSIM for NAF is still increasing. In my experience, NAF requires more than ~30 minutes to train (I typically train it for ~2 hours). Would training for longer have closed the SSIM gap that you illustrate?

    Recent work has shown that reconstructing CBCTs with voxel grids vastly outperforms neural field methods (e.g., DiffVox [https://arxiv.org/abs/2411.19224]). How does this baseline compare to the method you propose?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The most interesting contribution made by this paper is the suggestion to use layer normalization on the hash grid features. While this simple trick could improve the performance of NeRF-methods for CBCT reconstruction, the experiments presented in this paper are insufficient. More comprehensive evaluations and ablations on more datasets are needed to demonstrate utility.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    This paper proposes some interesting ideas for NeRF-based CBCT reconstruction (layer normalization and patient-to-patient transfer learning). However, the experiments lack the necessary rigor to support the authors’ claim that their method is superior to every neural CBCT reconstruction method that has come out in the past 3 years. Furthermore, this paper is not written very clearly, and there is ample room for a reader to misunderstand the specific methods that the authors are proposing. There are a lot of buzzwords throughout that add nothing to the discussion of the underlying method (FFT, PCA, etc.). This paper would have benefitted from a shorter but more thorough set of experiments and a more focused methods section without the extraneous visualizations.



Review #2

  • Please describe the contribution of the paper

    To mitigate the local-global training mismatch that occurs while training NeRF-based models, the authors proposed (1) a normalized hash encoder to minimize misaligned hash embeddings and (2) a mapping consistency initialization to obtain well-initialized network weights.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors have well described the limitations encountered in training NeRF-based models. They also leverage concepts from CT physics such as the Beer-Lambert law in Eq. (1) and the registration of misaligned embeddings in Section 2.3 to address each problem.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are a few things to consider carefully.

    (1) The authors proposed a normalized hash encoder to mitigate the misalignment between hash embeddings obtained from two points on a penetration line. However, the authors did not verify that this misalignment actually changes.

    • Is the misalignment mitigated by the normalized hash encoder?

    (2) In Section 2.3, titled ‘Normalized Hash Encoder’, the authors mention FFT in the last paragraph. However, I cannot find any part related to the FFT and/or the channel level of the hash features.

    • Please, the authors should clarify why they used this keyword.

    (3) In Section 3.3, titled ‘Ablation study’, the authors stated: “We also test our method performance on SAX-NeRF, which imporves SSIM from 0.921 to 0.932.” However, this sentence is not relevant to the ablation study.

    • Please, the authors should clarify why they used this sentence.

    (4) Although Fig. 7 is not cited anywhere in the paper, the authors still included it.

    • Please, the authors should delete any content not cited in the paper.

    (5) Table 1 shows the PSNR/SSIM performance of the various methods. However, the color highlighting is wrong for the results related to SAX-NeRF.

    • Please, the authors should carefully check and correct it accurately.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    -

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors clearly defined the limitations of previous studies and designed a proposed method to solve the problem based on correlated domain knowledge. However, some validations were missing.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper addresses the issue of local-global training mismatch between the hash encoder and neural network in conventional NeRF models used for CBCT reconstruction. To overcome this problem, this paper proposes two techniques. The first is the Normalized Hash Encoder, which applies layer normalization to the outputs of the hash table in order to maintain a unified global mean and variance across hash encodings. The second is the Mapping Consistency Initialization strategy, which initializes a new neural network using a network pre-trained on medical volumes. To validate these methods, this paper presents experimental results comparing their approach with state-of-the-art models.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    One of the major strengths of this paper lies in its novel focus on the interaction between hash encoding and neural networks within NeRF-based models, a perspective that has rarely been explored in previous work. This shift in attention from sampling strategies to the encoder-network relationship presents a fresh and insightful direction in the field.

    Additionally, the proposed method demonstrates strong empirical performance, outperforming recent state-of-the-art approaches such as R2-Gaussian in the context of CBCT reconstruction. This improvement suggests the practical effectiveness and relevance of the method, especially in a challenging and clinically meaningful task.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    While the figures suggest clear performance improvements, the manuscript lacks sufficient narrative explanation and interpretation of the results. In particular, the analysis following the segment quality performance section feels overly terse and difficult to follow. The paper would benefit from a more detailed discussion of what the results imply, why the improvements matter, and under what conditions they occur. For the sake of clarity and scientific rigor, the authors should more thoroughly explain the key findings and provide better contextualization within the main text.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • The graphs in the paper appear to lack labeled axes, which can be confusing for readers. Please clarify what each axis represents.

    • In Figure 5, what does the reported time refer to? Is it the average training time or the reconstruction time for a specific volume?

    • Figure 7 is not referenced anywhere in the text.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Previous NeRF-based methods have typically focused on improvements in terms of sampling strategies. However, to the best of my knowledge, this is the first paper that specifically investigates the relationship between hash encoding and neural networks. The Normalized Hash Encoder component appears to be a meaningful improvement to NeRF models that could potentially generalize beyond the medical domain to natural images as well. Moreover, the results, which include comparisons with recent state-of-the-art methods such as R2Gaussian—previously one of the strongest models for CBCT reconstruction—suggest that the proposed approach is indeed effective.

    However, the Experiments section leaves several things to be desired. While the figures indicate performance improvements, the manuscript lacks sufficient descriptive explanations and interpretations of the results. In particular, from the segment quality performance section onward, the interpretation of results feels rather sparse and uninformative. A deeper discussion of the implications of the improvements, the conditions under which they occur, and their significance is needed. This is essential for both clarity and scientific rigor. The authors should clearly articulate these points and ensure the main text provides sufficient context.

    In summary, while the novelty of the proposed methods and the empirical results are convincing, the manuscript would benefit greatly from more detailed analysis and interpretation of the experimental findings.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have adequately addressed my concerns, and I have decided to recommend acceptance.




Author Feedback

Code: We will release the code and datasets to the public.

R1.1 Misalignment mitigation: Thanks. To measure it, we use a variance-based metric that measures the variance difference among the features of all the points from one case. During training, the metric (0.75e-2) increases to 5e-2 without LN, but drops to ~0 with LN, confirming LN’s effectiveness.
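The rebuttal does not give an exact formula for this variance-based metric, so the following is only a hedged sketch of one plausible reading (per-channel variance over all sampled points of a case, then the spread of those variances). The function name, shapes, and the commented usage are our own assumptions, not the authors' actual measurement code.

```python
import torch


def feature_variance_spread(features: torch.Tensor) -> torch.Tensor:
    """One possible reading of the variance-based misalignment metric:
    how unevenly the per-channel variances are spread across channels.
    `features` holds hash-encoder outputs for all sampled points of one
    case, with shape (num_points, feat_dim)."""
    per_channel_var = features.var(dim=0)   # variance of each channel over all points
    return per_channel_var.var()            # spread of those per-channel variances

# Hypothetical usage: compare the metric before and after layer normalization.
# feats = hash_encoder(points)                                              # (N, C)
# print(feature_variance_spread(feats))                                     # without LN
# print(feature_variance_spread(
#     torch.nn.functional.layer_norm(feats, feats.shape[-1:])))             # with LN
```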

R1.2 FFT usage: We apologize for this. The FFT operation is used solely to reduce model size and has no impact on reconstruction speed or quality. We will clarify this.

R1.3 Why mention SAX-NeRF in the ablation: Sorry, our intention was to demonstrate that our method generalizes well to hash-encoded NeRF variants like SAX-NeRF while maintaining strong performance. We will fix it.

R2.1 Ablation of LN on more datasets & ablation of MCI: These results were omitted due to page limits and are now included in the paper. NAF/NAF+LN/NAF+MCI achieve, respectively: Covid-19 dataset, 26.12/26.61/26.60 PSNR and 0.7149/0.7290/0.7288 SSIM; Pancreas_CT [18], 33.53/35.04/34.45 PSNR and 0.8961/0.9149/0.9056 SSIM; GS_data, 32.52/36.37/36.47 PSNR and 0.8810/0.9200/0.9217 SSIM; HAN_seg [17], 34.09/34.51/34.38 PSNR and 0.9533/0.9555/0.9552 SSIM. Additionally, the NAF/NAF+MCI ablation yields 0.862/0.876 SSIM. These results consistently support the effectiveness of our method.

R2.2 Comparison with transfer learning methods: Sorry, our approach is NOT transfer learning at all. We simply provide a better initialization for the MLP, because we found that the MLP consistently learns similar mappings across different cases (Sec. 2.4). It is more accurately described as meta-initialization, not as transferring weights from one scene to another. We will remove the word “transfer”. In contrast, the methods you mention, e.g., DIF-Net, are true transfer learning: they train on a set of scenes and then directly apply the frozen parameters to new ones. They perform well under extremely sparse views (10 views), but perform worse under the more common sparse-view setting (e.g., 50 views), achieving 29.31 PSNR, significantly below 3DGS (31.96) and NAF (32.67). This is likely because they lack case-specific retraining. As for MetaSDF, we think the reviewer offers a wrong reference, as it is a signed distance function method, not a view-synthesis method.

R2.3 Potential bias introduced by MCI: We believe this concern stems from a misunderstanding that our MCI involves transfer learning. As discussed above, MCI does not transfer knowledge from a specific case, but leverages the similar functional mapping that the MLP inherently learns across different cases (regardless of body shape, gender, or anatomical region). To further validate this, we pretrain the MLP using diverse images (fat/thin, male/female, chest/abdomen) and evaluate on the same test case. Results show no significant PSNR difference: 34.12 vs. 33.99; 34.08 vs. 34.07; and 33.97 vs. 34.01. Notably, all of our original experiments use one abdominal case for pretraining, yet it consistently improves performance across head, chest, and abdominal cases. This also proves that MCI does not introduce case-specific bias.
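A minimal sketch of how such an MLP-only initialization could look, under our reading of the rebuttal: copy the pretrained MLP weights into a fresh per-case model, while the hash encoder keeps its fresh per-case initialization. The attribute name `mlp` and the checkpoint layout are hypothetical, not taken from the authors' code.

```python
import torch


def mapping_consistency_init(model, pretrained_ckpt: str):
    """Sketch of MCI as described in the rebuttal: reuse only the MLP
    weights from a previously trained model; the hash encoder of the new
    per-case model stays freshly initialized. Assumes the checkpoint is a
    plain state_dict whose MLP keys are prefixed with 'mlp.'."""
    state = torch.load(pretrained_ckpt, map_location="cpu")
    mlp_state = {k.replace("mlp.", "", 1): v
                 for k, v in state.items() if k.startswith("mlp.")}
    model.mlp.load_state_dict(mlp_state)   # inject the shared global mapping
    return model                           # encoder parameters remain untouched
```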

R2.4 Does increasing the training duration improve NAF performance? Sorry, we actually trained NAF for e = 32,000 epochs (>4 h) but omitted some of these results. During the full training, PSNR peaked at e = 3,500 (31.31), which is nearly identical to the value at e = 3,000 (31.29). Therefore, the misalignment issue cannot be resolved through extended optimization.

R2.5 Comparison with DiffVox: We evaluated DiffVox on abdominal scans; it yielded a lower SSIM (0.84) than ours (0.89).

R3.1 Clarification on segmentation quality evaluation: Sorry, we aim to evaluate the quality of anatomical structure reconstruction using AI methods (motivated by [Maisi, arXiv:2409.11169 (2024)]) rather than human experts. Specifically, we use TotalSegmentator to obtain segmentation masks for both the ground-truth and the reconstructed CTs, and then compute Dice scores between them to represent the quality.
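For reference, the Dice computation itself is standard; a minimal sketch is given below, assuming binary organ masks have already been produced (e.g., by running TotalSegmentator on the ground-truth and reconstructed volumes, which is omitted here).

```python
import numpy as np


def dice_score(mask_gt: np.ndarray, mask_pred: np.ndarray) -> float:
    """Dice overlap between two binary organ masks, e.g. masks produced by
    TotalSegmentator on the ground-truth and reconstructed CT volumes."""
    gt, pred = mask_gt.astype(bool), mask_pred.astype(bool)
    intersection = np.logical_and(gt, pred).sum()
    denom = gt.sum() + pred.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0
```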

R3.2 Time annotation in Fig. 5: Sorry, the time refers to the average convergence duration.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The paper has received divergent evaluations from the three reviewers, leading to a wide range of scores. The authors are therefore encouraged to submit a rebuttal to clarify and support their work. Key issues include the limited comparative analysis with NeRF-related methods in the broader computer vision field, as well as the need for a more thorough ablation study. Additionally, reproducibility should be enhanced, ideally through the release of source code or datasets. Given the constrained space for the rebuttal, the authors are advised to focus on the most significant concerns highlighted in the reviews to ensure a concise and meaningful response.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    After reading the rebuttal and reviewer comments, I recommend accepting this paper. However, the clarity of the paper should be improved.


