Abstract

Existing implicit 3D reconstruction methods utilizing NeRF and its variants for internal CT often overlook anatomical priors of target objects, limiting accuracy in ultra-sparse view scenarios. We present TP-INR, a novel framework that leverages sparse-view projections to generate high-quality anatomical priors for structural encoding of objects. By combining prior-based structural encoding with positional encoding, TP-INR enhances implicit representations for precise CT reconstruction with minimal supervision in these challenging conditions. Additionally, we tailor the implicit framework for medical applications through refined network design and adaptive ray-based training, improving both accuracy and efficiency. Experimental results across various organ regions demonstrate that TP-INR outperforms state-of-the-art methods in reconstruction quality and efficiency, relying solely on projection data. Code is available upon request.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2213_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{CaoQin_Target_MICCAI2025,
        author = {Cao, Qinglei and Tang, Ziyao and Tang, Xiaoqin},
        title = {{Target Prior-enriched Implicit 3D CT Reconstruction with Adaptive Ray Sampling}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {600--609}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper describes a sparse-view CBCT reconstruction method based on implicit neural representation (INR). Compared with previous INR methods, TP-INR, as developed in this study, customizes the optimization towards challenging rays with relatively large losses. In addition to conventional spatial encoding of INRs, it also incorporates a target prior structure encoder to generate structural encoding. The structural and spatial encodings are combined to feed into the INR framework to yield voxel outputs for iterative reconstruction.
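To make the combined encoding concrete, here is a minimal sketch of how a spatial (Fourier) encoding and a structural encoding could be concatenated into one input vector for an INR. It is an illustration only, not the paper's TPSE: the names `fourier_encode`, `sample_prior`, and `encode_point` are hypothetical, the prior is a tiny hand-made grid, and the structural code is assumed to be a single trilinearly interpolated prior value rather than a learned feature.

```python
import math

def fourier_encode(p, num_freqs=4):
    """NeRF-style Fourier features for a point p with coordinates in [0, 1]."""
    feats = []
    for k in range(num_freqs):
        w = (2.0 ** k) * math.pi
        for c in p:
            feats.extend((math.sin(w * c), math.cos(w * c)))
    return feats

def sample_prior(prior, p):
    """Trilinearly interpolate prior[z][y][x] at p = (x, y, z) in [0, 1]^3.

    Assumes every grid axis has at least two samples.
    """
    nz, ny, nx = len(prior), len(prior[0]), len(prior[0][0])
    fx, fy, fz = p[0] * (nx - 1), p[1] * (ny - 1), p[2] * (nz - 1)
    x0, y0, z0 = min(int(fx), nx - 2), min(int(fy), ny - 2), min(int(fz), nz - 2)
    tx, ty, tz = fx - x0, fy - y0, fz - z0
    value = 0.0
    for dz, wz in ((0, 1 - tz), (1, tz)):
        for dy, wy in ((0, 1 - ty), (1, ty)):
            for dx, wx in ((0, 1 - tx), (1, tx)):
                value += wz * wy * wx * prior[z0 + dz][y0 + dy][x0 + dx]
    return value

def encode_point(prior, p, num_freqs=4):
    """Concatenate the spatial encoding with the (assumed scalar) structural code."""
    return fourier_encode(p, num_freqs) + [sample_prior(prior, p)]

# Hypothetical 2x2x2 prior whose value equals the x coordinate.
prior = [[[0.0, 1.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]
features = encode_point(prior, (0.25, 0.5, 0.5))
```

With 4 frequencies and 3 coordinates, the spatial part contributes 24 values and the structural part adds one more; an MLP would then map this vector to an attenuation value.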

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The customization of the optimization towards challenging rays is novel.
    2. The use of structure prior encoding is novel.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. There is no ablation study to investigate the efficacy and pros/cons of the optimization focus strategy (for the challenging rays). For some rays, a relatively large loss may be caused by the highly attenuating materials (like bones) the rays pass through, not necessarily by a (relatively) worse reconstruction. It is unclear how this factor may affect the reconstruction accuracy of the proposed strategy.
    2. A thorough ablation study on the focused ray strategy, the use of the target prior, and the use of local variation regularization is needed.
    3. Some comparison methods, like NeRP, are reduced versions due to the removal of the prior information. Other methods, like NAF, did not use regularization (in contrast, the TP-INR method used LV regularization). Thus the true reconstruction accuracy of these comparison methods could be much higher. The lack of comparable regularization terms in the other methods biased the reconstruction comparison. Adequate regularization should be added to the other methods for a fair comparison.
    4. The target prior with local variation regularization seems like a local smoothing operator with data fidelity considered. How does it compare to the more straightforward TV regularization?
    5. With prior information removed, how can NeRP still be much better than NAF in reconstruction accuracy? They are now essentially the same method, and the only difference is the spatial encoding method being used (Fourier vs. hash).
    6. Table 2 is confusing. How is the target prior generated for methods like FDK or SART-TV?
    7. Eq. 2 is incorrect.
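The concern in item 1 can be illustrated with a small numerical sketch. Assume two rays whose reconstructed attenuation values are off by the same relative amount; the ray that crosses a denser segment still shows a larger absolute projection loss. All attenuation values, the step length, and the 5% error below are hypothetical and are not taken from the paper.

```python
def line_integral(mus, step):
    """Discrete line integral of attenuation coefficients along one ray."""
    return sum(mus) * step

def ray_losses(mus, step=1.0, rel_err=0.05):
    """Squared and relative projection error when every voxel is off by rel_err."""
    true = line_integral(mus, step)
    pred = line_integral([m * (1.0 + rel_err) for m in mus], step)
    return (pred - true) ** 2, abs(pred - true) / true

# Hypothetical rays: 100 samples of soft tissue vs. 60 soft tissue + 40 bone-like.
soft_ray = [0.02] * 100
bone_ray = [0.02] * 60 + [0.05] * 40

soft_sq, soft_rel = ray_losses(soft_ray)
bone_sq, bone_rel = ray_losses(bone_ray)
```

Both rays have the same relative error, yet a selector that ranks rays by absolute loss would prefer the bone-like ray. Whether this bias helps or hurts reconstruction is exactly what the requested ablation would need to show.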
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The study is interesting and can potentially help to improve sparse-view CBCT reconstruction. But the current study has many issues in the presented results and comparison studies, which render its scientific merit unclear.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel implicit neural representation (INR)-based method for sparse-view CBCT reconstruction. The key contribution is the target prior structural encoder module, which provides a strong structural prior to guide the network in modeling the CT volume. In addition, the adoption of an adaptive ray selector (ARS) accelerates the training process. Experiments demonstrate that the proposed method achieves good performance and superior efficiency compared to existing approaches.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper introduces a novel approach to encode structural priors for INR-based CBCT reconstruction. Specifically, it optimizes a prior grid using the discrepancy between rendered and real projections. In addition, it adopts an adaptive ray selector (ARS) to pick out challenging rays. The idea is interesting and useful, as demonstrated by the experimental results.
    2. The proposed method outperforms existing approaches in both reconstruction quality and computational efficiency.
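The idea of optimizing a prior grid from the discrepancy between rendered and measured projections can be sketched with a toy example. This is not the paper's TPE algorithm: it assumes a square 2D grid, only two parallel "views" (row sums and column sums), plain gradient descent on the squared discrepancy, and no regularization. The names `project`, `residual`, and `fit_prior` are hypothetical.

```python
def project(grid):
    """Two toy views: sums along rows and sums along columns."""
    rows = [sum(r) for r in grid]
    cols = [sum(c) for c in zip(*grid)]
    return rows, cols

def residual(grid, meas_rows, meas_cols):
    """Squared discrepancy between rendered and measured projections."""
    rows, cols = project(grid)
    return (sum((a - b) ** 2 for a, b in zip(rows, meas_rows))
            + sum((a - b) ** 2 for a, b in zip(cols, meas_cols)))

def fit_prior(meas_rows, meas_cols, iters=100):
    """Gradient descent on the discrepancy, starting from an all-zero grid."""
    n = len(meas_rows)
    prior = [[0.0] * n for _ in range(n)]
    step = 1.0 / (2 * n)  # safe step size for this two-view operator
    for _ in range(iters):
        rows, cols = project(prior)
        d_rows = [a - b for a, b in zip(rows, meas_rows)]
        d_cols = [a - b for a, b in zip(cols, meas_cols)]
        for i in range(n):
            for j in range(n):
                # Back-project the discrepancy of both rays through voxel (i, j).
                prior[i][j] -= step * (d_rows[i] + d_cols[j])
    return prior

phantom = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 2, 0], [0, 0, 0, 0]]
meas_rows, meas_cols = project(phantom)
prior = fit_prior(meas_rows, meas_cols)
```

The fitted grid reproduces the measured projections, but with only two views it need not equal the phantom; many grids share the same row and column sums. That ambiguity is why the quality of the prior depends on the number of views and on any regularization applied during its estimation.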
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The experiments were conducted on a small-scale dataset consisting of only four cases. It is recommended to evaluate the method on a larger dataset, such as the one used in SAX-NeRF, to better validate its generalizability.
    2. The paper lacks important details about the training process. In particular, it is unclear how often the TPE module (Algorithm 1) is applied—is it used in every training iteration? Additionally, the total number of training iterations is not specified.
    3. The explanation of the TPE procedure in Algorithm 1 is somewhat unclear. It appears that the prior is re-initialized from zero each time it is used. If so, it raises the question of how the prior is queried during testing.
    4. The discussion around the Adaptive Ray Selector (ARS) is insufficient. The definition of a “challenging” ray and the strategy for selecting such rays are not clearly explained. Moreover, there is no ablation study comparing ARS with random ray sampling to justify its effectiveness.
    5. The second column in Table 2, labeled “Prior/Pred,” is unclear. It would be beneficial to add more explanation regarding Table 2.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. It is recommended to unify the formatting of all tables (e.g., Table 1 and Table 2) for consistency. Additionally, placing figures or tables within paragraphs disrupts the reading flow. For example, moving Figure 5 and Table 2 to the top of the page would improve readability.
    2. I suggest reviewing the notation used in the paper. There are inconsistencies—for instance, R_s is bolded in Figure 2 but not in Section 2.1, and \hat{o} is bolded in Algorithm 1 but not in Section 2.2. Moreover, the bold formatting does not follow standard conventions (e.g., bold for vectors or matrices). It would be helpful to clarify whether this is intentional or an oversight.
    3. The TPE module takes several seconds to update (Section 2.2), which appears slower than the hash grid used in NAF. Given this, it is unclear how the proposed method achieves better efficiency. I expect some discussion regarding that.
    4. A period (.) is missing at the end of the Fig. 4 caption.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    [+] This paper presents an interesting idea to encode structural priors for INR-based CBCT reconstruction. The proposed method demonstrates improved reconstruction quality while maintaining good efficiency, as shown by the experiments. [–] The experiments are conducted on a small-scale dataset, limiting the evaluation of generalizability. [–] The paper lacks important details regarding implementation and ablation study, which may hinder readers from fully understanding the work. [–] There are minor formatting and notation inconsistencies that should be addressed for clarity.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have adequately addressed my concerns.



Review #3

  • Please describe the contribution of the paper

    The paper addresses the problem of sparse-view CT reconstruction using neural fields. Building on NeRF and its CT-specific adaptations, the authors propose a method that incorporates a Target Prior Structural Encoder—a module designed to embed structural priors relevant to CT reconstruction. The central claim is that the structural prior improves reconstruction performance in sparse-view settings by enhancing accuracy, reducing artifacts, and improving efficiency.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Delivery: The paper is well-written, well-structured, and easy to follow. The contributions are clearly stated, and the results are presented in a comprehensible and accessible manner.

    Methodology:

    1. The paper addresses the challenge of ultra-sparse view CT reconstruction in a comprehensive manner, focusing on key aspects such as accuracy, speed, and artifact reduction.
    2. The proposed approach extracts structural information from projection data prior to reconstructing the target CT volume. This enables the model to focus on task-relevant structures rather than relying on generic priors. The introduction of the Target Prior Structural Encoder (TPSE) is a novel and meaningful contribution, supported by an ablation study that demonstrates its effectiveness.

    Evaluation: The method is thoroughly evaluated against state-of-the-art CT reconstruction techniques across multiple metrics. The inclusion of clear visual examples further supports the results and makes the improvements easy to interpret.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Weaknesses:

    Clarification/Additional ablation: It is unclear whether the ablation study was conducted with or without the adaptive ray selector. Since adaptive ray selection can significantly impact both reconstruction accuracy and computational efficiency, an experiment isolating the effect of the ray selector would help disentangle its contribution from that of the structural prior. Including such a comparison would strengthen the paper’s argument for the effectiveness of the Target Prior Structural Encoder (TPSE).

    Minor Comments:

    1. The reference to NeRF contains an error: the correct last name of the first author is Mildenhall.
    2. In the introduction, the authors state that “SAX-NeRF(… )sacrifices quality due to positional encodings optimized for smaller networks.” This raises the question: could reconstruction quality be improved simply by adopting alternative positional encodings, without incorporating structural priors? A brief discussion on this possibility would be helpful.
    3. The statement in the introduction that “Although this technique [approximate priors] improves reconstruction accuracy, its practical application in engineering contexts is still limited” requires further clarification. It would strengthen the paper to explain why these limitations exist, whether due to computational cost, generalizability, or other factors.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a well-motivated and technically sound approach to the problem of ultra-sparse view CT reconstruction. It is well-written, clearly structured, and easy to follow, with clearly stated contributions and results. The proposed methodology is thoughtfully designed, combining structural priors with neural field-based reconstruction. The introduction of the Target Prior Structural Encoder (TPSE) is a novel and valuable contribution. The paper addresses not only reconstruction accuracy but also efficiency. The evaluation is thorough, with comparisons to strong baselines and multiple metrics, and the visual examples effectively illustrate the advantages of the proposed method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    While the paper could benefit from additional ablation studies, the current evaluation fits the format of the conference. Priors in the reconstruction task are an interesting topic, and the results of this paper have the potential to provide material for discussion and future work.




Author Feedback

  1. Ablation Study on ARS Strategy
     We apologize for the confusion regarding the exclusion of ARS from the ablation study. The ARS strategy, derived from hard sample mining, enhances INR learning by selectively sampling challenging rays with higher voxel projection losses (R2). Its novelty is secondary to our TPSE module and TPE algorithm. Therefore, we treated ARS as a fundamental component of our framework, akin to Fourier positional encoding and periodic activation functions (R1, R2). Our experiments show that ARS improves performance over random ray selection (R2), but it does not significantly influence the incremental gains from TPSE (R3). To present these results, we hope to revise Fig. 5 to include a detailed incremental ablation study in table format, clearly outlining the contributions of ARS and other components of our framework (R1, R2, R3).
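For readers unfamiliar with hard sample mining, the contrast between loss-based and random ray selection can be sketched as follows. This is a generic top-k selection, not the authors' implementation; the function names, the per-ray losses, and the seed are hypothetical.

```python
import heapq
import random

def select_hard_rays(losses, k):
    """Indices of the k rays with the largest per-ray loss, hardest first."""
    return heapq.nlargest(k, range(len(losses)), key=losses.__getitem__)

def select_random_rays(num_rays, k, rng):
    """Baseline: k distinct ray indices drawn uniformly at random."""
    return rng.sample(range(num_rays), k)

# Hypothetical per-ray projection losses from the previous iteration.
losses = [0.1, 0.9, 0.3, 0.7, 0.2]
hard = select_hard_rays(losses, 2)
rand = select_random_rays(len(losses), 2, random.Random(0))
```

Here `hard` is `[1, 3]`, the two rays with losses 0.9 and 0.7. Because the hard batch always carries at least as much total loss as any random batch of the same size, each gradient step concentrates on the worst-fitted rays, which is the behaviour an ablation against random sampling would quantify.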

  2. Implementation of TPE Algorithm and TPSE Module, and their Ablation Study
     The TPE algorithm generates an initial reconstruction, termed the target prior, independently of the INR learning process. This calculation occurs only once before the training or testing process, offering a task-relevant prior for structural encoding within TPSE. The INR training involves 100 epochs, each comprising 1024 iterations, where we sample the 256 most challenging rays for learning (R2). To evaluate the effectiveness of TPSE and the advantages of our TPE over alternative methods for deriving target priors from projections, we conducted ablation studies in Table 2. This table compares various prior generation methods (FDK, SART-TV, TPE (w/o LV), and TPE (w/ LV)) regarding their impact on the target prior (\hat{o}) and final reconstruction (\sigma). For instance, with 20 views, the first row lists the PSNR/SSIM values for target priors, while the second row shows the corresponding values for final reconstructions (R1, R2). Results indicate that incorporating structural encoding via TPSE consistently improves performance across all prior generation methods compared to the baseline without it. Notably, performance gains from TPSE positively correlate with the quality of the target prior, highlighting the advantages of TPE in both prior generation and reconstruction (R2). The comparison between TPE (w/o LV) and TPE (w/ LV) illustrates the value of the Local Variation (LV) term in prior estimation. We opted for LV over Total Variation (TV) since TPE updates only local regions covered by selected rays, making LV a more suitable choice for optimizing prior estimation in these contexts (R1).
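The distinction drawn between TV and a locally restricted variation term can be illustrated with a small sketch. The exact LV definition is not reproduced on this page, so the code below assumes one plausible reading: the same neighbour differences as anisotropic TV, but accumulated only at pixels touched by the currently selected rays. The function name `variation` and the `active` set are hypothetical.

```python
def variation(img, active=None):
    """Anisotropic variation of a 2D image.

    With active=None this is the usual total variation. With a set of
    (row, col) pixels, only differences anchored at those pixels are summed
    (an assumed stand-in for a 'local variation' term).
    """
    h, w = len(img), len(img[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            if active is not None and (i, j) not in active:
                continue
            if i + 1 < h:
                total += abs(img[i + 1][j] - img[i][j])
            if j + 1 < w:
                total += abs(img[i][j + 1] - img[i][j])
    return total

img = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
tv = variation(img)                      # all pixels contribute
lv = variation(img, active={(1, 1)})     # only the pixel hit by a ray
```

Under this assumption the restricted term costs time proportional to the number of updated pixels rather than the full grid, and it leaves regions not covered by the selected rays unpenalized during that update.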

  3. Fairness of Comparative Experiments
     In comparing our model to NeRP, it’s crucial to note the differences in data requirements. NeRP’s prior-based version requires a recent CT image from the same patient as the prior. In contrast, our method computes the prior directly from sparse projections using TPE, eliminating reliance on additional CT images, which may not be available (R1, R3). Therefore, comparing our model with the non-prior version of NeRP is fair, as both rely solely on projection data for reconstruction (R1). Regularization is applied only in prior generation, not during INR learning. This ensures fair comparisons, as neither our method nor others like NAF use regularization during training (R1).

  4. Model Generalization
     We appreciate Reviewer 2’s concerns about model generalization. While SAX-NeRF has shown efficacy across various industrial and medical datasets, our focus is on medical reconstruction, aligning with MICCAI’s themes. We selected four representative medical datasets (jaw, chest, abdomen, and foot) that vary in anatomical complexity, providing reasonable insights into the model’s generalization capabilities. Importantly, the INR model’s self-contained learning scheme facilitates greater generalization than traditional deep learning models. Based on these findings, we believe our model can achieve comparable performance on larger datasets, and we look forward to exploring this further (R2).




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    This paper presents TP-INR, an implicit neural representation (INR) framework for sparse-view CBCT reconstruction. The method combines a Target Prior Structural Encoder (TPSE) with an Adaptive Ray Selector (ARS) to guide optimisation toward structurally informative regions and challenging rays. The paper is motivated by the limitations of existing neural field-based reconstruction methods in data-scarce regimes, and it aims to improve both reconstruction accuracy and computational efficiency.

    The reviewers generally agree on the potential impact of the proposed method. Reviewer #3 strongly supports acceptance, emphasising the clear writing, sound methodology, and solid performance improvements over prior approaches. They highlight the novelty of TPSE and appreciate the integration of task-relevant priors to guide neural field learning. In contrast, Reviewers #1 and #2 both recommend weak rejection, citing issues with experimental clarity, completeness, and fairness.

    In particular, all reviewers express interest in the idea of incorporating structural priors into INRs. However, Reviewers #1 and #2 raise important concerns about the depth of ablation studies, especially around the ARS and TPSE modules. Reviewer #1 also questions the fairness of comparisons to baseline methods, noting the lack of consistent regularisation across models. Reviewer #2 adds that the paper lacks critical training details (e.g., schedule for TPSE updates) and is limited by a small-scale dataset. While Reviewer #3 finds the evaluation adequate, they also note that disentangling the effect of ARS from TPSE would further strengthen the claims.

    Given the technical potential of the proposed contributions, but also the number of open questions regarding methodological evaluation, implementation clarity, and fairness of comparisons, this AC is recommending rebuttal. The authors are encouraged to address the concerns raised regarding ablations, comparative regularisation, and implementation clarity, as well as to provide further discussion on the generalisability and limitations of the current experimental setup.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents a promising and well-motivated advance in implicit CT reconstruction, particularly for sparse-view and low-data scenarios.

    While some reviewers raised valid concerns on comparative fairness and experimental details, the methodological innovation and potential impact are clear. After the rebuttal, two reviewers recommend acceptance.

    I vote for Accept considering its methodological novelty and application impact.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


