Abstract

Cone-Beam Computed Tomography (CBCT) is an indispensable technique in medical imaging, yet the associated radiation exposure raises concerns in clinical practice. To mitigate these risks, sparse-view reconstruction has emerged as an essential research direction, aiming to reduce the radiation dose by utilizing fewer projections for CT reconstruction. Although implicit neural representations have been introduced for sparse-view CBCT reconstruction, existing methods primarily focus on local 2D features queried from sparse projections, which is insufficient for more complicated anatomical structures, such as the chest. To this end, we propose a novel reconstruction framework, namely DIF-Gaussian, which leverages 3D Gaussians to represent the feature distribution in the 3D space, offering additional 3D spatial information to facilitate the estimation of attenuation coefficients. Furthermore, we incorporate test-time optimization during inference to further improve the generalization capability of the model. We evaluate DIF-Gaussian on two public datasets, showing significantly better reconstruction performance than previous state-of-the-art methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0250_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0250_supp.pdf

Link to the Code Repository

https://github.com/xmed-lab/DIF-Gaussian

Link to the Dataset(s)

https://luna16.grand-challenge.org/Data/

https://ditto.ing.unimore.it/toothfairy/

BibTex

@InProceedings{Lin_Learning_MICCAI2024,
        author = { Lin, Yiqun and Wang, Hualiang and Chen, Jixiang and Li, Xiaomeng},
        title = { { Learning 3D Gaussians for Extremely Sparse-View Cone-Beam CT Reconstruction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes learned 3D Gaussians for sparse-view cone-beam CT reconstruction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is employed for 3D CBCT reconstruction, which presents a more attractive approach. The use of learned 3D Gaussians represents a novel attempt in CBCT reconstruction.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors applied this method to a 6-view CT reconstruction, which is not feasible in clinical settings. Although the results of the proposed method exhibit superior quality, the significant loss of detail renders its clinical application meaningless.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. The authors should select a more reasonable number of projection views for clinical relevance.

    2. The current methodology is not sufficiently clear. It would be beneficial for the authors to provide a more detailed explanation to enhance readability and clarify the training and utilization of 3D Gaussians.

    3. 3D Gaussians represent a novel concept, and the authors need to explain the advantages of using 3D Gaussians compared to traditional CNNs.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clinical value of the reconstruction.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The number of views remains clinically irrelevant. Although the authors suggest its potential use for registration during surgery, numerous feature points are often lost in the reconstructions, resulting in significant challenges for practical application.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel supervised-learning method for extremely sparse view CBCT reconstruction. Specifically, this paper represents a 3D feature distribution with a set of Gaussian kernels. The intensity of a point can be queried with features from 2D X-ray projections and features from 3D Gaussians. Besides, this paper proposes test-time optimization during inference to further improve the reconstruction quality. Experiments on two large-scale CT datasets show that the proposed method achieves the best results compared with baseline methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Compared with the baseline work (DIF-Net), which used only 2D features from X-ray projections, this paper combines 2D features with 3D spatial information by representing a feature field with a set of 3D Gaussian kernels. This explicit representation improves 3D awareness and consistency for network training. This paper also proposes a test-time optimization (TTO) strategy to further improve the reconstruction quality during inference. This is done by rendering X-ray projections from the queried CT volume and minimizing the difference between the rendered and the actually captured projections. Experiments show that the proposed method achieves better reconstruction performance than SOTA methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks details on implementing test-time optimization (TTO), such as the number of iterations and rays per iteration. These details are pivotal for understanding TTO’s efficiency and effectiveness. More explanation and discussion of experiments (especially explanation of tables) are expected. For instance, a clearer delineation between the training and inference phases in Table 2 would enhance clarity. It remains ambiguous whether TTO should be classified as part of the training or inference process. Besides, efficiency analysis is also expected in ablation study on TTO since it is generally considered time-consuming. The use of “GS” (presumably denoting Gaussian splatting) in the method name “DIF-GS” appears misleading, since the proposed work primarily borrows the idea of Gaussian representation but does not use the splatting techniques.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No additional comments on reproducibility

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper proposes a novel supervised method for extremely sparse-view CBCT reconstruction. The idea of representing 3D features with Gaussians is inspiring, and the experiments show the superior performance of the proposed solution over baseline methods. Despite these strengths, there are several weaknesses, primarily in the experiment section, that need to be addressed.

    Details of TTO. The implementation of test-time optimization (TTO) is not discussed. For example, since TTO follows ray-based optimization like NeRF, what is the number of iterations for TTO and the number of rays used per iteration?

    TTO and inference. Technically, TTO should be included in the inference. According to Table 2, the proposed method takes only 1.8 seconds for inference. This seems too short for a ray-based optimization. As a reference, training a NeRF takes at least 10 minutes. More details on the implementation of TTO and an explanation of inference/training would definitely help the reader understand the work.

    Name of the method. While the method is termed “DIF-GS,” the use of “GS” (presumably denoting Gaussian splatting) appears somewhat misleading. This work primarily borrows the idea of representing a scene with Gaussians but does not use the splatting technique at all. A more apt description reflecting the method’s characteristics would enhance the paper’s conceptual clarity.

    Typos and writing suggestions. Page 2, second paragraph: “… can be efficiently rendered by spltting”: “spltting” should be “splatting”. Page 3, Section 2.1: the full name of DRR is expected for clarity.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a supervised learning method for extremely sparse view reconstruction. The idea of representing 3D features with Gaussians is novel and inspiring. The method’s description is clear and easy to understand. Experimental results show the proposed method’s effectiveness. However, some weaknesses regarding implementation and experiment details lead to confusion for readers. The paper can be accepted once the aforementioned weaknesses are addressed.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The rebuttal addresses my concerns about TTO and the method name. I acknowledge the technical novelty and clinical significance of this paper regarding sparse-view CT reconstruction. I am inclined to rate it as an accept.



Review #3

  • Please describe the contribution of the paper

    1. The authors propose to introduce 3D Gaussians as an explicit feature representation in supervised CBCT reconstruction.

    2. The paper proposes a framework, DIF-GS, which learns 3D Gaussians from sparse projections as an explicit 3D representation.

    3. The paper proposes test-time optimization to improve the generalization capability of the model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The paper proposes a novel reconstruction framework which leverages 3D Gaussians to represent the feature distribution in the 3D space.

    2. The addition of test-time optimization during the inference process further enhances the model’s generalization ability.

    3. The paper shows improvement compared with previous work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper’s results lack adequate statistical analysis.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Statistical results, such as p-values or means over repeated runs, could be reported in the results.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written, clearly stated, and its method is novel enough.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their valuable feedback. Overall, the reviewers consider the paper well-written (R4/R5) and our method novel (R3/R4/R5), attractive (R3), and inspiring (R4). They also appreciate our method’s effectiveness and superior performance (R4/R5). Below, we address the reviewers’ concerns. We will revise accordingly and add the necessary content.

  • R3 [Q1] The number of views. Besides 6-view, we ran experiments with 8/10-view (Tab. 1), following DIF-Net (MICCAI’23, extremely sparse-view). 8/10-view shows better results than 6-view. We will include visual examples of 8/10-view in the supplementary. More importantly, the framework can be applied to more views, e.g., 20-view, with clearer reconstructed details and ~4.+ PSNR improvement over 6-view (still SoTA performance). This is a trade-off between radiation dose and image quality.

[Q2] Clinical significance. Even with 6 views (Fig. 2-4), lungs and bones are reconstructed with precise boundaries. Hence, Extremely Sparse-View Scanning (ESVS) can be utilized in intraoperative navigation, where boundaries of organs are often used for registration (Sec. IV.A in [33]) during surgery. Extremely reduced radiation dose enables more frequent ESVS, allowing for frequent registration and thereby minimizing errors due to patient movement. Also, ESVS can be performed in dentistry (e.g., orthodontics [34]) or orthopedics [35] for surgical planning, as bone surfaces can be extracted from the boundaries.

[33] Evaluation of registration methods on thoracic CT: the EMPIRE10 challenge. TMI’11.

[34] Tooth model reconstruction based upon data fusion for orthodontic treatment simulation. CBM’14.

[35] Pre-operative planning and templating with 3D printed models for complex primary and revision total hip arthroplasty. Journal of Orthopaedics 2022.

[Q3] Training/utilization of Gaussians. Gaussians are generated from projections (Eq. 2-3) and used for feature querying (Eq. 4-5). The whole framework (including generation of Gaussians) is trained end-to-end. Due to limited space, we cannot present all the details of Gaussians here (and in the paper). We will release code later.

[Q4] Gaussians vs. CNNs. 3D CNNs (as decoders) have high memory and computational costs and are not feasible for reconstruction with scalable output resolutions. 3D Gaussians have been proven to be a powerful explicit representation: they consist of a set of sparse points (with properties) and support point-based feature querying (Eq. 4-5). Hence, 3D Gaussians can be effectively integrated into sparse learning and inference, showing memory and computational efficiency.
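
As a rough illustration of the point-based feature querying mentioned in [Q4], the sketch below computes a feature at a single 3D point as a Gaussian-weighted average over a small set of Gaussians. It is a toy under stated assumptions, not the paper's Eq. 4-5: the isotropic scale per Gaussian, the normalized weighting, and the names `query_feature`, `centers`, `scales`, and `features` are all hypothetical.

```python
import math


def query_feature(point, centers, scales, features):
    """Return a Gaussian-weighted average of per-Gaussian features at `point`.

    Assumption: each Gaussian is isotropic with one scalar scale (a
    simplification chosen for this sketch, not taken from the paper).
    """
    weights = []
    for center, scale in zip(centers, scales):
        # Squared Euclidean distance from the query point to the Gaussian center.
        dist_sq = sum((p - c) ** 2 for p, c in zip(point, center))
        weights.append(math.exp(-0.5 * dist_sq / (scale * scale)))

    dim = len(features[0])
    total = sum(weights)
    if total == 0.0:
        # The query point is too far from every Gaussian to receive any weight.
        return [0.0] * dim

    return [
        sum(w * f[k] for w, f in zip(weights, features)) / total
        for k in range(dim)
    ]


# Two hypothetical Gaussians, each carrying a 2-channel feature vector.
centers = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
scales = [1.0, 1.0]
features = [[1.0, 0.0], [0.0, 1.0]]

# A point halfway between both centers receives an equal blend of both features.
midpoint_feature = query_feature((1.0, 0.0, 0.0), centers, scales, features)
print(midpoint_feature)
```

In this sketch the cost of a query depends only on the number of queried points and the number of Gaussians, so no dense output grid has to be held in memory, which is the efficiency argument made in [Q4].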

  • R4 [Q1] Explanation for Tab. 2. For fair comparison, DIF-GS (without TTO) is compared in Tab. 1-2, and 1.8s is network inference. TTO is compared in Tab. 3 and discussed in ablation study. In Tab. 2, training of data-driven methods means training the network with specific epochs (we follow their default configs); inference of self-supervised methods includes per-sample optimization and network inference; inference of data-driven methods is just network inference.

[Q2] Implementation/efficiency of TTO. TTO: network is optimized using Adam (LR=1e-7). In each iteration, one projection view is selected (i.e., batch_size=1). For a view, randomly select 512 rays (512 points sampled in each ray). Loss converges after 60 iterations. Efficiency: 0.465±0.005 s/iter, and 28 seconds per-sample optimization.

[Q3] Name of method: DIF-GS => DIF-G.

[Q4] DRRs: Digitally Reconstructed Radiographs.
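
The TTO settings reported in [Q2] can be cross-checked with a little arithmetic. The sketch below only restates the rebuttal's numbers in a hypothetical `TTOConfig` container (the class, field, and function names are illustrative and are not taken from the released code) and derives the number of sampled points per iteration and the total optimization time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTOConfig:
    """Test-time optimization settings as stated in the rebuttal ([Q2])."""
    learning_rate: float = 1e-7       # Adam learning rate
    views_per_iter: int = 1           # one projection view per iteration
    rays_per_view: int = 512          # rays selected at random from that view
    points_per_ray: int = 512         # points sampled along each ray
    iterations: int = 60              # loss is reported to converge by then
    seconds_per_iter: float = 0.465   # reported mean time per iteration


def points_per_iteration(cfg: TTOConfig) -> int:
    """Number of 3D points queried in a single TTO iteration."""
    return cfg.views_per_iter * cfg.rays_per_view * cfg.points_per_ray


def total_seconds(cfg: TTOConfig) -> float:
    """Approximate per-sample optimization time implied by the settings."""
    return cfg.iterations * cfg.seconds_per_iter


cfg = TTOConfig()
print(points_per_iteration(cfg))      # 262144
print(round(total_seconds(cfg), 1))   # 27.9
```

Sixty iterations at 0.465 s each come to about 27.9 s, which is consistent with the reported 28 seconds of per-sample optimization.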

  • R5 [Q1] Statistical analysis. Results in Tab. 1 are the mean of 5 repeated experiments. Standard deviations (std) are not reported due to limited space. We will clarify this and include std in the supplementary. Specifically, the std (PSNR/SSIM) of our method and DIF-Net for 6/8/10-view are: [LUNA16] Ours: 0.02/0.10, 0.03/0.08, 0.02/0.09. DIF-Net: 0.13/0.21, 0.09/0.17, 0.11/0.19. [ToothFairy] Ours: 0.03/0.08, 0.02/0.08, 0.03/0.09. DIF-Net: 0.12/0.23, 0.10/0.18, 0.13/0.19.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The DIF-GS model is proposed for sparse-view cone-beam CT reconstruction, utilizing learned 3D Gaussians for explicit feature representation. All reviewers have praised the paper for its clarity and novelty, and the authors have effectively addressed the reviewers’ questions.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This work presents 3D Gaussians for sparse-view cone-beam CT reconstruction and conducts experiments with extremely sparse views (6/8/10). The results were better than previous baselines.
    All the reviewers have appreciated the novelty of the proposed method. After carefully reading the paper, all the reviews, the authors’ rebuttal, and the post-rebuttal comments, I found that the main issue with this work is the clinically irrelevant experiment design. Some issues are listed as follows. (1) The number of views is too small for diagnosis and screening; the authors mentioned the potential use for registration during surgery, but there is no such evaluation. (2) All the datasets were simulated at a low resolution (256x256). (3) The use of CBCT for chest regions is questionable.



