Abstract

Retrieving 3D bone anatomy from biplanar X-ray images is crucial since it can significantly reduce radiation exposure compared to traditional CT-based methods. Although various deep learning models have been proposed to address this complex task, they suffer from two limitations: 1) They employ voxel representation for bone shape and exploit 3D convolutional layers to capture anatomy prior, which are memory-intensive and limit the reconstruction resolution. 2) They overlook the prevalent occlusion effect within X-ray images and directly extract features using a simple loss, which struggles to fully exploit complex X-ray information. To tackle these concerns, we present Spatial-division Augmented Occupancy Field (SdAOF). SdAOF adopts the continuous occupancy field for shape representation, reformulating the reconstruction problem as a per-point occupancy value prediction task. Its implicit and continuous nature enables memory-efficient training and fine-scale surface reconstruction at different resolutions during inference. Moreover, we propose a novel spatial-division augmented distillation strategy to provide feature-level guidance for capturing the occlusion relationship. Extensive experiments on the pelvis reconstruction dataset show that SdAOF outperforms state-of-the-art methods and reconstructs fine-scale bone surfaces. Our code will be made available.
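To make the "per-point occupancy value prediction" idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical stand-in function `toy_occupancy` (a sphere) in place of a trained network, and shows that the same continuous function can be queried on grids of different resolution at inference time, in chunks, without ever storing a voxel volume during training.

```python
import numpy as np

def toy_occupancy(points, radius=0.5):
    """Hypothetical stand-in for a learned occupancy network:
    returns 1 for points inside a sphere of the given radius, 0 otherwise."""
    return (np.linalg.norm(points, axis=-1) <= radius).astype(np.float32)

def query_grid(occupancy_fn, resolution, chunk=65536):
    """Evaluate an occupancy function on a regular grid over [-1, 1]^3.
    Points are processed in chunks so peak memory does not depend on the model."""
    axis = np.linspace(-1.0, 1.0, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    grid = grid.reshape(-1, 3)
    values = np.concatenate(
        [occupancy_fn(grid[i:i + chunk]) for i in range(0, len(grid), chunk)]
    )
    return values.reshape(resolution, resolution, resolution)

# The same function is sampled at two resolutions; no retraining is involved.
coarse = query_grid(toy_occupancy, 32)
fine = query_grid(toy_occupancy, 64)
print(coarse.shape, fine.shape)
```

In practice a surface would then be extracted from the sampled grid at the 0.5 level with an iso-surface algorithm such as marching cubes; that step is omitted here to keep the sketch dependency-free.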

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2205_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2205_supp.pdf

Link to the Code Repository

https://github.com/xmed-lab/SdAOF

Link to the Dataset(s)

https://github.com/naamiinepal/xrayto3D-benchmark

BibTex

@InProceedings{Che_SpatialDivision_MICCAI2024,
        author = { Chen, Jixiang and Lin, Yiqun and Sun, Haoran and Li, Xiaomeng},
        title = { { Spatial-Division Augmented Occupancy Field for Bone Shape Reconstruction from Biplanar X-Rays } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a method that utilizes a neural occupancy field to reconstruct the bone shape from bi-planar X-rays.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Instead of learning a voxel representation, this paper proposed to learn the occupancy field for better reconstruction of bones.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The experiments lack proof of generalization across different types of reconstruction objects and baseline models for comparison. Please refer to comments for details.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The XrayTo3D benchmark includes various reconstruction objects besides the pelvis, such as ribs and vertebrae. However, the experiments lack demonstrations on a broader range of human bone structures.
    2. In cases where density distribution is crucial, such as in teeth, can the model accurately reconstruct the density details? If not, what is the clinical value of the proposed method?
    3. Models based on adversarial learning, as referenced in [1], and implicit neural representations, as in [2], are notably absent in the baseline comparisons for cross-dimension translation in radiology imaging.

    [1] Ying, Xingde, et al. “X2CT-GAN: reconstructing CT from biplanar X-rays with generative adversarial networks.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019.
    [2] Park, Sihwa, et al. “NeBLa: Neural Beer-Lambert for 3D Reconstruction of Oral Structures from Panoramic Radiographs.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 5. 2024.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduces a novel approach to representing 3D objects with an occupancy field in radiology imaging. However, the experiments are limited to the pelvis, and several important baseline models are not included in the comparisons.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The author has clearly addressed my questions. Although some experiments are missing, I believe the paper merits acceptance. Therefore, I would like to raise my rating to “weak accept.”



Review #2

  • Please describe the contribution of the paper

    This is a nice research topic: 3D bone reconstruction from (bi)planar X-rays using an efficient occupancy field representation that allows iso-surface generation at a selected scale/resolution. The implicit representation is memory-efficient since it does not require explicit storage of the 3D volume. Moreover, to account for the specifics of X-ray imaging, a spatial-division augmentation (using distillation) is adopted to try to minimize the influence of occlusion caused by overlapping structures in the X-ray projection.

    The 3D reconstruction method is trained and evaluated using a pelvis CT dataset. The corresponding X-ray projections are computed using a DRR generator. The proposed method is compared to state-of-the-art voxel-based methods as well as to the original occupancy field method developed for natural images. The authors claim that the proposed method outperforms voxel-based methods. An ablation study supports conclusions about the contribution of each proposed component.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - Originality: the work applies and extends a computer vision method designed for natural images to 3D bone reconstruction from X-ray imaging.
    - The voxel-based representation does impose limitations (memory and small image size) on accurate direct 3D model prediction/generation from planar X-rays. Using the occupancy field (OF) as an alternative is memory-efficient and allows 3D models to be generated at different scales/resolutions once a continuous OF function is trained.
    - The spatial division uses K subspaces to generate intermediate DRRs along the body depth, making feature extraction from X-ray images (where structures/tissues overlap) more robust and occlusion-aware.
    - The results are supported by an ablation study and a comparison with SOTA methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - A main concern is the assumption about X-ray image calibration and the kind of projection used. In the original paper [23], the depth (z) is also an input to the OF function. A parallel projection seems to be assumed in the proposed work, given the simple estimate z = Xz for the depth of the ray (with X the 3D point and Xz the z coordinate of X).

    Therefore, when reading the paper, a major question arises: is the method applicable to real X-ray devices? How could the pose of the 3D object in space affect the generated 3D shape? Indeed, the object’s 3D pose affects its appearance in the resulting X-ray (magnification effect). Is the proposed method robust to this pose-dependent magnification? For example, depending on its pose in 3D space, an object can appear more or less magnified in the image, while the bone’s actual 3D shape is constant.

    This is particularly relevant in the biplanar X-ray context, where features are concatenated. Depending on the view poses, I am not sure the method is robust to arbitrary viewpoints. Therefore, more details about these potential limitations should be provided, and a real example of its use in a clinical setup should be explained or cited as a direction for future work.

    - The section about distillation is difficult to follow.

    - Since it is the right ilium that occludes the left ilium, why does the right ilium have larger errors? (EMD 6.1 vs. 5.7, Table 1).
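The magnification concern raised in the first weakness can be illustrated with a small sketch. It is not taken from the paper: it assumes an idealized geometry with the X-ray source at the origin, the view direction along +z, and a flat detector at a hypothetical source-to-detector distance `sdd`. Under a parallel projection the projected width of an object is independent of its depth, whereas under a cone-beam projection it scales by `sdd / depth`.

```python
import numpy as np

def parallel_project(points):
    """Parallel projection along z: simply drop the depth coordinate."""
    return points[:, :2]

def cone_beam_project(points, sdd):
    """Idealized cone-beam projection: source at the origin, looking along +z,
    detector plane at z = sdd. Each point is scaled by sdd / depth."""
    depth = points[:, 2:3]
    return points[:, :2] * (sdd / depth)

def width(points_2d):
    """Horizontal extent between the two projected end points."""
    return float(points_2d[1, 0] - points_2d[0, 0])

# The same 10 mm wide segment placed at two depths (illustrative values only).
near = np.array([[-5.0, 0.0, 500.0], [5.0, 0.0, 500.0]])
far = np.array([[-5.0, 0.0, 800.0], [5.0, 0.0, 800.0]])
sdd = 1000.0

print(width(parallel_project(near)), width(parallel_project(far)))
print(width(cone_beam_project(near, sdd)), width(cone_beam_project(far, sdd)))
```

With these numbers the parallel projection gives a width of 10 mm at both depths, while the cone-beam projection gives 20 mm for the near segment and 12.5 mm for the far one, which is the pose-dependent magnification the reviewer asks about.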

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Additionally, the authors use a public dataset for the evaluation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Section 3) Additional experiments in the supplementary material must be moved into the main text: 1) generalization and robustness to different imaging parameters are paramount for using the proposed method with imaging systems that have different projection geometries (cone-beam projections, fan-beam projections, fluoroscopy, …).

    “2D X-ray images are generated by DRRs using the TIGRE [3] package”: with which projection? Is a parallel projection assumed here?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is well constructed and the proposed method makes substantial progress in 3D model inference from planar X-ray views.

    More details about the applicability in real clinical setup must be added.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors adequately answered the reviewers’ concerns. However, the planned improvements to the manuscript are not clear from the rebuttal.



Review #3

  • Please describe the contribution of the paper

    The authors proposed a spatial-division augmented occupancy field (SdAOF), a novel implicit representation based on continuous occupancy field (OF), to improve the bone mesh reconstruction from biplanar X-ray images. In order to provide guidance for learning such OF representation, a dedicated distillation strategy is also proposed. The proposed model is evaluated on a pelvis reconstruction dataset and the results indicate that the model outperforms state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method is methodologically novel. The authors investigate the OF implicit representation and recast the bone mesh reconstruction task as an OF prediction task.
    2. The proposed model can serve as a general framework for few-view reconstruction. The method is designed for an arbitrary number of views, although it is evaluated with only two.
    3. The idea for handling the occlusion problem is nice. The authors split the space into subspaces to deal with occlusion and use a distillation strategy to incorporate this information into the reconstruction process.
    4. The proposed method is evaluated against multiple voxel-representation methods and the PIFu baseline.
    5. The paper is well explained and in general quite easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors present additional results in the supplementary materials, which is not desirable.
    2. The interp operation in Section 2.1 is not well elaborated. It seems that Fi has shape CHW while pi(p) has shape HW.
    3. The reconstruction failure caused by the occlusion problem is not illustrated. Instead, judging from the qualitative results in Fig. 3, the voxel-representation methods do not appear to suffer from occlusion.
    4. The additional feature extractor is not fully elaborated. Does the method still work without it? The authors claim that it complements the occlusion-aware ones, but it actually ignores the occlusion information.
    5. How is the view-depth distance in Eqn (4) actually defined?
    6. Setting (B) in the ablation study is confusing: in order to separate out the distillation step, the authors actually increase the model complexity by adding an image-to-image translation network.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The experiments are done on public dataset and the code will be available. The reproducibility should be fine.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Additional elaboration/results to show that addressing the occlusion problem is helpful.
    2. Fig. 2 should also illustrate that distillation is applied to intermediate layers.
    3. More elaboration is expected on the interp operation in Section 2.1.
    4. Refactor (B) in the ablation study: I would suggest omitting the teacher OF network training and training only the SdAOF end to end.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method in this paper is novel and well evaluated by the experiments. However, some key concepts lack robust elaboration and thus rebuttal is needed.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The concerns in my comments are well addressed in the rebuttal




Author Feedback

We thank the reviewers for their valuable feedback. Overall, the reviewers consider the proposed method novel (R1, R3, R5) and efficient (R1), and appreciate its good performance (R1, R5) and the idea (R1, R5). The major concerns are clinical applicability (Q1-2), method comparison (Q3-4), and elaboration (Q5-10).

[R1] Q1: Imaging geometry robustness: We use cone-beam geometry with magnification controlled by calibration parameters, consistent with real X-ray devices. SdAOF’s robustness to these parameters is validated by experiments in the supplementary materials, which will be moved to the main text in the revision. Moreover, as SdAOF uses accurate geometry instead of learning a 2D-3D mapping (as current methods do), its robustness can be enhanced by training with varying imaging parameters.

[R3] Q2: Clinical value: Reconstructing bone surfaces from X-rays, rather than CT, can simplify clinical workflows and reduce radiation exposure. Although a surface model lacks the density distribution, it is useful for orthopedic surgical planning [28] and customized orthodontic appliances [29].

[28] Printed three-dimensional anatomic templates for virtual preoperative planning before reconstruction of old pelvic injuries: initial results. Chin Med J
[29] Application progress of three-dimensional printing technology in orthodontics. Digital Medicine

[R3] Q3: Baseline selection: The compared methods are SOTA on the XrayTo3D benchmark [25]. The suggested baselines [30,31] target CT reconstruction, which differs from our focus on reconstructing mesh surfaces.

[30] X2CT-GAN, CVPR’19
[31] Neural Beer-Lambert, AAAI’24

[R3] Q4: More structure validation: SdAOF also applies to other bones. Due to the page limit, we report results on the pelvis because, within the XrayTo3D benchmark: 1) it is challenging; 2) the mesh quality extracted from its original annotations is the highest, making it suitable for fine-scale reconstruction; 3) the occlusion relationship between the left and right ilium is clear, which benefits occlusion analysis.

[R1] Q5: Left and right ilium comparison: The occlusion of different parts (e.g., left/right ilium) in the AP X-ray is mutual (their information overlaps), so there is no guarantee that the right ilium will be reconstructed better than the left. The lower output resolution (128^3) may also smooth out details and affect performance.

[R5] Q6: Occlusion analysis: In the AP view, the left and right ilium overlap significantly, so models struggle to distinguish the different parts, resulting in inaccurate reconstruction. We will include visual AP-view examples in the revision to illustrate the occlusion. Voxel-based methods do suffer from occlusion and perform worse than ours (see more examples in the supplementary).

[R5] Q7: Interp operation: \pi(p) = (x_i, y_i) \in R^2 is the projected location. Bilinear interpolation uses the features of the 4 closest pixels (each of shape Cx1x1) on the feature map F_i (shape CxHxW). The shape of the point feature f_i(p) is Cx1x1. We will release the code later for clarity.
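The interp operation described in Q7 can be sketched as follows. This is an illustrative NumPy version, not the authors' code: the function name `bilinear_sample`, the pixel-coordinate convention (x along width, y along height), and the clamping at the borders are assumptions, and the result is returned as a flat vector of length C rather than Cx1x1.

```python
import numpy as np

def bilinear_sample(feature_map, x, y):
    """Sample a (C, H, W) feature map at a continuous pixel location (x, y).
    x indexes the width axis and y the height axis; returns shape (C,)."""
    _, h, w = feature_map.shape
    # Clamp to the valid pixel range (an assumption about border handling).
    x = float(np.clip(x, 0.0, w - 1.0))
    y = float(np.clip(y, 0.0, h - 1.0))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    # Blend the 4 closest pixels, first along x, then along y.
    top = (1 - wx) * feature_map[:, y0, x0] + wx * feature_map[:, y0, x1]
    bottom = (1 - wx) * feature_map[:, y1, x0] + wx * feature_map[:, y1, x1]
    return (1 - wy) * top + wy * bottom

# Toy feature map with C=2, H=3, W=4 and a projected location (x_i, y_i).
features = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
point_feature = bilinear_sample(features, 1.5, 0.5)
print(point_feature.shape, point_feature)
```

For the toy map above, sampling at (1.5, 0.5) averages the four surrounding pixels and yields one value per channel, which addresses the shape question: the 2D location selects where to read, and the channel dimension C is carried through unchanged.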

[R5] Q8: Additional extractor: Our method works without the additional extractor. This extractor captures global features across subspaces rather than the subspace-specific information in the distillation branch, further enhancing reconstruction.

[R5] Q9:View-depth definition: As shown in Fig. 1, the reconstruction space is divided along the view direction (assumed as the z-axis) into subspaces, with feature planes in the middle of each. The view-depth distance of a feature plane is the distance from the X-ray source to the center of this x-y plane. For a 3D point, its view depth is the distance from the X-ray source to the x-y plane where the point is located.
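The view-depth definition in Q9 can be made concrete with a short sketch. It is only an illustration under stated assumptions: the X-ray source is placed on the view axis at z = 0, the reconstruction space spans a hypothetical range [z_near, z_far], and the subspaces are taken to be slabs of equal thickness (the rebuttal does not state how the space is partitioned). All function names are hypothetical.

```python
import numpy as np

def feature_plane_depths(z_near, z_far, num_subspaces):
    """Split [z_near, z_far] along the view axis into equal slabs (assumption).
    Returns (edges, centers); each center is the view depth of the feature
    plane placed in the middle of that slab."""
    edges = np.linspace(z_near, z_far, num_subspaces + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return edges, centers

def point_view_depth(point, source_z=0.0):
    """View depth of a 3D point: distance along z from the source
    to the x-y plane in which the point lies."""
    return float(point[2] - source_z)

def subspace_index(depth, edges):
    """Index of the slab that contains the given view depth (clamped)."""
    k = np.searchsorted(edges, depth, side="right") - 1
    return int(np.clip(k, 0, len(edges) - 2))

# Illustrative values only: 4 subspaces between 400 and 800 units from the source.
edges, centers = feature_plane_depths(400.0, 800.0, 4)
depth = point_view_depth(np.array([10.0, -20.0, 620.0]))
k = subspace_index(depth, edges)
print(centers, depth, k)
```

Here the feature planes sit at depths 450, 550, 650, and 750, and a point at z = 620 has view depth 620 and falls into the third subspace (index 2), whose feature plane is 30 units deeper than the point.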

[R5] Q10:Ablation (B): Training the SdAOF end-to-end without the teacher OF network corresponds to setting (A), as it lacks spatial division information and distillation. Without distillation, the model can’t learn or use spatial division information. Thus, (B) augments the reconstruction network with pseudo-spatial-division inputs using image-to-image translation nets.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents a novel approach to 3D bone reconstruction from biplanar X-rays using a spatial-division augmented occupancy field. The reviewers appreciated the originality, efficiency, and strong evaluation of the proposed method. The authors’ rebuttal effectively addressed key concerns, including clinical applicability, robustness, and baseline comparisons. The proposed method demonstrates significant improvements over state-of-the-art techniques, and the authors have committed to clarifying and expanding the manuscript based on reviewer feedback. This paper offers a valuable contribution to the field and should be accepted for presentation at MICCAI 2024.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


