Abstract

Radiography is widely used in orthopedics for its affordability and low radiation exposure. 3D reconstruction from a single radiograph, so-called 2D-3D reconstruction, offers the possibility of various clinical applications, but achieving clinically viable accuracy and computational efficiency is still an unsolved challenge. Unlike other areas in computer vision, X-ray imaging’s unique properties, such as ray penetration and standard geometry, have not been fully exploited. We propose a novel approach that simultaneously learns multiple depth maps (front and back surfaces of multiple bones) derived from the X-ray image to computed tomography (CT) registration. The proposed method not only leverages the standard geometry characteristic of X-ray imaging but also enhances the precision of the reconstruction of the whole surface. Our study involved 600 CT and 2651 X-ray images (4 to 5 posed X-ray images per patient), demonstrating our method’s superiority over traditional approaches with a surface reconstruction error reduction from 4.78 mm to 1.96 mm and further to 1.76 mm using higher resolution and pretraining. This significant accuracy improvement and enhanced computational efficiency suggest our approach’s potential for clinical application.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1001_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1001_supp.pdf

Link to the Code Repository

https://github.com/Kayaba-Akihiko/3DDX

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Gu_3DDX_MICCAI2024,
        author = { Gu, Yi and Otake, Yoshito and Uemura, Keisuke and Takao, Masaki and Soufi, Mazen and Okada, Seiji and Sugano, Nobuhiko and Talbot, Hugues and Sato, Yoshinobu},
        title = { { 3DDX: Bone Surface Reconstruction from a Single Standard-Geometry Radiograph via Dual-Face Depth Estimation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript presents a method for 3D surface reconstruction of pelvis and femur anatomy from a single radiograph, called 3DDX. 3DDX consists of two steps: a depth-estimation step and an SSM fitting step. During depth estimation, a deep neural network (DNN) predicts the segmentation mask, the depth of the front face, and the depth of the back face for the left hip, right hip, left femur, and right femur. A novel center-aligned scale-invariant (CASI) loss function is proposed to supervise only the scale of depth estimation. The method is evaluated on a dataset of 2651 X-ray images from 600 patients with corresponding CTs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The novel CASI loss is appropriate for shape reconstruction in X-ray.
    • Reconstructed shapes have an average symmetric surface distance of less than 2mm.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The method is limited to a fixed imaging geometry, but X-ray imaging in general is variable, even for purely diagnostic imaging.
    • The second phase is presented as “shape completion,” which is misleading.
    • Various implementation details are needed, including SSM creation, final model registration.
    • Fig 2. suggests that the symmetric surface distance varies significantly, up to (or more than?) 5mm. Report this error.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The core assumption of the method, namely that of a fixed imaging geometry, does not hold in general. Imaging geometries vary from one device to another, and it is common for diagnostic X-rays to be acquired with a variable geometry device. What is the source-to-detector distance in the case of the images collected here? What is meant, precisely, by “fixed” geometry?

    The phrase “shape completion” is misleading given the SSM-based mode through which this step was accomplished. In the literature, shape completion typically refers to completion of a 3D point cloud or other shape representation without explicit prior knowledge [1]. This might be done using a neural network with implicit knowledge encoded in its parameters, but that is somewhat different from a statistical shape model. The clarity of the method would be greatly increased if “SSM-based shape completion” or some other term were used instead. Furthermore, Fig 1 strongly suggests that a deep network G_c was used for this step. Although the caption clarifies that it is a statistical shape model, the U-Net-like symbol is highly misleading and should be changed so that reading the caption is not necessary.

    The details for the creation of the SSM are needed, since this SSM-based shape completion is an essential step. How robust is the SSM used here? Is it created using the entire dataset of 600 CTs, or is it created in a cross-fold evaluation manner? It is crucial that the SSM not include the given patient in its source data, in order for the evaluation to be considered valid. This should be discussed.

    As an additional detail, how is the 3D reconstruction either with or without SSM-based shape completion registered to the CT ground truth to obtain error metrics? This registration step is a factor in the computation of any of these metrics, so it should be detailed.

    Reporting average error statistics, such as the average symmetric surface distance, may serve to mask the relevant performance because of the shape of the pelvis and the femurs in particular. Both of these anatomies feature large surfaces which are relatively uniform (the ilium and the shaft of the femur), which are “easy” to reconstruct. Fig 2 suggests that the error may approach (or even exceed) 5mm in some regions. This is not necessarily a dealbreaker, but reporting the mean and median error rather than the mean and standard deviation is highly irregular.

    Additional comments:

    • The abstract brushes over the core contributions of the method, saying only what it leveraged.
    • On page 4, the claim that aligning the depth centers amounts to rigid registration is misleading. Rigid registration implies that the two point clouds will then be overlapping, which requires manipulating more degrees of freedom than aligning the depth-axis mean.
    • On page 4, it may be more clear to write ReLU as a function, or max(0, x).
    • Say what “numerical safeguard” is $\epsilon = 0.0001$ to avoid division by zero.
    • Overall, the notation may be unnecessarily burdensome, as it is necessary to refer to the text for most of the terms in each equation. I recommend defining a set $\mathcal{S}_{\rm L hip}$, for example, of pixels $i$ belonging to the left hip segmentation. Then sums over this set can be more clearly defined in the equations, as well as the size of the set $ \mathcal{S}_{\rm L hip} $. Reducing the number of symbols that have to be defined by relying on such a unifying definition would greatly improve the readability of the mathematical descriptions.
    • Need to define eq 4 before introducing the CASI losses, which refer to $h$ before it is defined.
    • For Table 1, why is the median and not the standard deviation or confidence interval reported? Indication of error spread is more important than the median.
    • How do you register the a
    • Fig. 2 in the supplementary material needs a legend for the colormap used.

    References:

    [1] Dai, Angela, et al. “Shape Completion Using 3D-Encoder-Predictor CNNs and Shape Synthesis.” 2017, pp. 5868-77, openaccess.thecvf.com/content_cvpr_2017/html/Dai_Shape_Completion_Using_CVPR_2017_paper.html.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although there are a number of significant weaknesses, depth estimation in X-ray is a challenging problem for which the authors have made tangible progress through a novel loss function and problem formulation. At present, there are several technical details that require addressing (implementation, error reporting, misleading presentation) before I am confident to recommend acceptance. I look forward to the author’s response.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my concerns related to their evaluation, clarifying that a four-fold cross-validation with separate SSMs was used. Although the method is still limited to a narrow range of imaging geometries, the manuscript will be of interest to the MICCAI community.



Review #2

  • Please describe the contribution of the paper

    The authors propose a novel single-view 3D reconstruction algorithm for X-ray CT. The model exploits the penetrative nature of X-ray radiography to estimate two depth maps from a single image, leading to higher quality reconstructions than single depth map methods. The model is trained and evaluated on a large data set of pelvic X-rays and CTs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The prediction of two depth maps from a single image is an interesting idea for under-constrained surface reconstruction. The paper is well written, although the presentation of the loss functions in §2 is not sufficiently contextualized and hard to follow (see comments below).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The losses proposed in §2 are introduced without any contextualization or explanation. For a reader without detailed understanding of the depth estimation literature, it is not possible to deeply understand any of what the authors propose. Specific points of improvement to make the manuscript more readable are below:

    • What do the hyperparameters alpha and lambda_var in Eq 1 correspond to? Do they have any semantic meaning? The values of 10 and 0.85 read as very arbitrary. Why are hyperparameters tuned on datasets of real images appropriate for X-ray images?
    • Eq 3 is presented without any explanation. How does M(.) capture interdependency between depth maps? What is g_i^j?
    • In the CASI subsection, why do you not require supervision on shift?
    • Why is supervision computed on log depths? Furthermore, could the authors please add one line explaining why MSE is not appropriate for depth estimation? That’ll be the first thought for readers interested in 3D reconstruction but unfamiliar with depth estimation.

    In the experiments, the authors only compare their method to ablated versions. Merely saying existing methods “suffer from low reconstruction quality” (e.g., [32,16,21,33]) without direct comparisons limits evaluation of the results presented in this work.

    Numerous previous works have shown that single-view reconstruction models do not learn an understanding of 3D geometry, but rather simply return the most similar object in the training set. As an ablation, I would be very curious how the model’s reconstructions compare to the closest pelvis in the training set. This would help determine if the model actually performs 3D reconstruction in a generalizable way. (Reference: https://openaccess.thecvf.com/content_CVPR_2019/html/Tatarchenko_What_Do_Single-View_3D_Reconstruction_Networks_Learn_CVPR_2019_paper.html)

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Will the data also be released upon publication?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A clearer description of the loss functions in the method, along with comparisons to prior work and stronger ablations, would help clarify the contributions of this work.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The main idea of this paper (the proposed dual-face depth) is very interesting for numerous X-ray tasks, so I think the paper deserves a weak accept. The majority of my concerns were not directly addressed in the rebuttal, but I hope they will be reflected in a revised manuscript.



Review #3

  • Please describe the contribution of the paper

    This is a nice topic of research about bone surface 3D reconstruction from single calibrated X-ray based on the deep-learning estimation of the front and back surface depth maps.

    The depth map estimation using a previously proposed scale-invariant loss function was extended to multiple depth maps (front and back for each bony structure) and combined with a new center-aligned scale-invariant loss adapted to the geometry of X-ray imaging devices. The reconstructed surfaces can then be estimated using the 3D point cloud computed from the front and back depth maps. The resulting 3D surface is then used to fit a statistical shape model to refine, close, and complete the bone 3D surfaces.

    The model is trained and evaluated using a four-fold cross-validation from 10258 objects having both frontal X-ray and paired CT. Depending on the loss function used, single- or dual-face depth maps, and with or without pretraining, the authors reported a pelvis surface reconstruction error from 4.78 to 1.76 mm (ASSD).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    -The proposed work is original and re-uses the concepts of computer-vision depth estimation for 3D point recovery and bony surface 3D reconstruction, avoiding a costly voxel-based representation to generate the 3D shape.

    -The extensive evaluations from a large dataset of 3D objects with paired X-rays and CT of real patients

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    -Although we can imagine that the 3D reconstructed points are located along the rays at the position of the depths, the authors omitted details about how the 3D surface reconstruction is performed. The choice of this component could affect the surface accuracy, particularly at the locations where the surface is tangent to the detector plane (we see that the femur surface is cut in Fig. 2), i.e., where the front depth is almost equal to the back depth.

    -The pairing of X-ray and CT images requires per-bone rigid 2D-3D registration. The method cited [30] was developed for a C-arm device; therefore, applying it to larger digital X-rays could introduce inaccuracies (in addition to the convergence failures that were excluded). The registration should be sub-millimetric for this application, since the ground-truth depth maps are extracted from registered 3D models segmented from aligned CT images. How was the 3D environment of the X-ray images calibrated?

    -Since the depth maps are masked by the predicted 2D segmentation, the quality of the 3D surface reconstruction (boundary) could be affected by the segmentation result. Could the authors please elaborate on that point? For instance, if a degraded X-ray is presented as input, the proposed method may generate a 3D model with large errors.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It would be important to specify which kind of X-ray device geometry is supported and which one was adopted in the currently evaluated work. A discussion of how the method generalizes to biplanar X-ray could also be interesting: would it still require an SSM to refine and regularize the 3D surface if two (or more) views are used?

    It would be interesting to talk/report some examples of cases with larger errors (with more difficult X-rays with bad contrast and image quality).

    The section 3.4 (Implementation details) must be moved before the presentation of the results.

    The figures with the depth maps are difficult to follow and should be improved. The color bar scale seems to be different for the front and back depth maps (Fig. 2, and Fig. 2 in the supplementary materials). Taking the simple structure of the femur diaphysis with a frontal X-ray viewpoint, it is not possible for the colors to become inverted between the femoral head and the diaphysis regions, for instance.

    Fig. 1: the SSM model fitting (Gc) looks like a neural network, but according to the main text, it is an SSM fitting process.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work brings an original deep learning method for 3D reconstruction from planar X-ray. The method uses a compact representation by adopting depth maps (and can handle multi-object 3D reconstruction) and avoids the costly voxel-based representation for generating the 3D shape, which is a current limitation of other methods for direct inference of 3D models from planar images.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors adequately answered the reviewers’ concerns. Moreover, the planned improvements to the manuscript are relevant.




Author Feedback

We thank the reviewers for their highly constructive and positive feedback. We focused on exploring the proposed dual-face depth and the CASI loss (which are this paper’s main contributions/novelty) together with the medical application. We understand the paper lacked details on 3D reconstruction, shape evaluation, and the SSM model.

Details about the 3D reconstruction (R1, R3): When converting an estimated depth map to a point cloud, we created the rays based on the detector size and the source-to-detector distance (SDD) recorded in the DICOM header, assuming a pinhole camera with a regular viewing frustum and no skew. The SDD is a major factor in determining the geometry. For hip X-ray imaging, an SDD of 120 cm is defined as the international standard in [10]. There can be some flexibility in this imaging geometry (i.e., the position of the X-ray source relative to the detector) because the operators set the X-ray source manually. Thus, the exact geometry cannot be measured unless time-consuming geometric calibration is used, which is not performed in clinical routine. Our 600-patient (2565 X-ray image) dataset was acquired in this fashion over more than five years of clinical routine by many different operators. Despite this, our cross-validation experiment shows high 3D reconstruction accuracy, indicating robustness against the small geometry variation introduced by human operators in the training and testing datasets. Since the term “fixed-geometry” is confusing, we will change it to “standard-geometry” throughout the paper.
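
The back-projection described above can be sketched roughly as follows. This is an editorial illustration, not the authors' code: the function name depth_map_to_points, the convention that depth is the z coordinate measured from the X-ray source, the placement of the principal point at the detector center, and the use of None for background pixels are all assumptions made for the example. Only the skew-free pinhole model and the 120 cm SDD come from the text.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def depth_map_to_points(
    depth: List[List[Optional[float]]],
    sdd_mm: float,
    pixel_spacing_mm: float,
) -> List[Point]:
    """Back-project a depth map to 3D points under a skew-free pinhole model.

    Assumptions (illustrative only): the X-ray source is at the origin, the
    detector plane lies at z = sdd_mm, the principal point is the detector
    center, each depth value is the z coordinate of the surface point, and
    None marks background pixels.
    """
    rows, cols = len(depth), len(depth[0])
    cx, cy = (cols - 1) / 2.0, (rows - 1) / 2.0
    points: List[Point] = []
    for r, row in enumerate(depth):
        for c, d in enumerate(row):
            if d is None:
                continue
            # Position of the pixel on the detector plane, in millimetres.
            x_det = (c - cx) * pixel_spacing_mm
            y_det = (r - cy) * pixel_spacing_mm
            # Similar triangles: scale the detector position by depth / SDD.
            scale = d / sdd_mm
            points.append((x_det * scale, y_det * scale, d))
    return points


# Toy 3x3 front-face depth map with two foreground pixels at 600 mm depth.
front = [[None, None, 600.0], [None, 600.0, None], [None, None, None]]
cloud = depth_map_to_points(front, sdd_mm=1200.0, pixel_spacing_mm=1.0)
```

Under these assumptions, the front-face and back-face depth maps of one bone would each be passed through the same function and the two point sets merged into a single cloud.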

Our Figures had two major problems that confused the readers (R1, R3). First, the color map of the back-face depth map was inverted. We will correct this. Second, the G_c in Figure 1 was depicted as a neural network to show the implementation-agnostic nature, but it was unclear since our actual implementation used the SSM. Although some advanced neural networks can be applied in theory, we chose the fundamental/simple SSM-based 3D completion so that the reader can focus on our main points: the dual-face depth and CASI loss. We will correct the image for G_c in the revised paper.

Q1 (R1): The quality of the 3D surface reconstruction (boundary) could be affected by the segmentation result. A1: We agree. Although our segmentation model showed high accuracy (a Dice score of 0.988, as indicated in Section 3), the segmentation performance degraded for diseased patients. The revised paper will discuss the effect of segmentation performance.

Q2 (R1): It would be interesting to talk/report some examples of cases with larger errors. A2: We will report those cases in the supplemental materials in the revised paper.

Q3 (R3): The details for the creation of the SSM are needed. A3: We applied a four-fold cross-validation (mentioned in Section 3), which means four SSM models. We used the same cases (common training dataset) for training the depth map estimations and SSMs; thus, there was no contamination.

Q4 (R3): How is the 3D reconstruction registered to the CT ground truth to obtain error metrics? A4: This is an important point. We aligned the depth center of the prediction (with or without SSM-based shape completion) to the ground truth shape, i.e., adding a shift in the Z direction before calculating the error in order to evaluate the reconstructed shape while ignoring the absolute distance from the X-ray source which is not possible to measure due to the manual variability mentioned above.
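
A minimal sketch of the evaluation step described in A4, given as an editorial illustration and not the authors' implementation. It assumes both shapes are available as sampled point sets, approximates the average symmetric surface distance (ASSD) by brute-force nearest-neighbor distances between those samples, and uses hypothetical function names (align_depth_center, assd).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def align_depth_center(pred: List[Point], gt: List[Point]) -> List[Point]:
    """Shift the prediction along Z so its mean depth matches the ground truth."""
    shift = sum(p[2] for p in gt) / len(gt) - sum(p[2] for p in pred) / len(pred)
    return [(x, y, z + shift) for x, y, z in pred]


def assd(a: List[Point], b: List[Point]) -> float:
    """Point-sampled approximation of the average symmetric surface distance."""

    def sum_nearest(src: List[Point], dst: List[Point]) -> float:
        # Sum of distances from each source point to its nearest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src)

    return (sum_nearest(a, b) + sum_nearest(b, a)) / (len(a) + len(b))


# Toy example: the prediction is offset in depth and slightly stretched in Z.
gt_pts = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0)]
pred_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 4.0)]
aligned = align_depth_center(pred_pts, gt_pts)
error_mm = assd(aligned, gt_pts)
```

Because only a Z shift is applied, any residual in-plane offset or rotation between the prediction and the ground truth still contributes to the reported error; the brute-force search is quadratic and would be replaced by a spatial index for full-resolution surfaces.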

Q5 (R3): Reporting the mean and median error rather than the mean and standard deviation is highly irregular. A5: The revised paper will report the mean and standard deviation.

Q6 (R4): I would be very curious how the model’s reconstructions compare to the closest pelvis in the training set. A6: In the revised paper, we will report the results using the closest shape in the training dataset.

Other comments: The revised paper will improve the organization of the sections (R1) and the formulas (R3, R4).




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers increased their scores after the rebuttal and unanimously agree that the work should be accepted for presentation.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The reviewers increased their scores after the rebuttal and unanimously agree that the work should be accepted for presentation.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


