Abstract
The accurate prediction of femur shape changes due to hip diseases is potentially useful for early diagnosis, treatment planning, and the assessment of disease progression. This study proposes a novel pipeline that leverages geometry encoding and context-awareness mechanisms to predict disease-related femur shape changes. Our method exploits the inherent geometric properties of femurs in CT scans to model and predict alterations in bone structure associated with various hip diseases, such as osteoarthritis (OA). We constructed a database of 367 CT scans from patients with hip OA, annotated using a previously developed bone segmentation model and an automated OA grading system. By combining geometry encoding and clinical context, our model achieves femur surface deformation prediction through implicit geometric and clinical insights, allowing for the detailed modeling of bone geometry variations due to disease progression. Our model demonstrated moderate accuracy in a cross-validation study, with a point-to-face distance (P2F) of 1.545mm on the femoral head, aligning with other advanced predictive methods. This work marks a significant step toward personalized hip disease treatment, offering a valuable tool for clinicians and researchers and aiming to enhance patient care outcomes.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/4076_paper.pdf
SharedIt Link: https://rdcu.be/dV1Wi
SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72384-1_35
Supplementary Material: https://papers.miccai.org/miccai-2024/supp/4076_supp.pdf
Link to the Code Repository
https://github.com/RIO98/FemurSurfacePrediction
Link to the Dataset(s)
N/A
BibTex
@InProceedings{Li_Prediction_MICCAI2024,
author = { Li, Ganping and Otake, Yoshito and Soufi, Mazen and Masuda, Masachika and Uemura, Keisuke and Takao, Masaki and Sugano, Nobuhiko and Sato, Yoshinobu},
title = { { Prediction of Disease-Related Femur Shape Changes Using Geometric Encoding and Clinical Context on a Hip Disease CT Database } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
year = {2024},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15003},
month = {October},
pages = {368 -- 378}
}
Reviews
Review #1
- Please describe the contribution of the paper
This study presents a comprehensive pipeline for predicting femur shape changes caused by hip osteoarthritis (OA) progression. By utilizing partially diseased CT scans and demographic data, the model predicts the femur shape of the diseased side by leveraging the contralateral, normal side from the same patient’s CT scan. In contrast to previous methods based on statistical shape modeling (SSM) and deep learning (DL), the present study integrates geometric encoding and clinical context (KL/Crowe scores and demographics). The method may support predicting femur shapes corresponding to specific hip OA states, but further accuracy improvements and validation are necessary.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The authors present an innovative approach of bridging methods that traditionally overlook the integration of clinical context into spatial data. By combining geometric shape analysis with clinical context awareness, the paper aims to bridge the latent correlations between normal and OA femurs via surface deformations. Further, the method incorporates demographic and pathological data, which are known to influence femoral morphology.
The authors’ main innovation is the multilayer perceptron (MLP)-based model, which is likely lightweight and efficient at inference and could be advantageous in clinical settings where timely decision-making is crucial. However, inference and training times were not reported.
Overall, the paper was well written and the methods were clearly explained. The motivation of the study was well described.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
While the paper mentions achieving “moderate accuracy” comparable to existing methods, the evaluation lacks thoroughness.
The use of a pre-trained bone segmentation model and, more specifically, an automated osteoarthritis grading system to annotate the CT scan database introduces potential bias from the original site where these methods were trained. The paper should address this concern by providing details on manual review or validation procedures to mitigate bias.
It is concerning that direct comparison between normal and diseased femurs sometimes yields better results than predictions achieved by all tested networks. This inconsistency raises questions about the reliability of the model and suggests potential issues with model training.
In the comparison of this paper’s method with other approaches, the networks have significant prediction issues, often generating shapes that do not resemble the original input. Properly trained networks should at least yield the healthy input femur, but the observed distortions greatly weaken the validity of the comparisons. While hyperparameters were kept at their defaults for a ‘fair comparison’, a properly trained network would be a fairer comparison to the proposed method.
The study’s design is somewhat concerning: patient hip OA grades are used as input along with the contralateral healthy femur to predict the diseased femur at the same time point. There is no certainty that the predicted diseased output is generated from anything except the input normal femur and its Crowe/KL grading. The presented work must be validated with a longitudinal dataset to prove the efficacy of the predictive model; for example, at timepoint 1, the diseased femur must be predicted for a later visit and segmentation at timepoint 2 (the later visit) should serve as the ground truth.
- Please rate the clarity and organization of this paper
Excellent
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Do you have any additional comments regarding the paper’s reproducibility?
The dataset may be private but methods/code is given at an anonymized link.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
The title reads as unfinished: ‘…Clinical Context-aware on…’. ‘Context-aware mechanisms’ appears in the abstract. Should this be in the title?
The keyword ‘bones’ should be replaced with femur, which seems more appropriate.
This is a nicely written study and the methods are promising, but additional validation and a more rigorous evaluation/comparison to existing methods are required for immediate acceptance.
The authors should address concerns about potential bias introduced by the use of automated annotation methods by detailing the steps taken to mitigate this bias. Consider incorporating manual review or validation procedures to ensure the accuracy and reliability of the annotated data.
The study must recognize the importance of validating the proposed method’s efficacy with longitudinal ground truth data for predicting femur deformations to establish its clinical utility and applicability.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Reject — should be rejected, independent of rebuttal (2)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Primarily, the issues with limited evaluation (with longitudinal data) and unconvincing performance of the tested methods are the main downfalls of this submission.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Weak Reject — could be rejected, dependent on rebuttal (3)
- [Post rebuttal] Please justify your decision
Given the challenge in creating a longitudinal dataset for validation, the rating of the paper is increased slightly. A very small cohort (2 or 3 cases) of longitudinal data should be available given the private dataset from a healthcare institution, so the rebuttal on this topic is not fully convincing. Further, stating the method is predictive, yet allows for baseline inputs to be the highest scoring result in some instances, remains unconvincing.
Review #2
- Please describe the contribution of the paper
The structure of the paper is well-organized, providing a comprehensive explanation of the data preparation process, model architecture, experimental results, and comparative studies. The key contributions of the paper are: 1) a complete pipeline for predicting femur shape changes through geometric encoding and clinical context-awareness; 2) validation of the impact of these mechanisms on prediction accuracy; 3) demonstration of the proposed pipeline’s effectiveness using a non-longitudinal dataset. These contributions are clearly articulated and backed by experimental results, offering a significant advancement in the field of medical imaging and clinical diagnosis.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper presents an innovative approach for predicting femur shape changes related to hip disease by leveraging geometric encoding and clinical context-aware mechanisms. The study employs point cloud deep learning and integrates clinical data to improve the accuracy of shape prediction, aiming to offer valuable insights for the early diagnosis, treatment planning, and disease progression assessment of conditions such as osteoarthritis (OA). The proposed method is validated using a dataset of 367 CT scans and demonstrates superior performance compared to existing models, showing notable improvements in prediction accuracy and robustness.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The dataset used in the study, while comprising a substantial number of CT scans, appears to be sourced from a single or limited number of clinical settings. This lack of diversity in data sources could limit the model’s generalization capabilities in broader clinical contexts. To address this, consider expanding the dataset to include scans from different geographic regions, hospitals, and patient demographics, ensuring a more comprehensive representation of clinical scenarios.
- The model architecture incorporates complex elements such as geometric encoding and clinical context-aware modules. While this complexity contributes to prediction accuracy, it can hinder understanding and interpretation, especially for clinical practitioners. A more detailed analysis of the model’s internal workings, along with clear explanations of key mechanisms, would enhance its interpretability. Additionally, providing sensitivity analyses on the model’s parameters can help understand the impact of different configurations on performance.
- The paper demonstrates the effectiveness of the proposed method in predicting femur shape changes, but lacks detailed explanations of how specific clinical context factors influence predictions. Providing a deeper analysis of the underlying relationships between clinical data and predicted outcomes would improve the model’s credibility and applicability in real-world settings. Including visualization tools that illustrate these relationships could further aid in the interpretation of results.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
- The experiments focus primarily on the accuracy of femur shape predictions, without extensively exploring the model’s robustness under various clinical conditions. To ensure the model’s reliability, consider conducting experiments with different subsets of the dataset, covering a wider range of OA progression stages and clinical characteristics. This would provide a more robust assessment of the model’s performance.
- Although the paper demonstrates the potential clinical value of the proposed model, it lacks concrete examples of its practical application in clinical environments. To bridge this gap, the authors could provide case studies illustrating how the model’s predictions can be used to guide clinical decision-making or treatment planning. Additionally, feedback from healthcare professionals and patients on the model’s usability and impact would offer valuable insights into its real-world applicability.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Accept — could be accepted, dependent on rebuttal (4)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I value the design motivation of the method and the logical coherence of the article more than the operation and results.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #3
- Please describe the contribution of the paper
Given a dataset of healthy-pathological bilateral femur pairs, Authors train a model to predict the shape of the diseased femur from the opposite healthy one. They leverage the proven symmetry between bilateral femurs.
Authors use pre-trained models to generate 3D meshes for 367 femur pairs and generate disease grading scores. They generate a “Geometric Encoding” (GE) of position and index encodings for each vertex in the healthy input. They generate a “Clinical Context-aware” (CCA) Encoding from Demography and Pathology metadata for each input. A small MLP is trained on the combination of GE and CCA vectors, predicting vertex coordinates of the diseased output. Model optimization uses MAE and L1-Chamfer loss between true and predicted vertices.
Results are compared to off-the-shelf point-cloud auto-encoders trained without GE or CCA.
Authors show their method predicts more accurate shapes for higher disease gradings using metrics like point-to-face and Hausdorff distances.
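For readers unfamiliar with the loss and metrics named in this summary, the following is a minimal NumPy sketch, not the authors' implementation. It assumes "L1-Chamfer" means the average of unsquared Euclidean nearest-neighbour distances in both directions (a common convention in point-cloud work), that MAE is computed over corresponding vertices, and that the two terms are combined with a weight `lam`; the function names and the exact form of the objective are hypothetical.

```python
import numpy as np

def pairwise_euclidean(a, b):
    # (N, M) matrix of Euclidean distances between rows of a (N, 3) and b (M, 3).
    # Brute force: fine for small demos, but memory-hungry for thousands of
    # vertices, where a KD-tree lookup would be the usual choice.
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def chamfer_l1(a, b):
    # Assumed definition: mean nearest-neighbour distance, averaged over
    # both directions (a -> b and b -> a).
    d = pairwise_euclidean(a, b)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def hausdorff(a, b):
    # Symmetric Hausdorff distance: the worst-case nearest-neighbour distance.
    d = pairwise_euclidean(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def mae(pred, target):
    # Mean absolute error over corresponding vertex coordinates.
    return np.abs(pred - target).mean()

def training_objective(pred, target, lam=1.0):
    # Hypothetical combination: MAE plus a weighted Chamfer term.
    return mae(pred, target) + lam * chamfer_l1(pred, target)

# Toy example: a "predicted" surface that is a slightly shifted copy of the target.
rng = np.random.default_rng(0)
target = rng.normal(size=(256, 3))
pred = target + 0.01
print(training_objective(pred, target, lam=0.5), hausdorff(pred, target))
```

MAE relies on the vertex correspondence that the registration step provides, while the Chamfer and Hausdorff terms do not, which is one plausible reason to use both.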
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The Authors propose a valuable method to incorporate patient demography and clinical context into diseased femur shape prediction. The Authors’ figures are clear and helpful in understanding both the data preparation and model training phases. A series of metrics are used to compare their work with other methods, each of which evaluates the model performance in a different way. Authors explore the advantage of adding clinical context on top of the geometric encoding alone, which yields better results. They will provide source code for their method, which will go towards the reproducibility of their work on other datasets.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The Authors do not adequately describe the formulation of the Geometric (GE) and Clinical Context-aware (CCA) embeddings that are pivotal to the success of their method. Additionally, there is limited discussion about where their work sits in the context of other methods applied to medical image analysis. Further details in the comments.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Do you have any additional comments regarding the paper’s reproducibility?
The dataset of 367 CT scans is not described in detail, so it is unknown if this is publicly available. However, the data preparation models and tools are all well described, and hyperparameters are given in the supplementary information.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
Major Concerns
- The construction of the Geometric (GE) and Clinical Context-aware (CCA) embeddings is pivotal to the Authors’ contributions, but it is not well described.
  - In §3.1 Authors state that the 3D coordinates are “encoded to a 63-D position feature vector”. Is this number meaningful? Why not 64?
  - When the position and index embeddings are concatenated, does this mean the final GE vector is 32 + 63 = 95-D for each vertex? Why this approach and not additive?
  - Authors state that the CCA embedding is derived from the demography and pathology embeddings, but there is no definition of these. What do these look like pre-embedding?
  - The GE and CCA embeddings are fused. How?
  - Which method of positional encoding did Authors use in this work? §2.2 Paragraph 1 implies these are fixed sinusoidal/Fourier. Why not absolute or relative encodings, e.g. RoPE or ALiBi?
- In §1 Authors briefly mention FlowSSM and Mesh2SSM for shape prediction, both of which lack the ability to take into consideration patient-specific conditions.
  - Authors claim to ‘bridge’ this gap by incorporating clinical context; however, there is no comparison of these methods with their own.
  - Authors compare against several off-the-shelf point-cloud auto-encoder models (PCM), despite not mentioning any of these earlier. Why were these ones chosen?
  - Additionally, there are many other works that perform 3D shape prediction in the medical imaging domain (see the Review in [1]). Authors should acknowledge these and put their work into context.
- In §1 and in various other places, Authors claim that adding GE and CCA “enhances prediction accuracy”; however, there is no study which demonstrates this concretely.
  - This would be evidenced by taking their proposed backbone and removing the GE. Only GE and GE+CCA are compared.
- In §2.1 Authors say that each surface mesh contains “roughly 8192” vertices which are registered to get a mean shape and finally non-rigidly deformed to each surface.
  - Please clarify if this results in each surface having the same number of vertices. If not, then please clarify if the later Index Encoding step is therefore valid with differing numbers of vertices.
  - Please comment on the maximum and mean surface distance error between the deformed mean shape with a fixed number of vertices and each of the sample surfaces.
  - Given that some pathological femurs can have high surface variation, reducing the number of vertices that represent these cases could be limiting. Please comment on this.
- It is unclear what “Normal-Diseased” means when Authors are comparing their results.
  - Is “diseased” in this context the mean shape of the diseased femur at a specific KL/Crowe grade, and therefore Normal-Diseased = (single patient normal input) - (mean KL/Crowe from the dataset)?
  - If the diseased mesh is the GT as stated in the table, then taking Normal-Diseased is not the same as predicting the shape from the normal; Authors should not embolden these results.
Minor Comments
- The title of this work does not make sense and should be changed for clarity. Please consider removing “…-aware…” or replacing “…Context-aware…” with either:
- “…Context-awareness…”
- “…Context Information” or
- “…Context {Enhancement/Enrichment}…”
- Fig 1 - the caption states that panel (b) “classifies segmented images into normal or diseased based on pathology labels”. However, this seems to be done in stage (a) as described in Section 2.1 ‘Data preparation’ and in Fig 2. Authors should describe the 3D reconstruction at (b) in the caption instead.
- The Dataset Preparation method uses 2 pre-trained models: (1) for segmentation of the femurs and (2) for assigning the KL and Crowe OA disease progression grades. Authors acknowledge that the quality of these models will influence the performance of their shape prediction model. It would therefore be useful to record the performance metrics of these upstream models in the text so that the reader has a clear understanding of the extent to which improving the upstream models may impact the Authors’ work.
- In §2.2 beneath the objective function, Authors label Lambda as “the hyperparameter”. Perhaps this is better referred to as a “weight” given that there are many hyperparameters surrounding the model architecture, training, and data preparation.
- In §2.2 Authors include both GE and CCA features. However, in §3, experiment (1) is described as using the backbone, but this includes only GE features, while experiment (2) says the backbone’s GE features are “enhanced” with CCA features. Authors should make clear in §2.2 that the “backbone” uses only GE features.
- In §3, paragraph 2 on pg 7, Authors state that their work specifically targets KL grade = 4 and Crowe > 0. Is this a post hoc statement based on their result that their method should only be used under this condition? Or was this the case from the outset? Perhaps this just needs rewording for clarity.
- There is a typo in Table 1, final row, “Method = GE + CCA +” has a superfluous terminating “+”.
[1] Wang, J.Z., Lillia, J., Kumar, A. et al. Clinical applications of machine learning in predicting 3D shapes of the human body: a systematic review. BMC Bioinformatics 23, 431 (2022). https://doi.org/10.1186/s12859-022-04979-2
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Accept — should be accepted, independent of rebuttal (5)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Despite a few clarifications that need to be made on the structure of the GE/CCA feature inputs, the paper clearly describes a method for diseased femur shape prediction which yields good results. The paper also demonstrates that the addition of clinically relevant features improves upon geometric modelling alone.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Accept — should be accepted, independent of rebuttal (5)
- [Post rebuttal] Please justify your decision
Authors have provided additional details in their rebuttal that describe some of the details missing in the paper. The Authors’ work is novel and will make an interesting contribution to MICCAI.
Author Feedback
We appreciate the reviewers’ detailed feedback and insights. Below, we address the primary concerns raised.

(R5) Dataset efficacy is doubted due to the dataset’s cross-sectional nature.
A. While longitudinal data would ideally validate our models, we would like R5 to understand that building a longitudinal 3D database of the hip, particularly over extended periods from ‘healthy’ to ‘severe OA’ stages, would be a fairly large and challenging project in itself. Thus, we consider that it is beyond the criteria of a MICCAI publication to make evaluation with longitudinal datasets an absolute requirement. As R3 and R4 recognized, our paper’s strength lies in its methodological innovations and proof-of-concept experiments, and recent literature supports using cross-sectional datasets for longitudinal predictions [1]. To demonstrate the model’s robustness, an ablation study showed that our model tends to converge to homogeneous shapes for each grade when given only the normal femur and its/target’s OA grade. The model generates distinctive shapes when provided the grading difference between normal and diseased femurs. We will include these in the final version and extend our experiments with a longitudinal MRI dataset in future research.

(R5) Model reliability is questioned, as direct comparisons sometimes yield the best results while some comparison models generate unreliable shapes.
A. The first issue primarily occurs with small OA grading differences, indicating minimal disease progression and making subtle geometry predictions challenging. As stated in §3, our focus is on significant shape changes, which are more clinically relevant, particularly for surgical candidates. Regarding the comparison model issue, the best default sets of hyperparameters and templates provided officially, validated by the same metrics, were used. The failure to generate “reliable” shapes from comparison models reflects the complex nature of geometric learning and limited trainable shapes. If a prediction model generates shapes resembling the input, it functions as an autoencoder, shifting from being predictive to descriptive. We appreciate your pointing this out and will add related descriptions.

(R3, R4) Details of GE and CCA embeddings?
A. We use sinusoidal encodings following [16] to convert each of the three coordinates into a 20-D vector. These are then appended to the original coordinates, making it 63-D. This aims to amplify subtle changes of input, differing from the goals of RoPE and ALiBi. Positional and index embeddings are concatenated, not added, to convey distinct information. Before being sent to a 2-layer dense network, CCA includes multiple 32-D demography and pathology vectors. The description will be added.

(R3) Why were these PCMs chosen over SSMs for comparison? Reconstruction details?
A. These PCMs are SOTA models for 3D point clouds, suitable for comparing with our DL-based method. FlowSSM and Mesh2SSM focus on descriptively building non-linear SSMs to capture subtle variances, while our method integrates geometric info with clinical context for shape prediction. Further description and additional mesh reconstruction details will be provided.

(R4) Dataset diversity and practical application.
A. Our dataset includes CT scans from patients aged 17 to 87 with unilateral hip OA, covering common KL and Crowe grading combinations. More details about the model sensitivity will be provided in the revised paper. Future longitudinal work will also include case studies with treatment planning.

(R3, R5) Bias from upstream models.
A. The accuracy of the segmentation model is DC: 0.991±0.005 and ASD: 0.152±0.384 mm. The accuracy of the OA grading model (one-neighbor) is 0.955±0.021. The training data and predictions were verified by experts.

(R3, R4, R5) Minor concerns.
A. We appreciate these being pointed out and will revise accordingly.

[1] Campello, et al. “Cardiac aging synthesis from cross-sectional data with conditional generative adversarial networks.”
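
The dimension arithmetic in the rebuttal (three coordinates, each mapped to a 20-D sinusoidal vector and appended to the raw coordinates, giving 3 + 3 × 20 = 63-D) can be illustrated with a short sketch. The rebuttal gives only the dimensions, so the choice of 10 frequency bands with a sine and cosine each, the power-of-two frequency schedule, and the function names are assumptions made for illustration. The 32-D index embedding follows Review #3's reading of the paper and is represented here by a random lookup table rather than a learned one.

```python
import numpy as np

def sinusoidal_encode(xyz, num_bands=10):
    # xyz: (N, 3) vertex coordinates, assumed to be normalised beforehand.
    xyz = np.asarray(xyz, dtype=np.float64)
    # Assumed Fourier-feature schedule: frequencies pi * 2^k for k = 0..num_bands-1.
    freqs = (2.0 ** np.arange(num_bands)) * np.pi
    angles = xyz[:, :, None] * freqs[None, None, :]                   # (N, 3, 10)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)   # (N, 3, 20)
    # Append the 60 sinusoidal features to the 3 raw coordinates -> (N, 63).
    return np.concatenate([xyz, enc.reshape(len(xyz), -1)], axis=-1)

def geometric_encoding(xyz, index_table):
    # index_table: (N, 32) per-vertex index embedding (hypothetical stand-in
    # for a learned embedding). Concatenated, not added -> (N, 63 + 32) = (N, 95).
    position = sinusoidal_encode(xyz)
    index = index_table[np.arange(len(position))]
    return np.concatenate([position, index], axis=-1)

rng = np.random.default_rng(0)
vertices = rng.uniform(-1.0, 1.0, size=(8, 3))   # toy stand-in for a femur mesh
table = rng.normal(size=(8, 32))
print(sinusoidal_encode(vertices).shape, geometric_encoding(vertices, table).shape)
```

Concatenation keeps the positional and index features in separate channels, which matches the rebuttal's stated aim of conveying distinct information; addition would require both embeddings to share one dimensionality.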
Meta-Review
Meta-review #1
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper makes a valuable contribution to the conference. The concern regarding the lack of experiments on longitudinal data can be considered for future work.
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper presents a new method for diseased femur shape prediction (e.g., related to OA) which incorporates bone features as well as clinical metadata. All reviewers are somewhat positive about the paper and the most critical reviewer is mostly concerned about the missing longitudinal data evaluation. While I agree that this would strengthen the evaluation, I believe that the surrogate setup (predicting the diseased femur from the opposing healthy one) is enough for a proof-of-concept.