Abstract

Osteoporosis is a highly prevalent disease, with fracture-related costs in the European Union alone of 56 billion per annum. Accurate assessment of the microarchitecture of the proximal femur (e.g., trabecular thickness, trabecular spacing, bone volume fraction) is essential for assessing bone strength and predicting fracture risk. High-resolution (HR) CT provides the necessary spatial resolution. However, for optimal hip fracture risk assessment, HR-CT imaging would have to be performed at the proximal femur, which would require an unacceptably high radiation dose. Therefore, we aimed to investigate whether deep learning-based super-resolution (SR) models applied to low-resolution (LR) clinical CT images permit improved assessment of structural parameters. In this study we adapted and optimized state-of-the-art model architectures and compared them in the context of CT-SR of the proximal femur. The dataset consisted of pairs of clinical LR-CTs and HR-CTs of 50 individuals; this reflects clinical reality and avoids the bias of downsampling HR images to mimic LR images. An automated preprocessing pipeline prepares the data for model training. We used three-stage template matching of point clouds to automatically extract the relevant regions of interest, from which metrics of bone microarchitecture were determined. We compared SRGAN, Real-ESRGAN+, LDM, and ResShift with respect to improvement in structural assessment. We also tested whether 2.5D approaches, which use multiple slices of the CT, are superior to 2D approaches. In terms of perceptual reconstruction, the ResShift 2.5D model outperforms the other SR models and achieves results comparable to the Real-ESRGAN+ architectures in the derivation of biomechanical properties.
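For readers unfamiliar with the structural metrics named above: bone volume fraction (BV/TV) is simply the proportion of voxels classified as bone within a region of interest. A minimal sketch of that computation (the threshold value and function name are illustrative, not taken from the paper):

```python
import numpy as np

def bone_volume_fraction(volume: np.ndarray, threshold: float) -> float:
    """BV/TV: fraction of ROI voxels classified as bone after thresholding."""
    bone = volume >= threshold
    return bone.sum() / bone.size

# Toy 3D "CT" volume: 16 of 64 voxels carry simulated bone-like intensities.
vol = np.zeros((4, 4, 4))
vol[:2, :2, :] = 1000.0
print(bone_volume_fraction(vol, 500.0))  # 16/64 = 0.25
```

Trabecular thickness, spacing, and number require morphological analysis of the binarized structure and are typically computed with dedicated bone-analysis tools rather than a one-liner like this.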

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3144_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/nkoser/Real-Super-Resolution

Link to the Dataset(s)

SR Femur Dataset: https://github.com/nkoser/Real-Super-Resolution

BibTex

@InProceedings{KosNik_Real_MICCAI2025,
        author = { Koser, Niklas C. and Finck, Marten J. and von Brackel, Felix N. and Ondruschka, Benjamin and Pirk, Sören and Glüer, Claus-C.},
        title = { { Real Super-Resolution for Proximal Femur: Enhanced Computation of Structural Bone Metrics from Clinical CTs } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {530--540}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The primary contributions of this study were the construction of a custom LR-HR paired dataset, which can be utilized in future research, and the application and evaluation of state-of-the-art super-resolution models in the context of medical imaging. By applying major super-resolution methodologies, a comparative evaluation was conducted to identify the model that achieved the best performance in the femur region.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    [1] When generating the low-resolution (LR) dataset, the high-resolution (HR) images were not simply downsampled; instead, an actual LR-CT and HR-CT paired dataset was constructed to avoid potential bias. [2] The study provided a clear review of prior research and its limitations to justify the necessity of the proposed work, and adopted appropriate methodologies and evaluation strategies aligned with its research objectives. [3] The differences among the super-resolution architectures used in the study were clearly explained, offering insight into the unique characteristics of each model. [4] Additionally, a point cloud-based ROI selection method was applied during the template matching stage, enabling a more efficient evaluation process.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    This study did not propose a newly developed model or an original methodology designed by the authors to address the problem, and thus, its novelty can be considered limited. In addition, due to the small size of the dataset, it is difficult to expect a high level of generalizability from the presented results.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The majority of the content in this study involved applying previously developed models to femur images without significant modifications or novel implementations. While there was no discrepancy between the research objectives and the actual content, and the study employed appropriate datasets and evaluation methods aligned with its goals, indicating it was a well-conducted study, the lack of originality and novelty led to the decision to reject the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    This work makes a valuable contribution to the MSK research field through the application of a super-resolution model and the creation of a dataset. However, I cannot agree with the authors’ claim regarding the novelty of their research. In my view, this is merely a straightforward application of an existing state-of-the-art super-resolution model. Novelty should arise from advancing the current state of the field, not simply applying an already existing model to a new domain. I have therefore decided to recommend rejection.



Review #2

  • Please describe the contribution of the paper

    This paper evaluated several deep learning-based SR models to improve biomechanical assessments from clinical low-resolution CT (LR-CT) of the proximal femur. It introduces a curated paired dataset of 50 real LR and high-resolution (HR) CT scans. The authors benchmark four state-of-the-art SR models (SRGAN, Real-ESRGAN+, Latent Diffusion Model (LDM), and ResShift) in both 2D and 2.5D settings. The reconstructions are evaluated with standard perceptual metrics (PSNR, SSIM, LPIPS) and also through biomechanical microstructure metrics (BV/TV, Tb.Th, Tb.Sp, Tb.N) extracted from registered anatomical ROIs using a standardized pipeline, which increases the clinical impact of this work.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper curated a good-sized bone CT dataset with LR and HR pairs, particularly for the proximal femur, which is niche and specialized.
    2. The introduction to the data and the preprocessing is clear and full of details, including registration, lesion segmentation, and normalization.
    3. The authors compared four popular GAN- and diffusion-based approaches under 2D and 2.5D settings.
    4. Solid evaluation on standard image-quality metrics (SSIM, PSNR, LPIPS, GSSIM) together with inference speed. The evaluation based on bone metrics is even more interesting and important, with good discussion of these findings.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Given that the dataset is already processed, the authors should make the data directly available online (with a license, of course) rather than upon reasonable request, which is a frustrating process we are all familiar with as researchers.
    2. No novel SR method is introduced, and there is no attempt to improve the SOTA with modifications suited to the specifics of the disease, given that you are working on such a unique dataset.
    3. A significant omission is some of the 3D SR methods in medical SR from recent years, such as SAINT and TVSRN. Given that you have already registered the LR to the HR, a GAN-based method is not necessary; many methods that require spatial alignment should work too. I also recall that TVSRN released a paired CT dataset; although not specific to bone, the spinal-section SR result was impressive. Given that trabecular bone is inherently 3D, 3D methods should be evaluated too.
    4. Please add statistical testing and confidence intervals to your evaluation. Although I agree with the issues of SSIM and PSNR, a table of results is still handy, particularly for future researchers who want to compare against the metrics. I do not see the table in the current manuscript and cannot find a supplementary; please double-check.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Depending on the data release criteria, I will change my score accordingly.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I changed my suggestion to accept given that the authors agree to release code and data without any reservation. Most concerns have been addressed.



Review #3

  • Please describe the contribution of the paper

    The authors present a comparison of multiple super-resolution methods for obtaining high-resolution approximations from clinical CT data of proximal femurs in order to derive structural parameters of the trabecular bone. Four different super-resolution methods, in 2D and 2.5D variants, were tested with qualitative evaluation and quantitative comparison of structure metrics: bone volume fraction and trabecular thickness, spacing, and number. The models were trained using clinical CT and HR-pQCT image pairs from 50 post-mortem subjects.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The research question is valid since, indeed, one could improve fracture strength estimation and therefore fracture risk estimation of the femur if one could accurately derive the structural parameters of the trabecular bone from clinical CT. Several attempts at the subject have been made over the last two decades, especially using textural analysis methods, but generative networks could have potential over traditional methods to solve the problem. The authors have also collected a valid dataset. Organizing the scanning of 50 post-mortem subjects and collecting their femurs for HR-pQCT scanning is a laborious task, but such data indeed allow one to evaluate how the image is formed with scanners of different resolution ranges. Such a dataset makes it possible to capture the relation between different imaging modalities (µCT and clinical CT).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The presented method is a good step towards the final goal, but it is still far from clinical application since only one clinical scanner and, more importantly, only one tube voltage was used. The authors did not mention the reconstruction kernel or tube current, so I assume they were not altered either. Clinical CT scanners have a remarkable set of adjustable parameters and, if the authors aim to generalize the method, the training and test data should cover a range of images scanned with different parameters. Specifically, the kernel and mAs used should be reported. Readers would also benefit if metrics related to the signal-to-noise ratio and contrast ratio in the clinical CT image were given, since these show what noise level was accepted in the clinical image.

    The title in its current form is misleading: although these structural parameters of the femur can be linked to mechanical properties, this is neither discussed nor is their relation shown in the study. However, this could be overcome by restricting the title and conclusion to the estimation of structural metrics of the trabecular bone instead of biomechanical properties, which is also a relevant outcome for a study. The explanation in the first paragraph on page 6 was quite difficult to follow. The aim seemed to be to extract ROIs from the same anatomical region of each bone pair for the bone structure parameter computation. However, I could not follow whether the ROIs for which correlations were calculated were the head, neck, trochanter, and shaft, or some specific smaller regions within these, or something else. Could the authors state at the beginning of the paragraph what the operations are aiming at, and then compress the rest of the paragraph to focus on the main operations so that the final ROI positions are clear?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    I understand that the image reconstruction field speaks of low- and high-resolution images, but "low-resolution CT" guides thinking towards low-dose techniques. However, the noise level seemed quite low in the example clinical CT image, so in this sense the images used were not low-resolution clinical CT images. The authors could consider using phrases such as "clinical CT" and "pQCT" when they refer to the specific images they used, rather than to the inputs to the models in general.

    What resampling technique was used when the clinical CT was resampled to pQCT resolution?

    Page 4: “Therefore, it is also transferable to other scanners.” This is true, but if we consider the generalizability of the method, the larger questions are the varying tube voltages and reconstruction kernels within a clinical scanner.

    Please spell out all abbreviations when they are first used; some, such as SSIM and PSNR, were missing.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The study aims to solve a relevant issue: calculating structural parameters of the trabecular bone from clinical CT. However, in their current form the title and conclusions are somewhat misleading, since the authors do not present any biomechanical properties. This is a further step which the authors may present in the future. The results, especially the correlation coefficients, are difficult to interpret since it is not fully clear where the ROIs in which the trabecular structure parameters were calculated were located.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have made a significant effort in collecting their dataset and stated that they will make the dataset and code directly available. The authors also provide a significant benchmark for super-resolution techniques.




Author Feedback

We thank all reviewers for their valuable and constructive comments. We are encouraged that they recognized the value of the preprocessed dataset (R1, R2, R3), that our evaluation is solid (R1) and efficient (R2), and that our work addresses a relevant research question towards estimating bone structure from clinical CTs (R3). As requested, we will make our dataset and code directly available (R1).

Novelty (R1, R2): We claim that a combination of the following four aspects makes our pipeline the first to provide traceable evidence that SR improves the assessment of bone structural competence: (1) we propose the first standardized, automated end-to-end approach that enables an anatomy-guided quantification of structural competence of the proximal femur (and as such the only one that allows future developers to test their models against ours); (2) we provide the largest available dataset of independently acquired paired low- and high-resolution images (avoiding downsampling bias), which for the first time provides a sample size that permits robust training and evaluation of SR methods for the proximal femur; (3) we expanded existing SOTA 2D models to 2.5D in a medical context; (4) we apply ResShift (the most advanced published model) to quantify structural competence on medical image data and compare it to the other three approaches; to our knowledge this is a first-of-its-kind medical application. Given that we have developed a fully automated standard procedure for comparative analysis of SR models for bones, we believe that our work is original and novel.

Dataset and Generalizability (R2, R3): Given that our dataset consists of 50 CT and HR-pQCT scans (independently acquired, no downsampling bias), it supports generalizable training of SR methods on the proximal femur. Please note that the next-largest dataset provides only 10 data points with a smaller resolution difference (R2). We agree that scans from additional scanners with different parameters (e.g., tube currents and voltages, reconstruction kernels) would be beneficial (R3). Please note that this is an extremely demanding undertaking (e.g., access to scanners of different configurations is limited) which is outside the scope of this work. However, we will disclose the requested parameters to support the extension of our dataset and future experiments. The mean SNR of the clinical CT in the femur region is 0.93, and the contrast ratio is 2.43. SimpleITK with a B-spline interpolator was used for resampling during preprocessing (R3).

Comments R1: We agree that processing 3D volumes has great potential for computing SR. However, as this requires substantially higher compute resources, it is a challenging undertaking: we used a SOTA H100 GPU with 96 GB of VRAM, which was not sufficient for training on 3D volumes. To take a step in this direction, we therefore focused on 2.5D methods. SAINT and TVSRN focus on generating missing slices and were not trained with independent datasets (downsampled data or the same scanner); therefore, we cannot directly compare these methods. We will add statistical testing to our evaluation (e.g., Real-ESRGAN+ and ResShift are significantly better than low resolution). Supplementary material cannot be provided as per the submission guidelines (thus we added SSIM/PSNR values in Fig. 1).

Comments R3: We thank R3 for their detailed comments to improve the exposition of our work. The goal is to consistently extract bone structure metrics from the same anatomical region of the femur, independent of patient-specific variations and without manual ROI selection. This ensures meaningful comparisons by avoiding the analysis of different regions across subjects. We appreciate the suggestion toward a more appropriate title, we will replace biomechanical properties with structural metrics and adapt the conclusion. We have used the abbreviations LR-CT and HR-CT for ease of reading and linked them to the correct terms (clinical CT, HRpQCT) in the Introduction.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper received mixed reviews (2 positive and 1 negative). I agree with R1 and R3 that, although there is no methodological novelty, this paper contributes a novel application, especially as the authors agree to release code and data without any reservation. Hence, I recommend acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers agree on the effort to collect the data and create a valuable dataset of real low- and high-resolution CT images. The proposed benchmark is sound, and most concerns have been addressed.


