Abstract
Osteoporosis, characterized by reduced bone mineral density (BMD) and compromised bone microstructure, increases fracture risk in aging populations. While dual-energy X-ray absorptiometry (DXA) is the clinical standard for BMD assessment, its limited accessibility hinders diagnosis in resource-limited regions. Opportunistic computed tomography (CT) analysis has emerged as a promising alternative for osteoporosis diagnosis using existing imaging data. Current approaches, however, face three limitations: (1) underutilization of unlabeled vertebral data, (2) systematic bias from device-specific DXA discrepancies, and (3) insufficient integration of clinical knowledge such as spatial BMD distribution patterns. To address these, we propose a unified deep learning framework with three innovations. First, a self-supervised learning method using radiomic representations to leverage unlabeled CT data and preserve bone texture. Second, a Mixture of Experts (MoE) architecture with learned gating mechanisms to enhance cross-device adaptability. Third, a multi-task learning framework integrating osteoporosis diagnosis, BMD regression, and vertebra location prediction. Validated across three clinical sites and an external hospital, our approach demonstrates superior generalizability and accuracy over existing methods for opportunistic osteoporosis screening and diagnosis.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0082_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{HuaJia_Opportunistic_MICCAI2025,
author = { Huang, Jiaxing and Guo, Heng and Lu, Le and Yang, Fan and Xu, Minfeng and Yang, Ge and Luo, Wei},
title = { { Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15974},
month = {September},
pages = {433--443}
}
Reviews
Review #1
- Please describe the contribution of the paper
The manuscript “Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration” describes an AI method that computes textural features from DXA and CT to predict osteoporosis. The reference data appears to be itself a prediction, based on DXA T-score plus BMD and TBS (not stated explicitly), reaching around 95% agreement with the original non-AI-based diagnosis.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Combination of CT, DXA, radiomics, and neural networks.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Dataset is not described sufficiently.
- They have no valid reference standard for osteoporosis, only a prediction based on DXA and/or CT (it is not clear which).
- They claim improvement of the diagnosis but cannot show it with this study design.
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(1) Strong Reject — must be rejected due to major flaws
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
They used a standard non-AI prediction as the reference to argue improvement, but showed only a difference.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed my concerns. Still, I believe that providing basic information about the CT scanner models, pixel spacing, slice increment, mAs, reconstruction kernels, etc. would help readers understand the applicability of the method.
Review #2
- Please describe the contribution of the paper
This paper presented a unified deep learning framework addressing three critical challenges in opportunistic osteoporosis diagnosis using CT imaging via texture-preserving self-supervision, mixture of experts and multi-task integration.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The texture-preserving self-supervision algorithm effectively utilized unlabeled vertebral data through extracted radiomics features, overcoming limitations of random cropping operations in conventional SSL methods.
- The MoE architecture with adaptive gating mechanisms mitigated device-specific variability in DXA measurements, enhancing cross-device generalizability.
- The multi-task learning integrating BMD regression, position prediction, and diagnostic classification leveraged complementary clinical priors to improve diagnostic accuracy.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
It would be better to provide the AUC ROC curve.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper presented a unified deep learning framework addressing three critical challenges in opportunistic osteoporosis diagnosis using CT imaging via texture-preserving self-supervision, mixture of experts, and multi-task integration. The results outperformed the compared methods in terms of accuracy, sensitivity, specificity, and F1 score.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed the request to include the ROC curve and will incorporate it in the final manuscript.
Review #3
- Please describe the contribution of the paper
This paper presents a unified deep learning method to tackle three major challenges in opportunistic osteoporosis screening using CT scans. First, a proposed sliding-window approach replaces random cropping with a texture-preserving self-supervised learning (TP-SwAV) framework, ensuring that key local trabecular details are retained. Second, a Mixture of Experts (MoE) module adaptively fuses device-specific information based on CT image characteristics and device embeddings, thereby addressing inter-device calibration differences. Third, by combining cross-entropy and MSE losses in a weighted multi-task learning paradigm, the framework simultaneously enhances BMD regression, vertebra position classification, and osteoporosis diagnosis. Validated across three in-house sites and an external hospital, the method demonstrates robust performance by leveraging large unlabeled data, accommodating differences across DXA devices, and utilizing complementary clinical signals to achieve higher accuracy than prior SSL approaches at both vertebra-level and patient-level assessments
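The weighted multi-task objective described above (cross-entropy for diagnosis and vertebra-position classification, MSE for BMD regression) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the task weights, head names, and shapes are assumptions.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # mean negative log-likelihood of the true class
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def multi_task_loss(diag_logits, diag_y, bmd_pred, bmd_y,
                    pos_logits, pos_y, w=(1.0, 0.5, 0.25)):
    # weighted sum: diagnosis CE + BMD regression MSE + vertebra-position CE
    # (weights w are illustrative, not the paper's values)
    l_diag = cross_entropy(diag_logits, diag_y)
    l_bmd = np.mean((bmd_pred - bmd_y) ** 2)
    l_pos = cross_entropy(pos_logits, pos_y)
    return w[0] * l_diag + w[1] * l_bmd + w[2] * l_pos
```

In such a setup, a perfect prediction on all three heads drives the combined loss toward zero, while each head contributes its own gradient signal during fine-tuning.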
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The main strengths of this work are a new texture-preserving self-supervised learning component (TP-SwAV) that uses a sliding-window technique over vertebrae to extract high-fidelity radiomic features, avoiding the texture degradation usually caused by random cropping and ensuring that vital microarchitectural details are preserved for robust osteoporosis detection. The authors also propose a Mixture of Experts (MoE) decoder that uses a gating network to combine device-specific expert heads, dynamically accounting for the biases of different DXA devices and greatly improving cross-device adaptability. A multi-task learning scheme improves diagnostic accuracy by jointly handling osteoporosis classification, continuous BMD regression, and vertebral anatomical indexing, thereby exploiting these complementary supervisory signals. In addition, a large unlabeled vertebra dataset is integrated by the proposed TP-SwAV for self-supervised pre-training, allowing the model to acquire domain-relevant features even before fine-tuning on the smaller labeled set. Validation across multiple clinical sites and an external hospital demonstrates the method’s scalability and robustness.
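The gated expert combination the review describes can be sketched generically: a gate conditioned on both the image features and a device embedding produces softmax weights over device-specific expert heads. All shapes, names, and the linear experts below are illustrative assumptions; the paper's actual gating architecture is not reproduced here.

```python
import numpy as np

def gated_moe(features, device_embedding, expert_weights, gate_weights):
    """Combine device-specific expert heads via a learned softmax gate.

    features:         (d,) image feature vector
    device_embedding: (e,) embedding of the DXA device
    expert_weights:   list of (d,) weight vectors, one linear expert per device
    gate_weights:     (d + e, n_experts) gating matrix
    """
    gate_in = np.concatenate([features, device_embedding])
    logits = gate_in @ gate_weights
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                         # softmax over experts
    expert_outs = np.array([w @ features for w in expert_weights])
    return float(gate @ expert_outs)           # gate-weighted expert mixture
```

Because the gate is a convex combination, the fused output always lies between the smallest and largest individual expert outputs, which is one way such a head can interpolate between device-specific calibrations.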
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The suggested technical novelty is modest: texture-preserving pre-training closely mimics other radiomics-guided SSL techniques, and both its base-crop contrastive task and SwinMM’s masked multi-view strategy have already been presented in multiple prior works. Thus, the new TP-SwAV method is mostly a minor engineering tweak. Likewise, addressing scanner or DXA-device bias with a Mixture-of-Experts (MoE) decoder follows a developing trend of MoE research in medical imaging. The only external dataset consists of just 196 cases; hence, external proof of the method’s robustness is lacking. From a deployment standpoint, the pipeline still depends on manual DXA-device labeling, a separate vertebra localiser, and PyRadiomics-based texture extraction. Although the paper reports no runtime or failure rates, past comparable research has shown PyRadiomics to be time-consuming and resource-intensive, which makes it hard to implement in real-world scenarios. Moreover, the ablation studies fail to identify how much of the performance increase derives from TP-SwAV as opposed to the MoE decoder, and cover just two backbone models. Finally, no statistical significance testing is included, contrary to best-practice recommendations such as CLAIM for multi-site medical AI studies. These limitations raise questions about the clinical significance and generalizability of the proposed study.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I scored the paper this way because of its practical strengths: (i) a texture-preserving SSL pretext task that keeps trabecular detail via sliding-window radiomics pooling, (ii) a gated Mixture-of-Experts head that explicitly corrects DXA-device bias, (iii) a multi-task loss that couples diagnosis with BMD regression and vertebral indexing, and (iv) a convincing evaluation on three internal sites plus one external hospital. That said, each technical block is an incremental extension of prior radiomics-guided SSL and MoE ideas, so the novelty is moderate; the external cohort is small (n = 196); the pipeline relies on manual device IDs, a separate vertebra localiser, and time-consuming PyRadiomics feature extraction; and the paper lacks runtime figures and statistical significance tests.
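The sliding-window pooling mentioned in (i) can be illustrated generically. This is a stand-in sketch, not the paper's pipeline: the window and step sizes are arbitrary, and a simple intensity standard deviation substitutes for the PyRadiomics features the authors actually extract.

```python
import numpy as np

def sliding_window_texture(volume, window=8, step=4):
    """Pool a simple texture statistic over sliding windows of a 3-D volume.

    Each window is summarized by the standard deviation of its intensities,
    a crude texture proxy standing in for full radiomic features.
    """
    d, h, w = volume.shape
    feats = []
    for z in range(0, d - window + 1, step):
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                patch = volume[z:z + window, y:y + window, x:x + window]
                feats.append(patch.std())
    return np.array(feats)
```

Unlike random cropping, every window is visited deterministically, so local texture statistics are collected from the whole vertebra rather than from a single randomly placed crop.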
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all reviewers for the constructive comments and suggestions.
Response to Reviewer #4 Q1: It would be better to provide the AUC ROC curve. A1: We have already plotted the ROC curve and will include it in our final version.
Response to Reviewer #5
Q1: Dataset is not described sufficiently.
A1: Due to the page limit and the restriction on supplementary material, we will provide comprehensive details of the dataset in our final version.
Q2: They have no valid reference standard for osteoporosis, only a prediction based on DXA and/or CT.
A2: To clarify, DXA is the gold standard for diagnosing osteoporosis, as widely recognized by clinical guidelines (e.g., ISCD, WHO). In this study, we followed standard protocols to define osteoporosis using DXA-measured BMD. Moreover, previous studies and our own work have demonstrated the feasibility of opportunistic osteoporosis screening and BMD quantification using clinically acquired CT imaging with DXA as the reference standard. We hope this addresses your concern.
Q3: They claim improvement of the diagnosis but cannot show it with this study design.
A3: We assessed model performance using four standard metrics (ACC, SEN, SPEC, F1) and compared with SOTA methods (Alice, SwinMM, VOCO). For detailed results, please refer to the Experiments section.
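The WHO criterion the authors cite maps a DXA T-score to a diagnostic category. A minimal sketch of the standard thresholds (osteoporosis at T ≤ -2.5, osteopenia between -2.5 and -1.0, normal at T ≥ -1.0); the function name is an illustrative assumption:

```python
def who_dxa_category(t_score: float) -> str:
    # WHO classification of BMD by DXA T-score
    if t_score <= -2.5:
        return "osteoporosis"
    if t_score < -1.0:
        return "osteopenia"   # low bone mass
    return "normal"
```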
Response to Reviewer #6
Q1: The suggested technical novelty is modest: … the new TP-SwAV method is mostly a minor engineering tweak.
A1: While TP-SwAV builds upon SwAV, its design is tailored to the unique demands of osteoporosis diagnosis, which relies critically on fine-grained texture analysis. Traditional random cropping strategies disrupt the local texture patterns that are essential for detecting subtle microarchitectural changes in bone. In contrast, TP-SwAV introduces a texture-preserving self-supervised learning framework that addresses the specific needs of this task.
Q2: Addressing scanner or DXA-device bias with an MoE decoder follows a developing trend of MoE research in medical imaging.
A2: We introduced MoE during decoding to handle device-specific variability. To enable the model to better adapt to differences between devices, we incorporated DXA device embeddings into the decoder. Despite its simplicity, this approach is effective and experimentally validated.
Q3: The only external dataset consists of just 196 cases.
A3: Due to data acquisition challenges, Site D includes only 196 cases. We will validate our model on larger multi-site cohorts in future work.
Q4: The time-consuming PyRadiomics pipeline is hard to implement in real-world scenarios.
A4: First, computational efficiency can be improved via larger step/window sizes and parallel processing. Second, PyRadiomics in our pipeline operates offline before pre-training; subsequent fine-tuning and deployment are independent of it, ensuring no runtime overhead in clinical use.
Q5: The ablation studies fail to identify how much of the performance increase derives from TP-SwAV as opposed to the MoE decoder and cover just two backbone models.
A5: Tables 2–3 provide a stepwise analysis for ResNet34 under different configurations (no pre-training, pre-training with TP-SwAV, fine-tuning with MoE, etc.). We also validate the effectiveness of our method on Transformer-based models, as shown in Table 5.
Compared to the Swin-T baseline, Swin-T with TP-SwAV demonstrates consistent improvements across the three internal Sites A–C and achieves a substantial 12.25% F1-score improvement on the external Site D. The consistent gains observed across CNN and Transformer models validate the generalizability of our method.
Q6: No statistical significance testing is included.
A6: We conducted paired t-tests comparing our method with other approaches (e.g., ResNet34+SwAV/VOCO, ViT+MAE). The results demonstrate statistically significant differences (p < 0.05) compared to these methods across all four sites. These findings will be presented in the final version.
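A paired t-test of the kind the authors describe pairs the two methods' scores on the same evaluation units and tests whether the mean per-pair difference is zero. The sketch below uses hypothetical per-site F1 scores, not the paper's numbers:

```python
import numpy as np
from scipy import stats

# Hypothetical per-site/per-fold F1 scores for two methods (NOT the paper's
# results), paired by evaluation unit.
ours     = np.array([0.90, 0.91, 0.89, 0.92, 0.90])
baseline = np.array([0.85, 0.86, 0.84, 0.86, 0.85])

# Paired t-test on the per-pair differences.
t_stat, p_value = stats.ttest_rel(ours, baseline)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```

With only four sites in the paper's setting, the test has few degrees of freedom, so reporting the exact p-values (as the authors promise) matters more than a bare "p < 0.05" claim.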
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
I believe that based on the initial reviews and the work done to address the Reviewers concerns the paper should be accepted.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
All three reviewers expressed concern over the limited novelty of the proposed pipeline, which heavily builds upon existing segmentation, registration, and estimation frameworks. Reviewer 1 appreciated the motivation but highlighted the marginal advancement. Reviewer 2 pointed out the lack of strong evidence for generalizability and unclear clinical impact. Reviewer 3 was critical of the modular reuse of prior works, questioning the overall contribution. The validation is confined to the hip region with modest results, and the paper lacks a convincing argument for clinical relevance beyond existing multi-modal approaches. Therefore, I recommend rejection.