Abstract

Femoral head collapse is a critical event in osteonecrosis of the femoral head (ONFH) that often leads to debilitating hip pain and necessitates total hip arthroplasty. Early and accurate prediction of collapse risk is crucial for personalized treatment planning. While many studies focus on the automated diagnosis of ONFH, prognosis remains less explored. In this study, we propose a robust tri-stream deep learning framework that extracts features from T1-weighted MRI, region-of-interest (ROI) labels, and ONFH gradings to estimate patient-specific collapse risk. We introduce an independent Spatial Label Encoder (SLE) module that tokenizes discrete ROI labels into dense, context-rich embeddings, thereby facilitating multi-modality model training. Experiments on 92 hips (70 patients) show that our approach performs competitively with state-of-the-art (SOTA) methods across most metrics, achieving a concordance index (CI) of 0.847±0.087 and an integrated AUC of 0.884 in 5-fold cross-validation. Notably, the SLE module enhances long-term discrimination by up to 2.4% on AUC at 60 months compared to our base network. These findings highlight the potential benefits of late-fusion strategies with label tokenization for predicting femoral head collapse in ONFH, contributing to improved early intervention and prognosis.
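
For readers unfamiliar with the concordance index reported in the abstract, the following is a minimal sketch of Harrell's C-index in plain Python. It is an illustration only: the abstract does not state which C-index variant or library the authors used, and the function name `concordance_index` and its arguments are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    times  -- observed follow-up times (e.g. months to collapse or censoring)
    events -- 1/True if collapse was observed, 0/False if censored
    risks  -- predicted risk scores (higher means earlier expected collapse)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the case with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # earlier case is an observed event rather than a censored one.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier event received the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.847 should be read.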

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2980_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/RIO98/FemoralCollapsePrediction

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiGan_Predicting_MICCAI2025,
        author = { Li, Ganping and Otake, Yoshito and Kameda, Yuito and Uemura, Keisuke and Takashima, Kazuma and Mae, Hirokazu and Kono, Sotaro and Hamada, Hidetoshi and Okada, Seiji and Sugano, Nobuhiko and Sato, Yoshinobu},
        title = { { Predicting Femoral Head Collapse Risk in Osteonecrosis Using Label Tokenization: A Multi-Modality Survival Analysis Approach } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {515 -- 525}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    1. Designs and proposes a Spatial Label Encoder to extract semantic information
    2. Applies the Feature Tokenizer Transformer to the grade-related tabular data
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A suitable application and combination of existing modules for osteonecrosis prognosis
    2. Fair and comprehensive comparison to the SOTA methods
    3. The code will be released for replication
    4. Clarified some limitations of the proposed method and future work
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The multimodal fusion is straightforward and simple
    2. The results of KM analysis are not well demonstrated and discussed
    3. The ablation study is incomplete; it would be better to investigate how each modality performs on the survival prediction task
    4. The dataset is private and relatively small
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. The loss function should be explained better, for example the variables in the NPLL loss.
    2. AUC evaluation is not common in survival prediction; it would be better to clarify why this evaluation is used.
    3. The KM analyses of the SOTA methods and the proposed method are relatively similar; all the p-values are very low and statistically significant. How were the risk groups stratified?
    4. There are some minor writing issues, such as undefined abbreviations.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is a suitable combination and application of well-established modules. However, the multimodal fusion is straightforward and not novel. The experiments could be improved with a comprehensive ablation study, and the KM analysis is not convincing.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A
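
For reference on the first optional comment in Review #1 (the variables in the NPLL loss), the block below writes out the standard negative partial log-likelihood of the Cox model, which is what "NPLL" is assumed to abbreviate here. The paper's exact formulation is not reproduced on this page, so the symbols are illustrative assumptions: \(h_\theta(x_i)\) is the predicted log-risk for hip \(i\), \(t_i\) its observed time, \(\delta_i\) the collapse indicator (1 if collapse was observed, 0 if censored), and \(R(t_i)\) the set of hips still at risk at time \(t_i\).

```latex
\mathcal{L}_{\mathrm{NPLL}}(\theta)
  = -\sum_{i \,:\, \delta_i = 1}
    \left[ h_\theta(x_i)
           - \log \sum_{j \in R(t_i)} \exp\bigl(h_\theta(x_j)\bigr) \right],
\qquad
R(t_i) = \{\, j : t_j \ge t_i \,\}
```

Only hips with an observed collapse contribute a term to the outer sum; censored hips enter through the risk sets in the inner sum.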



Review #2

  • Please describe the contribution of the paper

    This study proposes a tri-stream deep learning framework that integrates T1-weighted MRI, region-of-interest (ROI) labels, and ONFH grades, introducing a novel Spatial Label Encoder (SLE) to improve multi-modal learning. Five-fold cross-validation is used for evaluation. Experiments on 92 hips demonstrate that the method achieves strong performance. The results show that the SLE module further boosts long-term prediction accuracy by up to 2.4%.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novelty:

    A Spatial Label Encoder (SLE) module is proposed to tokenize the discrete multiclass label map and encode it using a shallow vision transformer.

    The SLE module is combined with a CNN and a Feature Token Transformer for survival prediction.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The number of samples is too small.

    2. Please report the variability of performance for each fold, or the standard deviation of the metrics, to show whether the model is robust given the small sample size.

    3. It would be interesting to see the results without using ONFH grades, or alternatively, to evaluate how the accuracy of ONFH grades affects the prediction results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The writing of the paper is clear, but the number of samples is small.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Given the small number of samples, I originally thought the rebuttal would discuss how the 5-fold experiments were carried out and how robust the results were. However, the authors chose not to address this and instead used the small sample sizes in other studies as an excuse for theirs. They prefer to present this work as a proof of concept to demonstrate the feasibility and value of integrating multimodal data and the SLE module for femoral head collapse prediction. I would like to comment that such a proof of concept is not mature enough for MICCAI.

    Frankly, a sufficiently thorough discussion should be provided when using deep learning methods on extremely small datasets.



Review #3

  • Please describe the contribution of the paper

    The paper makes several notable contributions to the study of Osteonecrosis of the femoral head, a rarely investigated disease. First, the authors analyze an extensive dataset, providing a robust foundation for their findings. They introduce the first survival analysis framework specifically tailored to this disease, marking a significant methodological advancement. Their approach outperforms existing methods according to various performance metrics. Additionally, the authors comprehensively classify their dataset using all established grading systems, ensuring thoroughness and comparability. Finally, they evaluate their model with both conventional and advanced figures of merit, demonstrating the rigor and versatility of their analysis.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. They studied a rarely investigated disease, osteonecrosis of the femoral head.
    2. They used a large amount of data in their study.
    3. Their performance metrics beat those of competing methods.
    4. They proposed the first survival analysis framework for this disease.
    5. They classified their dataset using all existing grading systems.
    6. They used both conventional and elegant figures of merit for their model.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Their dataset is not public.
    2. They do not follow an open-science philosophy.
    3. They do not mention RAM and CPU specs.
    4. They do not describe the segmentation phase.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Please share your dataset.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Topic of study; quality of flow graphs, charts, tables, etc.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We appreciate the reviewers’ detailed feedback and insights. Below, we address the primary concerns raised.

(R1) The multimodal fusion method is simple and not novel. A. Yes, our fusion method follows a common approach in multimodal learning. As noted at the end of Sec. 1, our goal was not to introduce a novel fusion strategy, but to enhance cross-modality alignment via the proposed SLE module, as also highlighted by R2 and R3. As shown in Table 1, incorporating SLE consistently improves survival metrics, demonstrating its practical value even with a simple fusion method. This clarification will also be added to Sec. 2.2.

(R1) The KM analyses of the SOTA methods and the proposed method are relatively similar. How were the risk groups stratified? A. The high- and low-risk groups in the KM analysis were stratified using the median predicted risk score from each model. While all methods yield statistically significant p-values, our proposed method with SLE demonstrates a more distinct separation between risk groups earlier (<50 months), especially reflected in the higher survival probability of the low-risk group (orange line) in Figure 3. This early stratification is clinically valuable for predicting femoral head collapse, where timely intervention is critical. We will revise Figure 3 to highlight this and include the discussion in Sec. 3.2.

(R1, R2) The dataset is small. A. We acknowledge that our dataset is relatively modest in size compared to other tasks. However: (1) collecting longitudinal (time-to-event) ONFH cases with multimodal data annotations is inherently challenging; (2) a recent work on ONFH collapse classification [1] also collected only a relatively modest dataset of 206 follow-up cases out of their 718 hips; and (3) R3 also noted that our dataset is not small (in this context). In addition, our study is intended as a proof of concept to demonstrate the feasibility and value of integrating multimodal data and the SLE module for femoral head collapse prediction, as also recognized by R2 and R3. We are actively working to expand the dataset and will include this clarification in Sec. 2.1.

(R1, R3) Private dataset. A. The current dataset remains private due to its multimodal nature and patient consent restrictions. We are now working to add data samples and obtain the necessary permissions to release all modalities publicly in future work. To ensure reproducibility, we have provided detailed methodological descriptions and will release the code upon acceptance, as mentioned in the abstract. This clarification will be added to Sec. 2.1.

(R1, R2) How does each modality perform? A. While the current ablation study focuses on validating the SLE module, we also performed a modality ablation to evaluate whether the combination of all three modalities consistently yields the best performance (see Sec. 3.2). A detailed analysis of the individual contribution of each modality will be added in future work, and we will clarify this in Sec. 3.2.

(R3) Segmentation details and computing device specs. A. We used U-Net following nnU-Net’s 3D settings, trained on 63 manually labeled hips, preserving original shape and voxel spacing due to mild anisotropy. The computing device had an Intel Xeon W-2295 @ 3.00GHz CPU and 125 GB RAM. These details will be added to Sec. 3.1.

(R2) Std of the metrics. A. The proposed full method achieves stds of 0.087, 0.105, 0.122, 0.068, 0.085, 0.106, and 0.085 for C-Index, AUC(6), …, and iAUC, respectively. For comparison, SurvRNC yields 0.094, 0.086, 0.122, 0.078, 0.104, 0.132, and 0.103. The stds for all metrics will be added to Table 1.

(R1) Minor concerns. A. We appreciate these being pointed out and will add explanations and revise accordingly.

[1] Gao, Shihua, et al. “Prediction of femoral head collapse in osteonecrosis using deep learning segmentation and radiomics texture analysis of MRI.” BMC Medical Informatics and Decision Making 24.1 (2024): 320.
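
The median-split stratification described in the rebuttal's KM answer can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the authors' implementation: the helper names `median_split` and `kaplan_meier` are hypothetical, and assigning scores exactly equal to the median to the low-risk group is an assumption, since the rebuttal does not say how ties are handled.

```python
from statistics import median


def median_split(risks):
    """Mark each case as high-risk (True) if its score exceeds the cohort median.

    Assumption: scores equal to the median fall in the low-risk group.
    """
    cut = median(risks)
    return [r > cut for r in risks]


def kaplan_meier(times, events):
    """Product-limit estimate of survival, returned as [(t, S(t))] per event time."""
    curve = []
    surv = 1.0
    # Survival only changes at times where at least one event was observed.
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - observed / at_risk
        curve.append((t, surv))
    return curve


def stratified_curves(times, events, risks):
    """Return (low_risk_curve, high_risk_curve) after a median split on `risks`."""
    high = median_split(risks)
    low_curve = kaplan_meier(
        [t for t, h in zip(times, high) if not h],
        [e for e, h in zip(events, high) if not h],
    )
    high_curve = kaplan_meier(
        [t for t, h in zip(times, high) if h],
        [e for e, h in zip(events, high) if h],
    )
    return low_curve, high_curve
```

Comparing the two returned curves, for example with a log-rank test, gives the kind of p-value the reviewers discuss. Because the split is at the median, the two groups are roughly equal in size regardless of how well the model discriminates, which is one reason similar p-values across models are not surprising.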




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The majority of reviewers (#1 and #2) point out that the evaluation is insufficient.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    After the rebuttal, there are two rejects and one accept. After reading the reviews and comments carefully, and reading the paper briefly, the AC somewhat agrees with the comment quoted below, which the authors could use to improve their work. Overall, however, this is an interesting paper with sufficiently novel methodology, and the problem is interesting and impactful. This is not a boring, thoroughly evaluated paper. For MICCAI, papers with an adequately novel problem or solution should be encouraged in some sense.

    “Given the small number of samples, I originally thought the rebuttal would discuss how the 5-fold experiments were carried out and how robust the results were. However, the authors chose not to address this and instead used the small sample sizes in other studies as an excuse for theirs. They prefer to present this work as a proof of concept to demonstrate the feasibility and value of integrating multimodal data and the SLE module for femoral head collapse prediction. I would like to comment that such a proof of concept is not mature enough for MICCAI.

    Frankly, a sufficiently thorough discussion should be provided when using deep learning methods on extremely small datasets.”


