Abstract

Classical radiomic features (e.g., entropy, energy) have been designed to describe image appearance and intensity patterns. These features are directly interpretable and readily understood by radiologists. Compared with end-to-end deep learning (DL) models, lower-dimensional parametric models that use such radiomic features offer greater interpretability but lower performance in clinical tasks. In this study, we propose an approach in which the performance of a standard logistic regression model is substantially improved by learning to select radiomic features for individual patients from a pool of candidate features. This approach has the potential to maintain interpretability while offering performance comparable to DL. In addition, we propose to expand the feature pool by generating a patient-specific healthy persona via mask-inpainting using a denoising diffusion model trained on healthy subjects. Such a pathology-free baseline feature set enables not only novel feature discovery but also improved condition classification. We demonstrate our method on the clinical tasks of classifying general abnormalities, anterior cruciate ligament tears, and meniscus tears. Experimental results show that our approach achieves performance comparable or even superior to state-of-the-art DL approaches, while offering added interpretability through radiomic features extracted from images and supplemented by generated healthy personas. Example clinical cases are discussed in depth to demonstrate interpretability-enabled uses such as human-explainable feature discovery and patient-specific location/view selection. These findings highlight the potential of combining subject-specific feature selection with generative models to augment radiomic analysis for more interpretable decision-making. The code is available at: https://github.com/YaxiiC/RadiomicsPersona.git
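For concreteness, here is a minimal sketch of the pipeline the abstract describes, under stated assumptions: the backbone is any 3D CNN (e.g., a 3D-ResNet-18 trunk) that maps an image to a flat feature vector, the radiomic features are precomputed per patient, and the class name, `feat_dim`, and threshold `tau` are illustrative rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PatientSpecificSelector(nn.Module):
    """Per-patient radiomic feature weighting followed by a logistic-regression head."""

    def __init__(self, backbone: nn.Module, feat_dim: int, n_features: int):
        super().__init__()
        self.backbone = backbone                    # e.g. a 3D-ResNet-18 trunk -> (B, feat_dim)
        self.weight_head = nn.Linear(feat_dim, n_features)
        self.classifier = nn.Linear(n_features, 1)  # linear layer + sigmoid = logistic regression

    def forward(self, image, radiomics, tau=None):
        w = torch.sigmoid(self.weight_head(self.backbone(image)))  # per-patient weights in [0, 1]
        if tau is not None:                                        # optional hard selection
            w = (w > tau).float()
        return torch.sigmoid(self.classifier(radiomics * w))      # probability of pathology
```

Jointly training the weighting head and the linear classifier end to end is what enables the per-patient explanations discussed in the reviews below.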



Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3166_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/YaxiiC/RadiomicsPersona.git

Link to the Dataset(s)

MRNet dataset: https://stanfordmlgroup.github.io/competitions/mrnet/

BibTex

@InProceedings{CheYax_Patientspecific_MICCAI2025,
        author = { Chen, Yaxi and Ni, Simin and Saeed, Shaheer U. and Ivanova, Aleksandra and Hargunani, Rikin and Huang, Jie and Liu, Chaozong and Hu, Yipeng},
        title = { { Patient-specific radiomic feature selection with reconstructed healthy persona of knee MR images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15973},
        month = {September},
        pages = {453--463}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a novel method that addresses a key clinical challenge: model explainability. The authors propose a deep learning-based radiomic feature selection framework for musculoskeletal disease classification. A feature-weighting neural network (3D-ResNet-18) is used to assign importance scores to radiomic features extracted from both the pathological MRI and a synthesized “healthy persona”—a pathology-free image generated using a DDPM trained on healthy knees. The selected features are then fed into a logistic regression model to perform downstream classification tasks, including detection of general abnormalities, anterior cruciate ligament (ACL) tears, and meniscus tears.
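As a rough illustration of the persona-generation step the review describes, below is a single-pass, RePaint-style masked inpainting loop for a standard DDPM. The function name, the `eps_model` noise predictor (assumed trained on healthy knees) and its call signature, and the omission of RePaint's resampling steps are all simplifying assumptions, not the authors' exact procedure.

```python
import torch

@torch.no_grad()
def inpaint_healthy_persona(eps_model, image, mask, betas):
    """Sketch: regenerate the region where mask == 1 from a DDPM trained on
    healthy anatomy, clamping the rest to the forward-diffused original image."""
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)           # cumulative products of alphas
    x = torch.randn_like(image)                   # start from pure noise
    for t in reversed(range(len(betas))):
        # Known region: forward-diffuse the real image to timestep t.
        x_known = abar[t].sqrt() * image + (1 - abar[t]).sqrt() * torch.randn_like(image)
        # Unknown region: one reverse denoising step (assumed eps_model signature).
        eps = eps_model(x, torch.full((x.shape[0],), t, device=x.device))
        mean = (x - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
        # Composite: keep generated content only inside the mask.
        x = mask * x + (1 - mask) * x_known
    return x
```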

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well written and addresses the important clinical challenge of explainability by using radiomic features and a transparent classification pipeline (logistic regression), which is more interpretable than typical deep neural networks.
    2. The feature-weighting neural network (3D-ResNet-18) enables dynamic, patient-specific selection of relevant radiomic features, rather than relying on a fixed global set—enhancing per-patient interpretability.
    3. The authors provide qualitative analyses linking selected features (e.g., entropy, compactness) to anatomical structures and MRI views relevant to ACL and meniscus tears, which aligns well with clinical knowledge.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. How realistic are the DDPM-reconstructed healthy personas? Is there any quantitative or qualitative evaluation? Additionally, what is the computational overhead of generating a healthy persona per scan?
    2. Why was DDPM chosen over GANs for generating healthy personas? Did the authors evaluate or compare reconstruction quality or downstream classification performance between these generative approaches?
    3. Since the DDPM-generated healthy persona is not a real image, what is the clinical or biological validity of the radiomic features extracted from it? Were these features reviewed or validated by clinical experts? Could this synthetic information be misleading or introduce hallucinated patterns into the prediction?
    4. Instead of directly using radiomic features from the pseudo-healthy (synthetic) image, would it be more effective to compute the difference between features from the pathological image and the healthy persona? This contrast could highlight pathology-specific characteristics and potentially improve performance. Was this considered?
    5. While radiomics is interpretable by design, methods like SHAP values provide instance-level feature attribution. Did the authors explore such approaches to enhance the transparency of the model and support more clinically meaningful interpretations?

    6. The authors highlight features like “compactness” and “entropy” as indicators of pathology. How consistently do these correlate with clinical diagnoses across patients, and how robust are these correlations?
    7. Why were second-order texture-based features (e.g., GLCM, GLRLM) excluded from the radiomic pool? These features are commonly used to characterize soft-tissue pathology and could potentially improve both classification accuracy and interpretability.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is interesting, but some important parts are not well explained. The paper uses a fake “healthy” image to get features, but it’s not clear if these features are reliable or clinically meaningful. Better explanation and validation may lead me to change my mind.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I agree with the authors’ response to my questions and have no further doubts about the design choices.



Review #2

  • Please describe the contribution of the paper

    This paper introduces a novel and interpretable approach that blends classical radiomic features with a neural network-based feature selection strategy and generative modeling to improve musculoskeletal MRI analysis. One of its highlights is the use of a denoising diffusion model to generate a patient-specific “healthy persona”: a synthetic, pathology-free version of the input image. This serves as a personalized reference point for identifying relevant image differences. The model leverages a learned weighting over radiomic features to adaptively select the most informative ones per case, feeding them into a simple logistic regression classifier to maintain transparency. The results are solid, showing competitive or better performance than several state-of-the-art deep learning baselines across tasks like detecting general abnormalities, ACL tears, and meniscal injuries.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    As mentioned above: combining interpretable radiomics with generative modeling and neural network-based feature selection for MSK classification; use of a denoising diffusion probabilistic model (DDPM) to generate a patient-specific “healthy persona”; the downstream use of logistic regression maintains interpretability while still achieving comparable classification performance to deep learning models, especially for meniscus tear detection.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    I have some concerns about generalizability, as the paper lacks external validation beyond the MRNet dataset. The healthy persona is a great idea, but it’s trained on a small dataset and I’m also wondering about realism/anatomical fidelity. Also, recent state-of-the-art models, including transformer-based approaches, are not compared (e.g., https://doi.org/10.1007/978-3-031-72086-4_6).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the paper is good - it combines a novel and well-motivated integration of a generative model with patient-specific radiomic feature selection, and provides competitive performance with state of the art in addition to interpretability. But the lack of external validation and limited comparison to alternative interpretable methods makes me a bit concerned about generalisability.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I think this paper should be accepted.



Review #3

  • Please describe the contribution of the paper

    The paper introduces an end-to-end framework for knee MRI analysis that integrates radiomic features with generative models, creating a healthy persona using a denoising diffusion probabilistic model (DDPM) to improve the accuracy and interpretability of musculoskeletal disease classification.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Incorporates state-of-the-art diffusion architectures (DDPMs) to generate healthy personas for pathology-free radiomic feature extraction.
    • Pursues interpretability of classification results by combining generative models with logistic regression, balancing transparency with predictive power.
    • Achieves results comparable to conventional deep-learning-based methods, with statistically significant p-values for detecting anterior cruciate ligament (ACL) and meniscal tears.
    • The extracted features align closely with clinical observations and expectations, improving trust in the system’s diagnostic accuracy.
    • Provides an extensive ablation study of the integrated components and their importance, including the feature-selection threshold.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Potential biases in weighting specific radiomic features, compromising the robustness of the healthy persona for atypical cases.
    • Lack of in-depth evaluation of the generated healthy personas; since the DDPM is a central component of the framework, it is necessary to ensure its effectiveness.
    • Patient-specific feature selection and persona generation increase computational complexity, which may limit applicability in real clinical settings.
    • Unjustified choice of the bounding-box dimensions (50%, 30%, 50%) of the original image used to train the DDPM.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    An interesting approach that combines machine learning techniques with DDPMs, one of the latest image-generation architectures, to improve disease classification.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    An interesting, simple, but generalizable approach with extensive study and analysis.




Author Feedback

  1. Evaluation of DDPM and persona (Comment 1-3, R#3; Comment 1-2, R#4) We appreciate R#3’s openness to supporting our work. To clarify, the healthy persona was validated through the downstream classification tasks (Table 1), where its inclusion consistently improved performance. Additional qualities of the persona, e.g., realism and reconstruction fidelity, were assessed quantitatively (MSE, SSIM, PSNR) and qualitatively by two radiologists during our experiments. These confirmed superior image generation compared to both GANs and a ViT-based autoencoder. While we acknowledge that realism may not directly correlate with downstream performance, we agree with R#3 that these results are valuable and merit reporting. In addition, the risk of synthetic (persona) features introducing misleading information is indeed interesting to explore, but we did not observe such effects in our study. These findings will be summarised in the revised paper, with additional visual examples in our open-access repository.
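A minimal sketch of the persona-quality metrics named above (MSE, SSIM, PSNR), assuming scikit-image and arrays on a common intensity scale; the authors' exact evaluation protocol is not specified here.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def persona_quality(persona: np.ndarray, reference: np.ndarray) -> dict:
    """Compare a generated persona against a reference image/volume."""
    drange = reference.max() - reference.min()
    return {
        "mse": float(np.mean((persona - reference) ** 2)),
        "ssim": structural_similarity(reference, persona, data_range=drange),
        "psnr": peak_signal_noise_ratio(reference, persona, data_range=drange),
    }
```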

  2. Patient-specific feature selection (Comment 4-8, R#3) The radiomic features in this work are conditioned on both the pathological image and the generated persona. Thus, any difference between the two is expected to be captured by the simultaneously trained linear classifier, provided that difference contributes to downstream classification.
    We evaluated the reliability and consistency of feature extraction via the classification performance. Notably, feature extraction and selection in our framework are patient-specific, as predicted by the trained CNNs. Defining a patient-agnostic consistency measure remains challenging due to case-by-case variability, though we welcome alternative suggestions for future work. Our candidate radiomic feature set includes simple, interpretable first-order and shape-based radiomic features, which already outperform several state-of-the-art alternatives. Incorporating higher-order and computationally more expensive radiomic features could yield further improvements while preserving interpretability, an avenue worth future investigation, as suggested by the reviewer. Interestingly, the linear classifier serves a role equivalent to SHAP (as noted by the reviewer), assigning attribution scores to the patient-specific, CNN-predicted features, with the difference that this is done during end-to-end training. This offers inherent interpretability, enabling transparent, per-patient explanations once training is complete.
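The per-patient attribution described above falls out of the linear classifier directly; below is a minimal sketch, assuming hypothetical inputs `coef` (learned logistic-regression weights), `features` (one patient's selected radiomic features), and `feature_names`. Exact linear SHAP would additionally subtract a baseline expectation, omitted here for brevity.

```python
import numpy as np

def linear_attribution(coef: np.ndarray, features: np.ndarray, feature_names: list[str]):
    """Rank radiomic features by their contribution to the logit for one patient."""
    contrib = coef * features             # per-feature contribution to the logit
    order = np.argsort(-np.abs(contrib))  # most influential first
    return [(feature_names[i], float(contrib[i])) for i in order]
```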

  3. Computational cost: (Comment 1, R#3; Comment 3, R#4) We confirm that the computational cost remains practical (around 10 seconds per patient) in this radiological application, which will be reported in the revised paper, with further details available in our open-source repository.

  4. Generalizability (R#1) The proposed method was intentionally designed with a simple linear classifier for small datasets and more robust generalisability. While the training set size is moderate, the test set is sufficiently large to provide statistically significant differences from the tested alternative approaches and for our ablation studies (Table 1). We agree that further external validation would strengthen our findings in future work. Additionally, we will reference and discuss the relevant transformer-based methods, and include the suggested work in the revised paper.

  5. Region of interest: (Comment 4, R#4) The bounding-box dimensions were empirically defined by radiologists to sufficiently cover the pathologically interesting regions of knee injuries; alternative settings ((0.3, 0.3, 0.3) and (0.5, 0.5, 0.5)) led to either poorer reconstruction quality or reduced downstream task performance, which can readily be reproduced using our openly accessible code.
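For illustration, a fractional center crop matching the quoted dimensions; the authors' actual ROI is radiologist-defined, so the centering here is an assumption.

```python
import numpy as np

def fractional_center_crop(vol: np.ndarray, fracs=(0.5, 0.3, 0.5)) -> np.ndarray:
    """Crop a 3D volume to the given fraction of each axis, centered."""
    slices = []
    for size, f in zip(vol.shape, fracs):
        extent = int(round(size * f))
        start = (size - extent) // 2
        slices.append(slice(start, start + extent))
    return vol[tuple(slices)]
```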




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Post-rebuttal decisions of the reviewers are a clear and unanimous accept. Congratulations!



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


