Abstract
Pelvic Lipomatosis (PL) is a rare disorder characterized by abnormal fat proliferation in the pelvic region, where subtle imaging differences between pathological and normal fat pose significant diagnostic challenges. Existing deep-learning-based computer-aided diagnosis methods struggle to integrate high-level clinical semantics, which limits diagnostic accuracy. This paper proposes a novel Evidential Deep Learning (EDL) method that synergistically fuses multi-type semantic radiomics priors derived from clinical expertise to enhance PL diagnosis. First, drawing on clinical experience, the critical PL semantic radiomics, including bladder-rectal fat distance, rectal circularity, bladder-seminal vesicle angle, and relative pelvic fat volume, are extracted from 3D abdominal CT images. Second, these semantic radiomics are probabilistically formulated as prior evidence to quantify their diagnostic relevance. Finally, the prior evidence is fused into the EDL backbone to perform PL diagnosis. Compared with pure deep learning methods, the EDL method with prior evidence not only reduces overconfident predictions but also enables interpretable decision-making by incorporating clinical knowledge. Experiments demonstrate the state-of-the-art performance of the proposed method, which achieves great improvements over conventional deep learning baselines. Ablation studies also validate the necessity of integrating the semantic features. Theoretical proofs further confirm that clinically consistent priors minimize prediction loss and enhance model stability. This work advances PL diagnosis by bridging clinical radiomics with data-driven deep learning and provides a paradigm for interpretable PL medical image analysis.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2437_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{ZhaZhe_Integrate_MICCAI2025,
author = { Zhang, Zheran and Yue, Xiaodong and Wang, Maoyu and Xu, Zhikang and Chen, Yufei and Wei, Zhipeng},
title = { { Integrate Semantic Radiomics as Prior Evidence into Evidential Deep Learning for Pelvic Lipomatosis Diagnosis } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15974},
month = {September},
pages = {273 -- 282}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper introduces a method for the detection of Pelvic Lipomatosis which leverages relevant semantic radiomics extracted from 3D CT volumes. The key contribution is the use of these semantic radiomics in forming prior evidence (as probability distributions over the features) which is combined with deep learning-based classifier.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The idea to use automatically extracted semantic radiomics to reduce the amount of labelled data required to train a classification model for PL is novel and interesting. The ablation study on the impact of the different radiomic features is comprehensive. The introduction is well-written and clearly explains the problem that the paper aims to address.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
No significance testing was reported, making the comparison of the accuracy of the proposed method to other models difficult to interpret. For example, the proposed method achieves an accuracy of 83.5% with a standard deviation of 6.9% and Logistic Regression achieves an accuracy of 81.6% with a standard deviation of 7.1%. Without significance testing, it is not possible to say that one model performed better than another. This puts into question the claim that “experiments demonstrate the state-of-the-art performance of the model”.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The utility of the method compared to other existing methods is put into question by the lack of significance testing.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper presents two main contributions to enhance Pelvic Lipomatosis (PL) diagnosis:
Construction of semantic radiomics based on clinical domain knowledge of PL. These radiomics include bladder-rectal fat distance, rectal circularity, bladder-seminal vesicle angle, and relative pelvic fat volume. These features are designed to capture fat-induced compression and displacement of pelvic organs.
Development of a strategy to integrate clinical semantic radiomics into a Deep Neural Network (DNN) model. The extracted radiomics are transformed into probabilistic distributions and then fused as prior evidence into an Evidential Deep Learning (EDL) backbone model to improve PL diagnosis.
These contributions aim to address the challenges of PL diagnosis, particularly in the context of limited dataset availability, by incorporating clinical expertise into the diagnostic process.
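As a purely illustrative aside on one of the features named above, rectal circularity could be measured on a 2D contour with the common isoperimetric ratio 4πA/P². This is an assumption: this page does not give the paper's definition, and the helper `circularity` and the toy polygons below are hypothetical.

```python
import math


def circularity(points):
    """Isoperimetric ratio 4*pi*A / P**2 of a closed polygon.

    `points` is a list of (x, y) vertices in order; the polygon is closed
    implicitly. A circle gives 1.0 and flatter shapes give smaller values.
    NOTE: hypothetical helper, not the paper's definition of the feature.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 vertices")
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0            # shoelace term
        perimeter += math.hypot(x1 - x0, y1 - y0)  # edge length
    if perimeter == 0.0:
        raise ValueError("degenerate contour")
    area = abs(twice_area) / 2.0
    return 4.0 * math.pi * area / perimeter ** 2


# Toy contours: a unit square and a flattened 4 x 1 rectangle.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
flat = [(0, 0), (4, 0), (4, 1), (0, 1)]
print(circularity(square), circularity(flat))
```

In this sketch a compressed, elongated contour scores lower than a compact one, which is the kind of organ deformation the reviewer describes the features as capturing.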
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper argues that PL diagnosis is challenging due to its rarity, data scarcity, and similarity to other fat-related conditions. Traditional Deep Neural Networks (DNNs) struggle with these challenges and the contribution could lead the way into methods that integrate domain knowledge in medical image computing models, as current knowledge-integrated methods often rely on shallow fusion strategies, which are insufficient for the complexity of PL diagnosis
The proposed approach aims at maintaining diagnostic reliability even with limited data, unlike pure deep learning methods. This is a particularly important topic for many medical applications, and rare diseases. This approach could potentially be applied to other rare conditions, enabling more accurate diagnoses even with small datasets
By integrating clinical knowledge in the form of semantic radiomics into deep learning models, the method creates a synergy between human expertise and machine learning. This integration could lead to more clinically relevant and interpretable AI systems in healthcare
By incorporating clinically relevant features, the model’s decision-making process becomes more transparent and aligned with clinical reasoning. This increased interpretability could foster greater trust and adoption of AI systems in clinical settings
The proposed method achieved state-of-the-art performance, significantly outperforming conventional deep learning baselines and other comparative methods.
The ablation studies demonstrated that fusing all four semantic radiomics (bladder-rectal fat distance, rectal circularity, bladder-seminal vesicle angle, and relative pelvic fat volume) yielded the optimal accuracy of 87.5%, surpassing combinations with fewer features. The proposed method showed improvements in correctly classifying confusing cases that pure image-driven models misclassified, such as high-fat cases without deformation and low-fat cases with deformation
The integration of semantic radiomics as prior knowledge consistently improved accuracy across different EDL-based models (Evi-ResNet, Evi-DenseNet, Evi-ViT).
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The paper doesn’t mention any external validation on a separate dataset from a different institution. This absence limits the assessment of the model’s generalizability across different patient populations and imaging protocols.
The method relies on multi-organ segmentation results to extract semantic radiomics. Any errors in the segmentation process could propagate through the system, affecting the final diagnosis
Although the authors claim improved interpretability, the paper doesn’t provide detailed examples of how the model’s decisions can be interpreted in a clinically meaningful way.
The dataset only includes male patients due to the male predominance of PL. This could limit the model’s applicability to female patients with PL
The study uses a relatively small dataset from a single institution, which might limit its generalizability to other clinical environments.
A method for interpreting and presenting the model’s output in a clinically meaningful way would be necessary. This might include visualizations of the semantic radiomics and their contributions to the diagnosis.
The authors do not mention how to deal with cascading errors: Inaccuracies in segmentation could lead to errors in the extracted radiomics, which in turn will affect the prior evidence fed into the Evidential Deep Learning model.
Poor segmentation could lead to incorrect measurements, potentially causing misclassification of cases. For example, overestimation of fat volume might lead to false positives for PL. Is there any way to introduce fairness approaches into the proposed methodology? Since the semantic radiomics are meant to align with clinical expertise, inaccurate segmentation could lead to results that don’t match clinical observations, potentially reducing trust in the system.
It’s important to note that the study’s dataset is relatively small (126 CT images) and from a single institution, which may limit its representation of population-wide variations. The authors don’t explicitly discuss how their method might account for or be affected by population-based variations in organ characteristics
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
The document does not provide any direct comparison between the performance of the proposed method and human expert diagnosis of Pelvic Lipomatosis (PL). The study focuses on comparing the proposed method with other computational approaches, but does not include a comparison with human expert performance.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is very interesting, but it is somewhat lacking in the following aspects:
It is a single-center study; it would be interesting to perform experiments in an out-of-distribution setting. There is potential bias in the selection of semantic radiomics: the choice is based on clinical expertise, but there might be other relevant features that were not considered. No external validation or comparison against expert clinicians is performed.
Otherwise, methodologically it is a very strong contribution to MICCAI.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
This work is very novel and it could find a large audience at MICCAI. The authors have properly addressed all the points I raised in my review. Also, a point I overlooked (significance testing) was properly answered.
Review #3
- Please describe the contribution of the paper
This paper presents a novel approach that integrates clinically derived semantic radiomics as morphological priors into an Evidential Deep Learning (EDL) framework to enhance the diagnosis of Pelvic Lipomatosis (PL). The authors address the challenge of distinguishing subtle pathological fat proliferation in PL from normal or obesity-related fat by leveraging domain-specific radiomic features. These features are probabilistically encoded as prior evidence, which is fused into the EDL backbone through a Bayesian framework to guide model predictions. This integration not only mitigates overconfidence in traditional DNNs but also aligns the model’s decision-making with clinical diagnostic standards, improving interpretability.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper’s major strengths lie in its innovative methodology and robust validation, particularly in its novel use of radiomics as prior evidence within an Evidential Deep Learning (EDL) framework. Unlike conventional approaches that treat radiomics as parallel inputs or shallow features, the authors encode clinical semantic radiomics into probabilistic prior distributions. This integration ensures that clinical knowledge fundamentally guides the model’s reasoning, avoiding the limitations of ad-hoc feature concatenation. Additionally, traditional Deep Neural Networks can produce overconfident predictions due to softmax’s exponential scaling. By utilizing EDL, which models class probabilities as distributions and incorporates clinically informed priors, the framework effectively quantifies uncertainty and diminishes unwarranted confidence in scenarios of data scarcity, such as the case of Pelvic Lipomatosis.
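The contrast drawn here between softmax confidence and evidential uncertainty can be illustrated with a minimal sketch. It assumes the standard Dirichlet-based EDL setup with a uniform prior (alpha = evidence + 1, uncertainty u = K / S); the function names and the toy evidence values are hypothetical, and the paper's radiomics-informed prior is not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax: a point estimate with no uncertainty term."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dirichlet_opinion(evidence):
    """Expected class probabilities and uncertainty mass under uniform-prior EDL.

    alpha_k = evidence_k + 1, S = sum(alpha), p_k = alpha_k / S, u = K / S.
    NOTE: generic EDL formulation, assumed here for illustration only.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = k / strength
    return probs, uncertainty


# Softmax turns a modest logit gap into a confident-looking probability.
print(softmax([2.0, 0.0]))

# EDL separates "which class" from "how much evidence supports it".
weak_probs, weak_u = dirichlet_opinion([2.0, 0.0])
strong_probs, strong_u = dirichlet_opinion([20.0, 0.0])
print(weak_probs, weak_u)
print(strong_probs, strong_u)
```

With little evidence the uncertainty mass stays large even though one class is favored, which is the behavior the reviewer credits with reducing unwarranted confidence on scarce data.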
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While this work brings an interesting and novel method for Pelvic Lipomatosis diagnosis, it presents notable weaknesses. First, regarding paper readability, while equations are necessary for technical rigor, excessive or overly complex equations such as Eqs. 3, 5, and 6 might be alienating, especially for clinicians and non-specialists. These equations could benefit from intuitive explanations or visualizations to make them more accessible. Additionally, the connection between semantic radiomics and their probabilistic formulation is under-explained. A diagram or simplified workflow illustrating how clinical features translate into a+, a− priors would enhance clarity. Another aspect is computational complexity. The paper uses a Vision Transformer (ViT) backbone with 3D patches. Training such models, especially on 3D medical images, requires significant computational resources. The authors mention downsampling inputs due to GPU constraints, which might affect the model’s ability to capture fine-grained details. This trade-off between computational feasibility and model performance could be further explored. Additionally, while the ablation studies show the importance of each radiomic feature, there is no analysis of potential interactions between features. Understanding whether some features are redundant or how they complement each other could provide deeper insights into the model’s decision-making process.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper presents a novel methodology that bridges clinical domain knowledge with deep learning. The approach addresses the challenge of diagnosing Pelvic Lipomatosis (PL), where subtle morphological changes and data scarcity limit pure data-driven methods. The experiments show that it is technically strong. The ablation studies validate the necessity of integrating all four semantic radiomics, while comparisons against diverse baselines demonstrate state-of-the-art performance. Though the paper has limitations (e.g., technical complexity, small dataset), these are outweighed by its methodological novelty and practical impact. The framework’s ability to reduce overconfidence, enhance interpretability, and maintain robustness under data scarcity offers a replicable paradigm for rare disease diagnosis.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
Considering the rebuttal, and that one reviewer recommended accept and another reject, I change my decision to (weak) reject.
Author Feedback
R#1
Q1: Limitation of model’s generalizability.
R1: Limited Data Availability: We have collected all available data to the best of our ability: 126 cases were curated from an initial 130 candidates through our partner hospital over 8 years. Generalizability: We fully agree on the importance of external validation for clinical applicability. To address single-center data scarcity, we employed 5-fold cross-validation, enhancing result credibility, with our method outperforming baselines in all folds. Gender Bias: We followed clinical experts’ advice to focus on male subjects and corroborated this choice with [24], which shows PL primarily affects men.
Q2: Segmentation error affects the final diagnosis.
R2: PL diagnosis prioritizes pelvic organ spatial configuration and morphology, where minor segmentation errors minimally impact analysis. After segmentation, all cases passed clinician review, confirming that they met clinical requirements.
Q3: Lack of clinical interpretability.
R3: Our framework naturally supports this by reporting, alongside each diagnosis, the four semantic radiomics that drove the decision. These radiomics, originally proposed by clinicians, demonstrate clear diagnostic relevance.
Q4: Lack of comparison with human experts.
R4: Diagnostic labels were assigned by expert consensus as no standardized PL criteria exist, and we will continuously compare future model outputs with expert assessments.
Q5: Other potentially relevant features may have been overlooked.
R5: Conventional radiomics miss PL’s spatial-deformative hallmarks. Our 4 clinician-defined semantic radiomics target these features, providing superior clinical relevance over traditional measures.
R#3
Q1: Overly complex formulas compromise readability.
R1: Inspired by EDL [20], we begin with the Bayesian formulation (Eq. 2) to replace point estimates with distributions, enhancing robustness. Using conjugate priors, Eq. 3 then combines semantic radiomics as the prior and image evidence as the likelihood to yield the posterior. Based on this, Eq. 5 extends cross-entropy for probabilistic outputs, and Eq. 6 adds KL regularization against overfitting.
Q2: Computational complexity.
R2: PL diagnosis relies on pelvic organ spatial relationships and deformation. We apply appropriate downsampling to maintain accuracy while preserving global morphological feature capture.
Q3: Lack of feature interaction analysis.
R3: All 4 semantic radiomics originate from clinicians’ expertise. We plotted each feature’s distribution by class and confirmed that every feature significantly discriminates PL from controls. Due to diverse patient-specific feature combinations, we retain all features as inputs and will explore their interrelationships in future work.
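The chain of equations summarized in this response can be sketched in generic form. The block below uses the standard Dirichlet-categorical conjugacy and the usual EDL loss terms. The notation is an assumption, since the paper's Eqs. 2 to 6 are not reproduced on this page and may differ in detail: α^prior stands for radiomics-derived prior parameters, e for non-negative image evidence, ψ for the digamma function, and λ for a regularization weight.

```latex
% Assumed notation: K classes, class probabilities p on the simplex, true class y
\begin{align}
  \mathbf{p} &\sim \mathrm{Dir}\!\left(\boldsymbol{\alpha}^{\mathrm{prior}}\right), \qquad
  \boldsymbol{\alpha}^{\mathrm{post}} = \boldsymbol{\alpha}^{\mathrm{prior}} + \mathbf{e},
  \quad e_k \ge 0, \\
  \mathbb{E}_{\mathbf{p} \sim \mathrm{Dir}(\boldsymbol{\alpha}^{\mathrm{post}})}
  \left[-\log p_y\right]
  &= \psi(S) - \psi\!\left(\alpha^{\mathrm{post}}_y\right), \qquad
  S = \sum_{k=1}^{K} \alpha^{\mathrm{post}}_k, \\
  \mathcal{L} &= \psi(S) - \psi\!\left(\alpha^{\mathrm{post}}_y\right)
  + \lambda\, \mathrm{KL}\!\left(\mathrm{Dir}(\tilde{\boldsymbol{\alpha}})
  \,\middle\|\, \mathrm{Dir}(\boldsymbol{\alpha}^{\mathrm{ref}})\right).
\end{align}
```

Here α̃ denotes the posterior parameters with the evidence for the true class removed, and α^ref is a reference distribution. The reference is uniform in the original EDL formulation; whether the paper uses the radiomics prior in that role is not stated on this page.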
R#4
Q: Lack of significance testing.
R: Due to space constraints, detailed significance testing results and the full experimental outcomes were omitted. First, we computed paired t-tests based on the five-fold cross-validation splits, obtaining a p-value of 2.04e-05 (p<0.01), which validates that our method achieves statistically significant improvements over all baselines. Furthermore, the proposed framework attains a mean accuracy of 83.5% through 5-fold cross-validation, outperforming the second-best ShallowNN (81.57%) by +1.93%. Critically, this superiority is consistent across every fold. For instance, vs. LR, we see improvements of +2.23% (85.27% vs. 87.50%), +0.87% (79.13% vs. 80.00%), +1.00% (83.00% vs. 84.00%), +1.41% (90.59% vs. 92.00%), and +4.53% (69.54% vs. 74.07%) across folds 1-5, respectively. Beyond accuracy, our approach emphasizes enhancing model interpretability and predictive confidence by integrating high-level semantic priors, a capability that is vital for medical diagnosis and decision making and arguably even more important than precision. Based on the analysis above, we believe that the experimental results are convincing enough to verify the effectiveness of the proposed method.
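For readers who want to see how a paired comparison over folds is formed, the sketch below computes the paired t statistic from the five per-fold accuracies quoted above for LR and the proposed method. It is an illustration only and not the authors' computation: the helper `paired_t` is hypothetical, and only the LR pairs are available on this page.

```python
import math
import statistics

# Per-fold accuracies (%) quoted in the rebuttal, folds 1-5.
lr = [85.27, 79.13, 83.00, 90.59, 69.54]
proposed = [87.50, 80.00, 84.00, 92.00, 74.07]


def paired_t(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for b - a."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [y - x for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0.0:
        raise ValueError("all paired differences are identical")
    t_stat = mean_diff / (sd / math.sqrt(len(diffs)))
    return mean_diff, t_stat, len(diffs) - 1


mean_diff, t_stat, dof = paired_t(lr, proposed)
print(f"mean diff = {mean_diff:.3f} points, t = {t_stat:.2f}, df = {dof}")
```

Converting the statistic into a p-value requires the Student t distribution with the reported degrees of freedom, for example from a statistics package. The p-value stated in the rebuttal is given for all baselines and cannot be re-derived from these five pairs alone.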
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper proposes an evidential deep learning framework that integrates semantic radiomics as prior information for diagnosing pelvic lipomatosis. Reviewer 1 supports acceptance, highlighting the novelty, clarity of the methodology, and the authors’ satisfactory responses to all raised concerns. Reviewer 2 ultimately leans toward rejection, largely due to the initially perceived lack of significance testing, which the authors addressed convincingly in the rebuttal. Reviewer 4 also cited missing statistical validation, but did not provide further comments following clarification. The rebuttal supplies paired t-tests across all five folds, confirming statistically significant performance gains. Furthermore, the method combines interpretability and predictive robustness by grounding model decisions in clinically defined semantic radiomics. While the dataset size is limited and single-centre, the paper provides thoughtful mitigation via cross-validation and transparent reporting. The proposed framework is well-motivated, methodologically sound, and clinically relevant.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper presents a novel and clinically meaningful approach to diagnosing Pelvic Lipomatosis (PL) by integrating semantic radiomics—interpretable, clinically relevant features—with deep learning models. The method addresses a critical challenge in medical AI: maintaining diagnostic reliability in the context of limited data, as is common with rare diseases.
By fusing clinical domain knowledge with data-driven learning, the proposed framework enhances both performance and interpretability. Ablation studies further validate the necessity of each component, and the method demonstrates improved transparency and alignment with clinical reasoning.
Despite a small dataset and some technical complexity, the paper’s methodological innovation, strong empirical results, and real-world relevance outweigh its limitations. The proposed framework represents a valuable step forward in developing interpretable and trustworthy AI tools for rare disease diagnosis.
I recommend acceptance.