Abstract
Automatic disease image grading is a significant application of artificial intelligence for healthcare, enabling faster and more accurate patient assessments.
However, domain shift, which is exacerbated by data imbalance, introduces bias into the model, posing deployment difficulties in clinical applications.
To address the problem, we propose a novel Uncertainty-aware Multi-experts Knowledge Distillation (UMKD) framework to transfer knowledge from multiple expert models to a single student model.
Specifically, to extract discriminative features, UMKD decouples task-agnostic and task-specific features with shallow and compact feature alignment in the feature space.
At the output space, an uncertainty-aware decoupled distillation (UDD) mechanism dynamically adjusts knowledge transfer weights based on expert model uncertainties, ensuring robust and reliable distillation.
UMKD also tackles the problems of model architecture heterogeneity and distribution discrepancies between source and target domains, which previous KD approaches handle inadequately.
Extensive experiments on histology prostate grading (SICAPv2) and fundus image grading (APTOS) demonstrate that UMKD achieves a new state-of-the-art in both source-imbalanced and target-imbalanced scenarios, offering a robust and practical solution for real-world disease image grading.
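The idea of weighting each expert's contribution by its uncertainty can be illustrated with a minimal sketch. This is not the authors' UDD implementation: it assumes that uncertainty is measured by the normalized entropy of each expert's softened prediction, that weights are proportional to the resulting confidence, and that the distillation term is a plain KL divergence. The function names (`expert_weights`, `weighted_distillation_loss`) and the temperature value are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def expert_weights(expert_probs):
    """Assumed weighting rule: confidence = 1 - H(p) / log(K), normalized to sum to 1."""
    k = len(expert_probs[0])
    conf = [max(0.0, 1.0 - entropy(p) / math.log(k)) for p in expert_probs]
    total = sum(conf)
    if total < 1e-12:
        # All experts are maximally uncertain: fall back to equal weights.
        return [1.0 / len(conf)] * len(conf)
    return [c / total for c in conf]

def weighted_distillation_loss(student_logits, expert_logits_list, temperature=2.0):
    """Sum of per-expert KL terms, each scaled by that expert's confidence weight."""
    student = softmax(student_logits, temperature)
    experts = [softmax(l, temperature) for l in expert_logits_list]
    weights = expert_weights(experts)
    loss = sum(w * kl_divergence(p, student) for w, p in zip(weights, experts))
    return loss, weights
```

Calling `weighted_distillation_loss([1.0, 0.5, 0.0], [[4.0, 0.0, 0.0], [0.1, 0.0, 0.0]])` gives the first, more confident expert the larger weight, so its prediction dominates the distillation signal.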
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3380_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/aTongs1/UMKD
Link to the Dataset(s)
N/A
BibTex
@InProceedings{TonShu_UncertaintyAware_MICCAI2025,
author = { Tong, Shuo and Gao, Shangde and Liu, Ke and Huang, Zihang and Xu, Hongxia and Ying, Haochao and Wu, Jian},
title = { { Uncertainty-Aware Multi-Expert Knowledge Distillation for Imbalanced Disease Grading } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15972},
month = {September},
pages = {624 -- 634}
}
Reviews
Review #1
- Please describe the contribution of the paper
The authors propose an Uncertainty-aware Multi-Expert Knowledge Distillation (UMKD) framework to address class imbalance in disease image grading - an area where traditional MKD approaches remain underexplored. To preserve structural and semantic representations, they introduce two feature alignment strategies: Shallow Feature Alignment (SFA), using multi-scale low-pass filtering to retain task-agnostic structural details, and Compact Feature Alignment (CFA), which projects features onto a shared spherical space for effective knowledge transfer. Additionally, they design an Uncertainty-aware Decoupled Distillation (UDD) module that leverages uncertainty metrics to adaptively weight expert contributions, mitigating bias from imbalanced classes and enhancing the robustness of the student model.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
To preserve both structural and semantic information in images, the authors have thoughtfully designed shallow feature alignment and compact feature alignment mechanisms. To mitigate expert bias, they introduced an uncertainty-aware decoupled distillation strategy and carefully integrated all loss components into the training framework. Each module is well-explained with adequate technical detail, and the effectiveness of the proposed components is thoroughly validated through comprehensive ablation studies.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The state-of-the-art method used for comparison is relatively outdated. Including a discussion on more recent approaches would offer better context and highlight how the proposed method aligns with current advancements in medical image analysis.
- Moreover, the paper lacks qualitative analysis, such as t-SNE visualizations or uncertainty maps, which could offer valuable insights into model behavior, feature representation, and reliability, thereby strengthening the overall evaluation.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The major strengths of the paper lie in its thoughtful design to address the class imbalance and preserve critical image features in disease image grading.
- The methodology is supported by comprehensive technical explanations and thorough ablation studies that validate the effectiveness of individual components.
- However, the comparative analysis is limited to relatively outdated state-of-the-art methods, which does not fully highlight the framework’s competitiveness against more recent advancements in the field. Additionally, there is a lack of qualitative analysis which could provide deeper insights into feature separability, model confidence, and interpretability.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
They have addressed the concerns we have asked for. They have updated recent comparison results as well.
Review #2
- Please describe the contribution of the paper
This paper proposes an Uncertainty-Aware Multi-Expert Knowledge Distillation (UMKD) framework to address class imbalance in medical image disease grading. UMKD decouples task-agnostic and task-specific features via Shallow Feature Alignment (SFA) (using frequency-domain multi-scale low-pass filtering) and Compact Feature Alignment (CFA) (spherical space projection). It introduces an Uncertainty-aware Decoupled Distillation (UDD) mechanism to dynamically adjust knowledge transfer weights based on expert prediction confidence, mitigating bias propagation. Experiments on prostate cancer (SICAPv2) and diabetic retinopathy (APTOS) datasets demonstrate state-of-the-art (SOTA) performance in both source-imbalanced and target-imbalanced scenarios.
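The two alignment ideas summarized in this review, frequency-domain multi-scale low-pass filtering and projection onto a shared spherical space, can be sketched as follows. This is an illustration under stated assumptions, not the paper's SFA or CFA modules: it works on a 1D signal rather than 2D feature maps, uses fixed hard cutoffs where the paper describes a learnable filter, and takes cosine distance between L2-normalized vectors as the alignment loss. All function names and the default cutoffs are hypothetical.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), adequate for a short example."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of the reconstruction."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def low_pass(signal, cutoff):
    """Zero every frequency bin whose distance from DC exceeds `cutoff`."""
    n = len(signal)
    spectrum = dft(signal)
    kept = [c if min(k, n - k) <= cutoff else 0.0 for k, c in enumerate(spectrum)]
    return idft(kept)

def multi_scale_low_pass(signal, cutoffs=(1, 2, 4)):
    """One filtered copy of the signal per cutoff (assumed scales)."""
    return [low_pass(signal, c) for c in cutoffs]

def l2_normalize(vec, eps=1e-12):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def spherical_alignment_loss(student_feat, teacher_feat):
    """Cosine distance between the two features after projection onto the sphere."""
    s = l2_normalize(student_feat)
    t = l2_normalize(teacher_feat)
    return 1.0 - sum(a * b for a, b in zip(s, t))
```

In this sketch, a signal that mixes a slow and a fast cosine keeps only the slow component after `low_pass(signal, 1)`, which mirrors the reviewer's concern in weakness 2: whatever lives in the discarded high frequencies is not available to the aligned shallow features.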
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) Tackles the critical challenge of data imbalance in medical imaging, offering practical value for real-world clinical deployment.
2) Validates effectiveness on two medical tasks (source/target imbalance) and provides ablation studies to demonstrate module contributions.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) Core components (e.g., low-pass filtering in SFA, uncertainty weighting in UDD) resemble existing techniques in natural image tasks. Insufficient justification for medical-specific innovation.
2) The “learnable low-pass filter” in SFA lacks theoretical grounding. No analysis of how high-frequency information loss impacts pathological details (e.g., microaneurysms).
3) UDD’s uncertainty metric (based on prediction confidence) ignores medical annotation uncertainty (e.g., inter-observer variability), potentially misrepresenting model reliability.
4) Uses only random sampling for data balancing, neglecting validation of resampling or synthetic data methods.
5) Lacks interpretability analysis to link learned features to pathological biomarkers.
6) Metrics (OA, F1) are disconnected from clinical gold standards (e.g., pathological staging consistency), raising doubts about real-world utility.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While UMKD demonstrates methodological rigor and empirical improvements, its limited novelty, technical ambiguities, and weak clinical validation undermine its significance. Major revisions should address: 1) Deeper medical-specific innovation (e.g., integrating domain knowledge or clinical uncertainty). 2) Comparisons with state-of-the-art medical distillation methods. 3) Enhanced interpretability and clinical correlation (e.g., case studies with clinician feedback).
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
The author’s response did not address my concerns and I believe the current paper still needs to be revised.
Review #3
- Please describe the contribution of the paper
Propose a method using uncertainty and knowledge distillation to improve image disease grading in light of data imbalances. The results indicate a robust solution on two medical imaging histopathology datasets, histology prostate grading (SICAPv2) and fundus image grading (APTOS).
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The use of uncertainty, not just predicting it but using it as an additional layer in knowledge distillation, is a nice approach, and in searching for similar papers I did not see anything similar in approach for these datasets. The problem of matching experts and AI models is an ongoing one on the way to building trust in these models' capabilities, so it is a well-presented idea with interesting outcomes.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
I would have liked to see the results tested on 5 or more random seed initializations, or 3 at the very least, for a final paper to ensure the reliability of the outcomes. You include uncertainty, but all metrics only look at accuracy-based outcomes. Perhaps there is an appropriate metric you can include in your tables, such as entropy, mutual information (MI), or KL divergence; I feel this could be a nice inclusion.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
I hope that the authors provide a repository of code for others to reproduce and potentially test the method on other datasets.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I think it is a well-structured and well-thought-out paper. The idea of adding to the typical approach of knowledge distillation, the experimental analysis performed, and the encouraging results obtained make the paper worth accepting. I would like to see potentially 3 random seed initialisations for the models, with the results amended accordingly.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all three reviewers for their constructive comments. We have carefully addressed all their concerns, as shown below. We sincerely hope our responses fully address their questions.
To Reviewers 2 & 3: Comparison with SOTA methods. Thank you for your comment. Following your suggestions, we clarified the required results with TSS-KD [1] and Style-KD [2] on the prostate cancer grading dataset, highlighted in gray in Fig. 1. UMKD consistently exceeds all SOTAs.
To Reviewer 3: Lack of interpretability and clinical correlation. Thank you for your comment. Uncertainty quantification is a method of Explainable AI (XAI) [3]. First, it is visually demonstrated through layered heatmaps (Fig. 2), where shallow features (SFA) capture structural information, while deep features (CFA) localize disease regions, validating the necessity of feature decomposition. Second, the t-SNE visualization (Fig. 3) confirms our UMKD’s effectiveness by showing intra-class cohesion and inter-class separation, aligning with quantitative results. Regarding the clinical correlation, this research is concerned with disease grading, which is itself a standardized clinical measure in diagnosis, and its predictions are evaluated using well-established metrics (OA, F1). Staging is a different pathology task, which is not within the scope of this study.
To Reviewer 1: Results under random seed initialization and uncertainty metrics. Thank you for your comment. Following your suggestions, we clarified the required results with five random seeds to validate the robustness of our method. As shown in Fig. 1, UMKD achieves both the highest precision and lowest variance among the compared methods. Furthermore, we clarified the required results under mutual information (MI) and entropy (EN). Our UMKD is the SOTA with minimum entropy and maximum mutual information in both source- and target-imbalanced distillation settings.
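The rebuttal does not define how entropy (EN) and mutual information (MI) were computed. One reading consistent with "minimum entropy and maximum mutual information" being desirable is sketched below: EN as the mean Shannon entropy of the predicted class distributions, and MI as the mutual information between true and predicted labels. Both definitions, and the function names, are assumptions made for illustration only.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in nats of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_predictive_entropy(prob_rows):
    """Average entropy over all samples; lower means more confident predictions."""
    return sum(entropy(row) for row in prob_rows) / len(prob_rows)

def label_mutual_information(y_true, y_pred):
    """Mutual information (nats) between true and predicted labels,
    estimated from their empirical joint distribution."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    mi = 0.0
    for (t, p), c in joint.items():
        # p(t, p) * log( p(t, p) / (p(t) * p(p)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
    return mi
```

Under these definitions, a classifier that always predicts the same grade has zero mutual information with the labels regardless of its accuracy on the majority class, which is why such a metric can complement OA and F1 in imbalanced settings.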
To Reviewer 3: Novelty and theoretical analysis. Thank you for your comment. We reclarify the novelty of our method, which is the first work to address knowledge bias correction through uncertainty-aware decoupled distillation, as also acknowledged by Reviewers 1 and 2. Unlike conventional uncertainty methods that focus on output prediction, our proposed UDD explicitly mitigates model knowledge bias caused by data imbalance. The proposed Shallow-Feature Alignment (SFA) module, implemented by learnable low-pass filtering, is based on image decomposition theory [4,5], which decomposes an image into structural and disease information for medical image analysis. However, the theoretical study falls outside the scope of this work.
Inter-observer variation. Thank you for your comment. We recognize that multi-expert annotation is an open challenge. However, our UMKD is done on well-recognized grading benchmarks and settings, in which each annotation is considered as the gold standard.
Neglect validation of resampling or synthetic data methods. Thank you for your comment. We recognize resampling and synthetic data as different methods to deal with unbalanced data. Unlike them, we focus on uncertainty-aware model knowledge bias correction for imbalanced disease grading.
[1] Be your own doctor: Temperature scaling self-knowledge distillation for medical image classification. Neurocomputing, 2025.
[2] Style-KD: Class-imbalanced medical image classification via style knowledge distillation. BSPC, 2024.
[3] A review of uncertainty quantification in medical image analysis: Probabilistic and non-probabilistic methods. MedIA, 2023.
[4] BayeSeg: Bayesian modeling for medical image segmentation with interpretable generalizability. MedIA, 2023.
[5] InDeed: Interpretable image deep decomposition with guaranteed generalizability. Pre-print, 2025.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
Many concerns presented by the reviewers are not addressed well. The work lacks deeper medical-specific innovation as well as comparisons with the latest medical distillation methods.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper introduces an uncertainty-aware multi-expert knowledge distillation method to address the common issue of class imbalance in disease image grading tasks. In their rebuttal, the authors thoroughly clarified the key concerns raised by the reviewers, incorporated comparisons with newly relevant methods, and added experiments with multiple random seed initializations to further demonstrate the robustness of their approach. However, it should be noted that the conference guidelines explicitly prohibit the inclusion of external resource links, and such content should be removed from the final version. In the latest review round, one of the reviewers who initially held a “weak reject” stance has revised their decision to “accept.” The authors have also provided detailed responses to Reviewer 3’s comments regarding clinical relevance and theoretical justification, emphasizing that the primary contribution of the paper lies in its methodological innovation. Overall, the majority of reviewers are inclined to accept the paper. The authors have shown a proactive attitude in addressing concerns and have significantly improved the quality of the paper. Nevertheless, it is recommended that the authors continue to consider the reviewers’ suggestions and pursue deeper investigations in future work to enhance the applicability and clinical potential of their proposed method.