Abstract
Immunohistochemical (IHC) biomarker prediction benefits greatly from multi-modal data fusion analysis. However, the simultaneous acquisition of multi-modal data, such as genomic and pathological information, is often challenging due to cost or technical limitations. To address this challenge, we propose an online distillation approach based on Multi-modal Knowledge Decomposition (MKD) to enhance IHC biomarker prediction in haematoxylin and eosin (H&E) stained histopathology images. This method leverages paired genomic-pathology data during training while enabling inference using either pathology slides alone or both modalities. Two teacher models and one student model are developed to extract modality-specific and modality-general features by minimizing the MKD loss. To maintain the internal structural relationships between samples, Similarity-preserving Knowledge Distillation (SKD) is applied. Additionally, Collaborative Learning for Online Distillation (CLOD) facilitates mutual learning between teacher and student models, encouraging diverse and complementary learning dynamics. Experiments on the TCGA-BRCA and in-house QHSU datasets demonstrate that our approach achieves superior performance in IHC biomarker prediction using uni-modal data. Our code is available at https://github.com/qiyuanzz/MICCAI2025_MKD.
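The SKD term mentioned in the abstract can be illustrated with the standard similarity-preserving distillation loss of Tung and Mori (2019): each model's batch features are turned into a batch-by-batch similarity matrix, its rows are L2-normalized, and the student is penalized for deviating from the teacher's matrix. The snippet below is a minimal NumPy sketch under that assumption; the function name and the (batch, dim) feature layout are illustrative and are not taken from the authors' repository.

```python
import numpy as np


def similarity_preserving_loss(f_teacher: np.ndarray, f_student: np.ndarray) -> float:
    """Similarity-preserving KD loss (Tung & Mori, 2019 formulation).

    Both inputs are (batch, dim) feature matrices for the same batch of slides.
    The feature dimensions may differ, because only the (batch, batch)
    similarity matrices are compared.
    """
    b = f_teacher.shape[0]
    if f_student.shape[0] != b:
        raise ValueError("teacher and student features must come from the same batch")

    def normalized_gram(f: np.ndarray) -> np.ndarray:
        g = f @ f.T                                    # pairwise sample similarities
        row_norms = np.linalg.norm(g, axis=1, keepdims=True)
        return g / np.maximum(row_norms, 1e-12)        # row-wise L2 normalization

    diff = normalized_gram(f_teacher) - normalized_gram(f_student)
    return float(np.sum(diff ** 2) / b ** 2)           # squared Frobenius norm / b^2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 256))   # hypothetical multi-modal teacher embeddings
    student = rng.normal(size=(8, 64))    # hypothetical pathology student embeddings
    print(similarity_preserving_loss(teacher, student))
```

Because only relative similarities between samples are matched, the student may use a smaller embedding than the teacher, and rescaling the student's features leaves the loss unchanged.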
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0139_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/qiyuanzz/MICCAI2025_MKD
Link to the Dataset(s)
N/A
BibTex
@InProceedings{ZhaQib_Multimodal_MICCAI2025,
author = { Zhang, Qibin and Hao, Xinyu and Chen, Qiao and Xu, Rui and Cong, Fengyu and Lu, Cheng and Xu, Hongming},
title = { { Multi-modal Knowledge Decomposition based Online Distillation for Biomarker Prediction in Breast Cancer Histopathology } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15974},
month = {September},
pages = {357 -- 367}
}
Reviews
Review #1
- Please describe the contribution of the paper
This work proposes an online distillation framework that enhances immunohistochemistry (IHC) biomarker prediction from H&E-stained images by leveraging paired genomic and histopathological data during training, while supporting flexible unimodal or multimodal inference at test time.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The study offers a versatile and efficient online distillation system that supports inference with either unimodal or multimodal data for IHC biomarker prediction. By combining the MKD, SKD, and CLOD modules, the method enables informative and complementary feature learning across modalities. Strong performance on both public and private datasets demonstrates the robustness and generalizability of the approach.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- It remains unclear whether the pathology-only model received any multimodal training. Since such leakage could artificially inflate performance and jeopardize the validity of the unimodal evaluation, the authors should state explicitly whether RNA sequencing data may have unintentionally influenced the pathology-only model.
- The ablation study does not isolate the individual contributions of the SKD and CLOD modules. A more detailed analysis that separates the impact of each component would strengthen the empirical validation and justify the architectural decisions.
- Despite supporting multimodal learning, the proposed framework only slightly improves performance compared to current state-of-the-art techniques. This calls for additional justification or improvement of the technique, and it undermines the main argument about the advantages of incorporating new modalities.
- The experimental evaluation does not include comparisons with current relevant approaches that also perform multimodal fusion of pathology and genomic data, such as MOTCat [1] and PIBD [2]. Such baselines would give a more thorough evaluation and more clearly place the method's contributions in context.
  [1] Xu Y, Chen H. Multimodal optimal transport-based co-attention transformer with global structure consistency for survival prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 21241-21251.
  [2] Zhang Y, Xu Y, Chen J, et al. Prototypical information bottlenecking and disentangling for multimodal cancer survival prediction. arXiv preprint arXiv:2401.01646, 2024.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Same as the weaknesses listed above.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method exhibits only marginal improvements over state-of-the-art approaches in the multimodal setting, thereby undermining the claimed advantage of leveraging additional modalities.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes an online distillation method based on Multi-modal Knowledge Decomposition (MKD), aimed at enhancing IHC biomarker prediction from pathology slides alone when the genomic modality is missing. The innovation of the method lies in using multimodal data (pathology slides and genomic data) during the training phase, with the MKD module enhancing the transferability of modality-agnostic decision features in the teacher model. The Similarity-preserving Knowledge Distillation (SKD) and Collaborative Learning for Online Distillation (CLOD) mechanisms facilitate mutual learning between the teacher and student models, significantly improving the student model's pathology-only inference performance when genomic data are unavailable.
Specifically, the core contribution of the paper is the MKD module, which decouples multimodal features into pathology-specific, genomics-specific, and modality-agnostic components. This decoupling enhances the model's ability to perceive key features. Through this structured knowledge decomposition, the teacher model effectively transfers modality-agnostic decision features to the student model. Moreover, the SKD and CLOD mechanisms make more effective use of multimodal data: SKD guides the training of the student model, SP, while CLOD facilitates mutual learning between the teacher and student models, capturing the relative relationships between samples. Together, these modules improve pathology-only inference performance when the genomic modality is missing.
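The mutual learning that CLOD enables is commonly implemented, as in deep mutual learning, by having each network match the other's softened class probabilities through a KL-divergence term (KL divergence is also listed among the paper's losses in this review). The sketch below assumes that formulation, binary biomarker logits of shape (batch, 2), and a temperature of 2; the paper's exact weighting, temperature, and pairing of models are not stated on this page, and the function names are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-wise softmax of (batch, classes) logits at the given temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)          # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Batch mean of KL(p || q) between row-wise probability distributions."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


def mutual_learning_losses(teacher_logits, student_logits, temperature=2.0):
    """Return (teacher_term, student_term) for one online-distillation step.

    Each model is pulled toward the other's softened prediction. In an autograd
    framework the distribution acting as the target would be detached; NumPy has
    no gradients, so this only computes the loss values. The T^2 factor keeps
    gradient scale comparable across temperatures (Hinton et al., 2015).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    student_term = kl_divergence(p_teacher, p_student) * temperature ** 2  # student mimics teacher
    teacher_term = kl_divergence(p_student, p_teacher) * temperature ** 2  # teacher mimics student
    return teacher_term, student_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 2))   # hypothetical ER-status logits from a teacher
    student = rng.normal(size=(4, 2))   # hypothetical logits from the pathology student
    print(mutual_learning_losses(teacher, student))
```

Unlike classic offline distillation with a frozen teacher, both terms are minimized in the same training step, so the teacher also receives a signal from the student.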
In the experimental section, the authors validate the method on TCGA-BRCA and their in-house dataset. The results show that the distilled model outperforms existing state-of-the-art methods in pathology-only inference and performs well in predicting ER and PR biomarkers, demonstrating the positive impact of multimodal training on inference performance. Ablation experiments further validate the contribution of the proposed modules, showing that MKD, SKD, and CLOD significantly improve model performance when used together.
Finally, the proposed method is highly flexible and can adapt to situations where genomic data are missing in breast cancer biomarker prediction. At inference, the model supports pathology slides alone and performs well, offering strong clinical applicability and practical value.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Innovation: The paper introduces a Multi-modal Knowledge Decomposition (MKD) strategy, which decouples features into pathology-specific, genomics-specific, and modality-general components, effectively enhancing the transferability of modality-general decision features. It introduces Similarity-preserving Knowledge Distillation (SKD) and Collaborative Learning for Online Distillation (CLOD) mechanisms, promoting mutual learning between the teacher and student models and enhancing the student's ability to infer from pathology slides alone when genomic modalities are missing.
- Methodology Design: Multi-modal fusion is achieved through structural orthogonal loss (L_OR), CORAL alignment, and KL divergence, ensuring both the independence and consistency of the learned representations. The online collaborative training framework avoids reliance on the static teacher models typical of traditional knowledge distillation, improving training efficiency and adaptability.
- Comprehensive Experimental Validation: Extensive experiments were conducted on the TCGA-BRCA public dataset and a large private dataset, covering internal comparisons, external generalization, and ablation studies. The results are convincing and robust. The ablation study demonstrates the contribution of the MKD, SKD, and CLOD modules to performance, enhancing the interpretability of the method.
- High Flexibility: The method supports either pathology slides alone or a combination of pathology slides and genomic data, demonstrating good clinical adaptability. It particularly enhances inference performance from pathology slides, even when a modality is missing.
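The decomposition losses named under "Methodology Design" have standard forms: Deep CORAL (Sun and Saenko, 2016) matches the second-order statistics of two feature sets, and the orthogonality penalty used in domain separation networks pushes shared and private features toward orthogonality. The following NumPy sketch assumes that the modality-general pathology and genomic features are aligned with CORAL and that each modality's general and specific features are kept orthogonal. The weight α = 1/6 on the orthogonal term is the value reported in the authors' rebuttal; the function names and the way the terms are summed are illustrative assumptions, not the paper's code.

```python
import numpy as np


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """Deep CORAL: ||C_s - C_t||_F^2 / (4 d^2) between (n, d) feature sets."""
    d = source.shape[1]
    c_s = np.cov(source, rowvar=False)   # (d, d) covariance, n - 1 normalization
    c_t = np.cov(target, rowvar=False)
    return float(np.sum((c_s - c_t) ** 2) / (4 * d ** 2))


def orthogonal_loss(general: np.ndarray, specific: np.ndarray) -> float:
    """||G^T S||_F^2 after L2-normalizing each sample's feature vector."""
    def l2_rows(x: np.ndarray) -> np.ndarray:
        return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

    cross = l2_rows(general).T @ l2_rows(specific)   # (d_general, d_specific)
    return float(np.sum(cross ** 2))


def decomposition_loss(gen_patho, gen_gene, spec_patho, spec_gene, alpha=1 / 6):
    """Hypothetical combination: align general parts, decorrelate general vs. specific."""
    l_coral = coral_loss(gen_patho, gen_gene)
    l_orth = orthogonal_loss(gen_patho, spec_patho) + orthogonal_loss(gen_gene, spec_gene)
    return l_coral + alpha * l_orth
```

A small α makes sense here because, as the rebuttal notes, the raw orthogonal term is much larger than the CORAL term, so without down-weighting it would dominate the objective.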
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Insufficient Support for the Conclusion of “Excellent Multimodal Reasoning Performance”: The current paper emphasizes that the method has “excellent multimodal reasoning performance,” but the core contribution of the paper should be enhancing the student model's pathology-only inference performance in the absence of genomic data, through multimodal knowledge decomposition and knowledge distillation. The statement in the conclusion, “significantly improves multimodal reasoning performance,” is validated only on TCGA-BRCA, so its support is relatively weak. Suggestion: The core contribution should be clearly articulated as improving pathology-only inference by introducing multimodal supervisory information, and it should be emphasized as a solution to the problem of missing modalities in real-world clinical scenarios, avoiding overemphasis on improved multimodal inference performance.
- Insufficient Ablation Experiment Design: The current ablation study only presents the combined effect of the MKD, SKD, and CLOD modules and does not break down the individual effects of SKD and CLOD. This makes it difficult to determine the contribution of each module. The weight α for the orthogonal loss is set to 1/6, but the hyperparameter tuning process is not described. Suggestion: Add individual tests for each module, especially for CLOD, and explore the interaction between SKD and CLOD to better validate the effectiveness of each module. Additionally, consider adding an ablation experiment for hyperparameters.
- Lack of Genomics-Only Inference Validation: The conclusion mentions that the method “supports inference using pathology slides, genomics data, or both depending on available modalities,” but no experimental results or model structures are provided to support genomics-only inference. Suggestion: Either provide experimental results for genomics-only inference, or modify the wording to avoid overstating the method's capabilities.
- Lack of Theoretical Depth: In the MKD module, the theoretical basis of the orthogonal loss and CORAL loss is not sufficiently discussed. The lack of visualization of key modules, such as the feature distribution after MKD decomposition, makes it difficult to substantiate the claim that “MKD promotes the independence and separability of the three types of features.” Suggestion: Add relevant visualizations, such as t-SNE/PCA comparison plots before and after MKD decomposition, to demonstrate the feature distribution.
- Lack of Computational Efficiency Discussion: Although the manuscript mentions “reducing computational costs,” there is no comparison with other methods in terms of computation time, resource consumption, etc. Suggestion: Add a quantitative analysis of computational efficiency, such as training time, inference speed, and GPU resource usage, to help assess the feasibility of deploying the method in practice.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper proposes an online distillation method based on Multi-modal Knowledge Decomposition (MKD) to enhance pathology-only inference performance. The core innovation lies in improving the model's ability to perceive key features through multimodal training, enabling accurate IHC biomarker prediction at inference from pathology slides alone. By introducing the Similarity-preserving Knowledge Distillation (SKD) and Collaborative Learning for Online Distillation (CLOD) mechanisms, the method further enhances the model's robustness and prediction accuracy. Experimental results show that the proposed method significantly outperforms existing state-of-the-art methods in pathology-only inference and performs well in predicting ER, PR, and HER2 biomarkers.
In evaluating this paper, I have considered its innovation, experimental validation, clinical application potential, and the generalizability of the method. First, from an innovation perspective, the paper addresses the problem of improving pathology-only inference when genomic data are unavailable, a problem of practical clinical value. Through multimodal knowledge decomposition and distillation, the model learns richer knowledge from multimodal data during training, and the distilled student model significantly improves pathology-only inference performance. This approach addresses the common clinical problem of missing modalities and has strong application prospects.
Second, the experimental validation is sufficiently robust. The authors used TCGA-BRCA and their in-house dataset, performing 5-fold cross-validation and validating the method's generalization ability with an external test set. The experimental results show that the proposed method achieves significant performance improvements in pathology-only inference, especially in ER and PR biomarker prediction. Additionally, the ablation experiments effectively demonstrate the contribution of the MKD, SKD, and CLOD modules to performance improvement, further confirming their effectiveness.
From a clinical application perspective, the method has high practical value. In real-world settings, the lack of genomic data often limits the prediction of related biomarkers. This method addresses the issue by enabling pathology-only inference, making it particularly suitable for hospitals or environments where genomic data cannot be routinely obtained. This design gives the method significant potential in resource-limited clinical settings and a broad range of applications.
However, there are some areas for improvement. Method positioning: The paper emphasizes the method's “excellent multimodal reasoning performance,” but this conclusion lacks sufficient support, and the core contribution, enhancing pathology-only inference performance through multimodal training, is not clearly articulated. I therefore recommend clarifying this point in the revision. Ablation experiments: While the paper demonstrates the combined effect of the modules, it does not validate the individual effects of the SKD and CLOD modules, which leaves the contribution of each module unclear. Further breaking down these modules and conducting ablation experiments on hyperparameters would help validate the method's effectiveness.
Overall, despite some shortcomings in method positioning, ablation experiments, and computational efficiency, these issues do not detract from the overall contribution of the paper. The method demonstrates significant improvements in pathology-only inference and has practical clinical value. Considering the innovation, experimental results, and real-world significance, I believe this paper should be accepted with minor revisions to refine the details.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The authors propose a knowledge distillation (KD) architecture that combines many methods used in the medical image analysis community. Their key insight is the application of the modality focusing hypothesis, which states that effective cross-modal KD requires learning both modality-specific and modality-general features. Compared to uni-modal training and other KD methods, their method performs better on public data as well as on internal proprietary data.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Modality decomposition focus: The authors break down multi-modal features into three kinds: one set shared by both modalities, one set specific to one modality, and the last set specific to the other. The design of their method centers on this knowledge decomposition model. Though this may seem obvious, nearly all multi-modal methods in digital pathology design their objective around learning only shared general features. So, although the idea of modality focusing is not new, its application to genomics + pathology data is new.
Dual teacher distillation: The combination of the various components of their architecture and loss function, i.e. attention MIL, self-normalizing networks, CORAL loss, orthogonal loss, similarity-preserving knowledge distillation, and collaborative learning for online distillation, have all been done individually in pathology and even some multi-modal studies, but their amalgamation in addition to the idea of adopting two teachers – one trained to learn and distill modality-general knowledge and one trained to learn and distill genomic only knowledge, is a novel formulation.
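Two of the building blocks listed here, attention-based MIL pooling for whole-slide images (Ilse et al., 2018) and self-normalizing networks with SELU activations for genomic vectors (Klambauer et al., 2017), can be sketched in a few lines. This is a forward-pass-only NumPy illustration with random, hypothetical parameters; the paper's encoder sizes, gating, and dropout are not described on this page, so none of them are reproduced here.

```python
import numpy as np

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: np.ndarray) -> np.ndarray:
    """SELU activation used by self-normalizing networks (SNN)."""
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * np.expm1(np.minimum(x, 0.0)))


def attention_mil_pool(patches: np.ndarray, V: np.ndarray, w: np.ndarray):
    """Non-gated attention MIL pooling.

    patches: (n_patches, d) patch embeddings of one slide
    V: (h, d), w: (h,) attention parameters (learned in practice)
    Returns the slide embedding (d,) and the attention weights (n_patches,).
    """
    scores = np.tanh(patches @ V.T) @ w      # one unnormalized score per patch
    scores = scores - scores.max()           # numerically stable softmax over patches
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ patches, weights


def snn_encode(genes: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One SELU layer mapping a gene-expression vector to an embedding."""
    return selu(W @ genes + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(500, 128))              # hypothetical patch features
    slide_emb, attn = attention_mil_pool(patches, rng.normal(size=(64, 128)), rng.normal(size=64))
    gene_emb = snn_encode(rng.normal(size=300), rng.normal(size=(128, 300)) / np.sqrt(300), np.zeros(128))
    print(slide_emb.shape, gene_emb.shape, attn.sum())
```

Attention pooling is invariant to patch order, which suits whole-slide images where patches have no canonical ordering, and the attention weights double as a per-patch relevance map.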
Result consistency: Ablation study results are consistent with comparison methods on TCGA-BRCA when they should be. Bravo.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Questionable justification: The authors hypothesize that overall survival is associated with IHC biomarker status and then use this to justify applying a Cox proportional hazards model to perform gene selection on RNA-seq data as a preprocessing step. Although this is appropriate for feature selection in the context of survival prediction, its application here is unorthodox. This is because the stronger this correlation is, the less likely it is that the authors' proposed MKD approach contributes to performance gains over previous methods.
Incomplete performance reporting: The authors report only a mean across five folds on TCGA-BRCA as well as the external dataset. It would be more informative if they reported a standard error or confidence intervals across these folds to get an idea of the variability in test performance. If the variance is high, it would call into question whether their method is truly better than comparisons.
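Reporting fold-to-fold variability, as requested here, only requires the per-fold scores. A minimal sketch using Python's standard library follows; the fold AUCs in the example are made-up numbers for illustration, not results from the paper. Because cross-validation folds share training data, a t-based interval over folds is an approximate summary rather than an exact confidence interval.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (k - 1).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def summarize_folds(scores):
    """Mean, sample SD, standard error, and approximate 95% CI over k folds (2 <= k <= 10)."""
    k = len(scores)
    if not 2 <= k <= 10:
        raise ValueError("this sketch only tabulates t critical values for 2-10 folds")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)        # sample SD, k - 1 denominator
    se = sd / math.sqrt(k)
    half_width = T_CRIT_95[k - 1] * se
    return {"mean": mean, "sd": sd, "se": se,
            "ci95": (mean - half_width, mean + half_width)}


if __name__ == "__main__":
    fold_auc = [0.80, 0.82, 0.78, 0.84, 0.81]   # hypothetical 5-fold AUCs
    s = summarize_folds(fold_auc)
    print(f"AUC {s['mean']:.3f} ± {s['sd']:.3f} (95% CI {s['ci95'][0]:.3f}-{s['ci95'][1]:.3f})")
```

With only five folds the t multiplier (2.776) is noticeably larger than the normal 1.96, which is why small differences between methods can fall well within the interval.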
Motivational inconsistency: The authors state in the introduction that one of the motivations for their study is to mitigate the need for expensive genomic data collection. It is therefore unclear why they report multi-modal results and comparisons. It does not matter that their method performs better than other multi-modal methods (though the authors mention this multiple times), because the point of multi-modal knowledge distillation is to avoid using the expensive or time-consuming data source.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Lack of clarity: In Table 1, it is not immediately clear what distinguishes “Ours” in the pathology section from “Ours” in the multi-modal section. It took me a few minutes to understand that both are trained on genomics + pathology data, but the former is tested only on pathology data and the latter on both. Can the authors please state this explicitly somewhere? Similarly, can the authors please specify where each feature set is extracted from? It is unclear whether it is from the network or the head.
Lack of consistency: The authors report F1 in Tables 1 and 2 but oddly not in Table 3.
Future work: There may be a more effective way to pre-process genomics data (or perhaps no need to pre-process it at all). There may be genes not associated with survival that can contribute to KD performance, for example genes related to treatment response or recurrence. There may also be some way to distill knowledge to individual patches before aggregation. Different patches are likely associated with different expression patterns, so their representations may be enriched via KD. Lastly, there is still a notable gap between pathology-only and pathology-genomics performance. Rather than developing and comparing to other multi-modal models, it would be more interesting to pursue research that narrows this gap.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(6) Strong Accept — must be accepted due to excellence
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I rated this paper a 6 due to its novel and well-motivated approach to multi-modal knowledge distillation, which synthesizes a wide array of established techniques into a coherent and thoughtfully designed architecture. The authors’ emphasis on decomposing multi-modal features into modality-general and modality-specific components is both conceptually sound and impactful, especially given the underexplored application of this paradigm to pathology and genomics. The dual-teacher strategy—separately distilling shared and genomic-specific knowledge—is particularly innovative and provides a strong foundation for the observed performance improvements. While there are concerns about the gene selection justification and incomplete performance reporting, these are relatively minor and do not undermine the core methodological contributions or the consistency of results. Overall, the paper offers a substantial advancement in the domain of medical multi-modal learning and merits strong acceptance.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
This was already the best paper in my stack, and although I missed one of the main concerns raised by the other reviewers (i.e., a couple of missing ablation studies), the authors have addressed it as well as one of my other concerns, namely an estimate of variance in test set performance. Only two points in the rebuttal are not sufficient. The authors state in the rebuttal that multi-modal results are reported for comparison to other multi-modal methods, but the title of the paper is about knowledge distillation, so I think they missed the point (i.e., focus on one thing, and don't stray). Next, the authors did not report variance in their metrics across cross-validation folds in the original paper. Since they were not able to edit the paper yet, I am not able to see these results. If the variances are high, then their method basically doesn't improve over previous methods. Therefore, it is difficult to assess whether this merits a rejection.
Finally, regarding gene selection via survival prediction, I was unclear, so my apologies. The point I was trying to make is that there are other genes that may be predictive of biomarker status (i.e., distillable to pathology) but not of survival. Yes, selection helps, but you may be missing other important genes to distill. Moreover, the logical conclusion of gene selection would be to simply select the ER, PR, and HER2 genes for distillation, but these results are not present. Given my lack of clarity, I will disregard this.
Given that the paper was already strong and one of my main concerns was addressed, I think the paper should be accepted.
Author Feedback
We thank the reviewers for their valuable comments. Below, we group our responses into a few major categories:
[Q1] Gene selection via survival analysis and its implications for the MKD module. R1: It remains uncertain which specific gene expressions are strongly associated with ER, PR, and HER2 status in breast cancer patients. Given that patients with different biomarker profiles often exhibit distinct survival outcomes, we hypothesized that genes related to overall survival might also correlate with tumor biomarker status. Thus, we selected gene features via survival analysis, which proved effective in improving biomarker status prediction. As shown in Table 3, the MKD module contributes additional performance gains under this gene selection strategy. Nevertheless, we acknowledge that exploring alternative gene selection approaches, such as those based on treatment response or recurrence, remains an important direction for future research.
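The survival-based gene screening described in this answer (and questioned in Review #3) can be illustrated with a univariate Cox proportional hazards fit per gene, ranking genes by the absolute Wald z-statistic. This is a hedged NumPy sketch: the rebuttal does not say whether univariate or multivariate Cox models were used, which significance threshold was applied, or how many genes were kept, so `top_k`, the ranking rule, and the function names are assumptions. Ties use the Breslow approximation, and the O(n²) risk-set matrix is chosen for clarity rather than speed.

```python
import numpy as np


def univariate_cox_z(x, time, event, n_iter=50, tol=1e-8):
    """Wald z-statistic of a one-covariate Cox model (Breslow ties), fitted by damped Newton-Raphson."""
    x = (x - x.mean()) / (x.std() + 1e-12)                    # standardize for numerical stability
    d = event.astype(bool)                                     # observed (uncensored) events
    at_risk = (time[None, :] >= time[:, None]).astype(float)  # at_risk[i, j]: subject j at risk at time[i]

    def score_and_info(beta):
        w = np.exp(beta * x)
        s0, s1, s2 = at_risk @ w, at_risk @ (w * x), at_risk @ (w * x * x)
        mean_x = s1[d] / s0[d]                                 # risk-set weighted mean of x
        score = np.sum(x[d] - mean_x)
        info = np.sum(s2[d] / s0[d] - mean_x ** 2)
        return score, info

    beta = 0.0
    for _ in range(n_iter):
        score, info = score_and_info(beta)
        if info <= 0:                                          # constant gene or no events: no information
            return 0.0
        step = float(np.clip(score / info, -2.0, 2.0))         # damped step guards against overshoot
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_info(beta)
    return float(beta * np.sqrt(info))                         # beta / SE, with SE = 1 / sqrt(info)


def select_genes_by_cox(expr, time, event, top_k):
    """Rank genes (columns of expr) by |z| and return the top_k indices plus all z-scores."""
    z = np.array([univariate_cox_z(expr[:, g], time, event) for g in range(expr.shape[1])])
    return np.argsort(-np.abs(z))[:top_k], z
```

Ranking by |z| keeps genes associated with either higher or lower hazard; as Review #3 points out, any such screen can only retain genes whose signal shows up in survival.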
[Q2] Incompleteness of comparative and ablation studies. R2: We have obtained both the mean and standard deviation of cross-validation results, along with separate ablation results for the SKD and CLOD modules. Since the orthogonal loss is much larger than the CORAL loss, we set its weight α to 1/6 after tuning across several values below 1.0. Existing approaches such as MOTCat and PIBD, which integrate genomic and pathology data for survival prediction, can be adapted to our biomarker prediction task on the TCGA-BRCA dataset. These additional comparative analyses will be included in the revised manuscript, subject to space constraints.
[Q3] Necessity of reporting multi-modal comparative results. R3: We report both single-modal and multi-modal results to demonstrate that our trained models are flexible and can operate under either setting. While our knowledge distillation approach is primarily motivated by the goal of reducing reliance on costly genomic data, the multi-modal results highlight its adaptability when such data are available.
[Q4] Lack of clarity and the potential of multimodal information leakage. R4: We apologize for any confusion caused by the table descriptions. In Table 1, the first “Ours” under the “Patho.” modality refers to the pathology student model, while the second “Ours” under “Multi.” corresponds to the multi-modal teacher model. In Table 2, “Ours” denotes the pathology student model. During training, the pathology student model, genomics teacher model, and multi-modal teacher model are jointly optimized using multi-modal data. However, each model is implemented independently, and during inference, the pathology student model operates solely on pathology data. This design ensures that there is no information leakage from genomic data into the pathology student model.
[Q5] Justifying the advantages of incorporating new modalities. R5: While our multi-modal learning model achieves slight improvements over SOTA multi-modal baselines, our pathology student model shows significant improvements over pathology-only approaches. This indicates the advantage of distilling multi-modal knowledge into the pathology student model to enhance predictive performance. Our pathology student model operates using only pathology slides, thereby mitigating the need for costly genomic data collection during inference.
[Q6] Lack of genomics-only inference validation. R6: We have already generated genomics-only inference results on the TCGA-BRCA dataset and will add them to the revised manuscript.
[Q7] Lack of theoretical depth. R7: We have performed t-SNE visualizations of the decomposed features, which support the effectiveness of the MKD module. These results will be added to the revised manuscript.
[Q8] Lack of computational efficiency discussion. R8: After the MDP module, the training time for five-fold cross-validation is about 25 min, and the average inference time per WSI is less than 1.0 s on our workstation (RTX 4090 GPU). Computational efficiency will be discussed in the revised manuscript.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper introduces a novel and well-structured multi-modal knowledge distillation framework that effectively enhances unimodal pathology-based biomarker prediction by decomposing modality-specific and generalizable features. The dual-teacher design and consistent performance across datasets demonstrate both methodological innovation and clinical relevance. Despite minor concerns, the paper makes a strong contribution and merits acceptance.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A