Abstract
Cervical diseases present a significant global health challenge, especially in resource-limited regions with scarce specialized healthcare. Traditional analysis methods for thin-prep cytologic tests and whole slide images are hindered by their reliance on time-consuming processes and expert knowledge. Although AI-driven approaches have advanced single-task screening, they often face difficulties adapting to multi-task workflows and handling extreme class imbalance, thereby limiting their practical deployment in real clinical settings. To address these challenges, we propose a novel framework, MECDS, for multi-task early screening of cervical diseases.
Specifically, we design dynamic feature routing to prevent inter-task interference and selectively process task-relevant features. Furthermore, we employ asymmetric attention levels during knowledge distillation to address class imbalance, thus enhancing performance across diverse classes. Our extensive experiments on a large-scale dataset comprising 29,774 whole slide images demonstrate that MECDS surpasses existing single-task and multi-task models across three key screening tasks: cervical cancer, candidiasis, and clue cell detection. Additionally, MECDS exhibits remarkable extensibility, allowing for the efficient integration of novel diagnostic tasks without the need for exhaustive retraining. This unified framework holds great promise for improving comprehensive screening programs in resource-constrained healthcare environments, potentially advancing early detection and improving health outcomes. Our code is released at https://github.com/peter-fei/MECDS.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1643_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/peter-fei/MECDS
Link to the Dataset(s)
N/A
BibTex
@InProceedings{JiaHao_Multitask_MICCAI2025,
author = { Jiang, Haotian and Huang, Haolin and Cai, Jiangdong and Xu, Mengjie and Shen, Zhenrong and Fei, Manman and Wang, Xinyu and Zhang, Lichi and Wang, Qian},
title = { { Multi-task Screening for Cervical Diseases via Feature Routing and Asymmetric Distillation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15973},
month = {September},
pages = {399 -- 409}
}
Reviews
Review #1
- Please describe the contribution of the paper
The proposed multi-task architecture separates and routes features specific to each task. Asymmetric knowledge distillation facilitates teacher-to-student transfer by emphasizing positive samples, which is particularly important in clinical settings affected by data imbalance. Moreover, the architecture is flexible and allows for the addition of new tasks with minimal retraining effort, making it a promising solution. The proposed method demonstrates superior performance compared to both single-task and other multi-task settings.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The writing is clear and well-organized, making it easy to follow the authors’ ideas. The multi-task feature adaptation strategy constrains inter-task interference and enables the selection of task-specific features. Moreover, the flexible architecture allows for the seamless addition of new tasks with minimal retraining, making it a promising solution for clinical applications.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While Asymmetric Knowledge Distillation (AKD) is a reasonable approach for addressing data imbalance, there are several other established methods such as focal loss. The paper should clarify the advantages of using AKD over these alternatives.
Minor comment: The authors claim a ‘significant’ improvement in results, yet the reported numerical gains are small. Statistical evidence, such as p-values, should be provided to support this claim.
In the ablation study, the comparison between AKD and KL divergence is questionable, as AKD is specifically designed under the assumption of an imbalanced test distribution.
Regarding task extensibility, the authors did not specify key experimental settings—for instance, the number of samples used to fine-tune the new task.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The writing is clear and well-organized, and the motivations for each module are clearly presented. However, the proposed method lacks novelty, with limited comparison to other approaches for handling data imbalance and insufficient experimental details regarding task extensibility.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The authors address key challenges in cervical cancer screening, specifically: (1) the limitation of single-task models, which rely on sequential diagnostic workflows to handle multiple aspects of screening, making them less practical for clinical deployment; and (2) the issue of class imbalance in the dataset. To overcome these challenges, they propose a novel framework called MECDS, which incorporates a Multi-task Feature Adaptation strategy to tackle the first issue. Additionally, MECDS employs an Asymmetric Knowledge Distillation approach to effectively address the problem of class imbalance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The clarity and organization of the paper is good.
- The main strength of the paper lies in its Multi-task Feature Adaptation (MFA) module, which effectively extends traditional single-task methods to a multi-task setting. Their proposed Task-Isolated Self-Attention mechanism ensures independent learning for each task, reducing inter-task interference. The overall design is thoughtful and presents a compelling approach to efficient multi-task learning.
- The ablation study and varied set of experiments are beneficial.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The explanation of the methodology lacks sufficient technical depth in several areas:
- It is unclear how the task-specific tokens are generated.
- The description of Task-Isolated Self-Attention is largely theoretical, with limited technical validation. Specifically, it does not explain how this mechanism ensures independent learning for each task, prevents inter-task interference, or facilitates efficient integration of new tasks.
- For the DFR module, there is no clear explanation of how the mapping from the nᵗʰ patch to the mᵗʰ feature is performed, how the importance score is calculated, or whether this mapping is indeed learnable. Additionally, the role of task tokens remains ambiguous—while they are concatenated both before and after the expert level, the motivation behind this design choice is not well-justified. A more detailed and technically grounded explanation would help clarify how these components contribute to the overall learning process and task-specific representation.
- The section on Asymmetric Knowledge Distillation lacks important details regarding its implementation. Simply incorporating focal loss may not be sufficient to fully address the class imbalance issue. While focal loss helps by down-weighting easy examples and focusing training on harder, misclassified instances—typically from minority classes—it does not inherently guarantee balanced feature learning or effective knowledge transfer, especially in multi-task or distillation settings.
- Without further explanation on how the student is guided to learn from the teacher in an asymmetric manner (e.g., selective feature alignment, task-specific distillation weights, or confidence-based sampling), it’s difficult to assess how comprehensively the method tackles class imbalance beyond just loss-level adjustments. A more detailed account of the distillation strategy and its impact on class-wise performance would strengthen the contribution.
While the core ideas are promising, it appears that due to space constraints, the authors may have had to omit significant portions of the technical explanation. Unfortunately, this reduction somewhat undermines the clarity and completeness of the methodology.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a promising framework, MECDS, designed to address key challenges in cervical cancer screening, particularly the limitations of single-task models and class imbalance. A major strength lies in its Multi-task Feature Adaptation (MFA) module, which introduces Task-Isolated Self-Attention to ensure independent learning for each task, effectively reducing inter-task interference. This is complemented by the Dynamic Feature Routing (DFR) module, which dynamically selects task-relevant features by computing importance scores between patches and tasks, retaining only the most relevant features and concatenating them with task tokens for efficient task-specific learning. Additionally, the framework incorporates Asymmetric Knowledge Distillation along with focal loss to mitigate class imbalance, enhancing its robustness for real-world applications. However, the paper lacks sufficient methodological detail in several areas. The process of generating task-specific tokens is not clearly explained, and the explanation of Task-Isolated Self-Attention remains theoretical, without clear technical validation. The DFR module also leaves questions unanswered, such as how importance scores are computed or whether the routing is learnable. Furthermore, the implementation of Asymmetric Knowledge Distillation is underdeveloped, with no discussion on the selection of hyperparameters or how it fundamentally differs from standard knowledge distillation techniques. These gaps, possibly due to space constraints, slightly reduce the clarity and depth of the overall contribution, though the core ideas remain innovative and impactful.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The main contribution of this paper lies in the development of MECDS, a unified framework for multi-task early cervical disease screening that addresses critical challenges in clinical practice, such as inter-task interference and extreme class imbalance. Specifically, the paper introduces a novel Multi-task Feature Adaptation strategy, incorporating Task-isolated Self Attention layers and a Dynamic Feature Routing module to ensure task independence and efficient feature selection. Additionally, the paper proposes an Asymmetric Knowledge Distillation (AKD) scheme to effectively handle class imbalance, particularly by enhancing positive sample learning. The framework demonstrates superior performance over both single-task and existing multi-task methods across multiple cervical disease screening tasks, while maintaining extensibility to new diagnostic tasks without requiring exhaustive retraining.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novel Formulation: The introduction of the Dynamic Feature Routing module is a key innovation, inspired by clinical workflows. This module selectively processes task-relevant features, reducing redundancy and improving computational efficiency. The Task-isolated Self Attention mechanism further enhances task independence, which is crucial for multi-task learning.
- Addressing Class Imbalance: The Asymmetric Knowledge Distillation scheme is a novel and effective approach to tackle extreme class imbalance, particularly by focusing on positive samples using an asymmetric weighting strategy. This enhances the model’s sensitivity to underrepresented classes, which is critical for early screening.
- Strong Evaluation: The paper provides extensive experimental results on a large-scale dataset of 29,774 Whole Slide Images (WSIs), demonstrating significant improvements in AUC and sensitivity across three cervical disease screening tasks. The results outperform both single-task and existing multi-task models, showcasing the robustness and effectiveness of the proposed framework.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Lack of Qualitative Analysis: The paper largely focuses on quantitative results, but qualitative analysis, such as visualizations of feature routing or attention maps, would provide deeper insights into how the model processes WSIs and selects task-relevant features.
- Potential Over-reliance on Synthetic Balancing: The class balancing strategy in training may not fully reflect real-world clinical settings, where data imbalance is unavoidable. The framework’s robustness to highly imbalanced datasets without re-sampling could be further explored.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I recommend accepting this paper as it presents a novel and impactful framework, MECDS, for multi-task cervical disease screening, addressing key challenges such as inter-task interference and extreme class imbalance through innovative components like Dynamic Feature Routing and Asymmetric Knowledge Distillation. The framework achieves state-of-the-art results on a large-scale dataset, demonstrating superior performance over existing methods while maintaining clinical feasibility and extensibility to new tasks without retraining. Its strong methodological contributions, robust experimental validation, and potential to improve real-world screening programs make it a valuable addition to the field.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
Thanks for the valuable comments. We appreciate the recognition of our work and have addressed each of the reviewers’ concerns as follows.
- Regarding the task-specific tokens and Task-Isolated Self-Attention (R2): For each task, we assign a task-specific token, which is initialized in a manner similar to the CLS token in ViT. As described in the Method, in the TSA, the m-th task token Tm restricts its attention interaction to only the patch features and itself, thus preventing interference from other task tokens. When a new task is introduced, the new task-specific token is computed similarly, thereby minimizing its impact on previously learned tasks. Due to space limitations, we have not provided detailed experimental proofs of the TSA, but we plan to include them in an extended version.
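The isolation described above can be expressed as an attention mask over the token sequence. The sketch below is a minimal, hypothetical illustration of that masking rule (the helper name `tsa_attention_mask`, the token ordering, and the boolean-mask convention are our assumptions, not the authors' implementation):

```python
# Sketch of the masking idea behind Task-Isolated Self-Attention (TSA):
# the token sequence is [patch_1 .. patch_N, task_1 .. task_M], and each
# task token may attend only to the patch tokens and to itself.

def tsa_attention_mask(num_patches, num_tasks):
    """mask[i][j] is True when token i may attend to token j.

    Patch tokens attend to patch tokens only; task tokens attend to the
    patches and themselves, so task tokens never exchange information
    with one another (and never influence the patch representations).
    """
    n_total = num_patches + num_tasks
    mask = [[False] * n_total for _ in range(n_total)]
    for i in range(n_total):
        for j in range(n_total):
            if j < num_patches:   # any token may attend to patch tokens
                mask[i][j] = True
            elif i == j:          # a task token may attend to itself
                mask[i][j] = True
            # task-to-other-task entries stay False: isolation
    return mask
```

In a real transformer this boolean mask would be applied to the attention logits (disallowed entries set to negative infinity before the softmax). Because a new task contributes only one extra row and column whose pattern mirrors the existing task tokens, adding it leaves the attention of previously learned tasks unchanged, consistent with the extensibility claim.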
- Regarding the DFR module (R2): For each task, a learnable Router computes an importance score of size N×1, yielding an N×M score matrix across all M tasks. In our study, each expert is responsible for a single task, and each task token represents the overall task-specific feature of a WSI. These tokens are updated by the corresponding expert and are finally sent to the task-specific head for diagnosis.
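As a toy illustration of this routing step, the sketch below scores every patch per task and forwards only the top-k patches to that task's expert. The dot-product scorer, the top-k selection rule, and the name `route_features` are hypothetical stand-ins for the learnable Router, not the paper's exact mechanism:

```python
# Toy sketch of Dynamic Feature Routing (DFR): one router per task scores
# all N patches (an N x 1 score per task, N x M overall), and only the
# most relevant patches are routed to that task's expert.

def route_features(patch_feats, router_weights, k):
    """patch_feats: N feature vectors; router_weights: M weight vectors,
    one per task. Returns, per task, the sorted indices of the k patches
    with the highest dot-product importance score."""
    routed = []
    for w in router_weights:                       # one router per task/expert
        scores = [sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in patch_feats]
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        routed.append(sorted(top_k))
    return routed
```

In training, the router weights would be learned end-to-end alongside the experts, so each task gradually specializes in the patch features that matter for its diagnosis.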
- Regarding the details about Asymmetric Knowledge Distillation (R2): Thank you for your attention; we did not employ focal loss. We pre-trained a teacher model on a balanced dataset to assist the student model. This helps the student model correctly identify positive samples while training on the complete, imbalanced dataset, which requires increasing the weight of positive samples and decreasing or ignoring the weight of negative samples. Our early experiments showed that focal loss could not achieve this. Therefore, we proposed the AKD loss to perform distillation during the second stage. Additionally, we used CE loss to enable the model to learn stronger representations across the entire dataset.
- Regarding the lack of quantitative experimental analysis on the asymmetric weighting (R2, R3): We apologize for the missing quantitative experiments on the asymmetric weighting due to space constraints. Our experiments focus on three key parts: comparisons with other models in cancer screening, ablation studies on our modules, and task extensibility, which we deem critical for our study. Theoretically, a smaller ωpos increases the distillation weight for positive samples, while a larger ωneg reduces the weight for negative samples. In early experiments, we trained models on imbalanced data using focal loss but did not achieve the expected performance boosts. In our ablation study (Table 2), we compared different distillation losses (our AKD and KL) under the same settings, including the same imbalanced training data, to demonstrate AKD's effectiveness; AKD outperformed standard knowledge distillation. By incorporating the AKD loss, the student model focuses more on positive samples, delivering better performance.
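One plausible reading of the ωpos/ωneg behavior described above is a focal-style modulation with separate exponents for positive and negative samples. The sketch below is a hedged guess at such a per-sample term; the (1 − p)^ω form, the default values, and the name `akd_sample_loss` are our assumptions for illustration, not the authors' exact loss:

```python
# Hypothetical sketch of an asymmetric distillation term: a teacher-guided
# cross-entropy on the positive-class probability, modulated so that a small
# w_pos keeps positive samples near full weight while a large w_neg drives
# easy (confidently negative) samples toward zero weight.
import math

def akd_sample_loss(p_student, p_teacher, is_positive, w_pos=0.5, w_neg=4.0):
    # cross-entropy between teacher and student positive-class probabilities
    ce = -(p_teacher * math.log(p_student)
           + (1 - p_teacher) * math.log(1 - p_student))
    if is_positive:
        weight = (1.0 - p_student) ** w_pos   # smaller w_pos -> larger weight
    else:
        weight = p_student ** w_neg           # larger w_neg -> easy negatives ignored
    return weight * ce
```

Under this form, lowering ωpos raises the weight on positives and raising ωneg suppresses confident negatives, matching the qualitative behavior the rebuttal describes.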
- Regarding the qualitative analysis (R4): We sincerely apologize for this omission. In practice, we have observed distinct heat maps for different tasks, with some regions receiving high attention scores while others receive negligible attention. However, due to the page limitation, we were unable to include these visualizations in this paper. In future work, we will develop a more robust routing strategy and conduct further qualitative analyses to provide deeper insights.
- Regarding the class balancing strategy (R4): Thank you for your feedback. In this work, we tackle the class imbalance issue via a two-stage training strategy. In the first stage, we re-sample to a balanced dataset, which improves the model’s ability to learn from both positive and negative samples. During the second stage, the model continues to operate under the naturally imbalanced conditions. We appreciate your suggestions and will consider exploring non-re-sampling methods in our future work.
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
All reviewers acknowledge the paper’s strong contributions in multi-task cervical cancer screening. Despite one weak reject (3), the other two reviewers recommend acceptance (4, 5). The paper presents innovative approaches to feature adaptation and class imbalance. I suggest a decision of provisional accept. The authors should address the methodological clarifications noted by reviewers in their camera-ready version. Congratulations!