Abstract

State-of-the-art knowledge distillation (KD) methods aim to capture the underlying information within the teacher and explore effective strategies for knowledge transfer. However, due to challenges such as blurriness, noise, and low contrast inherent in medical images, the teacher’s predictions (soft labels) may also include false information, thus potentially misguiding the student’s learning process. Addressing this, we pioneer a novel correction-based KD approach (PLC-KD) and introduce two assistants for perceiving and correcting the false soft labels. More specifically, the false-pixel-aware assistant targets global error correction, while the boundary-aware assistant focuses on lesion boundary errors.
Additionally, a similarity-based correction scheme is designed to forcefully rectify the remaining hard false pixels. Through this collaborative effort, the teacher team (comprising a teacher and two assistants) progressively generates more accurate soft labels, ensuring “all-correct” final soft labels for guiding the student during KD.
Extensive experimental results demonstrate that the proposed PLC-KD framework attains superior performance to state-of-the-art methods on three challenging medical segmentation tasks.
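The pixel-wise soft-label correction described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' released code: the function name, the 0.5 threshold, and the assumption that ground-truth masks are available while refining soft labels during training are all ours.

```python
import numpy as np

def pixelwise_correct(teacher_soft, assistant_soft, gt_mask, threshold=0.5):
    """Replace the teacher's falsely predicted pixels with the assistant's
    probabilities, so the refined soft labels agree with the ground truth.

    teacher_soft, assistant_soft: (H, W) foreground probabilities.
    gt_mask: (H, W) binary ground-truth mask (available during training).
    """
    teacher_hard = (teacher_soft >= threshold).astype(gt_mask.dtype)
    false_pixels = teacher_hard != gt_mask   # false positives and negatives
    refined = teacher_soft.copy()
    refined[false_pixels] = assistant_soft[false_pixels]
    return refined
```

Applied in sequence (teacher → TA_f → TA_b), each stage would only overwrite the pixels its predecessor got wrong, which is one plausible reading of the progressive refinement the paper describes.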

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2751_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2751_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wan_Progressively_MICCAI2024,
        author = { Wang, Yaqi and Cao, Peng and Hou, Qingshan and Lan, Linqi and Yang, Jinzhu and Liu, Xiaoli and Zaiane, Osmar R.},
        title = { { Progressively Correcting Soft Labels via Teacher Team for Knowledge Distillation in Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper investigates knowledge distillation in medical image segmentation. The authors propose the use of two teachers for knowledge distillation: one targets global errors, whereas the other focuses on boundary errors. In this way, the method is more tailored to medical image segmentation. The method is evaluated on the ISIC, CVC-EndoSceneStill, and IDRiD datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors have identified several issues that plague knowledge distillation methods for semantic segmentation in medical imaging.

    The proposed method improves on the three dataset against the chosen baselines.

    The authors provide an ablation study on the design choices.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method seems rather complex, consisting of a lot of small steps. Instead of simply distilling knowledge from one network to another, one requires two additional networks, soft label adaptation, and false-pixel image generation. Each of these steps contributes a little to the overall performance, but none of them yields a significant improvement on its own. This adds considerable overhead to basic knowledge distillation methods (i.e. [1]).

    The authors state that boundaries are a noticeable issue in the knowledge distillation procedure but do not evaluate this characteristic quantitatively via metrics such as Hausdorff distance or Boundary IoU [5].

    The method is heavily adapted towards semantic segmentation but does not compare against newer work that investigates knowledge distillation for semantic segmentation [3,4].

    Missing comparisons to simple baselines. One direction to investigate is whether there are less parameter-intensive ways to approach the problem in question, such as adding a boundary-IoU loss to the original design. A separate one is to investigate the choice behind the assistant networks, which could be compared against a simple ensemble of teacher networks as done in [1] (there it was shown that using ensembles as teachers also significantly improves performance compared to using a singular teacher). As it stands, it is not clear whether the performance increase is due to smart design choices or due to additional compute capacity.

    [1] Knowledge distillation: A good teacher is patient and consistent [2] Decoupled Knowledge Distillation [3] Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation [4] Self-Decoupling and Ensemble Distillation for Efficient Segmentation [5] Boundary IoU: Improving object-centric image segmentation evaluation

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Code is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Personally, I found Fig. 1 hard to follow. I tried not to let this influence the review overall, but refining the figure (either by slimming it down, splitting it into multiple figures, etc.) and making it more readable would make the method easier to follow.

    While the claims made in the introduction can be seen in the qualitative example, it is not shown that these are general problems for these methods. A quantitative analysis of these issues, and of the subsequent improvements through the proposed changes, would help the overall narrative.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My major concern is regarding the fairness of the evaluation. While the proposed method utilizes multiple networks, the compared methods seem to strictly use the teacher. This is a clear offset in parameters/compute that goes into solving the issue. Additional concerns such as missing simple baselines and validation of claims made in the introduction facilitate my decision to give this work a weak reject.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of my concerns with the promise of integration of HD/BIoU metrics and simple baselines. I still have some issues following the authors; e.g., I don't quite understand how using multiple teachers (teacher + 2 assistants) during training relates to “Equal Parameters, Better Performance”. However, while I would not call this approach “groundbreaking” or “simple”, I respect the improvements that this approach can provide.

    As such, after also considering the concerns of the other reviewers, I change my decision to weak accept.



Review #2

  • Please describe the contribution of the paper

    The paper proposes leveraging synthetic False-pixel Images and boundary information to assist the teacher network in generating more accurate soft labels, thereby improving the performance of medical image segmentation. The experimental results validate the effectiveness of the approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper introduces two teacher assistants to assist the teacher network in generating more accurate soft labels for distillation. (2) Proposing False-pixel Image Generation to synthesize images for training a false-pixel-aware Teacher Assistant (TA) to enhance soft label generation.
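The false-pixel selection step that feeds this generation is not detailed in the review, but given Fig. 1's “Select False Negative/Positive Pixels” labels (noted by Review #3), it might look roughly like the following sketch; the function name and thresholding are our assumptions.

```python
import numpy as np

def select_false_pixels(teacher_prob, gt_mask, threshold=0.5):
    """Split the teacher's errors into false-negative and false-positive
    pixel masks, which could then drive false-pixel image synthesis.

    teacher_prob: (H, W) foreground probabilities from the teacher.
    gt_mask: (H, W) binary ground-truth mask.
    """
    pred = teacher_prob >= threshold
    gt = gt_mask.astype(bool)
    false_neg = gt & ~pred    # lesion pixels the teacher missed
    false_pos = ~gt & pred    # background pixels the teacher marked as lesion
    return false_neg, false_pos
```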

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The paper’s discussion of knowledge distillation only encompasses works up to 2022, neglecting advancements like boundary-based knowledge distillation methods such as BPKD or multiple teachers ensemble distillation. Similarly, it somewhat falls short in comparing with the current state-of-the-art methods. (2) Some details are missing, such as the methodology for obtaining soft labels through Progressive Refinement. Additionally, specifics regarding the implementation of Pixel-wise Correction and Similarity-based Correction are not provided. Although the code is provided, it is still advisable to provide some explanation in the paper.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The code is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) Include recent research on semantic knowledge distillation and add experimental comparisons to validate its superiority. (2) Add details about the Progressive Refinement of soft labels.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While there are existing works that fuse multiple teachers or utilize boundary information for knowledge distillation to improve segmentation performance, the novelty of this paper lies in the use of synthetic images to train false-pixel-aware models and the progressive refinement strategy.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes progressive prediction refinement for teacher models in knowledge distillation (KD) for the segmentation task. Different from the single teacher model in most KD literature, this paper proposes a teacher team: 1) T, the same as the common teacher model; 2) TA_f, which refines wrongly predicted pixels from T; and 3) TA_b, which refines the predicted boundary from TA_f.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) This paper utilizes 3 teacher models to progressively refine their predictions.

    2) False-pixel Image Generation is proposed to generate images. These generated images are fed to TA_f, which is thus supposed to focus more on wrongly predicted pixels from T. The generation and refinement strategy is interesting.

    3) Fig. 2 shows a boundary transformation proposed by the authors. The authors' assumption behind this transformation is that the boundary region is more similar to the background region, and, due to the size of the convolution kernel, it is hard for a convolution layer to capture a clear representation of the boundary. Therefore, the authors exchange the positions of pixels around the boundary with pixels in the inner region. The teacher model TA_b is trained on these transformed images to focus on better boundaries.

    4) Experiments show the advantage of each proposed component.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Typo in Fig. 1: there are two “Select False Negative Pixels” labels; the blue arrow should be “Select False Positive Pixels”.

    2. The authors mention Pixel-wise Correction only in the bottom part of Fig. 1, and Similarity-based Correction only in the last paragraph of the caption of Fig. 1.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Code in supplementary materials. If possible, please also list the version of required packages when releasing the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) I understand the authors have to fit their submission to the page limit, but Fig. 1 and its caption increase the difficulty in understanding the proposal. Please consider splitting the figure and putting some texts to the related paragraph.

    2) Please consider moving pixel-wise correction and similarity-based correction to a new subsection in methodology.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed boundary transformation and false-pixel generation is interesting. And the proposed progressive refinement is promising in the field of medical image segmentation.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I personally agree with the authors’ response on the novelty of this work. At the same time, I agree with R1 and R3 that some of the descriptions of the method are hard to find / to follow. Metrics such as boundary IoU are advised to be included to support the authors’ claims.

    Since the authors promised to reorganize the descriptions and add metrics such as HD and boundary IoU, I maintain my score.




Author Feedback

We thank the reviewers for recognizing our work as novel (R1, R4), well-organized (R1, R3, R4), and effective (R1, R3, R4). We will address the main issues: 1) Missing comparison with the latest work (R1-Q1, R3-Q3). Our main contributions are as follows: a) New problem. We unveil an unexplored issue in KD methods: the teacher’s wrong predictions mislead the student’s learning. By analyzing these wrong predictions, we innovatively propose a progressive correction mechanism and decompose the problem by devising two assistant networks: one for global false-pixel correction and the other for local boundary refinement. We believe our findings advance appropriate teacher ensemble distillation. b) Significant improvement. Experimental results show that our PLC-KD outperforms the previous best methods by an average of 1.96%/2.47% Dice/IoU across three challenging medical segmentation tasks. c) Compatibility. The core of our PLC-KD is the progressive correction mechanism, which can therefore be combined with various existing KD methods to achieve better performance.

Compared to the latest methods [1,2] suited for natural images or multi-class problems, our KD approach is specifically designed for more challenging medical image segmentation tasks, which require accurate knowledge of blurred lesion/tissue boundaries and insignificant texture differences between lesion/tissue and background. Moreover, the lack of source code releases prevents a sufficient comparison with [3,4]. Updated comparative methods will be included in the final version.

2) The effectiveness of each component (R3-Q1, R3-Q4). a) Simple and highly effective. Firstly, we introduce two architecturally identical assistant networks to correct teacher errors from two perspectives: global corrections and boundary refinement. As shown in Table 2, the TA_f and TA_b assistant networks significantly boost performance across the three datasets, with average Dice/IoU improvements of 2.91%/3.04% and 3.2%/3.6%, respectively. Remarkably, on the EndoScene dataset, TA_f/TA_b achieve Dice improvements of 5.87%/6.24%. Additionally, the visualizations in Figures 2 and 4 further demonstrate the effectiveness of each component and support our argument. b) Equal Parameters, Better Performance. In [5], the average output of two teacher networks serves as soft labels for guiding the student; each of their teacher networks has identical functions. By contrast, each component of our teacher team is specially designed for a specific function (basic prediction and corrections). We previously conducted similar experiments: maintaining the original teacher team structure while following the KD strategy in [5] led to only a slight improvement over a single teacher network and performed worse than our method, indicating the superiority of our design. Finally, we hope our approach will inspire future correction-based KD research.

3) Boundary-aware performance evaluation (R3-Q2,R3-comment1) We appreciate your suggestion. Metrics like Hausdorff distance and boundary-IoU will be added to Table 1 in the final version for better quantitative analysis. (R3-Q4) Additionally, we will also compare with simple baseline with boundary-IoU loss.

4) Writing issues a) Missing Details (R1-Q2) We apologize for the lack of clarity in the details. Actually, the progressive refinement through pixel-wise correction is explained in the italicized note at the bottom of Figure 1, while the similarity-based correction is detailed in the figure caption. (R4-Q2, R3-comments2) We will split the figure and incorporate relevant text into the paragraph for better readability. b) Typo in Fig.1(R4-Q1) The blue arrow should be “Select False Positive Pixels”, we will revise it in the final version.

Finally, we hope to address all concerns and would be delighted if our paper is considered for acceptance at MICCAI 2024.

Ref: [1]SSTKD [2]DPED [3]BPKD [4]MTED [5]Knowledge distillation:A good teacher is patient and consistent




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal is quite thorough and has addressed the reviewers’ concerns regarding the effectiveness of the proposed approach, comparison to state-of-the-art methods, and boundary-aware performance evaluation. Overall this paper has sufficient contributions and the performance is promising. The final version should include more discussions to highlight the novelty of the paper and add describe the proposed methodology in a more detailed manner.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


