Abstract

Perfusion Parameter Maps (PPMs), generated from Computed Tomography Perfusion (CTP) scans, deliver detailed measurements of cerebral blood flow and volume, crucial for the early identification and strategic treatment of cerebrovascular diseases. However, the acquisition of PPMs involves significant challenges. Firstly, the accuracy of these maps heavily relies on the manual selection of Arterial Input Function (AIF) information. Secondly, patients are subjected to considerable radiation exposure during the scanning process. In response, previous research has attempted to automate AIF selection and reduce the radiation exposure of CTP by lowering temporal resolution, utilizing deep learning to predict PPMs from automated AIF selection and temporal resolutions as low as 1/3. However, the effectiveness of these approaches remains marginal. In this paper, we push the limits and propose a novel framework, Progressive Knowledge Distillation (PKD), to generate accurate PPMs from CTP scans at 1/16 of the standard temporal resolution. PKD uses a series of teacher networks, each trained on a different temporal resolution, for knowledge distillation. Initially, the student network learns from a teacher with low temporal resolution; as the student is trained, the teacher is scaled to a higher temporal resolution. This progressive approach aims to reduce the large initial knowledge gap between the teacher and the student. Experimental results demonstrate that PKD can generate PPMs comparable to full-resolution ground truth, outperforming current deep learning frameworks.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1885_paper.pdf

SharedIt Link: https://rdcu.be/dV55i

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72117-5_57

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1885_supp.pdf

Link to the Code Repository

https://github.com/mhson-kyle/progressive-kd

Link to the Dataset(s)

https://www.isles-challenge.org/ISLES2018/

BibTeX

@InProceedings{Son_Progressive_MICCAI2024,
        author = { Son, Moo Hyun and Bae, Juyoung and Tong, Elizabeth and Chen, Hao},
        title = { { Progressive Knowledge Distillation for Automatic Perfusion Parameter Maps Generation from Low Temporal Resolution CT Perfusion Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        pages = {611--621}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces the Progressive Knowledge Distillation (PKD) framework, which offers a novel approach to generating Perfusion Parameter Maps (PPMs) from Computed Tomography Perfusion (CTP) scans at reduced temporal resolutions. This method potentially reduces the radiation exposure for patients, a significant advancement in medical imaging. By utilizing a series of teacher networks trained at varying resolutions, PKD aims to maintain the accuracy of PPMs even when using much lower temporal resolution scans. Although the improvements over existing methods are modest and the comparisons with state-of-the-art techniques are limited, PKD presents a promising concept that could influence future research in the area of medical imaging, particularly in optimizing the balance between imaging quality and patient safety.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper’s primary strength lies in its innovative approach to reducing radiation exposure in Computed Tomography Perfusion (CTP) scans through the Progressive Knowledge Distillation (PKD) method. By employing a progressive training model that utilizes a series of teacher networks at different temporal resolutions, PKD effectively bridges the gap in knowledge transfer, allowing for the generation of accurate Perfusion Parameter Maps (PPMs) from significantly reduced data inputs. This approach not only demonstrates potential in maintaining diagnostic accuracy with lower radiation risks but also introduces a novel application of knowledge distillation techniques in medical imaging. While the improvement over existing methods may not be substantial, the concept itself contributes valuable insights into the possibilities of enhancing imaging efficiency and patient safety simultaneously.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A notable weakness of the paper is its limited comparative analysis with existing state-of-the-art methods. Without a robust comparison to other leading techniques in the field, it’s challenging to evaluate the true effectiveness and innovation of the Progressive Knowledge Distillation (PKD) framework. Additionally, the results presented in the paper do not show significant improvement over existing methods, which raises questions about the practical value and potential impact of adopting this new approach in clinical settings. These factors suggest that while the PKD method introduces an innovative concept, its current development may require further refinement and validation to substantiate its benefits over established practices in medical imaging.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    no

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The paper introduces an interesting method in Progressive Knowledge Distillation (PKD) for reducing radiation exposure in CTP scans. However, the lack of a comprehensive comparative analysis with existing state-of-the-art methods in PPM generation significantly limits the ability to gauge the true efficacy and innovation of PKD. Future revisions should include detailed benchmarks against current leading techniques, providing clear data on performance metrics to establish the method’s comparative advantage.

    2. While the PKD framework is innovative in its approach to training deep learning models at reduced temporal resolutions, the results presented do not demonstrate a significant improvement over existing methods.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The recommendation is primarily influenced by the paper’s limited comparative analysis with existing methods, insufficient methodological clarity, and inadequate explanations for key figures, which collectively challenge its scientific rigor and practical applicability.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose the use of progressive knowledge distillation in the process of training models to generate perfusion maps from CT perfusion data. They show that they can generate perfusion maps comparable to the ‘ground-truth’ using 1/16th temporal resolution and the proposed method is AIF-free. If adopted, it could lead to reduced radiation exposure for patients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Progressive knowledge distillation is an elegant solution to this task. The idea to save patients 15/16th of their radiation exposure could have a large impact. Good comparison to the previous state-of-the-art methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some details of the models/implementation are still missing. There is no discussion of limitations.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is not clear whether the code/models will be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Progressive knowledge distillation itself is not novel, but how you apply it is novel and should be highlighted instead. I think the knowledge distillation parts of the loss (L_feature and L_soft target) are not explained. In general, some extra background on (progressive) knowledge distillation would help the reader. The experiments in Table 1 are nice, but it would be more interesting to see what happens if you train on one dataset only and then test on both. There is no discussion of limitations. In the first column of Table 3, TMAX (1/16), should the third row from the bottom be bold instead? It is hard to tell whether the small differences in model performance are statistically significant. Does the table show the mean values over the test set? What are the standard deviations? Why did you stop at 1/16? It would be interesting to see how far this approach can be pushed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is an elegant solution to a clinically relevant problem.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    My assessment is positive based on the authors’ rebuttal.



Review #3

  • Please describe the contribution of the paper

    The authors propose a novel framework called Progressive Knowledge Distillation (PKD) for generating perfusion parameter maps (PPMs) from low-temporal resolution CT perfusion scans. Specifically, PKD leverages a series of teacher networks, each trained at different temporal resolutions, to bridge the knowledge gap between high- and low-temporal resolution models. This approach enables the generation of PPMs comparable to those derived from full-resolution scans, meaning that lower-resolution scans may be acquired in the clinical setting, thus potentially helping to reduce radiation exposure for new patients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper lie in the innovative PKD framework and its rigorous evaluation. The PKD approach is particularly novel for its dynamic adjustment of the teacher model according to the student’s learning stage, a concept that aims to mitigate the large initial knowledge gap between the teacher and student models, optimizing the learning process. Then, the authors compare their PKD framework not only against other knowledge distillation methods but also state-of-the-art methods for PPM estimation. Such comparisons help to better understand the relative performance of PKD, emphasizing its efficacy.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper overlooks several critical aspects of data preprocessing, notably the consistency of temporal resolution across datasets, which could compromise the reliability and validity of the results. These issues need a more detailed examination, as elaborated in box 10: Constructive Feedback. Additionally, the introduction lacks essential details, such as the motivation for estimating PPMs, and the methodology section omits specifics on the training parameters for the teacher models that are important for reproducing their results. Furthermore, the paper fails to acknowledge the inherent limitations of the study, nor does it explore potential directions for future research.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors provide a PyTorch-style pseudocode of their loss and a detailed diagram of the components in the framework. However, training details regarding the optimization of the teacher models are missing, which may be crucial for reproducing their work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Introduction – The authors state that “PPMs are crucial for radiologists in making informed treatment decisions,” yet they do not adequately explain why this is the case. Elaborating on why PPMs are crucial could strengthen the motivation for directly predicting PPMs, enhancing the rationale of their research. Providing specific examples or scenarios where PPMs influence clinical outcomes would effectively solidify this argument.

    Methodology – While the concept of progressive knowledge distillation is well-defined, the transition between different teacher models could be further elaborated. Specifically, it would be beneficial if the authors could provide more detailed criteria or thresholds for changing teacher models during the training process. The authors just mention that the teacher model is adjusted “when specific conditions are met”, but it is not clear what these conditions are. Is the absence of decrement in distillation loss for 5 epochs the only condition? Additionally, discussing the selection process for these teacher models and their specific training setups would help to improve the reproducibility of their work.
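
To make the question about switching criteria concrete, the following is a minimal sketch of one possible rule: move to the next teacher once the distillation loss has not improved for a fixed number of epochs. The class name `ProgressiveTeacherSchedule`, the list of teacher rates, and the choice to reset the best loss after a switch are all assumptions made for illustration; only the 5-epoch plateau is mentioned in the review, and the paper's actual conditions may differ.

```python
from typing import List


class ProgressiveTeacherSchedule:
    """Advance to the next teacher when the distillation loss stops decreasing.

    Hypothetical helper: the name, the `patience` default, and the ordering of
    `teacher_rates` are assumptions for illustration, not the authors' code.
    """

    def __init__(self, teacher_rates: List[str], patience: int = 5) -> None:
        if not teacher_rates:
            raise ValueError("need at least one teacher")
        self.teacher_rates = teacher_rates  # ordered low -> high temporal resolution
        self.patience = patience
        self.index = 0
        self.best_loss = float("inf")
        self.stale_epochs = 0

    @property
    def current(self) -> str:
        return self.teacher_rates[self.index]

    def step(self, distill_loss: float) -> bool:
        """Record one epoch's loss; return True if the teacher was switched."""
        if distill_loss < self.best_loss:
            self.best_loss = distill_loss
            self.stale_epochs = 0
            return False
        self.stale_epochs += 1
        last_teacher = self.index == len(self.teacher_rates) - 1
        if self.stale_epochs >= self.patience and not last_teacher:
            self.index += 1
            self.best_loss = float("inf")  # assumed: restart tracking for the new teacher
            self.stale_epochs = 0
            return True
        return False


# Toy loss trace: two improving epochs, a five-epoch plateau, then improvement.
schedule = ProgressiveTeacherSchedule(["1/8", "1/4", "1/2", "1/1"], patience=5)
losses = [0.9, 0.8] + [0.8] * 5 + [0.7]
switches = [schedule.step(loss) for loss in losses]
```

In this toy trace the switch happens on the fifth stalled epoch, after which the schedule points at the second teacher. Whether further conditions apply is exactly what the comment above asks the authors to state.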

    Using a 3D V-Net means that the time is treated as an additional spatial dimension, which may limit the temporal relations that the network can learn. Other architectures, such as CNN-LSTM (doi: 10.1109/EMBC48229.2022.9871735) or Transformer (doi: 10.48550/arXiv.1706.03762) models, are specifically designed to handle sequential data and could potentially model spatio-temporal information more efficiently or accurately. For future work, exploring these alternatives might yield improvements in the model performance.

    Complex loss functions, while potentially enhancing model accuracy, often increase computational overhead and can extend training durations. How is the proposed model different in terms of training time and computational requirements compared to the baseline methods? Such a comparison would help in evaluating the practicality of implementing their model in real-world scenarios.

    Dataset and implementation details – In terms of the data, the paper does not specify the original temporal resolution of the scans nor whether this resolution is uniform across all datasets (ISLES 2018 and in-house). If variations exist, it would be beneficial for the authors to describe how the images were standardized or interpolated to a common temporal resolution. In addition, it is stated that “random volumetric patches of size 128 x 128 x 32 are extracted” for the analyses – but why is this necessary? To me, this just raises questions about how the PPMs are then obtained for the full brain and whether multiple patches are analyzed per scan. If so, how are these integrated? I am also particularly concerned about the randomness of extracting 32 time points (or frames) from the sequence and its impact on the consistency of the hemodynamic information present across the dataset. Was there any form of temporal normalization used to align the CT perfusion sequences prior to extracting the patches? A potential method for doing so is to align the peak of the time-intensity curves with the center of the 32 time point window (as performed in doi:10.1016/j.media.2022.102610). Implementing such a temporal normalization could ensure that the selected time points for the 1/16 low-resolution model are at least representative of critical phases in the contrast agent’s passage through the brain, such as the onset and peak of contrast enhancement.
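
The peak-centering idea suggested above can be sketched as follows. This is a minimal illustration, not the procedure of the cited work or of the paper: the function name `peak_centered_window`, the use of a single mean time-intensity curve per scan, and the clamping at the sequence boundaries are all assumptions.

```python
from typing import List, Tuple


def peak_centered_window(curve: List[float], window: int = 32) -> Tuple[int, int]:
    """Return [start, stop) indices of a `window`-frame span centred on the peak.

    `curve` is a mean time-intensity curve, one value per frame. The span is
    shifted inward when the peak lies too close to either end of the sequence.
    """
    n = len(curve)
    if window > n:
        raise ValueError("window is longer than the sequence")
    peak = max(range(n), key=lambda i: curve[i])
    start = peak - window // 2
    start = max(0, min(start, n - window))  # clamp so the span stays in range
    return start, start + window


# Toy 40-frame curve: flat baseline with a bolus peaking at frame 22.
toy_curve = [0.0] * 40
for i, value in enumerate([1, 3, 6, 9, 6, 3, 1]):
    toy_curve[19 + i] = float(value)
start, stop = peak_centered_window(toy_curve, window=32)
```

A fixed window chosen this way keeps the contrast peak at the same relative position in every scan, which is the consistency property the comment is concerned with.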

    Moreover, the authors augment the data by duplicating each frame n-times to match their reduced temporal rate (1/n), ensuring that all input sequences consist of 32 frames. However, I argue that achieving a similar high-dimensional latent space representation should be possible without duplicating frames, as this approach merely repeats existing data without adding new information. Thus, this redundancy could unnecessarily increase the computational demands. Conducting an ablation study to assess the impact of this data augmentation would provide crucial insights into whether this is necessary and/or effective in future work.
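
The frame duplication described above can be illustrated with a short sketch. It assumes that the first frame of each group of n is the one kept, which the review does not specify; the function name `subsample_and_repeat` is likewise hypothetical.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def subsample_and_repeat(frames: Sequence[Frame], n: int) -> List[Frame]:
    """Keep every n-th frame, then repeat each kept frame n times.

    The output has the same length as the input, but only len(frames) / n
    distinct frames. Keeping the first frame of each group is an assumption.
    """
    if n < 1 or len(frames) % n != 0:
        raise ValueError("n must be a positive divisor of the frame count")
    kept = frames[::n]
    return [frame for frame in kept for _ in range(n)]


full = list(range(32))  # stand-in for 32 frames, identified by index
low = subsample_and_repeat(full, 16)  # rate 1/16: only frames 0 and 16 survive
```

At a rate of 1/16 the 32-frame input contains just two distinct frames, each repeated 16 times, which makes the redundancy the reviewer points to easy to see.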

    Discussion – The discussion lacks a comprehensive exploration of the method’s limitations, such as computational costs and scalability issues. It would strengthen the paper to include these aspects and suggest the next steps for refining the PKD framework.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper shows novelty in its particular approach to dynamically adjust the teacher models according to the student model’s learning stage, optimizing the knowledge distillation process. However, the paper lacks crucial details regarding the data preprocessing, which raises concerns about the reliability and validity of the results. I see the potential for a very interesting paper, especially considering the extensive evaluation of various methodological variations and comparisons against state-of-the-art models. Overall, the paper shows promise as a potential contribution to estimating perfusion parameter maps (PPMs) from low-temporal resolution CT perfusion images.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have effectively addressed the main concerns from the initial review, providing clarifications on the statistical significance of their results, limitations of their study, and specifics of their model implementation. The authors also promise full transparency of their methods upon acceptance. Given the rebuttal and the evident contributions of their methodology, I recommend accepting this submission for its value to the MICCAI community.




Author Feedback

Thank you for the valuable feedback on our manuscript. We will first address common questions, then respond to individual comments in detail.

Are the results statistically significant? (R3 & R4) We acknowledge the reviewers’ observation regarding the perceived minimal improvement of our PKD framework over existing methods. We emphasize that we have rigorously quantified these improvements. We performed a paired t-test to statistically validate the differences. This analysis confirmed that the improvements, while subtle, are statistically significant. We will add the statistical results in the final version. Beyond improvements, our PKD framework offers unique clinical advantages. The use of multiple trained teacher models provides flexibility to address diverse resolutions based on specific demands, enhancing clinical adaptability and effectiveness.
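
For readers unfamiliar with the test, a paired t-test works on per-case differences between two models evaluated on the same cases: t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom. The sketch below computes only this statistic using the Python standard library. The function name and the sample values are made up for illustration and are not results from the paper; obtaining a p-value would additionally require the t distribution from a statistics package.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up per-case errors for two hypothetical models (not data from the paper).
errors_baseline = [0.30, 0.28, 0.35, 0.31, 0.29]
errors_pkd = [0.27, 0.27, 0.31, 0.29, 0.28]
t_value, dof = paired_t_statistic(errors_baseline, errors_pkd)
```

Because the test uses differences within each case, a small but consistent improvement can be significant even when the mean values of the two models look close.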

Lack of discussion of limitations and future work. (R3 & R5) Due to space constraints, we could not initially include a detailed discussion of limitations. We acknowledge the extended training time and resources due to the requirement for teacher models. Despite this, the clinical applicability and inference time remain unaffected, staying within a few seconds. We will add these points and suggest further optimization of training efficiency and scalability as future work.

Model implementation detail? (R3 & R5) Upon acceptance, the complete code, detailed training processes, data preprocessing, and teacher model’s selection criteria will be available in the associated repository.

Clarification of loss. (R3) To clarify, Equation (5) presents a simplified version of the knowledge distillation loss, L_KD = L_feature + L_soft target. L_feature corresponds to the first term on the right-hand side of Equation (5), while L_soft target corresponds to the remaining terms.

Experimental setting: train on single dataset and test on both? (R3) While we tested using a single dataset for training and application across internal and external datasets, it was not fully pursued due to the limited data in this specific setting: 510 cases (3615 CT scans).

Why did you stop at 1/16? (R3) We identified 1/16 as the lower bound where results significantly deteriorate without our proposed framework. 1/16 already uses only two frames to compute the PPMs.

Lack of a comprehensive comparative analysis with existing state-of-the-art PPM generation models. (R4) Most standard methods require AIF data, while our model generates PPMs without needing AIF data, which is a significant contribution. Our comparisons therefore focus on existing AIF-free models, which are scarce. Including AIF-dependent models would not provide a fair assessment due to differing data requirements. We additionally compared our model with various knowledge distillation frameworks to fully showcase its capabilities.

Unclear motivation for PPM generation. (R5) As highlighted in our abstract and introduction, PPMs are instrumental because they “deliver detailed measurements of cerebral blood flow and volume.” In the introduction, we mention that deriving PPMs from CTP aims to address clinical issues with CTP. We will further elaborate the details in the final version.

Data preprocessing details? (R5) To standardize the original temporal resolutions of the scans from the ISLES2018 and our in-house dataset, we interpolated frames to achieve a uniform 40 frames per scan. The extraction of random volumetric patches of size 128x128x32 was used to enhance data augmentation and improve model learning by capturing diverse spatial brain areas. For temporal alignment, the initial frames were selected to capture the inflow, peak, and outflow of the contrast agent, as confirmed by manual annotations from medical experts. For the repeated frames, this method was used to enhance model performance by maintaining consistent input data representation. We acknowledge that this approach could be further explored in future work through an ablation study.
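
As an illustration of the resampling step, the sketch below linearly interpolates a single time series to 40 evenly spaced samples. The rebuttal states only that frames were interpolated to 40 per scan; the linear scheme, the assumption of evenly spaced frames, and the function name `resample_curve` are assumptions, and a real pipeline would apply the same idea to every voxel of the 4D volume.

```python
from typing import List, Sequence


def resample_curve(values: Sequence[float], target_len: int = 40) -> List[float]:
    """Linearly resample a 1-D time series to `target_len` evenly spaced samples."""
    n = len(values)
    if n < 2 or target_len < 2:
        raise ValueError("need at least two input and two output samples")
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional index into `values`
        lo = min(int(pos), n - 2)  # left neighbour, kept in range at the end
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out


curve_25 = [float(i) for i in range(25)]  # toy ramp sampled at 25 frames
curve_40 = resample_curve(curve_25, 40)
```

The first and last samples are preserved exactly, so the acquisition window is unchanged and only the sampling density differs.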




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Perfusion maps derived from CTP data are clinically used in stroke management and related conditions and the authors propose the use of progressive knowledge distillation to improve the training of models that generate them. After the rebuttal 2 reviewers recommend acceptance, where the reviewer not in favour of the paper seems to be mostly concerned about a lack of comparison to other approaches. I believe that the methodological contributions of this paper outweigh this concern.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors propose the use of progressive knowledge distillation in the process of training models to generate perfusion maps from CT perfusion data. The reviewers are generally in favor of the paper, especially with major concerns addressed by the rebuttal. The authors shall carefully polish the paper to address the remaining concerns in their final version.



