Abstract
Cerebral CT Perfusion (CTP) sequence imaging is a widely used modality for stroke assessment. While high temporal resolution of CT scans is crucial for accurate diagnosis, it comes at the cost of increased radiation exposure. A promising solution is to generate synthetic CT scans to artificially enhance the temporal resolution of the sequence. We present a versatile CTP sequence inpainting model based on a conditional diffusion model, which can inpaint temporal gaps with synthetic scans to a fine 1-second interval, agnostic to both the duration of the gap and the sequence length. We achieve this by incorporating a carefully engineered conditioning scheme that exploits the intrinsic patterns of time-concentration dynamics. Our approach is much more flexible and clinically relevant compared to existing interpolation methods that either (1) lack such perfusion-specific guidance or (2) require all the known scans in the sequence, thereby imposing constraints on the length and acquisition interval. Such flexibility allows our model to be effectively applied to other tasks, such as repairing sequences with significant motion artifacts. Our model can generate accurate and realistic CT scans to inpaint gaps as wide as 8 seconds while achieving both perceptual quality and diagnostic information comparable to the ground-truth 1-second resolution sequence. Extensive experiments demonstrate the superiority of our model over prior art in numerous metrics and clinical applicability. Our code is available at https://github.com/baejustin/CTP_Inpainting_Diffusion.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3196_paper.pdf
SharedIt Link: https://rdcu.be/dV1Mq
SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72069-7_7
Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3196_supp.pdf
Link to the Code Repository
https://github.com/baejustin/CTP_Inpainting_Diffusion
Link to the Dataset(s)
https://ieee-dataport.org/open-access/unitobrain
http://www.isles-challenge.org/ISLES2018/
BibTex
@InProceedings{Bae_Conditional_MICCAI2024,
author = { Bae, Juyoung and Tong, Elizabeth and Chen, Hao},
title = { { Conditional Diffusion Model for Versatile Temporal Inpainting in 4D Cerebral CT Perfusion Imaging } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
year = {2024},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15002},
month = {October},
pages = {67 -- 77}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper applies DDPM to increase the temporal resolution of cerebral CT perfusion by inpainting. The authors extend DDPM by adding a category and a temporal distance to the embedding.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The method attains state-of-the-art results on their data, compared against a good range of competing methods from recent years on a large dataset, with a hold-out test set consisting of 50 scans. There is a significant improvement across most metrics for this task, and multiple metrics are provided. Additionally, the authors showcase how their method can be used for the auxiliary task of motion correction, and the paper is in general well written and easy to understand.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- No code and tested on in-house data, so I am not sure how reproducible this paper is. It would be good to have the model be trained and validated on only public data, with a clearly defined split and/or cross-validation.
- The technical novelty is limited as it consists of adding embeddings (categorical and distance) to DDPM for the inpainting task. However, these additions are not ablated and hence the benefit thereof is never verified. In addition, it is unclear how the scans were categorized. Was the categorization performed manually or automatically and is user input required for this method? Additionally, it is unclear if a wrong classification would break the system.
- The impact is overstated. In the application of motion correction, the authors claim that their model provides clearer diagnostic information. However, it is never stated what additional clinical information is provided. Also, this has not been validated by a clinician, and it is not clear that these images have better clinical utility.
- Please rate the clarity and organization of this paper
Excellent
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Do you have any additional comments regarding the paper’s reproducibility?
It has not been sufficiently specified how the categorical embeddings were chosen.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
Having the paper be reproducible would improve the validity of the research, so a public benchmark with a clear split would be preferable.
Additionally, the additions in contrast to prior work should be ablated to see the impact on the final performance.
A clearer formulation of the claims with regard to clinical utility and method would help. You claim in the abstract that this demonstrates clinical applicability, but it has never been shown that the synthetic scans are not hallucinating.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Reject — could be rejected, dependent on rebuttal (3)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The main factors were the missing ablation and the lack of reproducibility.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Weak Accept — could be accepted, dependent on rebuttal (4)
- [Post rebuttal] Please justify your decision
The reproducibility issues are resolved given the authors' feedback; therefore I change my vote to acceptance.
Review #2
- Please describe the contribution of the paper
This study presents a diffusion model for CTP temporal reconstruction, based on high spatial resolution CTP with proper dosage and standard spatio-temporal resolution. The clinical applicability of the presented method is to reduce the radiation exposure to the patient in the CTP acquisition without compromising image quality, and potentially to enable CTP acquisition with temporally limited CT machines such as CBCT. Its potential applicability as a motion correction tool is also really interesting.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Technically, I do not observe any major flaws in the design and validation of the method (using different datasets and employing a hold-out test set for proper validation)
- The authors make a great effort in evaluating the model thoroughly, especially by implementing previous state-of-the-art models to make a fair comparison of their model with previous approaches.
- I appreciate the time-agnostic nature of the approach, especially for motion correction purposes.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The added value of this approach compared to state-of-the-art methods is not entirely clear to me. While the implementation of the model could potentially reduce radiation exposure, the necessity of CTP acquisition for model conditioning diminishes its impact. Additionally, although the presented method demonstrates superior performance metrics compared to previous approaches, it is important to discuss to what extent this improvement in performance could translate into tangible benefits for clinical practice.
- The claims regarding the validity of the model for temporal gaps of up to 8 seconds may be somewhat vague. While the model does show improvement over the compared approaches, there are concerns about the presented error rates, particularly in the case of CBF, when compared to typical nominal values.
- Currently, CBF maps are the standard for ischemic core estimation in acute stroke care in most software platforms (e.g., RAPID and Vitrea). Evaluation metrics in Table 2 indicate a high degree of error when acquisition intervals are 4 and 8 seconds, highlighting a trade-off between precision and radiation exposure reduction. Considering that stroke patients are typically older, this trade-off may favor precision over reducing radiation exposure.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Do you have any additional comments regarding the paper’s reproducibility?
Part of the dataset is public, with the majority of the images being from a private dataset. Training details are well-presented. Preprocessing is properly described. Code is not provided.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
While I appreciate the technical aspect of the presented research and the thorough validation conducted by the authors, I have concerns regarding how the proposed approach could enhance patient care. The study focuses on reducing radiation exposure among a population of patients where adverse events related to radiation exposure may not be clinically significant due to their age. I suggest the authors elaborate on justifying the clinical relevance of an ideal version of this application in clinical practice. As I understand it, the primary benefit would be a reduction in radiation exposure to the patient, but this factor alone may not hold significant relevance in the context of stroke care. Additionally, I believe the application for post-processing motion artifacts is promising, given that approximately 10-20% of patients undergoing CTP experience motion artifacts (a significant proportion of which are severe, which precludes any analysis).
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Accept — could be accepted, dependent on rebuttal (4)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I commend the authors for their adept implementation of a novel application of a diffusion model, which enhances flexibility for temporal image inpainting of CTP series and yields state-of-the-art results. However, I find the applicability of their approach questionable, as complete CTP acquisition is still necessary to obtain a comprehensive dataset. Moreover, the potential benefit to the patient may not justify the quality loss of the perfusion maps compared to a complete CTP acquisition.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Weak Reject — could be rejected, dependent on rebuttal (3)
- [Post rebuttal] Please justify your decision
The use of perfusion imaging in stroke is primarily limited to acute stroke management: patients presenting in the late-window period (more than 6 hours from last seen well) for endovascular thrombectomy selection and for diagnosing occlusion locations, particularly in distal arteries where CT angiography is less sensitive. Typically, a threshold of 70 ml for relative reduction of CBF <30% compared to the contralateral side, along with the Tmax 6 maps output, are used in facilities with perfusion imaging to select patients for endovascular thrombectomy. This approach is mainly used in large comprehensive centers with the necessary hardware for the required acquisition rates. Perfusion imaging is not usually performed (nor recommended) in smaller facilities (primary stroke centers without thrombectomy capabilities), as it lacks clear utility in these settings. Perfusion imaging has no demonstrated applicability for patients with mild symptoms, those in the chronic stage of stroke, or those in rehabilitation. Furthermore, patients with ischemic stroke are generally not sensitive to radiation, as most patients are elderly, and the risk of radiation exposure (if any) is minimal in terms of kidney toxicity or allergic reactions, which are idiosyncratic adverse events unrelated to dose. Pregnant patients are typically imaged with MRI due to concerns about fetal radiation exposure. The error metrics presented for the proposed method, although better than other state-of-the-art methods, leave practitioners questioning the trade-off between radiation exposure and diagnostic precision. Given the nature of the disease, the population affected by it, and the current use of perfusion imaging in clinical care, my initial concerns regarding the utility of the proposed method remain, despite its technical novelty and precise description. Also, the time needed for the algorithm is not reported. I acknowledge the contribution from the authors, but I keep my initial opinion.
Review #3
- Please describe the contribution of the paper
The paper presents a novel conditional diffusion model for temporal inpainting in 4D CT perfusion imaging, capable of generating synthetic CT scans to fill gaps within imaging sequences. This model employs a unique conditioning scheme that leverages patterns of time-concentration dynamics in perfusion imaging, offering a more flexible and clinically relevant approach compared to traditional generative deep learning methods. Capable of inpainting gaps up to 8 seconds, the model maintains both high perceptual quality and diagnostic accuracy, comparable to ground-truth sequences with 1-second resolution. Additionally, its versatility extends to applications such as repairing sequences affected by motion artifacts.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The main strengths of this paper include the novel model architecture, comprehensive evaluation, and direct relevance to practical applications. First, the introduction of a conditional diffusion model specifically tailored to the dynamics of cerebral perfusion imaging is a remarkable advancement. I particularly want to highlight the clever use of time-concentration curves to guide the diffusion process, showcasing an innovative utilization of available data. Moreover, the extensive experiments and validations clearly highlight the superiority of the proposed model in terms of generation quality and clinical applicability compared to existing state-of-the-art methods. The introduction is also well-motivated, setting a solid foundation for understanding the significance of the research. Overall, the model’s ability to accurately inpaint temporal gaps of any duration, independent of sequence length and without extensive prior knowledge of the sequence, demonstrates its broad potential in various clinical scenarios.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The extensive conditioning scheme and training may constrain the generalizability of the model across diverse clinical settings. This complexity also translates into substantial computational demands, potentially restricting the model’s real-world applicability. Additionally, the model requires a manual setting of conditions based on the time-concentration curve scenario and does not consider variations in physiological profiles (e.g., differences in contrast injection protocols and cardiac outputs). Specific examples of these weaknesses are detailed further in box 10: Constructive Feedback.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Do you have any additional comments regarding the paper’s reproducibility?
The methodology section of the paper is clear in terms of the model components and mathematical formulations, and a detailed description of the training parameters is also included. However, it would be beneficial if the authors provided the source code to enhance the reproducibility of the paper.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
Methodology – Variations in contrast injection protocols, such as the rate and volume of contrast medium, can affect the shape and characteristics of the time-concentration curve. Additionally, the cardiac output of each patient greatly influences these curves: higher cardiac output results in a faster circulation rate, leading to sharper and earlier peaks, whereas lower cardiac output tends to flatten and delay these peaks. How does the proposed model adapt to these variations in curve shifts and amplitude? Can the model maintain its performance accuracy in the presence of such variations, especially in cases with high cardiac output that might cause multiple peaks due to recirculation effects?
As I understand it, condition S (one of the three scenarios based on the mean tissue intensities) is currently set manually. Is there potential for optimizing this process? Since the other two conditions appear to be determined automatically, integrating automation for setting condition S could enhance the efficiency and consistency of the model.
The authors mention that their conditioning scheme “produces a large number of combinations for a given target during training”. How long does it take to train the proposed model, and what are the computational costs associated with its implementation? This complexity may be a big limitation for real-world clinical applications, particularly those with limited resources.
Data and preprocessing — What is the original temporal resolution of the CTP images, and how exactly were these images resampled to a 1-second interval? Was a particular interpolation method, such as B-spline approximation, utilized? In addition, the paper mentions the use of “manually annotated artery locations” for the follow-up experiments. Does this refer to the locations of the AIF utilized in the deconvolution process for deriving the perfusion maps? These couple of details regarding the data and its preprocessing were unclear.
Results and discussion — I am confused as to why the evaluation did not include synthetic CT images matching a 1-second acquisition interval, especially since the introduction emphasizes the clinical importance of having such a fine temporal resolution. If the model is capable of doing so, including quantitative and qualitative results for a 1-second interval would greatly enhance the paper’s contributions and provide a more comprehensive assessment of its capabilities.
Future work suggestions – As previously mentioned, it would be very interesting to explore the robustness of the model against outliers and anomalous data inputs. For instance, how does the model perform under conditions of extreme physiological variance? Investigating these scenarios would help assess the model’s reliability in real-world applications. Moreover, expanding the model to incorporate multimodal data could greatly refine its predictive abilities. For instance, integrating clinical metadata that influences cerebral perfusion, such as age, medical history of conditions like hypertension or diabetes, and current medication regimes, could potentially enhance the accuracy and personalization of the CT inpainting.
Other comments:
- Regarding the notation “Title Suppressed Due to Excessive Length,” I suggest considering a shorter version of the title specifically for the header.
- References 6 and 7 are identical in the citation list.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Accept — should be accepted, independent of rebuttal (5)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper introduces a novel approach to inpainting CT scans within 4D CT perfusion sequences using a conditional diffusion model that effectively leverages patterns of time-concentration dynamics. The evaluation further underscores the superiority of the proposed model in terms of generation quality and clinical applicability compared to existing state-of-the-art methods. This combination of innovative techniques, thorough evaluation, and practical relevance represents a significant contribution to the field and is of high interest to the MICCAI research community.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Accept — should be accepted, independent of rebuttal (5)
- [Post rebuttal] Please justify your decision
The paper presents a novel methodology for inpainting gaps of any duration in 4D CTP images using a conditional diffusion model. This capability marks a substantial improvement over existing methods that necessitate separate training for different gap durations. Moreover, the authors have committed to releasing their code, experiment settings, and preprocessing steps upon acceptance, which addresses my initial concerns about reproducibility.
Author Feedback
Thank you for highlighting the key unclear items.
Reproducibility (R1,R3) Upon acceptance, we will provide our code, experiment settings, and preprocessing steps for reproducibility. We were unable to use only public data as it lacks the AIF data for PPM computation. We also plan to release our private data in the future.
Technical Novelty (R1) Our main contribution lies in our model’s unprecedented versatility, allowing inpainting of gaps of any duration without extensive prior knowledge. This is achieved by minimizing the required condition to simple embeddings. Complex conditioning methods offered minimal improvement, and incorporating advanced technical novelty was unnecessary for our research purpose.
Ablations on the Embeddings (R1) Distance (Local Context): This is essential for specifying the exact position of intermediate scans. Ablating it would prevent the model from distinguishing between potential intermediate scans. Categorical (Global Context): MPVF and MCVD are advanced models providing local context but lack global context. Our model’s simpler and efficient incorporation of both local and global contexts outperforms these models, implying significant contribution of global context. Explicit ablation experiment would have been more informative if space allowed.
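To make the role of the distance condition concrete, the following is a minimal sketch (not the authors' code) of how each missing 1-second time point inside a gap can be labelled by its distance to the two neighbouring known scans. The helper name `intermediate_targets` and the `(t, d_left, d_right)` tuple layout are assumptions made for illustration; how the model actually encodes the distance is not described in this thread.

```python
def intermediate_targets(t_left, t_right):
    """List the missing 1-second time points between two known scans.

    Hypothetical helper: returns tuples (t, d_left, d_right), where t is the
    target time in seconds and d_left / d_right are its distances to the
    known scans at t_left and t_right.
    """
    if t_right <= t_left:
        raise ValueError("t_right must be later than t_left")
    return [(t, t - t_left, t_right - t) for t in range(t_left + 1, t_right)]


# An 8-second gap between known scans at t=8 and t=16 leaves 7 scans to inpaint.
targets = intermediate_targets(8, 16)
print(len(targets))  # 7
print(targets[0])    # (9, 1, 7)
print(targets[-1])   # (15, 7, 1)
```

Without the distance pair, all seven targets in this example would share the same two known scans as conditions, which is the ambiguity the rebuttal refers to.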
Categorization of Global Context (R1,R3) Categorization is automatic, requiring only the time index of the peak mean intensity. During testing, we inferred the peak as the average time point of the two highest-intensity known scans.
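The test-time peak inference described above can be sketched as follows. This is an illustration under stated assumptions, not the released implementation: `infer_peak_time` follows the rule given in the rebuttal (average the time points of the two highest-intensity known scans), while `categorize_gap` and its three labels are purely hypothetical stand-ins, since the actual category definitions are not given in this thread.

```python
def infer_peak_time(known_times, known_mean_intensities):
    """Average the time points of the two highest-intensity known scans."""
    if len(known_times) != len(known_mean_intensities) or len(known_times) < 2:
        raise ValueError("need at least two known scans with matching lengths")
    # Sort (intensity, time) pairs so the two brightest scans come first.
    ranked = sorted(zip(known_mean_intensities, known_times), reverse=True)
    t_first, t_second = ranked[0][1], ranked[1][1]
    return (t_first + t_second) / 2.0


def categorize_gap(t_left, t_right, peak_time):
    """ASSUMPTION: a three-way label of a gap relative to the inferred peak."""
    if t_right <= peak_time:
        return "rising"
    if t_left >= peak_time:
        return "falling"
    return "around_peak"


# Known scans every 8 seconds with made-up mean intensities.
times = [0, 8, 16, 24, 32]
means = [40.0, 52.0, 61.0, 55.0, 43.0]
peak = infer_peak_time(times, means)
print(peak)                          # 20.0
print(categorize_gap(0, 8, peak))    # rising
print(categorize_gap(16, 24, peak))  # around_peak
print(categorize_gap(24, 32, peak))  # falling
```

Because the peak is estimated from sparse scans, the estimate can be off by several seconds at wide acquisition intervals, which is consistent with the authors' acknowledgement that miscategorization can occur.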
Impact of Wrong Categorization (R1,R3) Incorrect categorization likely occurred due to its automated process. However, generating various condition set combinations for the same target (Section 2.2) enhanced robustness and generalizability during training. The model learned from correct conditions to reconstruct clean images, evidenced by its high performance despite categorization errors.
Clinical Benefit and Validation of Motion Correction (R1) In Fig. 4, motion-corrected maps show clearer localization of perfusion deficits, with clear separation from unaffected tissue. Before correction, maps exhibit obvious abnormalities hindering accurate diagnosis. After correction, these abnormalities are removed, and the natural rise-fall behavior of the curves is recovered. Fig. 4 sufficiently suggests the potential without formal clinical validation, given the auxiliary nature of this task.
CTP Acquisition (R4) Comprehensive datasets are needed for training. However, once trained, our model can interpolate from reduced scans during testing, eliminating the need for full CTP acquisition at deployment.
Tangible Benefits Compared to SOTA (R1,R4) Our method’s merit over SOTA models lies in its versatility. SOTA models require separate training for different gap durations (2s, 4s, 8s). In contrast, our single trained model can interpolate any gap between 2s and 8s, offering greater clinical value by being adaptable to various interpolation needs.
High CBF Error Rate (R4) CBF inherently has high values as it measures blood flow per minute, so it does not necessarily indicate performance deficiency. Table 2 includes Lin’s CCC, which is less affected by scale and combines precision and accuracy measures. The CCC for CBF is comparable to CBV and Tmax, indicating consistent performance across different parameters.
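For readers unfamiliar with the metric, here is a small sketch of Lin's concordance correlation coefficient using its standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (1/n) moments. The function name `lins_ccc` and the toy numbers are assumptions for illustration and are unrelated to the values in Table 2.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient with population moments."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2 * cov_xy / denom


reference = [1.0, 2.0, 3.0]
estimate = [2.0, 3.0, 4.0]  # perfectly correlated, but biased by +1
print(lins_ccc(reference, reference))  # 1.0

# Rescaling both series by the same factor leaves the CCC unchanged,
# whereas an absolute error such as MAE would grow by that factor.
ccc_small = lins_ccc(reference, estimate)
ccc_large = lins_ccc([60 * v for v in reference], [60 * v for v in estimate])
print(abs(ccc_small - ccc_large) < 1e-12)  # True
```

The biased example also shows why CCC combines precision and accuracy: the two series have a Pearson correlation of 1, yet the constant offset pulls the CCC down to 4/7.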
Radiation Reduction at the Cost of Precision (R4) While precision is crucial, especially for severe stroke patients, our model could still bring clinical benefit in certain scenarios. Some scanning hardware is incapable of achieving a sufficient acquisition rate. Also, for patients with mild symptoms (assessed by NIHSS) or those at the chronic stage, where focus shifts to long-term management and rehabilitation, clinicians may opt for greatly reduced radiation (up to 1/8) at the cost of slight precision loss. This is especially relevant for patients sensitive to radiation (e.g., pregnancy, high radiation exposure history).
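The "up to 1/8" figure can be checked with simple arithmetic. The sketch below assumes that every scan in the sequence delivers the same dose, so the fraction of scans actually acquired serves as a proxy for the dose fraction; that assumption, the helper names, and the sequence lengths are illustrative and not taken from the paper.

```python
def kept_scan_indices(n_scans, interval):
    """Indices of scans acquired when sampling a 1-second grid every `interval` s."""
    if n_scans < 1 or interval < 1:
        raise ValueError("n_scans and interval must be positive")
    return list(range(0, n_scans, interval))


def acquired_fraction(n_scans, interval):
    """Fraction of the full 1-second sequence that is actually acquired."""
    return len(kept_scan_indices(n_scans, interval)) / n_scans


print(kept_scan_indices(41, 8))  # [0, 8, 16, 24, 32, 40]
print(acquired_fraction(40, 8))  # 0.125
print(acquired_fraction(40, 4))  # 0.25
print(acquired_fraction(40, 2))  # 0.5
```

Note that for inpainting every missing time point must lie between two acquired scans, so a 41-point sequence sampled every 8 seconds keeps 6 of 41 scans, slightly more than one eighth.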
Meta-Review
Meta-review #1
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The authors propose a new approach to 4D CTP image inpainting in stroke imaging using a conditional diffusion model. Two reviewers recommend acceptance after the rebuttal, and the reviewer recommending rejection is mostly concerned about the practical utility of the method in clinical practice. I believe that this aspect is of less importance here, as the methodology appears to be sufficiently novel.
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The rebuttal was well received; the reviewers re-engaged and see their concerns largely addressed. All reviewers are now in the accept range and praise the novelty and thorough work of the authors. Therefore acceptance is recommended.