Abstract

Motion artifacts can compromise the diagnostic value of computed tomography (CT) images. Motion correction approaches require a per-scan estimation of patient-specific motion patterns. In this work, we train a score-based model to act as a probability density estimator for clean head CT images. Given the trained model, we quantify the deviation of a given motion-affected CT image from the ideal distribution through likelihood computation. We demonstrate that the likelihood can be utilized as a surrogate metric for motion artifact severity in the CT image, facilitating the application of an iterative, gradient-based motion compensation algorithm. By optimizing the underlying motion parameters to maximize likelihood, our method effectively reduces motion artifacts, bringing the image closer to the distribution of motion-free scans. Our approach achieves performance comparable to state-of-the-art methods while eliminating the need for a representative data set of motion-affected samples. This is particularly advantageous in real-world applications, where patient motion patterns may exhibit unforeseen variability, ensuring robustness without implicit assumptions about recoverable motion types.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1486_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1486_supp.pdf

Link to the Code Repository

https://github.com/mareikethies/moco_diff_likelihood

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Thi_Differentiable_MICCAI2024,
        author = { Thies, Mareike and Maul, Noah and Mei, Siyuan and Pfaff, Laura and Vysotskaya, Nastassia and Gu, Mingxuan and Utz, Jonas and Possart, Dennis and Folle, Lukas and Wagner, Fabian and Maier, Andreas},
        title = { { Differentiable Score-Based Likelihoods: Learning CT Motion Compensation From Clean Images } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper addresses motion compensation in CT images without any motion-corrupted training data. It uses a diffusion model to learn what a motion-free image should look like and then applies that model to correct a patient-specific motion-corrupted image. The authors compare their results against a previously proposed autofocus method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    * The approach is quite intuitive, as a human expert trained with motion-free images can recognize motion corruption in CT images easily.
    * No motion-corrupted training data were needed, which are difficult to find.
    * High-level mathematical tools are used to achieve the solution. For example, the use of a neural ordinary differential equation solver is unique.
    * Motion correction at reconstruction time.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    * Too complex a mathematical approach to fit in a conference paper’s page limit.
    * What are functions f(x,t) and g(t) in Eq 1, in this context? Terminology is often not explained.
    * Where do timed data come from for the reverse-time integration in Eq 2?
    * Does the CQ500 dataset contain motion-free images only?
    * Sinograms are forward projected, not real. The claim of motion correction at reconstruction time is difficult to accept from this view.
    * It appears that validation data for motion-corrupted images are simulated (using splines). This means no motion compensation is tested on real data.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Given the dense nature of the paper it is difficult to judge its reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It may be submitted to a journal with a more detailed explanation of the procedure. Alternatively, for a page-limited submission, a higher-level view could be presented.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Given the page limitation I doubt if the above weaknesses can be addressed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    My comments regarding weaknesses mostly stand. The mathematics is not the problem of the work; presenting it coherently in the paper is. I cannot grasp how denoising of already “clean” images can be considered as motion correction. However, I will revise my score based on the rebuttal.



Review #2

  • Please describe the contribution of the paper

    The authors propose a method for 2D rigid motion estimation in CT images. A score-based diffusion model is trained to learn the score of the distribution of clean head CT images. The evaluated likelihood computed using the probability flow ODE is differentiated to obtain gradient estimates for the motion parameters, which are then optimized using gradient descent.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main advantage is that the presented method for obtaining motion estimates does not require pairs of clean and corrupted samples during training. Gradients for the modelled motion parameters can be obtained by differentiating the (negative) log-likelihood (i.e., assumptions about the expected motion patterns, such as rigid-body motion in this case, have to be made in advance). Results that are competitive with the autofocus method are obtained, showing that unsupervised training manages to produce results similar to those of a supervised method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of the paper is the lack of detail in the method: what does the mathematical model look like that incorporates the motion parameters into the image formation process? What splines are used to model the motion trajectories and why (i.e. what assumptions are made on the motion trajectory such as smoothness or smoothness in its derivative, etc.)?

    It is not stated explicitly how corrupted ground truth images are generated, but probably using the same mathematical model that is used in the proposed approach. So even though an advantage is that the approach works without training on corrupted data, it was only shown to work on par with another approach on corrupted data, which has the same underlying motion assumptions. It is not clear how this generalizes to real-world data and more complex scenarios.

    Moreover, the paper does not provide any details on training or hyperparameter choices that would be absolutely crucial to reproduce the results: how were \sigma_min and \sigma_max chosen for the VE-SDE? For how many iterations was the score network trained, with which learning rate, and how large (in #params) was the NCSN++?

    It is not entirely clear whether the usage of the likelihood as the sole criterion for motion parameter estimation is sufficient. The authors state that the corresponding loss landscape is well-behaved - what does this even mean and how/why is this ensured and guaranteed in the process?

    Diffusion models have been used in motion-correction scenarios, especially with MRI in joint alignment and reconstruction approaches. A discussion of this is missing from the related work section on motion compensation.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    In its current state, the paper is not reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Clearly stating the underlying mathematical model for the rigid body motion and the spline curves and giving the details required to reproduce the results would improve the paper.

    The overall idea is very interesting. If combined with a joint alignment and reconstruction procedure, this could likely be extended to limited angle tomography. What about 3D imaging? I assume this should directly hold in 3D using 6 motion parameters. Can it also extend to more complex motion patterns?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In its current state, the paper does not explain satisfactorily the used model and setting and does not meet reproducibility criteria.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have discussed and reasoned well on the concerns raised regarding simulated vs. real motion data. It seems valid to first conduct such a study on simulated (real-world) data.

    Since the proposed approach aims to target motion compensation in CT images, it is important to describe the employed mathematical model in detail. Although the method section thoroughly describes the underlying mathematical models (the ODE, likelihood, gradient updates), the motion model should not be omitted.

    Reproducibility is a crucial cornerstone of the MICCAI community. It remains to trust that the authors will include sufficient details upon acceptance to fully guarantee reproducibility.



Review #3

  • Please describe the contribution of the paper

    The paper describes a method that uses the data distribution learned from motion-free CT images to guide optimization-based motion compensation of artifacts-ridden images. Overall the idea is novel and the study is well designed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea of using diffusion model-learned distribution to optimize CT reconstruction is novel, as it implicitly encodes the data features without needing to make explicit assumptions.
    2. The developed method allows motion-corrupted CT images to be corrected, even if the motion artifacts have not been observed or learned via a prior model.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The motion compensation and reconstruction are performed on individual slices, possibly due to the 2D diffusion model used. By doing so some important correlation/constraint cannot be enforced, for instance the rigidity and continuity of motion.
    2. The motion is simulated/characterized in this study with B-splines. It is unclear whether the B-splines may over-smooth the motion, especially for sudden head movements.
    3. All the evaluations performed in this study are limited to simulated data. Evaluations on real clinical scans are needed.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors are encouraged to discuss the use of B-splines to characterize the motion and the limitations of per-slice reconstruction. Additional studies using real clinical scans are also recommended.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall idea is quite innovative, although the evaluation can be further enhanced.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have explained the use of simulated motion-corrupted data, which is reasonable considering the difficulty of obtaining paired data (motion-corrupted and motion-free) in such a scenario for evaluation. However, the 2D-based reconstruction still remains a limitation, as the current algorithm does not fully capture the 3D nature of motion and the smoothness/constraints that come with it. The potential pitfalls that come with the 2D reconstruction need to be further investigated in the future, even if only through a simulation study.




Author Feedback

We thank all reviewers for providing thoughtful feedback.

In our paper, we propose a novel approach for assessing motion severity in head CT images using the likelihood from a score-based diffusion model trained only on motion-free images. We show that our approach effectively compensates for motion artifacts by aligning the reconstructed image with the distribution of motion-free images and without using any motion-affected examples at training time. We appreciate that this contribution is positively assessed by all reviewers as being “intuitive” (R3), “innovative”, “novel”, and “free of explicit assumptions” (R4) as well as “very interesting” and having “competitive results” despite its unsupervised nature (R5).

Complexity of mathematical formulation (R3): We politely disagree with R3’s recommendation to reject our paper based on its perceived mathematical complexity. All equations are based on established work on score-based diffusion models that are widely used in medical image processing and referenced accordingly. Functions f(x,t) and g(t) are clearly defined in section 2.1 of the paper. We believe that MICCAI values contributions with both experimental and mathematical advancements, and that our paper aligns with these standards. Rejection solely based on mathematical complexity seems unwarranted, especially given that our work is supported by robust experimental evidence.
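
For orientation, the roles of f(x,t) and g(t) can be written out in the standard score-based SDE formulation that the rebuttal says the equations are based on. This is only a sketch of the textbook form, assuming the paper's Eq. 1 and Eq. 2 follow the usual notation and that the variance-exploding (VE) SDE mentioned in another review is used. The symbols s_\theta (trained score network), \tilde f_\theta (drift of the probability flow ODE), and T (final diffusion time, T = 1 in the VE parameterization shown) are names chosen here for illustration.

```latex
% Forward SDE: f is the drift, g the diffusion coefficient, w a Wiener process.
\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w

% VE-SDE special case: no drift, noise scale growing from \sigma_{\min} to \sigma_{\max}.
f(x,t) = 0, \qquad
\sigma(t) = \sigma_{\min}\left(\frac{\sigma_{\max}}{\sigma_{\min}}\right)^{t}, \qquad
g(t) = \sigma(t)\sqrt{2\ln\frac{\sigma_{\max}}{\sigma_{\min}}}

% Probability flow ODE sharing the marginals p_t of the SDE.
\frac{\mathrm{d}x}{\mathrm{d}t} = \tilde f_\theta(x,t)
  = f(x,t) - \tfrac{1}{2}\,g(t)^{2}\,s_\theta(x,t)

% Log-likelihood of an image x(0) via the instantaneous change of variables.
\log p_0\big(x(0)\big) = \log p_T\big(x(T)\big)
  + \int_0^T \nabla\cdot\tilde f_\theta\big(x(t),t\big)\,\mathrm{d}t
```

In this reading, the time variable t indexes the noise level of the diffusion process rather than acquisition time, and the integral is evaluated by solving the ODE numerically starting from the given image.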

Missing training details (R5): An overly concise description of the training settings limiting the reproducibility is a valid concern raised by R5. We will add the hyperparameters used for training the score network to the paper and our code will be released in case of acceptance.

Modeling of motion patterns (R4, R5): We acknowledge the paper’s omission of key details regarding motion parameter modeling. Here is a summary of the missing points: Rigid motion parameters are directly incorporated into the projection matrices via matrix multiplication, defining the perspective projection for each acquired projection view in the CT scan. To ensure smooth motion patterns during optimization, we apply Akima splines, which have continuous 1st derivatives but discontinuous 2nd derivatives, to avoid overshooting. For motion simulation, 10 evenly spaced spline nodes are used across the 360 projections of a full scan. Motion estimation utilizes 30 nodes, equating to one node for every 12 projection images, which we believe adequately captures high frequencies and complex motion patterns given the frame rates of modern CT scanners. Code for motion simulation and integration into projection matrices will also be released in case of acceptance.
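
As a minimal sketch of how a rigid motion state can be folded into a projection matrix by matrix multiplication, the following plain-Python example uses a 2D setting with a 2x3 projection matrix per view. The toy matrix P_VIEW, the motion values, the helper names, and the choice of right-multiplication are all assumptions made for illustration and are not taken from the paper or its code. Spline interpolation between nodes is not shown.

```python
import math


def rigid_motion_2d(angle_rad, tx, ty):
    """Homogeneous 3x3 matrix: rotate by angle_rad, then translate by (tx, ty)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def project(p_matrix, point):
    """Map a 2D point to a 1D detector coordinate via a 2x3 projection matrix."""
    x, y = point
    u, w = (row[0] * x + row[1] * y + row[2] for row in p_matrix)
    return u / w


# Hypothetical toy projection matrix for a single view (not from the paper).
P_VIEW = [[1000.0, 0.0, 0.0],
          [0.0, 1.0, 500.0]]

# Hypothetical motion state for that view: 2 degree rotation and a small shift.
MOTION = rigid_motion_2d(math.radians(2.0), 1.5, -0.5)

# Motion-adjusted projection matrix (assumed convention: right-multiplication).
P_MOVED = matmul(P_VIEW, MOTION)

# Node spacing quoted in the rebuttal: 30 nodes across 360 projections.
PROJECTIONS_PER_NODE = 360 // 30
```

Projecting a point with P_MOVED gives the same detector coordinate as first moving the point with MOTION and then projecting it with P_VIEW, which is why the motion can be expressed in the acquisition geometry rather than in the image.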

Simulated measurements (R3, R4): We use real reconstructed head CT images but simulate the corresponding measurements via forward projection. Unfortunately, there is no open-access data set containing clinical cone-beam head CT scans together with pre-reconstruction measurement data. Likewise, data sets with real motion-affected images are not openly available. However, our score network is trained on real reconstructed images, with simulated measurements used solely in the motion compensation optimization. Therefore, we anticipate minimal performance degradation on real measurement data since the network operates in the image domain. Given appropriate data access, we are eager to validate this in future experiments.

Extension to 3D (R4): The current study is focused on 2D images. Rather than applying the method slice by slice, the preferred way of extending it to 3D incorporates a full 6 DOF motion formulation. This requires a likelihood evaluation taking volumetric information into account. We hypothesize that this is possible, but leave the investigation to future work.

Missing related work (R5): Thanks to the reviewer’s suggestion, we identified the paper “Accelerated motion correction with deep generative diffusion models” by Levac et al. as a relevant reference. We will add it to the paper.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers noted the novel approach of using a diffusion model to learn and correct motion-corrupted CT images without requiring motion-corrupted training data as a significant strength of the paper. However, several critical issues have been raised that impact the overall acceptance of your manuscript.

    The primary concerns include the lack of clarity in the mathematical formulations presented. Specific functions and terms within the equations were not adequately explained. Additionally, the validation of your method appears limited, primarily relying on simulated data rather than real patient data, which raises questions about its clinical applicability and robustness.

    Whilst the authors addressed some questions during the rebuttal, it is not enough to convince the AC.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents an innovative and intuitive approach for motion compensation in CT images, using a diffusion model without the need for motion-corrupted training data. I strongly agree with Reviewer #4’s positive assessment, highlighting the novel use of the diffusion model to implicitly encode data features and optimize motion-corrupted CT images. Despite some concerns about mathematical complexity and the use of simulated data, the authors have addressed these issues convincingly in their rebuttal. The paper’s originality and potential impact on clinical applications justify its acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



