Abstract

PET imaging is a powerful modality offering quantitative assessments of molecular and physiological processes. The necessity for PET denoising arises from the intrinsic high noise levels in PET imaging, which can significantly hinder the accurate interpretation and quantitative analysis of the scans. With advances in deep learning techniques, diffusion model-based PET denoising techniques have shown remarkable performance improvement. However, these models often face limitations when applied to volumetric data. Additionally, many existing diffusion models do not adequately consider the unique characteristics of PET imaging, such as its 3D volumetric nature, leading to the potential loss of anatomic consistency. Our Conditional Score-based Residual Diffusion (CSRD) model addresses these issues by incorporating a refined score function and 3D patch-wise training strategy, optimizing the model for efficient volumetric PET denoising. The CSRD model significantly lowers computational demands and expedites the denoising process. By effectively integrating volumetric data from PET and MRI scans, the CSRD model maintains spatial coherence and anatomical detail. Lastly, we demonstrate that the CSRD model achieves superior denoising performance in both qualitative and quantitative evaluations while maintaining image details and outperforms existing state-of-the-art methods. Our code is available at: https://github.com/siyeopyoon/Residual-Diffusion-Model-for-PET-MR-Denoising

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2347_paper.pdf

SharedIt Link: https://rdcu.be/dV5FG

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72104-5_72

Supplementary Material: N/A

Link to the Code Repository

https://github.com/siyeopyoon/Residual-Diffusion-Model-for-PET-MR-Denoising

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yoo_Volumetric_MICCAI2024,
        author = { Yoon, Siyeop and Tivnan, Matthew and Hu, Rui and Wang, Yuang and Son, Young-don and Wu, Dufan and Li, Xiang and Kim, Kyungsang and Li, Quanzheng},
        title = { { Volumetric Conditional Score-based Residual Diffusion Model for PET/MR Denoising } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        page = {754 -- 763}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The main technical contribution of the paper is modifying the EDM framework with the 3D patch perspective and adding an MR image as condition to the scoring function. The paper compares the proposed method with different denoising methods for different datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper modifies the EDM framework with the 3D patch perspective and adds an MR image as condition to the scoring function.

    2. The paper compares the proposed method with different denoising methods for different datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It is unclear if the baseline model using no MR image is better than existing models.

    2. In my understanding, to have optimal performance of the proposed method, MR and PET images seem to be co-registered. To have reliable co-registered MR and PET images in practice, one needs a fancy PET-MR fusion scanner. Even with an MR image, the performance improvement seems marginal.

    3. The performance comparisons are limited to image denoising methods. Existing PET image reconstruction methods are not considered.

    4. Adding the MR condition to the scoring function seems to offer only incremental novelty.

    5. The paper title argues that the proposed method is efficient, yet its efficiency is not quantitatively compared with the non-patch based counterpart.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is unclear if the experiments use public data.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1-2. I can’t think of how to resolve the marginal improvement of the proposed method (both w/ and w/o MR image prior).

    2. I suggest clarifying in the title that the proposed method uses both PET and MR images. (It seems to me that the main novelty of the paper includes adding the MR condition to the scoring function.)

    I additionally recommend including a discussion of the limitation of co-registration between PET and MR images (if my understanding is correct).

    3. I suggest including at minimum one deep learning PET image reconstruction method in the comparisons.

    4. I don’t know how to resolve the insufficient novelty limitation.

    5. I recommend adding training/inference time comparisons between the proposed method and the existing non-patch-based counterpart.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I have several concerns about experiments and incremental improvement of the proposed method over SOTA methods. I’m a little concerned about the technical novelty of the paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    Compared to Restormer, the proposed method increases PSNR and SSIM values by approximately 0.5 dB and 0.1, respectively. From the error map comparison perspective in Fig. 3, it’s difficult for me to observe its improvements over Restormer. Different from the authors’ claim, I don’t consider these as superior denoising performance over Restormer.

    I have a concern about the clinical impact/practicability of the proposed method. The proposed method assumes that MR and PET images are co-registered. One can use a co-registration tool to overcome this, but it’s unclear how its error can affect the performance of the proposed method. Using a PET-MR scanner can overcome this, as the authors responded, yet this implies another practical limitation.
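
For scale, the approximate 0.5 dB PSNR gain quoted in this justification can be converted into an error ratio. The sketch below assumes both PSNR values were computed with the same peak value, so that the gain depends only on the ratio of mean squared errors; the function name is illustrative and the numbers are not taken from the paper's Table 1.

```python
import math


def mse_ratio_from_psnr_gain(gain_db):
    """MSE(baseline) / MSE(improved) implied by a PSNR gain in dB.

    Follows from PSNR = 10 * log10(peak**2 / MSE) with a shared peak value.
    """
    return 10.0 ** (gain_db / 10.0)


gain_db = 0.5  # approximate gain quoted in the review
ratio = mse_ratio_from_psnr_gain(gain_db)
mse_reduction = 1.0 - 1.0 / ratio               # fractional drop in MSE
rmse_reduction = 1.0 - 1.0 / math.sqrt(ratio)   # fractional drop in RMSE

print(round(ratio, 3), round(mse_reduction, 3), round(rmse_reduction, 3))
# prints: 1.122 0.109 0.056
```

Under that assumption, a 0.5 dB gain corresponds to roughly 11% lower mean squared error, or about 6% lower root-mean-square error.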



Review #2

  • Please describe the contribution of the paper

    The paper introduces the Conditional Score-based Residual Diffusion (CSRD) model for efficient volumetric PET denoising. The CSRD model addresses the limitations of existing diffusion models by incorporating a refined score function and 3D patch-wise training strategy. It significantly lowers computational demands, expedites the denoising process, and maintains spatial coherence and anatomical detail. The CSRD model achieves superior denoising performance compared to state-of-the-art methods, as demonstrated through qualitative and quantitative evaluations. The integration of volumetric data and potential application in other modalities further enhance the clinical feasibility and impact of the paper.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novel Formulation: The paper introduces the Conditional Score-based Residual Diffusion (CSRD) model, which is a novel formulation for efficient volumetric PET denoising. The CSRD model incorporates a refined score function and 3D patch-wise training strategy, addressing the limitations of existing diffusion models. This formulation allows for efficient denoising of volumetric PET images while maintaining spatial coherence and anatomical detail.

    Efficient Computational Performance: The CSRD model significantly lowers computational demands and expedites the denoising process. It achieves rapid volumetric denoising of 3D PET images within three minutes and requires only 12GB of memory on a single GPU. This efficient computational performance makes the CSRD model practical for real-world applications.

    Integration of Volumetric Data: The CSRD model effectively integrates volumetric data from PET and MRI scans. By incorporating anatomical information from MRI scans, the model maintains anatomical integrity and consistency in the denoised PET images. This integration enhances the diagnostic quality of PET images and improves the accuracy of interpretation and quantitative analysis.

    Superior Denoising Performance: The CSRD model demonstrates superior denoising performance compared to existing state-of-the-art methods. This is evidenced by quantitative evaluations using multiple metrics, including mean absolute error (MAE), peak-signal-to-noise ratio (PSNR), structure similarity index (SSIM), Haralick feature distance (Hdist), and perceptual distance (Pdist). The CSRD model outperforms other methods in terms of denoising performance, both in traditional image metrics and feature-based metrics.

    Potential Application in Other Modalities: Although the CSRD model is specifically designed for PET denoising, the paper highlights the potential application of the CSRD model in other image modalities. The residual distribution in 3D patches with uniform intensities closely resembles a Gaussian distribution, indicating the potential for the CSRD model to be applied to other image modalities beyond PET.

    Overall, the paper’s strengths lie in its novel formulation, efficient computational performance, integration of volumetric data, superior denoising performance, and potential application in other modalities. These strengths contribute to the advancement of PET denoising techniques and have implications for improving the accuracy and interpretation of PET imaging in clinical settings.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Not true paired low-dose PET: the method is not demonstrated on real low-dose data.

    Why generate the residual image during reverse diffusion instead of generating the original high-dose image?

    Fig. 2: Is CSRD oversmoothing compared to normal dose? Please provide an error map.

    2.2: It is unclear how the structural anatomical prior was incorporated in the model framework. The loss seems to care more about the pixel value distribution than about an explicit anatomical structural model. Unsupported claim?

    2.2: It is also unclear how the macrostructure of the brain can be preserved within each patch, as each patch is rather small and includes only a very small part of the brain.

    3.1: Please explicitly note the IRB approval number.

    3.1: Please give more detail on the following method description: “The MR images were resampled to the identical spatial resolution as the PET images;” How exactly was this done, e.g., bilinear upsampling?

    3.2: The model is very likely not transferable or generalizable, given its high computational demand: “4× NVIDIA A100 40GB GPUS (5 days)”.

    Fig. 3: CSRD oversmoothing?

    Table 1: Please indicate p-values from a statistical significance test. The results are very likely not statistically significant relative to the rest, e.g., PSNR: mean − std (CSRD w/ MR) = 41.11 − 4.97 = 36.14 < mean of TV. I highly doubt the significance of the results presented.

    3.2: How were the patches merged back into the volume? How do you deal with potential pixel discrepancies across the boundaries of different patch volumes when merging them back together?
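
On the Table 1 remark above: when every method is scored on the same test subjects, the spread reported as mean ± std largely reflects between-subject variation, so overlapping intervals do not by themselves settle significance; a paired test on per-subject differences is the usual check. The following is a minimal sketch with synthetic numbers (not the paper's data). The helper `paired_t_statistic`, the 20 subjects, and the 1 dB offset are all illustrative assumptions.

```python
import math
import random
import statistics


def paired_t_statistic(a, b):
    """Paired t statistic for two equal-length lists of per-subject scores."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of differences
    return mean_d / (sd_d / math.sqrt(n))


rng = random.Random(0)

# Synthetic per-subject PSNR: large between-subject spread (about 5 dB).
subject_level = [rng.gauss(38.0, 5.0) for _ in range(20)]

# Two hypothetical methods scored on the same subjects; B is about 1 dB better.
method_a = [s + rng.gauss(0.0, 0.2) for s in subject_level]
method_b = [s + 1.0 + rng.gauss(0.0, 0.2) for s in subject_level]

t = paired_t_statistic(method_b, method_a)

print("A: mean, std =", statistics.mean(method_a), statistics.stdev(method_a))
print("B: mean, std =", statistics.mean(method_b), statistics.stdev(method_b))
print("paired t =", t)
```

In this toy setup the two mean ± std intervals overlap heavily, yet the paired statistic is far above the two-sided 5% critical value for 19 degrees of freedom (about 2.09). Whether the same holds for the paper's results can only be determined from the per-subject scores.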

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    3.2: “Our source code and trained models will be available online.”

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Not true paired low-dose PET: the method is not demonstrated on real low-dose data.

    Why generate the residual image during reverse diffusion instead of generating the original high-dose image?

    Fig. 2: Is CSRD oversmoothing compared to normal dose? Please provide an error map.

    2.2: It is unclear how the structural anatomical prior was incorporated in the model framework. The loss seems to care more about the pixel value distribution than about an explicit anatomical structural model. Unsupported claim?

    2.2: It is also unclear how the macrostructure of the brain can be preserved within each patch, as each patch is rather small and includes only a very small part of the brain.

    3.1: Please explicitly note the IRB approval number.

    3.1: Please give more detail on the following method description: “The MR images were resampled to the identical spatial resolution as the PET images;” How exactly was this done, e.g., bilinear upsampling?

    3.2: The model is very likely not transferable or generalizable, given its high computational demand: “4× NVIDIA A100 40GB GPUS (5 days)”.

    Fig. 3: CSRD oversmoothing?

    Table 1: Please indicate p-values from a statistical significance test. The results are very likely not statistically significant relative to the rest, e.g., PSNR: mean − std (CSRD w/ MR) = 41.11 − 4.97 = 36.14 < mean of TV. I highly doubt the significance of the results presented.

    3.2: How were the patches merged back into the volume? How do you deal with potential pixel discrepancies across the boundaries of different patch volumes when merging them back together?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The diffusion model produces oversmoothed results. The results are not statistically significantly different from TV (a conventional, non-deep-learning method). The volumetric approach is no longer a novel idea: poor novelty.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The reviewer has carefully read your rebuttal and appreciates your efforts in replying to our concerns. Although MICCAI no longer allows additional results to be added post-rebuttal, after consulting with Dr Kitty Wong (Submission platform manager), providing additional metrics on the same set of results is acceptable (such as affixing p-values to the already presented result metrics). The reviewer is then satisfied that the paper will be of publishable standard once the authors provide the additional metrics as promised in their rebuttal reply #1. Other parts of your rebuttal comments were able to satisfy most of my concerns. Thus, the score is updated to (5-Accept).



Review #3

  • Please describe the contribution of the paper

    The paper uses a 3D patch-wise conditional diffusion model where MRI-patches guide the generation of PET patches for efficiency in low-dose PET denoising.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The key idea of using MRI to guide PET denoising is interesting. Relatively new design components are utilized in the model itself.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are several baselines in the paper for PET denoising, however, they do not utilize MR guidance. In that case, the fair comparison is against the non-MR version of the proposed method, which does not do any better than some baselines. So the motivation for using a diffusion model is unclear.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It is unclear whether the proposed method would offer any benefits over some baselines (e.g., restormer or similar) if they were also fed with the MR modality guidance. Without this comparison, the presented results do not justify the introduction of this diffusion method.

    Given MR guidance, the method becomes tightly related to diffusion-based im2im methods. Relevant prior art in this domain should then be discussed (e.g., IEEE TMI 42 3524-3539 2023) and compared against.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lack of comparisons against proper baselines with fair treatment regarding model inputs.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Dear Reviewers, We appreciate the reviewers’ time and detailed feedback. We’re pleased that the reviewers highly endorse our “novel formulation” of MRI-guided 3D PET denoising and patch-wise training and “the superior denoising performance.” In this rebuttal, we thoroughly address your constructive comments and will incorporate them into the final version.

  1. Performance Metrics: While MAE, PSNR, and SSIM are important, overly smoothed images may yield high quantitative values. Although improvements in these metrics seem marginal, our method significantly improves the Haralick feature (H-dist) and Perceptual distances (P-dist), which evaluate texture and perceptual similarity. In Table 1, our model w/o MR had an H-dist of 3.21, while Restormer and U-net with MR were 3.59 and 4.57, respectively. Our model outperformed in P-dist, with a value of 0.10, compared to Restormer and U-net, which had values of 0.14 and 0.18, respectively. As suggested by R#3 and #4, we will compare our model to diffusion-based im2im methods and include statistical analysis to evaluate the significance.

  2. Utilization of MR Prior: All the DL models leveraged the MR prior as anatomic information by inputting them as an additional channel, which is learned through the DL models (Sec. 3.2). Although Restormer and U-net utilized the MR prior, the denoised image showed a blurred structure (Longitudinal fissure in Fig. 3). In contrast, our model significantly enhances anatomical consistency with MR prior (Fig. 3) and showed the best performance in various metrics in Table 1, supporting the effectiveness of MR prior. We assume registration of MR and PET can be achieved using clinical PET-MR scanners. Since the rigid body model is typically used in brain imaging and MR priors are learned through DL models, we assumed that the registration error is minimal. In the revised version, we will discuss the influence of co-registration.

  3. Computational Efficiency of Training and Inference: We used a patch size of 64x64x64 (=78x78x78 mm³), which is sufficiently large to capture brain subregions. We provided the model with 3D coordinates that describe the relative positions of the patches. Our patch-wise training requires 10 GB of GPU memory per single batch. However, the memory requirement easily exceeds GPU capacity when the entire 3D volume is used (64x64x64 vs. 160x160x160). Therefore, a patch-wise approach is essential for 3D model training. For inference, we processed the entire volume at once by leveraging the shift-invariance property of convolution layers. The model took the 4-channel input (coordinates, MR, low-dose PET, and Gaussian noise) with a volume size of 160x160x160. Despite this comprehensive 3D input, the model requires only 12 GB on a single GPU and takes ~3 minutes. We will compare the computational cost to other methods.

  4. Comparison with PET reconstruction and real low-dose data: We agree that comparing our method with PET reconstruction methods is important. However, low-dose images can differ significantly depending on the reconstruction method, making standardized comparisons challenging. For real low-dose PET data, repeated acquisitions are limited by motion and additional radiation exposure, so we employed Poisson thinning to accurately replicate low-dose imaging while minimizing these impacts. We will clarify our comparison strategies in the revised manuscript.

  5. Effectiveness of Residual learning: Our residual diffusion model concentrates on the differences between low-dose and high-dose images. Residual learning is often more stable because it handles smaller variations than processing the entire image, and the lower dynamic range enhances training stability and prevents value clipping at peaks.

  6. Clarification of details: We will modify the title to clarify PET/MR usage. We will include error maps in Fig. 2. We resampled MRs using 3D linear interpolation. The IRB number is GIRBA2365.
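
Three mechanisms mentioned in items 3 to 5 of the rebuttal (patch versus full-volume size, Poisson thinning, and residual targets) can be illustrated with a small sketch. It is not the authors' pipeline: isotropic voxels, the 10% dose fraction, the sign convention of the residual, the toy numbers, and all function names are assumptions made for illustration only.

```python
import random

# --- Item 3: patch vs. full-volume size (numbers quoted in the rebuttal) ---
PATCH_SIDE = 64          # voxels per patch side
VOLUME_SIDE = 160        # voxels per full-volume side
PATCH_EXTENT_MM = 78.0   # physical patch side

voxel_mm = PATCH_EXTENT_MM / PATCH_SIDE          # assumes isotropic voxels
voxel_ratio = (VOLUME_SIDE / PATCH_SIDE) ** 3    # full volume vs. one patch
origins_per_axis = VOLUME_SIDE - PATCH_SIDE + 1  # valid patch start indices


# --- Item 4: Poisson (binomial) thinning of count data ---
def thin_counts(counts, keep_fraction, rng):
    """Keep each recorded event independently with probability keep_fraction."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in [0, 1]")
    return [sum(1 for _ in range(c) if rng.random() < keep_fraction)
            for c in counts]


# --- Item 5: residual target instead of the full image ---
def residual_target(high_dose, low_dose):
    """Residual a model would learn (the sign convention is an assumption)."""
    return [h - l for h, l in zip(high_dose, low_dose)]


def value_range(values):
    """Dynamic range of a list of intensities."""
    return max(values) - min(values)


rng = random.Random(42)
full_counts = [1000, 400, 0, 250]                # toy counts, not real data
low_counts = thin_counts(full_counts, 0.1, rng)  # 10% dose is hypothetical

high = [0.0, 2.0, 10.0, 50.0, 3.0]               # toy intensities
low = [0.0, 2.1, 9.6, 48.0, 3.2]
res = residual_target(high, low)
restored = [l + r for l, r in zip(low, res)]     # low dose + residual

print(voxel_mm, voxel_ratio, origins_per_axis)
print(low_counts)
print(value_range(res), value_range(high))
```

The full 160x160x160 volume holds 15.625 times as many voxels as one 64x64x64 patch. If training memory scaled roughly with voxel count, which the rebuttal does not state, the quoted 10 GB per patch batch would grow well beyond a 40 GB GPU. In the toy residual example the target spans a much smaller range than the high-dose values, which is the property item 5 appeals to.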

Sincerely,
All co-authors




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors replied carefully in the rebuttal to part of the reviewers’ concerns. The authors stated that the influence of registration will be discussed, and that the comparison strategy will be further explained with respect to the missing comparisons. Two of the three reviewers favour accepting the paper. The reviewer with the most positive change revised his/her score from (3-weak reject) to (5-accept) on the condition that the authors provide the additional metrics promised in rebuttal reply #1. Some concerns remained unresolved (mostly from R1) about the incremental improvement and novelty, but R1 could not provide constructive comments to improve on this either.

    Strength: The paper uses a 3D patch-wise conditional diffusion model where MRI-patches guide the generation of PET patches for efficiency in low-dose PET denoising. The key idea of using MRI to guide PET denoising is interesting.

    Weakness: Lack of comparisons against proper baselines with fair treatment regarding model inputs.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors replied carefully in the rebuttal to part of the reviewers’ concerns. The authors stated that the influence of registration will be discussed, and that the comparison strategy will be further explained with respect to the missing comparisons. Two of the three reviewers favour accepting the paper. The reviewer with the most positive change revised his/her score from (3-weak reject) to (5-accept) on the condition that the authors provide the additional metrics promised in rebuttal reply #1. Some concerns remained unresolved (mostly from R1) about the incremental improvement and novelty, but R1 could not provide constructive comments to improve on this either.

    Strength: The paper uses a 3D patch-wise conditional diffusion model where MRI-patches guide the generation of PET patches for efficiency in low-dose PET denoising. The key idea of using MRI to guide PET denoising is interesting.

    Weakness: Lack of comparisons against proper baselines with fair treatment regarding model inputs.


