Abstract

Positron Emission Tomography (PET), an advanced nuclear imaging technology capable of visualizing human biological processes, plays an irreplaceable role in diagnosing various diseases. Nonetheless, PET imaging requires administering radionuclides to the patient, which inevitably causes radiation exposure. To mitigate this risk, many studies seek to reconstruct high-quality standard-dose PET from low-dose PET, reducing the required radionuclide dose. However, these methods perform poorly at capturing high-frequency image details. Moreover, they are limited to single-dose PET reconstruction and overlook a clinical fact: because of inherent individual variation among patients, the actual dose levels of acquired PET images can differ considerably. In this paper, we propose a multi-dose PET reconstruction framework that aligns closely with clinical requirements and effectively preserves high-frequency information. Specifically, we design a High-Frequency-guided Residual Diffusion for Multi-dose PET Reconstruction (HF-ResDiff) that enhances traditional diffusion models by 1) employing a simple CNN to predict low-frequency content, allowing the diffusion model to focus on the high-frequency counterparts while significantly improving training efficiency, 2) incorporating a Frequency Domain Information Separator and a High-frequency-guided Cross-attention to further help the diffusion model accurately recover high-frequency details, and 3) embedding a dose control module that enables the diffusion model to accommodate PET reconstruction at different dose levels. Extensive experiments show that HF-ResDiff outperforms state-of-the-art methods in PET reconstruction across multiple dose levels.
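
The residual idea in point 1) can be illustrated with a minimal NumPy sketch: a coarse prediction captures the low-frequency content, and the diffusion process only has to model the residual between the standard-dose image and that prediction. This is not the authors' implementation. It assumes a standard DDPM forward process with a linear noise schedule, works on a 2D slice for brevity (the paper uses 3D PET), and `coarse_cnn` is a hypothetical stand-in (a box blur) for the pre-trained CNN.

```python
import numpy as np


def coarse_cnn(low_dose: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the pre-trained CNN: a 3x3 box blur that
    keeps only coarse, low-frequency content of a 2D slice."""
    padded = np.pad(low_dose, 1, mode="edge")
    h, w = low_dose.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear DDPM schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample_residual(standard_dose, low_dose, t, alpha_bar, rng):
    """Noise the residual r0 = standard_dose - coarse prediction to step t:
    r_t = sqrt(alpha_bar_t) * r0 + sqrt(1 - alpha_bar_t) * eps."""
    coarse = coarse_cnn(low_dose)
    r0 = standard_dose - coarse
    eps = rng.standard_normal(r0.shape)
    r_t = np.sqrt(alpha_bar[t]) * r0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return coarse, r0, r_t, eps


# Toy 2D "slices": a standard-dose image and a noisier low-dose version.
rng = np.random.default_rng(0)
spet = rng.random((32, 32))
lpet = spet + 0.1 * rng.standard_normal((32, 32))
alpha_bar = linear_alpha_bar()
coarse, r0, r_t, eps = q_sample_residual(spet, lpet, t=500, alpha_bar=alpha_bar, rng=rng)
# A denoiser trained to recover r0 from r_t would give the final estimate coarse + r0_hat.
```

Because the coarse prediction already carries the low-frequency structure, the residual the diffusion model must learn is mostly fine detail, which is the intuition behind the claimed training-efficiency gain.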

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1415_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Tan_HFResDiff_MICCAI2024,
        author = { Tang, Zixin and Jiang, Caiwen and Cui, Zhiming and Shen, Dinggang},
        title = { { HF-ResDiff: High-Frequency-guided Residual Diffusion for Multi-dose PET Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors developed a High-Frequency-guided Residual Diffusion Model (HF-ResDiff) that improves upon conventional diffusion models through several innovations. Firstly, it uses a simple CNN to predict low-frequency content, enabling the diffusion model to concentrate on high-frequency details, thereby enhancing training efficiency. Secondly, it integrates a Frequency Domain Information Separator and High-frequency-guided Cross-attention, which help in precisely restoring high-frequency information. Lastly, an embedded dose control module allows the model to adjust to varying dose levels for PET reconstruction. This streamlined approach not only refines detail recovery but also optimizes performance across different settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) A straightforward CNN is used to predict low-frequency content, enabling the primary model to focus on high-frequency details, which significantly boosts training efficiency. 2) The integration of a Frequency Domain Information Separator and High-frequency-guided Cross-attention facilitates the precise restoration of high-frequency information. 3) The inclusion of a dose control module allows flexible adaptation to different dose levels in PET reconstruction.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In this study, the authors use a CNN as a low-frequency information extractor and introduce L_FFT and L_DWT as novel loss functions. However, both FFT and DWT are frequency-domain transformations, and the manuscript does not justify the need for two similar frequency-domain transformations in processing PET images. Furthermore, while diffusion models are employed to handle high-frequency information, the manuscript lacks a detailed mathematical analysis of the diffusion model's performance on high-frequency inputs; a more rigorous mathematical formulation would aid comprehension. Regarding experimental metrics, the study currently employs PSNR and SSIM. Adding mean squared error (MSE) as a metric is recommended to better assess how accurately the model fits pixel distributions, which could provide a more comprehensive evaluation of model performance.
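
The reviewer's question about FFT versus DWT is easier to weigh with concrete definitions in hand. The sketch below shows one plausible form of each loss; the names `fft_loss`, `dwt_loss`, and `haar_dwt2`, the L1 distance, the single-level Haar wavelet, and the `high_weight` parameter are all assumptions for illustration, since this review does not state the exact definitions of L_FFT and L_DWT.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """Single-level orthonormal 2D Haar DWT; height and width must be even.
    Returns the approximation subband and the three detail subbands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    details = ((a - b + c - d) / 2.0,   # column-to-column differences
               (a + b - c - d) / 2.0,   # row-to-row differences
               (a - b - c + d) / 2.0)   # diagonal differences
    return approx, details


def fft_loss(pred, target):
    """Hypothetical L_FFT: mean absolute difference between 2D Fourier spectra."""
    return float(np.mean(np.abs(np.fft.fft2(pred) - np.fft.fft2(target))))


def dwt_loss(pred, target, high_weight=1.0):
    """Hypothetical L_DWT: L1 distance per Haar subband, with an optional
    extra weight on the three detail (high-frequency) subbands."""
    approx_p, details_p = haar_dwt2(pred)
    approx_t, details_t = haar_dwt2(target)
    loss = np.mean(np.abs(approx_p - approx_t))
    for dp, dt in zip(details_p, details_t):
        loss += high_weight * np.mean(np.abs(dp - dt))
    return float(loss)


rng = np.random.default_rng(0)
target = rng.random((32, 32))
pred = target + 0.05 * rng.standard_normal((32, 32))
print(fft_loss(pred, target), dwt_loss(pred, target, high_weight=2.0))
```

The two transforms are not interchangeable: every Fourier coefficient depends on the whole image, whereas each Haar subband coefficient depends only on a local 2x2 block, so a wavelet-based loss keeps spatial localization that a spectrum-based loss discards.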

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See above for detailed information.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This study proposes employing CNNs as PET image feature extractors for low-frequency information, while using diffusion models to recover high-frequency details. However, the manuscript lacks innovation in the loss functions associated with these frequency domain transformations. Additionally, the diffusion model is presented without sufficient mathematical analysis, making its functionality and effectiveness less transparent. The manuscript would benefit from a more extensive quantitative evaluation to provide a deeper insight into the model’s performance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors design a High-Frequency-guided Residual Diffusion Model (HF-ResDiff) for multi-dose PET reconstruction with an emphasis on high-frequency details. They embed a Frequency Domain Information Separator (FD-Info Separator) and a High-frequency-guided Cross-attention (HF-guided CA) within the diffusion model, which isolate high-frequency information so that intricate details are recovered more accurately. Moreover, they use a Dose Control Module to accommodate multi-dose PET reconstruction.
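
The authors' rebuttal (Q2) states that the frequency separation is done by explicitly thresholding the Fourier spectrum. A minimal 2D NumPy sketch of such a split follows; the ideal radial mask, the `cutoff` value, and the function name `fft_split` are assumptions, and the actual FD-Info Separator operates on 3D PET volumes and may differ in its details.

```python
import numpy as np


def fft_split(image: np.ndarray, cutoff: float = 0.1):
    """Split a 2D image into low- and high-frequency parts using an ideal
    radial mask in the centered Fourier domain. `cutoff` is a radius in
    normalized frequency units (cycles per pixel); 0.1 is a hypothetical default."""
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    low_mask = radius <= cutoff
    # The mask is symmetric, so the inverse transform is real up to rounding.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = image - low  # the two parts sum back to the input exactly
    return low, high


# A constant image is pure DC (all low frequency); a checkerboard sits at Nyquist.
const = np.full((16, 16), 3.0)
low_c, high_c = fft_split(const)
checker = (-1.0) ** np.add.outer(np.arange(16), np.arange(16))
low_k, high_k = fft_split(checker)
```

A hard threshold like this is only possible when the boundary between low and high frequencies is known in advance, which matches the rebuttal's argument for using FFT before the diffusion model and a wavelet-based separation inside the noisy diffusion process.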

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - The proposed approach is very well presented in the text and well illustrated in a figure.
    - Extensive experiments, with quantitative measurements and qualitative assessments, against several existing state-of-the-art techniques.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - There is no information about the data used to pre-train the CNN that generates the initial low-frequency coarse prediction.

    - From a methodological point of view, it is unclear how the proposed approach differs from the one proposed in [20], apart from the Dose Control block.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    - The number of samples is very limited (only 197 subjects), which may affect the generalizability of the proposed multi-dose PET reconstruction approach; for comparison, the Ultra Low Dose PET Imaging Challenge dataset contains 1447 subjects.
    - The qualitative analysis in the paper is based on error maps. It would be interesting to involve a PET radiologist to clinically interpret the texture details of the HF-ResDiff-generated PET images compared with SPET.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Applying the proposed approach to multi-dose PET reconstruction is interesting compared with state-of-the-art approaches. However, from a methodological point of view, there is no difference from [20].

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I think this paper is qualified to be presented in MICCAI.



Review #3

  • Please describe the contribution of the paper

    This paper utilizes a diffusion model to complement the details of results generated by pre-trained models and achieves multi-dose PET reconstruction by leveraging dose levels as prompts.
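
Conditioning a network on the dose level can be illustrated with a hypothetical sketch: the dose reduction factor (DRF) is mapped to a sinusoidal embedding, as is commonly done for diffusion timesteps, and projected to a per-channel scale and shift (FiLM-style modulation). The embedding scheme, the modulation, and all names below are assumptions; the paper's Dose Control Module may be built differently.

```python
import numpy as np


def dose_embedding(drf: float, dim: int = 16, max_period: float = 10000.0) -> np.ndarray:
    """Sinusoidal embedding of a dose reduction factor (DRF), mirroring the
    usual diffusion timestep embedding (a hypothetical choice)."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = drf * freqs
    return np.concatenate([np.sin(args), np.cos(args)])


def film_condition(features, emb, w_scale, w_shift):
    """FiLM-style modulation: project the embedding to a per-channel scale and
    shift, then apply them to (C, H, W) features."""
    scale = w_scale @ emb   # shape (C,)
    shift = w_shift @ emb   # shape (C,)
    return features * (1.0 + scale[:, None, None]) + shift[:, None, None]


rng = np.random.default_rng(0)
channels, dim = 4, 16
w_scale = 0.01 * rng.standard_normal((channels, dim))  # would be learned in practice
w_shift = 0.01 * rng.standard_normal((channels, dim))
features = rng.standard_normal((channels, 8, 8))
out_drf2 = film_condition(features, dose_embedding(2.0, dim), w_scale, w_shift)
out_drf20 = film_condition(features, dose_embedding(20.0, dim), w_scale, w_shift)
```

With this kind of design a single set of weights serves every dose level, and the same features are modulated differently depending on the DRF supplied as the "prompt".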

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The task of multi-dose PET image reconstruction is more innovative than its single-dose counterpart and better reflects the dose variations encountered in practice.
    2. Using a diffusion model to complement the details of results generated by a pre-trained model is a sound choice: a diffusion model alone can produce high-quality images, but it is difficult to guarantee their fidelity.
    3. The authors propose using high-frequency information and dose as guidance, making the model more adaptive.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The comparison methods do not appear to be carefully implemented, as all of them exhibit stitching artifacts, even at DRF=2 (Fig. 2). Is this because these methods are not suitable for the multi-dose scenario? The authors should explain this observation.
    2. Some important papers for comparison and citation are missing:
       [1] Chan C, Zhou J, Yang L, et al. Noise adaptive deep convolutional neural network for whole-body PET denoising[C]//2018 IEEE Nuclear Science Symposium and Medical Imaging Conference Proceedings (NSS/MIC). IEEE, 2018: 1-4.
       [2] Yang Z, Zhou Y, Zhang H, et al. DRMC: A generalist model with dynamic routing for multi-center PET image synthesis[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023: 36-46.
       [3] Jang S I, Pan T, Li Y, et al. SPACH transformer: spatial and channel-wise transformer based on local and global self-attentions for PET image denoising[J]. IEEE Transactions on Medical Imaging, 2023.
       [4] Zhou Y, Yang Z, Zhang H, et al. 3D segmentation guided style-based generative adversarial networks for PET synthesis[J]. IEEE Transactions on Medical Imaging, 2022, 41(8): 2092-2104.
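
The stitching artifacts raised in point 1 are typical of patch-based inference when patches are tiled and reassembled without blending. A common mitigation, shown here as a minimal 2D sketch and not necessarily what the authors or the compared baselines used, is to predict on overlapping patches and average the overlaps. The patch size, stride, and function names are assumptions.

```python
import numpy as np


def extract_patches(image, size, stride):
    """Slide a size x size window with the given stride; assumes
    (dim - size) is divisible by stride so the borders are covered."""
    h, w = image.shape
    positions = [(r, c) for r in range(0, h - size + 1, stride)
                 for c in range(0, w - size + 1, stride)]
    patches = [image[r:r + size, c:c + size] for r, c in positions]
    return patches, positions


def stitch_patches(patches, positions, out_shape):
    """Average overlapping 2D patches back into a full image.
    `positions` holds the top-left (row, col) of each patch."""
    acc = np.zeros(out_shape, dtype=float)
    count = np.zeros(out_shape, dtype=float)
    for patch, (r, c) in zip(patches, positions):
        ph, pw = patch.shape
        acc[r:r + ph, c:c + pw] += patch
        count[r:r + ph, c:c + pw] += 1.0
    return acc / np.maximum(count, 1.0)


rng = np.random.default_rng(0)
image = rng.random((32, 32))
patches, positions = extract_patches(image, size=16, stride=8)
# Stand-in for a network applied patch by patch (identity here).
predicted = [p.copy() for p in patches]
stitched = stitch_patches(predicted, positions, image.shape)
```

With an identity "model" the stitched output reproduces the input exactly; with a real network, averaging predictions where patches overlap tends to soften the discontinuities at patch borders that appear as visible seams.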

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See weakness and strengths.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The task and methodology presented in this paper demonstrate a certain level of innovation and thorough experimentation, warranting its acceptance. However, there are still some shortcomings that the authors need to address for improvement.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I believe this work is qualified to be accepted for MICCAI.

    However, I suggest that the authors acknowledge and compare their work with the most relevant recent studies. Specifically, the synthesis of multi-dose PET images has been explored in [1], and the authors should include a comparison with this study. Additionally, studies involving training or testing on multi-center [2], multi-tracer [3], and multi-organ [4] datasets should be cited, as they demonstrate strong robustness. In summary, to strengthen the credibility of the proposed method, the authors should cite and compare their work with these studies in the future.

    [1] Chan C, Zhou J, Yang L, et al. Noise adaptive deep convolutional neural network for whole-body PET denoising[C]//2018 IEEE Nuclear Science Symposium and Medical Imaging Conference Proceedings (NSS/MIC). IEEE, 2018: 1-4.
    [2] Yang Z, Zhou Y, Zhang H, et al. DRMC: A generalist model with dynamic routing for multi-center PET image synthesis[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023: 36-46.
    [3] Jang S I, Pan T, Li Y, et al. SPACH transformer: spatial and channel-wise transformer based on local and global self-attentions for PET image denoising[J]. IEEE Transactions on Medical Imaging, 2023.
    [4] Zhou Y, Yang Z, Zhang H, et al. 3D segmentation guided style-based generative adversarial networks for PET synthesis[J]. IEEE Transactions on Medical Imaging, 2022, 41(8): 2092-2104.




Author Feedback

Q1: Differences from [20] (R1)
Our proposed Dose Control Module addresses the clinical challenge of accommodating PET reconstruction across various dose levels. Based on the properties of PET imaging, we use high- and low-pass filtering to explicitly extract frequency features, which is more computationally efficient and explainable than the cross-domain feature fusion method in [20]. While [20] was originally designed for natural image super-resolution, we extend it to 3D PET reconstruction through targeted modifications.

Q2: Reasons for using wavelet and Fourier transforms (R4)
We employ FFT to obtain the frequency spectrum, which allows explicit thresholding of high- and low-frequency components before they are fed into the diffusion model. In contrast, during the diffusion process the boundary between high- and low-frequency components is implicit because of noise interference, so DWT, which provides an adaptive separation, is more suitable in that context. FFT and DWT thus provide frequency-based guidance from different perspectives, enhancing the model's performance at different stages.

Q3: Loss function lacks innovation (R4)
Our main contribution lies in the idea of frequency separation and dose control, rather than in new loss functions. FFT and DWT are classic digital signal processing algorithms, not learnable machine learning algorithms that require their own optimization objectives. Details can be found in [A].
[A] Hayes, Monson H. Statistical Digital Signal Processing and Modeling. John Wiley & Sons, 1996.

Q4: Lacks mathematical analysis of high-frequency input (R4)
Mathematically, diffusion models can handle images across the entire frequency spectrum. We cite [9] for a comprehensive mathematical derivation of the diffusion process. Due to the page limit, we did not include a detailed mathematical analysis proving the effectiveness of the diffusion model in handling high-frequency information. However, this viewpoint has been stated and validated in numerous studies [5,7,13].

Q5: Limited samples (R1)
Collecting large numbers of PET images is particularly challenging and expensive in clinical settings. Our experiments included 197 subjects, each with 6 dose levels. For the 157 training subjects, splitting each image into patches as data augmentation yields a total of 30,144 training samples. Most studies on PET reconstruction used considerably smaller datasets (e.g., 16 subjects in [B], 40 in [C], and 36 in [D]). In this context, our training dataset is large enough to support generalization.
[B] Zeng, P., et al. 3D CVT-GAN: A 3D Convolutional Vision Transformer-GAN for PET Reconstruction. MICCAI, 2022: 516-526.
[C] Rui, H., et al. DULDA: Dual-domain Unsupervised Learned Descent Algorithm for PET image reconstruction. MICCAI, 2023: 153-162.
[D] Jiaqi, C., et al. Image2Points: A 3D Point-based Context Clusters GAN for High-Quality PET Image Reconstruction. ICASSP, 2024: 99

Q6: Artifacts in results (R2)
All comparison methods were implemented with a consistent data processing pipeline. The models were trained and tested on patches, and the artifacts arise from stitching the patches back into the whole image. Our method reduces these artifacts by handling the low-frequency components separately through the pre-trained CNN, which improves the clarity of structural information. In addition, the Dose Control Module enables our method to generate PET images effectively across different dose levels.

Q7: MSE as an additional metric (R4)
MSE can be derived directly from PSNR. Considering the page limit, we only report SSIM and PSNR.

Q8: Comparative methods are insufficient (R2)
We selected five typical competing methods covering three representative categories: CNN-based, GAN-based, and diffusion-based models. We believe these are sufficient and diverse enough for the experiments in a conference paper.
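
Reply Q7 relies on the standard definition PSNR = 10 log10(MAX^2 / MSE), which inverts to MSE = MAX^2 * 10^(-PSNR/10). A minimal sketch, assuming the same data range MAX is used for every image (the function names are illustrative):

```python
import numpy as np


def psnr(pred, target, data_range):
    """PSNR in dB: 10 * log10(data_range**2 / MSE)."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)


def mse_from_psnr(psnr_db, data_range):
    """Invert the PSNR definition: MSE = data_range**2 * 10**(-PSNR / 10)."""
    return data_range ** 2 * 10.0 ** (-psnr_db / 10.0)


rng = np.random.default_rng(0)
target = rng.random((16, 16))
pred = target + 0.05 * rng.standard_normal((16, 16))
p = psnr(pred, target, data_range=1.0)
recovered = mse_from_psnr(p, data_range=1.0)
```

For example, a PSNR of 30 dB with a data range of 1 corresponds to an MSE of 10^-3. The inversion is exact per image, but converting a PSNR averaged over test subjects back to MSE does not give the mean MSE, because PSNR is logarithmic in MSE: averaging PSNRs of 20 and 40 dB gives 30 dB (MSE 0.001), whereas the mean of the underlying MSEs, 0.01 and 0.0001, is 0.00505.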




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Good rebuttal period: a reviewer increased the score, and the answers were positive.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Both the method and the application are novel. To me, the major weakness is the lack of a downstream task or a human perception score to demonstrate whether the results are useful in clinical practice.

    Considering the novelty, I lean toward acceptance. However, I strongly recommend that the authors add a downstream task or a human perception score, or at least a perceptual metric such as LPIPS.



