Abstract

Acquiring high-quality Positron Emission Tomography (PET) images requires administering high-dose radiotracers, which increases radiation exposure risks. Generating standard-dose PET (SPET) from low-dose PET (LPET) has become a potential solution. However, previous studies have primarily focused on single low-dose PET denoising, neglecting two critical factors: discrepancies in dose response caused by inter-patient variability, and complementary anatomical constraints derived from CT images. In this work, we propose a novel CT-Guided Multi-dose Adaptive Attention Denoising Diffusion Model (MDAA-Diff) for multi-dose PET denoising. Our approach integrates anatomical guidance and dose-level adaptation to achieve superior denoising performance under low-dose conditions. Specifically, this approach incorporates a CT-Guided High-frequency Wavelet Attention (HWA) module, which uses wavelet transforms to separate high-frequency anatomical boundary features from CT images. These extracted features are then incorporated into PET imaging through an adaptive weighted fusion mechanism to enhance edge details. Additionally, we propose the Dose-Adaptive Attention (DAA) module, a dose-conditioned enhancement mechanism that dynamically integrates dose levels into channel-spatial attention weight calculation. Extensive experiments on 18F-FDG and 68Ga-FAPI datasets demonstrate that MDAA-Diff outperforms state-of-the-art approaches in preserving diagnostic quality under reduced-dose conditions. Our code is publicly available at https://github.com/Long0121/MDAA-Diff.
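
To make the HWA idea in the abstract concrete, the following is a minimal sketch of how a single-level wavelet transform separates high-frequency sub-bands and how CT detail could be blended into PET detail. It is not the authors' implementation: it assumes a Haar wavelet for brevity (the rebuttal mentions a biorthogonal wavelet), and the function names and the fixed scalar `weight` are hypothetical stand-ins for the learned adaptive fusion described in the paper.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar transform; returns (LL, LH, HL, HH) sub-bands."""
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # top row minus bottom row (horizontal edges)
    hl = (a - b + c - d) / 2.0  # left column minus right column (vertical edges)
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def fuse_high_freq(pet, ct, weight=0.5):
    """Blend CT high-frequency sub-bands into the PET ones.

    `weight` is a fixed scalar here (an assumption); the paper describes
    a learned, adaptive weighting instead.
    """
    pet_ll, *pet_high = haar_dwt2(pet)
    _, *ct_high = haar_dwt2(ct)
    fused = [(1.0 - weight) * p + weight * c for p, c in zip(pet_high, ct_high)]
    return pet_ll, fused

# Toy inputs: a flat PET patch and a CT patch with one vertical boundary.
pet = np.full((4, 4), 3.0)
ct = np.zeros((4, 4))
ct[:, 1:] = 1.0
low, high = fuse_high_freq(pet, ct, weight=0.5)
```

In this toy case the PET patch has no detail of its own, so the only non-zero fused sub-band is the one that responds to the vertical CT boundary; the PET low-frequency band is left untouched.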

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2724_paper.pdf

SharedIt Link: https://rdcu.be/eHwPS

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04947-6_32

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Long0121/MDAA-Diff

Link to the Dataset(s)

N/A

BibTex

@InProceedings{NiuXia_MDAADiff_MICCAI2025,
        author = { Niu, Xiaolong AND Ye, Zanting AND Han, Xu AND Huang, Yanchao AND Sun, Hao AND Wu, Hubing AND Lu, Lijun},
        title = { { MDAA-Diff: CT-Guided Multi-Dose Adaptive Attention Diffusion Model for PET Denoising } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {333--343}
}

Reviews

Review #1

Please describe the contribution of the paper

In order to acquire high-quality PET images that meet the requirements of tumor diagnosis and disease research while reducing the radiation exposure risks, the authors proposed a novel CT-guided multi-dose level PET reconstruction Diffusion model (MDAA-Diff) to reconstruct the standard-dose PET (SPET) images from low-dose ones (LPET) and corresponding CT images. Specifically, a CT-Guided High frequency Wavelet Attention (HWA) module which takes both PET and CT features is designed to incorporate the complementary anatomical constraints derived from CT images to the PET imaging process. A Dose-adaptive Attention (DAA) module, which can integrate the dose level information and guide channel-wise and spatial feature weighting, is exploited to boost the perception capability and robustness of the model. The comparison experiments prove that the MDAA-Diff achieves better results than both single-dose and multi-dose approaches.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

a) The problem setting of “CT-guided multi-dose level PET reconstruction” is interesting. Unlike previous methods that reconstruct a high-quality PET image from single-dose LPET, the authors try to overcome the dose-level discrepancies caused by inter-patient variability while also leveraging complementary anatomical constraints derived from CT images, which is reasonable for clinical scenarios.
b) The methodology integrates several advanced deep learning techniques, such as diffusion models, wavelet transforms and attention mechanisms, encouraging more realistic reconstruction results.
c) Extensive experiments are conducted on LPET images at six dose levels, which is representative.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

a) In the “Introduction” section, the review of existing PET reconstruction methods is insufficient. As far as I know, many PET reconstruction methods in recent years leverage complementary anatomical information to boost the reconstruction results. I list some examples below. The authors should clearly state the differences and advantages of their method against these existing approaches in terms of multi-modality information fusion.
[1] K.T. Chen, et al. Ultra–low-dose 18F-florbetaben amyloid PET imaging using deep learning with multi-contrast MRI inputs. Radiology, 290(3), 2019, pp. 649-656.
[2] L. Xiang, et al. Deep auto-context convolutional neural networks for standard-dose PET image estimation from low-dose PET/MRI. Neurocomputing, vol. 267, 2017, pp. 406-416.
[3] Y. Wang, et al. 3D auto-context-based locality adaptive multi-modality GANs for PET synthesis. IEEE Transactions on Medical Imaging, 38(6), 2019, pp. 1328-1339.
b) In the “Methodology” section, the authors claim that “PET low-frequency features capture global metabolic patterns, while high frequency features encode lesion edges. CT features are dominated by high-frequency components.” Is there any evidence supporting this conclusion?
c) The motivation for the key components proposed in this paper is not sufficiently explained. For example, in the “Methodology” section, the authors mention that “Global average pooling and max pooling operations are applied to the input features, and the results are concatenated with D_emb.” Why does the DAA module use average pooling and max pooling simultaneously? What advantages does the SE module have over processing the image features? The authors did not explain these points clearly in the paper.
d) In the “Experiments” section, the proposed method has not been compared with existing CT-guided PET reconstruction methods. Considering the title “MDAA-Diff: CT-Guided Multi-Dose Adaptive Attention Diffusion Model for PET Denoising”, there should be experiments that show the superiority of the multi-modality method.
e) In the “Experiments” section, there is no ablation experiment to validate the effectiveness of CT guidance. Although the authors designed a variant model to evaluate the HWA module, such experimental results cannot show whether the performance improvement achieved by the method is owing to CT guidance or to the fusion mechanism. In addition, the SE module is not validated.
f) The writing is not that good. Some details of the method are not clearly described, and there are mismatches between the text and the figure.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper proposes a CT-guided multi-dose level PET reconstruction method based on a diffusion model. The main contribution of this paper is that the authors try to overcome the dose-level discrepancies caused by inter-patient variability while also leveraging complementary anatomical constraints derived from CT images, which is practical for clinical scenarios. In addition, popular deep learning mechanisms such as diffusion models, wavelet transforms and attention mechanisms are incorporated to boost the reconstruction performance. However, its disadvantages are also obvious. As mentioned before, the motivation of the proposed methodology is not stated clearly, and the details of existing PET reconstruction methods need to be better described. Another serious problem is that the comparison experiments are not sufficient to prove the superiority of the proposed method. In addition, the ablation experiments should be supplemented, and the writing is rough. Therefore, I recommend “Weak Reject” for this paper.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This manuscript introduces MDAA-Diff, a novel CT-guided denoising diffusion model for reconstructing standard-dose PET (SPET) images from low-dose PET (LPET) inputs across different dose levels. This work proposes a High-frequency Wavelet Attention (HWA) module, which extracts and fuses high-frequency anatomical features from CT using wavelet transforms and squeeze-excitation. A Dose-Adaptive Attention (DAA) module is also proposed to dynamically condition channel and spatial attention on the input dose level. In addition, integration with Improved Denoising Diffusion Probabilistic Models (IDDPM) is implemented to enhance both reconstruction quality and inference efficiency. The proposed model is evaluated on two datasets (18F-FDG and 68Ga-FAPI), showing strong improvements over state-of-the-art denoising networks in both PSNR and SSIM at ultra-low and moderate dose levels.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well-structured and written clearly.
- The clinical significance of PET dose reduction is well-framed, and the importance of modeling dose variability and leveraging CT is convincingly argued.
- Across both datasets and dose levels (2–50%), MDAA-Diff consistently outperforms a range of strong baselines including SwinUnetr, MambaMIR, and HF-ResDiff. Visualizations also highlight sharper boundaries and fewer artifacts in critical anatomical areas.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- While quantitative metrics like PSNR and SSIM are useful and widely applied, they may not fully capture clinical utility. No lesion-level or functional-region accuracy assessment is provided. Adding a reader study would make the results more convincing.
- Although two tracers are used, the datasets are relatively small (51 and 60 patients).
- The results from different methods in Fig. 2 are difficult to compare.
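
For readers unfamiliar with the PSNR metric referred to in this review, here is a minimal sketch of how it is commonly computed between a reference image and an estimate. The function name and the choice of `data_range` (the reference image's own intensity range) are assumptions for illustration; the paper's exact normalization is not stated on this page.

```python
import numpy as np

def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB between two images."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if data_range is None:
        # Assumption: use the reference image's own intensity range.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy example: a uniform error of 0.1 on an image with range 1.
ref = np.array([[0.0, 1.0], [0.0, 1.0]])
value = psnr(ref, ref + 0.1)
```

Because PSNR depends only on the mean squared error, it is insensitive to where the error is located, which is consistent with the reviewer's point that it may not reflect lesion-level accuracy.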
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a well-executed and solid application paper that tackles a clinically important problem. It introduces thoughtful architectural innovations and demonstrates clear improvements in reconstruction performance.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I appreciate the feedback from the authors. They have addressed my main concern. I would like to maintain the accept decision.

Review #3

Please describe the contribution of the paper

MDAA-Diff addresses the core challenges of structural blurring, dose sensitivity, and low computational efficiency in low-dose PET denoising through a synergistic design integrating CT-guided anatomical constraints, dose-adaptive optimization, and an efficient diffusion framework.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The primary innovation of this work lies in proposing a channel-spatial collaborative enhancement mechanism that encodes dose information as conditional embeddings to dynamically adjust feature weights, thereby adapting to varying dose levels.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The modifications to DDPM are incremental and lack rigorous mathematical justification.
2. Multi-dose methods like HF-ResDiff are benchmarked, but the paper does not discuss why dose-level concatenation in prior works fails, leaving the novelty of the DAA module underexplained.
3. Key implementation details of the proposed modules (e.g., wavelet parameters in HWA, dose embedding dimensions in DAA) are insufficiently described, hindering reproducibility.
4. The interaction between the HWA and DAA modules is not thoroughly analyzed. It remains unclear whether their combined use introduces redundancy or computational overhead.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is solid overall, but requires further supplementation in experiments and theoretical foundations. For details, refer to Section 7 (Major Weaknesses).
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank the reviewers for their valuable comments.

Q1: Insufficient review of existing PET/CT fusion methods (R1)
Following R1’s suggestion, we have reviewed methods [1-3]. [1, 2] concatenate multimodal images as input, ignoring that different modalities contribute differently across image locations. [3] proposes local adaptive fusion with position-specific weights, but it operates only at the input level without leveraging deep semantic features, which may reduce fusion effectiveness. Our HWA module addresses this by progressively integrating CT high-frequency details via wavelet transform for multi-scale PET-CT fusion, enhancing structural fidelity.

Q2: Lack of literature support and CT-guided comparison methods (R1)
[A] showed that high-frequency Fourier coefficients (FFC) relate to fine details. It uses a few low-frequency FFCs to represent PET’s complex metabolic patterns, indicating that PET low-frequency features preserve global metabolic characteristics. As R1 noted, direct literature on “CT features are dominated by high-frequency components” is limited. [B] highlights that high-frequency features provide edge and texture information. Our statement may have been unclear; we intended to emphasize the importance of CT high-frequency features. The statement will be improved. For a fair comparison, all compared methods take concatenated PET-CT as input; this will be clarified.
[A] Levy et al. Spatial low frequency pattern analysis in positron emission tomography: a study between normals and schizophrenics. JNM, 1992, 33(2): 287-295.
[B] He, et al. Lrfnet: A real-time medical image fusion method guided by detail information. CBM, 2024, 173: 108381.

Q4: The explanation of key components (R1, R2, R3)
Average pooling preserves global context, while max pooling highlights prominent features. Their combination enhances feature representation, which is critical for accurate denoising. By concatenating D_emb with the dual-pooling features, our model enables dose-adaptive channel modulation, suppressing noise at low dose while preserving details at high dose. Our model builds upon the published IDDPM framework, and we mainly focus on its improvements over DDPM. We will clarify details (e.g., biorthogonal wavelet in HWA, D_emb dimension of 256 in DAA) and improve figure clarity.
Regarding limited samples and evaluation metrics: considering the space limit, we report only SSIM and PSNR, and will explore lesion-level analyses in future work. Our dataset includes two tracers, and each patient has six dose levels, yielding 14006 training samples after patch extraction. Five-fold cross-validation and data augmentation further ensure robustness and reliability.

Q5: Ablation studies are insufficient (R1, R3)
Table 2 shows our ablation results: “IDDPM” uses single-dose PET without CT. The incorporation of CT in “IDDPM+HWA” enhances performance (e.g., PSNR: +0.78 dB at 6s), demonstrating the value of CT images. “Ours” incorporates both HWA and DAA, providing further improvements and indicating that both modules are effective without redundancy.

Q6: Advantages of the SE module (R1, R3)
The SE module improves feature representation by modeling inter-channel dependencies [C], which standard convolution lacks. By adaptively modulating channel importance, the network emphasizes relevant channels while suppressing redundant ones, which is critical for PET-CT fusion. As its effectiveness is validated in [C], we did not conduct a separate ablation study.
[C] Hu, et al. Squeeze-and-excitation networks. CVPR. 2018: 7132-7141.

Q7: HF-ResDiff limitations (R3)
In the “Introduction” section, we have stated the limitations of HF-ResDiff. It treats D_emb as a static global condition by concatenating it with the time embedding, which lacks direct interaction with image features and is insufficient to capture the nonlinear, spatially-varying effects of dose levels. Our DAA module explicitly embeds D_emb into channel and spatial attention, allowing dynamic, dose-aware feature modulation.
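
The dose-adaptive channel modulation described in Q4 and Q7 can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' code: the two-layer gating network, the weight shapes, and the function name are assumptions, and only the channel branch is shown (the spatial branch is omitted). The rebuttal states a D_emb dimension of 256; a tiny embedding is used here to keep the example readable.

```python
import numpy as np

def dose_channel_attention(feat, d_emb, w1, w2):
    """Gate the channels of `feat` using pooled statistics and a dose embedding.

    feat:  (C, H, W) feature map
    d_emb: (E,) dose embedding
    w1:    (hidden, 2*C + E) first-layer weights (hypothetical)
    w2:    (C, hidden) second-layer weights (hypothetical)
    """
    avg = feat.mean(axis=(1, 2))            # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))              # global max pooling -> (C,)
    z = np.concatenate([avg, mx, d_emb])    # dual-pooling features + D_emb
    hidden = np.maximum(w1 @ z, 0.0)        # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid, one weight per channel
    return feat * gate[:, None, None], gate

# Toy setup: 2 channels, 1-dimensional dose embedding, 1 hidden unit.
feat = np.ones((2, 2, 2))
w1 = np.array([[0.0, 0.0, 0.0, 0.0, 1.0]])  # hidden unit reads only the dose
w2 = np.array([[1.0], [-1.0]])              # dose boosts channel 0, damps channel 1
out_low, gate_low = dose_channel_attention(feat, np.array([0.0]), w1, w2)
out_high, gate_high = dose_channel_attention(feat, np.array([2.0]), w1, w2)
```

With identical image features, changing only the dose embedding changes the per-channel gates, which is the kind of direct interaction between dose and features that the rebuttal contrasts with concatenating D_emb to the time embedding.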

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

N/A
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

MDAA-Diff: CT-Guided Multi-Dose Adaptive Attention Diffusion Model for PET Denoising

Author(s):