Abstract

Total-body PET/CT systems, which offer unprecedented image quality and ultrahigh sensitivity, are widely used for diagnosing and treating diseases such as tumors. Unlike regular protocols, dual-time-point imaging (DTPI), in which patients undergo two PET/CT scans to enhance lesion contrast, exposes them to higher radiation doses because of the additional CT scan needed for PET attenuation correction and anatomical localization. To mitigate radiation exposure, we introduce STMDiff, a spatiotemporal matching diffusion model that reuses CT images from the first scanning time point for PET attenuation correction at the second scanning time point. A spatiotemporal matching strategy implemented with contrastive learning finds the k-best-matched CT images, which enriches the multimodal features of STMDiff and bypasses cross-modal registration, facilitating the generation of attenuation-corrected (AC) PET images while alleviating alignment errors. Both qualitative and quantitative results illustrate that the attenuation-correcte

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3332_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/LEE12365/STMDiff

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiWen_STMDiff_MICCAI2025,
        author = { Li, Wenbo and Huang, Zhenxing and Li, Lianghua and Yang, Chunyan and Wang, Yihan and Qin, Wenjian and Zhang, Na and Zheng, Hairong and Liang, Dong and Liu, Jianjun and Hu, Zhanli},
        title = { { STMDiff: Spatiotemporal Matching Diffusion Model for Dual-Time-Point Total-body PET/CT Imaging via Contrastive Learning } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        page = {545 -- 555}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript presents the STMDiff model, an innovative approach aimed at reducing radiation exposure for patients by utilizing CT images from the initial scan in dual-time-point whole-body PET/CT imaging to assist in attenuation correction for the subsequent PET scan. This concept is intriguing and holds significant potential for enhancing the safety and efficiency of PET/CT imaging. However, while the study is methodologically creative, there are areas in the description of methods, experimental design, and result analysis that require further clarification and refinement.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The manuscript presents the STMDiff model, an innovative approach aimed at reducing radiation exposure for patients by utilizing CT images from the initial scan in dual-time-point whole-body PET/CT imaging to assist in attenuation correction for the subsequent PET scan. This concept is intriguing and holds significant potential for enhancing the safety and efficiency of PET/CT imaging. However, while the study is methodologically creative, there are areas in the description of methods, experimental design, and result analysis that require further clarification and refinement.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The description of the STMDiff model’s methodology within the manuscript is not sufficiently clear. Specifically, it lacks detail on how the NAC PET and CT images, serving as generative control conditions, are integrated into the Unet architecture, and whether their integration methods are identical. This information is crucial for comprehending the model’s operational mechanisms and reproducing the experimental outcomes.
    2. Additionally, the choice of k=5 for the number of CT images used as prior conditions is not justified, nor is the impact of k-value on performance discussed. The article also does not explore an adaptive mechanism for selecting k. Given that whole-body scanning encompasses numerous organs and tissues, it seems implausible that five CT images alone could sufficiently correct PET scans across the entire body, warranting further validation and discussion.
    3. The presentation of the experimental data is also unclear. The manuscript does not provide details regarding the distribution of the experimental subjects in terms of gender, age, and health status, which are essential for assessing the model’s generalizability. Furthermore, it is unclear whether the organs presented in the experimental results exhibit any pathological changes and whether the model can accurately correct both healthy and diseased organs. Considering that the model is intended to correct whole-body PET scans, the use of only 84 training samples may not be adequate for training the network, casting doubt on the credibility and generalizability of the findings.
    4. In terms of comparative experiments, the manuscript mentions the use of DDPMs as a control method; however, it fails to explain how DDPMs, which do not inherently support conditional generative approaches, were trained for PET correction and compared with STMDiff. Clarification on these points is vital for evaluating the advantages of the STMDiff model over existing methods.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript presents the STMDiff model, an innovative approach aimed at reducing radiation exposure for patients by utilizing CT images from the initial scan in dual-time-point whole-body PET/CT imaging to assist in attenuation correction for the subsequent PET scan. This concept is intriguing and holds significant potential for enhancing the safety and efficiency of PET/CT imaging. However, while the study is methodologically creative, there are areas in the description of methods, experimental design, and result analysis that require further clarification and refinement.

    1. The description of the STMDiff model’s methodology within the manuscript is not sufficiently clear. Specifically, it lacks detail on how the NAC PET and CT images, serving as generative control conditions, are integrated into the Unet architecture, and whether their integration methods are identical. This information is crucial for comprehending the model’s operational mechanisms and reproducing the experimental outcomes.
    2. Additionally, the choice of k=5 for the number of CT images used as prior conditions is not justified, nor is the impact of k-value on performance discussed. The article also does not explore an adaptive mechanism for selecting k. Given that whole-body scanning encompasses numerous organs and tissues, it seems implausible that five CT images alone could sufficiently correct PET scans across the entire body, warranting further validation and discussion.
    3. The presentation of the experimental data is also unclear. The manuscript does not provide details regarding the distribution of the experimental subjects in terms of gender, age, and health status, which are essential for assessing the model’s generalizability. Furthermore, it is unclear whether the organs presented in the experimental results exhibit any pathological changes and whether the model can accurately correct both healthy and diseased organs. Considering that the model is intended to correct whole-body PET scans, the use of only 84 training samples may not be adequate for training the network, casting doubt on the credibility and generalizability of the findings.
    4. In terms of comparative experiments, the manuscript mentions the use of DDPMs as a control method; however, it fails to explain how DDPMs, which do not inherently support conditional generative approaches, were trained for PET correction and compared with STMDiff. Clarification on these points is vital for evaluating the advantages of the STMDiff model over existing methods.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    This manuscript is rejected due to critical concerns regarding its methodology and data. Primarily, the authors’ rebuttal acknowledges that a key parameter (k=5) used in the submitted work is suboptimal, as their subsequent analysis indicates k=2 yields superior results. This admission fundamentally undermines the reliability of the presented findings. Additionally, the training dataset, despite a high total slice count from 84 subjects, lacks the necessary diversity across distinct anatomical regions to convincingly support the ‘total-body’ correction claims, particularly given the limited and specific patient cohort.



Review #2

  • Please describe the contribution of the paper

    This paper targets CT-free attenuation correction in dual-time-point PET/CT imaging and proposes STMDiff, a framework that leverages contrastive learning to identify optimal PET-CT correspondences in the latent space. The matched CT features are then used as conditional priors for synthesizing attenuation-corrected PET images. Extensive experiments demonstrate the effectiveness of the proposed method.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper addresses an important and clinically relevant problem, mitigating radiation exposure in dual-time-point PET/CT imaging by enabling CT-free attenuation correction.
    2. The proposed contrastive learning-based spatiotemporal matching strategy, which retrieves the k-best-matched CT images as conditional priors, is both novel and effective, and contributes meaningfully to improving synthesis quality.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The model trains on AC PET as both input and output, instead of using NAC PET as input and AC PET as the output. Given that NAC PET images are typically the raw input, using them directly as input could provide more realistic attenuation correction and could help the model learn better mappings between uncorrected and corrected PET images. Clarifying why this design decision was made would be beneficial.
    2. The model uses Top-k CT images as the conditional prior for PET synthesis. It might be worth considering whether including both Top-k CT images and their corresponding Top-k PET images as conditions could potentially improve performance, as both sets of images provide complementary information.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper tackles a significant and relevant problem, presenting a novel and promising method with strong experimental results.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed the concerns, and the paper is acceptable.



Review #3

  • Please describe the contribution of the paper

    The main contribution is STMDiff for dual-time-point PET/CT imaging. STMDiff contains two main components: (A) a multimodal spatiotemporal matching network, and (B) a diffusion-based attenuation correction network. In the first component, contrastive learning is used to match the pre non-attenuation-corrected (NAC) PET to the pre CT. In the second component, the post attenuation-corrected (AC) PET is generated through diffusion conditioned on the post NAC PET and 5 pre CTs.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The manuscript presents a new method for image generation, derived for a very specific (yet important and relevant) scenario. The novelty resides in the way that the authors make use of all the available information. Also, diffusion, which is the state of the art and not yet very often used in this application, is incorporated into the proposed method.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The major weakness is the small ablation study and lack of experimentation with other ranges for the hyper-parameters.

    Also, a private dataset and no mention of any intention of making the code available decrease reproducibility.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • Some spaces seem to be missing (e.g. “correction[6”)
    • Images should, ideally, be placed after being mentioned in the text
    • Justify the selection of ResNet-19 for G1 and G2
    • Why was k fixed at 5? Have other values been attempted?
    • What coefficient of temperature was used in the loss function?
    • How do morphological operations accelerate matching? Which, for what order, and with which parameters? Fixed? Randomly? Where is this represented in Figure 1?
    • Pay attention to typos, e.g. “regoin”
    • Check for typos in the figures
    • The order of the methods both in the text and in tables and figures should be UNet, CycleGAN, DDPM, Mamba, STMDiff, according to their publication year.
    • Since all results in table 1 demonstrate significant difference, the inclusion of the * only adds visual noise
    • Can AC be added to table 1?
    • Correct references, namely upper cases such as “3d”, missing publication years, etc.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe the contribution to be interesting and relevant. Moreover, the document is fairly well written and illustrated. Methodology also seems acceptable.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their time and insightful comments. They found our work novel and well organized, but also pointed out some issues. In this rebuttal, we clarify the main points and will incorporate them into the final version. The code will be released later.

Model design: In the standard PET attenuation correction (AC) process, PET/CT systems use CT images to supply tissue attenuation coefficients. Following this protocol, the initial CT images are reused in this study for the second PET AC, minimizing radiation risks from repeated scans in dual-time-point imaging. Given the time interval between the two scans, we developed a spatiotemporal matching strategy to select the k-best-matched CT images for the second NAC PET image; these are anatomically consistent and can provide prior information for PET AC. We agree with the reviewer’s suggestion that incorporating both Top-k CT and corresponding PET images may enhance model performance. This will be explored in our future work.
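As an illustration of the matching step described above, the sketch below ranks a bank of CT embeddings by similarity to one NAC PET embedding and keeps the k best. The function names, the toy embeddings, and the use of cosine similarity are assumptions made for illustration only; the rebuttal does not state which similarity measure is used.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_matches(pet_embedding, ct_embeddings, k):
    """Return indices of the k CT embeddings most similar to the PET embedding."""
    scores = [cosine_similarity(pet_embedding, ct) for ct in ct_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy 3-D embeddings standing in for encoder outputs (made-up values).
pet_query = [1.0, 0.0, 0.0]
ct_bank = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # close to the query
    [1.0, 0.0, 0.0],   # same direction as the query
    [-1.0, 0.0, 0.0],  # opposite direction
]
best = top_k_matches(pet_query, ct_bank, k=2)
print(best)  # indices of the two best-matched CT slices
```

In this toy bank the slice pointing in the same direction as the query ranks first and the nearby slice second; a real pipeline would replace the hand-written vectors with encoder outputs.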

Matching Network Details: ResNet-19, a lightweight and robust architecture, effectively encodes PET-CT images by capturing local and global features through its residual blocks. We therefore selected it as the backbone for G1 and G2 in contrastive learning. Moreover, opening, closing, and thresholding operations were employed to extract contour details from PET-CT images. Dice loss was applied to enhance geometric constraints and achieve coarse matching between modalities. The temperature coefficient in the InfoNCE loss was set to 1. Further details will be provided in the revised version.
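For readers unfamiliar with the two losses named above, a minimal sketch of their usual textbook forms follows. It is not the authors' implementation: the function names and toy inputs are hypothetical, and how the two terms are weighted or combined in STMDiff is not stated in the rebuttal. Only the temperature value of 1 comes from the text.

```python
import math

def info_nce(similarities, positive_index, temperature=1.0):
    """InfoNCE for one query: negative log-softmax of the positive pair's score."""
    logits = [s / temperature for s in similarities]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[positive_index]

def dice_loss(mask_a, mask_b, eps=1e-6):
    """1 - Dice overlap between two flat binary masks (e.g. body contours)."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)

# Toy example: one PET query scored against three CT candidates,
# with candidate 0 taken as the true (positive) match.
example_loss = info_nce([0.9, 0.1, -0.3], positive_index=0, temperature=1.0)

# Toy contour masks that overlap in one of two foreground pixels each.
example_dice = dice_loss([1, 1, 0, 0], [1, 0, 1, 0])
```

With a temperature of 1 the similarity scores enter the softmax unscaled; smaller temperatures would sharpen the distribution over candidates.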

Conditional DDPM: Diffusion models excel in generative tasks by recovering target data from noise. Their training requires numerous real data samples (AC PET images) as targets. To retain the model’s ability to map between NAC and AC PET images, we introduced NAC PET images as conditional priors. We also compared STMDiff with other end-to-end methods. The experimental results (Fig. 3, Fig. 4, Table 1) show that our approach outperforms these methods in both qualitative and quantitative analysis.

Condition integration: To effectively integrate conditions, we utilized two pretrained encoders: Gve, based on the VAE model from Stable Diffusion, and G1 from the matching stage. G1 maps CT images to the latent space to obtain representations zy, while Gve processes NAC PET images to generate latent representations znac. We concatenated zy, znac and the noise input, and fed the combined features into the UNet network, thereby achieving condition integration. The comparative method DDPM also used this strategy to introduce conditional priors (NAC PET).
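The concatenation described above can be sketched with plain nested lists. The latent shapes (4 channels, 8 x 8), the concatenation order, and the helper names are assumptions chosen for illustration; the rebuttal does not give the actual tensor sizes, and a real implementation would use a tensor library.

```python
import random

def make_latent(channels, height, width, fill):
    """Build a channels x height x width nested list with values drawn from fill()."""
    return [[[fill() for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]

def concat_channels(*latents):
    """Concatenate latents along the channel axis; spatial sizes must agree."""
    height, width = len(latents[0][0]), len(latents[0][0][0])
    for z in latents:
        if len(z[0]) != height or len(z[0][0]) != width:
            raise ValueError("latents must share the same spatial size")
    combined = []
    for z in latents:
        combined.extend(z)  # append this latent's channels after the previous ones
    return combined

rng = random.Random(0)
z_y = make_latent(4, 8, 8, lambda: 0.0)    # stand-in for G1(CT)
z_nac = make_latent(4, 8, 8, lambda: 1.0)  # stand-in for Gve(NAC PET)
noise = make_latent(4, 8, 8, lambda: rng.gauss(0.0, 1.0))  # noisy diffusion input

unet_input = concat_channels(z_y, z_nac, noise)
print(len(unet_input), len(unet_input[0]), len(unet_input[0][0]))
```

Under these assumed shapes the UNet would receive a 12-channel input, so its first convolution would need to accept the summed channel count of all three sources.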

Experimental Data: Our dataset comprised 104 male patients with prostate cancer, aged between 48 and 88 years, whose pathological status is unknown. 84 subjects (56,532 slices) were selected for training, 10 subjects (6,730 slices) for validation, and 10 subjects (6,730 slices) for testing. Despite the relatively limited number of patients, the dataset encompasses about 70,000 slices due to the total-body scanning. We think that this extensive slice count provides a basis for preliminary assessment of our model’s generalizability and reliability. More patient data will be included for further model validation.
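The split figures quoted above can be checked with a few lines of arithmetic. The dictionary layout and variable names are illustrative; the numbers are taken directly from the paragraph.

```python
# (subjects, slices) per split, as reported in the rebuttal.
splits = {
    "train": (84, 56532),
    "validation": (10, 6730),
    "test": (10, 6730),
}

total_subjects = sum(subjects for subjects, _ in splits.values())
total_slices = sum(slices for _, slices in splits.values())

# Average number of slices per subject in each split.
slices_per_subject = {name: slices / subjects
                      for name, (subjects, slices) in splits.items()}

print(total_subjects, total_slices, slices_per_subject)
```

The totals come to 104 subjects and 69,992 slices, consistent with the "about 70,000" figure, and every split works out to 673 slices per subject.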

Ablation Study: We have performed several ablation studies to assess our method. First, we compared model performance with and without spatiotemporal CT priors. Results showed that using matched CT priors improved synthesized AC PET image quality, increasing PSNR by 3.95%, SSIM by 1.00%, and decreasing RMSE by 5.78% compared to the baseline (Fig.5). Second, we explored the optimal number of CT priors (Top-k) in the diffusion model. Retraining with 1 to 5 CT priors revealed that k=2 achieved the best metrics (PSNR: 38.19 dB, SSIM: 0.97, RMSE: 2.25). We will update experimental results in the revised version.
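For reference, the two error-based metrics quoted above can be sketched under their usual definitions. The function names, the toy pixel values, and the data range passed to psnr are assumptions; the rebuttal does not state the intensity normalization used, so the reported numbers cannot be reproduced from this sketch. SSIM is omitted because it needs a windowed implementation.

```python
import math

def rmse(pred, target):
    """Root-mean-square error between two flat lists of pixel values."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.sqrt(mse)

def psnr(pred, target, data_range):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    err = rmse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 20.0 * math.log10(data_range / err)

def percent_change(new, old):
    """Relative change of a metric against a baseline, in percent."""
    return 100.0 * (new - old) / old

# Toy example: every pixel is off by 2 on an assumed 0-200 intensity scale.
pred = [0.0, 0.0, 0.0, 0.0]
target = [2.0, 2.0, 2.0, 2.0]
toy_rmse = rmse(pred, target)
toy_psnr = psnr(pred, target, data_range=200.0)
```

Because PSNR depends on the chosen data range, comparisons between methods are only meaningful when all of them are evaluated on the same intensity scale.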

Writing: We will proofread the full text.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    A potentially good contribution, but the paper needs to be revised to clear up the vagueness in the description of the method before it can be accepted. A rebuttal is necessary.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


