Abstract

Radiation hazards associated with standard-dose positron emission tomography (SPET) images remain a concern, whereas the quality of low-dose PET (LPET) images fails to meet clinical requirements. Therefore, there is great interest in reconstructing SPET images from LPET images. However, prior studies focus solely on image data, neglecting vital complementary information from other modalities, e.g., patients’ clinical tabular, resulting in compromised reconstruction with limited diagnostic utility. Moreover, they often overlook the semantic consistency between real SPET and reconstructed images, leading to distorted semantic contexts. To tackle these problems, we propose a novel Multi-modal Conditioned Adversarial Diffusion model (MCAD) to reconstruct SPET images from multi-modal inputs, including LPET images and clinical tabular. Specifically, our MCAD incorporates a Multi-modal conditional Encoder (Mc-Encoder) to extract multi-modal features, followed by a conditional diffusion process to blend noise with multi-modal features and gradually map blended features to the target SPET images. To balance multi-modal inputs, the Mc-Encoder embeds Optimal Multi-modal Transport co-Attention (OMTA) to narrow the heterogeneity gap between image and tabular while capturing their interactions, providing sufficient guidance for reconstruction. In addition, to mitigate semantic distortions, we introduce the Multi-Modal Masked Text Reconstruction (M3TRec), which leverages semantic knowledge extracted from denoised PET images to restore the masked clinical tabular, thereby compelling the network to maintain accurate semantics during reconstruction. To expedite the diffusion process, we further introduce an adversarial diffusive network with a reduced number of diffusion steps. Experiments show that our method achieves the state-of-the-art performance both qualitatively and quantitatively.
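
For readers unfamiliar with diffusion models, the noising half of a diffusion process can be illustrated with the standard DDPM forward step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below is a generic, pure-Python illustration under assumed settings (a linear beta schedule with 1000 steps and a three-value toy "image"); it is not the authors' implementation, and it does not show the multi-modal conditioning, the OMTA module, or the adversarial step reduction described in the abstract. All function names are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed, common DDPM default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    a_bar = alpha_bars[t]
    signal = math.sqrt(a_bar)
    noise = math.sqrt(1.0 - a_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
spet_pixels = [0.2, 0.5, 0.9]  # toy stand-in for a flattened SPET patch
x_early = forward_noise(spet_pixels, 0, alpha_bars, rng)    # almost unchanged
x_late = forward_noise(spet_pixels, 999, alpha_bars, rng)   # almost pure noise
```

With this schedule the retained signal fraction shrinks monotonically with t, which is why sampling normally needs many reverse steps and why reducing the step count, as the abstract proposes, is attractive.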

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1340_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1340_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Cui_MCAD_MICCAI2024,
        author = { Cui, Jiaqi and Zeng, Xinyi and Zeng, Pinxian and Liu, Bo and Wu, Xi and Zhou, Jiliu and Wang, Yan},
        title = { { MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This manuscript proposes a novel Multi-modal Conditioned Adversarial Diffusion model (MCAD) to reconstruct SPET images from LPET images and clinical tabular.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The manuscript has clear motivations. All limitations mentioned in the introduction are well addressed. The experimental results demonstrate the effectiveness of the proposed methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    #1 The caption and explanation of Fig. 1 are insufficient. For example, which part is pre-trained and which part is learnable? Which sub-figure shows the training stage and which the inference stage?

    #2 The use of the adversarial diffusive network is motivated by reducing the number of diffusion time steps. However, there are no ablation studies to support this claim.

    #3 No statistical test is provided.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    #1 Improve Fig. 1, especially the caption. #2 Ablation studies should consider the use of adversarial learning. #3 Statistical tests should be conducted.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty is quite good, but ablation studies are not enough.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces the utilization of textual information from the patient’s clinical tabular data as a condition to guide the diffusion model for PET reconstruction. Additionally, an M^3TRec strategy is introduced to maintain semantic consistency.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This work innovatively utilizes clinical tabular data as supplementary guidance for PET reconstruction.
    2. An M^3TRec strategy is introduced to ensure semantic consistency in the reconstruction process.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While claiming to be the pioneering endeavor in utilizing clinical tabular data to aid PET reconstruction, this paper overlooks a prior study [1] that similarly integrated tabular data for this purpose. The author should delineate the disparities between their approach and that presented in reference [1].
    2. The implementation details of the baseline models remain undisclosed. Given that certain baselines are tailored for 3D PET reconstruction, it remains unclear whether the author employed the original 3D models or re-engineered them into 2D ones.
    3. Some important papers for comparison and citation are missing, as follows: [1] Jang S I, Lois C, Thibault E, et al. Taupetgen: Text-conditional tau pet image synthesis based on latent diffusion models[J]. arXiv preprint arXiv:2306.11984, 2023. [2] Yang Z, Zhou Y, Zhang H, et al. Drmc: A generalist model with dynamic routing for multi-center pet image synthesis[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023: 36-46. [3] Jang S I, Pan T, Li Y, et al. Spach transformer: spatial and channel-wise transformer based on local and global self-attentions for pet image denoising[J]. IEEE transactions on medical imaging, 2023. [4] Zhou Y, Yang Z, Zhang H, et al. 3D segmentation guided style-based generative adversarial networks for pet synthesis[J]. IEEE Transactions on Medical Imaging, 2022, 41(8): 2092-2104.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The algorithm of this article is easily reproducible, but it is preferable for the author to release the code to ensure the accuracy of the code details.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In terms of writing and experimental completeness, this is a qualified MICCAI paper, but there are still some issues that require the author’s response and clarification.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper is attempting to improve reconstruction for low dose PET by integration of external information into the reconstruction process. Clinical tabular data is combined with the LPET data in a Multimodal Conditioned Adversarial Diffusion model. High level semantics are preserved by using multimodal masked text reconstruction. The model is tested on a public dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces a novel Multi-modal Conditioned Adversarial Diffusion model (MCAD) which is designed to reconstruct images to look like standard dose PET from low-dose PET images combined with other clinical data. The Multi-modal conditional Encoder extracts features from both imaging and non-imaging data, which helps in enhancing the quality of the reconstructed images. The inclusion of clinical tabular data is particularly innovative, providing a new way to incorporate additional relevant information into the reconstruction process. The introduction of Multi-Modal Masked Text Reconstruction is an interesting addition. It aims to preserve high-level semantic consistency during the reconstruction process, which helps maintain the diagnostic utility of the images. This method uses semantic knowledge from denoised images to restore masked clinical tabular attributes, ensuring that the reconstructed images maintain accurate semantic contexts. The paper leverages adversarial learning combined with a diffusion process to expedite and enhance the reconstruction. This hybrid approach allows for a reduction in the number of diffusion steps while improving the reconstruction quality, addressing the common issue of over-smoothing found in other methods like GANs.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The use of adversarial and diffusion processes, combined with multi-modal data integration, likely results in a complex model that could be computationally intensive. This might limit its applicability in real-time or resource-constrained settings. While the paper demonstrates superior performance on a specific dataset (UDPET), it does not provide extensive validation on diverse datasets or under varying clinical conditions. This raises questions about the model’s robustness and generalization across different populations, scanner types, or varying levels of image quality. The performance of the MCAD model heavily relies on the availability of high-quality, multi-modal input data. In real-world scenarios, the quality and completeness of clinical tabular data might vary, which could affect the model’s performance. The paper does not address how it handles missing or incomplete data, which is a common issue in clinical settings. The use of deep learning models, especially those involving adversarial and diffusion components, often leads to challenges in interpretability. Understanding how changes in input data affect the reconstructed images is crucial for clinical acceptance. The paper does not address how clinicians can interpret the contributions of various model components to the reconstruction results.

    The problem with attempts to create standard dose images from low dose images is that the standard dose images contain physics information that is simply not present in the low dose image, so even if a model can make the output look like a standard PET, that is not the same thing as demonstrating that the output actually has the same clinical power as a standard dose PET. What sets this approach apart is the incorporation of clinical tabular data, so there is new information being added to the model that is creating the image. While it is believable that this approach can improve on the LPET image, it remains to be seen if it will neglect important clinical physics output that would be obtained from a real SPET image.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The model uses a publicly available dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is excellent and well organized, and the deep learning techniques are nicely demonstrated. I do have concerns about lost physics content when you go from SPET to LPET, and while the inclusion of patients’ tabular medical data can improve the LPET image to look more like an SPET, I would still like to see proof that relevant clinical information is not being lost in the process.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The inclusion of clinical tabular data is an interesting and novel advancement in the area of PET reconstruction. While I have some hesitation over the perceived clinical quality of the output, I am excited to learn more about the deep learning techniques used in the reconstruction.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all the reviewers (R3, R4, R5) for their constructive comments, which have been carefully addressed as follows:

Q1: Disparities between our approach and reference [1]; Missing citations. (R3) A1: The disparities between our MCAD and TauPETGen proposed in [1] are two-fold. First, and most importantly, the aims of the two works are different. Specifically, TauPETGen aims to increase the availability of tau PET datasets via image synthesis. Accordingly, it utilizes textual descriptions and MR images as input to predict tau PET images. In contrast, our method focuses on reducing radiation exposure while ensuring the image quality of PET imaging. Therefore, we employ tabular and LPET images as input to generate SPET images. Second, the texts used in the two works are different. Concretely, TauPETGen uses short descriptions such as “generate a tau image” to describe the quality of the generation target, whereas our MCAD uses tabular to offer additional patient information that remains consistent for both LPET and SPET images. We will strengthen our novelty and cite these papers in the final paper.

Q2: Implementation details of the baseline model. (R3) A2: We would like to clarify that the SR3 blocks used in our model are inherently 2D. We only modified their channel dimensions in this paper.

Q3: Caption and explanation of Fig. 1. (R4) A3: We will provide a detailed explanation of each sub-figure in the final paper.

Q4: Robustness and generalization. (R5) A4: As a pioneering work, we investigate the feasibility and advantages of introducing clinical tabular data in PET reconstruction in a relatively ideal setting. We will address the robustness and generalization to complex real clinical scenarios (e.g., different populations, scanner types, varying levels of image quality, and incomplete data) as our future research direction in the final paper.

Q5: Other concerns raised by reviewers. (R4&R5) A5: Thank you for pointing these out. Concerns including the validation of adversarial training, clinical quality of output images, and the statistical significance of the results will be addressed in the final paper.




Meta-Review

Meta-review not available, early accepted paper.


