Abstract

Low-dose computed tomography (LDCT) and low-dose positron emission tomography (LDPET) imaging substantially reduce radiation exposure compared to their normal-dose counterparts, mitigating health risks such as elevated cancer incidence. However, the resulting LDCT and total-body LDPET images are often compromised by noise and artifacts stemming from photon starvation and electronic interference. While supervised reconstruction methods have tackled challenges like over-smoothing and training instability, their generalization is hindered by variations in imaging devices, dosage levels, and modality-specific characteristics. Recent advances in text-guided models have augmented traditional deep learning techniques, offering greater adaptability. Building on this, we propose a Text-guided Unified Framework (TUF) for high-precision reconstruction of LDCT and total-body LDPET images. Leveraging insights from cold diffusion paradigms, TUF introduces a novel mean-preserving degradation operator to model the physical process of image degradation. Additionally, we design a dual-domain fusion network that converts textual inputs into scaling and shifting factors, enabling seamless integration of text cues at each timestep. Extensive experiments across four publicly available datasets reveal that TUF surpasses state-of-the-art methods in both reconstruction quality and generalization across LDCT and total-body LDPET imaging scenarios. The code will be available at https://github.com/AI-NMI/Text-Guided-Unified-Framework-for-Low-Dose-CT-and-Total-Body-PET-Reconstruction.git.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1487_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/AI-NMI/Text-Guided-Unified-Framework-for-Low-Dose-CT-and-Total-Body-PET-Reconstruction.git

Link to the Dataset(s)

Mayo2016 dataset: https://www.aapm.org/grandchallenge/lowdosect/

Mayo2020 dataset: https://www.cancerimagingarchive.net/collection/ldct-and-projection-data/

Ui and Bern dataset: https://ultra-low-dose-pet.grand-challenge.org

BibTex

@InProceedings{WanWei_Towards_MICCAI2025,
        author = { Wang, Weitao and Huang, Yanyan and Dong, Shunjie and Xue, Le and Shi, Kuangyu and Fu, Yu},
        title = { { Towards Multi-Scenario Generalization: Text-Guided Unified Framework for Low-Dose CT and Total-Body PET Reconstruction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {621--630}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    1. Novel Reconstruction Framework for CT and PET: The paper introduces a new approach for reconstructing high-quality CT and PET scans from their corresponding low-dose counterparts, addressing a critical need for safer and more effective imaging techniques.

    2. Application of Cold Diffusion in Medical Imaging: It pioneers the use of the cold diffusion process for medical image reconstruction, demonstrating its potential in enhancing image quality while maintaining structural and semantic fidelity.

    3. Comprehensive Evaluation on Public Datasets: The proposed method is rigorously evaluated on publicly available datasets, with results that substantiate its effectiveness and robustness compared to existing approaches.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Innovative Use of Cold Diffusion: The paper presents a novel application of the cold diffusion paradigm for image reconstruction, introducing a fresh perspective in the field.

    2. Dual-Domain Reconstruction Strategy: The proposed Dual-Domain Reconstruction Network (DDRN) effectively leverages both the frequency and spatial domains, enabling more comprehensive feature extraction and improved reconstruction quality.

    3. Integration of Semantic Information: By incorporating textual embeddings via CLIP into the reconstruction process, the model enhances its semantic understanding, leading to more context-aware and accurate reconstructions.

    4. Strong Empirical Performance: Extensive experiments across multiple datasets demonstrate the superiority of the proposed method, consistently outperforming state-of-the-art (SOTA) approaches in key evaluation metrics.

    5. Insightful Ablation Study: The ablation analysis offers clear evidence of the contribution of each component within the model, highlighting their collective importance in achieving state-of-the-art reconstruction performance.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Inconsistency Between Mathematical Formulations and Model Diagram: The mathematical descriptions presented in the paper do not align clearly with the architecture illustrated in Figure 1, leading to a disconnect between the theoretical and visual representations of the model.

    2. Impact on Clarity and Comprehension: This misalignment introduces ambiguity in understanding the overall methodology, making it difficult for readers to follow the flow of the proposed approach with confidence.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The paper is empirically sound and provides all the necessary experimental details. However, certain aspects of the methodology and presentation could benefit from improved clarity. Please refer to the recommendations.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Undefined Acronyms in Figure 1
      The acronyms NDCT and NDPET are used in Figure 1 without being defined anywhere in the main text. While it’s assumed that “ND” refers to “Normal Dose,” it would be beneficial for clarity if these terms were explicitly introduced and defined in the manuscript.

    2. Typographical Error in Figure 1: The term “Condiction” appears in Figure 1. It is unclear whether this is an intentional term or a typographical error, possibly meant to be “Condition”. The authors should verify and correct this as needed.

    3. Lack of Clarity Regarding Decay of $\alpha_t$ in Equation (1)
      In Equation (1), the term $\alpha_t$ is introduced and described as decreasing over time. However, the paper does not specify the nature of this decay: whether it is linear, exponential, or follows some other schedule. Providing this detail is important to fully understand the behavior of the cold diffusion process over time steps.

    4. Ambiguity in MLP Representation in Figure 1: In Figure 1, the yellow boxes are labeled as MLPs. Visually, there appear to be two distinct MLP blocks. However, the same non-linear projection terms, $\gamma_m$ and $\beta_m$, are shown for both. This raises confusion: are these two separate MLP blocks with shared parameters, or are they intended to be different MLPs with distinct projections? Clarifying this distinction is important for understanding the model’s design.

    5. Inconsistency Between Mathematical Formulation and Figure 1(a): Equation (7) introduces $x_{frequency}$, derived from Equation (4), which describes applying convolutions to the concatenated real and imaginary parts obtained via the DFT of the low-dose image $x_t$. However, in Figure 1(a), the low-dose image $x_t$ is shown as being passed through the TIGA module prior to the convolution block. This leads to confusion: are the real and imaginary components computed from $x_t$ or from $TIGA(x_t)$? Additionally, if TIGA is applied first, how is its output integrated with the DDRN module? The paper should clarify this inconsistency.

    6. Lack of Clarity in Equation (9) Regarding Feature Fusion: Equation (9) introduces $x_{input}$, described as fused input features. However, the manuscript does not clearly explain how this fusion is performed or which features are being fused. A more detailed explanation of the fusion strategy and the components involved would help readers better understand this step in the pipeline.

    7. Minor Suggestion – Ground Truth in Figure 2
      While not critical, it would improve the visual evaluation in Figure 2 if the ground truth high-dose CT and PET scans were also displayed. This would allow for a more direct and fair visual comparison between the reconstructed outputs and the actual targets.
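    Point 5 concerns the frequency branch built from Eq. (4). The generic pattern is a 2D DFT, then the real and imaginary parts stacked along the channel axis, then a learnable channel-mixing step, then an inverse DFT. It can be sketched in NumPy as below, for illustration only. The 1×1 convolution is modelled as a matrix multiply with hypothetical weights `w`, and the function name `frequency_branch` is made up here. The paper's actual kernel sizes and layer count are not given in this section. According to the authors' rebuttal, the input to this block is $TIGA(x_t)$, not $x_t$.

```python
import numpy as np

def frequency_branch(feat: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Illustrative DFT-based branch (hypothetical, not the TUF code).

    feat: real feature map of shape (C, H, W), e.g. the output of TIGA(x_t).
    w:    hypothetical 1x1-convolution weights of shape (2C, 2C) that mix
          the stacked real/imaginary channels at every frequency.
    """
    c, h, wid = feat.shape
    # 2D real FFT over the spatial axes -> complex array of shape (C, H, W//2 + 1).
    spec = np.fft.rfft2(feat, axes=(-2, -1))
    # Concatenate real and imaginary parts along the channel axis -> (2C, H, W//2 + 1).
    stacked = np.concatenate([spec.real, spec.imag], axis=0)
    # A 1x1 convolution is a per-frequency linear map over channels.
    mixed = np.einsum("oc,chw->ohw", w, stacked)
    # Split back into real/imaginary halves and rebuild the complex spectrum.
    spec_out = mixed[:c] + 1j * mixed[c:]
    # Inverse FFT back to the spatial domain, restoring the original width.
    return np.fft.irfft2(spec_out, s=(h, wid), axes=(-2, -1))

rng = np.random.default_rng(0)
feat = rng.standard_normal((3, 8, 10))
same = frequency_branch(feat, np.eye(6))                       # identity mixing
mixed = frequency_branch(feat, rng.standard_normal((6, 6)))   # random mixing
```

    With `w` set to the identity, the branch returns its input unchanged. This is a quick sanity check that the real/imaginary split and the inverse transform line up.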

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces a novel framework for low-dose CT and/or low-dose PET reconstruction. It relies on several components, the base of which is a cold diffusion process. In this work, the diffusion process iteratively adds mean-preserving noise to the image to produce a noisy image (rather than white noise, as in classic diffusion models). It then learns to denoise it through a learnable parametrized process.
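    The summary above describes a forward degradation and a learned restoration step. The NumPy sketch below shows how sampling could proceed under the following assumptions:
    - the interpolation form $x_t = \alpha_t x_0 + (1-\alpha_t) x_T$, with $x_T$ the low-dose image;
    - a linear $\alpha_t$ schedule from 1 to 0;
    - the improved update rule from the original cold diffusion paper (Bansal et al.), $x_{t-1} = x_t - D(\hat{x}_0, t) + D(\hat{x}_0, t-1)$.
    `restore` is a hypothetical stand-in for a trained network. This section does not state whether TUF uses exactly this update.

```python
import numpy as np
from typing import Callable

def linear_alphas(T: int) -> np.ndarray:
    # alpha_t decays linearly from 1 at t = 0 to 0 at t = T.
    return 1.0 - np.arange(T + 1) / T

def degrade(x0: np.ndarray, xT: np.ndarray, alphas: np.ndarray, t: int) -> np.ndarray:
    # Interpolating degradation D(x0, t) = alpha_t * x0 + (1 - alpha_t) * xT.
    return alphas[t] * x0 + (1.0 - alphas[t]) * xT

def cold_sample(xT: np.ndarray,
                restore: Callable[[np.ndarray, int], np.ndarray],
                T: int) -> np.ndarray:
    """Improved cold-diffusion sampler from t = T (low-dose input) down to t = 0."""
    alphas = linear_alphas(T)
    x = xT.copy()
    for t in range(T, 0, -1):
        x0_hat = restore(x, t)  # network estimate of the clean image
        # Swap degradation level t for t - 1 using the current estimate.
        x = x - degrade(x0_hat, xT, alphas, t) + degrade(x0_hat, xT, alphas, t - 1)
    return x

# Usage with an oracle restorer (a trained model would only approximate x0).
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=(8, 8))
xT = x0 + rng.normal(0.0, 0.1, size=x0.shape)
out = cold_sample(xT, lambda x, t: x0, T=10)
```

    Suppose `restore` is an oracle that always returns the true $x_0$. Then each update lands exactly on $D(x_0, t-1)$, and the loop recovers $x_0$.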

    During sampling, a refinement module relying on information at the spatial and frequency levels is used. The spatial-level module leverages the Wavelet-Kolmogorov-Arnold Networks UNet (WKAN-UNet), a novel network design relying on Kolmogorov-Arnold Networks (KAN), which have recently been proposed as an alternative to the classic multilayer perceptron. Textual information (dose- or experiment-related information) is embedded (frozen CLIP embedding), and the TIGA modules leverage information from this embedding to enhance the image estimate.

    This new framework is validated on several open-access low-dose CT and low-dose PET datasets, against several other reconstruction methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Contributions

    The TUF framework relies on cross-modal guidance from both image and text. Furthermore, image information is exploited in both the frequency and spatial domains within the dual-domain reconstruction network (DDRN). The authors have proposed a novel architecture relying on Wavelet-Kolmogorov-Arnold Networks (WKAN), named WKAN-UNet, for spatial-domain image refinement. It integrates WKAN into the UNet’s skip connections. KANs have recently been introduced as alternatives to traditional multilayer perceptrons. They rely on learnable activation functions on edges, summed in nodes, rather than fixed activation functions in nodes, making them more explainable. WKANs combine wavelets and KANs.
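    The NumPy sketch below makes the edge-activation idea concrete. It is a forward pass only for a single wavelet-KAN layer. It assumes a common formulation: each edge $(j, i)$ applies a scaled and shifted mother wavelet, $\phi_{j,i}(x) = w_{j,i}\,\psi((x - b_{j,i})/s_{j,i})$, where $\psi$ is a Mexican-hat wavelet, and each output node sums its incoming edges. The class name and initialisation are illustrative assumptions. The WKAN variant used in WKAN-UNet may differ, and so may the way it sits in the skip connections.

```python
import numpy as np

def mexican_hat(u: np.ndarray) -> np.ndarray:
    # Unnormalised Mexican-hat (Ricker) wavelet: (1 - u^2) * exp(-u^2 / 2).
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

class WaveletKANLayer:
    """Forward pass of one wavelet-KAN layer (illustrative, untrained)."""

    def __init__(self, in_dim: int, out_dim: int, rng: np.random.Generator):
        # One learnable (scale, shift, weight) triple per edge (j, i).
        self.scale = np.ones((out_dim, in_dim))
        self.shift = np.zeros((out_dim, in_dim))
        self.weight = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, in_dim) -> u: (batch, out_dim, in_dim), one value per edge.
        u = (x[:, None, :] - self.shift[None]) / self.scale[None]
        # Edge activation phi_{j,i}(x_i) = w_{j,i} * psi(u); each node j sums over i.
        return np.sum(self.weight[None] * mexican_hat(u), axis=-1)

layer = WaveletKANLayer(in_dim=3, out_dim=2, rng=np.random.default_rng(0))
y = layer(np.random.default_rng(1).standard_normal((5, 3)))  # shape (5, 2)
```

    Unlike an MLP layer, there is no separate fixed nonlinearity. The learnable parameters live inside the per-edge activation functions themselves.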

    Experiments

    Extensive validation was carried out against several other state-of-the-art methods. The authors also tested two schemes: all-in-one experiments and single-task experiments. The results highlight that the proposed framework surpasses current state-of-the-art methods in both schemes. Finally, the results should be fairly easy to reproduce, as the code will be available and open-access datasets are used.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Title:

    The paper mentions image reconstruction, but it looks more like denoising to me. The authors should consider changing the title of the paper: “reconstruction” → “denoising”.

    Section 2

    Additional explanation of how the framework works as a whole would be a plus (essentially, a description of Figure 1). For instance, an explanation of why a TIGA module appears both before and after the dual-stream update in the DDRN would be nice. Also, could you give more details about the kind of textual information passed to the TIGA module? Is it only dose level and/or device name/manufacturer? Further details and examples would be an interesting addition.

    In figure 1 (b), what is SinPE(t)? It does not seem to be explained in the paper.
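    SinPE(t) most likely denotes a sinusoidal positional encoding of the diffusion timestep $t$, as used in Transformers and DDPM-style networks. This is an assumption, since the paper does not define it. A standard version is sketched below in NumPy. The function name and `max_period=10000` are conventional choices and are not taken from the paper.

```python
import numpy as np

def sinusoidal_embedding(t, dim: int, max_period: float = 10000.0) -> np.ndarray:
    """Sinusoidal encoding of timestep(s) t into `dim` features (sin half, then cos half)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to roughly 1 / max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    # Outer product of timesteps and frequencies: shape (..., half).
    args = np.asarray(t, dtype=float)[..., None] * freqs
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)

emb = sinusoidal_embedding(np.array([0, 5, 50]), dim=16)  # shape (3, 16)
```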

    A brief explanation of sampling in Section 2.1 (cold diffusion sampling process) would be useful, since the whole framework relies on it. In particular, it is unclear why the diffusion preserves the mean, as $E[x_T]$ is not given.

    Section 3

    Please add the metric names to Tables 1, 2, and 4 so that the tables are standalone. In Figure 2, would it be possible to add the reference (‘ground truth’) images for comparison purposes?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Possible error: in figure 1 (b), could there be an error with the arrows going from SinPE(t) to the MLP module in the encoding path? There seems to be either an extra or a missing arrow.

    A reference to a paper describing WKANs and their background would be useful.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors proposed a novel framework that demonstrates state-of-the-art performance in low-dose CT/PET reconstruction/refinement. Within the framework, they proposed a new architecture named WKAN-UNet, which uses WKANs in its skip connections. WKAN-UNets could be used in other image enhancement/reconstruction frameworks.

    The proposed framework is thoroughly validated, and the ablation study highlights the importance of each component.

    The whole paper is well structured and nice to read. Components of the framework are well explained, though further details on the cold diffusion sampling process and the framework’s architecture would be a great addition for a clearer understanding.

    Finally, this work aims to be reproducible, with source code to be released soon and publicly available datasets.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a Text-guided Unified Framework (TUF) for image reconstruction. It consists of three major blocks: CDSP, DDRN, and TIGA.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes text-guided image restoration for CT and PET images, which is a novel application. The paper is well structured and easy to follow, and it presents extensive evaluation and ablation.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Questions:
    1) Could a CLIP + BERT-like model (some variant with medical expertise) be used for textual feature extraction? Relying on CLIP might not be a good option [1].
    2) Can you give information about what type of textual information is provided? It would have been interesting to correlate the textual input with the restored image.
    3) Is there a specific reason for using W-KAN? Did you try any other method, such as [2]?
    4) Were other baselines such as Restormer and PromptIR provided with the same text information? [3] could also be considered.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper shows a novel application of the method to CT and PET images, backed by promising results.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the Reviewers for their detailed feedback. We will address the main concerns by clarifying misunderstandings and adding more details to the manuscript.

Primarily, several points concern clarity in Figure 1 and related descriptions (R1.1, R1.2, R1.4, R1.5; R2). We will define the acronyms NDCT/NDPET. The term “Condiction” in Figure 1 is a typographical error and will be corrected to “Condition”. The two MLPs depicted are indeed distinct with separate parameters, not shared. The input to the convolution block following DFT (Eq. 7, Fig. 1a) is derived from $TIGA(x_t)$, not $x_t$ directly.
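The clarification that the two MLPs have separate parameters can be illustrated with a small FiLM-style sketch in NumPy. One MLP maps a text embedding to per-channel scales $\gamma$. A second, independently initialised MLP maps the same embedding to per-channel shifts $\beta$. The features then become $\gamma \odot x + \beta$. The layer sizes, the ReLU, the names `MLP` and `text_modulate`, and the exact modulation form are all assumptions for illustration, not the TUF implementation.

```python
import numpy as np

class MLP:
    """Two-layer perceptron with its own parameters (hypothetical sizes)."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int, rng: np.random.Generator):
        self.w1 = rng.standard_normal((in_dim, hidden)) / np.sqrt(in_dim)
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, out_dim)) / np.sqrt(hidden)
        self.b2 = np.zeros(out_dim)

    def __call__(self, z: np.ndarray) -> np.ndarray:
        h = np.maximum(z @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.w2 + self.b2

def text_modulate(feat: np.ndarray, text_emb: np.ndarray,
                  mlp_gamma: MLP, mlp_beta: MLP) -> np.ndarray:
    """Scale-and-shift features (C, H, W) with factors derived from a text embedding (D,)."""
    gamma = mlp_gamma(text_emb)  # per-channel scale, shape (C,)
    beta = mlp_beta(text_emb)    # per-channel shift, shape (C,)
    return gamma[:, None, None] * feat + beta[:, None, None]

rng = np.random.default_rng(0)
mlp_g = MLP(16, 32, 4, rng)  # distinct parameters ...
mlp_b = MLP(16, 32, 4, rng)  # ... not shared with mlp_g
feat = rng.standard_normal((4, 5, 5))
emb = rng.standard_normal(16)  # stand-in for a frozen text-encoder embedding
out = text_modulate(feat, emb, mlp_g, mlp_b)
```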

Regarding methodological details: The decay of $\alpha_t$ in Equation (1) is linear (R1.3). The term $x_{input}$ in Equation (9) refers to “input features” rather than “fused input features,” and this will be corrected for accuracy (R1.6). For the cold diffusion sampling process (R2), mean preservation follows from $E[x_t] = \alpha_t E[x_0] + (1 - \alpha_t) E[x_T]$ under the assumption $E[x_T] \approx E[x_0]$. Substituting gives $E[x_t] \approx (\alpha_t + 1 - \alpha_t) E[x_0] = E[x_0]$. This assumption has proven useful in other cold diffusion models.
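The mean-preservation argument can be checked numerically. The NumPy sketch below uses synthetic images as stand-ins for $x_0$ and $x_T$ and takes the spatial mean as a proxy for the expectation. Under a linear $\alpha_t$ schedule, it computes how far the mean of each interpolated $x_t$ drifts from the mean of $x_0$. The drift equals $(1-\alpha_t)(\bar{x}_T - \bar{x}_0)$, so it vanishes exactly when the two means match. The function name and data are hypothetical.

```python
import numpy as np

def mean_drift(x0: np.ndarray, xT: np.ndarray, T: int) -> np.ndarray:
    """Spatial-mean drift of x_t = alpha_t * x0 + (1 - alpha_t) * xT from x0, for t = 0..T."""
    alphas = 1.0 - np.arange(T + 1) / T  # linear decay: alpha_0 = 1, alpha_T = 0
    return np.array([(a * x0 + (1.0 - a) * xT).mean() - x0.mean() for a in alphas])

rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=(64, 64))           # stand-in for a normal-dose image
xT_same = x0 + rng.normal(0.0, 0.1, size=x0.shape)  # zero-mean noise: means nearly equal
xT_shift = x0 + 0.2                                 # biased degradation: means differ by 0.2

print(np.abs(mean_drift(x0, xT_same, 10)).max())  # bounded by |mean(xT_same) - mean(x0)|
print(mean_drift(x0, xT_shift, 10)[-1])           # about 0.2 at t = T
```

When $E[x_T] \neq E[x_0]$, as with `xT_shift`, the drift grows linearly in $t$. This is why the assumption matters.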

Concerning textual information (R2, R3.1, R3.2, R3.4): The textual information provided to TIGA includes dose level, device manufacturer/name details, and the type of denoising (CT or PET), which will be clearly stated. While using advanced medical-domain text models like CLIP+BERT (R3.1) is an excellent suggestion for future work, our current study did not explore this. For baselines such as Restormer and PromptIR (R3.4), they were evaluated based on their original architectures, which do not typically incorporate textual prompts in the same manner as our TIGA module; this distinction will be clarified.
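The rebuttal lists the kind of text provided: dose level, device manufacturer/name, and task type. As an illustration, a hypothetical template could assemble these fields into one string before it is passed to the frozen text encoder. The field names, wording, separator, and example values below are invented for illustration. This section does not give the paper's actual prompt format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScanInfo:
    task: str        # "CT" or "PET" denoising
    dose_level: str  # free-text dose description (hypothetical format)
    device: str      # scanner manufacturer/name (hypothetical value below)

def build_prompt(info: ScanInfo) -> str:
    """Join the metadata fields into a single hypothetical text prompt."""
    if info.task not in ("CT", "PET"):
        raise ValueError(f"unsupported task: {info.task!r}")
    return f"{info.task} denoising, dose level: {info.dose_level}, device: {info.device}"

print(build_prompt(ScanInfo("PET", "25% dose", "scanner A")))
# prints: PET denoising, dose level: 25% dose, device: scanner A
```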

Regarding the choice of W-KAN (R3.3): W-KAN was selected due to its superior performance and lower parameter count compared to other KAN variants in our specific task, which is supported by ablation experiments presented in the manuscript.

We will consider the suggestion to change the paper’s title from “reconstruction” to “denoising” to better reflect the method’s focus (R2). We also appreciate the suggestions to add ground truth images to Figure 2 and explicit metric labels in tables for enhanced clarity and standalone understanding (R1.7, R2). We will ensure the manuscript is as clear and self-contained as possible in these aspects within the existing submission’s framework.

We believe these clarifications directly address the reviewers’ main concerns without fundamentally altering the submitted work or promising new results.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    All three reviewers appreciated the model formulation and comprehensive evaluation. With that said, they noted a few aspects of the paper that require clarification. The authors should make these updates in the camera-ready version. The additional experiments suggested by Reviewer 3 should NOT be included, as these fall beyond the scope of allowable changes post-rebuttal.


