Abstract

Artifacts and noise in low-dose CT images can degrade image quality, potentially hindering accurate diagnosis. In recent years, image-domain post-processing denoising methods have gained flexibility by eliminating the need for raw data. However, clinical scanning conditions vary widely, with most existing studies focusing on CT denoising under fixed or known conditions. Moreover, obtaining paired CT data in clinical settings is challenging, limiting the practical applicability of supervised learning methods. To address these challenges, we propose the self-supervised VQ-SCD, capable of denoising low-dose CT (LDCT) images under varying unknown scanning conditions using only normal-dose CT (NDCT) training data. For the first time, VQ-SCD uses a discretized codebook to approximate the distribution of LDCT features across various scanning conditions, enabling uniform characterization and denoising of data from multiple scanning setups. Additionally, we design a miniature diffusion model that uses up-sampled features as guidance to enhance image details. Our method outperforms both supervised and state-of-the-art self-supervised methods in terms of both quantitative metrics and visual quality, with a test time of only 0.25 seconds per image. Furthermore, training the model using only animal and phantom data still results in excellent denoising performance on human data. The code will be available at https://github.com/WHUSU/VQSCD.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0516_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{SuBo_VQSCD_MICCAI2025,
        author = { Su, Bo and Xu, Jiabo and Hu, Xiangyun and Deng, Kai and Li, Jiancheng and Lu, Zhouxian},
        title = { { VQ-SCD: Vector Quantization Meets Unknown Scan Condition Self-supervised Low-Dose CT Denoising } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {681--691}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The main contribution of the paper is a self-supervised denoising framework, VQ-SCD, that trains exclusively on normal-dose CT (NDCT) images and can generalize to low-dose CT (LDCT) images acquired under unknown and heterogeneous scan conditions, without requiring paired LDCT–NDCT training data. The method combines a vector quantization module, which discretizes feature representations to unify domain variations, with a lightweight diffusion model that restores fine image details. This architecture enables strong zero-shot generalization across different doses, slice thicknesses, and scanner types.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Training with Only NDCT Data: The method requires only normal-dose CT images for training, avoiding the need for paired LDCT–NDCT data. This makes it highly practical for clinical use.

    Novel Use of Vector Quantization: The use of VQ to unify features across varying scan conditions (dose, thickness, device) is original and enables strong generalization.

    Efficient Diffusion Model: A lightweight diffusion decoder preserves fine image details with fast inference (0.25s per image), balancing quality and speed.

    Zero-Shot Generalization: As claimed by the authors, the method generalizes from animal/phantom training data to human LDCT scans, showing robustness to domain shifts.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are not enough experiments to validate the claim of excellent performance on real data. All comparisons with other algorithms are conducted only on the piglet and phantom data. The only experiment that simulates a real-world scenario with actual scans reports results for the proposed algorithm alone, without presenting the performance of other denoising algorithms. This is a major weakness of the paper.

    The results on real data could be better validated through quantitative metrics such as CNR or SNR.

    Moreover, the paper lacks an ablation study that directly assesses the contribution of the VQ component. For instance, what would happen if the encoder’s output were passed directly to the diffusion model’s decoder, bypassing the VQ step? This is a crucial ablation that should be included.

    An additional missing ablation concerns the loss functions. The method employs several loss terms, but the paper does not include any analysis demonstrating the contribution of each one.

    In addition, the visual results do not highlight clinically relevant regions, such as lesions. Including such regions is essential, as the denoising process could potentially remove or distort important pathological features, leading to misleading or clinically unsafe outcomes.
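The CNR and SNR metrics suggested in the weaknesses above can be computed from regions of interest without a paired reference image. The following is a minimal sketch, assuming one common pair of definitions (SNR as mean over standard deviation inside a region, CNR as the absolute mean difference between a region and a background patch divided by the background standard deviation). Other definitions exist, and the function names, ROI positions, and synthetic image are illustrative assumptions, not anything taken from the paper.

```python
import numpy as np


def roi_snr(image, roi):
    """SNR of a region: mean divided by standard deviation inside the ROI."""
    pixels = image[roi]
    return float(pixels.mean() / pixels.std())


def roi_cnr(image, roi, background):
    """CNR: |mean(ROI) - mean(background)| divided by the background std."""
    fg = image[roi]
    bg = image[background]
    return float(abs(fg.mean() - bg.mean()) / bg.std())


# Synthetic stand-in for a CT slice (hypothetical values, not real data):
# uniform background with Gaussian noise plus one brighter square structure.
rng = np.random.default_rng(0)
image = rng.normal(loc=100.0, scale=5.0, size=(64, 64))
image[16:32, 16:32] += 50.0

roi = (slice(16, 32), slice(16, 32))         # inside the bright structure
background = (slice(40, 60), slice(40, 60))  # homogeneous background patch

snr = roi_snr(image, roi)
cnr = roi_cnr(image, roi, background)
print(f"SNR = {snr:.1f}, CNR = {cnr:.1f}")
```

Reporting such values for the same ROIs before and after denoising, and for each compared method, would give a reference-free quantitative comparison on real scans.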

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and proposes a novel method. However, some key claims—especially about generalization to real clinical data—are not properly supported. The most critical issue is the lack of comparison with other methods on real human data. Since the main goal of the paper is to show good generalization to real-world scenarios, it’s essential to prove that the method performs at least as well as, if not better than, existing approaches in those settings. Without this, the practical value of the work remains unclear.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    A major limitation remains the absence of comparisons with other algorithms on real data. Without such evaluations, it is difficult to assess the algorithm’s ability to generalize effectively to practical scenarios.



Review #2

  • Please describe the contribution of the paper

    The authors proposed a self-supervised denoising framework VQ-SCD, which incorporates a self-supervised vector-quantization-based CT image encoder and a miniature diffusion model for low-dose CT denoising. The method is computationally efficient and achieves state-of-the-art denoising performance, including zero-shot generalization to human data, despite being trained solely on animal and phantom NDCT scans.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Training without paired LDCT data under unknown scan conditions is practical and clinically meaningful.
    • Both quantitative and qualitative experimental results are promising, showing the superiority of the proposed method.
    • The miniature diffusion model, despite lacking some details, balances clinical real-time requirements and fidelity.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • What is the motivation of using VQ in the proposed framework, rather than a continuous latent space (which preserves much more information than the discrete one)? The authors state that VQ “unifies heterogeneous scan features across conditions”, which is unclear, and there is no quantitative analysis or qualitative evidence provided to support this.
    • Details on the proposed miniature diffusion model are missing. How “miniature”, specifically? How does the diffusion model interact with the ViT decoder? Is it initialized using the self-supervised ViT with further finetuning? Some notations in Fig. 1 are not well explained in the manuscript (e.g., the green block with a plus sign).
    • While VQ is highlighted in the proposed framework, the authors only compared “w/ and w/o diffusion model”, without assessing if VQ alone brings value.
    • How much does the data augmentation help in image encoding? The ablation study on this seems lacking.
    • Basic statistics of the datasets used in this study are missing, e.g., the number of images in the training/validation/test sets.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The following are minor weaknesses:

    • It would benefit the paper if the authors provided a clearer architecture flow of the proposed framework.
    • Fig. 3 can be confusing to the readers. There are repeated (a), (b), and (c) indexes for multiple sub-figures, yet they do not share the same meaning.
    • The sigma in Eq. (1) should be denoted by $\Sigma_\theta$, but it currently appears to be written as $\sum_\theta$, which is easily confused with the summation symbol.
    • Grammatical issues. For example, in the “Performance Comparison on Phantom Dataset” part, “Noise2Sim exhibit” -> “Noise2Sim exhibited”, “ZeroN2N demonstrates” -> “ZeroN2N demonstrated”. In the “Ablation Study” part, “…the third column shows the denoising result with the model is shown to the right of the red line in Fig. 3.”
    • I suggest unifying the terminology. For example, “miniature diffusion model” and “microdiffusion” as in the “Ablation Study” part.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses a meaningful problem and demonstrates promising empirical results. However, some important concerns remain, e.g., the lack of clarity on the design choice and architecture details, and absence of sound theoretical or ablation support, making its contributions seem incremental. Rebuttals and revisions are required.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The rebuttal clearly addresses my confusion about the motivation for using VQ in the proposed framework, and the authors promise to improve the presentation. Here are some further comments. (1) Besides the mentioned “Minor Weaknesses”, the manuscript can be further enhanced with a cleaner and clearer Fig. 1, which helps readers grasp the idea at a glance. (2) I also suggest making the text in Fig. 2 bigger and bolder. (3) While I understand that new experimental results are not allowed in the rebuttal session, I suggest further exploration of the contribution of the proposed VQ alone.

    In any case, using a pretrained VQ codebook as a prior for downstream medical tasks is a novel and interesting direction. Once a VQ-VAE-like foundation model exists in the medical field, we might be able to build a generalizable model across heterogeneous clinical scenarios. Therefore, I recommend accepting this paper.



Review #3

  • Please describe the contribution of the paper

    This manuscript proposes VQ-SCD, a novel self-supervised framework for denoising low-dose CT (LDCT) images under various unknown scanning conditions. The authors evaluated VQ-SCD on multiple datasets. Results show that VQ-SCD outperforms both supervised methods (REDCNN, CTformer, FDDiff) and self-supervised approaches (Noise2Sim, ZeroN2N) in terms of PSNR, SSIM, and visual quality assessment.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Self-supervised learning method. The proposed method is a novel self-supervised learning method that holds unique practical value in real-world applications. 2) Thorough comparison. The authors carried out a quite comprehensive comparison to evaluate the performance of the proposed method. 3) Computational efficiency compared to diffusion-based methods. The proposed method yields good computational efficiency.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) The theoretical justification of the proposed method seems to be missing, i.e., why can this specific discretized representation denoise CT images more efficiently? 2) The data generation section (Section 2.1) is not clear to me. The authors seem to purposely distinguish between Gaussian noise insertion and projection-domain noise insertion, which confused me. I understand that Gaussian noise is sometimes inserted to emulate perturbation in diffusion model training. What is the theoretical explanation for Gaussian noise insertion here? What is the essential difference compared to projection-domain noise insertion?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, it is a quite comprehensive study that demonstrates a novel method for a real-world challenge (low-dose CT denoising).

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The novelty of the manuscript is generally recognized by the reviewers. The method is preliminarily validated, which is a reasonable demonstration for future readers. More comprehensive experiments are suggested and required, but the authors can probably add them in the future.




Author Feedback

We sincerely thank the reviewers for their valuable comments and suggestions. They acknowledged the novelty of our work (R1, R2, R3), its efficient real-time performance (R1, R2, R3), the comprehensiveness of our comparative experiments (R1), the superiority of our experimental results (R3), and the clinical relevance of the problem addressed (R1, R2, R3). We address their comments in detail as follows.

1. VQ Discussion (R1, R3): The primary motivation for using VQ is not to significantly enhance denoising performance over existing methods, but rather to improve the model’s generalization across heterogeneous scanning conditions. By employing VQ, continuous features are mapped into a finite set of discrete vectors (codebook), forcing features from different scan settings into a unified semantic space, enabling the model to operate consistently under varying acquisition protocols. Regarding denoising, our advantage arises from the constraint imposed by the limited codebook size in the discrete space, which encourages the model to encode image structures in a compact and consistent manner, facilitating clearer separation between noise and meaningful structures.

2. Ablation Study Discussion (R2, R3): The $L_{\text{commit}}$ and $L_{\text{charbonnier}}$ losses are inherited from VQ-VAE and are retained in our method. We only added $L_{\text{diff}}$ from the diffusion model and $L_{\text{MSE}}$ to mitigate feature perturbation, both of which are essential to our framework and thus were not individually ablated. The use of VQ for improving generalization is a core novelty of our method; removing it would make our approach closely resemble previous diffusion-based methods such as FDDiff. Notably, our model outperforms FDDiff under multiple scan conditions, indirectly supporting the contribution of VQ to generalization. We appreciate the suggestion and will include more comprehensive ablation studies in future work.

3. Real Data Discussion (R2): We selected representative regions with varying densities from real patient data to comprehensively assess denoising performance. The low-density regions, such as the gallbladder, often exhibit thickened and heterogeneous walls, serving as a good test of the model’s ability to preserve subtle pathological details. We thank the reviewer for the suggestion and plan to include more lesion areas and additional metrics in future work. Notably, comparative methods such as FDDiff and CTformer require paired training data, which makes direct testing on unseen real data unfair, and these methods lack the capacity for cross-device generalization. In contrast, our model was trained solely on phantom and animal data and tested directly on human data, showcasing a unique strength. Hence, we provided only our method’s visualization results for fairness.

4. Data Augmentation (R1, R3): Gaussian noise is added during training to encourage the encoder to extract consistent features from perturbed and unperturbed inputs. As these features are later vector-quantized, their residual differences are further reduced, significantly improving cross-domain generalization. In contrast, omitting perturbation results in greater feature discrepancies between domains, which persist post-quantization and hinder generalization. Although other noise types (e.g., Poisson) could serve a similar role, Gaussian noise proved to be the most practical and effective in our experiments. We will include this observation in future discussions.

5. Improve Expression (R3): We will supplement the description of the micro diffusion model to ensure that each component is clearly and thoroughly explained. Subfigure indices will be adjusted and symbol meanings clarified in the structural diagrams. We will also correct grammatical and formula-related issues, unify terminology, and improve dataset descriptions accordingly.
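The codebook mapping described in the VQ discussion, and the claim that quantization shrinks the residual difference between perturbed and unperturbed features, can be illustrated with a toy nearest-neighbour lookup. This is a sketch under stated assumptions, not the authors' implementation: the `quantize` helper, the codebook size (8 codes of dimension 4), the Euclidean distance, and the noise scale are all hypothetical choices made for illustration.

```python
import numpy as np


def quantize(features, codebook):
    """Replace each feature vector by its nearest codebook entry (Euclidean)."""
    # Pairwise squared distances, shape (num_features, num_codes).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    indices = np.argmin(dists, axis=1)
    return codebook[indices], indices


rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))  # toy codebook: 8 codes, dimension 4

# "Unperturbed" features lying close to known codes, and a copy with a small
# Gaussian perturbation standing in for a different scan condition.
clean = codebook[[1, 5, 5, 2]] + 0.01
perturbed = clean + rng.normal(scale=0.01, size=clean.shape)

q_clean, idx_clean = quantize(clean, codebook)
q_pert, idx_pert = quantize(perturbed, codebook)

# The continuous features differ, but both sets snap to the same codes,
# so the difference vanishes after quantization in this toy setting.
print(idx_clean, idx_pert)
print(np.abs(clean - perturbed).max(), np.abs(q_clean - q_pert).max())
```

In this toy case the perturbation is small relative to the spacing between codes, so quantization removes it entirely. With larger perturbations or a denser codebook, features can cross into a neighbouring code, which is one reason the ablations requested by the reviewers would be informative.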




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While this paper presents a novel approach for low-dose CT denoising, the validation is not strong enough, as noted by the reviewer.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Although the feedback has been mixed, the approach offers a fresh perspective and is quite innovative. I recommend accepting it for further consideration.


