Abstract
To reduce radiation exposure and improve the diagnostic efficacy of low-dose computed tomography (LDCT), numerous deep learning-based denoising methods have been developed to mitigate noise and artifacts. However, most of these approaches ignore the anatomical semantics of human tissues, which may result in suboptimal denoising outcomes. To address this problem, we propose ALDEN, an anatomy-aware LDCT denoising method that integrates semantic features of pretrained vision models (PVMs) with adversarial and contrastive learning. Specifically, we introduce an anatomy-aware discriminator that dynamically fuses hierarchical semantic features from reference normal-dose CT (NDCT) via cross-attention mechanisms, enabling tissue-specific realism evaluation in the discriminator. In addition, we propose a semantic-guided contrastive learning module that enforces anatomical consistency by contrasting PVM-derived features from LDCT, denoised CT, and NDCT, preserving tissue-specific patterns through positive pairs and suppressing artifacts via dual negative pairs. Extensive experiments conducted on two LDCT denoising datasets reveal that ALDEN achieves state-of-the-art performance, offering superior anatomy preservation and substantially reducing the over-smoothing issue of previous work. Further validation on a downstream multi-organ segmentation task (encompassing 117 anatomical structures) affirms the model’s ability to maintain anatomical awareness.
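The anatomy-aware discriminator described above fuses hierarchical PVM features of the reference NDCT into the discriminator via cross-attention. Below is a minimal PyTorch sketch of one such fusion step; it is not the paper's actual AFF module, and the class name, feature dimensions, and residual design are illustrative assumptions.

```python
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Fuses a discriminator feature map (queries) with PVM semantic tokens
    (keys/values) via multi-head cross-attention. Names, dimensions, and the
    residual design are illustrative assumptions, not the paper's exact AFF."""

    def __init__(self, disc_dim=256, pvm_dim=768, num_heads=8):
        super().__init__()
        self.to_q = nn.Linear(disc_dim, disc_dim)
        self.to_kv = nn.Linear(pvm_dim, disc_dim * 2)
        self.attn = nn.MultiheadAttention(disc_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(disc_dim)

    def forward(self, disc_feat, sem_tokens):
        # disc_feat: (B, C, H, W) discriminator features of the CT under test
        # sem_tokens: (B, N, pvm_dim) patch tokens of the reference NDCT
        b, c, h, w = disc_feat.shape
        q = self.to_q(disc_feat.flatten(2).transpose(1, 2))    # (B, H*W, C)
        k, v = self.to_kv(sem_tokens).chunk(2, dim=-1)         # (B, N, C) each
        fused, _ = self.attn(q, k, v)                          # cross-attention
        fused = self.norm(fused + q)                           # residual + norm
        return fused.transpose(1, 2).reshape(b, c, h, w)       # back to a map
```

A plausible reading of the abstract, though not a confirmed detail, is that one such module is applied per hierarchy level (low, mid, high), with each fused map fed into the next discriminator stage.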
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2266_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{WanRun_AnatomyAware_MICCAI2025,
author = { Wang, Runze and Chen, Zeli and Song, Zhiyun and Fang, Wei and Zhang, Jiajin and Tu, Danyang and Tang, Yuxing and Xu, Minfeng and Ye, Xianghua and Lu, Le and Jin, Dakai},
title = { { Anatomy-Aware Low-Dose CT Denoising via Pretrained Vision Models and Semantic-Guided Contrastive Learning } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15961},
month = {September},
pages = {13--23}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper introduces an anatomy-aware image-to-image translation method for translating low-dose CT (LDCT) images into normal-dose CT (NDCT) images using pretrained vision models (PVMs).
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The Anatomy-Aware Discriminator (AAD) uses the hierarchical semantic features of the real NDCT extracted from PVMs to distinguish between synthetic NDCT and real NDCT. To exploit a prior on the conditional distribution, an Attention-based Feature Fusion (AFF) module is proposed, which applies an attention mechanism between semantic features extracted by the discriminator and semantic features extracted from the PVMs.
- Semantic-guided Contrastive Learning (SCL) is proposed to enhance anatomical consistency and mitigate noise over-smoothing effects. The authors design a dual negative sampling strategy that contrasts the noise between the synthetic NDCT and the LDCT, and the discrepancy between the synthetic NDCT and the real NDCT.
- In the experiments and results, the downstream multi-organ segmentation task is a good experiment for checking whether segmentation performance is affected when the synthesized NDCT is used.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Although the proposed concept is novel, it is questionable whether the proposed method solves the problems the authors seek to address.
- The qualitative results demonstrate that the proposed method effectively addresses the over-smoothing effect and enhances anatomy-aware semantics compared to other methods. However, the quantitative scores show only a slight improvement. Furthermore, this slight performance improvement is not compelling considering the memory resources consumed.
- I have concerns about the design of the dual negative sampling strategy in SCL. The authors define the InfoNCE loss for structure consistency and noise emphasis. However, the location term could be hampered by the noise emphasis term. This is revealed in the ablation study when only SCL is used: compared to the baseline there is a clear performance improvement, but compared to using only AAD the performance is lower except in LPIPS. This is not convincing for a method proposed to address the anatomical consistency claimed by the authors.
- The proposed method is evaluated with two PVMs, DINOv2 and MedSAM. I would expect MedSAM to be more effective than DINOv2 since it has learned representations of medical images, yet the overall scores are better with DINOv2. Please clarify the discussion about this. Also, for the downstream multi-organ segmentation task, MedSAM seems more appropriate for the experimental design than DINOv2.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The method proposed to solve the problems arising in the CT denoising task is novel; however, the design of the proposed method, the experimental design, and the results are questionable.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors answered my concerns sincerely and provided sufficient reasons that made their answers more persuasive. After also considering the feedback on the other reviewers’ comments, I changed my mind and decided to accept.
Review #2
- Please describe the contribution of the paper
The paper proposes ALDEN, a low-dose CT denoising framework that integrates pretrained vision models (PVMs) into a GAN architecture with an anatomy-aware discriminator and semantic-guided contrastive learning to enhance anatomical fidelity and reduce over-smoothing.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) Introduces anatomy-aware denoising by leveraging pretrained vision models.
2) Combines adversarial and contrastive learning to improve texture and structural preservation.
3) Demonstrates strong performance on both denoising and downstream segmentation tasks.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) Use of MedSAM is inconsistent: MedSAM is a segmentation foundation model expecting prompt-based inputs (such as points or boxes). The paper uses it only as a feature extractor (like a ResNet), which does not align with its design and may result in semantically shallow features.
2) The paper states that low-, mid-, and high-level features are extracted from PVMs (e.g., MedSAM/DINOv2) for cross-attention in the discriminator, but does not specify which layers or resolutions these features correspond to. Furthermore, it is unclear how hierarchical features are obtained from models like MedSAM, which are not designed for multi-scale feature extraction in the same way as typical CNNs.
3) Section 2.3 introduces the symbols K, k, and m when defining contrastive sampling and the InfoNCE loss, but their meanings are not explicitly defined. Also, the paper does not explain how the feature channel depth C is determined or whether it differs across PVMs (e.g., DINOv2 vs. MedSAM), which affects the effectiveness and reproducibility of the contrastive learning.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The PVM-related part of the method is not clearly explained.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The application of low-dose CT denoising algorithms improves the effectiveness of downstream segmentation tasks.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The authors’ approach of incorporating pretrained models into the discriminator is expected to be effective.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The definition of the three hierarchical semantic features (a_l, a_m, a_h) is insufficiently explained. The paper does not clarify what specific anatomical or semantic information each level (low, middle, high) represents, nor how they are derived from the pretrained vision models (PVMs).
- The role of the Anatomy-Aware Discriminator (AAD) is questionable. The authors claim the AAD leverages anatomical semantics, but its design appears to combine a standard discriminator with pretrained model features. It remains unclear how the accuracy of the generated semantic information is validated (e.g., whether it aligns with ground-truth anatomical structures or clinical annotations).
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The model is theoretically feasible, but some details are not clearly explained.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all reviewers for their thoughtful comments and organize the major questions (Q) with our answers (A) below:
Q1 (R1&3): Hierarchical feature extraction details. A: Both MedSAM and DINOv2 use the ViT-base architecture with 12 transformer blocks. Although ViT does not explicitly change scales, features from different layers are analogous to those of CNNs: early layers focus on low-level features, while later ones concentrate on high-level semantics (refer to the ViT paper by Dosovitskiy et al., ICLR 2021). We use the outputs of the 4th, 8th, and 12th transformer blocks as low-, mid-, and high-level features for cross-attention in the discriminator (see the sketch after this feedback).
Q2 (R1): Inconsistent use of MedSAM. A: We utilize PVMs as feature extractors to 1) generate semantic features from NDCT that guide the discriminator to focus on semantically relevant texture distributions, and 2) ensure anatomical consistency between the denoised CT and the reference NDCT. As MedSAM is pretrained with a masked auto-encoder (MAE) on diverse datasets, capturing underlying structures and semantics, it meets our method's requirements for PVMs.
Q3 (R1): Symbol definitions. A: K refers to the number of sampled positive-pair feature points per image, with indices k = 1 to K. M denotes the number of spatially discordant regions, with indices m = 1 to M. Using the ViT-base architecture, MedSAM and DINOv2 both have an embedding dimension of 768, which corresponds to the feature channel depth C.
Q4 (R2): Slight improvement. A: Our method primarily aims to mitigate over-smoothing and preserve anatomical details, and its perceptual effectiveness is mainly demonstrated through LPIPS (a perceptual metric). Notably, ALDEN-DINOv2 achieves a 44.1-59.7% reduction in LPIPS against CoreDiff and ASCON on the GBA dataset, demonstrating its ability to balance performance by significantly lowering LPIPS while maintaining leading fidelity metrics (PSNR, SSIM, and RMSE). In addition, we highlight the downstream task evaluations, where our method outperforms the second-best method, CoreDiff, by an average DSC improvement of 1.07% across 117 anatomical structures in high-noise scenarios. These two key quantitative results demonstrate the superiority of the proposed method over the alternatives.
Q5 (R2): Location term hampered by the noise emphasis term in the SCL loss. A: First, we examined this by removing the noise emphasis term from the SCL loss, which resulted in lower performance (33.18 PSNR, 0.9314 SSIM, 8.92 RMSE, and 0.0142 LPIPS), indicating that the term enhances performance by supporting, not hindering, the location term. Second, AAD and SCL are independent modules that improve performance from different perspectives: AAD enables fine-grained, semantic-aware LDCT denoising, while SCL maintains anatomical consistency and reduces noise and artifacts. Each improves performance separately, and combining them leads to the best balanced performance.
Q6 (R2): DINOv2's superiority over MedSAM. A: We believe DINOv2 outperforms MedSAM for several reasons. First, DINOv2 is trained on a significantly larger dataset of 1.2 billion images, compared to MedSAM's 1.5 million. Second, MedSAM builds on SAM with MAE pretraining, while DINOv2 shows considerably higher performance than MAE. Hence, although MedSAM is fine-tuned on medical images with domain-specific knowledge, it might be inferior to DINOv2 in feature generalizability. We additionally conducted downstream segmentation experiments with ALDEN-MedSAM, achieving DSCs of 89.07% and 80.43% at low/high noise levels, surpassing the other methods but slightly behind ALDEN-DINOv2 (89.2% and 81.06%).
Q7 (R3): Role of the Anatomy-Aware Discriminator (AAD). A: AAD uses PVMs for semantic feature extraction, exploiting their high capacity for general-purpose visual features applicable across distributions and tasks without finetuning. While direct evaluation of PVM semantic accuracy is challenging, ablation studies, qualitative results, and downstream tasks confirm AAD's enhancement of anatomy preservation.
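To make the Q1 and Q3 details concrete, the following is a minimal PyTorch sketch of (a) pulling low-/mid-/high-level features from the 4th, 8th, and 12th blocks of a ViT-base backbone and (b) a dual-negative InfoNCE over K sampled locations with M discordant negatives and C = 768 channels. The hook-based extraction, the sampling interface, the temperature, and the function names are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F


def extract_hierarchical_features(vit, x, block_ids=(3, 7, 11)):
    """Capture the outputs of the 4th, 8th, and 12th transformer blocks
    (indices 3/7/11) of a ViT-base with forward hooks. Assumes the model
    exposes its blocks as a `.blocks` list, as DINOv2 and the (Med)SAM image
    encoder do; the output shape (tokens vs. spatial map) depends on the backbone."""
    feats = {}
    hooks = [
        vit.blocks[i].register_forward_hook(
            lambda _mod, _inp, out, i=i: feats.__setitem__(i, out))
        for i in block_ids
    ]
    with torch.no_grad():
        vit(x)
    for h in hooks:
        h.remove()
    return [feats[i] for i in block_ids]  # low-, mid-, high-level features


def dual_negative_infonce(z_den, z_ndct, z_ldct, k_idx, m_idx, tau=0.07):
    """Hypothetical dual-negative InfoNCE over K sampled token locations.
    z_*: (N, C) flattened features (C = 768 for ViT-base) of the denoised CT,
    the reference NDCT, and the input LDCT; k_idx: (K,) sampled locations;
    m_idx: (K, M) spatially discordant locations in the NDCT."""
    a = F.normalize(z_den[k_idx], dim=-1)           # anchors         (K, C)
    p = F.normalize(z_ndct[k_idx], dim=-1)          # positives       (K, C)
    n_noise = F.normalize(z_ldct[k_idx], dim=-1)    # LDCT negatives  (K, C)
    n_struct = F.normalize(z_ndct[m_idx], dim=-1)   # NDCT negatives  (K, M, C)

    pos = (a * p).sum(-1, keepdim=True) / tau                 # (K, 1)
    neg1 = (a * n_noise).sum(-1, keepdim=True) / tau          # (K, 1)
    neg2 = torch.einsum('kc,kmc->km', a, n_struct) / tau      # (K, M)
    logits = torch.cat([pos, neg1, neg2], dim=1)              # positive at index 0
    target = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, target)
```

Forward hooks are used here because both DINOv2 and the (Med)SAM image encoder expose their transformer blocks as a `.blocks` list, so the same extraction works for either backbone; DINOv2 also provides a `get_intermediate_layers` helper that could replace the hooks.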
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper introduces ALDEN, an anatomy-aware low-dose CT denoising framework that integrates pretrained vision models (PVMs) into a GAN-based architecture using a novel discriminator and semantic-guided contrastive learning. All three reviewers recommended acceptance after rebuttal, citing the method’s innovation, empirical validation, and clinical relevance.
- Reviewer #1 appreciated the novel integration of pretrained models into the adversarial and contrastive learning pipeline, with solid performance in denoising and downstream segmentation. However, they noted unclear details around PVM usage, particularly with MedSAM, and suggested better specification of feature representations. These issues were not seen as blocking, and they recommended weak accept.
- Reviewer #2 highlighted the contributions of the Anatomy-Aware Discriminator and the dual negative sampling strategy in contrastive learning. Although they initially questioned the clarity and effectiveness of the design and its computational tradeoffs, the rebuttal successfully addressed their concerns, leading to a final recommendation of acceptance.
- Reviewer #3 found the overall direction sound and noted the effective use of pretrained features in the discriminator. While they raised concerns about insufficient explanation of hierarchical features and validation of anatomical fidelity, they ultimately viewed the method as feasible and impactful, supporting a weak accept.
Summary: Despite some concerns over technical clarity and justification of specific design choices, the reviewers agreed that the framework is novel, the evaluation is comprehensive, and the results demonstrate practical value. The rebuttal addressed key concerns sufficiently, justifying acceptance.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
All three reviewers are inclined to accept the paper, and I agree with their recommendation.