Abstract

Metal artifacts, caused by high-density metallic implants in computed tomography (CT) imaging, severely degrade image quality, complicating diagnosis and treatment planning. While existing deep learning algorithms have achieved notable success in Metal Artifact Reduction (MAR), they often struggle to suppress artifacts while preserving structural details. To address this challenge, we propose FIND-Net (Fourier-Integrated Network with Dictionary Kernels), a novel MAR framework that integrates frequency and spatial domain processing to achieve superior artifact suppression and structural preservation. FIND-Net incorporates Fast Fourier Convolution (FFC) layers and trainable Gaussian filtering, treating MAR as a hybrid task operating in both spatial and frequency domains. This approach enhances global contextual understanding and frequency selectivity, effectively reducing artifacts while maintaining anatomical structures. Experiments on synthetic datasets show that FIND-Net achieves statistically significant improvements over state-of-the-art MAR methods, with a 3.07% MAE reduction, 0.18% SSIM increase, and 0.90% PSNR improvement, confirming robustness across varying artifact complexities. Furthermore, evaluations on real-world clinical CT scans confirm FIND-Net’s ability to minimize modifications to clean anatomical regions while effectively suppressing metal-induced distortions. These findings highlight FIND-Net’s potential for advancing MAR performance, offering superior structural preservation and improved clinical applicability. Code is available at: https://github.com/Farid-Tasharofi/FIND-Net.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1842_paper.pdf

SharedIt Link: https://rdcu.be/eHw8u

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05169-1_19

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Farid-Tasharofi/FIND-Net

Link to the Dataset(s)

AAPM CT-MAR Grand Challenge dataset: https://rpi.app.box.com/s/7p8tkqj5ewhtdad2h8kx975i9qg6b7a4
Dataset details and instructions: https://github.com/xcist/example/tree/main/AAPM_datachallenge

BibTex

@InProceedings{TasFar_FINDNet_MICCAI2025,
        author = { Tasharofi, Farid AND Fan, Fuxin AND Qahqaie, Melika AND Thies, Mareike AND Maier, Andreas},
        title = { { FIND-Net – Fourier-Integrated Network with Dictionary Kernels for Metal Artifact Reduction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15972},
        month = {September},
        pages = {192--201}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper introduces FIND-Net, a new metal artifact reduction framework that integrates frequency-domain processing with spatial-domain techniques. Its key innovation lies in combining Fast Fourier Convolution with a trainable Gaussian filtering module within a dictionary kernel framework. This hybrid approach enables the network to capture long-range dependencies via global frequency information while preserving local anatomical details through spatial processing.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Novel Formulation: The integration of FFC and trainable Gaussian filtering within a dictionary kernel framework is an innovative approach that bridges the gap between spatial and frequency domains. This allows the network to capture both local details and global dependencies.

Strong Empirical Results: Experiments on synthetic and clinical CT datasets (AAPM CT-MAR challenge) show statistical improvements in MAE, SSIM, and PSNR over state-of-the-art methods (e.g., DICDNet, OSCNet).

Clinical Feasibility: The method demonstrates robust artifact suppression while preserving anatomical structures, which is crucial for clinical applications.

Comprehensive Evaluation: The paper includes extensive ablation studies and quantitative comparisons, which support the claimed advantages of the proposed approach.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Increased Inference Time: The introduction of frequency-domain operations results in a higher inference time (e.g., 1.07 s per image versus 0.15 s for simpler baselines), potentially limiting real-time application.

Loss of Image Details: In Fig. 3, the area below the red mask in the center, where a strong signal is present, exhibits a significant loss of image details. Although this issue is common across all methods, can an explanation be provided? Could it be that the methods are struggling to recover details in regions with particularly intense artifacts?

Insufficient Novelty in Some Components: While the integration is novel, many individual components (e.g., FFC, dictionary kernels) have been previously explored in related works. The added value primarily comes from the specific combination.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper offers a well-motivated and innovative framework that integrates frequency-domain processing (via FFC and trainable Gaussian filtering) with spatial-domain dictionary kernels to address the metal artifact reduction problem. The proposed hybrid approach captures both global spectral features and local spatial priors, and it demonstrates statistical improvements in MAE, SSIM, and PSNR over state-of-the-art MAR methods. However, the increased model complexity and longer inference time raise concerns regarding its practical deployment, particularly in resource-limited or real-time settings. Moreover, the method shows challenges in correctly distinguishing between high-signal regions and regions with strong artifacts, which could affect the overall image quality in critical areas. Although the method achieves reduced computational cost via FFT-based optimization and shows strong potential for clinical applicability by preserving anatomical structures while effectively suppressing metal-induced distortions, further refinements are necessary to fully address these efficiency and discrimination challenges. Overall, while the strengths of the method are notable and its robustness is demonstrated, these practical concerns warrant a weak accept recommendation, pending satisfactory responses to these issues.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors provided satisfactory responses to my concerns, which ultimately led to a decision for acceptance.

Review #2

Please describe the contribution of the paper

This paper addresses a key MAR challenge, aiming to solve the problem of standard CNNs’ limited receptive fields in handling global artifacts. FIND-Net innovatively integrates frequency domain processing by incorporating GFFC into DICDNet. The paper shows strong results, with significant improvements in MAE, SSIM, and PSNR compared to state-of-the-art methods on the AAPM CT-MAR dataset, along with better visual effects.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The manuscript states that the channel proportion α allocated to the global (frequency) branch increases from 0.0 in the initial stage to 0.8 in later stages. Why was this specific range and trend (from 0.0 to 0.8) chosen? The paper does not provide justification for selecting this strategy.
- The manuscript introduces a trainable Gaussian filter G_σ,c, where the center frequency c and bandwidth σ are learnable. Firstly, are σ and c shared across all channels in the GFFC block, or does each channel have independent σ and c? Or are they shared within an FE-ResBlock, or does each FE-ResBlock have independent parameters? Furthermore, how are these parameters initialized? The authors need to clarify this.
- In Equation 6, the summation for the first two L1/L2 loss terms starts from s=0, while the third L1 loss term starts from s=1. X(0) is initialized using LI and is not a network output. Typically, in iterative unfolding networks, the loss is calculated starting from s=1 (the first network output). Calculating a loss term involving X(0), like ||I ⊙ (X - X(0))||, seems unusual as it penalizes the difference between the network output and the initial guess, rather than the ground truth X. This requires clearer explanation from the authors.
- The manuscript discusses computational complexity (GFLOPs) and inference time, noting that GFLOPs decrease but time increases. However, it does not mention the change in the model parameter count for FIND-Net compared to DICDNet. Introducing FFC and trainable Gaussian filter parameters is likely to increase the total number of model parameters.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The manuscript states that the channel proportion α allocated to the global (frequency) branch increases from 0.0 in the initial stage to 0.8 in later stages. Why was this specific range and trend (from 0.0 to 0.8) chosen? The paper does not provide justification for selecting this strategy.
- The manuscript introduces a trainable Gaussian filter G_σ,c, where the center frequency c and bandwidth σ are learnable. Firstly, are σ and c shared across all channels in the GFFC block, or does each channel have independent σ and c? Or are they shared within an FE-ResBlock, or does each FE-ResBlock have independent parameters? Furthermore, how are these parameters initialized? The authors need to clarify this.
- In Equation 6, the summation for the first two L1/L2 loss terms starts from s=0, while the third L1 loss term starts from s=1. X(0) is initialized using LI and is not a network output. Typically, in iterative unfolding networks, the loss is calculated starting from s=1 (the first network output). Calculating a loss term involving X(0), like ||I ⊙ (X - X(0))||, seems unusual as it penalizes the difference between the network output and the initial guess, rather than the ground truth X. This requires clearer explanation from the authors.
- The manuscript discusses computational complexity (GFLOPs) and inference time, noting that GFLOPs decrease but time increases. However, it does not mention the change in the model parameter count for FIND-Net compared to DICDNet. Introducing FFC and trainable Gaussian filter parameters is likely to increase the total number of model parameters.
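
For readers without the paper at hand, the index structure questioned in the Equation 6 comment can be written out as below. This is a sketch consistent with the reviewer's description (two image terms summed from s=0, one further L1 term summed from s=1), not a transcription of the paper's Equation 6. The symbols S (number of stages), \mu_s, \lambda_1, \lambda_2 (weights), Y (artifact-affected input), and the exact residual inside the third term are assumptions, borrowed from how DICDNet-style unfolding losses are commonly written.

```latex
\mathcal{L} =
  \sum_{s=0}^{S} \mu_s \left\| I \odot \left( X - X^{(s)} \right) \right\|_2^2
  + \lambda_1 \sum_{s=0}^{S} \mu_s \left\| I \odot \left( X - X^{(s)} \right) \right\|_1
  + \lambda_2 \sum_{s=1}^{S} \mu_s \left\| I \odot \left( Y - X - A^{(s)} \right) \right\|_1
```

Written this way, the s=0 image terms compare the ground truth X with the initial estimate X^{(0)}, which is the point the reviewer asks the authors to justify.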
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

FIND-Net presents a technically sound and innovative approach to CT MAR by integrating frequency-domain processing (via FFC with trainable Gaussian filtering) into an established iterative dictionary-based framework. The method directly targets a known weakness of purely spatial CNNs in handling global artifacts while preserving details. Its strengths include this novel hybrid architecture, the specific contribution of the adaptive Gaussian filter, strong quantitative results showing statistically significant improvement over relevant baselines on a standard benchmark, and compelling qualitative results on both synthetic and real clinical data (including analysis of structural preservation). However, the most significant drawback is the substantial increase in inference time, which could hinder practical clinical adoption despite the theoretical FLOP reduction. Additionally, while statistically significant, the magnitude of quantitative improvement over the main baseline (DICDNet) is relatively modest. Ablation studies could also be more comprehensive to fully justify all design choices (like dynamic channel allocation). Overall, the paper makes a valuable contribution by demonstrating the effectiveness of integrating adaptive frequency processing into deep learning MAR. The technical novelty is clear, and the results validate the approach. Despite the practical concerns regarding inference speed and the limited magnitude of quantitative gains, the demonstrated improvements in artifact suppression and structural preservation, supported by solid methodology and evaluation, make it a worthwhile contribution.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The ultimate goal of this paper is to address the problem of metal artifact reduction (MAR) in CT imaging. While various MAR methods have been proposed, the authors point out key limitations in existing approaches, particularly those based on conventional CNNs and spatial-domain techniques. To overcome these issues, the paper introduces a frequency-domain method using Fast Fourier Convolution and trainable Gaussian filtering. This approach enables effective suppression of metal-induced distortions without oversmoothing, while preserving important structural details.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

One of the key strengths of this paper lies in the development of FIND-Net, an extension of the previously proposed convolutional dictionary model, DICDNet. DICDNet is composed of two sub-networks: M-Net, which identifies metal artifact components, and X-Net, which reconstructs the final image. By leveraging dictionary kernels, DICDNet is optimized to perform refined metal artifact reduction (MAR). However, DICDNet is limited to spatial-domain processing. FIND-Net effectively overcomes this limitation by incorporating frequency-based processing into the framework. The proposed frequency-domain technique is implemented through Fast Fourier Convolution, which integrates a local branch (based on spatial convolutions) with a global branch (based on FFT-based spectral transformations). This cross-branch interaction enhances the overall performance of the model. Notably, the global branch introduces a novel component: trainable Gaussian Filtering. This innovation plays a critical role in strengthening frequency-domain training and is a key contribution of the study. Moreover, the network is designed to gradually shift its learning focus from the spatial to the spectral domain as training progresses. The dynamic weighting strategy allows the model to better balance spatial and spectral learning, ultimately improving training accuracy. Finally, the effectiveness of the proposed approach is validated through extensive experiments, demonstrating superior performance compared to existing methods. The authors also conduct ablation studies comparing the model with and without the trainable Gaussian Filter, further highlighting its significant contribution and enhancing the reliability of the findings.
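
The staged shift from the local (spatial) branch to the global (spectral) branch can be pictured as a per-stage channel split. The sketch below is illustrative only. The 0.0 to 0.8 range for the global-branch ratio α is the one quoted in Review #2 and the rebuttal; the linear ramp, the number of stages, the channel count, and the function names `alpha_schedule` and `split_channels` are hypothetical and are not taken from the paper or its repository.

```python
def alpha_schedule(num_stages, alpha_start=0.0, alpha_end=0.8):
    """Hypothetical linear ramp of the global-branch ratio over stages.

    The endpoints follow the range quoted on this page; the linear
    shape is an assumption made only for illustration.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if num_stages == 1:
        return [alpha_end]
    step = (alpha_end - alpha_start) / (num_stages - 1)
    return [alpha_start + i * step for i in range(num_stages)]


def split_channels(total_channels, alpha):
    """Return (local_channels, global_channels) for a ratio alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    global_channels = int(round(total_channels * alpha))
    return total_channels - global_channels, global_channels


if __name__ == "__main__":
    # Assumed values: 5 stages and 32 feature channels.
    for stage, alpha in enumerate(alpha_schedule(num_stages=5)):
        local_c, global_c = split_channels(32, alpha)
        print(f"stage {stage}: alpha={alpha:.2f} local={local_c} global={global_c}")
```

With α = 0.0 every channel stays in the spatial branch, so the first stage behaves like a purely spatial block; as α grows, more channels are routed through the FFT-based branch.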
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Despite its strengths, the paper has several limitations that are worth noting. In the Introduction, the authors classify existing MAR approaches into sinogram-domain and image-domain methods, and highlight their respective drawbacks, motivating the use of a frequency-domain technique as an alternative. However, many recent deep learning-based MAR methods adopt dual-domain strategies that combine both sinogram- and image-domain processing to leverage the strengths of each domain while mitigating their individual limitations. In contrast, although this study incorporates both spatial- and frequency-domain techniques, it does not include training in the sinogram domain. As a result, it cannot fully exploit raw projection data, which may limit its effectiveness in capturing and correcting artifacts at their source. This limitation becomes evident in Figure 3, where the model successfully reduces metal artifacts but fails to preserve fine structural details. Specifically, bone structures are overly smoothed and appear tissue-like, indicating a potential overcorrection due to the lack of sinogram-domain information. Another notable drawback, as acknowledged by the authors, is the increased inference time. The use of FFT operations during inference, along with the dual-network architecture, introduces computational overhead. In clinical settings where time efficiency is critical, this additional time cost could limit the practical applicability of the method.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall, this paper presents a meaningful contribution by effectively integrating frequency-based processing into a deep learning framework for metal artifact reduction. The spectral learning approach using a global branch with FFT is not only beneficial for MAR but also holds potential for broader applications in solving various inverse problems. The dynamic weighting strategy between spatial and spectral learning further strengthens the theoretical soundness of the approach. However, due to certain limitations, particularly the absence of sinogram-domain processing and the increased inference time caused by the complex network architecture, the method may face challenges in real clinical deployment. Considering both its innovation and practical limitations, I recommend a weak accept.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank all reviewers for their constructive feedback and for acknowledging FIND-Net’s technical novelty (R2), especially the integration of FFC and trainable Gaussian filtering (R3), strong quantitative results (R1, R2) on synthetic and clinical CT data, structural preservation with robust artifact suppression (R1), and clinical feasibility as an image-domain MAR method (R1) with shared code and data.

R-Meta: We appreciate the concern regarding QuadNet, a method we were aware of during our study. Our decision to exclude it as a baseline was based on two key reasons. First, FIND-Net is an image-domain method operating on reconstructions from linearly interpolated sinograms, without sinogram-domain training. Accordingly, we selected state-of-the-art image-domain baselines (DICDNet, OSCNet) aligned with our deployment setting. Second, QuadNet was developed for lower resolution sinograms (640×640) and already requires high memory and training cost. Adapting it to our high-resolution setting (900×1000) would further increase resource demands and runtime, making benchmarking impractical and less reproducible.

R3: We appreciate the reviewer’s observation regarding the lack of sinogram-domain training. While training on sinograms allows direct access to artifact origins, it may introduce secondary artifacts, especially when refilling metal-corrupted regions without proper image geometry constraints. FIND-Net instead operates post-reconstruction using LI-based inputs and frequency-aware refinement, offering compatibility with clinical pipelines. Importantly, its modular design permits future integration into dual-domain architectures if desired.

R1: We acknowledge that the red-masked region in Fig. 3 presents challenges for all MAR methods due to severe attenuation and information loss. Still, FIND-Net improves streak suppression and edge clarity in surrounding areas. This limitation reflects the difficulty of recovering heavily corrupted regions, not a shortcoming specific to FIND-Net.

R2: We thank the reviewer for inquiring about the reason for the progressive alpha schedule (from 0.0 to 0.8 across stages). This setting allows a gradual transition from local spatial filtering to global spectral modeling, balancing early-stage denoising with later-stage global feature extraction. It yielded the best performance among the various configurations we evaluated.

R2: Thank you for the clarification request concerning the Gaussian filter parameters. The center frequency (c) and bandwidth (σ) are independently learned for each frequency-domain channel, with no sharing across blocks or stages. Parameters are initialized as c=0.1 and σ=1.0 to ensure smooth early filtering behavior.

R2: We appreciate the reviewer’s comment on the loss formulation in Equation 6. While X(0) is initialized using LI, it is processed through a learnable proxNet module. Supervising X(0) encourages a strong initialization for subsequent iterative refinement. In contrast, A(s) is undefined at s=0, so its loss begins at s=1.

R1: Regarding R1’s concern about insufficient novelty in some components, while FFC and dictionary kernels are known individually, our contribution lies in integrating them within a proximal optimization framework for MAR. The combined use of trainable Gaussian filtering, staged frequency allocation, and iterative refinement forms a novel spatial-spectral strategy adaptable to broader inverse problems.

R1, R2, R3: We acknowledge the concern about the 1.07 s inference time. As explained in Section 3.2, this results from FFT/IFFT overhead. Nonetheless, FIND-Net reduces GFLOPs by ~16% and has fewer trainable parameters (1.11M vs. 1.28M) compared to DICDNet, as FFT-based processing replaces spatial convolutions with lighter 1×1 operations. Designed as a post-processing tool, FIND-Net targets offline diagnostics, where real-time inference is not critical. Its architecture also generalizes well to other inverse problems requiring global-local trade-off (R3).
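
To make the per-channel Gaussian filter parameters in the rebuttal concrete, here is a minimal NumPy sketch of a Gaussian mask applied to feature maps in the Fourier domain. It is not the authors' implementation. The functional form exp(-(r - c)^2 / (2σ^2)) over radial frequency r, the use of normalized frequency in cycles per pixel, and the names `gaussian_frequency_mask` and `apply_filter` are assumptions for illustration; only the per-channel parameters and the initial values c=0.1 and σ=1.0 come from the text above. In a trainable model, c and σ would be learnable parameters rather than fixed arrays.

```python
import numpy as np


def gaussian_frequency_mask(height, width, center, sigma):
    """Build one radial Gaussian mask per channel on the rfft2 grid.

    center, sigma: array-likes of shape (C,), one value per channel.
    The Gaussian form used here is an assumed, illustrative choice.
    """
    fy = np.fft.fftfreq(height)[:, None]   # (H, 1), cycles per pixel
    fx = np.fft.rfftfreq(width)[None, :]   # (1, W // 2 + 1)
    radius = np.sqrt(fy ** 2 + fx ** 2)    # (H, W // 2 + 1)
    c = np.asarray(center, dtype=float)[:, None, None]
    s = np.asarray(sigma, dtype=float)[:, None, None]
    return np.exp(-((radius[None] - c) ** 2) / (2.0 * s ** 2))


def apply_filter(features, center, sigma):
    """Filter real-valued features of shape (C, H, W) in the Fourier domain."""
    _, height, width = features.shape
    spectrum = np.fft.rfft2(features, axes=(-2, -1))
    mask = gaussian_frequency_mask(height, width, center, sigma)
    return np.fft.irfft2(spectrum * mask, s=(height, width), axes=(-2, -1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 16, 16))   # assumed toy feature map
    c0 = np.full(4, 0.1)                    # initial center frequency per channel
    s0 = np.full(4, 1.0)                    # initial bandwidth per channel
    y = apply_filter(x, c0, s0)
    print(y.shape)
```

Under these assumptions, σ=1.0 is wide relative to the largest normalized radial frequency (about 0.71), so the initial mask stays close to all-pass, which is one plausible reading of "smooth early filtering behavior".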

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

The paper overlooks key prior work that has explored the use of FFC for CT metal artifact reduction, such as https://arxiv.org/abs/2207.11678. Without comparisons to such relevant baselines, the claimed performance improvements are not convincingly demonstrated.
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

All three reviewers are inclined to accept the paper, and after reading the rebuttal, I agree with their recommendation.

FIND-Net – Fourier-Integrated Network with Dictionary Kernels for Metal Artifact Reduction

Author(s):