Abstract

The U-Net architecture remains pivotal in medical image segmentation, yet its skip connections often propagate redundant noise and compromise edge information. We propose a Parameter-Free Edge and Structure Attention (PFESA) based on the Fast Fourier Transform (FFT) to address these limitations. PFESA employs frequency-domain feature decoupling to separate high-frequency (edge details) and low-frequency (structural components) representations. Leveraging feature Signal-to-Noise Ratio (SNR) analysis, we devise dual attention paths: a High-frequency Edge Attention (EA) enhances gradient-sensitive regions to preserve anatomical contours, while a Low-frequency Structure Attention (SA) suppresses noise through energy redistribution. This frequency-aware attention mechanism enables adaptive feature refinement in skip connections without introducing trainable parameters. The parameter-free design ensures robustness against overfitting on medical datasets with scarce data. Extensive experiments on multi-modal 2D/3D medical image datasets demonstrate PFESA’s superiority over existing attention methods, achieving SOTA performance with statistically significant improvements in Dice Similarity Coefficient (DSC: +3.3% vs. baseline) and Hausdorff Distance metrics. Code is available at: https://github.com/59-lmq/PFESA
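The frequency-domain decoupling the abstract describes can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' implementation: the Gaussian mask parameterization and the interpretation of the cutoff r as a fraction of the spatial size are assumptions for demonstration.

```python
import numpy as np

def decouple_frequencies(x, r=0.1):
    """Split a 2D feature map into low- and high-frequency parts via a
    Gaussian low-pass mask in the Fourier domain.

    The mask shape and the meaning of the cutoff ``r`` are illustrative
    assumptions; the paper's exact filter parameterization may differ.
    """
    h, w = x.shape
    F = np.fft.fftshift(np.fft.fft2(x))          # centered spectrum
    yy, xx = np.mgrid[0:h, 0:w]
    d2 = (yy - h / 2) ** 2 + (xx - w / 2) ** 2   # squared distance from DC
    sigma = r * min(h, w)                        # cutoff as a fraction of size
    mask = np.exp(-d2 / (2 * sigma ** 2))        # Gaussian low-pass mask
    x_low = np.fft.ifft2(np.fft.ifftshift(F * mask)).real
    x_high = x - x_low                           # high freq = residual
    return x_low, x_high
```

By construction, the two parts sum back to the input, mirroring the rebuttal's statement that H_h is obtained by subtracting H_l from the full spectrum.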

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3694_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/59-lmq/PFESA

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiMin_PFESA_MICCAI2025,
        author = { Li, Mingqian and Yan, Zhiqian and Yan, Miaoning and Liang, Yaodong and Zhang, Qingmao and Ma, Qiongxiong},
        title = { { PFESA: FFT-based Parameter-Free Edge and Structure Attention for Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {475--485}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a parameter-free attention mechanism, named PFESA, that operates in the frequency domain using Fast Fourier Transform (FFT). The key idea is to decouple high- and low-frequency components of skip connection features using Gaussian filtering in the spectral domain. Then, separate attention mechanisms are applied to enhance high-frequency (edge) and low-frequency (structure) representations based on signal-to-noise ratio (SNR) estimates. The attention is applied in a parameter-free manner, aiming to reduce overfitting. The method is evaluated on several 2D and 3D medical image segmentation tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The idea of explicitly decoupling frequency components and designing attention mechanisms tailored to high/low-frequency characteristics is compelling and distinguishes this method from prior parameter-free attentions (e.g., SimAM).
    2. The formulation of EA and SA using the signal-to-noise ratio provides an interpretable, physically grounded mechanism rather than a black-box parameterized model.
    3. The paper includes qualitative results that support the claim that the method helps preserve structural and boundary information in skip connections.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. While the paper uses Gaussian filtering to separate high- and low-frequency components, it lacks a detailed justification for why Gaussian filtering achieves this decomposition effectively in the Fourier domain. For instance, what frequency bands are retained or suppressed given different σ, and how that relates to anatomical features, is insufficiently explained.
    2. Though the parameter-free design is emphasized, recent works like WaveSNet [9] and PFD-Net [10] have explored frequency-based attention. PFESA builds upon these ideas but may not clearly surpass them in terms of novelty without a more thorough comparative discussion or architectural innovation.
    3. The study evaluates the role of each module and the r value for the Gaussian cutoff but does not explore other frequency-separation strategies (e.g., Laplacian filters, wavelets) or explain the rationale behind the specific SNR-based formulations.
    4. The code repository is currently inaccessible (https://anonymous.4open.science/r/PFESA-821D/), which prevents full reproducibility at this stage.
    5. The authors are recommended to cite recent advances in medical image segmentation, such as: (1) A novel deep network with triangular-star spatial–spectral fusion encoding and entropy-aware double decoding for coronary artery segmentation; (2) SASAN: Spectrum-Axial Spatial Approach Networks for Medical Image Segmentation.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Positive factors:

    1. The idea of frequency-domain feature decomposition and parameter-free SNR-based attention is elegant and interpretable.
    2. The parameter-free nature makes the module lightweight and generalizable, which is especially important in medical imaging.

    Negative factors:

    1. The explanation of Gaussian frequency decomposition lacks rigor, especially given that it’s central to the method.
    2. Some recent works on spectral attention (e.g., PFD-Net, WaveSNet) are not sufficiently contrasted, raising questions about novelty.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The improved theoretical grounding of the core PFESA module, along with a clearer differentiation from related work and the inclusion of critical ablation studies, now provides a sufficient foundation to warrant conditional acceptance, contingent on the thorough execution of the promised revisions in the final version.

    My initial evaluation leaned towards rejection, primarily due to concerns regarding the methodological novelty, the rationale behind certain design choices—particularly the frequency separation and attention mechanisms—and insufficient differentiation from prior work.

    However, the current rebuttal has addressed several of these core issues to some extent, leading me to revise my recommendation to Weak Accept.



Review #2

  • Please describe the contribution of the paper

    The authors propose an FFT-based parameter-free attention mechanism that enables frequency-aware feature refinement in skip connections. The mechanism decouples high- and low-frequency features and applies mathematically traceable attention mechanisms before fusing them back together. The method is evaluated on two 2D and two 3D datasets and compared against other parameterized and parameter-free attention mechanisms.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The methodology is clearly articulated, and the mathematical formulations are presented in a straightforward and comprehensible manner.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The experimental comparisons are weak. Competing networks and attention mechanisms are used without any hyperparameter tuning, which may lead to unfair or suboptimal baselines.

    It would be informative to compare segmentation performance between a state-of-the-art segmentation network such as nnUNet and UNet+PFESA under controlled and optimized conditions.

    In both 2D datasets, all other attention mechanisms appear to degrade segmentation performance, and similar issues are observed in some 3D cases. This raises concerns about whether the proposed method’s advantage is due to inherent robustness or simply favorable hyperparameter settings.

    A clearer analysis is needed to assess whether PFESA performance is sensitive to network configurations.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    References [6] and [7]: References [6] and [7] are mentioned in the related work section but are not described in detail. These appear to be relevant contributions that warrant further discussion to provide context and comparison to the proposed method.

    Sigmoid Normalization for SA: The rationale behind the use of sigmoid normalization for the Structure Attention (SA) mechanism is not sufficiently explained. A brief clarification of why this normalization step is necessary would improve the understanding of the method.

    Figure 3 – Contrast Adjustment: The images on the right-hand side of Figure 3 could benefit from higher contrast to improve visibility and clarity, making it easier for readers to interpret the results.

    Table 2 - Lower r: In the ablation study, the parameters r>=0.1 are tested, but it would be valuable to explore the impact of lower values of r.

    Table 3 – Combination of SA and EA: In the ablation study, the combination of Structure Attention (SA) and Edge Attention (EA) without Feature Decoupling (FD) is missing. Including this combination would provide a more complete evaluation of the method’s performance.

    Figure Captions: The figure captions could be expanded to provide more detailed explanations of the images and their significance. This would help readers better understand the key findings and context of each figure.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is well-described, and the mathematical formulations are clear and easy to follow. While the manuscript demonstrates potential, it requires some improvements. Key references ([6] and [7]) are mentioned but not discussed in detail. Although I have some reservations about the evaluation, the presented idea is promising and has the potential to make a valuable contribution to medical image segmentation.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Unfortunately, the rebuttal did not clarify or resolve some of the concerns. On the contrary, it raised additional issues regarding the applicability and usefulness of the method, and further emphasized the lack of comparison to the state-of-the-art, both in the motivation and in the experimental evaluation.



Review #3

  • Please describe the contribution of the paper

    Due to limitations brought by skip connections, which propagate redundant noise and compromise edge information, the authors propose a Parameter-Free Edge and Structure Attention (PFESA) based on the Fast Fourier Transform. Specifically, PFESA employs frequency-domain feature decoupling to separate high-frequency (edge details) and low-frequency (structural components) representations. Extensive experiments on multi-modal 2D/3D medical image datasets demonstrate PFESA’s superiority over existing attention methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This parameter-free attention mechanism tries to address the intrinsic property of vanilla skip connections, which shows great potentials in generalizing to other research topics.
    2. The theoretical proof of the structural and edge attentions makes sense by introducing the concept of feature signal-to-noise.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. For experimental evaluations, the authors report the performance boost of PFESA on baselines including 3D UNet and UNETR++. However, these cannot be recognized as the SOTA segmentation models; nnUNet and MedNeXt should be taken into account instead. It would be better to conduct ablation studies with PFESA plugged into nnUNet and MedNeXt.
    2. Besides, the authors choose the public LA and Tooth datasets, which are well-structured and regular in shape. Thus, those datasets are not representative enough to assess the efficacy of PFESA. I strongly suggest that the authors implement quantitative evaluations on more challenging anatomical structures, such as FLARE. Also, the edge information between different classes is more difficult to outline compared with binary segmentation.
    3. For the choice of attention blocks: since self-attention is good at extracting low-frequency details, the authors could compare PFESA with attention blocks combining convolutions and self-attention [1, 2].
      • [1] MixFormer: End-to-End Tracking with Iterative Mixed Attention
      • [2] Learning With Explicit Shape Priors for Medical Image Segmentation
    4. On the design of Structure Attention (SA): why can the authors calculate E(S_l^2) with the approximation X_l^2 - μ_l? That is somewhat confusing to me; a detailed description is required.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper addresses the limitation of skip connections, which is a quite interesting topic and worth more discussions. However, there are still some drawbacks in the methodology and experimental parts.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have well addressed my concerns. Thus, I believe it will be a good contribution to the community.




Author Feedback

We appreciate the valuable suggestions from the reviewers. Below are our responses to the questions raised:

Method: PFESA transfers features to the frequency domain via FFT, separates high/low-frequency components (H_h/H_l) using Gaussian low-pass filtering, then applies Edge Attention (EA) and Structure Attention (SA) to the high/low-frequency features (X_h/X_l) after the iFFT. This generates attention weights for edge details and structural components, respectively. (R1-Q3): Wavelet transforms fail to separate high/low frequencies in medical modalities such as the ISIC-2017 dataset and introduce noise due to overlapping frequency components [Doi:10.1109/ICCV51070.2023.01928]. Laplacian operators directly enhance high frequencies via second-order differentiation but cannot isolate low frequencies [6]. The FFT in PFESA provides a global frequency representation, making it suitable for multi-modal medical images. (R1-Q1): Gaussian low-pass filtering retains frequencies below the cutoff σ (also r). H_h is obtained by subtracting H_l from the full-spectrum features. In the final version, Fig. 1 will replace symbolic representations with actual high/low-frequency feature maps to clarify the filtering process. Future work will explore adaptive r values for anatomical structures across modalities. (R1-Q3, R2-Q6, R3-Q4): PFESA assigns higher attention weights to regions with larger signal-to-noise ratios (SNR = E[S^2]/E[N^2] = μ_S^2/σ_N^2). For X_h, approximating σ_h^2 as σ_N^2, SNR_H ≈ (X_h - μ_h)^2/σ_h^2 amplifies signal energy while suppressing noise via mean subtraction (range: [0, ∞)). For X_l, as the signal energy distribution is similar to that of the noise [Doi:10.1002/mp.14160], SNR_L ≈ (X_l^2 - μ_l)/σ_l^2 emphasizes signal–noise differences by squaring X_l itself (e.g., a 2x2 X_l [[0.6, 0.4], [0.7, 0.5]] yields SNR_H = [[0.15, 1.35], [1.35, 0.15]], while SNR_L = [[-11.39, -23.39], [-3.60, -17.99]]). Negative values necessitate normalizing to [0, 1] by sigmoid. The combined attention weights (EA+SA) are normalized to [0, 1].
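The SNR-based attention formulas in the rebuttal can be sketched as follows. This is a minimal illustration under stated assumptions: global (per-map) means and variances are used for μ and σ^2, and the final min-max fusion of the combined weights is an illustrative choice, so the paper's exact reduction axes and fusion may differ.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pfesa_attention(x_h, x_l, eps=1e-8):
    """Parameter-free SNR-based attention over decoupled features.

    A sketch of the rebuttal's formulas; the global mean/variance reduction
    and the fusion step are illustrative assumptions.
    """
    # Edge Attention on X_h: SNR_H ~ (X_h - mu_h)^2 / sigma_h^2, range [0, inf)
    ea = (x_h - x_h.mean()) ** 2 / (x_h.var() + eps)

    # Structure Attention on X_l: SNR_L ~ (X_l^2 - mu_l) / sigma_l^2; this can
    # be negative, so a sigmoid squashes it to [0, 1] as noted in the rebuttal.
    sa = _sigmoid((x_l ** 2 - x_l.mean()) / (x_l.var() + eps))

    # Combine the branch weights and min-max normalize to [0, 1].
    w = ea + sa
    w = (w - w.min()) / (w.max() - w.min() + eps)
    return (x_h + x_l) * w
```

Note that no step introduces trainable parameters, which is the property the paper relies on to avoid overfitting on small medical datasets.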

Experimental Design: We validated different attention mechanisms in skip connections on four multi-modal datasets using U-Net/TransUNet. (R2-Q1/2/3/4, R3-Q1/2): All compared methods used their original parameters; PFESA only requires the cutoff σ (also r), whose sensitivity is analyzed in Table 2. The learning rate and other training parameters followed [16, 17]. The lower performance of other methods (Table 1) stems from poor robustness on medical image tasks [Doi:10.1016/j.cosrev.2024.100721]. PFESA demonstrated robustness across models/tasks, laying the groundwork for future SOTA applications and complex tasks (e.g., nnUNet, the FLARE dataset). (R3-Q3): Self-attention comparisons will be added in future work due to computational costs and task-specific designs (e.g., R3-ref2).

Ablation Study: (R2-Q8/9): The ablation studies with smaller r values and without Feature Decoupling (FD) were omitted due to page limitations. It should be noted that PFESA shows the best performance when r = 0.1, and that EA+SA without FD did not achieve a better result than the baseline, as EA and SA are designed for the X_h/X_l extracted by FD, respectively.

Introduction Revisions: (R1-Q2/Q5, R2-Q5): Liu et al. [6] enhanced ViT high frequencies via Laplacian operators but lost low-frequency structures. SF-Net [7] used rectangular masks to extract frequencies after the FFT, causing discontinuous/noisy high frequencies. WaveSNet [9] replaced downsampling with wavelets but ignored high frequencies in skip connections, introducing extra noise. PFD-Net [10] applied 1x1 convolutions in the frequency domain without frequency-band analysis. MC-FDN [R1-ref1] neglected frequency continuity via channel attention on real/imaginary components after the FFT. SASAN [R1-ref2] fused denoised low frequencies but discarded high-frequency features.

Other Revisions: (R1-Q3, R2-Q7/10): Fig. 1 will illustrate Gaussian filtering with actual feature maps; Fig. 3 and all captions will be clarified. (R1-Q4): Code will be released upon acceptance.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Most of the reviewers (R1/R3) posted positive post-rebuttal reports and highlighted the sufficient novelty of this paper. In R2’s post-rebuttal report, follow-up issues were raised that were not mentioned in the initial review for the authors to clarify. To be fair, I disregard the additional comments from R2 and recommend acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Most important main comments were addressed. The authors should incorporate the details mentioned in the rebuttal in the final version of the work.


