Abstract

Although some past studies on endoscopic exposure correction have yielded promising results, these methods do not fully exploit task-specific priors, and they generally require a large number of parameters, which limits their application on resource-constrained devices. In this paper, we observe that, regardless of the level of exposure degradation, illumination information is usually contained in the low-frequency component, and the relative smoothness of structures in endoscopic images generally leads to a sparse high-frequency representation. Motivated by these priors, we construct a lightweight hierarchical network based on the wavelet transform for this correction task, called WTNet, which uses the inherent frequency decomposition of the wavelet transform and focuses the network's learning on modelling low-frequency information. On four datasets and three tasks, namely exposure correction, low-light enhancement, and downstream segmentation, we comprehensively demonstrate the superiority of the proposed WTNet. With only 1.41M parameters, WTNet achieves a better balance between performance and cost and shows favorable potential for clinical application. The code will be available at https://github.com/charonf/WTNet.
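
The two priors in the abstract (illumination in the low-frequency band, sparse high frequencies) can be illustrated with a minimal sketch. It assumes an orthonormal 2D Haar wavelet and a synthetic smooth image standing in for an endoscopic frame; the paper's actual wavelet, network, and data are not reproduced here, and the function names are hypothetical. Only the LL band is decomposed again at each level, giving a hierarchical pyramid, and the share of signal energy left in the detail bands serves as a rough proxy for high-frequency sparsity.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the orthonormal 2D Haar DWT (height and width must be even)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-pass on both axes
    lh = (a + b - c - d) / 2  # high-pass across rows (top minus bottom)
    hl = (a - b + c - d) / 2  # high-pass across columns (left minus right)
    hh = (a - b - c + d) / 2  # high-pass on both axes
    return ll, (lh, hl, hh)

def hierarchical_dwt(x, levels):
    """Multi-level pyramid: only the LL band is decomposed again at each level."""
    details, ll = [], x
    for _ in range(levels):
        ll, bands = haar_dwt2(ll)
        details.append(bands)
    return ll, details

# Hypothetical smooth test image: a broad Gaussian illumination blob (not real data).
yy, xx = np.mgrid[0:64, 0:64]
img = np.exp(-((yy - 32) ** 2 + (xx - 28) ** 2) / (2 * 15.0 ** 2))

ll, details = hierarchical_dwt(img, levels=3)
total = np.sum(img ** 2)
ll_energy = np.sum(ll ** 2)
detail_energy = sum(np.sum(band ** 2) for bands in details for band in bands)
print("LL shape after 3 levels:", ll.shape)
print(f"energy fraction  LL: {ll_energy / total:.3f}  LH/HL/HH: {detail_energy / total:.3f}")
```

Because the Haar transform is orthonormal, the LL and detail energies add up to the image energy, so the split directly shows how little of a smooth image lives in the high-frequency bands.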

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/5419_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WuZhi_APriorDriven_MICCAI2025,
        author = { Wu, Zhijian and Wang, Hong and Shi, Yuxuan and Huang, Dingjiang and Zheng, Yefeng},
        title = { { A Prior-Driven Lightweight Network for Endoscopic Exposure Correction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces WTNet, a lightweight network for endoscopic exposure correction that leverages wavelet transform to exploit task-specific priors of endoscopic images. The authors observe that illumination information is predominantly captured in low-frequency components, while high-frequency components remain sparse due to the smooth anatomical structures typical of endoscopy. WTNet hierarchically decomposes and reconstructs these components using discrete wavelet transform (DWT), Transformers, and depth-wise convolutions, achieving competitive results with only 1.41M parameters.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The paper is clearly written and easy to follow. 2) The idea of focusing the learning on low-frequency components via wavelet decomposition is simple and well aligned with the nature of endoscopic imagery. 3) Good performance with fewer parameters than the original models. 4) The authors perform a thorough evaluation across multiple datasets.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) Figure 1 is central to motivating the method but is not clearly explained and is difficult to follow. 2) The method is quite similar to Restormer, with only moderate architectural differences, raising concerns about novelty. 3) While Table 4 includes Restormer in the ablation study, its results are missing from Table 1 alongside the other state-of-the-art methods. 4) The ablation study shows that WTNet substantially reduces the number of parameters but does not provide significant gains in PSNR, SSIM, or LPIPS, and there is no discussion of why.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper presents a clear idea and achieves a reduction in model size, the performance improvement over prior work is not as evident. Given the similarity to Restormer and the lack of a noticeable performance advantage, the contribution feels incremental.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The concept of emphasizing low-frequency components through wavelet decomposition is straightforward and well-suited to the characteristics of endoscopic imagery. I would lean toward acceptance.



Review #2

  • Please describe the contribution of the paper

    This paper introduces a lightweight model to perform exposure correction based on wavelet transform.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - Strong performance across metrics and datasets while using a lightweight model.
    - Interesting use of Transformers vs. convolutions to capture low- vs. high-frequency information.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    - Writing clarity could be improved (e.g., are LL, LH, HL, HH common abbreviations?).
    - The segmentation task is not described, and Table 3 is not referenced in the text.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Strong results using a lightweight model but the paper’s clarity is inhibited by missing components.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    (1) The paper explores the inherent prior that endoscopic illumination information resides in low-frequency components while high-frequency parts are sparse, guiding the design of frequency-aware correction. (2) It proposes WTNet, a wavelet-based network with Transformers and depth-wise convolutions for frequency-aware correction. (3) WTNet achieves SOTA results with only 1.41M parameters across multiple tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Novel frequency-aware formulation. (2) Lightweight yet powerful architecture. (3) Strong multi-task evaluation.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) The claim that low-frequency components dominate illumination lacks rigorous validation. For example, the authors only provide a few visual examples without comprehensive frequency-domain statistical analysis (e.g., power spectrum distribution across the entire dataset).

    (2) The paper does not specify complete implementation details. We strongly recommend providing full reproducibility information, ideally through a complete code release in future work.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a technically sound study with comprehensive experiments and clear contributions.

  • Reviewer confidence

    Not confident (1)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Based on the other reviewers' assessments.




Author Feedback

We thank the three reviewers for their consistent and positive comments, e.g., the novel idea, good performance with fewer parameters, and thorough evaluation.

Common issue Q1: Open source. A1: We promise that, upon acceptance, we will release the code, training scripts, and checkpoints.

Q1: More explanation of Fig. 1. (R1) A1: In the first row of Fig. 1, we apply the wavelet transform to endoscopic images with different exposure levels and obtain four components: the low-frequency LL and the three high-frequency components LH, HL, and HH. In the second row, taking the first synthesized frame as an example, applying the inverse wavelet transform to the four components (LL of the underexposed image and LH, HL, HH of the overexposed image) yields an image that appears underexposed. This suggests that illumination mainly resides in the low frequencies. We'll provide more details.
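
The component-swap experiment described for Fig. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses a single-level orthonormal Haar wavelet, a synthetic scene, and a hypothetical exposure model (global gain with clipping to [0, 1]); the function names are illustrative. With the Haar wavelet, the mean of every 2x2 block is carried entirely by LL, so the recombined image inherits the average brightness of whichever image supplied LL.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT: returns LL and (LH, HL, HH)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2,
            ((a + b - c - d) / 2, (a - b + c - d) / 2, (a - b - c + d) / 2))

def haar_idwt2(ll, bands):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = bands
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

# Hypothetical scene: smooth shading plus mild noise (not real endoscopic data).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
scene = 0.5 + 0.2 * np.sin(xx / 6.0) * np.cos(yy / 9.0) + 0.02 * rng.standard_normal((64, 64))

# Hypothetical exposure model: global gain followed by clipping to [0, 1].
under = np.clip(0.3 * scene, 0.0, 1.0)
over = np.clip(1.8 * scene, 0.0, 1.0)

ll_under, _ = haar_dwt2(under)
_, high_over = haar_dwt2(over)
mixed = haar_idwt2(ll_under, high_over)  # LL of under + LH/HL/HH of over

print(f"mean brightness  under={under.mean():.3f}  over={over.mean():.3f}  mixed={mixed.mean():.3f}")
```

Real exposure changes are nonlinear and spatially varying, so this only shows why global brightness is, by construction, a low-frequency quantity under the Haar decomposition.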

Q2: Similarity to Restormer, and adding Restormer to Table 1. (R1) A2: We respectfully argue that Restormer is merely one way for us to instantiate our design and is not directly related to our innovation. Our core contribution lies in the prior analysis of the frequency properties of endoscopic images and in designs customized for different frequency bands. We observe that luminance mainly resides in the low frequencies, while the high frequencies are relatively sparse. Based on this prior analysis, we propose to use a long-range operator to model the low frequencies and efficient convolutions to handle the high frequencies. As shown in Eq. (2), our implementation uses the standard Transformer block from Restormer for low-frequency modeling. In fact, other operators with long-range modeling capability, such as Mamba, could also be adopted. Unlike Restormer, which proposes a new Transformer block, our study aims to construct a multiscale architecture with multi-order frequency decomposition that greatly reduces complexity while maintaining performance. For Table 1, the average PSNR/SSIM/LPIPS of our WTNet (only 1.41M parameters) and Restormer (18.78M parameters) are 34.44/93.87/0.0817 and 34.23/93.83/0.0821, respectively. We'll further clarify this in the revision.
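
Part of the cost argument above rests on handling the high-frequency bands with depth-wise convolutions. The arithmetic below compares the parameter count of a standard k x k convolution with a depth-wise one (one filter per channel, e.g. `nn.Conv2d(c, c, k, groups=c)` in PyTorch). The channel width of 48 and the 3 x 3 kernel are hypothetical values chosen for illustration, not settings taken from the paper.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameters of a standard k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def depthwise_conv2d_params(channels, k, bias=True):
    """Parameters of a depth-wise k x k convolution (one k x k filter per channel)."""
    return k * k * channels + (channels if bias else 0)

# Hypothetical setting: 48 channels per high-frequency band, 3 x 3 kernels.
c, k = 48, 3
standard = conv2d_params(c, c, k)
depthwise = depthwise_conv2d_params(c, k)
print(f"standard: {standard} params, depth-wise: {depthwise} params "
      f"({standard / depthwise:.1f}x fewer)")
```

Since the depth-wise cost grows linearly rather than quadratically with channel width, routing the three sparse high-frequency bands through depth-wise layers keeps them cheap, while the heavier long-range operator only sees the quarter-resolution LL band.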

Q3: WTNet substantially reduces parameters but yields limited gains. (R1) A3: It is expected that reducing parameters generally limits performance gains. For example, when we reduced Restormer's parameters from 18.78M to 1.45M, its PSNR dropped drastically, by more than 1 dB. Although WTNet reduces the parameter count by 92% to only 1.41M, it still achieves SOTA results, which shows the superiority of our designs.
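
As a quick check of the 92% figure, using only the parameter counts quoted in this rebuttal:

```python
# Parameter counts as reported in the rebuttal.
wtnet_params = 1.41e6
restormer_params = 18.78e6

reduction = 1 - wtnet_params / restormer_params
print(f"WTNet uses {reduction:.1%} fewer parameters than Restormer")
print(f"compression factor: {restormer_params / wtnet_params:.1f}x")
```

The reduction comes out at roughly 92.5%, a factor of about 13, which is consistent with the "order of magnitude fewer parameters" wording used in the meta-review.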

Q1: Are LL, LH, HL, HH common abbreviations? (R2) A1: Yes, these abbreviations are commonly used for wavelet subbands. We'll explain them in more detail. Thanks.
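
For readers unfamiliar with the notation, the four subbands can be computed by hand for a single 2x2 pixel block. This sketch assumes the orthonormal Haar wavelet and one common naming convention (L = low-pass, H = high-pass, first letter for the horizontal direction); libraries differ in which detail band they call LH or HL, and the function name is hypothetical.

```python
def haar_subbands(a, b, c, d):
    """Orthonormal Haar subbands of one 2 x 2 pixel block laid out as
         a b
         c d
    Naming follows one common convention; libraries differ on LH vs. HL."""
    ll = (a + b + c + d) / 2  # low-pass on both axes: scaled local average
    lh = (a + b - c - d) / 2  # top row minus bottom row: responds to horizontal edges
    hl = (a - b + c - d) / 2  # left column minus right column: responds to vertical edges
    hh = (a - b - c + d) / 2  # diagonal difference
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}

print(haar_subbands(10, 10, 10, 10))  # flat patch: only LL is non-zero
print(haar_subbands(10, 10, 0, 0))    # horizontal edge: LH becomes non-zero
```

A flat patch has all of its content in LL, which is why smooth endoscopic surfaces produce mostly near-zero LH, HL, and HH coefficients.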

Q2: Missing description of the segmentation task and missing reference to Table 3. (R2) A2: We describe the segmentation task at the end of Sec. 3.1 (Datasets), with the corresponding settings at the end of Sec. 3.1 (Implementation Details). Table 3 is referenced and discussed at the end of Sec. 3.3. We'll add more description in the revision.

Q1: The conclusion that low frequencies dominate illumination lacks statistical validation. (R3) A1: As suggested, for the entire dataset, we first generate paired images, i.e., the ideally exposed images and the images obtained by applying the inverse wavelet transform to the four components (LL of the ideal image and LH, HL, HH of the degraded image), and then apply the FFT to extract the power spectra. A paired t-test gives a p-value below 0.01, which directly indicates that after replacing the low-frequency component, the illumination is successfully transferred. This further verifies that illumination mainly resides in the low frequencies. Thanks.
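
The statistical check described above can be sketched as follows. The rebuttal does not state which spectral quantity enters the t-test, so as an assumption this sketch summarizes each FFT power spectrum by its log power inside a small disc around DC, and tests whether the LL-swapped images are closer to the ideal ones than the degraded images are. The synthetic images, the underexposure model, the disc radius, and all function names are hypothetical. The t-statistic matches what `scipy.stats.ttest_rel` computes on the same two arrays, which would also return the p-value.

```python
import numpy as np

def haar_dwt2(x):
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return (a + b + c + d) / 2, ((a + b - c - d) / 2, (a - b + c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(ll, bands):
    lh, hl, hh = bands
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def low_freq_log_power(img, radius=0.05):
    """Log of the FFT power inside a disc of normalized radius `radius` around DC."""
    power = np.abs(np.fft.fft2(img)) ** 2
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    return np.log(power[np.hypot(fy, fx) <= radius].sum())

def paired_t(x, y):
    """Paired t-statistic on x - y (same statistic as scipy.stats.ttest_rel(x, y))."""
    diff = np.asarray(x) - np.asarray(y)
    return diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size))

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
err_degraded, err_mixed = [], []
for _ in range(30):
    # Hypothetical stand-in for one dataset sample (synthetic, not real endoscopy).
    ideal = 0.5 + 0.2 * np.sin(xx / rng.uniform(4, 10)) * np.cos(yy / rng.uniform(4, 10))
    degraded = np.clip(rng.uniform(0.2, 0.5) * ideal, 0.0, 1.0)  # hypothetical underexposure
    ll_ideal, _ = haar_dwt2(ideal)
    _, high_degraded = haar_dwt2(degraded)
    mixed = haar_idwt2(ll_ideal, high_degraded)  # LL of ideal + LH/HL/HH of degraded
    target = low_freq_log_power(ideal)
    err_degraded.append(abs(low_freq_log_power(degraded) - target))
    err_mixed.append(abs(low_freq_log_power(mixed) - target))

t = paired_t(err_degraded, err_mixed)
print(f"paired t = {t:.1f} over {len(err_mixed)} pairs")
```

A large positive t here means that swapping in the ideal LL band brings the low-frequency power back close to that of the ideal image; on real data, the choice of spectral statistic and pairing would need to follow the authors' released protocol.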

Q2: Specify implementation details. (R3) A2: We provided the critical training settings in Sec. 3.1 (Implementation Details). We'll ensure full reproducibility by open-sourcing the code and training scripts.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents WTNet, a lightweight neural network tailored for endoscopic exposure correction, leveraging wavelet-based frequency decomposition to inform its architectural design. The authors insightfully observe that endoscopic images exhibit strong illumination information in the low-frequency domain, while high-frequency components are sparse due to the relatively smooth anatomical surfaces. WTNet exploits this prior by hierarchically separating frequency bands with discrete wavelet transform and dedicating distinct processing strategies to each: Transformers for low-frequency components and depth-wise convolutions for high-frequency features. The model achieves competitive results across multiple datasets with a remarkably compact parameter count (1.41M), significantly outperforming much larger architectures in terms of parameter efficiency.

    The reviewers largely agreed on the merits of the paper. They commended its strong empirical performance, clarity of presentation, and practical relevance, particularly given the model’s suitability for resource-constrained deployment in real-time surgical imaging. While concerns were raised about the similarity to Restormer and the relatively modest performance gains in PSNR/SSIM, the rebuttal convincingly clarified that WTNet is not simply a Restormer variant but a frequency-aware architectural formulation that delivers strong results with an order of magnitude fewer parameters. The authors further substantiated their claim about low-frequency illumination dominance with statistical validation and provided clarifications around Figure 1 and implementation specifics.

    In light of the paper’s focused and well-motivated design, strong empirical validation, and clear relevance to the CAI community, I recommend acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


