Abstract
White Light Imaging (WLI) and Narrow Band Imaging (NBI) are the two main colonoscopic modalities for polyp classification. While NBI, as optical chromoendoscopy, offers valuable vascular details, WLI remains the most common and often the only available modality in resource-limited settings. However, WLI-based methods typically underperform, limiting their clinical applicability. Existing approaches transfer knowledge from NBI to WLI through global feature alignment but often rely on cropped lesion regions, which are susceptible to detection errors and neglect contextual and subtle diagnostic cues. To address this, this paper proposes a novel holistic classification framework that leverages full-image diagnosis without requiring polyp localization. The key innovation lies in the Alignment-free Dense Distillation (ADD) module, which enables fine-grained cross-domain knowledge distillation regardless of misalignment between WLI and NBI images. Without resorting to explicit image alignment, ADD learns pixel-wise cross-domain affinities to establish correspondences between feature maps, guiding the distillation along the most relevant pixel connections. To further enhance distillation reliability, ADD incorporates Class Activation Mapping (CAM) to filter cross-domain affinities, ensuring the distillation path connects only those semantically consistent regions with equal contributions to polyp diagnosis. Extensive results on public and in-house datasets show that our method achieves state-of-the-art performance, outperforming competing approaches by relative AUC margins of at least 2.5% and 16.2%, respectively. Code is available at https://github.com/Huster-Hq/ADD.
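For readers, below is a minimal, hedged sketch of the dense distillation idea the abstract describes. It is not the authors' implementation: the cosine-similarity affinity, the softmax temperature, and the way a CAM-derived mask is applied are all assumptions.

```python
# Sketch of alignment-free dense distillation between a WLI student and an NBI teacher.
# All hyperparameters and the exact affinity/loss choices are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def dense_distillation_loss(f_wli, f_nbi, sem_mask=None, tau=0.1):
    """f_wli, f_nbi: (B, C, H, W) feature maps from the WLI student and the NBI teacher.
    sem_mask: optional (B, HW, HW) binary matrix keeping only semantically consistent
    pixel pairs (e.g., derived from CAMs). Returns a scalar distillation loss."""
    s = F.normalize(f_wli.flatten(2).transpose(1, 2), dim=-1)      # (B, HW, C) student pixels
    t = F.normalize(f_nbi.flatten(2).transpose(1, 2), dim=-1)      # (B, HW, C) teacher pixels
    affinity = torch.bmm(s, t.transpose(1, 2)) / tau               # (B, HW, HW) cross-domain affinities
    weights = affinity.softmax(dim=-1)                             # soft correspondence per WLI pixel
    if sem_mask is not None:                                       # keep only same-class connections
        weights = weights * sem_mask
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    t_aligned = F.normalize(torch.bmm(weights, t), dim=-1)         # teacher features gathered per WLI pixel
    return (1.0 - (s * t_aligned).sum(dim=-1)).mean()              # cosine feature-matching loss
```

In training, such a loss would typically be added to the ordinary classification loss on WLI images, so the student learns NBI-informed features without any explicit spatial alignment.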
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3003_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/Huster-Hq/ADD
Link to the Dataset(s)
N/A
BibTex
@InProceedings{HuQia_Holistic_MICCAI2025,
author = { Hu, Qiang and Wang, Qimei and Chen, Jia and Ji, Xuantao and Liu, Mei and Li, Qiang and Wang, Zhiwei},
title = { { Holistic White-light Polyp Classification via Alignment-free Dense Distillation of Auxiliary Optical Chromoendoscopy } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15970},
month = {September},
pages = {256 -- 266}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a novel alignment-free holistic polyp classification framework. The proposed Alignment-free Dense Distillation (ADD) module performs pixel-wise cross-domain knowledge distillation by learning fine-grained feature correspondences between the WLI and NBI modalities, and it integrates Class Activation Mapping (CAM) to refine the distillation.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
(1) Unlike previous methods that rely on explicitly cropped lesions and strict alignment, ADD enables pixel-wise, affinity-based knowledge distillation between misaligned image domains. The proposed method can, to a certain extent, overcome the common problems of motion artefacts and registration errors in endoscopy, which lends it clinical feasibility.
(2) CAM filters distillation paths to retain only those connecting semantically consistent regions, thus enforcing that the transferred knowledge aligns with diagnostic relevance. The combination of pixel-wise affinity and semantic consistency is both novel and effective in enhancing interpretability and clinical robustness.
(3) The method is evaluated on widely used public datasets and clinically annotated in-house datasets, demonstrating its effectiveness.
(4) The ablation study verifies the effectiveness of each module.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
(1) The method strongly relies on NBI–WLI paired images to construct the cross-modal distillation path during the training phase. This is a major limitation in real clinical settings, as bimodal paired images are difficult to obtain in practice.
(2) ADD uses pixel-wise dense affinity matching and further combines CAM for filtering, which may result in higher computational cost. However, the paper does not provide quantitative analysis of FLOPs, GPU memory usage, or running time during training/inference.
(3) The authors claim to use ViT-S as the backbone in CPC-trans, and the backbone of other models is ResNet-50, but it is unclear whether the two architectures are trained under the same training strategy. ViT usually requires stronger regularisation, data augmentation, and longer training time. Using a unified training process may have an adverse effect on the Transformer backbone network and affect the fairness of the comparison.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper introduces a technically interesting solution to a real challenge in multimodal endoscopic classification. The paper relies on paired data and lacks analysis of computational costs, which affects its clinical value. However, the method shows potential and the experiments are relatively complete and comprehensive.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes a novel holistic WLI polyp classification framework, which is the first exploration of leveraging NBI knowledge to assist a holistic WLI polyp classifier without requiring polyp localization.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This work presents a novel approach that utilizes an NBI-based model to assist in training the WLI model, leading to improved performance.
- An alignment-free model distillation strategy is proposed, enabling the model to learn from unpaired NBI and WLI image samples.
- The proposed method achieves state-of-the-art performance on two benchmark datasets.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- In Section 2.3, the authors mention using a ResNet-50 pretrained on ImageNet as the feature extractor. They should explain why a model pretrained on natural images under white light can be directly applied to feature extraction on NBI polyp images without additional fine-tuning.
- The authors should provide more details regarding the training process of the “pretrained NBI classifier.”
- How are the different loss functions balanced during training? The authors are encouraged to elaborate on this aspect.
- To ensure fairness in the comparative experiments, the authors need to clearly explain how the original single-modality methods were adapted to the dual-modality classification task involving both WLI and NBI.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Based on the strengths and weaknesses listed in #6 and #7.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper presents a novel holistic classification framework that performs full-image diagnosis without requiring explicit polyp localization. The key contribution is an Alignment-free Dense Distillation (ADD) module, which enables fine-grained cross-domain knowledge distillation between WLI and NBI images, regardless of spatial misalignment.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
(1) Novelty: The proposed ADD module is an innovative contribution that effectively addresses the challenge of spatial misalignment by enabling alignment-free, fine-grained knowledge transfer between WLI and NBI domains. (2) Comprehensive Evaluation: The paper includes extensive experiments and comparisons with recent state-of-the-art methods published in leading venues such as Medical Image Analysis and MICCAI, enhancing the credibility of the proposed approach.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
(1) Unclear semantic interaction mechanism: The semantic relationship formulation in Equation (3) is ambiguous. It appears that the intersection of the refined CAM maps is used to calculate the semantic relation, but this intersection may be null (Fig. 1). In that case, Equation (3) may not be valid, and the semantic relation computation may be ineffective or misleading.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The motivation of the paper is well-grounded, particularly in addressing the domain gap between NBI and WLI modalities. The proposed ADD module is technically sound and contributes a novel perspective to cross-domain feature alignment without relying on pixel-level correspondence. The overall structure, empirical rigor, and contribution of the work meet the acceptance criteria. Enhancing the clarity of certain technical components would further strengthen the impact of this paper.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
1. Will the semantic relations be ineffective? (R#1) No, the semantic relations in Eq. (3) are always valid. In Eq. (5), the semantic relation is clearly NOT derived from the direct intersection of the two CAM maps. Instead, it represents a dense pixel-to-pixel correspondence, indicating whether a pixel in the WLI image shares the same class (e.g., polyp or background) with a pixel in the NBI image. This results in a relation matrix of size HW×HW, not HW. Therefore, even in cases where the spatial intersection of the CAM maps is null (as shown in Fig. 1), the semantic relation remains meaningful. The dense correspondence across classes continues to provide effective guidance during the distillation process.
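For concreteness, the following is a small sketch of how such a dense HW×HW semantic relation could be built from the two CAMs. The 0.5 threshold and the binary polyp/background labeling are assumptions, not details taken from the paper.

```python
# Illustrative construction of a dense HW x HW semantic relation from two CAMs:
# a WLI pixel and an NBI pixel are related if their CAM-derived classes match,
# so the relation stays valid even when the CAM regions do not overlap spatially.
import torch

def semantic_relation(cam_wli, cam_nbi, thresh=0.5):
    """cam_wli, cam_nbi: (B, H, W) class activation maps in [0, 1].
    Returns a (B, HW, HW) binary matrix of semantically consistent pixel pairs."""
    cls_wli = (cam_wli.flatten(1) > thresh).long()                # (B, HW) pixel class: 1 = polyp, 0 = background
    cls_nbi = (cam_nbi.flatten(1) > thresh).long()                # (B, HW)
    relation = (cls_wli.unsqueeze(2) == cls_nbi.unsqueeze(1))     # (B, HW, HW) same-class pairs
    return relation.float()
```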
2. Does our method strongly rely on NBI-WLI paired images? (R#2) If ‘paired data’ refers to images of the same polyp captured in both NBI and WLI modalities, then our method does not strictly rely on such data. Many previous methods require spatially aligned cross-domain images, which implicitly assumes the availability of paired samples. In contrast, our method adopts an alignment-free dense distillation strategy, making it naturally applicable to both paired and unpaired datasets. We believe that our dense probabilistic connectivity still enables valid distillation, as common features shared across polyps of the same class (e.g., polyp or background) can provide effective guidance for enhancing WLI representation learning, even without exact image-level correspondence. In future work, we will extend our comparisons to unpaired data and update the results accordingly on our GitHub repository.
3. Quantitative computational cost. (R#2)
- Training: Each batch is processed in approximately 0.45 seconds, with GPU memory consumption of around 12.6 GB. For the CPC-Paired and in-house datasets, training for 200 epochs takes approximately 50 minutes and 80 minutes, respectively.
- Inference: Only the WLI modality is used during inference. The inference time per image is approximately 0.03 seconds, with GPU memory usage of around 1.5 GB, and the inference cost is 33.156 GFLOPs.
4. Training strategy of CPC-Trans. (R#2) We retain ViT-S as the backbone in CPC-Trans, consistent with its original design. To ensure valid performance benchmarking, we also preserve the original training configurations, including all hyperparameters and architectural choices, as provided in the official CPC-Trans implementation.
5. Pre-training details of the NBI classifier. (R#3) Before cross-modal distillation, the NBI classifier was pre-trained on the NBI dataset. Specifically, it was initialized with ImageNet weights and then fully fine-tuned on NBI images. During distillation, the classifier remains frozen. We used an input size of 448x448 and a batch size of 16, and trained for 200 epochs with an initial learning rate of 1e-4 using the Adam optimizer (weight decay: 1e-8).
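As a hedged illustration, the reported pre-training recipe could look roughly as follows in PyTorch. The class count, the dataset loader, and the specific torchvision weight enum are assumptions or placeholders, not details confirmed by the authors.

```python
# Sketch of the NBI teacher pre-training setup as reported in the rebuttal:
# ResNet-50 initialized from ImageNet, 448x448 inputs, batch size 16, 200 epochs,
# Adam with lr 1e-4 and weight decay 1e-8. Dataset/loader names are placeholders.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

num_classes = 2  # e.g., adenomatous vs. hyperplastic; assumed
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-8)
criterion = nn.CrossEntropyLoss()

# nbi_loader would yield 448x448 NBI images with class labels (placeholder):
# for epoch in range(200):
#     for images, labels in nbi_loader:
#         optimizer.zero_grad()
#         loss = criterion(model(images), labels)
#         loss.backward()
#         optimizer.step()

# After pre-training, the teacher is frozen for the distillation stage.
for p in model.parameters():
    p.requires_grad = False
model.eval()
```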
6. Weight setting of the loss terms. (R#3) We assigned equal weights to all loss terms. The results suggest the model is not highly sensitive to these settings, as uniform weighting already achieved strong performance. We plan to explore more refined weight tuning and will update our GitHub repository accordingly.
7. Training details of single-modality methods. (R#3) For single-modality methods, we combined the NBI and WLI datasets into one training set, where each batch includes both WLI and NBI images. The model treats all images equally without distinguishing between modalities. In contrast, cross-modality methods train on WLI and NBI images in two separate, modality-specific batches, processing each modality independently while applying cross-modal knowledge distillation between them.
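For illustration, the two regimes could be set up roughly as follows. The dataset objects are dummy stand-ins, and the batch and image sizes are assumptions based on the details above; only the pooled-vs-separate batching pattern is the point.

```python
# Hedged sketch of the two training regimes described above.
import torch
from torch.utils.data import TensorDataset, ConcatDataset, DataLoader

# Dummy stand-ins for the real WLI/NBI datasets (images, labels); purely illustrative.
wli_dataset = TensorDataset(torch.randn(64, 3, 448, 448), torch.randint(0, 2, (64,)))
nbi_dataset = TensorDataset(torch.randn(64, 3, 448, 448), torch.randint(0, 2, (64,)))

# Single-modality baselines: WLI and NBI pooled into one training set, so each
# batch can mix both modalities and the model treats all images identically.
single_modality_loader = DataLoader(ConcatDataset([wli_dataset, nbi_dataset]),
                                    batch_size=16, shuffle=True)

# Cross-modality methods: separate, modality-specific loaders, so each training step
# draws one WLI batch and one NBI batch and applies cross-modal distillation between them.
wli_loader = DataLoader(wli_dataset, batch_size=16, shuffle=True)
nbi_loader = DataLoader(nbi_dataset, batch_size=16, shuffle=True)
```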
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A