Abstract

Biomedical image classification faces several adversarial challenges, including occlusions from artifacts, variations in tissue pigmentation, and class imbalance, which hinder model generalization. Existing attention mechanisms enhance region localization but often introduce redundant dependencies across attention heads, limiting feature diversity. We propose the Background-Invariant Independence-Guided Multi-head Attention Network (BIIGMA-Net) to address these issues. BIIGMA-Net employs Multi-head Independence-Guided Channel Attention (MICA), where each head independently learns feature importance while enforcing neuron-wise independence using the Hilbert-Schmidt Independence Criterion (HSIC) to enhance feature diversity. Additionally, a saliency-driven mechanism suppresses background activations by selectively shuffling non-salient vectors, preventing the model from relying on static background cues. By integrating these strategies, BIIGMA-Net improves robustness against spurious background noise while ensuring complementary feature extraction. Extensive experiments on popular skin cancer datasets (ISIC-17, ISIC-18 and ISIC-19) demonstrate the framework’s effectiveness and robustness. Our code is available at: https://github.com/shb2908/BIIGMA-Net
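As a rough illustration of the saliency-driven shuffling described in the abstract, the NumPy sketch below permutes the feature vectors at low-saliency positions within each image and leaves salient positions untouched. The function name `shuffle_non_salient`, the `keep_ratio` parameter, the quantile-based threshold, and the assumption of a precomputed per-position saliency map over features shaped (B, HW, C) are all hypothetical choices made for illustration; the authors' implementation in the linked repository may differ.

```python
import numpy as np


def shuffle_non_salient(features, saliency, keep_ratio=0.5, rng=None):
    """Permute feature vectors at non-salient positions within each image.

    features: array of shape (B, HW, C), one C-dim vector per spatial position.
    saliency: array of shape (B, HW), higher means more salient (assumed given).
    keep_ratio: hypothetical fraction of positions treated as salient.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = features.copy()
    for b in range(features.shape[0]):
        # Positions below the per-image quantile are treated as background.
        threshold = np.quantile(saliency[b], 1.0 - keep_ratio)
        background = np.flatnonzero(saliency[b] < threshold)
        # Reassign background vectors to background positions in random order,
        # so static background layout can no longer be used as a cue.
        out[b, background] = features[b, rng.permutation(background)]
    return out


# Toy usage on random data (not real lesion features).
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 16, 4))
saliency = rng.random((2, 16))
shuffled = shuffle_non_salient(features, saliency, keep_ratio=0.5, rng=rng)

# Salient positions keep their original vectors.
salient = saliency >= np.quantile(saliency, 0.5, axis=1, keepdims=True)
print(np.array_equal(shuffled[salient], features[salient]))
```

Shuffling within an image keeps the set of background vectors intact and only changes where they sit, which is one simple way to break positional background cues without altering feature statistics.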

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2868_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/shb2908/BIIGMA-Net

Link to the Dataset(s)

https://challenge.isic-archive.com/data/#2017 https://challenge.isic-archive.com/data/#2018 https://challenge.isic-archive.com/data/#2019

BibTex

@InProceedings{RoyDeb_BackgroundInvariant_MICCAI2025,
        author = { Roy, Debasmit and Dutta, Srinjoy and Bose, Soham and Schwenker, Friedhelm and Sarkar, Ram},
        title = { { Background-Invariant Independence-Guided Multi-head Attention Network for Skin Lesion Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15972},
        month = {September},
        pages = {34--43}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The proposed Multi-head Independence-Guided Channel Attention (MICA) enforces the independence of multi-head feature projections through the Hilbert-Schmidt Independence Criterion (HSIC), reducing redundant features. The introduced Saliency-Guided Background Invariance mechanism suppresses the model’s dependence on static background noise by selectively mixing feature vectors from non-salient regions.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    For the first time, the paper combines an independence criterion in CNN-based attention heads with background agnosticism, providing a new framework for feature redundancy control. On the ISIC-17/18/19 datasets, the method achieves excellent F1 scores and demonstrates robustness under class imbalance. Ablation experiments verify the necessity of the modules.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Mainstream medical image classification has shifted towards architectures such as ViT, so it would be worth evaluating the proposed modules on a ViT-based framework. The motivation and role of HSIC are not elaborated in detail, and the rationale for its application is not fully demonstrated: it is supported only experimentally through ablation (Figure 3), without theoretical justification. Moreover, it is explained only qualitatively through a correlation matrix (Figure 3); more quantitative indicators (such as mutual information or entropy measures) should be used. The comparative methods were not identical across the three datasets, which lacks rigor; identical comparative methods should have been used. There is no ablation study on the SA blocks, whose usefulness should also be validated.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Limited theoretical contribution

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    After reviewing the authors’ rebuttal, I still hold the opinion that the contribution of this work is marginal.



Review #2

  • Please describe the contribution of the paper

    BIIGMA-Net’s main contribution is the integration of independence-guided multi-head attention (MICA using HSIC for diverse features) and saliency-driven background invariance (selective shuffling of non-salient features) in a CNN, uniquely addressing feature redundancy and background noise simultaneously.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces a background-independent framework for lesion analysis. This framework addresses known biases in the ISIC dataset arising from background variations (Bissoto, Benini, & Garagnani, 2020). By focusing the AI model on the lesion itself, the method mitigates the influence of noisy backgrounds. Consequently, the proposed approach demonstrates clear performance improvements over existing methods. The study’s utilization of public datasets facilitates reproducibility and future comparisons.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper does not provide publicly available code, hindering the easy reproduction of its findings. The reported performance demonstrates lower recall on the ISIC 18 and ISIC 19 datasets. This reduced recall is clinically significant, particularly for the identification of malignant cases. Including a table with detailed performance metrics, specifically for malignant cases, would enhance the understanding of the framework’s clinical relevance.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper shows promise, its actual clinical utility requires further investigation. Specifically, a more detailed analysis of per-class performance metrics is necessary to accurately assess its practical value, especially given the noted lower recall.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    My primary concern was the model’s relatively low recall, which is particularly important in medical image classification tasks to minimize missed diagnoses. In the rebuttal, the authors clarified that their method maintains a balanced trade-off between precision and recall and provided additional analysis showing that the recall is competitive when compared fairly across classes and settings. They also clarified how background invariance and attention mechanisms contribute to this balance.

    Given these clarifications, and the method’s strong overall performance and design novelty, I am satisfied with the response and recommend acceptance.



Review #3

  • Please describe the contribution of the paper

    The proposed method, BIIGMA-Net, successfully combines two approaches:

    1. Enforcing independence on multi-head features using HSIC
    2. Suppressing the effect of background noise by pruning channels with low variance
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel combination of approaches
    2. Thorough experimentation
    3. Well written paper
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    No major weaknesses

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Ensure consistent terminology with respect to “saliency-guided” and “attention guided vector sampling”.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel method that seems to have a slight edge over the state of the art.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    While I acknowledge the validity of the concerns raised by other reviewers, I find the modeling to be thorough and the results to clearly demonstrate improved performance. I recommend accepting the paper.




Author Feedback

We sincerely thank all reviewers for their time and constructive feedback. Due to anonymity requirements, the source code link is omitted. If accepted, we will release it publicly.

R1/C1 (Validation with ViT Backbone): Our framework uses a CNN backbone but is largely backbone-agnostic. The MICA module (Fig. 1) can be inserted before MLPs in ViTs, whereas the SA and Vector Sampling modules can be applied after reshaping ViT features to (H×W×C). However, our multi-depth feature aggregation depends on the CNN’s hierarchical outputs, which ViTs lack. We leave the ViT integration for future work.

R1/C2 (Theoretical Motivation & Quantitative Evaluation of HSIC): HSIC measures dependency among neuron activations from two heads (h1, h2) producing CNN features shaped (B × HW × C). Since these features are continuous tensors without explicit probability distributions, mutual information methods like KL divergence or entropy don’t apply. Distance metrics like Euclidean or cosine fail because they depend on element positions and don’t capture semantic equivalence under index permutations. For example, suppose vector A changes from <1,2,3> to <2,3,9> and vector B from <2,7,1> to <6,8,2> as the batch evolves. A’s 1st and 3rd indices consistently track values similar to B’s 3rd and 1st indices, indicating that those indices carry redundant or highly correlated information across heads but in different positions. HSIC computes kernelized correlations between representations, capturing both intra- and inter-head dependencies across batches and feature indices (Sec. 2.1), unlike traditional distance metrics. We discussed the essence of HSIC in the last paragraph of Sec. 3.2.

R1/C3 (Justification of Comparative Methods and Evaluation of Spatial Attention via Ablation Studies): We compare against attention-based methods (Zhang et al. [24], Ding et al. [7], Wei et al. [19]) and feature robustness approaches (Li et al. [14], Chu et al. [5]). Our method introduces an independence criterion with collaborative spatial-channel attention, a novel combination in skin cancer analysis. Because no directly comparable methods exist, we conducted ablation studies to evaluate hyperparameters and the importance of each component. Our Spatial Attention (SA) combines Squeeze-and-Excitation, Channel Pruning, and Saliency-driven Vector Sampling. Prior work (Zhang et al. [24], Ding et al. [7]) shows SA’s effectiveness. Table 2 shows that Vector Sampling improves F1 scores by 2.1%, 2.7%, and 0.6% on ISIC 17, 18, and 19, respectively. Further ablation studies will be done in the future.

R2/C1 (Addressing Clinical Relevance Concerning Recall Rates): We acknowledge that recall is a critical parameter in this medical setting, and we demonstrate high recall on the ISIC17 binary tasks, alongside strong macro-averaged F1, precision, and recall, which are acceptable by most clinical population screening thresholds (Harrison et al. [A]). Clinical frameworks adjust precision-recall trade-offs based on disease prevalence (di Ruffano et al. [B]). Our high F1, supported by strong precision and recall, reduces false positives and minimizes missed cases. The model supports per-class threshold tuning (e.g., for melanoma) while maintaining overall specificity. Though confusion matrices were excluded due to space (they will be added to the GitHub repo), our dataset-agnostic, generalizable framework remains robust across diverse medical imaging tasks.

R3: We sincerely thank Reviewer 3 for recommending acceptance of our work.

External Citations:

[A] Harrison, Kathryn. “The Accuracy of Skin Cancer Detection Rates with the Implementation of Dermoscopy Among Dermatology Clinicians: A Scoping Review.” The Journal of Clinical and Aesthetic Dermatology 17.9-10 Suppl 1 (2024): S18.

[B] Ferrante di Ruffano, Lavinia, et al. “Computer‐assisted diagnosis techniques (dermoscopy and spectroscopy‐based) for diagnosing skin cancer in adults.” Cochrane Database of Systematic Reviews 2018.12 (1996).
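The permutation argument in R1/C2 can be illustrated with a small NumPy sketch. It uses the standard biased empirical HSIC estimator, trace(KHLH)/(n-1)^2, with Gaussian kernels and a median-heuristic bandwidth. The kernel choice, the bandwidth rule, the function names `rbf_kernel` and `hsic`, and the toy data are assumptions made for illustration, not the authors' exact formulation (see Sec. 2.1 of the paper for that).

```python
import numpy as np


def rbf_kernel(x):
    """Gaussian kernel matrix over rows of x, bandwidth set by the median heuristic."""
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    sigma_sq = np.median(sq_dist[sq_dist > 0])  # median of off-diagonal squared distances
    return np.exp(-sq_dist / (2.0 * sigma_sq))


def hsic(x, y):
    """Biased empirical HSIC between two sets of paired samples (rows)."""
    n = x.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    k, l = rbf_kernel(x), rbf_kernel(y)
    return np.trace(k @ centering @ l @ centering) / (n - 1) ** 2


rng = np.random.default_rng(0)
head_a = rng.normal(size=(256, 3))             # toy activations of one head
head_b_permuted = head_a[:, [2, 1, 0]]         # same information, 1st and 3rd indices swapped
head_b_independent = rng.normal(size=(256, 3))  # unrelated activations

# Position-dependent similarity: row-wise cosine between head_a and its permuted copy.
cosine = np.sum(head_a * head_b_permuted, axis=1) / (
    np.linalg.norm(head_a, axis=1) * np.linalg.norm(head_b_permuted, axis=1)
)
mean_cosine = cosine.mean()

hsic_self = hsic(head_a, head_a)
hsic_perm = hsic(head_a, head_b_permuted)
hsic_indep = hsic(head_a, head_b_independent)

print(f"mean cosine (permuted): {mean_cosine:.3f}")
print(f"HSIC self: {hsic_self:.4f}, permuted: {hsic_perm:.4f}, independent: {hsic_indep:.4f}")
```

Because the Gaussian kernel depends only on pairwise distances between samples, swapping feature indices leaves the kernel matrix unchanged, so the permuted head scores the same dependence as an identical copy, while the row-wise cosine similarity treats the two heads as quite different.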




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The proposed MICA enhances the independence of multi-head feature projections through HSIC, thereby reducing redundant features. The introduction of the saliency-guided background invariance mechanism suppresses the model’s dependence on static background noise by selectively mixing feature vectors from non-salient regions. Compared with existing methods, the proposed method shows significant performance improvement. This research has certain clinical significance. It would be more meaningful if the code of the research work could be made public.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper aims to address feature redundancy and background noise in CNNs. The contribution of this paper is incremental, and the evaluation of the proposed blocks is not sufficient. It would be good to demonstrate the proposed blocks in different network architectures, tasks, and data modalities.


