Abstract

Domain adaptation is crucial for deep learning in skin lesion analysis because models trained on dermoscopic images often struggle to generalise to clinical images, which exhibit variations in lighting, resolution, and background conditions. We propose Selective Alignment Transfer for Domain Adaptation (SAT-DA), a fully supervised framework that significantly reduces this domain gap by dynamically assigning feature importance weights based on statistical moments from both source and target domains. SAT-DA emphasises domain-invariant features and suppresses domain-specific noise to preserve crucial diagnostic cues. Our multi-loss strategy combines classification, alignment, and diversity losses to optimise feature selection and prevent feature collapse onto a narrow set. SAT-DA was evaluated on six public datasets comprising dermoscopic and clinical images and consistently outperformed state-of-the-art supervised and unsupervised methods. On Derm7pt-Derm to Derm7pt-Clinic, SAT-DA achieves 82.46% AUROC, surpassing the strongest baseline by over 6%. Notably, SAT-DA also maintains high performance on completely unseen datasets not used as source or target, demonstrating robust cross-domain generalisation. Overall, these results highlight SAT-DA’s ability to address practical clinical deployment challenges, offering a reliable, fully supervised solution for cross-domain skin lesion analysis. The details of the code will be made available upon acceptance.
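
The abstract describes feature importance weights derived from statistical moments of both domains. As an illustration only, the idea might be sketched as below. The specific rule (a softmax over the negative per-channel gap in mean and variance) and the names `channel_moments`, `moment_based_weights`, and `temperature` are assumptions made for this sketch, not the formulation used in the paper.

```python
import math
from statistics import fmean, pvariance


def channel_moments(features):
    """Return per-channel means and variances.

    `features` is a list of samples, each a list of channel activations.
    """
    channels = list(zip(*features))
    return [fmean(c) for c in channels], [pvariance(c) for c in channels]


def moment_based_weights(source, target, temperature=1.0):
    """Hypothetical weighting rule: channels whose first two moments agree
    across domains receive larger weights (softmax of the negative gap)."""
    mu_s, var_s = channel_moments(source)
    mu_t, var_t = channel_moments(target)
    gaps = [
        abs(ms - mt) + abs(vs - vt)
        for ms, mt, vs, vt in zip(mu_s, mu_t, var_s, var_t)
    ]
    logits = [-g / temperature for g in gaps]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: channel 0 is similar across domains, channel 1 is not.
source = [[1.0, 0.0], [3.0, 0.2]]
target = [[1.1, 5.0], [2.9, 5.4]]
weights = moment_based_weights(source, target)
```

In this toy example the first channel has nearly identical moments in both domains, so it receives most of the weight, while the second channel, whose mean shifts strongly, is suppressed.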

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2099_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/mmu-dermatology-research/sat-da

Link to the Dataset(s)

ISIC 2017: https://challenge.isic-archive.com/data/#2017

ISIC 2018: https://challenge.isic-archive.com/data/#2018

Derm7 (Dermoscopic & Clinical): https://derm.cs.sfu.ca/Download.html

Fitzpatrick17k: https://github.com/mattgroh/fitzpatrick17k?tab=readme-ov-file

PAD-UFES-20: https://data.mendeley.com/datasets/zr7vgbcyr2/1

BibTex

@InProceedings{SulNur_Selective_MICCAI2025,
        author = { Sultana, Nurjahan and Lu, Wenqi and Fan, Xinqi and Yap, Moi Hoon},
        title = { { Selective Alignment Transfer for Domain Adaptation in Skin Lesion Analysis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {607--617}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes Selective Alignment Transfer for Domain Adaptation (SAT-DA), a fully supervised framework for addressing domain shift between dermoscopic and clinical skin lesion images. Key contributions include:

    1. Dynamic feature weighting based on statistical moments to prioritize domain-invariant features.
    2. Multi-loss strategy combining classification, alignment, and diversity losses to prevent feature collapse.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novelty: The dynamic feature weighting mechanism is innovative, addressing domain-specific noise while preserving diagnostic cues.
    2. Rigorous evaluation: Extensive experiments on six datasets demonstrate consistent superiority over SOTA methods.
    3. Diversity preservation: The diversity loss mitigates over-reliance on narrow feature subsets, critical for medical imaging.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Handling extreme domain shifts: Performance drops on datasets like Fitzpatrick17k due to acquisition angle variability (acknowledged in limitations).
    2. Comparison breadth: Unsupervised baselines (e.g., ADDA, DANN) are included, but recent SOTA methods in medical DA (e.g., self-supervised approaches) are not benchmarked.
    3. Ablation experiments are partially missing. Adding the alignment loss alone yields an AUROC of 56.83%, whereas the full SAT-DA framework achieves a substantially higher average AUROC of 75.65%. This large gap between the two results suggests that there is something well worth exploring. It is recommended to add a set of ablation experiments W/O (F.S) to demonstrate the effectiveness of this component.
    4. The combined loss has 4 components. Is the composition too complex, and is hyperparameter tuning too difficult?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    SAT-DA introduces a novel, clinically relevant framework with significant empirical validation. Its dynamic feature weighting and multi-loss strategy outperform SOTA methods across metrics (AUROC, DSC). However, there are still problems such as limited robustness in extreme domains, difficulty in optimizing loss function hyperparameters, and insufficient ablation experiments.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a domain generalization method that adapts the model pretrained on dermoscopic images to clinical images. The main contributions consist of a Dynamic Feature Weighting module and a diversity loss. The proposed method is extensively validated by jointly training on source and target datasets and testing on multiple unseen target datasets. The results demonstrate its superiority over existing approaches in domain generalization.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Supervised Domain Adaptation for Clinical Applications: Unlike unsupervised domain adaptation (UDA), this work leverages labeled target domain data during training, making it more suitable for high-risk clinical applications where reliability is crucial.
    • Dynamic Feature Weighting Module: A dynamic feature alignment module is introduced to adaptively adjust feature importance weights, enhancing generalization from dermoscopic to clinical images.
    • Extensive Cross-Domain Evaluation: The proposed approach is validated across multiple unseen target datasets, demonstrating its effectiveness in real-world clinical scenarios.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The motivation for conducting supervised domain adaptation (SDA) is unclear. The authors justify SDA over unsupervised domain adaptation (UDA) for better model performance but do not explain why the model is not directly trained on the target domain data alone, given that labeled target data is available. Additionally, the experimental setup should include key baselines: (1) training solely on the target domain data and evaluating on unseen target datasets, (2) training only on the source domain data and testing on unseen target datasets, and (3) training on a mixture of source and target data and evaluating on unseen target datasets. These comparisons would provide a clearer understanding of the effectiveness of the proposed method.
    • The introduction states that prior studies on dermoscopic adaptation mainly focus on biological or technical factors unique to dermoscopy rather than the dermoscopic-to-clinical image gap. However, given that the proposed method utilizes supervised target domain data, the paper lacks a discussion on existing supervised approaches for clinical images. Additionally, it is important to clarify the significance of transferring knowledge from dermoscopic images when labeled clinical data is available. Providing such justification would strengthen the motivation for the proposed approach.
    • The experimental comparisons are not comprehensive. (1) As noted earlier, Table 2 should include more complete baseline results, particularly to establish a lower bound for domain adaptation (DA) methods. Specifically, results for training only on the target domain, only on the source domain, and on a mixed dataset should be reported to provide a clearer performance reference. (2) In Table 3, the results for the model without both F.S. (Feature Selection) and L_align (Alignment Loss) are missing.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Aligning cross-domain features is not a novel idea, as many prior works have explored this in the context of unsupervised domain adaptation (UDA). However, this paper utilizes supervised target domain data, which is indeed more suitable for high-stakes clinical applications. Given the seemingly significant performance improvements over previous methods, the authors should provide a more detailed explanation of the motivation behind this approach. Specifically, what specific limitations of UDA or target-only training does this method address? Clarifying these points would strengthen the justification for the proposed approach.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    Domain adaptation is proposed to allow different dermoscopic datasets to be analyzed and to better preserve performance of these models in the clinic and with unknown hardware. The method combines different losses, including classification, alignment and diversity, and tests the performance on six different datasets. An ablation study is performed to evaluate the impact of different modules.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The performance overall is good although it is noted that the method struggles to handle different skin tones. Overall this is a useful contribution, it is well written, compares well against other available metrics and has a clear clinical application. It would be strengthened if the team could analyze new datasets or clinically apply the method in a prospective study.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    An alternative approach would be to standardize the hardware better. At the moment, since there are few quantitative analysis methods for these types of image, the hardware is quite disparate, including acquisition on mobile phones. This situation would be quite easy to change if the value of automated analysis was better demonstrated.

    The authors refer to ‘dermoscopic images’ in contrast to ‘clinical images’ although these two types of images seem to be the same thing. Clinical images are acquired with a dermoscope. Perhaps it is the variety of dermoscopes that are used clinically that the authors are referring to.

    Typos: ‘intrigrates’.

    Table 2 is too small.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Useful method and application, evaluated on large public dataset, but would be strengthened with a prospective field study with new data.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their valuable feedback and address their concerns below. R1 Q2 Why exclude recent self-supervised DA methods? We selected SDA and UDA as baselines since they directly and explicitly address domain shift using either labelled data or explicit domain alignment. Self-supervised DA methods focus on auxiliary tasks for representation learning and typically do not leverage target labels or explicit alignment, making them less directly comparable to our approach.

R1 & R2 Can you add W/O (F.S) and W/O (F.S & L_align) ablations to show each component’s impact? The diversity loss is intrinsically tied to Feature Selection (F.S) as it operates on the channel-wise weighting vector produced by this module. Removing F.S. makes the diversity loss undefined, since there are no weights to regularise. Consequently, removing only F.S. while retaining the diversity loss is not meaningful. Similarly, removing both F.S. and the alignment loss invalidates the role of the diversity term, as it lacks its operational basis without feature weights.
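
The dependency described above, a diversity term that is defined only on the channel-wise weight vector, can be illustrated with a small sketch. The entropy-based form and the name `diversity_loss` are assumptions chosen for illustration; the rebuttal does not state the exact formula.

```python
import math


def diversity_loss(weights, eps=1e-12):
    """Hypothetical diversity penalty on a channel-weight vector.

    Returns log(C) minus the entropy of the normalised weights, so it is
    close to 0 for uniform weights and grows as the weights collapse onto
    a few channels. Without a weight vector there is nothing to evaluate.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p + eps) for p in probs)
    return math.log(len(weights)) - entropy


uniform = diversity_loss([0.25, 0.25, 0.25, 0.25])
collapsed = diversity_loss([0.97, 0.01, 0.01, 0.01])
```

The function takes the weight vector as its only input, which mirrors the rebuttal's point: if the Feature Selection module is removed, no such vector exists and the term cannot be computed.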

R1 Q4 Is the loss function overly complex with difficult hyperparameter tuning? Our classification loss ensures task-specific performance, alignment loss minimises domain shift, and diversity loss encourages non-trivial feature utilisation. In practice, tuning λ₁ and λ₂ is not prohibitively sensitive. We found that these values exhibit stable performance across datasets and λ₁ = 0.1 and λ₂ = 0.01 were used consistently without task-specific re-tuning. This indicates that the model does not require extensive hyperparameter tuning for effective deployment.
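
For reference, a weighted objective with the reported values could be written as follows. This is a minimal sketch: the rebuttal gives λ₁ = 0.1 and λ₂ = 0.01 but does not say which term each coefficient scales, so the assignment below (λ₁ on alignment, λ₂ on diversity) and the name `combined_loss` are assumptions.

```python
def combined_loss(l_cls, l_align, l_div, lambda_1=0.1, lambda_2=0.01):
    """Weighted sum of the three loss terms named in the rebuttal.

    Assumed mapping: lambda_1 scales the alignment loss and lambda_2 the
    diversity loss; the classification loss is left unscaled.
    """
    return l_cls + lambda_1 * l_align + lambda_2 * l_div


# Toy values for the individual terms.
total = combined_loss(l_cls=1.0, l_align=2.0, l_div=3.0)
```

With only two scalar coefficients held fixed across datasets, the tuning burden described by the authors reduces to checking sensitivity around these two values.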

R2 Q1 Why not train directly on the labelled target domain data instead of using SDA? In real-world practice, clinics use many different devices, so a model trained only on one labelled clinical dataset overfits to the specific characteristics of that dataset and fails to generalise to other unseen clinical datasets. Our approach encourages learning domain-invariant features that transfer across both dermoscopic and clinical domains. Features such as irregular lesion borders or asymmetry are clinically meaningful and remain important for diagnosis, whether the image is taken with a dermatoscope or a mobile phone. Learning these stable features, rather than device-specific characteristics, is crucial for developing reliable models.

R2 Lower-bound (base+DA), upper-bound, and combined baselines: We did run these 3 key baselines. Due to the page limit, only the Lower bound and Combined results are shown in Table 4. Both Upper and Lower bounds show significant performance drops due to domain shift. The Combined baseline performs better than the Lower bound but still underperforms compared to SAT-DA. Notably, training on clinical data but testing on unseen clinical sites still results in performance drops, highlighting the impact of heterogeneity in clinical image distribution.

R2 The main limitation of UDA and motivation of SDA. UDA cannot guarantee class-wise alignment without target labels. Features from the same lesion class in the source and target domains often remain scattered, which reduces diagnostic sensitivity. Our SDA method uses the available target labels to group features from the same class together, while also suppressing domain-specific artefacts. In this way, our method directly addresses UDA’s inability to ensure reliable, class-consistent alignment, which is also a limitation of simple target-only fine-tuning.

R4 Why not standardise hardware, and are clinical and dermoscopic images different? We respectfully clarify that clinical images are taken with mobile phones in clinics, while dermoscopic images are acquired using dermatoscopes. Mobiles are widely used by doctors and patients for skin monitoring. As mobile phone cameras vary greatly across brands and models, standardisation is not practical.

R4 Spelling and Table size: We will correct the typo and enlarge Table 2.
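
The rebuttal's argument that target labels allow same-class features to be grouped across domains can be made concrete with a small sketch. The centroid-matching penalty below is one common way to express class-wise alignment and is an assumption for illustration; the helper names `class_centroids` and `classwise_alignment` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def class_centroids(features, labels):
    """Mean feature vector per class label."""
    groups = defaultdict(list)
    for feat, label in zip(features, labels):
        groups[label].append(feat)
    return {
        label: [sum(col) / len(col) for col in zip(*feats)]
        for label, feats in groups.items()
    }


def classwise_alignment(src_feats, src_labels, tgt_feats, tgt_labels):
    """Mean squared distance between same-class centroids of the two domains.

    Requires target labels, which is exactly what UDA lacks.
    """
    src = class_centroids(src_feats, src_labels)
    tgt = class_centroids(tgt_feats, tgt_labels)
    shared = sorted(set(src) & set(tgt))
    if not shared:
        raise ValueError("no class appears in both domains")
    dists = [
        sum((a - b) ** 2 for a, b in zip(src[c], tgt[c])) for c in shared
    ]
    return sum(dists) / len(dists)


# Toy features: class 0 is aligned across domains, class 1 is offset.
penalty = classwise_alignment(
    [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]], [0, 0, 1],
    [[1.0, 0.0], [10.0, 12.0]], [0, 1],
)
```

Here class 0 has identical centroids in both domains and contributes nothing, while class 1 contributes a squared distance of 4, giving a mean penalty of 2.0.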




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This manuscript introduces a fully supervised domain adaptation method for skin lesion classification. It uses a dynamic feature selection mechanism to generate feature importance weights, which facilitate domain-invariant feature learning. It also adopts a new diversity loss to prevent usage of only a small set of features. The proposed method outperforms other competitors on multiple datasets. The rebuttal has addressed most of the reviewers’ concerns, although the manuscript can be further improved by clearly clarifying the study’s motivation (e.g., why fully supervised domain adaptation) and including the results of upper-bound models.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


