Abstract

Deep learning models have achieved great success in automating skin lesion diagnosis. However, the ethnic disparity in these models’ predictions needs to be addressed before deployment of these models. We introduce a novel approach, PatchAlign, to enhance skin condition image classification accuracy and fairness through alignment with clinical text representations of skin conditions. PatchAlign uses Graph Optimal Transport (GOT) Loss as a regularizer to perform cross-domain alignment. The representations thus obtained are robust and generalize well across skin tones, even with limited training samples. To reduce the effect of noise/artifacts in clinical dermatology images, we propose a learnable Masked Graph Optimal Transport for cross-domain alignment that further improves the fairness metrics.
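
As a rough illustration of the optimal-transport alignment described above, the following Python sketch computes an entropy-regularised transport cost between patch embeddings and label-text embeddings. It is not the authors' implementation: GOT as published combines a Wasserstein term with a Gromov-Wasserstein term, and only the Wasserstein term is sketched here, using Sinkhorn iterations. The function names, the cosine cost, the uniform marginals, and the values of `eps` and `n_iters` are all assumptions made for the example.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cost 1 - cos(x_i, y_j) between the rows of x and the rows of y.

    Assumes no row is the zero vector.
    """
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T

def sinkhorn_plan(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularised transport plan with row marginals a and column marginals b."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]

def ot_alignment_loss(patches, text_tokens, eps=0.1):
    """Transport cost between image patch embeddings and text embeddings.

    Hypothetical helper: uniform mass is placed on every patch and every text token.
    """
    cost = cosine_cost(patches, text_tokens)
    a = np.full(patches.shape[0], 1.0 / patches.shape[0])
    b = np.full(text_tokens.shape[0], 1.0 / text_tokens.shape[0])
    plan = sinkhorn_plan(cost, a, b, eps=eps)
    return float((plan * cost).sum())
```

Used as a regularizer, a term of this kind would be added to the classification loss so that patch embeddings are pulled toward the embedding of the matching clinical label.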

We compare our model to the SOTA model (FairDisCo) on two skin lesion datasets with different skin types: Fitzpatrick17k and Diverse Dermatology Images (DDI). Our proposed approach, PatchAlign, enhances the accuracy of skin condition image classification by 2.8% (in-domain) and 6.2% (out-domain) on Fitzpatrick17k and 4.2% (in-domain) on DDI compared to FairDisCo. In addition, it consistently improves the fairness of true positive rates across skin tones in all of our experiments.
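
To make "fairness of true positive rates across skin tones" concrete, the sketch below computes the true positive rate separately for each skin-tone group and reports the spread between the best and worst group. The exact fairness metrics used in the paper are not stated on this page, so the max-minus-min gap, the function names, and the toy labels are assumptions chosen only for illustration.

```python
import numpy as np

def tpr_by_group(y_true, y_pred, groups, positive=1):
    """True positive rate per group; groups without positive samples are skipped."""
    rates = {}
    for g in np.unique(groups):
        pos = (groups == g) & (y_true == positive)
        if pos.sum() == 0:
            continue
        rates[g.item()] = float((y_pred[pos] == positive).mean())
    return rates

def tpr_gap(rates):
    """Difference between the highest and lowest group TPR (0 means equal TPRs)."""
    vals = list(rates.values())
    return max(vals) - min(vals)

# Toy data (made up for the example, not taken from either dataset).
y_true = np.array([1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1])
groups = np.array(["light", "light", "dark", "dark", "light", "dark"])

rates = tpr_by_group(y_true, y_pred, groups)
gap = tpr_gap(rates)
```

A smaller gap means the model detects the condition at more similar rates across skin tones.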

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2609_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2609_supp.pdf

Link to the Code Repository

https://github.com/aayushmanace/PatchAlign24

Link to the Dataset(s)

https://github.com/mattgroh/fitzpatrick17k
https://ddi-dataset.github.io

BibTex

@InProceedings{Aay_Fair_MICCAI2024,
        author = { Aayushman and Gaddey, Hemanth and Mittal, Vidhi and Chawla, Manisha and Gupta, Gagan Raj},
        title = { { Fair and Accurate Skin Disease Image Classification by Alignment with Clinical Labels } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes PatchAlign to enhance skin condition image classification accuracy and fairness through alignment with clinical text representations of skin conditions. The authors use a Graph Optimal Transport Loss to perform cross-domain alignment. The PatchAlign model is an alignment-based skin disease classifier that consists of an Image Encoder, a Text Encoder, and three output branches. Finally, they conduct extensive experiments to demonstrate that their method is superior to the SOTA model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. This paper introduces PatchAlign, an image-text alignment-based model for skin disease classification. PatchAlign divides clinical skin images into patches and aligns their embeddings with the text embeddings of clinical labels to fuse features across the visual and textual modalities.
    2. This paper proposes Masked Graph Optimal Transport (MGOT), based on GOT, to reduce the influence of the many irrelevant patches.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. In Sec. 3, the model architecture should be described in more detail according to Fig. 2. For example, the paper only introduces the loss function for the target branch; the loss functions for the remaining two branches should also be briefly described.
    2. In Sec. 3.1, MGOT is the main innovation of this paper. Therefore, it should be described in more detail, such as how the mask weights are obtained, which could be added to Fig. 2. Meanwhile, the impact of different values of and on the experiments should be discussed in detail.
    3. In Sec. 3.2, the content is expressed unclearly. For example, why does the use of multi-task learning require changing the loss function? How are high-quality representations learned through multi-task learning?
    4. In Sec. 5.2, the ablation experiments are not comprehensive enough. Compared to the FairDisCo model, this paper’s experimental variables have two components: the healthy skin embedding, which is generated from the eudermic label, and the Alignment Branch. In my understanding, FairDisCo⊘ is the FairDisCo framework without the contrastive loss, while PatchAlign⊘ does not use the eudermic skin type during cross-domain alignment. The authors should start from the base (VIT) network and conduct an ablation study for each module of PatchAlign, including the different losses and network structures.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. The paper contains a few language and grammar errors. a) In Sec. 3.1, “ we calculate similarly using We calculate…” b) In Sec. 3.1, “ reduce the time complexity. : “, the full stop should be removed.

    2. There is some recent work on text alignment for natural images, which should be compared.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Missing some experiments

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The proposed approach (PatchAlign) is designed to improve skin condition image classification accuracy and fairness by aligning them with clinical text representations of skin conditions. PatchAlign utilizes Graph Optimal Transport (GOT) Loss as a regularizer for cross-domain alignment. The resulting representations are robust and can be generalized effectively across skin tones, even with limited training samples. A learnable Masked Graph Optimal Transport for cross-domain alignment has been proposed to mitigate the impact of noise or artifacts.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work proposes a novel approach called Masked Graph Optimal Transport (MGOT) as an explicit domain alignment method for noisy and limited clinical skin images. The generalization of the proposed approach has been evaluated using the Fitzpatrick17k and DDI benchmark datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors are suggested to proofread the manuscript. Section 3.1 seems unclear in the second paragraph: “We calculate similarly using We calculate {cos(xi , xj )}i,j , ….”
    2. An appropriate ablation study needs to be included to show how the value of the hyper-parameter that controls the importance of the entropy term was decided.
    3. The novelty of the proposed work is limited as it utilizes the existing model, i.e., Graph Optimal Transport (GOT), which is a fundamental component of the proposed work.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. There are typos in some places; the manuscript should be proofread.
    2. An ablation study should determine how the hyper-parameters are decided.
    3. Authors should describe how they obtained the results for other fairness methods.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    An appropriate ablation study is needed to determine how the hyper-parameter controlling the importance of the entropy term was decided. Moreover, the novelty of the proposed work is limited as it relies on the existing model, Graph Optimal Transport (GOT), which is a fundamental component of the proposed approach.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents PatchAlign, a novel approach that enhances skin lesion classification accuracy and fairness across skin tones. PatchAlign aligns skin condition image representations with clinical text descriptions using Graph Optimal Transport (GOT) Loss and a learnable Masked Graph Optimal Transport to reduce noise and artifacts. The method outperforms the state-of-the-art model on two diverse skin lesion datasets, improving both accuracy and fairness metrics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Fairness in skin lesion classification is an important topic, and the proposed method shows impressive performance improvement in terms of both classification accuracy and fairness metrics.
    2. The experiments are comprehensive, and the out-domain experiments are interesting.
    3. Using Masked graph optimal transport for domain alignment appears novel.
    4. The code is public, which improves reproducibility.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Baselines are limited; more baselines should be included in Tables 1 and 3. The author is encouraged to include baselines from the paper [1].
    2. The method is mainly developed based on FairDisco, and the comparison may not be entirely fair. For example, the proposed method PatchAlign employs an additional text encoder to generate text embedding, which is much larger than the linear layer used in FairDisco. More related analysis and comparison should be discussed. The model parameters and FLOPs should also be added.

    [1] Change is Hard: A Closer Look at Subpopulation Shift

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. More skin tone-based fairness studies or dermoscopic artifacts issues in skin lesion classification, such as [2-3], should be discussed.
    2. Have you tried other text encoder variants? How do different text encoders affect the model performance and fairness?

    [2] Debiasing skin lesion datasets and models? not so fast
    [3] Towards Trustable Skin Cancer Diagnosis via Rewriting Model’s Decision
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-written and meaningful for the fairness community. The technique appears sound and has quite effective improvement compared to FairDisco. The main concerns are the limited baseline comparisons and the additional parameters of the text encoder. However, the overall quality is good.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Thank you for your thoughtful review of our manuscript. We appreciate your valuable insights and suggestions, which will help us improve the clarity and comprehensiveness of our work.

We have addressed the reviewer’s comments regarding the following aspects:

Novelty(R3): Our work introduces two key novelties: Masked GOT (MGOT) and incorporating Eudermic skin labels. We agree GOT is foundational. However, MGOT utilises a learnable mask to prioritise disease-relevant patches during alignment, leading to better accuracy than GOT. Additionally, including the Eudermic skin label allows MGOT to distinguish healthy from diseased regions, further improving alignment. In MGOT, the mask weights are learnt using a neural network. These innovations address the specific challenges of clinical skin image alignment.
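
One way to picture a learnable mask that prioritises some patches is sketched below in Python. How the mask enters the MGOT objective is not specified on this page, so this example assumes, purely for illustration, that a single linear layer scores each patch, a sigmoid maps the score to a weight in (0, 1), and the normalised weights replace the uniform patch marginal in the transport problem. The names `patch_mask` and `masked_marginal` and the parameters `w` and `b` are hypothetical.

```python
import numpy as np

def patch_mask(patches, w, b):
    """Sigmoid weight per patch from a single linear scorer (hypothetical)."""
    logits = patches @ w + b
    return 1.0 / (1.0 + np.exp(-logits))

def masked_marginal(mask):
    """Normalise mask weights into a probability vector over patches."""
    return mask / mask.sum()

# Three toy patch embeddings; in training, w and b would be updated by
# gradient descent together with the rest of the network.
patches = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]])
w = np.array([1.0, 0.0])
b = 0.0

mask = patch_mask(patches, w, b)
marginal = masked_marginal(mask)
```

Under this assumption, patches with low mask weight carry little transport mass, so they contribute little to the alignment cost.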

Text Encoder to Generate Encoding(R4): We evaluated different text encoders for clinical label encoding, including BERT, Medical BERT, and OpenAI’s text-embedding-3-large. The latter achieved the best performance and was also the most explainable (evaluated based on the t-SNE plot shown in Fig. 1 of the paper). As it is a pre-trained model, encoding incurs only a one-time computational cost and has minimal impact on overall model complexity (FLOPs).

Loss Function(R1): The target branch uses the standard cross-entropy loss for classification. The sensitive attribute branch employs cross-entropy and confusion losses, similar to FairDisCo. Finally, the alignment branch leverages the GOT loss for aligning image and text representations. We will briefly explain each loss function in the camera-ready version.
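
The three-branch objective described above can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' code: the confusion loss is written as cross-entropy against a uniform target, the terms are combined as a weighted sum, and the weights `w_tone`, `w_conf`, and `w_align` are hypothetical. The sketch only evaluates the loss values; which parameters each term updates during training is not shown.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of the integer class labels."""
    logp = log_softmax(logits)
    return float(-logp[np.arange(len(labels)), labels].mean())

def confusion_loss(logits):
    """Cross-entropy against a uniform target over the sensitive classes.

    It is smallest when the predicted distribution is uniform, i.e. when the
    features carry no information about the sensitive attribute.
    """
    return float(-log_softmax(logits).mean())

def total_loss(cond_logits, cond_labels, tone_logits, tone_labels, align,
               w_tone=1.0, w_conf=1.0, w_align=1.0):
    """Weighted sum of target, sensitive-attribute, and alignment terms."""
    return (cross_entropy(cond_logits, cond_labels)
            + w_tone * cross_entropy(tone_logits, tone_labels)
            + w_conf * confusion_loss(tone_logits)
            + w_align * align)
```

Here `align` stands for the scalar value of the alignment (GOT) loss computed elsewhere.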

Baseline(R4): We employed several strong baseline models to ensure a comprehensive evaluation. We began with a simple ResNet-18 architecture as a foundation. We then incorporated the results from FairDisCo, a well-established approach, as a baseline for comparison. Finally, to demonstrate the effectiveness of our multi-tasking strategy, we included the performance of a multi-tasking model as an additional baseline. Also, as suggested, we will try to add more baseline models to the camera-ready version.

MultiTasking(R1): Multi-tasking necessitates a modified loss function because it concurrently trains the model for both skin condition prediction and meta-label prediction. This adds a cross-entropy classification loss for the meta-label prediction task. Multi-tasking implicitly encourages alignment through shared representations across both tasks. In contrast, PatchAlign explicitly addresses alignment through the GOT loss.

Ablation(R1 and R3): Our initial ablation study included BaseVIT with GOT, the original FairDisco model, and two variants of PatchAlign (with and without the Eudermic skin label). To enhance the comprehensiveness of the ablation analysis in the camera-ready version, we will try to include Base VIT as a baseline and FairDisco with VIT for a more direct comparison.

More Skin Tone Fairness Studies(R4): Unlike prior fairness-focused methods that often require extensive data normalisation[2] or additional human intervention[3], PatchAlign leverages a simpler and potentially more effective approach: cross-domain alignment with MGOT. This strategy promotes generalizability across diverse skin tones without significant data manipulation or human oversight. We will try to include a more detailed comparison of these techniques in the camera-ready version to elucidate the advantages of PatchAlign further.

Lastly, we will thoroughly revise the manuscript to eliminate grammatical errors before submitting the camera-ready version.




Meta-Review

Meta-review not available, early accepted paper.


