Abstract

Medical image analysis powered by artificial intelligence (AI) is pivotal in healthcare diagnostics. However, the efficacy of machine learning models relies on their adaptability to diverse patient populations, presenting domain shift challenges. This study investigates domain shift in chest X-ray classification, focusing on cross-population variations, specifically in an African dataset. Disparities between source and target populations were measured by evaluating model performance. We propose supervised domain adaptation to mitigate this issue, leveraging labeled data in both domains for fine-tuning. Our experiments show significant improvements in model accuracy for chest X-ray classification on the African dataset. This research underscores the importance of domain-aware model development in AI-driven healthcare, contributing to addressing domain shift challenges in medical imaging.



Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0442_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Mus_Analyzing_MICCAI2024,
        author = { Musa, Aminu and Ibrahim Adamu, Mariya and Kakudi, Habeebah Adamu and Hernandez, Monica and Lawal, Yusuf},
        title = { { Analyzing Cross-Population Domain Shift in Chest X-Ray Image Classification and Mitigating the Gap with Deep Supervised Domain Adaptation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper explores the issue of domain adaptation in the context of Chest X-Ray classification for an underrepresented population (using an African dataset), in comparison with three other baseline datasets. Adversarial Domain Adaptation is applied to classification models trained on the baseline datasets to achieve improved classification results on the target dataset (African).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper generally addresses an important translational problem of generalizing trained models to underrepresented populations. The problem of exploring an African dataset in the context of domain adaptation is new. [18] used an African Chest X-Ray dataset for classification but didn’t perform domain adaptation. The authors also propose to release the African dataset (expert-annotated, 6000+ Chest X-Rays) for public use later on (Sect 3.1).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Incomplete description of the adversarial domain adaptation method: it is not claimed or explained as a novel method, but a specific reference for the method actually used is missing. Lack of rigorous comparison with SOTA methods.

    • Sect 3.4: The paper proposes the use of adversarial domain adaptation (A-DA) as the most powerful technique based on the survey article [31]. It discusses the generic details of how A-DA works with domain-invariant feature representations. However, it is not clear exactly which A-DA technique has been applied (a specific reference is missing).

    • In this regard, several other important domain adaptation (DA) techniques (some of which also use feature-level adversarial learning) are not discussed or used for exploration of the African dataset. Some examples include:

    • Ganin et al. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(1):2096–2030, 2016.
    • Ganin et al. Unsupervised domain adaptation by backpropagation. ICML 2015.
    • He et al. Domain adaptation via Wasserstein distance and discrepancy metric for chest X-ray image classification. Scientific Reports, 2024.
    • Thiam et al. Unsupervised domain adaptation for the detection of cardiomegaly in cross-domain chest X-ray images. Frontiers in Artificial Intelligence, 2023.
    • He et al. Classification-aware semi-supervised domain adaptation. IEEE/CVF CVPR Workshops, pages 964–965, 2020.
    • Liu et al. Data augmentation via latent space interpolation for image classification. 24th ICPR, pages 728–733, 2018.
    • Guan et al. Domain adaptation for medical image analysis: a survey. IEEE Transactions on Biomedical Engineering, 2022.

    • Sect 3.1 mentions that the target dataset (African) is labeled. Then why would this data be fed into the network as an unlabeled target (Fig 2)? Would any other supervised or semi-supervised DA technique work better (e.g., Madani 2018, which uses chest X-rays and GAN-based DA)?

    • Fig 2: Does the target dataset contribute to the loss function when classified incorrectly (if yes, is it enforced for all three labels or only selected ones)?

    Weak empirical evaluation.

    • Sect 3.3, Table 3: The implementation details and results are a bit confusing. The paper mentions a ‘validation set’: does it come from the data split of the three source datasets during the 5-fold cross-validation (train/validate/test)? Is the 5-fold CV performed only on the three source datasets, or was the African dataset also used during data splitting (for testing)? Test data: the paper mentions that 10% of the samples were used as test data in the 5-fold CV. Does this imply that this 10% test data comes from the source datasets (because it seems that the African data is also treated as test data)?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see Strengths/Weakness for detailed comments.

    Additional comments:

    • The authors can consider adding an ablation study focused on small sample sizes more typical of underrepresented populations. In this case, the African dataset contains 6000+ images, but what if it were much smaller? How would that impact the domain adaptation performance, and which DA techniques might work better in that case? Similarly, would an imbalanced African dataset further exacerbate the domain shift issue and its adaptation performance?

    • How would other approaches, such as continual learning (e.g., Lenga et al., PMLR 2020) or multi-task learning (e.g., Imran 2019, MICCAI-MLMI), compare with A-DA for improved generalization of chest X-ray classification?

    • References can be formatted better- some are missing author names and/or publication venue (e.g. 19, 24, 26, 5, 8).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper lacks a comprehensive and nuanced exploration of this particular dataset and/or the challenges that it exposes, beyond the basic need for a domain adaptation approach. It also lacks a comparison of various domain adaptation techniques useful to this specific dataset (African).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    I thank the authors for their rebuttal. After reading the rebuttal as well as the other reviewers’ comments, I am leaving my initial score unchanged. The rebuttal vaguely clarifies a few points, but there is no way to ensure that these important details would be explained clearly in the camera-ready. For example, there seems to be a two-stage process that adds a supervised step using the target labels, but its loss functions and details are not explained. To justify the particular ADA technique used, reasoning/evidence beyond [31] is still important, specifically when applied to the underrepresented dataset. Overall, this paper could really benefit from a significant revision and resubmission, as it addresses a very interesting and important health-equity/under-served population issue.



Review #2

  • Please describe the contribution of the paper

    The paper describes an investigation into the impact of domain shift in AI based X-ray diagnosis models. Specifically, models are trained on 3 different public datasets and evaluated both internally and on an external African dataset. Performance of the models drops when evaluated on the African dataset but the use of an adversarial domain adaptation technique is shown to improve performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Investigating techniques to ensure good performance on data from under-served populations is important
    • The experiments are well planned and the results are good
    • The paper is easy to follow and clear
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper does not describe novel methodology
    • Some aspects of the Introduction to the paper could be improved
    • Not all aspects of the experiments are reported clearly
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Overall the descriptions of the experiments were clear and complete. The only detail missing was which parts of the African dataset were used for feature alignment and testing. If this information is provided I would change my response to this section.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I congratulate the authors on performing a thorough internal and external evaluation of deep learning models for X-ray image diagnosis and in particular for using an external validation set from a location which is traditionally under-served. Their commitment to publicly release this dataset is also commendable. I believe that more research should focus on such goals and it is important that this paper should be published.

    However, I do have some concerns about whether the main conference at MICCAI is the right place to publish it. My understanding from the reviewer guidelines is that even papers that focus on clinical translation and/or health equity (as this paper does) need to have some kind of novel contribution beyond demonstrating existing techniques on new data. E.g. this could involve the development of a novel approach to solve a problem that is specific to an under-served population. This paper demonstrates that an existing approach (adversarial domain adaptation) works quite well in terms of reducing the drop in performance due to domain shift. This is useful to report but I don’t think it is enough to warrant acceptance at the MICCAI main conference. In my opinion, the paper would be better submitted to one of the more focused MICCAI workshops (e.g. MICCAI Meets Africa - https://event.fourwaves.com/miccaiafrica/pages). If I have misunderstood the reviewing guidelines then I would adapt my recommendation accordingly.

    Below are some further comments aimed at improving the paper for a future submission.

    1. Section 2, paragraph 1: I think this introductory paragraph mostly repeats what has already been said in the first part of the Introduction. The Literature Review section should be more focused on specific parts of the literature, so this paragraph can be removed/shortened.

    2. Section 2, paragraph 4: The text on DA for different patient populations (i.e. “Although DA techniques …” until the end of this paragraph) seems out of place. The paragraph below (which starts “Given the unique challenges …”) introduces this point so I think this text should be moved to there.

    3. Section 3.3, paragraph 1 – “… by freezing all the layers except the final one, to ensure that only weights of the final layer are updated during training”: Is this a sensible approach? I can understand using ImageNet weights to initialise the network prior to training on the X-ray data, but freezing all layers except the last one seems likely to limit the models’ ability to adapt to the X-ray data too much. (For concreteness, a minimal sketch of this setup follows this list.)

    4. Section 3.3, paragraph 2 – “… Five fold cross-validation was used, utilizing the 10% of the samples as test size”: This is not very clear. Why was cross-validation used rather than simply using 90% of the source domain data for training/validation? Were the five models used as an ensemble on the test set? Please clarify.

    5. Section 3.4, paragraph 2: What data from the African dataset were used in the feature alignment process? Were these data also used when testing the models? This is important information as it determines whether the results reported are validation results or test results.

    6. The paper is comprehensible, but the scientific English needs to be improved, and I think the paper would benefit from a thorough proof-reading/revision. Some specific examples are provided below, but these are not exhaustive:

    • Section 1, paragraph 3, line 5: “… results into longer …” -> “… results in longer …”
    • Section 1, paragraph 3, line 6: suggest removing “all over the process”
    • Section 2, paragraph 1, line 5: “… based in machine …” -> “… based on machine …”
    • Section 2, paragraph 1, line 6: “… achieving a promising …” -> “… achieving promising …”
    • Section 2, paragraph 4, line 3: “… it is common the need to handle with domain …” -> “… there is commonly a need to handle domain …”
    • Section 2, paragraph 4, lines 4/5: “… adapt variations …” -> “… adapt to variations …”
    • Section 2, paragraph 5, line 5: “… which try to align …” -> “… which tries to align …”
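    Regarding point 3 above, here is a minimal PyTorch sketch of the “freeze all layers except the final one” setup that the paper appears to describe. The ResNet-50 backbone and the three-class head are illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone (architecture assumed for illustration).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze every parameter in the network ...
for param in model.parameters():
    param.requires_grad = False

# ... then replace the final layer; its freshly created parameters are
# trainable by default, so only these weights are updated during training.
model.fc = nn.Linear(model.fc.in_features, 3)  # class count assumed

# The optimizer is given only the unfrozen (final-layer) parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```

    The common alternative the comment alludes to is unfreezing the later convolutional blocks as well, or fine-tuning the whole network at a lower learning rate, which gives the backbone some capacity to adapt to the X-ray domain.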

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In addition to the specific comments made in the review, my main concern is whether papers which “just” apply existing technology to new data are appropriate for a MICCAI main conference paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for clarifying some points raised. But it seems as if the paper needs quite a few changes which obviously would not be re-reviewed. So unfortunately considering this, and the lack of further clarification about the acceptance criteria for clinical translation & health equity papers, I have to recommend reject. But I wish the authors good luck in publishing their important work in the future.



Review #3

  • Please describe the contribution of the paper

    The paper introduces an X-ray dataset and applies a modeling approach that achieves impressive cross-domain performance relative to other similar datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Interesting dataset with good modeling results. Great cross-domain performance!

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I’m not clear on how the network works. Some terms are used only once or twice in the paper and not really explained (e.g., feature fusion block and …). Maybe I missed the citation?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    My only concern is about understanding the DANN and how it works. I am unfamiliar with DANN, so if the primary audience is familiar with it, this may be OK.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall good modeling results and very interesting paper! Addressing the potential readers who are unfamiliar with this method (like me) might improve the accessibility of your work.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Great paper that could be an easy accept if the reproducibility concerns are addressed. Your work seems like a great candidate for reproduction, either with the dataset once it is open or with the other datasets you referenced.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    The authors’ replies did not make me feel confident in the edits I’d expect for the camera-ready.




Author Feedback

We sincerely thank the reviewers for their valuable and constructive feedback. We will make the amendments needed to satisfactorily answer all the reviewer comments (R1, R3, R4), especially those regarding the description of the technical details of ADDA and the implementation details used to generate our models. The African dataset, the source code, and the models will be made publicly available upon acceptance. We focus this rebuttal on the most relevant concerns. All typos and minor concerns will be addressed in the camera-ready.

R1, R4. Incomplete description of the adversarial domain adaptation method. R3. Concerns regarding the novelty of our work and its suitability for the MICCAI conference. We selected ADA [26] as the best method in the extensive survey of adversarial domain adaptation conducted in [31]. We extended ADA with subsequent steps involving fine-tuning and the use of labels for supervised adaptation. We will improve the description of our extended ADA in the camera-ready. We believe that the extension of the ADA methodology, the use of ADA in a very particular medical imaging context, the application of ADA to cross-population domain shift in a relevant clinical task, our comprehensive evaluation with different populations, and the outperformance of the proposed method in our particular classification problem may together yield a contribution suitable for a conference such as MICCAI.

R1. Lack of rigorous comparison with SOTA methods. We agree with the reviewer that there are other interesting domain adaptation techniques that may provide interesting insights into our problem. We found the citations suggested by the reviewer very valuable and will include the most relevant ones in our Literature Review. We intend to conduct a thorough comparison and evaluation of different domain adaptation techniques in an extended journal version, thus providing the rigorous comparison with SOTA methods requested by the reviewer. We will also address the problems of small sample size and class imbalance, typically found in the clinical translation of AI models. From our experience, continual or multi-task learning is better suited to environments with small sample sizes; we believe that, alone, it would not provide acceptable results for datasets with a great shift among them. It could be a good idea to combine ADA with CL or MTL to improve our models.

R1. Take advantage of the labels in the African dataset. ADA is designed for training with an unlabeled target dataset. However, we agree with the reviewer’s concern about utilizing available labels. Therefore, our method incorporates labeled data with a different loss function to fine-tune the models, enabling supervised adaptation. We will revise the manuscript to reflect this two-stage process better. A comparison with Madani et al. 2018 will be considered in an extended journal version.

In ADA, the target dataset contributes to the loss function through the domain discriminator but not directly through the classification loss. We use adversarial training to align the source and target feature distributions; the domain discriminator loss is updated based on distinguishing features from the two domains. In the supervised stage, the classification loss is calculated using labeled samples from both domains, penalizing incorrect classifications. This combination of domain discriminator loss and classification loss ensures robust adaptation and improved generalization across diverse populations.
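For readers unfamiliar with how such a combined objective is typically implemented, the following is a minimal PyTorch sketch in the style of DANN-type adversarial adaptation (Ganin et al.). All module and variable names are hypothetical, and the exact ADA variant used in the paper may differ.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; scales gradients by -lambda on the
    backward pass (the standard DANN gradient reversal layer)."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def adaptation_step(feature_extractor, classifier, discriminator,
                    x_src, y_src, x_tgt, y_tgt, lambd=1.0):
    """One hypothetical training step combining the two losses described
    above; all three components are nn.Modules supplied by the caller."""
    f_src = feature_extractor(x_src)
    f_tgt = feature_extractor(x_tgt)

    # Classification loss: labeled samples from both domains penalize
    # incorrect predictions (the supervised part of the adaptation).
    cls_loss = (F.cross_entropy(classifier(f_src), y_src)
                + F.cross_entropy(classifier(f_tgt), y_tgt))

    # Domain discriminator loss: label 0 = source, 1 = target. The gradient
    # reversal trains the discriminator to separate the domains while pushing
    # the feature extractor toward domain-invariant representations.
    d_in = GradReverse.apply(torch.cat([f_src, f_tgt]), lambd)
    d_labels = torch.cat([torch.zeros(len(x_src), dtype=torch.long),
                          torch.ones(len(x_tgt), dtype=torch.long)]).to(d_in.device)
    dom_loss = F.cross_entropy(discriminator(d_in), d_labels)

    return cls_loss + dom_loss
```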

R1, R3, R4. Confusing implementation details and results. We will amend our manuscript to clearly describe the data-splitting process for cross-validation on the source datasets. Also, we will explicitly state that the African dataset is used solely as the target test dataset, not included in the cross-validation splits. Finally, we will clarify the distinction between validation and test sets derived from the source datasets during cross-validation and the target dataset used for final testing.
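Read this way, the splitting protocol could be illustrated as follows. This is a hypothetical sketch with placeholder sample counts; the authors’ actual splitting code is not available.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Placeholder indices standing in for the samples (counts assumed).
source_idx = np.arange(10000)   # pooled source-domain samples
african_idx = np.arange(6000)   # African target samples, excluded from CV

# One plausible reading: hold out 10% of the source data as an internal
# test set, then run 5-fold cross-validation on the remaining 90%.
trainval_idx, source_test_idx = train_test_split(
    source_idx, test_size=0.10, random_state=0)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_i, val_i) in enumerate(kfold.split(trainval_idx)):
    pass  # train on trainval_idx[train_i], validate on trainval_idx[val_i]

# Final evaluation: models are tested on source_test_idx (internal test) and
# on african_idx, which serves solely as the external target test set.
```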




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I agree with the reviewers that the decision ‘Reject’ should be assigned.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All three reviewers gave “Reject” decisions. There are several major weaknesses: (1) a lack of detailed explanation of the two-stage process and its loss functions; (2) insufficient reasoning/evidence for the chosen ADA technique, especially on underrepresented datasets; (3) the acceptance criteria for clinical translation and health equity papers remain unclear.




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    As pointed out by all three reviewers, this paper presents an effort to evaluate and adapt AI-based X-ray classification on African data. Exploring an African dataset in the context of domain adaptation is new, and the results on the African dataset can provide new observations on health equity for the community. The authors also intend to release the African dataset (expert-annotated, 6000+ chest X-rays) for public use upon publication of the paper. This will be a very valuable resource for the research field. This meta-reviewer is glad to champion this submission. It will definitely contribute to the research diversity and health equity of MICCAI.



