Abstract

The rising interest in pooling neuroimaging data from various sources presents challenges arising from scanner variability, known as scanner effects. While numerous harmonization methods aim to tackle these effects, they face issues with model robustness, modification of brain structure, and over-correction. To combat these issues, we propose a novel harmonization approach centered on simulating scanner effects through augmentation. This strategy enhances model robustness by providing extensive simulated matched data: sets of images with the same underlying brain but varying scanner effects. Our proposed method, ESPA, is an unsupervised harmonization framework via Enhanced Structure Preserving Augmentation. Additionally, we introduce two domain-adaptation augmentations: tissue-type contrast augmentation and GAN-based residual augmentation, both focusing on appearance-based changes to avoid structural modification. While the former adapts images to the tissue-type contrast distribution of a target scanner, the latter generates residuals added to the original image for more complex scanner adaptation. These augmentations help ESPA mitigate over-correction through data stratification or population-matching strategies during augmentation configuration. Notably, we leverage our unique in-house matched dataset as a benchmark to compare ESPA against supervised and unsupervised state-of-the-art (SOTA) harmonization methods. To the best of our knowledge, our study marks the first attempt to address harmonization by simulating scanner effects. Our results demonstrate the successful simulation of scanner effects, with ESPA outperforming the SOTA methods under this harmonization approach.
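
To make the augmentation idea concrete, below is a minimal Python sketch of a tissue-type contrast augmentation in the spirit described above: each tissue class of a source image is remapped toward a hypothetical target scanner's intensity distribution via a simple mean/std match. All names, label conventions, and statistics here are illustrative assumptions, not the authors' implementation (see the code repository below for that).

# Minimal sketch of a tissue-type contrast augmentation, assuming a brain
# image with a 3-class tissue segmentation and hypothetical per-tissue
# intensity statistics for a target scanner. Illustrative only.
import numpy as np

def tissue_contrast_augment(image, seg, target_stats, rng=None):
    """Shift each tissue class toward a target scanner's intensity stats.

    image: 3D float array of voxel intensities.
    seg: 3D int array with labels {1: CSF, 2: GM, 3: WM} (assumed convention).
    target_stats: dict mapping label -> (mean, std) for the target scanner.
    """
    rng = rng or np.random.default_rng()
    out = image.astype(float).copy()
    for label, (t_mean, t_std) in target_stats.items():
        mask = seg == label
        if not mask.any():
            continue
        # Jitter the target statistics slightly so repeated calls yield
        # varied simulated scanner effects (an illustrative choice).
        t_mean = t_mean * (1.0 + rng.normal(0.0, 0.02))
        t_std = t_std * (1.0 + rng.normal(0.0, 0.02))
        s_mean = image[mask].mean()
        s_std = image[mask].std() + 1e-8
        # Standardize the source tissue intensities, then map them onto the
        # target tissue distribution (simple mean/std matching).
        out[mask] = (image[mask] - s_mean) / s_std * t_std + t_mean
    return out

# Usage with synthetic data (illustrative only):
rng = np.random.default_rng(0)
img = rng.normal(100.0, 20.0, size=(32, 32, 32))
seg = rng.integers(1, 4, size=(32, 32, 32))
stats = {1: (60.0, 10.0), 2: (110.0, 15.0), 3: (150.0, 12.0)}
augmented = tissue_contrast_augment(img, seg, stats, rng)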

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1131_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1131_supp.pdf

Link to the Code Repository

https://github.com/Mahbaneh/ESPA.git

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Esh_ESPA_MICCAI2024,
        author = { Eshaghzadeh Torbati, Mahbaneh and Minhas, Davneet S. and Tafti, Ahmad P. and DeCarli, Charles S. and Tudorascu, Dana L. and Hwang, Seong Jae},
        title = { { ESPA: An Unsupervised Harmonization Framework via Enhanced Structure Preserving Augmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper aims to address harmonization by simulating scanner effects with data augmentation, and proposes an unsupervised harmonization framework.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    – The paper focuses on the domain shift problem due to scanner effects, which is common in the medical imaging field.
    – The paper utilizes several sound technical approaches, and the proposed approach seems promising.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    – I think the technical contributions are somewhat limited. The proposed approach is essentially a pipeline of existing approaches with slight modifications, and the harmonization may not be achieved with real data.
    – The paper is a bit difficult to understand. The language usage and the flow of information are unclear in several places; readers need to put in some effort to understand the full procedure and experiments, so some rewriting is required for clarity.
    – More importantly, some references can be confusing. For example, the paper states "We refer to the data targeted for harmonization as multi-scanner data." on page 3. Then, on page 5, the paper uses this term in the sentence: "Treating this data as unmatched for the multi-scanner data in ESPA, …". If the multi-scanner data on page 3 is the original target data, what is the meaning of the phrase on page 5?
    – There are other confusing references as well, such as referring to preprocessed data as "RAW", which has a common and different meaning in the literature. I would highly encourage rewriting these confusing references.
    – Unfortunately, the validation in the paper is weak. The paper lacks a significant amount of information about the target data, which is an in-house dataset. Inclusion and exclusion criteria are unclear. How many subjects were used for the target data per scanner at each stage? (Some were included, but not all.) What is the meaning of "… matched dataset"? What parameters are matched? What are the age and sex differences in the target data per scanner? Were confounding effects considered?
    – The number of test cases per scanner is only 3 images, which is quite small.
    – The objective and the procedure for validating the augmentation for domain adaptation are unclear. How were the experimental and testing phases set up? Several comparisons could be made; two examples are classifiers differentiating between the adapted source and target images per scanner, or across all adapted source images from all scanners.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I think that the authors have a good vision for harmonization of scanner effects, but I would highly encourage them to address the weak evaluation and rewrite unclear paragraphs for readability.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Limited technical contributions, weak evaluation and unclear paragraphs are the main factors for my score.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a novel unsupervised method for MRI data harmonization. The approach aims to address nontrivial challenges in MRI harmonization, including the modification of neuroanatomical structures and the alteration of biological variability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method employs data augmentation to facilitate an unsupervised learning process, which is valuable in real-world scenarios where paired scans are not available across different sites.

    2. The authors provide a thorough evaluation to demonstrate the effectiveness of both the data augmentation and the harmonization processes.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed method does not demonstrate superior performance over the baseline method, MISPEL, particularly in terms of structural similarity improvements after harmonization.
    2. While the method shows high effect sizes between low and high SVD groups, the larger effect sizes do not necessarily indicate true group differences.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. It appears that only a very limited number of subjects are involved in the harmonization evaluation.
    2. It would be beneficial to demonstrate a gold standard that validates whether the estimated Cohen’s d values accurately reflect the biomedical differences.
    3. CALAMITI (Contrast Anatomy Learning and Analysis for MR Intensity Translation and Integration) is an unsupervised harmonization method.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel and the evaluation is comprehensive.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors adequately address my concerns.



Review #3

  • Please describe the contribution of the paper

    This paper introduces augmentation techniques that can simulate the scanner effects of MRI machines, and uses the simulated data to train an image-to-image translation model to harmonize MRI scans. The objective is to reduce the brain structure deformations and over-corrections found in existing harmonization methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The articulation of the paper is really nice, making it easy for the reader to follow the motivation and contributions behind this research.
    2. This is an important research challenge in the context of medical imaging requiring more concrete developments.
    3. Authors have included relevant baselines for comparison in the manuscript.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors have highlighted that this method is an unsupervised, task-agnostic framework; however, the tissue-type augmentations (step 2) do use the source–target distribution difference to derive parameters. And since the inputs to MISPEL are matched simulated images, this acts as voxel-level supervision. It would be great if the authors could highlight more explicitly what makes the method unsupervised.
    2. One of the objectives of the proposed framework is to reduce over-correction and structural modification while harmonizing images. However, the results show a large increase in the SSIM metric, whereas in the ablation study the SSIM does not increase. An increase in SSIM seems counterintuitive when the focus is on reducing structural modifications.
    3. The last line in Section 3.1 states "our structural similarity analysis for harmonization yielded SSIMs similar to that of RAW, suggesting no significant modification and thus no harmonization." This is misleading: in the case of paired registered images, a decrease in MAE and JD surely suggests harmonization to some extent. One would expect SSIM to remain more or less the same when images are paired at the voxel level. The authors should rephrase/clarify this statement.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It would be great if the authors could respond to the comments in the weaknesses section above to better aid readers. Additionally, since the augmentation parameters used here consider a specific set of scanner machines, would the performance of the model be limited to similar scanner types, or is it generalizable beyond that? If so, this is not apparent from the Results and Conclusions sections.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I consider this a weak accept because the showcased results are impressive and useful for researchers interested in the harmonization of neuroimaging. However, a few details of the methods are unclear or ambiguous, hence my suggesting a rebuttal.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The reviewer appreciates the authors’ rebuttal to the feedback provided on the manuscript. The major concern the reviewer had was related to wording, as it was not clear whether supervision was used in training given that the proposed work claims an unsupervised setting. The authors clarify this and promise to make changes in the paper for clarification. Considering the contributions, results, reviewers’ comments, and rebuttal, I would still consider it a ‘weak accept’.




Author Feedback

We thank the reviewers for their valuable feedback. We are glad that they 1) found our task “important” (R4), 2) found our method sound and novel (R3, R1), addressing “nontrivial” harmonization challenges, which makes it “valuable in real-world” scenarios (R1), 3) described our evaluation and baselines as “thorough” and “relevant” (R1, R4), and 4) found our algorithm description “detailed” and our articulation “really nice” (R4). (The main text’s reference list is used.)

a) R4 asked for clarity on the supervision in our method. The “supervision” is the scanner effects that appear as dissimilarity in matched images, i.e., images of the same subject taken on different scanners within a short time gap. Such images show the same brain, with scanner effects appearing as voxel-wise differences. A matched dataset consists of matched images for a population and is ideal for evaluating harmonization, but it is inherently small; ours is one of the largest [3, 4]. We used our matched data as the “target data” and used its matched aspect (supervision) for evaluation only, not for augmentation or model training. We performed cross-validation and evaluated our methods on the combination of all test sets across folds. For the tissue-type augmentation, we hypothesized that scanner effects appear as differences in tissue intensity distributions, which is not the ground truth (the scanner effects themselves). We will clarify this point in the paper.

b) R1 was concerned that MISPEL surpassed our method in SSIM and that the lack of ground truth does not validate the post-harmonization increase in SVD-group effect size as an accurate biomedical difference. We note that scanner effects appear both as image modification and as disturbance of biological signals in the data; these should be evaluated together, leading us to study SSIM, bias in biomarkers of AD, and SVD as a biological signal. We surpass MISPEL in the last two, which we regard as the more important criteria. Showing improvement solely in a possibly disturbed biological signal has commonly been used for harmonization evaluation in the literature; specifically, an increase in effect size was used in [3, 13]. Thus, a ground truth for SVD grouping is not necessary.
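
As context for the effect-size discussion, the following is a hedged sketch of the standard Cohen’s d computation between two groups (e.g., low vs. high SVD) for a harmonized biomarker. It uses the usual pooled-standard-deviation formula; the group values are hypothetical and not from the paper.

# Cohen's d with pooled standard deviation (standard formula, not the
# authors' code). Groups and values are hypothetical.
import numpy as np

def cohens_d(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Example: a biomarker measured in two hypothetical groups.
low_svd = [3.1, 2.9, 3.4, 3.0]
high_svd = [2.2, 2.5, 2.1, 2.4]
print(cohens_d(low_svd, high_svd))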

c) R4 was confused by the SSIM interpretation in our harmonization evaluation and ablation study. We clarify that SSIM was used for evaluating harmonization, not for showing anatomy preservation. We expect only minor increases in SSIM as a result of increased similarity at the image level; for anatomy, we rely on our evaluation using anatomy-based biomarkers of AD (please see response a) for more details). For the ablation study, MAE and JD were reported for augmentation removal on the augmented images, while SSIMs were reported for harmonization evaluation on the matched data. We can thus conclude from the unchanged SSIMs that the ablated models removed the augmentation but did not harmonize. We will clarify this point in the paper.
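
For readers unfamiliar with the paired-image metrics in this exchange, here is a minimal sketch computing MAE and SSIM for two co-registered images of the same brain, using scikit-image. The shapes, data ranges, and simulated scanner difference are assumptions for illustration; JD is omitted, as its exact definition depends on the paper.

# MAE and SSIM between two co-registered images (illustrative sketch).
import numpy as np
from skimage.metrics import structural_similarity

def paired_metrics(img_a, img_b):
    """Mean absolute error and SSIM for a matched (registered) image pair."""
    mae = np.abs(img_a - img_b).mean()
    data_range = float(max(img_a.max(), img_b.max())
                       - min(img_a.min(), img_b.min()))
    ssim = structural_similarity(img_a, img_b, data_range=data_range)
    return mae, ssim

# After harmonization, MAE should drop; SSIM is expected to rise only mildly
# for already-registered pairs, per the authors' response above.
rng = np.random.default_rng(1)
a = rng.random((64, 64))
b = a + rng.normal(0.0, 0.05, size=(64, 64))  # simulated scanner difference
print(paired_metrics(a, b))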

d) R3 was concerned about our technical contribution. We remind R3 that our first augmentation method was adopted from another task and used mainly as a baseline, while our second augmentation is entirely novel and outperforms all methods. Also, we used MISPEL in our pipeline so that, by comparison against MISPEL as the SOTA, we could validate that the harmonization is due to the augmentation.

e) We apologize to R3 for the ambiguity in our paper. We assure R3 that their concerns will be resolved with brief clarifications in the paper, e.g., our objective for the validation of augmentation. For R3’s concern about confusing references to the matched aspect of the target data, we refer them to response a). We also note that we defined matched data, the target data’s demographics, and the scanner information in the Introduction, Section 3, and Supp. Table 1, respectively. For more clarity, we refer R3 to responses a) and b).

f) Minor concerns: For R1’s and R3’s concern about the size of the evaluation data, we refer to response a). We inform R1 that we trained CALAMITI in a supervised setting and will clarify this point. We inform R4 that we will add their comment on the limitation of our method to the target scanners to the Conclusion.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents good results for a problem that is of broad interest to the community. The evaluation is appropriate. The presentation of the material should be improved.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The paper presents good results for a problem that is of broad interest to the community. The evaluation is appropriate. The presentation of the material should be improved.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents a novel method for multi-site/scanner data harmonization in neuroimaging via simulation of scanner-related biases/effects through data augmentation. I believe that the method is sufficiently interesting and novel and the main concerns about unclear parts of the paper were addressed sufficiently in the rebuttal. Still a borderline paper, though.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The paper presents a novel method for multi-site/scanner data harmonization in neuroimaging via simulation of scanner-related biases/effects through data augmentation. I believe that the method is sufficiently interesting and novel and the main concerns about unclear parts of the paper were addressed sufficiently in the rebuttal. Still a borderline paper, though.


