Abstract

Deep learning-based medical image recognition requires large amounts of expert-annotated data. Because medical image data are often scarce and class-imbalanced, many researchers have tried to synthesize medical images as training samples. However, the effectiveness of such methods depends on the quality of the generated data, which in turn depends on the amount of data available for training. To produce high-quality data augmentation in few-shot settings, we adapt large-scale pre-trained generative models to medical images. Specifically, we use MAGE (a masked image modeling-based generative model) as the pre-trained generative model, and insert an Adapter into each layer to learn class-wise medical knowledge. In addition, to reduce the complexity caused by the high-dimensional latent space, we introduce a vector quantization loss as a constraint during fine-tuning. Experiments are conducted on three different medical image datasets. The results show that our method produces more realistic augmentation samples than existing generative models, and that the resulting augmentation increases classification accuracy by 5.16%, 2.74%, and 3.62% on the three datasets, respectively. These results demonstrate that adapting pre-trained generative models for medical image synthesis is a promising approach in limited-data situations.
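As a rough illustration of the two ingredients the abstract describes (per-layer Adapters and a vector quantization constraint), the following PyTorch sketch shows one plausible realization. The module name, bottleneck width, scaling factor, and loss weighting are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter inserted into each transformer layer of the
    frozen pre-trained generator; only these parameters are fine-tuned."""
    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck branch added to the frozen layer's output.
        return x + self.up(F.relu(self.down(x))) * self.scale

def vq_loss(z: torch.Tensor, codebook: torch.Tensor, beta: float = 0.25) -> torch.Tensor:
    """Vector quantization constraint: pull features z (B, N, D) toward
    their nearest entries in a codebook (K, D)."""
    dist = torch.cdist(z.flatten(0, 1), codebook)    # (B*N, K) pairwise distances
    z_q = codebook[dist.argmin(dim=-1)].view_as(z)   # nearest codebook vectors
    codebook_term = F.mse_loss(z_q, z.detach())      # move codebook toward features
    commit_term = F.mse_loss(z, z_q.detach())        # commit features to codebook
    return codebook_term + beta * commit_term
```

Under this sketch, fine-tuning would update only the adapter (and codebook) parameters, with the VQ term added to the generator's usual objective at an assumed weight, e.g. `loss = recon_loss + lambda_vq * vq_loss(z, codebook)`.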

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2585_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/YuanZhouhang/VQ-MAGE-Med

Link to the Dataset(s)

https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

https://odir2019.grand-challenge.org/dataset/

BibTex

@InProceedings{Yua_Adapting_MICCAI2024,
        author = { Yuan, Zhouhang and Fang, Zhengqing and Huang, Zhengxing and Wu, Fei and Yao, Yu-Feng and Li, Yingming},
        title = { { Adapting Pre-trained Generative Model to Medical Image for Data Augmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper is highly relevant to MICCAI as it introduces a method for adapting large-scale pre-trained generative models to efficiently generate high-quality data augmentations in few-shot settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The evaluation is executed very well.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    NA

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    NA

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As mentioned above.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a data augmentation method based on a large pre-trained generative model. The method integrates a masked image modeling-based generative model (MAGE) and an Adapter with VQGAN to synthesize higher-quality data. The authors also propose a vector quantization loss to improve the performance of the generative model. The proposed method was tested on 3 datasets and achieved better performance than cutting-edge methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written. It proposes using a large pre-trained generative model to obtain high-quality synthetic data and improve the performance of subsequent tasks.
    2. The paper provides an extensive evaluation on 3 medical image datasets, and the proposed work shows superior performance compared to other commonly used generative models.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Novelty: this method combines MAGE [1] and an Adapter [2] with VQGAN as the framework of the generative model, in order to improve the quality of synthetic data and avoid the catastrophic forgetting problem. Of these parts, MAGE is an effective pre-trained generative model, and the Adapter paper claims that it “can be plug-and-play in different Transformers” to avoid catastrophic interference. Beyond simply combining the two modules, are there any modifications made to further improve the performance? [1] Tianhong Li, Huiwen Chang, Shlok Mishra, Han Zhang, Dina Katabi, and Dilip Krishnan. MAGE: Masked generative encoder to unify representation learning and image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2142–2152, 2023. [2] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. AdaptFormer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems, 35:16664–16678, 2022.
    2. The results in Table 5 indicate better performance of the proposed method. Are these results based on only one experiment per method? It is important to provide more analysis of whether the improvement is meaningful or not.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. (Please see the main weaknesses) If the MAGE and/or Adapter in the proposed method differs from or improves upon the existing methods in [1] and [2], it is recommended to specify that in Section 2.1. [1] Tianhong Li, Huiwen Chang, Shlok Mishra, Han Zhang, Dina Katabi, and Dilip Krishnan. MAGE: Masked generative encoder to unify representation learning and image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2142–2152, 2023. [2] Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, and Ping Luo. AdaptFormer: Adapting vision transformers for scalable visual recognition. Advances in Neural Information Processing Systems, 35:16664–16678, 2022.
    2. In the result tables, it would be better to provide some analysis (e.g., a statistical test) of whether the improvement is meaningful or not (see the sketch after this list).
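    As a minimal sketch of the kind of statistical test suggested above, a paired non-parametric comparison over cross-validation folds could look as follows; the fold-wise accuracies are hypothetical placeholders, not results from the paper.

```python
from scipy.stats import wilcoxon

# Hypothetical fold-wise accuracies from a 5-fold cross-validation
# (placeholder numbers, not values reported in the paper).
ours     = [0.842, 0.851, 0.838, 0.847, 0.845]
baseline = [0.810, 0.825, 0.804, 0.819, 0.813]

# Paired, non-parametric test across folds: a small p-value suggests
# the improvement is unlikely to be due to chance.
stat, p = wilcoxon(ours, baseline)
print(f"Wilcoxon statistic = {stat:.3f}, p = {p:.4f}")
```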
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The clarity of the paper.
    2. The extensive experiments on both generation performance and subsequent classification performance in multiple datasets.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors extend the generative model MAGE with an adapter module and a vector quantization loss to adapt it to data augmentation in the medical image domain. The paper details the model and evaluates it on three medical image datasets by generating synthetic images and measuring their quality via Fréchet Inception Distance (FID) against FastGAN, StyleGAN2, LDM, and MAGE baselines. Furthermore, the utility of the generated images is evaluated by using them to augment the training sets of ViT and Swin Transformer image classifiers. The proposed method generally improves both image generation and the performance of the evaluated classifiers.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Adapting large pre-trained generative models to medical image generation in order to provide synthetic training data is a real need, which this paper addresses. The proposed architecture is a novel adaptation to this domain and is well founded; the authors describe and support their architectural choices in detail. The generation of synthetic images is strongly evaluated on three medical datasets against state-of-the-art baselines. The proposed method generally improves the FID scores of the generated images, and the limitations of the approach are well described. The authors further test the generated images by using them for data augmentation when training state-of-the-art image classification architectures, ViT and Swin, where they show good improvements compared to the other baselines.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors should discuss further how they decided how many images to generate: different numbers of images were generated for different classes (even when counting training + generated images), so a few classes end up with the same total number of training images while others have more (e.g., HAM10000, train + generated: bkl: 2000, but nv: 10000).

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is reproducible as is, but it would be nice if the code was released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The formatting of the tables should be synchronized: in Table 2 there is a bold line between the baselines and “ours”, while in the other tables it is not bold. Consider putting Tables 2-3-4 in the same order as the datasets appear in Table 1 (3-2-4). On page 7: “Furthermore, we analyze the categories in which our method exhibits significant disadvantages. Notably, the three classes with the largest gaps are class N in the ODIR dataset (with an average difference of 18.0 from the optimal method), and classes nv and bcc in the HAM10000 dataset (with average differences of 22.8 and 24.3 from the optimal method, respectively). Table 1 shows that these categories have relatively high sample sizes of 3104, 6705 and 514 respectively.” – Here, calling HAM10000’s bcc with 514 samples a relatively high sample size is quite a stretch; it is only the 4th most numerous of the 7 classes. Consider adjusting this statement.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors propose a state-of-the-art method for medical image generation and validate it strongly. The topic is highly relevant and timely.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

First of all, we would like to thank the reviewers for their insightful comments and suggestions, which were delivered in a constructive and objective manner. We provide a preliminary response to some of the comments here and will make the necessary modifications in the camera-ready version of the paper.

For reviewer 1: Thank you very much for your review. You have given our work an objective evaluation and score. We noticed that your review does not raise any questions or doubts about our work. If you have questions that were not recorded in the system, or any other questions, please do not hesitate to contact us at any time; we are happy to answer them.

For reviewer 3: First of all, thank you for your review. You raised a weakness of our article regarding the different numbers of images generated for different classes (a few classes end up with the same total number of training images, even counting training + generated, while others have more). Your consideration is thoughtful. We set different generation amounts for some classes because the amounts of training data for these classes differ greatly, and adopting a strictly uniform setting would lead to relatively low performance. We will add a discussion of this setting to Section 3.1 in the camera-ready version. The other comments will also be addressed in the camera-ready version: the bold line will be removed from Table 2; we will adjust the order of the datasets in Table 1 to match the order in Tables 2-4; and the ‘bcc’ class from HAM10000 will be discussed separately in the discussion on page 7. We are grateful for your comments and suggestions.

For reviewer 4: Thank you for your comments and suggestions. You raised two weaknesses of our article. First, you asked whether the MAGE and Adapter modules were simply combined. In fact, the combination you mention is merely the baseline of our proposed method: it provides the image-generation function, but there is still room for improvement in the few-shot setting. We therefore also propose a vector quantization loss to constrain the intermediate features. The experimental results presented in Table 5 demonstrate the efficacy of the introduced VQ loss; this constitutes our complete method. We hope this response addresses your queries. The second identified weakness is the desire for further analysis to confirm that the improvement is meaningful. First, in Table 5 we employ 5-fold cross-validation for each column, with the median serving as the final result. In addition, we conducted experiments on the classification performance of our method for specific sample classes. We tested the classification results of each subcategory and found that the improvement after augmentation was significant in some few-sample classes. For example, on the HAM10000 dataset with 7 classes, the per-class accuracy (%) before and after augmentation is: bkl 37.2 → 67.4, nv 31.2 → 43.0, df 47.5 → 43.3, mel 52.3 → 75.0, vasc 39.8 → 36.3, bcc 92.2 → 90.9, akiec 28.6 → 65.7.
It can be observed that the improvement in the final classification performance of our method is largely attributable to the notable gains in several categories with small sample sizes. This growth is meaningful. We hope this experiment provides the answers you seek. We are grateful for your valuable comments.
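To make the per-class analysis above concrete, here is a minimal sketch of how such per-class accuracy (i.e., per-class recall) can be computed; the function name and NumPy usage are illustrative assumptions, not the authors' code.

```python
import numpy as np

def per_class_accuracy(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int) -> np.ndarray:
    """Per-class accuracy (recall): the fraction of samples of each class
    that the classifier labels correctly."""
    acc = np.full(n_classes, np.nan)
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = (y_pred[mask] == c).mean()
    return acc

# Comparing classifiers trained without and with augmentation (hypothetical
# prediction arrays) would then reduce to a per-class difference:
# gain = per_class_accuracy(y, pred_aug, 7) - per_class_accuracy(y, pred_base, 7)
```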




Meta-Review

Meta-review not available, early accepted paper.


