Abstract

Accurate and robust classification of diseases is important for proper diagnosis and treatment. However, medical datasets often face challenges related to limited sample sizes and inherently imbalanced distributions, due to difficulties in data collection and variations in disease prevalence across different types. In this paper, we introduce an Iterative Online Image Synthesis (IOIS) framework to address the class imbalance problem in medical image classification. Our framework incorporates two key modules, namely Online Image Synthesis (OIS) and Accuracy Adaptive Sampling (AAS), which collectively target the imbalanced classification problem at both the instance level and the class level. The OIS module alleviates data insufficiency by generating representative samples tailored for online training of the classifier. The AAS module, in turn, dynamically balances the synthesized samples among classes, targeting those with low training accuracy. To evaluate the effectiveness of our proposed method in addressing imbalanced classification, we conduct experiments on the HAM10000 and APTOS datasets. The results demonstrate the superiority of our approach over state-of-the-art methods as well as the effectiveness of each component. The source code is available at https://github.com/ustlsh/IOIS_imbalance.
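
As a rough illustration of how the OIS and AAS modules could interact, the sketch below shows one plausible accuracy-adaptive allocation of a per-epoch synthesis budget. This is a minimal conceptual sketch, not the authors' implementation: the linear deficit weighting and the diffusion_sample callable are illustrative assumptions, and the actual formulation is in the linked repository.

    import numpy as np

    def aas_class_proportions(per_class_acc, eps=1e-6):
        """Toy accuracy-adaptive sampling: classes with lower training
        accuracy receive a larger share of the synthesis budget."""
        deficit = 1.0 - np.asarray(per_class_acc, dtype=float) + eps
        return deficit / deficit.sum()

    def online_synthesis_step(diffusion_sample, budget, per_class_acc):
        """Allocate the per-epoch budget across classes and draw new samples
        from a pretrained class-conditional generator (placeholder callable)."""
        proportions = aas_class_proportions(per_class_acc)
        counts = np.floor(proportions * budget).astype(int)  # rounding may leave a few images unallocated
        synthesized = []
        for cls, n in enumerate(counts):
            if n > 0:
                synthesized.append(diffusion_sample(class_label=cls, num_images=n))
        return synthesized  # merged with the real data for the next training epoch

For example, with per-class training accuracies of [0.95, 0.60, 0.40], the worst-performing third class would receive about 57% of the budget under this toy weighting.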

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0901_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/ustlsh/IOIS_imbalance

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Li_Iterative_MICCAI2024,
        author = { Li, Shuhan and Lin, Yi and Chen, Hao and Cheng, Kwang-Ting},
        title = { { Iterative Online Image Synthesis via Diffusion Model for Imbalanced Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present an Iterative Online Image Synthesis (IOIS) framework designed to tackle the class imbalance issue in medical image classification. This framework comprises two essential modules: Online Image Synthesis (OIS) and Accuracy Adaptive Sampling (AAS), which collectively address imbalance classification at both instance and class levels. The OIS module addresses data insufficiency by generating representative samples for online classifier training, while the AAS module dynamically balances synthesized samples across classes with low training accuracy. Experimental evaluations on the HAM10000 and APTOS datasets demonstrate the superior performance of the proposed method compared to state-of-the-art approaches, highlighting the effectiveness of each component. The authors plan to release the source code upon acceptance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper highlights the effectiveness of a new approach in dealing with imbalanced classification in medical image analysis. The method proposed in the paper surpasses current techniques when tested on datasets such as HAM10000 and APTOS. It demonstrates better results in creating synthetic images and enhancing classification accuracy for both the majority and minority classes. The approach combines re-weighting, re-sampling, and data synthesis techniques to achieve balanced classification outcomes, essential for precise medical image analysis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper’s main weaknesses include not thoroughly comparing with existing methods in terms of computational efficiency and scalability. While the proposed approach shows promising results for addressing imbalanced classification in medical image analysis, there is limited discussion on the resources needed for implementation and scalability to larger datasets. Previous research has highlighted the importance of considering computational efficiency in developing solutions for imbalanced classification. Moreover, the paper could improve by conducting a more extensive comparison with a broader range of existing methods to better showcase the strengths and limitations of the proposed approach.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper introduces an Iterative Online Image Synthesis (IOIS) framework for addressing imbalanced classification in medical image analysis, incorporating Online Image Synthesis (OIS) and Accuracy Adaptive Sampling (AAS) modules. While the integrated strategy effectively targets both instance and class-level imbalances, the paper lacks detailed discussion on computational efficiency and scalability, which are crucial for practical applications. A more comprehensive comparison with existing methods and exploration of potential implementation challenges would enhance the paper’s contribution and practical relevance.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The weak accept recommendation for this paper is justified based on several factors:

    1. Novelty: The paper introduces a novel approach, the Iterative Online Image Synthesis (IOIS) framework, to address the challenging problem of imbalanced classification in medical image analysis. This framework incorporates two key modules, Online Image Synthesis (OIS) and Accuracy Adaptive Sampling (AAS), which collectively target both instance and class-level imbalances.
    2. Effectiveness: The experimental results demonstrate the superiority of the proposed IOIS framework over state-of-the-art methods on two medical image datasets, HAM10000 and APTOS. The effectiveness of each component of the framework is also demonstrated, indicating its potential practical utility in real-world scenarios.
    3. Contribution: The paper makes a significant contribution to the field by proposing a comprehensive strategy to tackle the challenges posed by imbalanced datasets in medical image classification. By addressing

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The paper’s main weaknesses are the insufficient comparison with existing methods regarding computational efficiency and scalability. While the proposed approach yields promising results for addressing imbalanced classification in medical image analysis, it lacks a detailed discussion on the resources required for implementation and scalability to larger datasets. Previous research has emphasized the importance of considering computational efficiency when developing solutions for imbalanced classification. Additionally, the paper would benefit from a more extensive comparison with a broader range of existing methods to better highlight the strengths and limitations of the proposed approach.



Review #2

  • Please describe the contribution of the paper

    The paper introduces a novel framework termed the Iterative Online Image Synthesis (IOIS) framework to tackle the issue of class imbalance in medical image classification. This framework comprises two main modules: Online Image Synthesis (OIS) and Accuracy Adaptive Sampling (AAS), which collectively aim to address the imbalance classification problem at both the instance and class levels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-written and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    First, the motivation provided is weak. The paper states, “However, existing methods typically generate images independently before commencing the training process, keeping the training images unchanged throughout. This approach can lead to overfitting on the synthetic images due to the discrepancy between the generator and the downstream classifier. Moreover, the portion of synthetic images for each class is often determined manually in these methods, which may not align with the dynamic requirements of the classifier for each class during the training process,” without offering any citations to support these claims.

    Second, the paper’s evaluation is lacking. It is expected that the paper would compare its approach against existing state-of-the-art large language-based data augmentation methods to demonstrate how the proposed method surpasses them.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    NA

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    NA

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Similar to R1, I am not convinced that the paper performs a fair comparison with current techniques.



Review #3

  • Please describe the contribution of the paper

    The authors introduce two novel techniques, Online Image Synthesis and Accuracy Adaptive Sampling, that work in symbiosis to tackle class imbalance in medical classification problems. These modules dynamically generate samples for classes that are under-represented during training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper addresses a significant problem in medical image analysis, one that has not been tackled by the community as much as others. The introduction of a diffusion model, guided by the classifier and its performance, to generate images for under-represented classes represents a novelty. The methodology is well explained, and the experiments and comparison with the state of the art seem to provide enough evidence to support the claim. The paper is well written and understandable. Language and style are acceptable. There is no concern about reproducibility: the authors used public datasets and state that the code will be made available upon acceptance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The only thing I would have liked the authors to comment on is whether accuracy is a good metric to describe the representation of a class, given how much it can be influenced by only a few examples in minority classes.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors introduce two novel techniques, Online Image Synthesis and Accuracy Adaptive Sampling, that work in symbiosis to tackle class imbalance in medical classification problems. These modules dynamically generate samples for classes that are under-represented during training. The paper addresses a significant problem in medical image analysis, one that has not been tackled by the community as much as others. The introduction of a diffusion model, guided by the classifier and its performance, to generate images for under-represented classes represents a novelty. The methodology is well explained, and the experiments and comparison with the state of the art seem to provide enough evidence to support the claim. The paper is well written and understandable. Language and style are acceptable. There is no concern about reproducibility: the authors used public datasets and state that the code will be made available upon acceptance. The only thing I would have liked the authors to comment on is whether accuracy is a good metric to describe the representation of a class, given how much it can be influenced by only a few examples in minority classes. I recommend acceptance.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Strong methodology, experiments, results. Claims are sustained. Data and code public or to be made public.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the reviewers’ comments and insightful suggestions. They consider our proposed idea of bridging data augmentation with the downstream classifier to be novel (R1, R3). The problem we address is significant (R1, R3). The paper is well written (R4). Below we address the reviewers’ concerns.

R1: We appreciate the positive comments regarding novelty, effectiveness, and contribution in support of the weak accept recommendation in Q12. However, we noticed that the rating given in Q11 is Weak Reject. We kindly request clarification of this inconsistency and hope the rating can be adjusted accordingly.

R1Q1: Discuss computational efficiency and scalability. For computational efficiency, our IOIS method adds a modest overhead: approximately 40% longer training time and 30% more GPU memory for iterative image generation compared with the imbalanced-classification baselines. However, it is worth noting that our method remains efficient, as it uses only the inference stage of the diffusion model. For dataset scalability, we have conducted experiments on two datasets of varying scales to demonstrate the generalizability and scalability of our method. These experiments show the applicability of our method to larger datasets, highlighting its potential for dataset scalability.

R1Q2: Compare with more methods. We argue that our paper already includes comprehensive experiments. We have compared against seven state-of-the-art methods specifically designed to address imbalanced classification. These methods span the three main families of solutions to the imbalance problem: re-weighting, re-sampling, and GAN-based synthesis.

R3Q1: Whether accuracy is a good metric. We agree that accuracy may not be a suitable metric for imbalanced classification tasks, as it often fails to reflect the performance on minority classes. A few wrong predictions barely change overall accuracy, yet they can substantially affect the minority classes. Consequently, to better assess performance on imbalanced datasets, we report Macro-F1, Balanced Accuracy, and MCC. These metrics provide a more comprehensive picture of the model’s performance by accounting for all classes, including the minority ones.
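
For reference, all three metrics mentioned above are available in scikit-learn; the snippet below is a small illustrative example with made-up labels, not results from the paper.

    from sklearn.metrics import balanced_accuracy_score, f1_score, matthews_corrcoef

    # Toy 3-class imbalanced example (illustrative labels only):
    # six majority-class samples, two of one minority class, one of another.
    y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 2]

    print("Macro-F1:         ", f1_score(y_true, y_pred, average="macro"))
    print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
    print("MCC:              ", matthews_corrcoef(y_true, y_pred))

Here a single minority-class error drops Balanced Accuracy to about 0.83 and Macro-F1 to about 0.86, while plain accuracy stays at 0.89, which illustrates the point made in the response.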

R4Q1: Motivation needs citations. We will add the corresponding citations to support our motivation and provide further explanation. Firstly, separating the image generation and downstream classification tasks may lead to overfitting due to a lack of synthetic diversity or the model-collapse problem [1]. The high similarity among synthesized images lets the classifier fit the training set easily while performing poorly at test time. Secondly, fixed portions of synthetic images for each class may not align with the dynamic requirements of the classifier during training [2], because the difficulty of classifying each class varies from epoch to epoch. Moreover, we have conducted ablation studies in Table 1, where the “Ours (+offline)” experiment separates generation from classification and fixes the set of synthesized images during classifier training. The results support our claim. [1] Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review. Journal of Imaging, 2023. [2] OnlineAugment: Online Data Augmentation with Less Domain Knowledge. ECCV, 2020.
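
To make the offline/online distinction in this response concrete, the schematic below contrasts the two training schedules. It is a hypothetical sketch, not the paper's code: fit_epoch, per_class_train_accuracy, and generate are placeholder names.

    def train_with_offline_synthesis(classifier, real_data, generate, epochs, budget):
        """Offline baseline: synthesize a fixed pool once, reuse it every epoch."""
        synthetic = generate(budget)                      # fixed before training starts
        for _ in range(epochs):
            classifier.fit_epoch(real_data + synthetic)

    def train_with_online_synthesis(classifier, real_data, generate, epochs, budget):
        """Online scheme: refresh the synthetic set each epoch, conditioned on the
        classifier's current per-class training accuracy."""
        synthetic = []
        for _ in range(epochs):
            classifier.fit_epoch(real_data + synthetic)
            accuracies = classifier.per_class_train_accuracy()
            synthetic = generate(budget, per_class_acc=accuracies)  # tracks the classifier

The “Ours (+offline)” ablation mentioned above roughly corresponds to the first schedule, while the full method corresponds to the second.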

R4Q2: Compare with large language-based data augmentation methods. We argue that comparing our method with large language-based data augmentation methods is not suitable for our application for two reasons. Firstly, directly using large language-based models to generate medical images yields poor performance due to the lack of domain-specific knowledge [3]. Secondly, training these large models requires significantly more computational resources than our method, which would lead to an unfair comparison. [3] A domain-specific next-generation large language model (LLM) or ChatGPT is required for biomedical engineering and research. Annals of Biomedical Engineering, 2024.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Two reviewers raised concerns about the comparison experiments and are not fully convinced that the rebuttal addresses these issues. However, the experiments seem to provide sufficient justification for accepting the paper. The rebuttal also does well in addressing the issue around citations.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Two reviewers raised concerns about the comparison experiments and are not fully convinced that the rebuttal addresses these issues. However, the experiments seem to provide sufficient justification for accepting the paper. The rebuttal also does well in addressing the issue around citations.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I agree with meta reviewer 1

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    I agree with meta reviewer 1


