Abstract

Semi-supervised learning (SSL) has attracted much attention since it reduces the expensive cost of collecting adequate well-labeled training data, especially for deep learning methods. However, traditional SSL is built upon the assumption that labeled and unlabeled data come from the same distribution, e.g., the same classes and domains. In practical scenarios, unlabeled data may come from unseen classes or unseen domains, which existing SSL methods still struggle to exploit. Therefore, in this paper, we propose a unified framework to leverage these unseen unlabeled data for open-scenario semi-supervised medical image classification. We first design a novel scoring mechanism, called dual-path outlier estimation, to identify samples from unseen classes. To extract unseen-domain samples, we apply an effective variational autoencoder (VAE) pre-training. We then adopt domain adaptation techniques to fully exploit the detected unseen-domain samples and boost semi-supervised training. We evaluate the proposed framework on dermatology and ophthalmology tasks, and extensive experiments demonstrate that our model achieves superior classification performance in various medical SSL scenarios.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/4152_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/4152_supp.pdf

Link to the Code Repository

https://github.com/PyJulie/USSL4MIC

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Ju_Universal_MICCAI2024,
        author = { Ju, Lie and Wu, Yicheng and Feng, Wei and Yu, Zhen and Wang, Lin and Zhu, Zhuoting and Ge, Zongyuan},
        title = { { Universal Semi-Supervised Learning for Medical Image Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a novel approach to universal semi-supervised learning, enabling the detection of outliers in unseen datasets. Two distinct branches are employed to achieve this goal. The first branch utilizes the distance between features and class prototypes, as well as class estimation, to distinguish outliers at the class level. The second branch leverages a pre-trained Variational Autoencoder (VAE) to identify samples from disparate domains, which can be utilized for domain adaptation to enhance the performance of semi-supervised models. To evaluate the efficacy of this methodology, experiments were conducted on two datasets: a publicly available dataset for skin cancer detection and a private dataset for ophthalmology. The results demonstrate that the proposed approach significantly improves the performance of the semi-supervised algorithm with respect to the considered baselines.
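
    A minimal sketch of the prototype-distance scoring idea described here (illustrative only; the names and the exact scoring rule are assumptions, not the authors' released implementation):

        import torch

        def prototype_outlier_scores(unlabeled_feats, labeled_feats, labels, num_classes):
            """Score unlabeled samples by distance to class prototypes.

            Sketch of the class-level branch: prototypes are per-class mean
            features of the labeled data; a large distance to the nearest
            prototype suggests a sample may come from an unseen class.
            """
            # One prototype (mean feature vector) per known class.
            prototypes = torch.stack([
                labeled_feats[labels == c].mean(dim=0)
                for c in range(num_classes)
            ])                                                # (C, D)
            # Euclidean distance from each unlabeled feature to each prototype.
            dists = torch.cdist(unlabeled_feats, prototypes)  # (N, C)
            # Outlier score: distance to the nearest prototype.
            return dists.min(dim=1).values                    # (N,)

    Samples whose score exceeds a chosen threshold would then be treated as unseen-class outliers.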

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is easy to read and well motivated.
    • The proposed methodology of the dual-outlier approach itself is elegantly simple and novel and can be easily extended to incorporate other backbones.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper has some notable limitations that deserve attention. One of the important aspects of outlier detection using a distance metric is the compactness of the in-distribution features and the separation between classes (see [1-2]). However, the cross-entropy loss does not promote either of them, so there might be samples from unknown classes at smaller distances than samples from known classes. Also, why do the authors consider Euclidean distance instead of cosine similarity? Furthermore, the paper relies heavily on strong assumptions, such as the data distribution following a Gaussian mixture model, without providing sufficient theoretical reasoning or empirical evidence to support these claims.

    • The table is not clear. According to the authors, “Close-set SSL. The samples in the labeled and unlabeled data share the same classes and are collected under the same environment”; why, then, do the close-set SSL algorithms have results under PAD-UFES-20 and DermNet? Those datasets are not dermoscopy images like the ISIC dataset. In the same direction, I do not observe a high impact with respect to the baselines when the known/unknown datasets share the same imaging methodology (dermoscopy in this case), e.g., OM 70.1 vs. UASD 70.0. Could the authors provide some explanation for that?

    • There is a lack of exploration into the number of labeled samples, known and unknown data, and expected results. It would be beneficial to see the impact of the proposed algorithm in low-data regimes.

    [1] Yifei Ming, Yiyou Sun, Ousmane Dia, Yixuan Li. How to Exploit Hyperspherical Embeddings for Out-of-Distribution Detection?
    [2] Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, Zhenguo Li, Liwei Wang. Towards a Theoretical Framework of Out-of-Distribution Generalization.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • I did not understand the line “We warm up the model using exponential rampup [15] with 80 out of 200 epochs, to adjust the coefficients of adversarial training $\alpha$ and SSL $\beta$.” Do you use the validation set to tune $\alpha$ and $\beta$, or are those parameters updated automatically during training?
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • To enhance readability, consider adding a separate column or row in the table indicating the number of samples used for each dataset. This would provide valuable insight into how the algorithm performs with varying amounts of known and unknown data. Additionally, please clarify what bold and underline mean in the table; I would recommend running a statistical analysis and bolding only the results that are statistically significant.

    • The authors mix notation in the paper, with $x_{u,d}$ and $x_{d,u}$ used interchangeably. To improve clarity, I recommend using superscripts to distinguish between labeled and unlabeled data.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper would benefit from a more thorough discussion of the theoretical foundations behind its component choices, as these decisions can have a significant impact on the method’s overall performance and effectiveness. Additionally, the experimentation section could be improved by providing clearer explanations of the results and any visualizations used to illustrate them, allowing readers to better comprehend the implications and limitations of the proposed methodology.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thank you to the authors for addressing all my concerns thoroughly. While the work has some limitations, such as the objective not promoting the compactness of in-distribution features and separation between classes for better OOD detection, it offers valuable practical applications for the MICCAI community. Therefore, I am raising my score to weak accept.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a universal semi-supervised learning framework that aims to tackle the problem of class and domain mismatch in medical image classification. It introduces dual-path outlier estimation for unknown class (UKC) detection and employs a variational autoencoder (VAE) for domain adaptation (UKD) to improve model robustness and performance in semi-supervised settings. The proposed method is tested on dermatology and ophthalmology tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The paper is easy to follow.
    2. The paper tackles semi-supervised learning under unseen distributions, which is an important ML problem in practice.
    3. Results are evaluated over multiple datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Writing needs slight improvement, although the general idea is conveyed clearly. Some sentences are confusing and need proof-reading to ensure logic and coherence. E.g., “Meanwhile, to extract unseen-domain samples, we then apply an effective variational autoencoder (VAE) pre-training. After that, we conduct domain adaptation to fully exploit the value of the detected unseen-domain samples to boost semi-supervised training”: remove “then”; also, domain adaptation is a task, not something one conducts; one may adopt domain adaptation techniques or losses.
    2. The distribution mismatch between labeled and unlabeled data has been widely explored, and the authors may need to either compare against or mention techniques from the natural-image domain to show a solid understanding of the field, e.g., DS-SSL, DASO, and class-imbalanced SSL with adaptive thresholding.
    3. The novelty is limited, as the paper borrows some techniques from existing SSL or UDA papers.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The author should focus more on the issues in “unified SSL” rather than presenting the proposal from a technical perspective. Also, comparison with other baselines and writing needs clear improvement.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors improved upon previous work published at NeurIPS and applied it to medical datasets. While the novelty is somewhat limited, the work could still be a good addition to the medical AI research field. However, more comparisons and improvements are definitely needed for acceptance at MICCAI. Please kindly address all the issues mentioned in the weaknesses.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I have maintained my score of (3) weak reject after carefully reading the other reviews and the author rebuttal.



Review #3

  • Please describe the contribution of the paper

    A novel framework for universal semi-supervised medical image classification is presented. It allows the model to learn from unknown classes and/or domains. A scoring mechanism is proposed to measure the likelihood of an unlabeled sample being from an unknown domain or class. Experiments were performed in different medical fields: dermatology and ophthalmology.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The study of SSL methods is of high importance, as annotated data is limited, especially in the medical field. Moreover, both class and domain mismatch measurement and detection are important for unified training. The authors present interesting approaches to improve data-efficient learning, demonstrating that they are not specific to a single medical field.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is not clear that the improvements obtained with the proposed method are significant; for instance, when compared with close-set SSL (PI), the average result is higher but at the expense of a larger standard deviation.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Despite the novelty of the presented method, its superiority over other methods in the literature is not clear. In the ablation study, did the authors perform statistical tests to confirm the significance of including each component? Please confirm. Minor comment: Section 3.3 refers to Table 3, but it should be Table 4.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work is complete, reasonable, and practical, and the experiments in different scenarios are interesting. However, the comparison with methods from the literature is not totally clear, and the ablation study would benefit from a clearer discussion. Please refer to the weaknesses and the justification of the recommendation for the limitations.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I have maintained my score of (4) weak accept after carefully reading the other reviews and the author rebuttal.

    I believe the study is relevant for the MICCAI community since it addresses SSL under unseen circumstances and the results are evaluated in different domains.

    I agree that the paper would benefit from better justification of the methodological choices and further explanation of the results.




Author Feedback

We appreciate the reviewers’ positive comments: “The work is complete and has practicality to the medical AI research field” (R1, R4); “The method is novel and effective and can be applied to many medical scenarios” (R1, R3).

Responses:

*Ablation study (R1) In Table 4, the datasets are ordered from the left column to the right column by domain gap, ranging from close domain to far domain (stated in the Supp. document). This ordering helps investigate which components contribute to the overall performance. In summary, the larger the domain gap of the unlabeled data, the more significantly removing the domain separation module (CDS) and the domain adaptation term harms performance. This effect is also observed under class mismatch (DOE). We will include statistical test results in the ablation study in the revised version if permitted.

*The use of Euclidean distance (R3) Thanks for the valuable references. Following the agreement-maximization principle, we focus on the consistency of two augmented samples to assess UKC, which yields satisfactory results. While cosine similarity may also work, computing the Euclidean distance for the unsupervised term is less computationally expensive, which benefits training speed.
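
A minimal sketch of this consistency measure (illustrative; the names are assumptions, not the released code). Note that if features are L2-normalized, the two metrics are equivalent up to a monotone transform, since ||u - v||^2 = 2 - 2 cos(u, v):

    import torch
    import torch.nn.functional as F

    def view_consistency(feat_a, feat_b):
        """Euclidean consistency between features of two augmented views.

        For unit-norm vectors u and v, ||u - v||^2 = 2 - 2*cos(u, v), so
        squared Euclidean distance and cosine similarity rank samples
        identically; the Euclidean form is simply cheaper to compute.
        """
        u = F.normalize(feat_a, dim=1)
        v = F.normalize(feat_b, dim=1)
        euclidean_sq = (u - v).pow(2).sum(dim=1)  # squared Euclidean distance
        cosine = (u * v).sum(dim=1)               # cosine similarity
        assert torch.allclose(euclidean_sq, 2.0 - 2.0 * cosine, atol=1e-5)
        return euclidean_sq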

*Gaussian Mixture Model (R3) The GMM is used to model the loss values output by the VAE in order to distinguish the domains of unlabeled data. Using a GMM does not strictly require the data to follow a Gaussian distribution, since it models the data as a weighted sum of several Gaussian components; even if the overall distribution is not Gaussian, a GMM can approximate it well by adjusting the parameters of the individual components. While thresholding the VAE loss values to obtain a binary prediction can yield similar results, we found that the GMM better models the overall distribution of the OOD likelihood over unlabeled samples.
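
A minimal sketch of this procedure (illustrative; it assumes per-sample VAE losses have already been computed):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def unseen_domain_probability(vae_losses):
        """Soft unseen-domain scores from a 2-component GMM over VAE losses.

        In-domain samples should reconstruct well (low loss) and
        unseen-domain samples poorly (high loss), so the component with
        the larger mean is treated as the out-of-distribution one.
        """
        losses = np.asarray(vae_losses, dtype=np.float64).reshape(-1, 1)
        gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
        ood_component = int(np.argmax(gmm.means_.ravel()))  # high-loss component
        return gmm.predict_proba(losses)[:, ood_component]  # P(unseen domain)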

*Table 1 not clear (R3) As mentioned in Sec. 3.1, “We sample 30% close-set samples and all open-set samples from the remaining 17,331 instances to form the unlabeled dataset.” Therefore, the dataset used in each column is obtained from a mixed dataset with 30% close-set samples and the specified open-set samples. For example, the column “Derm7pt” indicates that the unlabeled data is from “30% ISIC and Derm7pt”. Please refer to our Supp. Table 1 for details on the open-set settings. We will carefully revise Table 1 to improve readability.
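
A minimal sketch of how such a mixed unlabeled pool could be assembled (illustrative; the names and sampling rule are assumptions):

    import random

    def build_unlabeled_pool(close_set, open_set, close_ratio=0.3, seed=0):
        """Mix a fraction of close-set samples with all open-set samples.

        E.g., the "Derm7pt" column would correspond to 30% of the
        remaining ISIC samples plus all Derm7pt samples.
        """
        rng = random.Random(seed)
        n_close = int(len(close_set) * close_ratio)
        return rng.sample(list(close_set), n_close) + list(open_set)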

*Warm-up (R3) In traditional SSL settings, warm-up is a widely used technique to ensure the generation of high-quality pseudo-labels: the model first learns from labeled data and then gradually increases the weight of the unsupervised term to incorporate unlabeled data. In this work, we did not deliberately tune the coefficients of the unsupervised/DA terms but simply followed the settings in [15].
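
A minimal sketch of the exponential rampup schedule commonly used for such warm-up (illustrative; the exact schedule follows [15]):

    import numpy as np

    def exponential_rampup(epoch, rampup_epochs=80):
        """Rampup factor in [0, 1], reaching 1 at `rampup_epochs`.

        The coefficients are then scheduled as alpha(t) = alpha_max * w(t)
        and beta(t) = beta_max * w(t), i.e., adjusted automatically during
        training rather than tuned on a validation set.
        """
        if epoch >= rampup_epochs:
            return 1.0
        phase = 1.0 - epoch / rampup_epochs
        return float(np.exp(-5.0 * phase * phase))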

*More comparison methods (R4) We appreciate the valuable references. We built our benchmark following [9], including 6 classic close-set SSL methods, 5 recent open-set SSL methods, and 1 directly related universal SSL method. Our method is simple, plug-and-play, and effective across most SSL methods. We will include more related references with detailed discussions in the revised version.

*Novelty (R4) Our primary focus is on the assessment of UKC and UKD. We intentionally kept the SSL and DA components simple, to better investigate the factors contributing to the performance of our overall framework. We evaluated our method on two medical domains to validate its generalization, with the aim of addressing realistic scenarios involving extensive but chaotic unlabeled data.

*Reproducibility (R4) We will release the code for this work and implementations of compared methods. We hope this codebase will benefit the research community on related problems.

*Others We apologize for the writing issues and will revise them carefully.

We hope this response can address your concerns and sincerely thank you in advance for your consideration.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents a new universal semi-supervised medical image classification method that mitigates class and domain mismatch problems. The paper received (weak accept -> weak accept, weak reject -> weak accept, weak reject -> weak reject) scores (before -> after rebuttal). The strengths found by the reviewers are: the relevance of semi-supervised learning given the small datasets in medical imaging, interesting solutions to improve data-efficient learning, a paper that is easy to read, and an elegant method. As for the weaknesses, the reviewers raised the following issues: unclear significance of the improvements (particularly compared with close-set semi-supervised learning); an objective that does not promote the compactness of in-distribution features and separation between classes for better OOD detection; and a lack of exploration of the number of labeled samples, known and unknown data, and expected results. This paper certainly has positive and negative aspects, but the positive points slightly outweigh the negative ones.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


