Abstract

Semi-supervised learning (SSL) offers a pragmatic approach to harnessing unlabeled data, particularly in contexts where annotation costs are prohibitively high. However, in practical clinical settings, unlabeled datasets inevitably contain outliers that do not belong to any of the labeled classes, a scenario known as open-set semi-supervised learning (OSSL). While existing methods have shown promising results in domains such as natural image processing, they often overlook the nuanced characteristics intrinsic to medical images, rendering them less applicable in this domain. In this work, we introduce a novel framework tailored to the challenges of open-set semi-supervised classification (OpenSSC) in medical imaging. OpenSSC comprises three integral components. First, we propose learnable prototypes to distill a compact representation of the fine-grained characteristics of the known classes. Second, a multi-binary discriminator is introduced to consolidate the closed-set predictions and determine whether a sample belongs to its assigned class or not. Building upon these components, we present a joint outlier filter designed to classify known classes while identifying unknown classes within the unlabeled data. Our proposed method demonstrates efficacy in handling open-set data. Extensive experiments validate the effectiveness of our approach, showing superior performance compared to existing state-of-the-art methods on two distinct medical image classification tasks.
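As a rough illustration of the two learnable components named above, here is a minimal PyTorch sketch (an assumption-laden reading, not the authors' released code): per-class learnable prototypes matched by cosine similarity, and a multi-binary (one-vs-rest) discriminator with one output per seen class. The feature dimension, class count, and temperature are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypesAndDiscriminator(nn.Module):
    def __init__(self, feat_dim=512, num_seen=5, temperature=0.1):
        super().__init__()
        # One learnable prototype per seen (closed-set) class.
        self.prototypes = nn.Parameter(torch.randn(num_seen, feat_dim))
        # One-vs-rest discriminator: K binary heads in a single linear layer.
        self.multi_binary = nn.Linear(feat_dim, num_seen)
        self.temperature = temperature

    def forward(self, feats):
        feats = F.normalize(feats, dim=1)
        protos = F.normalize(self.prototypes, dim=1)
        proto_logits = feats @ protos.t() / self.temperature  # prototype matching
        binary_logits = self.multi_binary(feats)               # class-wise in/out scores
        return proto_logits, binary_logits
```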

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0244_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{He_OpenSet_MICCAI2024,
        author = { He, Along and Li, Tao and Zhao, Yitian and Zhao, Junyong and Fu, Huazhu},
        title = { { Open-Set Semi-Supervised Medical Image Classification with Learnable Prototypes and Outlier Filter } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a semi-supervised classification method that can leverage unlabeled data containing open-set classes (safe SSL / open-set SSL, OSSL). The authors propose an outlier filter that leverages the agreement between a multi-binary classification of each closed-set class vs. the open set and per-class prototype matching on the closed set to guide the training of both, plus a closed-set classifier. They use a mean teacher framework to provide pseudo-labels on the unlabeled data, and experiment with the AdaptFormer architecture over vision transformers pre-trained with CLIP, on two medical imaging datasets (ISIC skin cancer and DDR diabetic retinopathy). Experiments consist of closed-set classification results, combined closed-set and open-set results, and ablation studies on the latter.
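For context, a minimal sketch of the standard mean-teacher component mentioned above (the exponential-moving-average teacher that produces pseudo-labels); the decay value and names are illustrative assumptions, not taken from the paper.

```python
import torch

@torch.no_grad()
def update_teacher(student: torch.nn.Module, teacher: torch.nn.Module,
                   ema_decay: float = 0.999) -> None:
    """Update the teacher as an exponential moving average of the student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(ema_decay).add_(s_param, alpha=1.0 - ema_decay)

# Typical usage: teacher = copy.deepcopy(student); after each optimizer step on
# the student, call update_teacher(student, teacher). Pseudo-labels for the
# unlabeled images are then taken from the teacher's predictions.
```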

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The topic of open-set semi-supervised learning is interesting and relevant for medical image analysis
    2. The proposal seems novel, and the reported results seem to demonstrate that it works
    3. Extensive comparison with alternative approaches
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The review of the state of the art is poor, specifically regarding a clear discussion of the contribution of the proposal with respect to related work. This hinders the assessment of the actual novelty of the paper and also lowers the value of the extensive comparison results.
    2. The paper is unclear in many aspects and very hard to follow.
    3. The actual impact on the selected clinical applications used for the benchmark is not clear. Application-wise, the contribution does not seem relevant. Therefore, the actual impact of the proposal on medical image applications is unclear and remains to be demonstrated.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • On page 6, it is said that the “pre-trained encoder is ViT/B16 pre-trained on 400 million image-text pairs [13]”. This is clearly a pre-trained CLIP model version downloaded from somewhere, and trained on specific data, so it is important to be specific about the exact model and version, especially because this encoder is fixed and not refined.
    • The number of epochs or stopping criterion is not specified
    • No discussion about relevant training settings and parameters of the compared approaches is provided. This is relevant as the reported results using such methods are obtained by the authors of this paper (and on experimental settings and datasets that substantially differ from those of the original authors).
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The methodology section could be clarified for a better understanding of the proposal. The explanation of the methods is quite convoluted and the rationale behind the proposals is not clearly followed from the explanations. Here, referring to related works (such as [9] and [10]), and relating them to the proposal, would help. Equation (5) does not seem to clearly reflect the different behaviour of the proposed learning method depending on the outlier filter confidence score.

    The reported results compare the proposed approach with other open-set SSL related methods. However, the experimental setup for these methods is not detailed, and the reported results were obtained by the authors of this paper (the original papers do not aim to solve these medical image classification problems). A detailed discussion of the relation between these methods and the proposed approach, along with a discussion of the fairness of the comparison and the relevant parameters, would benefit the assessment of the actual contribution of the proposed method.

    Related to the previous comment, it is not clearly motivated why this approach may be better suited for medical image classification than for any other classification problem in computer vision. As the paper's proposal is mainly methodological, interested readers may wonder why not simply replicate the experimental setup of all the other compared methods on CIFAR/ImageNet, etc., to demonstrate the contribution.

    On the contrary, if the aim is to actually advance medical imaging applications like skin cancer or diabetic retinopathy, a comparison with what is easily achievable using state-of-the-art baseline models should be discussed to demonstrate a contribution to the MIC field. In this sense, the closed-set results look more relevant than the open-set ones, and the ablation study should have been reported in the former setting.

    Regarding the closed-set comparisons, and aiming at demonstrating a contribution to medical image classification applications, the reported results are not competitive with the state of the art on DDR and ISIC. It is clear that only 20% or 30% of the labeled data is used. But for the sake of clarity, two clear baselines are missing: first, fully-supervised learning in the data-scarcity setting (20-30%) without unlabeled data; second, the results of the E.1 setting (i.e., plain Mean Teacher). These would demonstrate a contribution of the proposed open-set-aware semi-supervised setting against typical approaches that tackle label scarcity by leveraging unlabeled data with probable open-set classes to improve classification results.

    Regarding the ablation study, I have two comments. First, as already mentioned, it should be reported in the closed-set setting, which should be the target application objective that we can actually evaluate and are interested in from the MIC point of view. Second, the provided discussion does not clearly follow from the provided results. For example, it is argued that the contributions help discriminate better between seen and unseen classes due to their substantial similarity. But in order to evaluate that, per-class results are necessary: with averaged ACC or Recall, the reduced confusions could come either from a better discrimination among the seen classes or from a better identification of the unseen ones.

    Finally, in my opinion, there are better settings in MIC to evaluate open-set semi-supervised learning approaches. For example, there is a great diversity of DR datasets available (MESSIDOR, e-Ophtha, IDRiD, to name a few). Moreover, unlabeled retinographies with open-set classes could be gathered from other related ophthalmic diseases (e.g., glaucoma, age-related macular degeneration, or the multi-pathology ODIR dataset), while still focusing on the DR task. Such an experiment could actually demonstrate the value of the proposed method in applied medical image classification, for example by comparing to the state of the art evaluated on the full DDR.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the topic is interesting and the proposal seems novel, there are relevant weaknesses that lean my recommendation to reject. The most relevant ones, in my opinion, are the lack of clarity of the contribution with respect to related approaches, and the lack of demonstrated contribution in improving medical image classification against typical settings.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal didn’t actually address my major concerns with the paper itself.

    However, I’ve changed my mind after having the context of the other referees and the whole rebuttal, and I’m now leaning towards accept.



Review #2

  • Please describe the contribution of the paper

    The paper proposes an open-set semi-supervised classification method that introduces learnable prototypes and a multi-binary discriminator. The learnable prototypes support inlier-focused representation learning. The multi-binary discriminator decides, per class, whether a sample belongs to that class, and thereby identifies outliers. The experiments demonstrate effectiveness on multiple datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Introducing prototypes is reasonable for this open-set classification setting.
    • The multi-binary discriminator is a simple and effective approach, and it significantly improves performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • About the balance of inliers and outliers: unlike general image datasets, medical image classification involves class imbalance. Do the datasets used in the experiments have class imbalance? If so, what is the ratio between inliers and outliers? When considering open-set classification, the class imbalance problem is important.
    • Which side do the proposed modules contribute to, inliers or outliers? The learnable prototypes may contribute to identifying outliers, but they may decrease inlier classification performance. The reviewer is interested in the detailed effect of the proposed modules.
    • The reason for the performance improvement in the closed-set setting. The reviewer can understand how introducing the proposed modules helps identify outliers. However, it is not clear why they are effective in the closed-set setting. Why do these modules contribute to the closed-set setting?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The reviewer recommends adding an explanation of the class balance in the datasets.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors propose simple and effective modules for open-set semi-supervised classification. The modules seem effective, and the experimental results demonstrate their effectiveness.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The author’s responses have addressed most of my concerns.



Review #3

  • Please describe the contribution of the paper

    This paper proposes an open-set semi-supervised classification (OpenSSC) framework, which outperforms other open-set SSL methods on fine-grained medical image classification tasks by enhancing outlier detection accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The authors highlighted the drawbacks of current open-set SSL techniques, which fail to account for subtle visual distinctions between classes in fine-grained medical image classification tasks. (2) To address this issue, the authors introduced an OpenSSC framework which outperforms other competing methods in classification.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The proposed OpenSSC framework lacks necessary descriptions and analysis of some important components. What is the motivation behind introducing learnable prototypes? How does introducing another classification head (such as F_h) differ from the prototype-based classification branch? What was the idea behind designing the outlier filter in this way? Why not adopt state-of-the-art post-processing OOD detection methods to identify the outliers? (2) The lack of essential implementation details makes it challenging to determine the persuasiveness of the comparison results. Do competing methods employ the same backbone as OpenSSC? It is vital to present comprehensive information on the total and learnable parameter counts, GPU cost, and training time required for OpenSSC and all other competing methods. (3) The outlier detection accuracy of OpenSSC and all other competing methods on the training or test set is not presented. However, these results are important to demonstrate the key contribution of the proposed OpenSSC.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) What is the basis for dividing the classes in the ISIC dataset into inlier classes and outlier classes? (2) Some symbols lack consistent representation within their respective contexts. For instance, the labeled image feature is denoted as f^s in Fig. 1 but as f_i in Section 2.2, while the learnable prototypes are represented by c_1 to c_k in Fig. 1 and by P in Section 2.2. Additionally, there is a discrepancy between the inputs of the unsupervised loss shown in Fig. 1 and in Eq. 5.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    see 5 and 6

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors’ responses have addressed most of my concerns, therefore my decision remains a weak accept.




Author Feedback

Q1: Clarity of contribution (R1) We focus on: 1) learnable prototypes to denote the features of seen classes, which mitigate feature conflicts between inliers and outliers; 2) a multi-binary discriminator for outlier identification. Our method takes advantage of unlabeled data mixed with unseen classes, guided by limited labeled data. It can be utilized effectively in open clinical settings, where uncontrolled and unseen-class images may be collected into the unlabeled set.

Q2: Implementation details (R1, R4) We use the ViT-B/16 encoder from CLIP (https://github.com/OpenAI/CLIP) as the backbone for all methods, with the same settings. Total parameters are 88M; trainable parameters are 1M. The server is an RTX 3090 with 24GB of memory. Training time for all methods is 3-4 minutes per epoch. The number of training epochs is set to 50 to ensure convergence.
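A hedged sketch of this setup, not the authors' implementation: a frozen CLIP ViT-B/16 image encoder with a small trainable head. The paper reportedly trains only ~1M parameters (AdaptFormer-style adapters, per Review #1); the plain linear head below is a simplified stand-in, and the class count is an assumption.

```python
import torch
import torch.nn as nn
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone, preprocess = clip.load("ViT-B/16", device=device)

# Freeze the pre-trained encoder; only the lightweight head receives gradients.
for p in backbone.parameters():
    p.requires_grad = False

num_seen_classes = 5  # assumption: number of inlier (seen) classes
head = nn.Linear(512, num_seen_classes).to(device)  # 512 = CLIP ViT-B/16 embedding dim

def encode_and_classify(images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        feats = backbone.encode_image(images).float()  # frozen features
    return head(feats)                                  # closed-set logits
```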

Q3: More details of our framework (R1, R4) Fine-grained classes often share significant similarities, so it is crucial to discern distinctive features during learning. Thus, we propose learnable prototypes to learn the center of each seen class; this is mentioned in the first paragraph of Section 2.2. This differs from other classification tasks in computer vision, and simply adopting those methods in the medical domain leads to poor results, as shown in Table 1. Equation (5) is only applied to seen-class samples after the outlier filter. The closed-set classifier provides class scores that assist the multi-binary discriminator in recognizing whether an image belongs to the inliers or not. After training on labeled data, both have acquired discriminative features and can be used jointly as an outlier filter. Post-processing OOD detection methods do not sufficiently learn features of seen and unseen classes, resulting in poor detection results.
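An illustrative sketch of such a joint outlier filter, under stated assumptions rather than the paper's exact Eq. (5): the closed-set classifier's softmax confidence is combined with the multi-binary discriminator's "belongs to the predicted class" probability, and an unlabeled sample is kept as a pseudo-labeled inlier only when the joint score exceeds a threshold (the threshold and names are hypothetical).

```python
import torch

def outlier_filter(closed_logits: torch.Tensor, binary_logits: torch.Tensor, tau: float = 0.5):
    """closed_logits: (B, K) closed-set class scores.
    binary_logits: (B, K) one-vs-rest scores, one per seen class.
    Returns pseudo-labels and a boolean mask of samples treated as inliers."""
    probs = closed_logits.softmax(dim=1)           # closed-set confidence
    conf, pseudo = probs.max(dim=1)                # predicted seen class
    inlier_prob = torch.sigmoid(binary_logits)     # per-class inlier probability
    # Probability that the sample belongs to its predicted class.
    match_prob = inlier_prob.gather(1, pseudo.unsqueeze(1)).squeeze(1)
    keep = (conf * match_prob) > tau               # joint confidence threshold
    return pseudo, keep
```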

Q4: Comparing with SOTA baselines (R1) The aim of this paper is to improve classification performance for open-set SSL. However, SOTA baseline models are trained with only seen samples in a supervised manner. The closed-set results in Table 1 are obtained under open-set SSL training, and comparing with the SOTA baseline models would be unfair due to the different experimental settings.

Q5: Simple baselines (R1) We give the closed-set results following the format in Table 1. Fully-supervised: (63.34, 48.74, 62.11, 46.94, 71.85, 58.96, 70.25, 57.22). Plain Mean Teacher: (64.11, 49.54, 62.58, 47.29, 72.13, 59.45, 70.87, 57.98).

Q6: Per-class results (R1) Due to the page limit, we only report the average results, the same as [9,14]. We will add per-class results in the supplementary material.

Q7: Better settings to evaluate open-set SSL (R1) Collecting datasets of DR as seen classes and other related diseases as unseen classes is more realistic and can better evaluate the effectiveness of our method. This is a promising experimental setup and we will explore it in our future work.

Q8: Class balance in the dataset (R3) The datasets in the experiments have class imbalance; the outlier-to-inlier ratio is 0.09 for DDR (575 outliers, 6320 inliers) and 2.16 for ISIC 2018 (6847 outliers, 3168 inliers).
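For reference, a trivial check that the reported ratios follow from the stated counts:

```python
# Outlier-to-inlier ratios computed from the counts given in Q8.
print(f"DDR:  {575 / 6320:.2f}")    # -> 0.09
print(f"ISIC: {6847 / 3168:.2f}")   # -> 2.16
```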

Q9: Which side do the proposed modules contribute to, inliers or outliers? (R3) The proposed modules contribute to both inliers and outliers. The learnable prototypes learn the class center of each seen class, which helps the backbone learn robust features and improves closed-set performance. The multi-binary discriminator decides whether the current sample is an inlier or an outlier, improving performance on both inliers and outliers; we validate each component in Table 2.

Q10: Outlier detection accuracy (R4) The performance is evaluated on both seen and unseen classes. We regard all unseen classes as a single new class and report the closed-set and open-set performance in Table 1, following IOMatch [9].
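A hedged sketch of this evaluation protocol as described above (names and the label convention are assumptions): all unseen ground-truth classes are remapped to a single extra class index K, open-set accuracy is computed on the (K+1)-way problem, and closed-set accuracy is computed only on samples whose ground truth is a seen class.

```python
import numpy as np

def open_set_accuracy(y_true, y_pred, num_seen: int) -> float:
    """y_true, y_pred: integer labels; unseen ground-truth labels are >= num_seen,
    and predictions lie in {0..num_seen}, where num_seen means 'unknown'."""
    y_true = np.where(np.asarray(y_true) >= num_seen, num_seen, np.asarray(y_true))
    return float((y_true == np.asarray(y_pred)).mean())

def closed_set_accuracy(y_true, y_pred, num_seen: int) -> float:
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    seen = y_true < num_seen                      # keep only seen-class samples
    return float((y_true[seen] == y_pred[seen]).mean())
```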

Q11: The basis for dividing the ISIC dataset (R4) We simulate the open-set scenario by randomly designating classes as inliers or outliers; this split can be tailored to the specific needs of the clinical situation.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposes a semi-supervised classification method for the open-set SSL (OSSL) setting, which effectively utilizes unlabeled data containing open-set classes. The method includes an outlier filter that uses multi-binary classification and per-class prototype matching to guide the training of a closed-set classifier. The reviewers highlighted the novelty and significant improvements of the method. However, they expressed concerns about the need for a more detailed description of the methodology and analysis. The rebuttal addressed these concerns to a large extent. After considering the rebuttal, all reviewers agreed to accept the paper (WA, WA, WA). The meta-reviewer recommends accepting this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


