Abstract

Notable progress has been made in medical image segmentation models due to the availability of massive training data. Nevertheless, a majority of open-source datasets are only partially labeled: not all expected organs or tumors are annotated in these images. While previous attempts have been made to learn segmentation from labeled regions of interest (ROIs) only, they do not consider the latent classes, i.e., existing but unlabeled ROIs, in the images during the training stage. Moreover, since these methods rely exclusively on labeled ROIs and treat unlabeled regions as background, they need large-scale and diverse datasets to segment a wide variety of ROIs. In this paper, we propose a framework that utilizes latent classes for segmentation from partially labeled datasets, aiming to improve segmentation performance, especially for ROIs with only a small number of annotations. Specifically, we first introduce an ROI-aware network to detect the presence of unlabeled ROIs in images and form the latent classes, which are utilized to guide the segmentation learning. Additionally, ROIs with ambiguous existence are constrained by a consistency loss between the predictions of the student and the teacher networks. By regularizing ROIs with different certainty levels under different scenarios, our method can significantly improve the robustness and reliability of segmentation on large-scale datasets. Experimental results on a public benchmark for partially labeled segmentation demonstrate that our proposed method surpasses previous attempts and has great potential for forming a large-scale foundation segmentation model.
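
As a reading aid, here is a minimal PyTorch sketch of how the three training signals described in the abstract (supervised loss on labeled ROIs, latent-class loss on confidently detected unlabeled ROIs, and consistency loss on ambiguous ROIs) could be combined. All names, shapes, mask conventions, and loss weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def combined_loss(student_logits, teacher_logits, target,
                  labeled_mask, latent_mask, ambiguous_mask,
                  lam=1.0, mu=0.1):
    """Hypothetical combination of the three signals; not the paper's code.

    student_logits, teacher_logits: (B, C, D, H, W) segmentation logits.
    target: (B, D, H, W) labels for the annotated ROIs.
    *_mask: (B, D, H, W) float masks selecting voxels per certainty level.
    """
    # 1) Standard supervised loss on the ROIs that are actually annotated.
    sup = F.cross_entropy(student_logits, target, reduction="none")
    sup = (sup * labeled_mask).mean()

    # 2) Latent-class loss: teacher pseudo labels supervise ROIs that the
    #    ROI classifier flags as present but unannotated.
    with torch.no_grad():
        pseudo = teacher_logits.argmax(dim=1)
    latent = F.cross_entropy(student_logits, pseudo, reduction="none")
    latent = (latent * latent_mask).mean()

    # 3) Consistency loss on ROIs whose existence is ambiguous.
    diff = (student_logits.softmax(1) - teacher_logits.softmax(1)).pow(2)
    cons = (diff.mean(dim=1) * ambiguous_mask).mean()

    return sup + lam * latent + mu * cons
```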

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1324_paper.pdf

SharedIt Link: https://rdcu.be/dZxdw

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72111-3_26

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zha_Exploiting_MICCAI2024,
        author = { Zhao, Xiangyu and Ouyang, Xi and Zhang, Lichi and Xue, Zhong and Shen, Dinggang},
        title = { { Exploiting Latent Classes for Medical Image Segmentation from Partially Labeled Datasets } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {273 -- 282}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript proposes a framework for medical image segmentation from partially labeled datasets by leveraging latent classes. The framework takes into account the existing but unlabeled ROIs in the images during training. It consists of an ROI-aware network that detects the presence of unlabeled ROIs and forms latent classes, which are used to guide the downstream segmentation. Additionally, the framework uses a teacher-student structure to constrain ROIs with ambiguous existence. The student network is supervised by segmentation losses from both labeled ROIs and latent classes, and is regularized by a consistency loss with the teacher network. Experimental results are shown on a benchmark dataset consisting of several challenges for segmentation of abdominal organs and tumors.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Solving partial label problem in medical image segmentation is important especially for training large models.
    2. Evaluation is conducted on multiple challenge datasets and the organs included are comprehensive.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed framework lacks novelty as the key components are borrowed from a previous publication with limited modifications.
    2. The quantitative analysis lacks a complete ablation study, and the comparison with other methods has not included a necessary set of state-of-the-art (SOTA) methods.
    3. Compared with DoDNet, the proposed approach brings limited improvements.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is not clear whether the code will be shared based on descriptions in the manuscript.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The key model elements proposed by the manuscript are introduced without necessary context. For example, the Student-Teacher framework is introduced in Section 2.3 without describing the exact motivation and reasoning, and the introduction of ResNet-18 for latent class generation in Section 2.2 also looks arbitrary. More context should be provided on why these model elements were selected.
    2. Compared with the original Mean Teacher framework introduced in reference [15], the proposed approach makes very limited changes. This includes the loss function in Equation 4, which is borrowed directly from [15].
    3. The quantitative evaluation in general lacks many necessary details. Recommendations include, first, a complete ablation study and, second, a comparison with a standard set of state-of-the-art (SOTA) methods for segmentation with partial labels. A more complete set of methods can be found in the following reference: Liu, H., Xu, Z., Gao, R., Li, H., Wang, J., Chabin, G., … & Grbic, S. (2024). COSST: Multi-organ segmentation with partially labeled datasets using comprehensive supervisions and self-training. IEEE Transactions on Medical Imaging.
    4. The experiment should consider using more up-to-date benchmark datasets, such as the AbdomenAtlas-8K dataset.
    5. The following references should be included and comparisons should be conducted:
       • Liu, T., Zhang, X., Han, M., & Zhang, L. (2023). A Lightweight nnU-Net Combined with Target Adaptive Loss for Organs and Tumors Segmentation.
       • Liu, H., Xu, Z., Gao, R., Li, H., Wang, J., Chabin, G., … & Grbic, S. (2024). COSST: Multi-organ segmentation with partially labeled datasets using comprehensive supervisions and self-training. IEEE Transactions on Medical Imaging.
       • Wu, Y., Wang, E., & Shao, Z. (2023). Fast abdomen organ and tumor segmentation with nn-UNet.
       • Luo, J., Chen, Z., Liu, W., Liu, Z., Qiu, B., & Fang, G. (2023). AdaptNet: Adaptive Learning from Partially Labeled Data for Abdomen Multi-Organ and Tumor Segmentation.
       • Zhao, Y., Hu, P., & Li, J. (2023, July). Partial Label Multi-organ Segmentation based on Local Feature Enhancement. In 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (pp. 1-4). IEEE.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall the manuscript needs to have more novelty and the quantitative analysis is incomplete.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The authors made some efforts to address concerns from the other reviewers, although the explanations to my questions, e.g., the selection of the architecture and the classification model, are not particularly convincing. Overall, I can see adjusting the rating to a slightly higher grade, but I cannot recommend Accept directly.



Review #2

  • Please describe the contribution of the paper

    This paper tackles an important task in which the training data are only partially labeled. To achieve this, the authors train a patch-based classifier to identify the presence of unlabeled ROIs. Then, a simple consistency loss is adopted to regularize the predictions of the student and the teacher networks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The tackled task is important and interesting.
    2. The proposed method is easy to understand.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Novelty. The key novelty of this paper lies in a patch-based classifier that identifies the presence of unlabeled ROIs, combined with a simple consistency loss that regularizes the predictions of the student and the teacher networks. However, these techniques are simply derived from existing work.
    2. Unfair comparison. The classifier is trained using a large-scale dataset, whereas the compared methods are not.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. A discussion of the classifier should be included, as the proposed framework heavily relies on the performance of the classifier.
    2. Unfair comparison. The classifier is trained using a large-scale dataset, whereas the compared methods are not.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the novelty is limited, the framework seems promising and the main idea is reasonable.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks to the authors for the rebuttal. It seems that the proposed method uses an additional dataset (TotalSegmentator) that the compared methods do not. The authors are encouraged to add a discussion on why the existing work cannot utilize the above dataset for training. Thus, I keep my score.



Review #3

  • Please describe the contribution of the paper

    In this work, the authors propose a mean-teacher training setup with pseudo labels, where the usage of pseudo labels is guided by a separately pre-trained classification model. The authors show that such a setup improves segmentation metrics on the MOTS benchmark.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novelty. The authors propose a new method that helps to improve segmentation models trained on partially annotated datasets. Extendability. The method can potentially improve other segmentation models trained on partially annotated data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Validation: the authors use only the Dice score as a segmentation metric, while the reference work they mention (DoDNet) reports the Hausdorff distance as well. In addition, it is unclear why task-specific networks have not been compared with the proposed method; e.g., nnU-Net, a winner of KiTS2019, could be compared in this setting. Complexity: the training setup seems complicated, since it requires careful optimization of multiple hyperparameters (e.g., thresholds for the ROI guidance). Thus, extending the method to other tasks may require an additional hyperparameter search.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I would recommend adding more segmentation metrics to the evaluation (if they are already reported for the baseline model). This will help to support the segmentation improvement claims. For future work, I would recommend:

    • Adding more state-of-the-art comparisons, e.g., with task-specific networks for kidney+tumor segmentation
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Complexity of the model, lack of comparisons, and metrics. However, authors showed improvement in some segmentation metrics.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I have reviewed the authors’ rebuttal, and they have partially addressed my questions. However, I do not agree with their comment regarding HD, so I have decided to maintain my original rating.




Author Feedback

Q1: The framework lacks novelty (R1, R3). A1: The core idea of our method is to explore possible latent classes (i.e., existing but unlabeled ROIs) to guide partially labeled segmentation. Compared with previous attempts and semi-supervised learning, the main differences are: 1) We are aware of the latent classes in partially labeled segmentation and propose to utilize them to improve segmentation performance, under the guidance of an independent ROI classifier. This design ensures that different ROIs contribute differently to the network and improves the effectiveness of the framework. 2) We propose to regularize ROIs with sufficient certainty by a CE loss with sharpening operations. These ROIs are regarded as latent classes and are utilized to improve segmentation quality by reducing the prediction entropy within them. 3) We apply a consistency loss on ambiguous ROIs. Unlike latent classes, these ROIs may still contain informative clues that benefit training, so the model should yield close predictions under perturbations.
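
As a sketch of the "CE loss with sharpening operations" mentioned in point 2, temperature sharpening of the teacher's probabilities is commonly written as below; the temperature value is an assumption, not the paper's setting.

```python
import torch

def sharpen(probs: torch.Tensor, T: float = 0.5) -> torch.Tensor:
    # Temperature sharpening: raising class probabilities to 1/T and
    # renormalizing lowers their entropy, turning confident teacher
    # predictions into harder targets for the CE loss on latent ROIs.
    # T = 0.5 is an assumed value; probs has shape (B, C, ...).
    p = probs.pow(1.0 / T)
    return p / p.sum(dim=1, keepdim=True)
```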

Q2: The classifier is trained using a large-scale dataset, whereas the compared methods are not (R1). A2: The TotalSegmentator dataset is not significantly larger than the segmentation dataset (1,153 scans in TotalSegmentator vs. 1,155 in the segmentation dataset). The MOTS benchmark does not contain organ labels for the lungs and colon, so we utilize TotalSegmentator to train the ROI classifier.

Q3: The quantitative analysis lacks an ablation study, and the comparison does not include a set of SOTA methods (R3). A3: Table 1 in the manuscript also serves as the ablation study. The main contribution of our method is latent class mining, which involves the segmentation loss on latent ROIs and the consistency loss on ambiguous ROIs. If both losses are removed, the proposed framework degenerates to mean-teacher (2nd row in Table 1). We also investigate the consistency loss on ambiguous ROIs in the 3rd row of Table 1. DoDNet (4th row in Table 1) serves as a previous state of the art in partially labeled segmentation. We will include a more thorough comparison with SOTA methods in partially labeled segmentation in future work.

Q4: The Student-Teacher framework is introduced without describing the exact motivation and reasoning. The introduction of ResNet-18 for latent class generation looks arbitrary (R3). A4: We adopt mean-teacher because latent class mining requires pseudo labels, and mean-teacher is a simple yet effective method for generating them. We adopt ResNet-18 as the classifier because identifying ROIs is a relatively easy task, and its lightweight nature enables latent class mining at little extra cost.
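
For context, the mean-teacher scheme referenced here maintains the teacher as an exponential moving average (EMA) of the student; a minimal sketch follows, where the decay value is an assumption rather than the paper's setting.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               alpha: float = 0.99):
    # The teacher is typically initialized as a deep copy of the student
    # and then tracks an exponential moving average of its weights, which
    # yields smoother pseudo labels than any single student checkpoint.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1.0 - alpha)
```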

Q5: The training setup requires careful optimization of multiple hyperparameters (e.g., thresholds for the ROI guidance), which affects the extendability to other tasks (R4). A5: The performance of our framework is not sensitive to these thresholds, because a well-trained classifier outputs very confident logits for all ROIs, regardless of which ROI the model is identifying.
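
To illustrate the claim, a threshold-based partition of the ROI classifier's outputs might look like the sketch below. The thresholds are hypothetical; the rebuttal's point is that near-binary classifier confidences make their exact values unimportant.

```python
import torch

def partition_rois(cls_probs: torch.Tensor, t_hi: float = 0.9,
                   t_lo: float = 0.1):
    # cls_probs: per-class presence probabilities from the ROI classifier,
    # shape (num_classes,). With a well-trained classifier these sit near
    # 0 or 1, so most ROIs fall cleanly into `present` or `absent` and
    # the thresholds barely matter.
    present = cls_probs >= t_hi        # treat as latent classes
    absent = cls_probs <= t_lo         # treat as background
    ambiguous = ~(present | absent)    # regularize with consistency loss
    return present, absent, ambiguous
```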

Q6: A discussion of the classifier performance should be included (R1). A6: Our pilot studies reveal that the overall accuracy of ROI identification exceeds 96%, which does not restrict the proposed framework. We will add related discussions in future work.

Q7: The authors use only the Dice score as a segmentation metric (R4). A7: We do not include HD because the images are anisotropic in spacing, where HD might lead to a biased report of performance along the axis with large spacing.
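
To make the anisotropy argument concrete: when HD is computed in millimetres, the voxel spacing enters the distance transform, so errors along a coarsely sampled axis dominate. A sketch using SciPy follows; the spacing values are examples, not the datasets' actual spacing.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def hausdorff_mm(a: np.ndarray, b: np.ndarray,
                 spacing=(5.0, 1.0, 1.0)) -> float:
    # a, b: non-empty boolean foreground masks of two segmentations.
    # `sampling` makes the Euclidean distance transform spacing-aware:
    # with 5 mm slices vs 1 mm in-plane (assumed here), a one-voxel
    # mistake along z costs 5x more, which is the bias referred to above.
    dist_to_b = distance_transform_edt(~b, sampling=spacing)
    dist_to_a = distance_transform_edt(~a, sampling=spacing)
    return float(max(dist_to_b[a].max(), dist_to_a[b].max()))
```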

Q8: Task-specific networks and challenge winners have not been compared with the proposed method (R4). A8: We do not compare with challenge winners because our experimental settings differ from the challenges and use fewer training tricks. In addition, comparisons with task-specific networks have been thoroughly discussed in previous works such as DoDNet.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have addressed the major concerns raised by the reviewers. Positive reviews outweigh negative opinions. The authors should revise the paper as promised.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


