Abstract

The segmentation of the pubic symphysis and fetal head (PSFH) constitutes a pivotal step in monitoring labor progression and identifying potential delivery complications. Despite the advances in deep learning, the lack of annotated medical images hinders the training of segmentation models. Traditional semi-supervised learning approaches primarily utilize a unified network model based on Convolutional Neural Networks (CNNs) and apply consistency regularization to mitigate the reliance on extensive annotated data. However, these methods often fall short in capturing the discriminative features of unlabeled data and in delineating the long-range dependencies inherent in the ambiguous boundaries of PSFH within ultrasound images. To address these limitations, we introduce a novel framework, the Dual-Student and Teacher Combining CNN and Transformer (DSTCT), which synergistically integrates the capabilities of CNNs and Transformers. Our framework comprises a tripartite architecture featuring a Vision Transformer (ViT) as the ‘teacher’ and two ‘student’ models: one ViT and one CNN. This dual-student setup enables mutual supervision through the generation of both hard and soft pseudo-labels, with the consistency in their predictions being refined by minimizing the classifier determinacy discrepancy. The teacher model further reinforces learning within this architecture through the imposition of consistency regularization constraints. To augment the generalization abilities of our approach, we employ a blend of data and model perturbation techniques. Comprehensive evaluations on the benchmark dataset of the PSFH Segmentation Grand Challenge at MICCAI 2023 demonstrate that our DSTCT framework outperforms 10 contemporary semi-supervised segmentation methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2798_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/jjm1589/DSTCT

Link to the Dataset(s)

https://ps-fh-aop-2023.grand-challenge.org/

https://codalab.lisn.upsaclay.fr/competitions/18413

https://doi.org/10.5281/zenodo.10969427

https://doi.org/10.6084/m9.figshare.14371652

BibTex

@InProceedings{Jia_Intrapartum_MICCAI2024,
        author = { Jiang, Jianmei and Wang, Huijin and Bai, Jieyun and Long, Shun and Chen, Shuangping and Campello, Victor M. and Lekadir, Karim},
        title = { { Intrapartum Ultrasound Image Segmentation of Pubic Symphysis and Fetal Head Using Dual Student-Teacher Framework with CNN-ViT Collaborative Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a dual student-teacher model that leverages the synergy between transformers and CNNs to segment the pubic symphysis and fetal head in intrapartum ultrasound images using semi-supervised learning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A significant contribution lies in achieving enhanced accuracy in semi-supervised segmentation, facilitated by the use of unlabeled ultrasound images. Notably, the integration of consistency learning and entropy minimization, utilizing both soft and hard labels, constitutes another contribution of this paper. Furthermore, the study includes a comprehensive ablation analysis, providing thorough insights into the model’s performance and the effect of data perturbation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the authors have provided a rationale for using unlabeled images in semi-supervised algorithms, the justification for employing the teacher-student model for fetal ultrasound segmentation lacks clarity.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The dataset used in this study is specified in the manuscript and an anonymized link for the source code is provided in the abstract, facilitating reproducibility of this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The claim “The hard pseudo labels generated by maximum confidence can sometimes be noisy” requires additional investigation. Providing evidence to support this claim would enhance understanding. Throughout the manuscript, clarity is important, especially regarding cross-supervision with hard labels and consistency learning with soft labels. Exploring the reverse pairing, that is, cross-supervision with soft labels and consistency learning with hard labels, would be interesting. Additionally, it would be beneficial to highlight the relevance of this issue in the context of intrapartum ultrasound segmentation.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the authors have justified the utilization of unlabeled images for constructing semi-supervised algorithms, the rationale behind employing the teacher-student model for segmenting fetal ultrasound images lacks clarity. Considering the potential general applicability of the proposed model to various segmentation tasks, it would be interesting to observe its performance on adult ultrasound images or even across other imaging modalities.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces a novel framework called Dual-Student and Teacher Combining CNN and Transformer (DSTCT) for the segmentation of the pubic symphysis and fetal head (PSFH) in intrapartum ultrasound images. The author proposed a novel framework that addresses the challenges of limited annotated data in ultrasound image segmentation. The DSTCT framework demonstrates significant improvements in segmentation accuracy and generalization abilities compared to existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper introduces a novel framework called Dual-Student and Teacher Combining CNN and Transformer (DSTCT) for the segmentation of the pubic symphysis and fetal head (PSFH) in intrapartum ultrasound images. The author proposed a novel framework that addresses the challenges of limited annotated data in ultrasound image segmentation. The DSTCT framework demonstrates significant improvements in segmentation accuracy and generalization abilities compared to existing methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Questions and suggestions for improvement:

    1. What does EMA represent in Figure 1?
    2. The letter “D” is used multiple times in the text to represent both the dataset and the Mean Squared Error (MSE) loss. This could lead to confusion and should be clarified.
    3. You mention four parameters: α = 0.5, β = 1.0, γ = 3.0, and μ = 0.1, but in the ablation study, only three parameters are evaluated. Consider addressing this discrepancy.
    4. While you have tested various comparative methods, many of them are from before 2019. Including more recent methods would provide a more comprehensive evaluation of the effectiveness of your approach.
    5. In your statement “Remarkably, with only 20% labeled data for training, our DSTCT achieves 90.4% DSC performance, only 0.6% inferior to the upper bound performance,” I couldn’t find “90.4% DSC” in Table 1. Without this reference, it’s difficult to verify your claim.
    6. Since you didn’t provide detailed explanations, I assumed “UNet SUNet SUNet” in Table 1 is the same as the OURS method. However, the data for these two methods don’t match, causing confusion. Inconsistencies in the data may lead readers to question its reliability. Please provide a reasonable explanation.
    7. You haven’t provided standard deviations with your data. Without them, it’s challenging to ensure that your method is statistically significant, especially for minor improvements. Including standard deviations and p-values would strengthen the validity of your method.
    8. ASD is mentioned to have decreased by 0.626 pixels. However, the unit for ASD should be in millimeters (mm) rather than pixels, as pixels are the smallest unit and cannot be further subdivided.
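    For reference on the two metrics discussed in items 5 and 8, the following is a minimal sketch of how a Dice Similarity Coefficient is computed on binary masks and how a distance measured in pixels could be converted to millimeters. The function names, the toy masks, and the pixel spacing of 0.3 mm are hypothetical and are not taken from the paper or its code.

```python
def dice_coefficient(pred, target):
    """DSC = 2 * |A ∩ B| / (|A| + |B|) for flat binary masks (lists of 0/1)."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def pixels_to_mm(distance_px, spacing_mm_per_px):
    """Convert a distance in pixels to millimeters using the image pixel spacing."""
    return distance_px * spacing_mm_per_px


# Toy example: hypothetical flattened masks for one structure.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 0, 0]
print(dice_coefficient(pred, target))

# Hypothetical spacing; a real value would come from the image metadata.
print(pixels_to_mm(0.626, 0.3))
```

    An average surface distance is a mean over many boundary points, so it can take fractional values in either unit; reporting it in millimeters requires knowing the pixel spacing of each image.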
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper introduces a novel framework called Dual-Student and Teacher Combining CNN and Transformer (DSTCT) for the segmentation of the pubic symphysis and fetal head (PSFH) in intrapartum ultrasound images. The author proposed a novel framework that addresses the challenges of limited annotated data in ultrasound image segmentation. The DSTCT framework demonstrates significant improvements in segmentation accuracy and generalization abilities compared to existing methods.

    Questions and suggestions for improvement:

    1. What does EMA represent in Figure 1?
    2. The letter “D” is used multiple times in the text to represent both the dataset and the Mean Squared Error (MSE) loss. This could lead to confusion and should be clarified.
    3. You mention four parameters: α = 0.5, β = 1.0, γ = 3.0, and μ = 0.1, but in the ablation study, only three parameters are evaluated. Consider addressing this discrepancy.
    4. While you have tested various comparative methods, many of them are from before 2019. Including more recent methods would provide a more comprehensive evaluation of the effectiveness of your approach.
    5. In your statement “Remarkably, with only 20% labeled data for training, our DSTCT achieves 90.4% DSC performance, only 0.6% inferior to the upper bound performance,” I couldn’t find “90.4% DSC” in Table 1. Without this reference, it’s difficult to verify your claim.
    6. Since you didn’t provide detailed explanations, I assumed “UNet SUNet SUNet” in Table 1 is the same as the OURS method. However, the data for these two methods don’t match, causing confusion. Inconsistencies in the data may lead readers to question its reliability. Please provide a reasonable explanation.
    7. You haven’t provided standard deviations with your data. Without them, it’s challenging to ensure that your method is statistically significant, especially for minor improvements. Including standard deviations and p-values would strengthen the validity of your method.
    8. ASD is mentioned to have decreased by 0.626 pixels. However, the unit for ASD should be in millimeters (mm) rather than pixels, as pixels are the smallest unit and cannot be further subdivided.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduces a novel framework called Dual-Student and Teacher Combining CNN and Transformer (DSTCT) for the segmentation of the pubic symphysis and fetal head (PSFH) in intrapartum ultrasound images. The author proposed a novel framework that addresses the challenges of limited annotated data in ultrasound image segmentation. The DSTCT framework demonstrates significant improvements in segmentation accuracy and generalization abilities compared to existing methods.

    Questions and suggestions for improvement:

    1. What does EMA represent in Figure 1?
    2. The letter “D” is used multiple times in the text to represent both the dataset and the Mean Squared Error (MSE) loss. This could lead to confusion and should be clarified.
    3. You mention four parameters: α = 0.5, β = 1.0, γ = 3.0, and μ = 0.1, but in the ablation study, only three parameters are evaluated. Consider addressing this discrepancy.
    4. While you have tested various comparative methods, many of them are from before 2019.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces a new framework for semi-supervised intrapartum ultrasound segmentation, using a student-teacher approach to learning in a combined CNN and ViT model. This model outperforms 10 different semi-supervised approaches on the same task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper introduces a fusion of approaches that enables semi-supervised image segmentation, a method necessary to take advantage of the massive number of unlabeled images, which can be combined with information gained from a small number of labeled samples within these same datasets (a common situation in clinical data).
    • The technique is reasonable and well reasoned.
    • The model is developed in an open data set which enables replication and further development of this approach.
    • The model is compared to many other, comparable models in the same data, showing consistently superior performance.
    • Ablation and hyperparameter testing are helpful for understanding what aspect of the model is enabling the improved performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There may be bias in which images are selected to be labeled. It would be interesting/useful for the authors to comment on or address this.
    • Unfortunately, because the model is so complex, it can be difficult to follow exactly what was done. A simpler “toy” figure describing the different teacher-student techniques and how the CNN+ViT come together would be useful to clarify this for the reader.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Clearer captions of the tables would be helpful. It’s not always obvious what metric is being displayed.
    • There may be bias in which images are selected to be labeled. It would be interesting/useful for the authors to comment on or address this.
    • Unfortunately, because the model is so complex, it can be difficult to follow exactly what was done. A simpler “toy” figure describing the different teacher-student techniques and how the CNN+ViT come together would be useful to clarify this for the reader.
    • Confidence intervals of the performance metrics (i.e. even bootstrapped from the final prediction data) would strengthen the assertion that this method is superior.
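    To illustrate the bootstrapping suggested in the last point, a minimal sketch of a percentile bootstrap confidence interval for a mean DSC might look as follows. The per-image scores, the function name `bootstrap_ci`, the resample count, and the seed are all hypothetical; none of them come from the paper.

```python
import random


def bootstrap_ci(scores, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(scores)
    # Resample the scores with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo_idx = int((1.0 - confidence) / 2.0 * n_resamples)
    hi_idx = int((1.0 + confidence) / 2.0 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical per-image DSC values from a test set.
dsc_scores = [0.91, 0.88, 0.86, 0.93, 0.90, 0.87, 0.89, 0.92, 0.85, 0.90]
low, high = bootstrap_ci(dsc_scores)
print(f"mean DSC = {sum(dsc_scores) / len(dsc_scores):.3f}, "
      f"95% CI = [{low:.3f}, {high:.3f}]")
```

    Comparing such intervals between two methods evaluated on the same images gives a rough sense of whether a small difference in mean DSC is distinguishable from resampling noise.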
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper provides a clear contribution to semi-supervised learning and is usefully embedded within a clinical context of use (large, partially labeled data sets). It is a valuable step forward and a useful contribution to the current body of work, with substantial comparison to other methods.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

To R1: Thank you for your feedback. We acknowledge the concern regarding potential bias in image selection. To address this, we ensured that our image selection process was as unbiased as possible by employing random selection techniques. Figure 1 illustrates the key aspects of the teacher-student techniques and the integration of CNN and ViT models. By breaking down the components and their interactions in a more digestible manner, we aim to provide a clearer understanding of the methodologies employed.

To R3: We appreciate the opportunity to clarify and improve our work. We realize that the justification for employing the teacher-student model for fetal ultrasound segmentation needs further clarification. The teacher-student model is particularly advantageous in this context because it allows the model to leverage a large amount of unlabeled data effectively. The teacher model, trained on labeled data, generates pseudo-labels for the unlabeled data, which the student model then uses for further training. This approach enhances the segmentation performance by improving the model’s ability to generalize from a limited set of labeled images to a broader set of clinical scenarios, which is critical for the variability encountered in fetal ultrasound images. Because a limited amount of labeled data does not provide a stable basis for training the network, directly feeding the unlabeled data into the pre-trained model to produce hard pseudo-labels may introduce considerable noise (i.e., incorrect predictions). As training progresses, these errors may accumulate. Setting a high confidence threshold can ensure the quality (i.e., correctness) of the pseudo-labels. However, work on dynamic threshold methods has pointed out that overly high thresholds discard many uncertain pseudo-labels, leading to imbalanced learning between categories and low pseudo-label utilization. Dynamic thresholds lower the threshold in the early stages to introduce more pseudo-labels for early training, but this inevitably introduces low-quality pseudo-labels. Therefore, to reduce errors (noise) in hard pseudo-labels and focus on challenging areas without labels, we utilize a sharpening function to generate soft pseudo-labels in this paper.
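The contrast between thresholded hard pseudo-labels and sharpened soft pseudo-labels can be sketched for a single pixel as below. This is only an illustration under stated assumptions: the rebuttal does not give the exact sharpening function, so the common temperature form p_k^(1/T) / Σ_j p_j^(1/T) is assumed here, and the function names, temperature, threshold, and class probabilities are hypothetical.

```python
def hard_pseudo_label(probs, threshold=0.0):
    """Return the argmax class index, or None if its confidence is below `threshold`."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


def sharpen(probs, temperature=0.5):
    """Assumed temperature sharpening: p_k^(1/T) / sum_j p_j^(1/T)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# Hypothetical class probabilities for one pixel (background, PS, FH).
pixel_probs = [0.5, 0.3, 0.2]

# A high threshold discards this uncertain pixel entirely.
print(hard_pseudo_label(pixel_probs, threshold=0.95))

# Without a threshold, the pixel gets a single, possibly wrong, class.
print(hard_pseudo_label(pixel_probs))

# Sharpening keeps a full distribution but moves mass toward the top class.
print(sharpen(pixel_probs))
```

With T < 1 the sharpened distribution is more peaked than the input yet still encodes uncertainty, so an ambiguous boundary pixel contributes a weaker training signal instead of being dropped or forced into one class.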

To R4: Thanks! In Figure 1, EMA stands for Exponential Moving Average. We have already specified in the article that E represents the Mean Squared Error (MSE) loss and that D solely represents the dataset. Due to the space limit of the paper, we did not report standard deviations and p-values, or the ablation results for the parameter μ. In fact, μ = 0.1 is optimal, which is consistent with previous research, so we did not report the corresponding results. (We set μ to 0.01, 0.1, 0.5, and 1.0 while keeping the other weight parameters constant at α = 0.5, β = 1.0, and γ = 3.0; the resulting Dice Similarity Coefficient (DSC) values for PSFH were 0.889, 0.893, 0.891, and 0.890, respectively.) In our article, the comparison methods and their corresponding years are as follows: Mean Teacher (MT): 2017; Deep Adversarial Network (DAN): 2017; Deep Co-Training (DCT): 2018; Uncertainty-Aware Mean Teacher (UAMT): 2019; Cross Consistency Training (CCT): 2020; Cross Pseudo-Supervision (CPS): 2021; Cross Teaching Between CNN and Transformer (CTCT): 2022; Interpolation Consistency (ICT): 2022; Self-Integration Method Based on Consent-Aware Pseudo-Labels (S4CVnet): 2022; and Collaborative Transformer-CNN Learning (CTCL): 2022. We apologize for the error. With only 20% labeled data for training, our DSTCT achieves 89.3% DSC performance, only 6% inferior to the upper bound performance. The unit for ASD is indeed millimeters (mm). We first conducted the ablation experiments and then performed the experiments on the balancing weight parameters. Therefore, in the comparison of methods, we selected the best results from Table 3 rather than from Table 2.
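In mean-teacher style training, an Exponential Moving Average is typically used to update the teacher's weights from a student's weights after each step. The sketch below shows that update on plain dictionaries of floats. The decay value, the parameter names, and the choice of which student feeds the teacher are assumptions for illustration; the rebuttal does not specify them.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Hypothetical scalar "weights" standing in for network parameters.
teacher = {"w": 0.0, "b": 1.0}
student = {"w": 1.0, "b": 1.0}

# After each (hypothetical) training step the teacher drifts toward the student.
for step in range(3):
    teacher = ema_update(teacher, student, decay=0.9)
    print(step, teacher)
```

Because the teacher is a smoothed average of past student weights, its predictions change more slowly than the student's, which is what makes it usable as a consistency target.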




Meta-Review

Meta-review not available, early accepted paper.


