Abstract

We propose a method that leverages multiple identical network structures to generate and process diverse augmented views of the same medical image sample. By employing contrastive learning, we maximize mutual information among features extracted from different views, ensuring the networks learn robust, high-level semantic representations. Results on four public datasets and one private endoscopic surgical tool segmentation dataset indicate that the proposed method outperforms state-of-the-art semi-supervised and fully supervised segmentation methods. When trained with 5% of the labeled training data, the proposed method achieved improvements of 11.5%, 8.4%, 6.5%, and 5.8% on RoboTool, Kvasir-instrument, ART-NET, and FEES, respectively. Ablation studies were also performed to measure the effectiveness of each proposed module. Code is available at https://github.com/on1kou95/Mutual-Exemplar.
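
As an illustrative aside, here is a minimal sketch of the core idea described in the abstract, assuming a standard InfoNCE-style objective across augmented views; the encoders, augmentations, and loss below are simplified placeholders, not the authors' released implementation (see the code repository link below).

import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(q, k, temperature=0.1):
    """InfoNCE between two batches of feature vectors; the i-th rows of q and k
    come from two differently augmented views of the same image (positives)."""
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / temperature                   # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # diagonal = positive pairs
    return F.cross_entropy(logits, labels)

# Placeholder encoders: identical architecture, independently initialized weights.
encoders = nn.ModuleList([
    nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128)) for _ in range(3)
])

# Placeholder "weak / moderate / strong" augmentations.
augmentations = [
    lambda x: x,
    lambda x: x + 0.1 * torch.randn_like(x),
    lambda x: x * (torch.rand_like(x) > 0.1).float(),
]

images = torch.randn(8, 3, 64, 64)  # a toy batch
features = [enc(aug(images)) for enc, aug in zip(encoders, augmentations)]

# Maximize agreement (a proxy for mutual information) across every pair of views.
loss = sum(info_nce(features[i], features[j])
           for i in range(3) for j in range(3) if i != j) / 6
loss.backward()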

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0103_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0103_supp.pdf

Link to the Code Repository

https://github.com/on1kou95/Mutual-Exemplar

Link to the Dataset(s)

https://www.synapse.org/Synapse:syn22398986/wiki/605520
https://datasets.simula.no/kvasir-instrument/
https://github.com/kamruleee51/ART-Net
https://www.kaggle.com/datasets/aithammadiabdellatif/binarysegmentation-endovis-17/code

BibTex

@InProceedings{Wen_Learning_MICCAI2024,
        author = { Weng, Weihao and Zhu, Xin},
        title = { { Learning Representations by Maximizing Mutual Information Across Views for Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper builds on top of prior work on contrastive co-training to address its “data waste” and “wrong exemplar” problems. The authors demonstrated that the proposed method improves upon baselines for multiclass segmentation tasks on surgical tool videos, especially when the number of labeled images is small.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Good Empirical Results: the proposed method improves performance on multiclass segmentation tasks on surgical tool videos compared to baselines, especially when there are not many labeled images for training.
    • Comprehensive Experiments: the authors performed a pretty comprehensive set of experiments on 5 different datasets, with different architectures, and with ablations.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limited Usability: The performance of the method is comparable to the baselines when the number of labeled images increases. In addition, the proposed method does not improve over the baselines on 3 other datasets mentioned in the Limitations.
    • Lacks Clarity: The authors should (1) be more precise when using certain terminologies, e.g., “co-training”, “contrastive co-training”, (2) improve the clarity of the introduction to state contributions more clearly, and (3) simplify the notation used in the methods. More details in the detailed comments.
    • Unverified Motivations: This paper is motivated by “data waste” and “wrong exemplar”, which are not clearly demonstrated to be problematic. Some concrete proof of these problems and how severe they are would be helpful.
    • Unclear Contributions: The paper does not make clear how the proposed method differs from directly relevant prior work (e.g., Min-Max Similarity) in addressing the “data waste” and “wrong exemplar” problems.
    • Unclear Connections to Co-training: This paper uses concepts (e.g., co-training) that are challenging to relate to a specific aspect of the method.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors do mention this work builds on top of code from a prior work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Major

    • Introduction: I hope the authors could be a bit more careful with the usage of “co-training”. The original co-training paper [1] assumes each data point comes with two different views (e.g., figure & captions), and the deep co-training paper [2] constructs the second view using adversarial examples. For example, the authors mention that “co-training divides the labeled dataset into two non-overlapping subsets …”; perhaps they should refer to “min-max similarity” [3] instead. Also, the authors should be clear about what “contrastive co-training” refers to: are you referring to [3]?
    • Introduction: related to the previous point, could the authors explain what aspect of the method is motivated by, similar to, or dissimilar from co-training [1]? The authors should avoid using this terminology if the motivation is inadequate, because it confuses the reader.
    • Method: What is “the classifier” after the first layer of the encoder, e.g., what is it trying to classify? What is the dimension of the feature map from the classifier that the authors denote $Y^f$ and $Y^p$? Please clarify this in the paper in more detail.
    • Method: the notation could be improved significantly; as written, it is hard to tell what the symbols mean. A few examples: (1) why use $q$ here, as in $q^{pi}$? Either just use $z$ to denote features, or tell the reader that $q$ means query and $k$ means key to align with the notation of the contrastive learning literature; (2) in $k_i^{P\neq i}$, if $i=1$, does $p\neq i$ refer to 2 or 3? (3) the augmentations are denoted w, m, s in Fig. 1 but are indexed numerically in the paragraph after Equation (2). Try to make these consistent.
    • Method: Equation (3) is not the contrastive loss. The contrastive loss is the cross-entropy loss of correctly classifying the positive pair from a set of pairs, so the denominator should at least include the term in the numerator (the standard form is written out after this list for reference). Please clarify this point or refrain from calling it a “contrastive” loss to avoid confusion.
    • This paper uses concepts (e.g., co-training) that are hard for me to tie to a specific aspect of the method. The authors may want to either explain this connection in more detail or refrain from using these buzzwords all at once. Additionally, I do not understand what “mutual exemplar” represents.
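
    For reference, the standard InfoNCE contrastive loss for a query $q$, its positive key $k^{+}$, and a set of negative keys $\{k_{j}^{-}\}$ with temperature $\tau$ is
    $\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp(q \cdot k^{+}/\tau)}{\exp(q \cdot k^{+}/\tau) + \sum_{j} \exp(q \cdot k_{j}^{-}/\tau)}$,
    i.e., the cross-entropy of identifying the positive pair among all candidate pairs, so the numerator term also appears in the denominator.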

    Minor

    • abstract: “laebled” -> “labeled”
    • the paper would benefit from a grammar check; there are quite a few misspelled words.
    • Introduction: From my understanding, deep co-training does not train multiple networks on different subsets of data, but instead creates a dataset of adversarial examples. Could the authors explain this a bit?
    • Introduction: the “our contribution” section currently reads more like an in-depth discussion comparing contrastive co-training and the proposed method. Please consider revising it to state, in concise language, the contribution of the paper, e.g., what is proposed, how much better it is empirically, etc.
    • Method: page 4 $q^{pi}$ -> $q^{p_i}$.
    • Method: Equation (3) numerator should be $q \cdot k$ instead of $q \cdot q$.

    [1] Combining Labeled and Unlabeled Data with Co-Training
    [2] Deep co-training for semi-supervised image recognition
    [3] Min-max similarity: A contrastive semi-supervised deep learning network for surgical tools segmentation

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the authors’ comprehensive experiments and favorable experimental results, the proposed method has limited usability (e.g., it helps only on some datasets and only when the number of labeled images is small). In addition, the paper lacks clarity, has unclear contributions, and rests on unverified motivations, all of which would benefit from further revision.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    I read the rebuttal and don’t feel the author sufficiently addressed my concerns. The weaknesses/questions mentioned in this review are largely unresolved.



Review #2

  • Please describe the contribution of the paper

    The paper identifies two significant issues in existing methodologies: “data waste” and “wrong exemplar.” To address these, the authors have designed a contrastive learning framework enhanced with pseudo-labeling that effectively mitigates these problems. They have also validated their approach on public datasets and a unique private dataset, demonstrating notable improvements in performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper’s strengths lie in several areas: Firstly, the authors have provided access to their implementation code (based on “Min_Max_Similarity”), which is a commendable practice for facilitating reproducibility. Secondly, the method proposed is straightforward yet effective, contributing to its practical applicability. Finally, the strong performance across various datasets underscores the robustness and potential of the proposed approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper has several areas that could be improved:
    • Writing quality: the manuscript needs substantial proofreading. Specific issues include typographical errors such as “laebled” (should be “labeled”), grammatical errors like “Ablation studies is” (should be “Ablation studies are”), “thebinary” (should be “the binary”), and “forF3” (should include a space to read “for F3”).
    • Figures: the figure legends and descriptions need revision for clarity; for instance, the figure should show that the three encoders were initialized differently, and it should include the unlabelled dataset, which is currently missing. There is also a potential error with “Max Similarity”, which might refer to “Min-Max Similarity”, and the variable “𝑋j” in the figure lacks a description. Lastly, there is a repeated passage in which “the second” is mentioned ambiguously and should be clarified.
    • Motivation vs. results: the experimental setup and the writing do not convincingly demonstrate how the methodology addresses the issues of “data waste” and “wrong exemplar”; this connection between theory and results needs strengthening. The logic behind the ablation study is also hard to follow.
    • Novelty: the method primarily builds upon existing techniques like “Max Similarity” and “Pseudo-CL”, which raises concerns about its novelty. This reliance should be addressed by clearly distinguishing the proposed enhancements from these prior works. Although the paper lists four “modifications” to state-of-the-art methods, it lacks a clear articulation of how these contribute uniquely to advancing the field; this section requires a more detailed explanation to highlight the impact and novelty of the proposed changes.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors are encouraged to undertake a thorough revision of the manuscript to address the numerous typographical and grammatical errors present. Enhancing the clarity of the experimental descriptions and ensuring that the modifications introduced are distinctly explained would significantly strengthen the submission. It would also be beneficial to more explicitly detail how the proposed method diverges from existing techniques and the specific benefits these differences confer.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the experiments conducted are thorough and the results are promising, the presentation of the paper currently falls below the standard expected for clarity and precision. Numerous typographical errors and some unclear descriptions in the methodology section need addressing. The potential of the proposed methods is evident, yet the paper would benefit greatly from a detailed revision to polish the language and further clarify the experimental setup and results. If these issues are addressed, the paper could present a good case for acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns. Given that the paper is well organised and open-sourced, I recommend acceptance.



Review #3

  • Please describe the contribution of the paper

    The manuscript “Mutual Exemplar: Contrastive Co-training for Surgical Tools Segmentation” presents an approach for surgical tool segmentation based on contrastive learning. The approach is tested on public and private datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Well-written and easy-to-follow manuscript
    • Interesting application, already quite investigated in the CAI community
    • Nice explanation of SSL history
    • Clear workflow image overview
    • Extensive experiments
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The degree of contribution of the manuscript over the literature is not clear. There are a number of studies outside the field of surgical tool segmentation that use SSL; how does the manuscript place itself in this rapidly evolving field? The description of the methodology is clear, but what is missing is how the authors tailor this methodology to the problem under investigation.
    The choice of the competitors is not well justified. Data waste and wrong exemplar seem to appear only at the very beginning and very end of the manuscript; these two concepts could be explored in more depth. The introduction could be summarized to leave more space for the method description.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Most of the datasets are publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See point 6. Mainly I would suggest to:

    • summarize Sec. 1 to convey only relevant information and better highlight the manuscript’s contribution.
    • improve the methodology description (give more details on how the method was tailored to surgical tool segmentation and how it differs from the state of the art, also outside surgical tool segmentation).
    • describe better how the competitors were chosen.
    • improve the discussion (do not just mention quantitative results, but try to highlight why the proposed method outperforms the competitors).
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Average methodological novelty, robust experimental analysis

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their valuable feedback. In the following, we answer the questions they raised.

  1. For the clarity of the reviewers and readers, we restructured the introduction section of the paper. The restructuring follows a funnel approach: first, we explain the problem statement through recent statistics from the World Health Organization; second, we describe the reference methods and explain their limitations; additionally, we highlight the technical contributions of our work in bullet points. The quantitative results clearly show that our proposed network performs better than the reference methods.

  2. We clarify the concepts of data waste and wrong exemplar below (a short illustrative sketch of the data-waste arithmetic is given after this list).

     Data Waste: Traditional co-training methods typically involve splitting the dataset into several subsets and training multiple networks concurrently (hence the term “co”). Each network utilizes only one subset of the data. Clearly, since each network trains on a different subset, the features extracted by them are distinct. The issue arises because, in pursuing different features, each network does not have access to the entire dataset. Typically, two networks are used, with each training on half of the labeled data, meaning each network effectively wastes half of the labeled data. We define this as Data Waste. In contrast, our proposed method uses data augmentation to obtain different features, allowing each network to utilize the entire set of labeled data, thereby eliminating the problem of data waste.

     Wrong Exemplar: Traditional co-training often trains two networks simultaneously. If one network makes a high-confidence but incorrect prediction, using contrastive learning to make the features produced by the other network more similar to those of the erroneous network can lead the training further away from the correct answer. Traditionally, using two networks is a necessary compromise because the labeled data is divided into subsets; if it were divided into too many subsets, each network would receive insufficient data. For example, if the total labeled data consists of 120 images, traditional co-training with two networks allows each to access 60 images; with three networks, each would only get 40 images. To ensure that each network receives enough data, only two networks are used, although when one network errs, the other is likely to err as well. Our approach employs more networks to mitigate the impact of individual network errors on overall training. Why don’t we use even more networks? Because we need substantially varied augmentations to ensure significant differences between the features learned by each network. For each dataset, we could employ more networks as we find better combinations of augmentations. However, for this paper, to ensure a fair comparison and minimize network adjustments for each dataset, we use a common set of augmentations.

  3. The experiments mentioned by R3 and R4 in the Supplementary Material are general medical image segmentation tasks, which do not include surgical instruments. It is important to note that surgical instrument segmentation and general medical image segmentation are similar yet distinct tasks, due to the vastly different features of surgical instruments and biological tissues. Our results demonstrate that contrastive learning, which focuses on identifying and distinguishing different features, achieves superior outcomes. Our method aims to further enhance contrastive learning through co-training. However, our results reveal that co-training does not hold a clear advantage over other types of semi-supervised learning methods. Nevertheless, by addressing the issues of “Data Waste” and “Wrong Exemplar,” our proposed method significantly improves the precision of surgical instrument segmentation. Given the critical importance of accurate surgical instrument segmentation in contexts like Robotic-Assisted Surgery, we argue that our proposed method has high potential clinical impact.
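
A minimal illustrative sketch of the data-waste arithmetic from point 2 (hypothetical numbers and function names only, not the actual training code):

# Illustrative only: hypothetical numbers and names, not the actual training code.
N_LABELED = 120  # total labeled images in the running example from point 2

def labeled_images_per_network_split(num_networks: int) -> int:
    """Traditional co-training: the labeled pool is split into disjoint subsets,
    so each network sees only a fraction of the labeled data ("data waste")."""
    return N_LABELED // num_networks

def labeled_images_per_network_augmented(num_networks: int) -> int:
    """Augmented-views setup: feature diversity comes from differently augmented
    views rather than from splitting, so every network sees the full labeled set,
    regardless of how many networks are trained."""
    return N_LABELED

print(labeled_images_per_network_split(2))      # 60 -> half the labels unused per network
print(labeled_images_per_network_split(3))      # 40 -> even fewer labels per network
print(labeled_images_per_network_augmented(3))  # 120 -> no data waste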




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The decision is split: two Weak Accept (WA) and one Reject (R). The reviewers highlighted the novelty and convincing experiments. Reviewer 3 expressed concerns about the lack of description of the key idea (co-training) and the paper quality. The rebuttal addressed some of the issues, but some remain. This is a borderline paper for acceptance. In the AC’s opinion, the merits slightly outweigh the concerns. If there is space, it could be accepted.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



