Abstract

Segmenting medical images is critical to facilitating both patient diagnoses and quantitative research. A major limiting factor is the lack of labeled data, as obtaining expert annotations for each new imaging dataset and task can be labor intensive and inconsistent across annotators. We present CUTS, an unsupervised deep learning framework for medical image segmentation. CUTS operates in two stages. First, for each image, it produces an embedding map via intra-image contrastive learning and local patch reconstruction. Second, these embeddings are partitioned at dynamic granularity levels that correspond to the data topology. CUTS yields a series of coarse-to-fine-grained segmentations that highlight features at various granularities. We applied CUTS to retinal fundus images and two types of brain MRI images to delineate structures and patterns at different scales. When evaluated against predefined anatomical masks, CUTS improved the Dice coefficient and Hausdorff distance by at least 10% compared to existing unsupervised methods. Finally, CUTS showed performance on par with Segment Anything Models (SAM, MedSAM, SAM-Med2D) pre-trained on gigantic labeled datasets.
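To make the second stage concrete, here is a minimal sketch of producing multi-granular segmentations from per-pixel embeddings. We use k-means at several cluster counts purely as an illustrative stand-in for diffusion condensation; the function and names below are our assumptions, not the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans

def multigranular_segmentations(embeddings, granularities=(2, 4, 8, 16)):
    # embeddings: (H, W, D) array of per-pixel embeddings from stage one.
    # Returns a dict mapping granularity -> (H, W) integer label map.
    h, w, d = embeddings.shape
    flat = embeddings.reshape(-1, d)
    segmentations = {}
    for k in granularities:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(flat)
        segmentations[k] = labels.reshape(h, w)
    return segmentations

# Usage with random stand-in embeddings:
emb = np.random.rand(64, 64, 32).astype(np.float32)
segs = multigranular_segmentations(emb)
print({k: int(np.unique(v).size) for k, v in segs.items()})

Coarser levels (small k) highlight large structures, while finer levels (large k) separate small patterns, mirroring the coarse-to-fine series described above.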

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1569_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1569_supp.pdf

Link to the Code Repository

https://github.com/KrishnaswamyLab/CUTS

Link to the Dataset(s)

https://github.com/KrishnaswamyLab/CUTS

BibTex

@InProceedings{Liu_CUTS_MICCAI2024,
        author = { Liu, Chen and Amodio, Matthew and Shen, Liangbo L. and Gao, Feng and Avesta, Arman and Aneja, Sanjay and Wang, Jay C. and Del Priore, Lucian V. and Krishnaswamy, Smita},
        title = { { CUTS: A Deep Learning and Topological Framework for Multigranular Unsupervised Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes CUTS, a novel unsupervised deep learning approach for medical image segmentation that combines contrastive learning and local patch reconstruction with topological data analysis. This approach attempts to address the limitations of supervised methods and offers a unique solution for multi-granular segmentation without requiring labeled data. Compared to other unsupervised methods, it shows better performance on binary segmentation tasks across three datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The motivation behind CUTS is clear, and the reasons for incorporating intra-image contrastive learning and local patch reconstruction are well explained.

    • In contrast to many contrastive learning methods that focus on inter-image comparisons, CUTS emphasizes intra-image feature learning by utilizing pixel-centered patches (a minimal sketch of this idea follows this list). This approach is particularly relevant for medical images.

    • The authors introduce diffusion condensation into CUTS, enabling the extraction of unsupervised structures at different levels of granularity. This feature may offer flexibility to accommodate various clinical needs.
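    To illustrate the intra-image contrastive idea highlighted above, below is a minimal, hypothetical sketch of an InfoNCE-style loss over pixel-centered patches: patches centered at nearby pixels of the same image act as positive pairs, and all other patches in the batch act as negatives. The names and the exact formulation are our assumptions, not the authors' code.

    import torch
    import torch.nn.functional as F

    def intra_image_contrastive_loss(anchor_emb, positive_emb, temperature=0.1):
        # anchor_emb, positive_emb: (N, D) embeddings of N patch pairs, where
        # each positive patch is centered at a pixel near its anchor's center.
        # Every non-matching anchor/positive combination serves as a negative.
        a = F.normalize(anchor_emb, dim=1)
        p = F.normalize(positive_emb, dim=1)
        logits = a @ p.t() / temperature       # (N, N) cosine similarities
        targets = torch.arange(a.size(0))      # matching pairs on the diagonal
        return F.cross_entropy(logits, targets)

    # Usage with random stand-in embeddings for 8 patch pairs:
    anchors = torch.randn(8, 128, requires_grad=True)
    positives = torch.randn(8, 128, requires_grad=True)
    loss = intra_image_contrastive_loss(anchors, positives)
    loss.backward()  # in CUTS this is combined with a patch reconstruction loss

    Because the pairs come from a single image, the loss encourages embeddings that separate regions within that image, which is what makes the subsequent per-image clustering meaningful.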

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • While CUTS has the potential to provide flexibility for different clinical needs, selecting and evaluating the appropriate level of granularity in real-world applications could be challenging, limiting its practicality.

    • Despite being proposed as an unsupervised method, the framework of CUTS involves several engineering steps, and the granularity levels vary across different domains and tasks.

    • Some essential descriptions and implementation details are missing, such as the specific criteria for selecting persistent structures, hyperparameters, epochs, training details of contrastive learning, etc.

    • Some experiments may not be the most suitable for evaluating the proposed method.

      1. A more appropriate evaluation and application scenario for this method would involve instance/semantic-level segmentation tasks, as in STEGO, rather than a straightforward binary segmentation task.

      2. Both the qualitative and quantitative comparisons involving diffusion-B are not very suitable. Since one of the biggest limitations and challenges of CUTS is the selection of the granularity level, reporting the diffusion-B results that best match the ground truths may not be practical in a real unsupervised setting where ground truth is unavailable: it becomes difficult to determine the “best” level for “unsupervised segmentation”. In such a scenario, CUTS combined with k-means and diffusion-P would be more automatic and end-to-end, providing a better representation of CUTS’ actual performance on the task (but that performance drops significantly).

      3. It is not reasonable to compare CUTS to basic SAM. Although SAM is trained on large-scale datasets, those data are natural images, so it should not be expected to achieve good results on medical image tasks. Instead, it would be more reasonable for the authors to compare against specialized medical SAM variants such as MedSAM or SAM-Med2D.

    • The figures are too small to read. The image width is much smaller than the page width and needs to be increased substantially to be displayed clearly.

    • Where does the abbreviation CUTS come from?

    • The Conclusion section is poorly organized.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The framework of CUTS involves multiple engineering steps, and the results vary depending on the chosen granularity level. However, some essential descriptions and implementation details are missing, including the criteria for selecting persistent structures, hyperparameters, epochs, and training details for contrastive learning.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please look at the weaknesses section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are deficiencies in the paper's writing, organization, experiments, and results that account for its weaknesses. As a result, the overall rating of the paper currently falls around the borderline and will still depend on the rebuttal.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Some of my concerns were addressed, and some were not. Nevertheless, the paper is now above the borderline. I hope the authors present a clearer paper in the revised version.



Review #2

  • Please describe the contribution of the paper

    This paper proposes an unsupervised deep learning framework for medical image segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This method can be adapted to a wide range of datasets.

    The method shows novelty and distinction compared to existing methods. It utilizes a contrastive loss over patches of nearby pixels within each image.

    The solution effectively addresses the challenge.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I like the method, but I have serious concerns about the motivation of this paper. Since SAM has already been introduced and demonstrates better performance, is it still necessary to design such an unsupervised method? SAM can also be regarded as a method that does not require user annotation; do we still need the unsupervised method introduced in this paper?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors could strengthen the motivation of this paper by discussing how their unsupervised method could be complementary to, or even integrated into, existing methods like SAM. Simply claiming it is unsupervised seems insufficient.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Combine Mamba with MIL, incorporating the design of SR-Mamba for further improvement.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    Response 2 addresses my concern about motivation. I think this kind of paper should be encouraged in the MICCAI community. We need more insight into segmentation tasks and the images themselves, and we need to process data efficiently with lightweight models, rather than just increasing the number of parameters/data for LLMs/SAMs. I would expect the authors to add their Response 2 to the main draft. I am also happy to see that the authors are willing to release their code.

    Still, I have some concern about the Retinal Atrophy dataset. Compared to the other two datasets, CUTS does not work as well as SAM there. Could you please explain this in the final version?



Review #3

  • Please describe the contribution of the paper

    The authors propose an unsupervised segmentation method that is trained using a contrastive loss between (intra-image) patches and a reconstruction loss. Using the trained model, embeddings are computed for every pixel. These embeddings are then assigned to n clusters and further grouped using either k-means or diffusion condensation. The latter produces a variety of cluster assignments at different levels of granularity.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • interesting method for a relevant problem (using contrastive learning, popular in unsupervised learning, but tailored for segmentation)
    • outperforms other similar approaches
    • evaluated on a range of datasets with different types of segmentation targets
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. although the results are good for an unsupervised method, the paper would be even more convincing if the authors could describe a use case. Given the somewhat low DSC (0.486) for important targets such as brain tumors, the method is unlikely to be reliable enough for diagnostic support. Also, within the same tumor dataset, the various clustering results do not seem able to separate white and grey matter, as the intensity difference is rather subtle (Fig. 2). As a result, it is a bit unclear what the model would be most useful for, since it can segment neither the tumor nor the gyri. Adding a motivating use case or a potential strategy for refining the results could help the reader understand the limitations and the ways in which the method could already be useful.
    2. need a label prior of some kind in order to extract nearest matching cluster for any region of interest. Is this another limitation on its applicability or are there strategies to circumvent this issue?
    3. a comparison with MedSAM (J. Ma et al. Segment anything in medical images. arXiv:2304.12306, 2023) would probably be more relevant than a comparison with SAM. The comparison with SAM only highlights that SAM is trained on a billion masks, but does not mention the domain shift. Also, would SAM's performance improve if it were given a matching process similar to the unsupervised methods (trying every foreground pixel) instead of only a single center point of the mask?
    4. the novelty of patch-based contrastive learning might be over-emphasized, considering that one of the original contrastive learning papers proposed a patch-based (intra-image) method (A. van den Oord et al. Representation Learning with Contrastive Predictive Coding. arXiv:1807.03748, 2018).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • could perhaps mention the limitations of the current method/performance to give readers a realistic perspective/objective evaluation
    • diffusion condensation could also be explained in a bit more detail
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good improvement over benchmarks, but the method still seems to miss many important structures (Fig. 2: top row, vessels; bottom row, gyri), limiting its usefulness.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal has not really responded to some of my concerns:

    1. What is the practical use case, given that the proposed method does not seem to reliably segment tumors or gyri?
    2. I am not convinced that it is practical to provide clinicians with a “multi-granular complete segmentation” that they have to sift through and select from. This could become almost as laborious as manual segmentation. I could be wrong, but I think needing a label prior is still a limitation of the method.

    But the merits of the paper still stand, so my review remains the same.




Author Feedback

Comment 1: It is not reasonable to compare to basic SAM. You should compare against medical versions of SAM, namely SAM-Med2D and MedSAM.

Response 1: Thank you for the valuable comment. We took the advice and ran both models on our tasks, obtaining the following Dice scores:

Dataset      SAM-Med2D   MedSAM   CUTS+diffusion-kmeans   CUTS+diffusion-B
Retina       0.548       0.079    0.675                   0.741
Ventricles   0.736       0.053    0.774                   0.810
Tumor        0.591       0.088    0.432                   0.486

Note: according to MedSAM’s authors, point prompting is an unstable feature, which explains its poor results here. CUTS+diffusion-B significantly outperforms both models on 2/3 datasets, and even CUTS+diffusion-kmeans performs better on 2/3 datasets. Other SAM prompting methods, such as bounding-box prompts, provide additional information and would lead to an unfair comparison. We have all results ready and can include them during the revision.
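For reference, the Dice scores above follow the standard definition, computable as in this sketch (a textbook formula, not code from the paper):

import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    # Dice = 2 * |A ∩ B| / (|A| + |B|) for binary masks pred and target.
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)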

Comment 2: If SAM models are demonstrating better performance, is it still necessary to design a method like CUTS?

Response 2: First, SAM models do not show better performance than CUTS. This alone is enough reason to design a method like CUTS. Moreover, a model like CUTS is lightweight, in that it does not require months of annotation and pretraining in large compute warehouses. In addition, models like SAM clearly require domain-specific fine-tuning for tasks not implicitly covered by the supervised pre-training stage, and CUTS could suggest objectives, modules, and methods for such pre-training or fine-tuning. More generally, we believe large foundation models will not automatically work for every task without such fine-tuning, so specialized approaches with the correct inductive biases will remain important.

Comment 3: You need a label prior of some kind in order to extract the nearest matching cluster for any region of interest. Is this another limitation on its applicability or are there strategies to circumvent this issue?

Response 3: We can simply provide the multi-granular complete segmentation (Fig. 2), after which a particular region of interest can be chosen by the clinician or user.
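For context, one hypothetical way to realize the matching step the reviewer asks about is to score every cluster in a chosen segmentation against a reference region and keep the best overlap; this mirrors how a “best-match” evaluation can be run, though the selection logic below is our assumption, not the authors' code.

import numpy as np

def best_matching_cluster(label_map, roi_mask, eps=1e-8):
    # label_map: (H, W) integer cluster assignments at one granularity.
    # roi_mask: (H, W) boolean mask of the region of interest.
    # Returns the binary mask of the cluster with the highest Dice overlap.
    best_mask, best_dice = None, -1.0
    for c in np.unique(label_map):
        mask = label_map == c
        dice = 2.0 * np.logical_and(mask, roi_mask).sum() / (
            mask.sum() + roi_mask.sum() + eps)
        if dice > best_dice:
            best_mask, best_dice = mask, dice
    return best_mask, best_dice

In a fully unsupervised deployment, the roi_mask would come from the clinician's selection rather than a ground-truth label, which is exactly the distinction the reviewers raise.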

Comment 4: You may want to consider instance/semantic-level segmentation tasks, such as STEGO, rather than a binary segmentation task.

Response 4: We agree that semantic-level segmentation could be an interesting application of CUTS. However, we were only able to access medical datasets with binary segmentation of particular targets. Datasets such as the Berkeley natural image dataset include semantic segmentation, and CUTS indeed performs well on it, but that may be less relevant to MICCAI.

Comment 5: What are the criteria for selecting persistent structures, hyperparameters, epochs, training details of contrastive learning, etc.?

Response 5: Persistence, as defined in diffusion condensation, measures clusters that stay separated over many iterations. We use this measure to select persistent structures, and no hyperparameters are involved in this step. For the selection of hyperparameters in CUTS training, we have previously conducted ablation/tuning experiments and can include them in the revision.
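As a toy illustration of the persistence criterion (our own simplified formulation, which tracks only the number of clusters per iteration, whereas diffusion condensation tracks the cluster assignments themselves):

from itertools import groupby

def most_persistent_levels(cluster_counts, top=3):
    # cluster_counts: number of clusters at each condensation iteration.
    # Returns the `top` cluster counts that persist over the most iterations.
    runs = [(k, len(list(g))) for k, g in groupby(cluster_counts)]
    runs.sort(key=lambda kv: kv[1], reverse=True)
    return [k for k, _ in runs[:top]]

# Example: counts shrink as clusters merge; 2, 16, and 4 persist longest.
counts = [64, 32, 16, 16, 16, 16, 8, 4, 4, 4, 2, 2, 2, 2, 2, 1]
print(most_persistent_levels(counts))  # -> [2, 16, 4]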

Comment 6: Will you open-source the code?

Response 6: We have well-documented code ready to be released, and we will include the link to the codebase once the paper is accepted.

Thanks again to the kind reviewers for the constructive feedback; we hope we have addressed your key questions and merit an increase in the rating. We will also address the minor suggestions, including enlarging the figures and rephrasing the conclusion.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Overall, the paper was well received by the reviewers; two reviewers in particular upgraded their scores post-rebuttal. The main weaknesses noted by all the reviewers, e.g. organization and clarity of the paper, and all the minor concerns not addressed in the rebuttal, can be addressed for the camera-ready version of the paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper proposes a novel unsupervised medical image segmentation method that combines contrastive learning and local patch reconstruction with topological data analysis. The reviewers list the interesting idea of intra-image feature learning and the comprehensive experimental results as strengths. The reviewers express concerns about the difficulty of applying CUTS in real applications and the missing detailed descriptions of the method. The rebuttal addressed these concerns, and the reviewers upgraded their reviews to WA, SA, and WA. The meta-reviewer recommends accepting this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



