Abstract

In this paper, we present PRISM, a Promptable and Robust Interactive Segmentation Model, aiming for precise segmentation of 3D medical images. PRISM accepts various visual inputs, including points, boxes, and scribbles as sparse prompts, as well as masks as dense prompts. Specifically, PRISM is designed with four principles to achieve robustness: (1) Iterative learning. The model produces segmentations by using visual prompts from previous iterations to achieve progressive improvement. (2) Confidence learning. PRISM employs multiple segmentation heads per input image, each generating a candidate mask and a confidence score, to optimize predictions. (3) Corrective learning. Following each segmentation iteration, PRISM employs a shallow corrective refinement network to reassign mislabeled voxels. (4) Hybrid design. PRISM integrates hybrid encoders to better capture both local and global information. Comprehensive validation of PRISM is conducted using four public datasets for tumor segmentation in the colon, pancreas, liver, and kidney, where anatomical variations and ambiguous boundaries make accurate tumor identification challenging. Compared to state-of-the-art methods, both with and without prompt engineering, PRISM significantly improves performance, achieving results close to human performance.
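
As a minimal sketch of how principles (1)–(3) interact at inference time, consider the following PyTorch-style pseudocode; all names (model.decode, model.refine, user.sample_prompts) are hypothetical illustrations rather than the authors' implementation:

import torch

def prism_step(model, image, sparse_prompts, prev_mask):
    # Confidence learning: several heads each emit a candidate mask and a
    # confidence score; the most confident candidate is kept.
    candidates, confidences = model.decode(image, sparse_prompts, prev_mask)
    mask = candidates[torch.argmax(confidences)]
    # Corrective learning: a shallow refinement network reassigns
    # mislabeled voxels in the selected candidate.
    return model.refine(image, mask)

def interactive_loop(model, image, user, n_iters=5):
    mask = torch.zeros_like(image)  # dense prompt starts empty
    for _ in range(n_iters):        # iterative learning
        # New sparse prompts (points/boxes/scribbles) target the current
        # false-positive and false-negative regions of the prediction.
        prompts = user.sample_prompts(mask)
        mask = prism_step(model, image, prompts, prev_mask=mask)
    return mask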

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0293_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0293_supp.pdf

Link to the Code Repository

https://github.com/MedICL-VU/PRISM

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Li_PRISM_MICCAI2024,
        author = { Li, Hao and Liu, Han and Hu, Dewei and Wang, Jiacheng and Oguz, Ipek},
        title = { { PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper develops a SAM-based interactive segmentation model for 3D medical images that performs effectively on colon, pancreas, liver, and kidney tumors. Compared to SAM, two improvements are made: removing the mask token and using the sparse tokens to obtain the segmentation masks, and accumulated mask refinement. The experimental results also exhibit the model’s superiority in 3D medical image segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - The model’s design, which integrates human-in-the-loop interaction with SAM, is straightforward and offers broad application prospects in practice.
    - The model creates masks directly from sparse tokens. These sparse prompts, especially when sampled in false-positive and false-negative areas, amplify the model’s correction capacity and enable the generation of more accurate masks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - The comparison may not be fair. The method described in this paper requires consistent sampling of sparse prompts from the ground truth throughout the iteration process, potentially causing label leakage. Moreover, no comparisons are made with other models’ results using bounding boxes. The authors should provide a fairer comparison, perhaps against other models that also sample sparse prompts from the ground truth at each iteration.
    - There is a lack of detail on the hybrid encoder. How are CNN features and ViT features combined? How are they utilized in the decoder?
    - The necessity of retaining the gradient of the dense prompt is unclear. Would the results change if only y.detach() were passed in the subsequent iteration?
    - The rationale for continuous maps and confidence scores sharing the same MLP is not sufficiently explained.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    see weaknesses

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the method introduced in this paper is simple and straightforward, it demonstrates impressive advantages over other methods in the field of 3D medical image segmentation. Therefore, I recommend a ‘weak accept’ rating.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors present an interactive segmentation model (PRISM) that can utilize clicks, scribbles, as well as bounding boxes as prompts to segment 4 types of tumors. The PRISM model implements: (1) iterative refinement to improve the segmentation with each new interaction; (2) multiple predictions with varying confidence to address the ambiguity of interactions; (3) a small second-stage refinement network that corrects the output of the first-stage model. The authors’ model outperforms 3 non-interactive and 4 interactive approaches and achieves >93% Dice on all 4 tumor segmentation tasks. The authors extensively ablate their model by varying the number and type of interactions as well as by excluding individual components of the model to justify why they have included them.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Strong motivation: Combining multiple interaction types (clicks, scribbles, bounding boxes) is a very relevant task in the interactive segmentation community as different interaction types complement each other, e.g., a bounding box is easy to draw but clicks and scribbles are more precise. There is only one work [1] in the medical field that combines all three types of interactions, but does so only in the 2D domain. The authors do a great job at motivating why this is important in the field.

    (2) Well-justified design decisions via ablation studies: The authors incorporate important concepts from previous interactive approaches into one unified framework: (1) multiple interaction types [1]; (2) iterative refinement [2], [3], [4]; and (3) multiple predictions [5], [6] to eliminate ambiguity. Although these concepts are well known in the interactive segmentation community, the authors combine them in a unified framework and demonstrate with ablation studies that each component is crucial to achieving the best results, justifying the rationale behind incorporating each component into the final framework.

    (3) Convincing results on four datasets: The authors achieve impressive results (>93% Dice) on 4 datasets targeting different types of tumors. The qualitative results in the supplementary material also illustrate that their approach has large potential to control the quality of the predicted segmentation by placing interactions in critical locations.

    (4) General approach: The authors’ approach is evaluated on 4 datasets, but it is quite general and could be applied to other tasks and imaging modalities. I can see how it could also generalize to 2D imaging modalities by applying the same concepts.

    [1] Lin, Zheng, et al. “Multi-mode interactive image segmentation.” Proceedings of the 30th ACM International Conference on Multimedia, 2022.

    [2] Zhuang, Mingrui, et al. “Efficient contour-based annotation by iterative deep learning for organ segmentation from volumetric medical images.” IJCARS 2023

    [3] Ma, Wenao, et al. “Rapid model transfer for medical image segmentation via iterative human-in-the-loop update: from labelled public to unlabelled clinical datasets for multi-organ segmentation in CT.” ISBI 2022

    [4] Bredell, Gustav, Christine Tanner, and Ender Konukoglu. “Iterative interaction training for segmentation editing networks.” MICCAI Workshops 2018.

    [5] Kirillov, Alexander, et al. “Segment anything.” ICCV 2023

    [6] Li, Zhuwen, Qifeng Chen, and Vladlen Koltun. “Interactive image segmentation with latent diversity.” CVPR 2018

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Incorrect claims: The authors claim in the introduction on page 2 that “few studies have explored a human-in-the-loop approach for medical interactive segmentation”. This is factually incorrect, as there are many works focusing on exactly this kind of iterative human-in-the-loop interactive segmentation in the medical domain [2], [3], [4], [5], [6], [7], [8], and [1] even defines a dedicated taxonomy branch for such methods with more than 30 included approaches. Stating that it is an “underexplored field” may make the authors’ contribution seem stronger, but it disregards the progress made in the field by numerous methods and is factually incorrect. I advise the authors to omit this claim from the introduction or to rephrase it.

    (2) Questionable choice of comparison methods: The authors compare to 4 other interactive approaches, all of which are based on the Segment Anything Model (SAM) [9], which is notorious for struggling with medical data without extensive fine-tuning. I am curious why the authors decided to compare only to SAM-based approaches instead of well-established interactive methods tailored specifically to iterative refinement, such as [2-8]. These methods would be more similar to the proposed PRISM model. I understand that I cannot request additional experiments during the rebuttal, but I would like the authors to justify why they chose only SAM-based approaches for their experiments.

    [1] Marinov, Zdravko, et al. “Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy.” arXiv 2023

    [2] Zhuang, Mingrui, et al. “Efficient contour-based annotation by iterative deep learning for organ segmentation from volumetric medical images.” IJCARS 2023

    [3] Ma, Wenao, et al. “Rapid model transfer for medical image segmentation via iterative human-in-the-loop update: from labelled public to unlabelled clinical datasets for multi-organ segmentation in CT.” ISBI 2022

    [4] Ma, Chaofan, et al. “Boundary-aware supervoxel-level iteratively refined interactive 3d image segmentation with multi-agent reinforcement learning.” IEEE Transactions on Medical Imaging 2020

    [5] Liao, Xuan, et al. “Iteratively-refined interactive 3D medical image segmentation with multi-agent reinforcement learning.” CVPR 2020

    [6] Bredell, Gustav, Christine Tanner, and Ender Konukoglu. “Iterative interaction training for segmentation editing networks.” Machine Learning in Medical Imaging: 9th International Workshop, MLMI 2018, Held in Conjunction with MICCAI Workshops 2018

    [7] Amrehn, Mario, et al. “Ui-net: Interactive artificial neural networks for iterative image segmentation based on a user model.” 2017 Eurographics Workshop on Visual Computing for Biology and Medicine, VCBM 2017

    [8] Marinov, Zdravko, Rainer Stiefelhagen, and Jens Kleesiek. “Guiding the guidance: A comparative analysis of user guidance signals for interactive segmentation of volumetric images.” MICCAI 2023

    [9] Kirillov, Alexander, et al. “Segment anything.” ICCV 2023

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The overall reproducibility of the paper is good. The authors have promised to provide the full source code and have used public datasets for their experiments. They have also described their methodology in great detail (apart from equations for producing Z_x and Z_v in Fig. 2).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Here are more detailed comments in order to improve the paper:

    (1) The authors should re-phrase the claim that there are few methods focusing on human-in-the-loop iterative approaches in the medical domain. This is quite a hot topic and there are a lot of community efforts in building up iterative medical segmentation models in open source projects such as MONAI Label (https://github.com/Project-MONAI/MONAILabel). I advise the authors to reconsider this claim as it is factually incorrect.

    (2) The authors should justify why they only compare to SAM-based approaches. This is important as SAM-based models require additional fine-tuning on medical data and there are many iterative approaches in the field that are closer to what the authors propose.

    (3) Z_x and Z_v are only illustrated in Fig. 2, but it is not clear how exactly they are computed. The figure on its own is not enough to understand how the final embeddings are produced. I advise the authors to add an equation that describes this (perhaps in the supplementary material due to space limitations).

    Minor comments that played no role in my review but would be good to fix:

    • Typo on page 2: from previous iteration -> from the previous iteration
    • Typo on page 2: goal of being effectiveness -> goal of being effective
    • Typo on page 2: user-friendly interaction -> user-friendly interactions
    • Typo on page 3: sparse prompt based on -> a sparse prompt based on
    • Typo on page 3: for next iteration -> for the next iteration
    • Typo on page 3: in human-in-loop manner -> in a human-in-the-loop manner
    • Typo on page 4: with uniform distribution -> with a uniform distribution
    • Typo on page 5: mean square error -> mean squared error
    • Typo on page 5: fed to sampler for next iteration -> fed to the sampler for the next iteration
    • Typo on page 5: 4e-5 (the 5 should be part of the superscript, e.g., $4e^{-5}$; same for 2e-6)
    • Typo on page 6: as sparse prompt -> as a sparse prompt
    • Typo in caption of Fig. 4: rapidly get corrected -> is rapidly corrected
    • Typo on page 7: advanced to -> improved by
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the authors make factually incorrect claims in the introduction and compare only to a limited scope of SAM-based approaches, their work presents a unification of multiple meaningful components seen in the interactive segmentation domain. The authors thoroughly ablate their framework by alternately omitting these components which further justifies that this unification is reasonable. Their results are convincing and their approach is general (not limited to a particular dataset or imaging modality). Hence, I would opt for a weak accept. However, I would like the authors to elaborate why they have chosen only SAM-based approaches for their comparisons and to omit all the false claims in the introduction section.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents an interactive segmentation model for medical imaging that accepts both sparse and dense prompts. The proposed method allows continuous improvement and utilises a corrective refinement network to correct mislabeled voxels.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea is novel in that it accepts various prompts, is trained with iterative and confidence learning, and employs a corrective refinement network to produce precise segmentations under challenging conditions.
    2. The experiments are quite solid, with promising improvements with respect to state-of-the-art segmentation methods on four 3D CT datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The definition/description of the loss function terms could be improved.
    2. It is not clear how the algorithm works in 3D space; Fig. 2 only shows the 2D case.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please describe the loss function more clearly, and explain how the algorithm works in the inference stage.
    2. Although the improvements are evident, the authors are strongly encouraged to conduct statistical tests (e.g., a paired t-test) to validate the statistical significance.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel idea and strong validation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    Based on SAM, an interactive deep learning model is developed and presented that offers expert users a broad range of interaction paradigms (scribbles, masks, points, boxes…), thus establishing a human-in-the-loop interactive segmentation model. The novelty of the paper lies especially in this kind of generic interaction. Besides, an ensemble of candidate segmentation masks is aggregated for improved quality of results and to allow for iterative feedback loops. Furthermore, the segmentation masks can be provided as visual input markers, implicitly creating a self-adapting system.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Very good writing style, with precise and detailed references to the state-of-the-art literature and its review. Impressive results when compared to other state-of-the-art approaches. Thanks to the generic user interaction paradigms, there is high potential for transfer into clinical routine.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The statement in the introduction motivating this novel approach, “Moreover, 2D models [5,31,22] are not considered efficient…”, can be a matter of discussion, as there are plenty of well-known strategies for propagating 2D markers in 3D volume processing.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The code is publicly available (will be in the non-anonymous paper version)

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The layout of the abstract is “too much” – plain text is preferable to underlining words and so on.

    Due to the lack of space in the paper, the equations are written directly within the lines, which reduces the readability and reading flow a bit. Check the format of “4e-5”: only the “-” is shown in a raised position, and not the digit “5” too.

    The statement “PRISM-plain only uses point prompts, while PRISM-ultra can handle other sparse visual prompts such as boxes and scribbles. All results are generated with seeds.” is a bit of a sucker punch. The paper claims to be totally generic w.r.t. visual input hints; nevertheless, this seems to be true only for the “ultra” version. Besides, only seeds (point markers and no scribbles?) are utilized for this paper – unless “seeds” refers to the model’s random generator seeds.

    In Fig. 3, if the y-axis is the “DICE score”, then the scalar range is invalid, as it should be [0, 1]; in your case it is the Dice percentage, as before! Fix the table captions! The S1, S2 links for the tables go towards nirvana and are not good labels for table numbers anyway. The same is true for figures, cf. “Fig. 4 and Fig. S.6”!

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel, sound approach. The paper itself only needs marginal corrections w.r.t. layout.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their positive reviews that led to an early accept decision. We would like to take the opportunity to address a few minor issues that were raised and clarify misunderstandings. The manuscript text will be revised accordingly, and all typos etc. raised by reviewers will also be fixed.

R1: Comparison fairness. This seems to be a misunderstanding: all the interactive segmentation models access the labels for prompt sampling during inference, so the comparisons are fair. Furthermore, for fairness regarding prompt types such as bounding boxes, we included PRISM-plain, which uses the same prompt settings as the compared methods.

R1: Hybrid encoder. We directly adopted the CATS network without modifications.
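
(For context, a minimal sketch of the general hybrid-encoder idea — CNN and transformer encoders run in parallel and their multi-scale features are fused before decoding — assuming additive fusion at matching scales. This is illustrative only, not the actual CATS implementation:)

import torch.nn as nn

class HybridEncoderSketch(nn.Module):
    """Illustrative only: one common way to combine CNN and transformer
    features in a hybrid encoder (additive fusion at matching scales).
    This is an assumption, not the actual CATS code."""

    def __init__(self, cnn_encoder, transformer_encoder):
        super().__init__()
        self.cnn = cnn_encoder          # yields multi-scale feature maps
        self.vit = transformer_encoder  # yields maps at the same scales

    def forward(self, x):
        cnn_feats = self.cnn(x)  # e.g., list of tensors, coarse to fine
        vit_feats = self.vit(x)  # shapes assumed to match cnn_feats
        # Element-wise addition fuses local (CNN) and global (transformer)
        # context; the decoder then consumes the fused skip connections.
        return [c + v for c, v in zip(cnn_feats, vit_feats)]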

R1: Retaining the gradient. This is a typo: we retain the gradient for the continuous maps, not the dense prompt. Indeed, y.detach() is passed in the subsequent iteration.
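
A minimal PyTorch sketch of this detail, with hypothetical names (model, sampler): the continuous map y keeps its gradient within the current step, while the copy carried forward as the next iteration's dense prompt is detached, so no gradient flows across iterations.

import torch
import torch.nn.functional as F

def train_iterations(model, image, gt, sampler, optimizer, n_iters=3):
    dense_prompt = torch.zeros_like(gt)  # initial (empty) dense prompt
    for _ in range(n_iters):
        # Sample sparse prompts from the error regions of the current mask.
        prompts = sampler(dense_prompt, gt)
        y = model(image, prompts, dense_prompt)  # continuous map, keeps grad
        loss = F.binary_cross_entropy_with_logits(y, gt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Per the clarification above: only a detached copy is passed on,
        # so gradients do not propagate across iterations.
        dense_prompt = torch.sigmoid(y).detach()
    return dense_prompt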

R1: MLP. This is another typo: we have two separate MLPs. We fixed this by adding superscripts.

R3: Propagation from 2D. For some 3D images with large slice thickness, propagation methods may fail, as they rely on prediction information from adjacent slices. Such methods may not be robust for tumor segmentation, where large variations and ambiguous edges are present.

R3: Prompt types. This seems to be a misunderstanding: our results do indeed include points, bounding boxes, and scribbles. PRISM-plain is a simplified ablation version that only uses point prompts (no bounding boxes or scribbles) for a fair comparison to other point-prompt-only methods. Indeed, “seeds” refers to the model’s random generator seeds.

R4: 2D vs 3D. Our network is built in 3D and takes 3D images as input, with the prompts also generated in a 3D manner. Figure 2 is merely a 2D illustration of the prompts used in our study.

R5: Incorrect claim. We apologize for the poorly worded claim: we were referring to the recent wave of SAM-based methods rather than generic interactive methods. As the reviewer notes, our method and experiments are entirely focused on SAM-style interactions (see also the next point). We will carefully clarify this in the main manuscript.

R5: Why SAM-based only. We focus on SAM due to its wide generalizability across various tasks and its prompt-efficient encoding.

R5: SAM notorious for medical images. We note that the compared SAM-based methods are in fact all designed for tumor segmentation (except SAM itself) and have publicly available code repositories. We chose these methods because they can be classified as model adaptations of SAM and medical image foundation models, allowing us to compare different aspects.

R5: Non-SAM comparisons. Tumor segmentation is a challenging task that lacks robust solutions. Our primary goal is to provide a feasible solution for clinical routines. We acknowledge that older methods could be compared against. However, further comparisons are not essential for demonstrating the effectiveness of our approach, as our proposed method has already achieved human-level performance, as evidenced in our submission. This underscores the main point of interactive segmentation rather than focusing on exhaustive comparisons.




Meta-Review

Meta-review not available, early accepted paper.


