Abstract

In the rapidly advancing field of medical image analysis, Interactive Medical Image Segmentation (IMIS) plays a crucial role in improving diagnostic precision. Within IMIS, the Segment Anything Model (SAM), trained on natural images, serves as a foundation model with zero-shot capabilities on medical images. Nevertheless, SAM is highly sensitive to variations in interaction forms within interactive sequences, introducing substantial uncertainty into the interactive segmentation process. Consequently, identifying optimal temporal prompt forms is essential for guiding clinicians in their use of SAM. Furthermore, determining the right moment to terminate an interaction requires a delicate balance between efficiency and effectiveness. To provide sequentially optimal prompt forms and the best stopping time, we introduce an \textbf{A}daptive \textbf{I}nteraction and \textbf{E}arly \textbf{S}topping mechanism, named \textbf{AIES}. This mechanism models the IMIS process as a Markov Decision Process (MDP) and employs a Deep Q-Network (DQN) with an adaptive penalty mechanism to optimize interaction forms and determine the optimal stopping point when applying SAM. Evaluated on three public datasets, AIES identified an efficient and effective prompt strategy that significantly reduced interaction costs while achieving better segmentation accuracy than the rule-based method.
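The reward structure sketched in the abstract — a per-step segmentation improvement traded off against a penalty on interaction length — can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names (`dice`, `step_reward`), the parameter `lam`, and the linear penalty schedule are all illustrative choices, not the paper's exact formulation.

```python
# Illustrative sketch of a Dice-gain reward with an adaptive per-step
# interaction penalty; names and the linear penalty schedule are assumptions.

def dice(pred, gt):
    """Dice coefficient between two binary masks (sequences of 0/1)."""
    inter = sum(p * g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2.0 * inter / total if total else 1.0

def step_reward(prev_pred, new_pred, gt, step, lam=0.05):
    """Reward = Dice gain from the latest prompt minus a penalty that grows
    with the step index, discouraging long interaction chains."""
    gain = dice(new_pred, gt) - dice(prev_pred, gt)
    penalty = lam * step  # "adaptive": later prompts cost more
    return gain - penalty
```

Under this toy schedule, a prompt that improves Dice by less than `lam * step` yields negative reward, so a Q-learning agent is pushed toward choosing the termination action early — the efficiency/effectiveness trade-off the abstract describes.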

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1442_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Hua_Optimizing_MICCAI2024,
        author = { Huang, Yifei and Shen, Chuyun and Li, Wenhao and Wang, Xiangfeng and Jin, Bo and Cai, Haibin},
        title = { { Optimizing Efficiency and Effectiveness in Sequential Prompt Strategy for SAM using Reinforcement Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces an Adaptive Interaction and Early Stopping (AIES) mechanism that employs reinforcement learning to optimize prompt strategies for the Segment Anything Model (SAM). This mechanism optimizes the sequence of prompts and determines the optimal point for interaction termination by leveraging a Deep Q-Network (DQN) with an adaptive penalty mechanism. The approach enhances segmentation accuracy while reducing interaction costs, demonstrating superior performance on three datasets (brain MRI, colonoscopy, and spleen CT scans) compared to traditional rule-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The introduction of an adaptive penalty mechanism for early termination is a well thought out feature that adjusts the cost of interaction based on its effectiveness.
    • The paper’s evaluation is robust, involving three different public 2D and 3D medical imaging datasets, which underscores the method’s generalizability and applicability.
    • The paper is well structured with clear explanations.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The contributions section states that one of the contributions is “Sequential prompt strategy optimization”. This work might be built upon the methods proposed by Shen et al. [1] Could you please clarify the distinctions between your approach and that referenced in [1]?
    • The related work section is light.
    • Shen et al. [1] already introduced a DRL formulation that uses DQN for prompt optimization of SAM. However, no experimental comparison was done with this work.
    • The paper could benefit from a more detailed exposition on the RL configurations employed. Specifically, the selection of the RL model parameters, the architecture of the DQN, and how these choices impact the performance of the system could be discussed.

    [1] Shen, C., Li, W., Zhang, Y., Wang, Y., Wang, X.: Temporally-extended prompts optimization for sam in interactive medical image segmentation. In: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 3550– 3557. IEEE (2023)

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Please provide the running time required by the proposed DRL approach for each of the different strategies.
    • The paper does not thoroughly discuss the computational resources and time required to implement the AIES system.
    • Please provide more implementation details as well as information about hardware used.
    • Please proofread and correct grammar/typos:
    • “For providing sequential optimal prompt” -> to provide
    • “The recent introduced” -> The recently introduced
    • “natural image segmentation due to the sensitivity” -> to its sensitivity
    • “SAM is sensitive to the prompts forms” -> prompt forms
    • “number of steps and penalty” -> the penalty
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors propose a DQN-based prompt optimization strategy to improve SAM’s zero-shot performance. The paper presents a robust experimental validation across multiple public 2D and 3D datasets which substantiates the effectiveness of the AIES mechanism. Most of the novelty relies on the adaptive penalty for early termination. However, the paper’s limitation lies in the missing comparative analysis with other DRL methods like the one introduced by Shen et al. [1]. Also, the related work is light and lacks detailed discussion on the differences between the proposed approach and Shen et al. [1]

    [1] Shen, C., Li, W., Zhang, Y., Wang, Y., Wang, X.: Temporally-extended prompts optimization for sam in interactive medical image segmentation. In: 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). pp. 3550– 3557. IEEE (2023)

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns in their rebuttal. I am updating my score to Weak Accept.



Review #2

  • Please describe the contribution of the paper

    The paper seeks to improve the performance of the interactive Segment Anything Model when used with medical images in terms of accuracy and speed of final segmentation. A Deep Q-Network is used to determine optimal prompt forms for segmentation. The approach is evaluated on three public datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The application of a Deep Q-Network is as expected. An interesting innovation is using a loss function that penalizes the use of a large number of steps by adjusting the reward function, encouraging convergence to more efficient prompt form choices. The experimental results are comprehensive.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Outside of penalizing long chains of prompt selections, the reinforcement learning approach used for the problem is typical.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Detailed comments:

    • The figures are quite small, and their text is hard to read, even when viewed under 300x magnification on a large monitor.
    • Acronyms such as IMIS and SAM are defined in the abstract but not in the body of the paper’s main text. They should be defined clearly in the main text of the paper.
    • Pages 4-5: In Equation (2), the reward is represented by $R_t$. In Equation (4) (and at the top of page 4), the reward is represented by $r_t$. Are these supposed to be the same?
    • Page 5: “… a constant penalty, $lambda$,…” => “… a constant penalty, $\lambda$,…”. $\lambda$ is what appears in the equation being discussed.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Outside of penalizing long chains of prompt selections, the reinforcement learning approach used for the problem is typical and predictable.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors’ rebuttal comments were satisfactory in addressing my comments.



Review #3

  • Please describe the contribution of the paper

    The authors proposed a novel strategy called Adaptive Interaction and Early Stopping (AIES) to help the SAM select the optimal prompt modes (box/point) and termination time. The authors adopted the DQN-based reinforcement learning method to model the sequential decision tasks, and designed a reward function in terms of accuracy and penalty, to achieve the balance between segmentation efficiency and effectiveness. The proposed AIES has been tested on three public datasets, validating its potential to assist SAM in saving interaction time and improving final performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The motivation of this paper is good, and the core idea is novel. SAM is not robust to the prompt sequences, thus, developing an automatic method to select the optimal actions is important for its clinical application. Using DQN with two rewards is reasonable, and the results show that the proposed method achieves better performance than the previous strategy in most cases. The organization and writing of this paper are good, and easy-to-follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Although the proposed AIES method outputs the prompt mode automatically, it still requires clinicians to manually draw a box or locate the center point of the error region to prompt SAM, which is also time-consuming. Some figures in the manuscript are confusing and need further explanation. The authors only used the Dice metric to evaluate the methods; however, distance-based metrics are also very important. For more comments, refer to Q10.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Fig. 1 (including its text) is too small, and it contains many lines (dotted, solid, different colors, etc.), making it difficult for readers to follow the pipeline. The authors should consider optimizing Fig. 1 to make it clearer.
    2. For the state space, the authors used three components: the image slice, the segmentation logits, and the previous prompt set. One concern is whether the segmentation logits should be included, since they may be totally wrong. Even if the user provides a correct prompt, SAM may respond with an incorrect segmentation map, which will hurt subsequent performance significantly.
    3. For the discrete action space, the authors provided three options: box, center of the error region, and termination. (1) During testing, the user must manually draw a tight box or select the center of the error region, is that right? If so, how can it be ensured that the user gives the correct prompt, especially for the center? It is often difficult for users to accurately identify the areas of maximum FP or FN; thus, they may prompt the model with wrong positive or negative points (e.g., the model requests FP centers, but the user clicks on an FN region). (2) Following (1), the prompting process contains two stages: (a) the RL agent outputs the action, and (b) the user takes the action. If the two stages are inconsistent, is the performance of the model affected? How can this potential inconsistency be avoided and quantified? (3) Breaking the “clicking center” action down into positive and negative clicks may allow the model to learn relevant information and guide the user to click on the error region more effectively. (4) In a real test, will the RL agent first output Action 2 (center) and then Action 1 (box)? If yes, this is inconsistent with human operation (humans usually draw a box first, then click). If not, the order of actions is relatively fixed (i.e., box-point-point-stop), and the only effect of the RL agent is to stop the interaction. Please discuss the necessity of using RL in this task.
    4. In Fig. 2, the authors visualize the relationships between penalty, average length, and segmentation results. (1) What are the green and blue scatter points, and how were they obtained? (2) How do the lines and points map to the y-axes (Dice or Penalty)? (3) AIES-Constant uses a constant penalty at each step, but in the figure the green points seem to show different penalty levels.
    5. In Fig. 3, why does the random strategy achieve a low misunderstanding proportion? The authors should discuss this phenomenon in depth to make the misunderstanding metric more credible.
    6. In Fig. 4, red boxes are shown in act1 to act5 in the last row. Why are they counterfactual interactions? Also, what is the additional red box in “5. act”?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    (1) Good motivation, and novel methods. (2) Satisfactory performance. (3) Some unclear descriptions, especially in the method part.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed all of my concerns. I believe this paper should be accepted.




Author Feedback

#1:

Q: The RL method is typical. A: This work focuses on modeling the optimization of interaction strategies. RL is employed not to introduce novel methods but as a powerful tool to enhance and optimize the decision-making process for SAM usage.

Responses to detailed comments: Figures have been enlarged. “R” is the reward function; “r” is the specific reward at a given moment. We have fixed the abbreviations/typos.

#3

Q: Cost at test time. A: Manual interaction is inevitable in interactive medical image segmentation. We aim to reduce it in future work.

Q: Lack of HD. A: Higher Dice scores generally correlate with lower HD values, as was the case in our experiments. We did not include HD95 results due to space limits but will provide code to reproduce the detailed results.

Q: Logits bring instability. A: The previous logits help determine the optimal action. Errors affect performance, but training with incorrect logits reduces their impact. Further exploration is needed.

Q: Correct prompts are hard. A: We introduced random perturbations (10 pixels) to enhance robustness. We use the ‘center point’ to reduce confusion between FN and FP.

Q: Inconsistent following. A: The random policy, akin to users with no preference, simulated this situation, and it performed worse than our method. To address this, we suggest using post-processing methods.

Q: Splitting ‘Center’. A: Worse results were found. This may be due to a larger action space complicating RL exploration.

Q: Doubts about the learned strategy. A: Using “center” before “box” yields better results. Deep learning allows RL agents to adapt quickly in new scenarios, outperforming manual exploration.

Q: Figure problems. A: Fig. 1 has been redrawn for clarity. We made an error in Fig. 2; we have split it into two charts: Average Length vs. Penalty and Average Length vs. Dice. A grid search over penalty hyperparameters yields the various penalty levels. Experimental results show the random policy performs poorly, indicating it is a conservative strategy that leads to fewer misunderstandings. The red rectangle indicates the actual interaction; the counterfactual results stem from not following the algorithm and using the center instead. Future revisions will use bold formatting for maximum Dice values.

#4

Q: About the first contribution and distinctions.
A: [1] focuses on optimizing interactive medical image segmentation interactions but ignores that fixed-length interactions may lead to suboptimal solutions. Our work enhances prompt strategies by considering adaptive interaction rounds and early termination strategies with SAM, optimizing segmentation and reducing computation while maintaining accuracy.

Q: The related work is light. A: The added related work on [1] is shown below; other additions appear in the revised version. To address SAM’s sensitivity to interaction forms, a framework was developed to adaptively offer suitable prompts to users [1]. This method optimizes temporally extended prompts but is restricted to fixed-length interactions. Our work improves prompt strategies by considering interaction forms and adaptive termination, introducing early termination to optimize SAM segmentation for efficiency and accuracy while reducing computational effort.

Q: Comparison with [1]. A: Our AIES is a more generalized formulation: when λ is 0 and the penalty is removed, it is equivalent to [1]. AIES(6) in Table 1 reproduces the TEPO algorithm. For other step counts, since our interaction steps are averaged and not integers, a rigorous comparison with TEPO is not feasible, hence it was not included in the table.

Q: Detailed configurations. A: We will release our code soon. We found that the number of sampled environments, the number of training epochs, and the buffer size are the key factors.

Responses to detailed comments: Experiments were conducted on four NVIDIA A100 40GB GPUs; training took 8 to 9 hours. We have fixed the grammar/typos.

Thank you for your review and suggestions. We have addressed all concerns and improved the manuscript. We appreciate your feedback!




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All the reviewers agree to accept the paper, and I also believe that this is a good article.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    All the reviewers agree to accept the paper, and I also believe that this is a good article.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


