Abstract

Medical Image Grounding (MIG), which involves localizing specific regions in medical images based on textual descriptions, requires models to not only perceive regions but also deduce spatial relationships of these regions. Existing Vision-Language Models (VLMs) for MIG often rely on Supervised Fine-Tuning (SFT) with large amounts of Chain-of-Thought (CoT) reasoning annotations, which are expensive and time-consuming to acquire. Recently, DeepSeek-R1 demonstrated that Large Language Models (LLMs) can acquire reasoning abilities through Group Relative Policy Optimization (GRPO) without requiring CoT annotations. In this paper, we adapt the GRPO reinforcement learning framework to VLMs for Medical Image Grounding. We propose the Spatial-Semantic Rewarded Group Relative Policy Optimization to train the model without CoT reasoning annotations. Specifically, we introduce Spatial-Semantic Rewards, which combine spatial accuracy reward and semantic consistency reward to provide nuanced feedback for both spatially positive and negative completions. Additionally, we propose to use the Chain-of-Box template, which integrates visual information of referring bounding boxes into the reasoning process, enabling the model to explicitly reason about spatial regions during intermediate steps. Experiments on three datasets (MS-CXR, ChestX-ray8, and M3D-RefSeg) demonstrate that our method achieves state-of-the-art performance in Medical Image Grounding. Ablation studies further validate the effectiveness of each component in our approach. Code, checkpoints, and datasets are available at https://github.com/bio-mlhui/MedGround-R1
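To make the "nuanced feedback for spatially negative completions" concrete, the following is a minimal sketch of the group-relative advantage that GRPO computes: each completion's reward is normalized against the mean and standard deviation of its own sampling group. The function name, the reward values, and the assumption that the spatial reward alone is a hit-or-miss signal are illustrative and not taken from the paper; the choice of population standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each completion's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions sampled for one image/query pair.
# Assumed hit-or-miss spatial reward: only the first completion is correct.
spatial_only = [1.0, 0.0, 0.0, 0.0]
# Illustrative values after adding a graded semantic term to the misses.
with_semantic = [1.0, 0.6, 0.3, 0.0]

adv_spatial = group_relative_advantages(spatial_only)
adv_combined = group_relative_advantages(with_semantic)

print(adv_spatial)   # the three misses share one identical advantage
print(adv_combined)  # the misses are now ranked against each other
```

Under these assumptions, a purely binary reward cannot tell a near miss from a completely wrong box, whereas a graded second term gives the misses distinct advantages.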

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1222_paper.pdf

SharedIt Link: https://rdcu.be/eHwTJ

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04971-1_37

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/bio-mlhui/MedGround-R1

Link to the Dataset(s)

https://github.com/bio-mlhui/MedGround-R1

BibTex

@InProceedings{XuHui_MedGroundR1_MICCAI2025,
        author = { Xu, Huihui AND Nie, Yuanpeng AND Wang, Hualiang AND Chen, Ying AND Li, Wei AND Ning, Junzhi AND Liu, Lihao AND Wang, Hongqiu AND Zhu, Lei AND Liu, Jiyao AND Li, Xiaomeng AND He, Junjun},
        title = { { MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group Relative Policy Optimization } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {391--401}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper presents a reinforcement learning framework for Medical Image Grounding (MIG), adapting the Group Relative Policy Optimization (GRPO) technique to vision-language models (VLMs) to remove the reliance on expensive Chain-of-Thought (CoT) annotations. The authors introduce Spatial-Semantic Rewards, which jointly consider spatial accuracy and semantic consistency. Moreover, the authors propose the Chain-of-Box prompt, a novel reasoning structure that integrates bounding box visual features directly into the reasoning process, improving spatial reasoning capabilities. Experimental results demonstrate state-of-the-art performance on three public datasets (MS-CXR, ChestX-ray8, M3D-RefSeg).
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The proposed method replaces supervised fine-tuning with a reinforcement learning-based approach, reducing dependency on large annotated CoT datasets.
- This paper targets spatial reasoning and semantic understanding, which is important in clinical applications.
- The paper is well written and well structured.
- The proposed method (Spatial-Semantic Rewards and Chain-of-Box design) is well-aligned with the core challenges in MIG.
- Empirical results across three datasets show quantitative and qualitative improvements over existing methods.
- Ablation studies confirm the individual impact of the proposed components, enhancing the credibility of the approach.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Adapting and stabilizing GRPO in the VLM may introduce training instability, which may limit reproducibility. The submission does not provide sufficient information for reproducibility but the authors claimed to release the source code and/or dataset upon acceptance of the submission.
- The Chain-of-Box prompt assumes the availability of bounding box information, which may not always be feasible in real-world applications or under minimal supervision.
- The authors simply adopt the GRPO technique for medical applications. More discussion of the challenges of applying it to the MIG task would be beneficial.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents a novel and timely contribution to the field of MIG by introducing an RL-based training framework that eliminates the need for expensive CoT reasoning annotations. The proposed method leverages GRPO, previously applied in language domains, and adapts it to VLMs for MIG. The Spatial-Semantic Reward function and the Chain-of-Box reasoning prompt enhance the model’s ability to reason about both spatial and semantic aspects. The method is well-aligned with the demands of clinical image interpretation, particularly where accurate spatial grounding is critical and annotated data is scarce. Empirical results across three publicly available medical datasets demonstrate state-of-the-art performance, with ablation studies validating the contribution of each proposed component. Overall, the proposed method looks novel and shows significant contributions to the medical domain. Therefore, I would recommend a weak accept.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper presents a reinforcement learning framework for Medical Image Grounding (MIG) using GRPO. The paper adapts GRPO to VLMs for MIG without CoT annotations. This paper introduces Spatial-Semantic Rewards to provide nuanced relative advantage for spatially-negative completions and the Chain-of-Box template to explicitly integrate visual information into the intermediate reasoning steps. Experiments on three datasets show that their method achieves SOTA performance.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Novel Training Method: Applying GRPO to VLM enables MIG with reasoning capabilities without relying on CoT annotations. This training approach breaks the traditional dependence on costly CoT annotations.
2. MIG-Task-Specific Rules: In GRPO, Spatial-Semantic Rewards and the Chain-of-Box template, which are customized for the MIG task, are introduced. Spatial-Semantic Rewards provide a more nuanced understanding of the relative advantage for spatially negative completions. The Chain-of-Box template explicitly integrates visual information into the intermediate reasoning steps, allowing the model to better utilize the visual data associated with medical images.
3. Strong Performance: This paper conducts comprehensive experiments on three datasets and achieves performance on these datasets that far exceeds the baseline model.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Reward Design Complexity: The method employs three distinct rewards, Format, Spatial, and Semantic, to guide optimization. While each has a clear purpose, the multi-reward structure introduces hyperparameter tuning complexity. However, the paper lacks an in-depth analysis of how to balance these rewards, which may make it difficult for other researchers to replicate and optimize the method easily.
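As a rough illustration of the balancing question raised here, the sketch below combines a format reward, a spatial reward, and a semantic reward as a weighted sum. Everything beyond the standard box IoU is an assumption for illustration: the `<think>`/`<answer>` tag convention, the 0.5 IoU threshold, the weights `w_format`, `w_spatial`, `w_semantic`, and the externally supplied `semantic_score` are hypothetical and are not claimed to match the authors' implementation.

```python
import re

# Assumed output convention: reasoning in <think>, final box in <answer>.
FORMAT_RE = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def format_reward(completion):
    """1.0 if the whole completion follows the assumed tag layout."""
    return 1.0 if FORMAT_RE.fullmatch(completion.strip()) else 0.0


def total_reward(completion, pred_box, gt_box, semantic_score,
                 w_format=1.0, w_spatial=1.0, w_semantic=1.0,
                 iou_threshold=0.5):
    """Weighted sum of the three terms; weights are hypothetical knobs."""
    spatial = 1.0 if iou(pred_box, gt_box) >= iou_threshold else 0.0
    return (w_format * format_reward(completion)
            + w_spatial * spatial
            + w_semantic * semantic_score)


completion = "<think>reasoning</think><answer>[0, 0, 10, 10]</answer>"
print(total_reward(completion, (0, 0, 10, 10), (0, 0, 10, 10), 0.5))
```

With a layout like this, each weight is a separate hyperparameter, which is the tuning burden the reviewer points to; an ablation would vary one weight while holding the others fixed.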
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents a significant and well-executed contribution to Medical Image Grounding by adapting GRPO to vision-language models without relying on Chain-of-Thought annotations. The introduction of Spatial-Semantic Rewards and the Chain-of-Box template addresses key challenges in spatial reasoning and visual integration. Despite some complexity in reward tuning, the method shows strong empirical performance across three datasets. Given its clinical relevance, methodological novelty, and practical impact, this paper is a valuable addition to the MICCAI community.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The authors adapted Group Relative Policy Optimization (GRPO) reinforcement learning framework from DeepSeek-R1 to Vision-Language Models (VLMs) for Medical Image Grounding (MIG). They proposed the Spatial-Semantic Rewarded Group Relative Policy Optimization to train the model without Chain-of-Thought (CoT) reasoning annotations. Experiments on three datasets demonstrated that the proposed method could achieve state-of-the-art performance in MIG.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

It is a novel application to apply GRPO from the recently released DeepSeek-R1 to MIG. The proposed Spatial-Semantic Rewarded GRPO can train the VLMs more efficiently without large amounts of CoT reasoning annotations. Experiments on three datasets support its feasibility and superiority.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The core method, GRPO, was introduced by DeepSeek-R1, so the novelty of this paper is somewhat reduced.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

It seems the authors did not report on the computational efficiency (i.e., model training time) of their method compared to other approaches.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(6) Strong Accept — must be accepted due to excellence
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This study addresses a hot topic in AI and medical image computing. It is a novel application to adapt GRPO reinforcement learning to medical image grounding. I think it will attract substantial attention from the audience.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We would like to express our sincere gratitude to the ACs, Reviewers, and PCs for their valuable time and thoughtful feedback on our paper.

R1:

Novelty: While GRPO was initially introduced in DeepSeek-Math, our contribution lies in enhancing its reward function specifically for Medical Image Grounding (MIG). We propose the Spatial-Semantic Rewarded GRPO, which better supports medical reasoning by focusing on both spatial and semantic aspects of the ROI prediction.

R2:

Reproducibility: We are currently preparing our code and data, and further details will be made available in our repo.

Box Supervision: The chain-of-box prompt is a prompting strategy that does not require intermediate box supervision during training. Its purpose is to encourage the model to output useful box coordinates during thinking.
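A minimal sketch of what such a completion could look like, and how intermediate boxes could be read out of it, is given below. The tag names, the `[x1, y1, x2, y2]` coordinate format, and the example text are assumptions for illustration only; the actual Chain-of-Box template may differ. Note that the intermediate boxes are only parsed here, not compared to any label, which is consistent with the statement that no intermediate box supervision is needed.

```python
import re

# Assumed coordinate format: four integers in square brackets.
BOX_RE = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")


def extract_boxes(text):
    """Return every [x1, y1, x2, y2] box found in the text, in order."""
    return [tuple(int(v) for v in m.groups()) for m in BOX_RE.finditer(text)]


def split_completion(completion):
    """Separate boxes mentioned while thinking from the final answer box."""
    think = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    think_boxes = extract_boxes(think.group(1)) if think else []
    answer_boxes = extract_boxes(answer.group(1)) if answer else []
    return think_boxes, (answer_boxes[0] if answer_boxes else None)


# Hypothetical model output following the assumed template.
example = (
    "<think>The left lung is around [300, 80, 520, 460]. "
    "The opacity sits in its lower zone, near [340, 330, 480, 450].</think>"
    "<answer>[340, 330, 480, 450]</answer>"
)

think_boxes, final_box = split_completion(example)
print(think_boxes)
print(final_box)
```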

We believe the main challenge for applying GRPO to MIG is using more powerful medical VLMs. While we fine-tuned Qwen2.5-VL in the general domain, we believe a medical-specific VLM will achieve significantly better performance.

R3:

Reward Design: Our primary contribution is the semantic reward. Although we have included an ablation study on different reward strategies, we plan to provide a more in-depth analysis in the preprint version.

Thank you once again for your support and assistance.

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A

MedGround-R1: Advancing Medical Image Grounding via Spatial-Semantic Rewarded Group Relative Policy Optimization