Abstract

Achieving equity in healthcare accessibility requires lightweight yet high-performance solutions for medical image segmentation, particularly in resource-limited settings. Existing methods like U-Net and its variants often suffer from limited global Effective Receptive Fields (ERFs), hindering their ability to capture long-range dependencies. To address this, we propose U-RWKV, a novel framework leveraging the Recurrent Weighted Key-Value (RWKV) architecture, which achieves efficient long-range modeling at O(N) computational cost. The framework introduces two key innovations: the Direction-Adaptive RWKV Module (DARM) and the Stage-Adaptive Squeeze-and-Excitation Module (SASE). DARM employs Dual-RWKV and QuadScan mechanisms to aggregate contextual cues across images, mitigating directional bias while preserving global context and maintaining high computational efficiency. SASE dynamically adapts its architecture to different feature extraction stages, balancing high-resolution detail preservation and semantic relationship capture. Experiments demonstrate that U-RWKV achieves state-of-the-art segmentation performance with high computational efficiency, offering a practical solution for democratizing advanced medical imaging technologies in resource-constrained environments. The code is available at https://github.com/hbyecoding/U-RWKV.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2891_paper.pdf

SharedIt Link: https://rdcu.be/eHw7y

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05141-7_59

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/hbyecoding/U-RWKV

Link to the Dataset(s)

N/A

BibTex

@InProceedings{YeHon_URWKV_MICCAI2025,
        author = { Ye, Hongbo AND Tang, Fenghe AND Zhao, Peiang AND Huang, Zhen AND Zhao, Dexin AND Bian, Minghao AND Zhou, S. Kevin},
        title = { { U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        page = {613 -- 623}
}

Reviews

Review #1

Please describe the contribution of the paper
The paper leverages the Recurrent Weighted Key-Value (RWKV) architecture [14] and proposes U-RWKV with two components:
- Direction-Adaptive RWKV Module (DARM), which includes Dual-RWKV and QuadScan
- Stage-Adaptive SE Module (SASE)
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper leverages the Recurrent Weighted Key-Value (RWKV) architecture [14] and proposes U-RWKV, which is lightweight and achieves better average Dice scores in the experiments.
2. By applying the QuadScan mechanism, the Time Mixing of RWKV becomes a Spatial Mixing module.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The meaning of Effective Receptive Fields (ERFs) is vague and not supported with references.
2. Figure 1(a) does not clarify how the experiment was conducted. It is unclear whether the IoU results are based on the experiment settings from Table 1.
3. Equation 1 is referenced from Bi-WKV [7] but is not fully explained. For example, the meanings of k_i and v_i are not provided.
4. In Equation 2, the terms “cha” and “spa” are introduced but not explained. These annotations are reused in Algorithm 1 and Equation 4 without clarification, whereas Equation 3 does explain “Stack” and “Average.”
5. Based on Section 2.3, the Stage-Adaptive SE Module (SASE) appears to be a variant of a CNN module, but no further explanation is provided. Figure 2 also does not explain the meaning of “SERation.”
6. In Equation 4 of the Dual-RWKV mechanism, the phrase “input feature map is spatially transposed” is unclear without proper explanation, and the symbol X^← is not defined. If it means swapping (H, W) to (W, H), then it raises the question of whether QuadScan \theta_4 for (W, H) is the same as \theta_1 for (H, W).
7. The paper mentions the Effective Receptive Fields (ERFs) problem but does not provide further explanation. Figure 1(b) lacks clarity on how the visualization was performed. Moreover, U-Mamba and U-ViT are mentioned in Figure 1(b) but are not included in Table 1.
8. Figure 3 is not referenced or discussed in the paper.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
1. The paper contains multiple typos:
  - “in 1(a)” should be “in Fig. 1(a)” for consistency; the same applies to “as shown in 1(b)”
  - Typos such as “atch”, “procss”, and “torch cancat” should be corrected
  - “in Fig. 2(b)” should have the number “2” in red to match the formatting used elsewhere
2. It is unclear why only Table 3 uses IoU while the other tables use Dice. This inconsistency in evaluation metrics should be addressed or justified.
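For readers weighing the Dice/IoU inconsistency raised above, the following minimal sketch shows how the two metrics are computed on a pair of binary masks and how they relate for a single mask pair (Dice = 2·IoU / (1 + IoU)). The function names and the flat 0/1 list representation are illustrative assumptions, not taken from the paper or its code.

```python
def dice_and_iou(pred, target):
    """Return (dice, iou) for two equally sized binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(p & t for p, t in zip(pred, target))
    pred_sum, target_sum = sum(pred), sum(target)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks are empty; scoring this as a perfect match is a convention, not a rule.
        return 1.0, 1.0
    return 2 * inter / (pred_sum + target_sum), inter / union


def iou_to_dice(iou):
    """Convert a single-pair IoU value to the equivalent Dice value."""
    return 2 * iou / (1 + iou)


def dice_to_iou(dice):
    """Convert a single-pair Dice value to the equivalent IoU value."""
    return dice / (2 - dice)


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou, iou_to_dice(iou))
```

The conversion is exact only per mask pair. Dataset-level averages of Dice and IoU are generally not interconvertible, which is one reason tables that mix the two metrics are hard to compare.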
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Based on the mentioned weaknesses above, the paper is still not of high quality.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Thank you to the authors for taking the time to address all my questions and revise the paper.

The authors responded to my question regarding the omission of U-ViT and U-Mamba results with the following explanation: “While the original manuscript omitted U-ViT/U-Mamba Dice comparisons due to space constraints, our updated ablation now includes: U-RWKV achieves an 86.88% average Dice on BUSI, Kvasir, and ISIC’17 datasets, outperforming both U-ViT (86.21%) and U-Mamba (85.34%).” However, I find this explanation somewhat vague. The models are briefly mentioned in Figure 1 but omitted from Table 1, and “space constraints” seems a weak justification—especially given the marginal improvement (86.88% vs. 86.21%).

That said, this appears to be the first paper among those I have reviewed to explore RWKV in medical imaging, which is a valuable contribution.

I hope the authors consider open-sourcing their code to enable the community to further explore and validate their method.

Review #2

Please describe the contribution of the paper

The authors propose a new framework based on the Recurrent Weighted Key-Value strategy. It introduces two main components: the Direction-Adaptive RWKV Module, which aims to reduce directional bias while keeping global context, and the Stage-Adaptive Squeeze-and-Excitation Module, which aims to adapt to different feature stages to balance the network's capacity for capturing detail and semantics.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

A key strength of this work is the development of a parameter-efficient strategy that maintains strong performance, which is a good contribution toward making AI solutions more accessible in clinical practice. Additionally, the method's effectiveness is demonstrated across multiple datasets and medical tasks, highlighting its versatility and potential for adoption in several tasks.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

One weakness of this work lies in the lack of clarity and detail regarding the training process. Information such as hyperparameter selection, dataset division, and network initialisation is missing, which may hinder the reproducibility of the results obtained. Additionally, the inconsistency in evaluation metrics, using Dice for state-of-the-art comparisons and IoU for ablation studies, makes it difficult to interpret the impact of the proposed modules. Finally, the paper does not discuss the limitations of the method or outline directions for future work, which are important for understanding its scope and potential improvements.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed work introduces an efficient architecture that reduces model complexity while maintaining strong performance, which is highly relevant for enabling broader clinical adoption of AI. The demonstrated improvements across multiple medical tasks further support its potential impact. However, the paper lacks essential training details such as hyperparameter choices, dataset splits, and initialization strategies, making it difficult to reproduce the results. The inconsistent use of evaluation metrics also hinders a clear assessment of the contributions.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This work proposes a lightweight framework, U-RWKV, for medical image segmentation, which can obtain a global receptive field at very low computational cost.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The research focus is highly relevant to practical applications. The authors try to achieve both efficiency and effectiveness (a global receptive field) simultaneously. This is the first time I have seen RWKV-related work at MICCAI. It is new, and I think it would attract the interest of many attendees.
2. The comparison is comprehensive including some widely adopted CNNs and Transformers, and evaluated by both effectiveness and computational cost. The datasets are also diverse.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The key concern is that the technical contribution of this work is somewhat limited. Basically, the authors adapted RWKV to 2D images rather than 1D sequences and designed an SE module for it. Frankly, the SE module seems very incremental, while their strategy for transforming 2D images into 1D sequences was previously developed for Mamba, i.e., arranging the image patches in a zig-zag sequence from different directions. One underlying question is whether the order between image patches is reasonable.
2. This raises another problem: the work did not compare with Mamba quantitatively, although it compared with Mamba in terms of receptive field in Figure 1(b).
3. A minor problem is that it might be easier to follow if they could reorganize the method section. It seems more fluent to me if they do not put SASE between the preliminary of RWKV and the Direction-Adaptive RWKV module.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The technical contribution is fine, but it would be exciting to see RWKV-related work at MICCAI.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Thanks to the authors for their response. I recommend accepting this work for the novelty of RWKV and for the clear justification of the implementation details and the differences compared to Mamba.

Author Feedback

We sincerely thank all reviewers for their insightful feedback and recognition of our methodology’s novelty (R3), practical utility (R1), parameter efficiency (R1, R2), and technical contributions. Notably, we appreciate Reviewer R3’s enthusiasm for exploring RWKV in this research community.

Before addressing specific comments, we clarify our core contributions, particularly in response to concerns about SASE (R2, R3) and the RWKV transformation (R3). Contrary to R2 and R3’s impression, SASE is not a CNN variant but a stage-adaptive feature adapter designed to complement RWKV. Its architecture enhances hierarchical feature delivery to DARM: shallow layers use inverted bottlenecks (SERatio=4) for spatial details, while deeper layers employ channel-split depthwise blocks (Equiv. SERatio=1/4) for semantic abstraction. This resolution-aware design improves feature informativeness for DARM. Ablation studies show that replacing SASE with a conventional Conv-BN-ReLU-Conv block degrades performance (-2.3% Avg. Dice). We also clarify that DARM’s innovation lies precisely in its explicit bidirectional spatial modeling capability, which fundamentally differs from Mamba’s unidirectional approach. While Mamba employs a single zigzag scan pattern, our QuadScan mechanism introduces four independent directional scans (→, ←, ↓, ↑), including the novel bottom-right to top-left sensitivity, combined with Dual-RWKV’s non-shared forward/backward computations to capture richer spatial relationships—a critical advantage for medical imaging where contextual awareness is paramount. Our ablations and ERF results validate the integration of novel SASE+DARM.
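To make the four-direction idea in the rebuttal concrete, here is a minimal sketch of flattening a 2D feature grid into four 1D sequences (→, ←, ↓, ↑) and merging per-direction outputs back onto the grid by averaging. It assumes plain raster traversal and uniform averaging. The authors' actual QuadScan may traverse and fuse differently, and the names `quad_scan` and `merge_scans` are hypothetical.

```python
def quad_scan(grid):
    """Flatten an H x W grid (list of rows) into four directional 1D sequences.

    Returns (seqs, orders): seqs maps a direction name to the flattened values,
    orders maps the same name to the (row, col) position of each sequence element.
    """
    h, w = len(grid), len(grid[0])
    row_major = [(r, c) for r in range(h) for c in range(w)]
    col_major = [(r, c) for c in range(w) for r in range(h)]
    orders = {
        "right": row_major,        # left-to-right, top row first
        "left": row_major[::-1],   # exact reversal of "right"
        "down": col_major,         # top-to-bottom, left column first
        "up": col_major[::-1],     # exact reversal of "down"
    }
    seqs = {name: [grid[r][c] for r, c in order] for name, order in orders.items()}
    return seqs, orders


def merge_scans(seqs, orders, h, w):
    """Scatter each directional sequence back to its grid positions and average."""
    out = [[0.0] * w for _ in range(h)]
    for name, order in orders.items():
        for value, (r, c) in zip(seqs[name], order):
            out[r][c] += value / len(orders)
    return out


seqs, orders = quad_scan([[1, 2], [3, 4]])
print(seqs["down"])                      # column-major order
print(merge_scans(seqs, orders, 2, 2))   # identity when no per-direction model is applied
```

In a real model, each sequence would pass through a sequence mixer before merging. Here the sequences are merged unchanged, so the round trip simply recovers the input grid, which is a handy sanity check on the index bookkeeping.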

R1
Q1: Training Details. Training settings are consistent with those in [1], with the following exceptions: 280 training epochs on a single 3090 GPU; the official Synapse dataset is employed, while the other datasets follow a 70/30 training-validation split; the RWKV model is initialized using the weights from [2]. These configurations are detailed in Section 3.1, and the code will be released upon manuscript acceptance.
Q2: Metric Consistency. All evaluation results now use Dice across all tables and figures.
Q3: Limitations & Future Work. U-RWKV inference is slightly slower than CNNs (e.g., UNeXt). Future plans include (1) adapting to high-resolution settings and (2) extending to 3D segmentation tasks.

R2
Q1 & Q8: ERF & Mamba Comparison. The Effective Receptive Field (ERF) analysis measures pixel-wise influence on the output center [3], following the gradient backpropagation method in [4] (as noted in the revised Fig. 1 caption). Our quantitative comparison of the U-RWKV, U-ViT, and U-Mamba architectures (the latter two replace DARM with ViT and Mamba blocks while maintaining the U-RWKV baseline) shows that U-RWKV exhibits broader and more directional ERFs (Fig. 1(b)). While the original manuscript omitted U-ViT/U-Mamba Dice comparisons due to space constraints, our updated ablation now includes them: U-RWKV achieves an 86.88% average Dice on the BUSI, Kvasir, and ISIC'17 datasets, outperforming both U-ViT (86.21%) and U-Mamba (85.34%).
Q2: Training Details. See R1-Q1.
Q6: SASE & SERatio. See technical clarification.
Q7, Q4, Q5, Q9: Equations & Notation. In Eq. 4, Θ(X) and Θ(X^←) are replaced with s and s^←, where '←' denotes sequence reversal; minor omissions (e.g., the "spa" notation and the Fig. 3 reference) have been fixed in the revision.
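The ERF description in the R2 response (gradient of the output center with respect to each input pixel) can be illustrated with a toy model. The sketch below assumes a stack of purely linear 3x3 box filters with zero padding, for which backpropagating a unit gradient from the center pixel amounts to repeatedly spreading it over 3x3 neighbourhoods. It is not the authors' procedure: real ERF analyses run automatic differentiation through a trained network and average over many inputs. The names `backprop_box3x3` and `toy_erf` are hypothetical.

```python
def backprop_box3x3(grad):
    """Backpropagate a gradient map through one linear 3x3 mean filter (zero padding)."""
    n = len(grad)
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            g = grad[r][c]
            if g == 0.0:
                continue
            # Each output pixel averaged its 3x3 neighbourhood, so its gradient
            # flows back to those nine inputs with weight 1/9 each.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < n:
                        out[rr][cc] += g / 9.0
    return out


def toy_erf(size, layers):
    """Gradient of the center output w.r.t. every input pixel for `layers` box filters."""
    grad = [[0.0] * size for _ in range(size)]
    grad[size // 2][size // 2] = 1.0  # unit gradient at the output center
    for _ in range(layers):
        grad = backprop_box3x3(grad)
    return grad


erf = toy_erf(size=7, layers=2)
for row in erf:
    print(" ".join(f"{v:.3f}" for v in row))
```

Even in this toy case the influence is concentrated near the center and decays toward the edge of the theoretical receptive field, which is the effect that ERF visualizations such as the one referenced in Fig. 1(b) are meant to expose.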

R3
Q1: Contribution. See technical clarification.
Q2: Mamba Comparison. See R2-Q1 for the ERF explanation and ablation comparisons.
Q3: Method Flow. Thanks. We now present the Method as: Architecture → SASE and Decoder → RWKV → DARM.

[1] CMUNeXt: An efficient medical image segmentation network based on large kernel and skip fusion. ISBI 2024.
[2] Vision-RWKV: Efficient and scalable visual perception with RWKV-like architectures. arXiv 2024.
[3] Understanding the effective receptive field in deep convolutional neural networks. NeurIPS 2016.
[4] Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs. CVPR 2022.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

All reviewers tend to accept this paper.

U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV

Author(s):