Abstract

Multimodal medical image segmentation faces significant challenges in the context of gastric cancer lesion analysis. This clinical context is defined by the scarcity of independent multimodal datasets and the imperative to amalgamate inherently misaligned modalities. As a result, algorithms are constrained to train on approximate data and depend on application migration, leading to substantial resource expenditure and a potential decline in analysis accuracy. To address these challenges, we make two major contributions. First, we publicly release the GCM 2025 dataset, the first large-scale, open-source collection of gastric cancer multimodal MRI scans, featuring professionally annotated FS-T2W, CE-T1W, and ADC images from 500 patients. Second, we introduce HWA-UNETR, a novel 3D segmentation framework that employs an original HWA block with learnable window aggregation layers to establish dynamic feature correspondences between different modalities’ anatomical structures, and leverages the innovative tri-orientated fusion mamba mechanism for context modeling and capturing long-range spatial dependencies. Extensive experiments on our GCM 2025 dataset and the public BraTS 2021 dataset validate the performance of our framework, demonstrating that the new approach surpasses existing methods by up to 1.68% in Dice score while maintaining solid robustness. The dataset and code are publicly available via this URL.
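For readers unfamiliar with the metric, the Dice score quoted in the abstract measures the overlap between a predicted mask A and a reference mask B as 2|A∩B| / (|A| + |B|). The snippet below is a minimal pure-Python sketch of that definition; the function name `dice_score`, the empty-mask convention, and the toy masks are illustrative assumptions and are not taken from the paper or its repository.

```python
def dice_score(pred, target):
    """Dice coefficient between two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical voxels, flattened from a 3D volume.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)
```

Here the two masks share 2 foreground voxels out of 3 each, so the overlap is 2·2 / (3 + 3).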

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3634_paper.pdf

SharedIt Link: https://rdcu.be/eHw6c

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05141-7_27

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/JeMing-creater/HWA-UNETR

Link to the Dataset(s)

https://github.com/JeMing-creater/HWA-UNETR

BibTex

@InProceedings{LiaJia_HWAUNETR_MICCAI2025,
        author = { Liang, Jiaming AND Dai, Lihuan AND Sheng, Xiaoqi AND Chen, Xiangguang AND Yao, Chun AND Tao, Guihua AND Leng, Qibin AND Cai, Hongmin AND Zhong, Xi},
        title = { { HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        pages = {273 -- 282}
}

Reviews

Review #1

Please describe the contribution of the paper
1. GCM 2025 Dataset: The authors introduce the first large-scale, open-source multimodal MRI dataset for gastric cancer research, featuring professionally annotated FS-T2W, CE-T1W, and ADC images from 500 patients. This dataset addresses the scarcity of high-quality multimodal data for gastric cancer lesion analysis.
2. HWA-UNETR Framework: The authors propose a novel 3D segmentation framework that integrates Hierarchical Window Aggregate (HWA) blocks for dynamic cross-modal feature correspondence and Tri-orientated Fusion Mamba (TFM) blocks for global, multi-scale feature modeling. This framework achieves state-of-the-art performance on both the GCM 2025 and BraTS 2021 datasets, demonstrating superior accuracy and robustness.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This work has publicly released the GCM 2025 dataset, which serves as a valuable resource for future research in the field of gastric cancer segmentation.
2. The HWA-UNETR framework incorporates novel components such as HWA blocks for cross-modal alignment and TFM blocks for modeling long-range dependencies, and demonstrates excellent performance in multimodal medical image segmentation.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The proposed method does not seem specific to the gastric cancer lesion segmentation problem; the authors should also experiment on other public medical datasets to test its effectiveness.
2. The comparative experiment in this paper is insufficient. To better demonstrate the model’s advancements, the authors should expand the experiments to include comparisons with more recent state-of-the-art methods.
3. The ablation study could be expanded to include more detailed analyses of the individual components’ contributions, such as the impact of different window sizes in the HWA block.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

In Formula 1, feature extraction for a single modality input is conducted through multi-scale windows, and the fusion of features from different modalities is achieved through a dynamic weighting method. However, here it seems that only feature fusion is involved for different modalities, without explicitly mentioning the process of feature alignment.
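The reviewer's reading of Formula 1 can be illustrated with a small sketch: each modality is summarized over several window sizes, and the per-modality features are then combined with normalized weights. This is only an interpretation of the reviewer's description, not the paper's Formula 1. The 1D signals, average pooling, softmax weighting, and all function names are assumptions made for illustration; the window sizes (1, 2, 4, 8) are the configuration quoted in the author feedback.

```python
import math


def window_pool(signal, window):
    """Average over non-overlapping windows, then repeat back to full length."""
    out = []
    for start in range(0, len(signal), window):
        chunk = signal[start:start + window]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out


def multi_scale_features(signal, windows=(1, 2, 4, 8)):
    """Average the window-pooled views of one modality across all scales."""
    views = [window_pool(signal, w) for w in windows]
    return [sum(vals) / len(views) for vals in zip(*views)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(modalities, logits):
    """Weighted sum of per-modality features; weights are softmax(logits)."""
    weights = softmax(logits)
    feats = [multi_scale_features(m) for m in modalities]
    return [sum(w * f for w, f in zip(weights, col)) for col in zip(*feats)]


# Three hypothetical 1D signals standing in for FS-T2W, CE-T1W, and ADC features.
t2w = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
t1w = [1.0] * 8
adc = [7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.0]
# In a trained model the logits would be learned; equal logits give equal weights.
fused = fuse_modalities([t2w, t1w, adc], logits=[0.0, 0.0, 0.0])
```

Nothing in this sketch moves features spatially between modalities: it pools and reweights, which is fusion only. That is the distinction the reviewer draws between fusion and explicit alignment.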
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors introduce the GCM 2025 dataset and demonstrate that HWA-UNETR achieves state-of-the-art performance in multimodal medical image segmentation. However, the experimental validation could be further strengthened to comprehensively evaluate the model’s capabilities.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper addresses the challenges in multimodal medical image segmentation for gastric cancer lesion analysis, particularly the scarcity of aligned multimodal datasets and the difficulty of integrating misaligned modalities. These issues force algorithms to rely on suboptimal training data and application migration, increasing resource costs and reducing accuracy. To overcome these limitations, the authors present two solutions: (1) the release of the GCM 2025 dataset, the first large-scale, open-source collection of gastric cancer multimodal MRI scans, comprising professionally annotated FS-T2W, CE-T1W, and ADC images from 500 patients; and (2) the introduction of HWA-UNETR, a novel 3D segmentation framework. Experiments on the GCM 2025 and BraTS 2021 datasets demonstrate state-of-the-art performance with improvements in Dice score over existing methods, alongside robust generalizability.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The major strength of the paper is:
1. The first large-scale, open-source multimodal MRI dataset (GCM 2025 Dataset) for gastric cancer, featuring aligned FS-T2W, CE-T1W, and ADC scans from 500 patients with expert annotations.
Considerable contributions of the paper are:
2. A learnable window aggregation layer that dynamically aligns anatomical features across misaligned modalities, improving cross-modal feature correspondence.
3. An innovative context modeling approach that captures long-range spatial dependencies in three orientations, enhancing segmentation accuracy.
4. Comprehensive experiments demonstrating superior results and robustness on both the GCM 2025 and BraTS 2021 datasets, advancing multimodal segmentation in clinical settings.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Major weaknesses:
1. HWA is very similar to FPN. Please explain the reason for designing it when FPN already exists. Hierarchical and multi-scale feature extraction techniques are also abundantly available in segmentation architectures. A justification of how and why HWA is better needs to be presented.
2. The SGC block’s downsampling ability is not well explained. Adaptive downsampling techniques that retain important features already exist, such as https://openaccess.thecvf.com/content/CVPR2023W/ECV/papers/Hesse_Content-Adaptive_Downsampling_in_Convolutional_Neural_Networks_CVPRW_2023_paper.pdf, https://dl.acm.org/doi/pdf/10.1145/3072959.3073670?casa_token=vroRn_iomt8AAAAA:fWcfRZDzKMeKaNlEHEBn8n3kT-prH_kLec9jIEwitZ5atJiD3Oar3o9osPtSBwtYqKSsEv9XYBnH, and https://arxiv.org/pdf/2307.09804 (spectral downsampling techniques). A discussion of why SGC is better than the existing techniques needs to be presented.
Minor weaknesses:
3. Heatmaps could be provided to show how the modules affect feature extraction; this would help address the questions raised above. A justification of how the HWA and SGC modules differ from the existing literature is also needed, supported by visualizations.
4. The text in Fig. 2 is small and difficult to read; please enhance the quality of the image and enlarge the text.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper makes a good contribution with the proposed new dataset. However, if the questions regarding the method can be addressed, then the decision can be easily converted to weak accept. The architectural components of the proposed model are very similar to, and in the same logical direction as, some already existing techniques.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper
1. This paper releases the first publicly available multimodal MRI resource for gastric cancer research.
2. This paper proposes a 3D segmentation framework combining a novel Hierarchical Window Aggregate (HWA) block for dynamic cross-modal alignment and a Tri-orientated Fusion Mamba (TFM) mechanism for global multi-scale modeling.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Dataset impact: GCM 2025 fills a critical gap as the first dedicated gastric cancer MRI dataset, enabling reproducible multimodal segmentation research.
2. Generalizability: Validated on both gastric (GCM 2025) and brain (BraTS 2021) tasks, achieving superior performance.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

A weakness lies in the undefined abbreviation ‘MFA’ in Table 3. While the ablation study references the ‘MFA Block’ as a core component, the main text fails to expand or define this acronym, leading to ambiguity in methodological transparency.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper publicly disseminates the GCM 2025 dataset, which serves as the first large-scale collection of gastric cancer multimodal MRI scans.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We appreciate the reviewers’ positive comments on the novelty and effectiveness of our method, and we are equally grateful for the critical feedback on this work. Given the word limit, we can only respond to some of the questions, as follows.

Reviewer #1

Q1&Q2: We initially conducted more extensive comparative experiments than those presented in the current manuscript, including evaluations of more mainstream segmentation methods and on a wider range of cross-organ datasets (excluding BraTS). However, due to space constraints, we regretfully had to omit most of these results. We intend to include the full set of results in a potential expanded version of the manuscript rather than in the currently submitted version. We sincerely appreciate your understanding.

Q3: Thank you for your constructive feedback on our experimental design. We did conduct ablation studies on the HWA window size settings. In addition to the current configuration of window sizes (1, 2, 4, 8), which achieved a Dice score of 74.21%, we also evaluated several alternative settings: (1, 2, 2, 2) with a Dice score of 73.42%, (1, 2, 2, 4) with 73.67%, (1, 2, 4, 4) with 73.23%, and (2, 2, 4, 4) with 72.87%. However, due to space limitations and prioritization of the most critical findings, we did not include these additional results in the current version of the manuscript. We hope for your kind understanding and that this response helps clarify your concerns to some extent.

Reviewer #2

Q1: HWA differs from FPN in motivation and implementation. While FPN fuses multi-scale features, HWA aggregates local semantic information via window-based attention, addressing modality misalignment in multi-modal medical imaging. HWA also introduces learnable adaptive weighting for dynamic feature recalibration, unlike FPN’s static fusion. Additionally, HWA is modular and can be flexibly inserted as a modality-aware pre-processing mechanism. Combined with the Tri-orientated Fusion Mamba, HWA-UNETR achieves superior performance in complex multi-modal segmentation tasks.

Q2: While adaptive downsampling methods excel at restoring high-resolution features, our Stratified Group Convolution (SGC) Block has a different goal: preserving spatial context for state-space computation in the Tri-orientated Fusion Mamba (TFM) Block. Unlike static pooling or content-driven approaches, SGC leverages early modality interactions to highlight semantically important regions, which is crucial for multi-modal medical segmentation, where small lesions and modality mismatches matter. SGC performs adaptive downsampling under semantic supervision, preserving diagnostically critical details better than conventional methods.

Q3&Q4: Thank you for your feedback. Heatmaps and module comparisons (HWA/SGC vs. literature) were prepared but omitted due to page limits. We will share them on GitHub for transparency. Moreover, we have enlarged Figure 2’s text in the final version and will also provide a high-resolution copy on GitHub. These adjustments aim to clarify our method’s novelty (e.g., HWA’s dynamic modality alignment vs. FPN’s static fusion) and improve readability. We appreciate your suggestions and hope the supplementary materials resolve these concerns.

Reviewer #3

Q1: We sincerely thank you for your careful reading of our manuscript and for accurately pointing out the mistake. Indeed, “MFA” was an incorrect abbreviation that mistakenly referred to our proposed module, the Tri-orientated Fusion Mamba (TFM) Block. This error resulted from an oversight during manuscript preparation, and we offer our sincere apologies for the confusion it may have caused. We have already corrected this issue in the camera-ready version, with the appropriate revisions made in both Table 3 and Section 4.4. Once again, we apologize for the inconvenience and truly appreciate your attentive review and valuable feedback.
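
The window-size ablation quoted in the response to Reviewer #1 (Q3) is easier to compare when tabulated. The short sketch below uses only the Dice scores stated in the feedback and reports each setting's gap to the best one; the variable names are illustrative, and the numbers come from the rebuttal text rather than from any independently verified table.

```python
# Dice scores (%) as stated in the author feedback for each HWA window setting.
ablation = {
    (1, 2, 4, 8): 74.21,
    (1, 2, 2, 2): 73.42,
    (1, 2, 2, 4): 73.67,
    (1, 2, 4, 4): 73.23,
    (2, 2, 4, 4): 72.87,
}

best_windows = max(ablation, key=ablation.get)
best_dice = ablation[best_windows]

# Gap of every setting to the best one, in percentage points.
gaps = {w: round(best_dice - d, 2) for w, d in ablation.items()}

for windows, dice in sorted(ablation.items(), key=lambda kv: -kv[1]):
    print(f"{windows}: Dice {dice:.2f}%  (gap {gaps[windows]:.2f})")
```

On these figures the reported configuration (1, 2, 4, 8) leads the alternatives by between 0.54 and 1.34 percentage points.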

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

The paper provides a multi-modal gastric cancer dataset and presents an approach for combining multiple modality information despite potential misalignments in the images via a dynamic learnable window aggregation approach. However, the reviewers have raised several concerns, which should be addressed.

HWA-UNETR: Hierarchical Window Aggregate UNETR for 3D Multimodal Gastric Lesion Segmentation
