Abstract

With the significantly increasing incidence and prevalence of abdominal diseases, there is a need to embrace greater use of new innovations and technology for the diagnosis and treatment of patients. Although deep-learning methods have notably been developed to assist radiologists in diagnosing abdominal diseases, existing models have the restricted ability to segment common lesions in the abdomen due to missing annotations for typical abdominal pathologies in their training datasets. To address the limitation, we introduce MSWAL, the first 3D Multi-class Segmentation of the Whole Abdominal Lesions dataset, which broadens the coverage of various common lesion types, such as gallstones, kidney stones, liver tumors, kidney tumors, pancreatic cancer, liver cysts, and kidney cysts. With CT scans collected from 694 patients (191,417 slices) of different genders across various scanning phases, MSWAL demonstrates strong robustness and generalizability. The transfer learning experiment from MSWAL to two public datasets, LiTS and KiTS, effectively demonstrates consistent improvements, with Dice Similarity Coefficient (DSC) increase of 3.00% for liver tumors and 0.89% for kidney tumors, demonstrating that the comprehensive annotations and diverse lesion types in MSWAL facilitate effective learning across different domains and data distributions. Furthermore, we propose Inception nnU-Net, a novel segmentation framework that effectively integrates an Inception module with the nnU-Net architecture to extract information from different receptive fields, achieving significant enhancement in both voxel-level DSC and region-level F1 compared to the cutting-edge public algorithms on MSWAL. Our dataset and the code are publicly released at https://github.com/tiuxuxsh76075/MSWAL-.
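For readers unfamiliar with the voxel-level metric quoted in the abstract, the Dice Similarity Coefficient for one class is commonly defined as twice the overlap between predicted and reference voxels divided by the sum of their sizes. The following is a minimal sketch of that common definition, not the authors' evaluation code; the function name, the toy label ids, and the convention of returning 1.0 when a class is absent from both masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target, label):
    """DSC for one class over two equally sized, flattened label sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of voxels")
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    denom = pred_count + target_count
    # Assumed convention: a class absent from both masks counts as a perfect match.
    if denom == 0:
        return 1.0
    return 2.0 * overlap / denom


# Toy flattened volume with hypothetical label ids:
# 0 = background, 1 = liver tumor, 2 = liver cyst.
pred = [0, 1, 1, 2, 0, 2]
target = [0, 1, 0, 2, 2, 2]
scores = {label: dice_coefficient(pred, target, label) for label in (1, 2)}
```

Computing the score per class, as here, is what lets a multi-class benchmark report separate values for each lesion type before averaging.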

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2154_paper.pdf

SharedIt Link: https://rdcu.be/eHwNm

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04937-7_36

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/tiuxuxsh76075/MSWAL-

Link to the Dataset(s)

https://github.com/tiuxuxsh76075/MSWAL-

BibTex

@InProceedings{WuZha_MSWAL_MICCAI2025,
        author = { Wu, Zhaodong AND Zhao, Qiaochu AND Hu, Ming AND Li, Yulong AND Xue, Haochen AND Jiang, Zhengyong AND Stefanidis, Angelos AND Wang, Qiufeng AND Razzak, Imran AND Ge, Zongyuan AND He, Junjun AND Qiao, Yu AND Zheng, Zhong AND Tang, Feilong AND Dang, Kang AND Su, Jionglong},
        title = { { MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {378--388}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper presents MSWAL, the first large-scale 3D multi-class abdominal lesion segmentation dataset that includes seven common lesion types (gallstones, kidney stones, liver tumors, kidney tumors, pancreatic cancer, liver cysts, and kidney cysts). The dataset consists of 694 CT scans (191,417 slices) from multi-phase contrast-enhanced CTs and is fully annotated with high-quality expert labels. The authors also propose a novel segmentation network, Inception nnU-Net, which integrates simplified Inception modules into the nnU-Net framework to capture multi-scale features. Extensive experiments demonstrate its superiority over existing SOTA segmentation models on MSWAL, and transfer learning results on LiTS and KiTS further validate the dataset’s generalizability.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. MSWAL fills a major gap in the current public datasets by including diverse lesion types beyond pan-cancer labeling and providing full annotation coverage, enabling more clinically relevant multi-lesion segmentation.
2. The authors conduct rigorous comparisons with six public SOTA models, region-level F1 analysis, ablation studies, and transfer learning experiments on external datasets (LiTS and KiTS), all of which support the claims of effectiveness and generalizability.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. While the dataset is clinically relevant, the paper does not explore how MSWAL could be integrated into broader clinical workflows (e.g., multi-lesion report generation, diagnosis prioritization). This limits its immediate practical impact.
2. The paper acknowledges the long-tail distribution and mutual interference among multiple lesions (especially in kidneys), but does not propose or evaluate any specific strategies to mitigate these issues, such as re-weighting, data balancing, or specialized loss functions.
3. The transfer learning experiments are limited to LiTS and KiTS datasets, which are structurally similar CT datasets. There is no assessment of robustness to domain shifts (e.g., modality variation, extreme noise, atypical anatomy), which is critical in real-world deployment.
4. The Inception nnU-Net is effective, but its novelty is incremental. It primarily combines existing ideas (nnU-Net + simplified Inception) without introducing fundamentally new architectural principles compared to recent Transformer-based models.
5. Visual examples are provided, but the analysis lacks boundary error quantification or shape consistency metrics. Further evaluation could better support the model’s clinical reliability.
6. The paper states that bold and underline indicate the best and second-best results, respectively, but this is not consistently applied in Table 2. In addition, some values include a green upward arrow (e.g., ↑2.53), yet the meaning and comparison baseline of this notation are not explained, and it is only applied to selected results without clear criteria. This creates confusion and should be clarified.
7. While the results in the ablation study are reported with precision to two decimal places (e.g., 50.09%), no standard deviations or confidence intervals are provided, making it difficult to assess the robustness of the observed differences. In addition, no statistical significance tests (e.g., p-values) are reported to support the claims of performance improvements. Including these would strengthen the experimental rigor and reliability of the conclusions.
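One way to carry out the kind of significance testing requested in point 7 is a paired permutation (sign-flip) test on per-case DSC differences between a variant and the baseline. The sketch below is illustrative only: the review does not prescribe a test, the function name is hypothetical, and the per-case scores are made-up numbers rather than results from the paper.

```python
from itertools import product


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation test on per-case score differences.

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so every sign assignment is enumerated and compared with the observed sum.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same cases")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up per-case DSC values for six test cases (not taken from the paper).
variant = [0.52, 0.61, 0.48, 0.70, 0.55, 0.66]
baseline = [0.50, 0.58, 0.47, 0.66, 0.52, 0.61]
p_value = paired_sign_flip_test(variant, baseline)
```

Exact enumeration grows as 2^n in the number of cases, so for a realistic test set one would sample random sign assignments instead. Reporting the mean and standard deviation of the per-case differences alongside the p-value would also address the missing-variance concern.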
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

My recommendation is primarily based on the high value of the proposed MSWAL dataset. The dataset contribution is the paper’s most significant strength—it addresses a major gap in existing public resources by providing the first large-scale, fully annotated 3D multi-class abdominal lesion segmentation dataset with diverse lesion types.

However, my score is moderated by several notable weaknesses. The proposed Inception nnU-Net is largely a combination of existing ideas with limited architectural innovation. The manuscript also lacks important robustness evaluations (e.g., under domain shift), does not address long-tail class imbalance, and omits essential statistical reporting such as standard deviations and significance tests. Furthermore, inconsistencies in the table annotations and unexplained notations (e.g., ↑ values) reduce clarity and presentation quality.

Overall, I see the paper as a weak reject: the dataset is valuable and the experiments are sound, but the paper would benefit from additional rigor, clearer analysis, and better presentation to reach the level of a confident accept.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The rebuttal provided satisfactory clarifications to most of the raised concerns. The authors addressed the lack of boundary alignment metrics by reporting AHD scores, substantiating the model’s reliability. They also provided statistical significance testing (p-values < 0.05) for their ablation study results, strengthening the validity of their performance claims. The decision to focus on a CNN-based architecture was justified with comparative results showing significantly lower performance of Transformer-based alternatives on this task.

Although issues like long-tail class imbalance and domain shift were not fully explored experimentally, the authors acknowledged these limitations and clarified the intended scope of their work. Minor issues such as inconsistent result formatting and symbol explanations were noted for correction in the final version.

Given the introduction of a valuable dataset, a strong and well-justified baseline model, and a generally constructive rebuttal, I recommend acceptance.

Review #2

Please describe the contribution of the paper

The paper proposes a large dataset and a baseline method. The dataset is quite comprehensive and will have a great impact on the community.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. A large dataset that will have a great impact on the community.
2. The baseline method has limited novelty and offers limited performance improvement, which is acceptable because the main contribution is the dataset.
3. The overall structure is clear with comprehensive experiments and high-quality figures.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The novelty of the baseline method is limited.
2. The performance improvement is limited.
3. The performance of nnU-Net is quite low. Was the experiment implemented correctly?
4. Transfer learning is discussed in the experiments but is not well motivated in the introduction.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

NA
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. A large dataset that will have a great impact on the community.
2. The baseline method has limited novelty and offers limited performance improvement, which is acceptable because the main contribution is the dataset.
3. The performance improvement is limited.
4. The performance of nnU-Net is quite low. Was the experiment implemented correctly?
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper
1. Introduces a new large-scale 3D CT scan dataset for multi-class segmentation of whole abdominal lesions, with complete annotations for 7 lesion types (e.g., tumors, cysts, stones).
2. Proposes a new segmentation model that combines nnU-Net with Inception modules—Mini Inception and Inception Downsampling—to improve multi-scale feature extraction and segmentation accuracy.
3. Establishes MSWAL as a new benchmark for evaluating multi-lesion segmentation performance in the abdomen.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper presents MSWAL, a large-scale 3D CT dataset with full annotations for seven common abdominal lesion types (e.g., tumors, cysts, stones) across multiple organs. Unlike previous datasets limited to single-organ annotations or pan-cancer labels, MSWAL offers fine-grained, lesion-specific annotations, making it highly valuable for both clinical and research applications.
2. If publicly released, MSWAL could become a standard benchmark for abdominal lesion segmentation, supporting diverse, realistic training and evaluation.
3. The authors propose Inception nnU-Net, an extension of nnU-Net architecture that integrates Mini Inception and Inception Downsampling modules to improve multi-scale feature extraction and segmentation accuracy.
4. The paper provides a comprehensive experimental evaluation, including comparisons with six state-of-the-art segmentation models, and detailed ablation studies.
5. Transfer learning experiments on public datasets (LiTS and KiTS) further validate the robustness and generalization of models trained on MSWAL, highlighting its potential as a strong pretraining source for abdominal lesion segmentation tasks.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While the paper makes valuable contributions, several areas require clarification and refinement to improve readability and reproducibility:
1. The paper evaluates the proposed Inception nnU-Net against several SOTA methods, but it would strengthen the work to also include comparisons with a standard Inception-based model and the original U-Net architecture. This would help isolate the performance gains attributed to the proposed modifications.
2. The ablation study could be expanded by including additional variants, such as: (a) Removing the left branch of the Mini Inception module. (b) Removing the residual connection in the Inception Downsampling module. These additional experiments would provide deeper insight into the contribution of each architectural component.
3. The bottleneck block in Figure 2 is mentioned but not explained. The authors should clarify how it differs from the encoder blocks, why it is specifically included, and what impact it has on the model’s performance. Additionally, the architecture of the encoder and decoder blocks should be briefly described to give readers a clearer understanding of the overall design.
4. The values or ranges of the hyperparameters M and N shown in Figure 2 should be clearly specified in the experimental details to ensure reproducibility and clarity.
5. On page 3, the authors use terms like CT scans, volumes, and instances interchangeably. This can cause confusion. The terminology should be revised and unified throughout the paper for better readability and consistency.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

It is a well-written paper, and this work provides a good dataset for the medical research community, backed by thorough experimentation.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

It is a well-written paper, and this work provides a good dataset for the medical research community, backed by thorough experimentation.

Author Feedback

We thank the reviewers for their valuable feedback. Our primary contribution is MSWAL, the first 3D multi-class abdominal segmentation dataset that ensures no missing labels. Additionally, we introduce Inception nnU-Net. We address the six main concerns raised by the reviewers below.

1) To R1(Q1) & R3(Q4): Regarding Inception nnU-Net’s novelty, we initially tested Transformer-based models. However, in our experiments, Transformer-based models (e.g., nnFormer, Swin UNETR) demonstrate inferior performance, with DSC scores over 9% lower than CNN-based models (Table 2). This suggests that focusing on global features may compromise local detail representation. Based on this, we prioritize the CNN-based Inception nnU-Net, which uses multi-scale kernels to capture local details at different levels. This design enhances feature extraction across multiple scales, enabling more accurate distinction between the seven lesion types.
2) To R3(Q1): Thank you for raising the question of real clinical workflows. While report generation and diagnosis prioritization are critical components of computer-aided diagnosis, they fundamentally rely on understanding and segmenting medical images. To address this fundamental need, our work focuses on the first step in the diagnostic process: abdominal lesion segmentation. Building on MSWAL, we plan to study report generation and diagnosis prioritization in the future.
3) To R3(Q2): Regarding long-tailed distributions and lesion interference, we do not discuss the long-tail problem in the experimental details. However, we have already applied re-weighting to adjust the loss, and this will be reported in the camera-ready version. For lesion interaction, we aim to analyze it and offer a new perspective for future research.
4) To R1(Q4) & R3(Q3): Regarding domain shifts, we would like to clarify the purpose of the transfer learning experiments and the focus of our research. The transfer learning experiments on LiTS and KiTS demonstrate the strong generalization capability of models pretrained on MSWAL for abdominal CT datasets (R2, Strength 5). Domain shifts, though important, are not the primary focus of our work. Therefore, we do not conduct domain shift experiments. We will revise the final version to better highlight the purpose and value of the transfer learning experiments in the Introduction.
5) To R3(Q5): Regarding boundary alignment metrics, we further validate boundary alignment using the AHD metric (mm) in our comparative experiments: Inception nnU-Net: 8.51; nnU-Netv1: 13.44; nnU-Netv2: 12.26; nnU-Net Res: 8.95; MedNext: 9.23; nnFormer: 10.26; Swin UNETR: 15.17, demonstrating the effectiveness of Inception nnU-Net.
6) To R3(Q7): Regarding statistical significance tests in the ablation experiments, we validate the experiment and find that the p-values, representing the probability of observing results at least as extreme as ours under the null hypothesis, are below 0.05 (max 0.04) for all four Inception nnU-Net variants. This confirms their statistical significance compared to the original model.
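For context on the boundary metric quoted in point 5, the sketch below shows one common formulation of the average Hausdorff distance between two sets of boundary points given in millimetres. The rebuttal does not state which AHD variant was used (some definitions take the maximum or the sum of the two directed averages instead of their mean), so the symmetric mean used here, the function names, and the toy coordinates are all assumptions.

```python
from math import dist


def directed_average_distance(points_a, points_b):
    """Mean distance from each point in points_a to its nearest point in points_b."""
    return sum(min(dist(a, b) for b in points_b) for a in points_a) / len(points_a)


def average_hausdorff_distance(points_a, points_b):
    """Symmetric AHD: mean of the two directed average distances (assumed variant)."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")
    forward = directed_average_distance(points_a, points_b)
    backward = directed_average_distance(points_b, points_a)
    return 0.5 * (forward + backward)


# Toy boundary points in mm (hypothetical, for illustration only).
predicted_boundary = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_boundary = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
ahd_mm = average_hausdorff_distance(predicted_boundary, reference_boundary)
```

Lower values indicate closer boundary agreement, which matches how the figures above are read. The brute-force nearest-point search is quadratic in the number of points, so real volumes would normally use a distance transform or a spatial index.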

We would also like to address the other four comments.

1) To R1(Q2): We thank R1 for raising the issue of the limited improvement of Inception nnU-Net. However, for some categories, such as liver tumor, the DSC improvement is 2.53%, demonstrating the effectiveness of Inception nnU-Net.
2) To R1(Q3): Regarding the implementation of nnU-Net, we confirm that we follow the official nnU-Net implementation, with nnU-Netv1 and nnU-Netv2 independently reproduced by two researchers, whose results are similar (Table 2).
3) To R2(Q3): Regarding the description of the bottleneck, encoder, and decoder, we will provide a more detailed explanation to clarify the structure and eliminate typos in the camera-ready version.
4) To R1 & R2 & R3: Regarding typos, we will correct them and clarify expressions in the revision, such as the settings of N and M in the model, as well as terms like “CT scans”, “volumes”, and “instances”.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

This paper makes a significant contribution through the introduction of MSWAL, a large-scale 3D CT dataset with comprehensive annotations for seven common abdominal lesion types (e.g., tumors, cysts, stones) across multiple organs. Unlike prior datasets that are either limited to single-organ segmentation or lack fine-grained lesion labels, MSWAL provides detailed, lesion-specific annotations that fill a critical gap in existing public resources. If released publicly, MSWAL has strong potential to become a widely adopted benchmark for abdominal lesion segmentation, benefiting both clinical research and algorithm development.

In addition to the dataset, the authors propose Inception nnU-Net, an enhanced variant of the nnU-Net architecture that integrates Mini Inception and Inception Downsampling modules. This design aims to improve multi-scale feature extraction and overall segmentation performance. The method is supported by extensive experiments, including comparisons with six state-of-the-art models and thorough ablation studies, underscoring its robustness and effectiveness.

While the dataset is the primary strength of this work, the combination of a valuable new resource and a well-validated methodological contribution makes this paper a strong candidate for acceptance. I recommend acceptance.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A


MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset

Author(s):