Abstract

Accurate tumor segmentation is crucial for cancer diagnosis and treatment. While foundation models have advanced general-purpose segmentation, existing methods still struggle with: (1) limited incorporation of medical priors, (2) imbalance between generic and tumor-specific features, and (3) high computational costs for clinical adaptation. To address these challenges, we propose MAST-Pro (Mixture-of-experts for Adaptive Segmentation of pan-Tumors with knowledge-driven Prompts), a novel framework that integrates dynamic Mixture-of-Experts (D-MoE) and knowledge-driven prompts for pan-tumor segmentation. Specifically, text and anatomical prompts provide domain-specific priors, guiding tumor representation learning, while D-MoE dynamically selects experts to balance generic and tumor-specific feature learning, improving segmentation accuracy across diverse tumor types. To enhance efficiency, we employ Parameter-Efficient Fine-Tuning (PEFT), optimizing MAST-Pro with significantly reduced computational overhead. Experiments on multi-anatomical tumor datasets demonstrate that MAST-Pro outperforms state-of-the-art approaches, achieving up to a 5.20% improvement in average DSC while reducing trainable parameters by 91.04%, without compromising accuracy.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0362_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{MenRun_MASTPro_MICCAI2025,
        author = { Meng, Runqi and Song, Sifan and Jin, Pengfei and Teng, Lin and Wang, Yulin and Sun, Yiqun and Chen, Ling and Oh, Yujin and Li, Xiang and Li, Quanzheng and Guo, Ning and Shen, Dinggang},
        title = { { MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {312--322}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This manuscript proposes a pan-tumor segmentation method that introduces a dynamic MoE module into the SwinUNETR blocks, where routers dynamically select experts for generic and tumor-specific feature learning. Text and anatomical prompts are also incorporated to refine the image features generated by SwinUNETR. Experiments on several datasets of different anatomies demonstrate that the proposed method achieves the best results (except on MSD-Hepatic Vessel Tumor and KiTS).

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this manuscript lies in the novel framework that integrates a dynamic MoE with text and anatomical prompts for pan-tumor segmentation.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Some details of the methods and experimental setup are not presented clearly as indicated in the comments below.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • It is unclear whether the model is fine-tuned separately on each dataset's training set or trained jointly. The pretraining scope (training sets only, or the full datasets?) is also ambiguous.
    • What’s the meaning of internal and external datasets in the title of Table 1?
    • Are the compared methods also pre-trained on the same datasets listed?
    • The shape and role of θ^t (initial mask) in guiding the final output are unexplained; the paper does not even refer to [9], which has a similar structure.
    • The routers' selection of experts (R^g vs. R^t) is unclear in the text, equations, and Figure 1. How the generalized router R^g works is presented in neither the equations nor the figure, and the relationship between R^g and R^t is confusing in Figure 1.
    • In Figure 2, it would be best to display the images with an intensity window in which the tumors are clearly visible.
    • The values of k1 and k2 used in the model are missing.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The fact that some details in the manuscript were not presented clearly was the main factor affecting my decision.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    This paper can be accepted if the issues above are addressed in the final version.



Review #2

  • Please describe the contribution of the paper

    The paper presents MAST-Pro, a novel framework for pan-tumor segmentation that addresses several key challenges in medical image analysis. The main contribution is the integration of Dynamic Mixture-of-Experts (D-MoE) with knowledge-driven prompts (both text and anatomical) to achieve accurate and efficient tumor segmentation across diverse anatomical regions. The framework uses Parameter-Efficient Fine-Tuning to significantly reduce computational costs while maintaining high accuracy.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper demonstrates several notable strengths. First, it presents a novel architectural approach by combining D-MoE with knowledge-driven prompts, allowing for better feature learning across different tumor types. Second, the framework shows impressive computational efficiency, reducing trainable parameters by 91.04% while improving accuracy by 5.20% compared to state-of-the-art methods. Third, the extensive experimental validation across eight different datasets demonstrates robust generalization capability. Fourth, the use of both text and anatomical prompts provides a more comprehensive approach to incorporating domain knowledge. Finally, the qualitative results show superior performance in handling challenging cases like small tumors and complex boundaries.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper has several limitations worth noting. First, while the framework shows impressive results, there’s limited discussion of its performance on rare tumor types or edge cases. Second, the dependence on pre-trained models (like TotalSegmentator for anatomical prompts) could limit the framework’s applicability in scenarios where such models aren’t available or reliable. Third, the evaluation metrics focus primarily on DSC scores, and additional metrics could provide a more comprehensive assessment of the model’s performance. Fourth, the paper could benefit from more detailed ablation studies exploring different configurations of the D-MoE architecture. Finally, while computational efficiency is demonstrated, there’s limited discussion of inference time requirements in clinical settings.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper deserves a weak accept primarily because it presents a novel and promising approach to pan-tumor segmentation through the integration of Dynamic Mixture-of-Experts with knowledge-driven prompts, showing significant improvements in both accuracy (5.20% DSC increase) and computational efficiency (91.04% parameter reduction). While the technical innovation and experimental results are impressive, there are concerns about the limited discussion of performance on rare tumor types and edge cases, as well as the dependence on pre-trained models.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have adequately addressed my concerns by clarifying the data split protocol to prevent leakage, explaining the architectural novelty of their hierarchical routing strategy compared to existing MoE approaches, and committing to include additional ablation studies comparing with MoE baselines in the revised manuscript. While acknowledging the inference overhead, they provide a reasonable justification given the significant performance gains.



Review #3

  • Please describe the contribution of the paper

    The research presents a unified pan-tumor segmentation system combining two concepts: (1) a Dynamic Mixture-of-Experts (D-MoE) version of Swin-UNETR whose task-dependent routers dynamically select between generic low-rank experts and tumor-specific experts, thereby balancing cross-organ generalization and fine-grained lesion detail; and (2) knowledge-driven prompts, namely rich, pathology-specific text templates generated by an LLM and anatomical masks derived from TotalSegmentator, which inject clinically relevant priors directly into the vision model. By coupling these with a parameter-efficient fine-tuning approach that updates only a small subset of experts, MAST-Pro reduces trainable parameters by about 91% and keeps memory use under 9 GB while increasing mean DSC by up to 5.2% across eight heterogeneous public datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of the paper are as follows:

    • Knowledge-driven dual prompts (LLM-generated text templates and organ-mask anatomy prompts) infuse semantic and spatial priors that promote cross-organ generalization.
    • Dynamic Mixture-of-Experts (D-MoE) addresses tumor heterogeneity and class imbalance by having task routers at inference select between generic and tumor-specific experts.
    • PEFT updates only selected experts/adapters, reducing trainable weights from about 244 M to 21 M and keeping memory under 9 GB, hence enabling clinical deployment on a single GPU.
    • Pretrained on 11 public datasets and tested on 8 tumor cohorts; improves +5.2 Dice over the best baseline and scores first on 6/8 tasks.
    • Removing the prompts or D-MoE lowers mean Dice by as much as 7 points, verifying the contribution of each module.
    • Visual results indicate tighter boundaries and fewer false positives for tiny or irregular lesions, which is essential for surgical planning.
    • Built on the open-source SwinUNETR and public data, the reproducible pipeline simplifies adoption and further studies.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The technical novelty is limited: the "knowledge-driven" dual-prompt scheme (text + organ masks) replicates CAT's anatomical/text prompt coordination, pan-tumor prompting was already proposed in earlier studies, and the dynamic Mixture-of-Experts layer follows existing Switch-Transformer and V-MoE designs without introducing any new routing logic.
    • The backbone is pretrained on LiTS, KiTS, and MSD and then re-evaluated on the same datasets, so patient-level data leakage is possible and unaddressed. Without ablations against MoE baselines (e.g., Switch-ViT, V-MoE) or reporting of rare-class DSC, it is unclear whether the gains stem from the prompts or from sparsity. The efficiency claims quote fewer trainable parameters but omit inference FLOPs and latency, even though MoE routing can raise runtime cost, an issue highlighted in prior sparse-expert work.
    • Experiments use only CT data, while other works (e.g., MA-SAM) already show cross-modality generalization; the text template presumes the organ, modality, and tumor type are known a priori, which is rarely the case in clinical workflows; and there is no disclosure of per-dataset splits or significance tests.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper delivers a well-integrated framework whose practical impact, scale of validation, and efficiency gains outweigh the noted limitations: the dual knowledge-driven prompts and D-MoE, while conceptually building on prior ideas, are combined in a way that demonstrably tackles pan-tumor heterogeneity and class imbalance, yielding a consistent ~5% Dice uplift and top performance on 6/8 benchmarks with about 10× fewer trainable parameters and <9 GB memory. The qualitative results show clinically meaningful boundary fidelity, so the contribution, though evolutionary, is sufficiently validated and valuable to the community.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the AC and reviewers for their constructive feedback and appreciation of our work’s novelty (R1, R3), practical value (R2, R3), and clarity (R2, R3). Below, we address the reviewers’ comments point by point.

1) R1-W1/W3, R2-W2.1/W3 (Experimental Analysis): Our proposed pan-tumor segmentation framework is applied across diverse datasets without cohort-specific fine-tuning, whereas MA-SAM requires modality-specific tuning for MRI. Pretraining used all cases from BTCV, CT-ORG, Pancreas-CT, CHAOS, 3D-IRCADb, WORD, and AMOS, along with the 80% training splits of MSD, LiTS, KiTS, and AbdomenCT-1K. The remaining 20% were strictly reserved for evaluation. All baselines followed their original training protocols for fair comparison. Extending our framework to support multi-modality inputs is a key future direction, as suggested.

2) R1-W2/W6: The terms “internal” and “external” were mistakenly left in the caption during editing and will be removed. Fig. 2 will be updated with a proper windowing range to better highlight tumor regions.

3) R1-W4/W5/W7: θ^t includes weights (W ∈ ℝ^{T × c × d × h × w}) and biases (b ∈ ℝ^{T}), with T the number of tumor types and c the number of latent channels. As noted on page 5, it allows the model to produce category-specific proposals for each tumor type, which guide the final segmentation output. While our design differs from [9] by using anatomical and textual prompts, we will cite [9] and clarify this. The general router R^g captures common features by selecting experts only from the generic pool E^g and is used for datasets with mixed tumor types (e.g., AbdomenCT-1K). In contrast, the task-specific router R^t is used for tumor-specific datasets and selects the top-k experts from both E^t and E^g, enabling a flexible combination of specific and general knowledge (see the sketch below). In our implementation, we set k1 = k2 = 4 for R^t and R^g. We will clarify these details and update Fig. 1 accordingly.
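For concreteness, the following is a minimal PyTorch sketch of the two mechanisms described in this response: the per-tumor-type proposal head θ^t and the hierarchical D-MoE routing. The linear gating, MLP experts, and module names are illustrative assumptions, not the paper's exact implementation (the paper uses low-rank experts inside SwinUNETR blocks).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def category_proposals(feat: torch.Tensor, W: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """theta^t as described above: weights W of shape (T, c, d, h, w) and biases b
    of shape (T,) yield one category-specific proposal map per tumor type.
    Treating W as a 3D convolution kernel is an assumption for illustration."""
    pad = tuple(s // 2 for s in W.shape[2:])       # keep spatial size for odd kernels
    return F.conv3d(feat, W, bias=b, padding=pad)  # (B, c, D, H, W) -> (B, T, D, H, W)


class DMoE(nn.Module):
    """Hierarchical D-MoE routing sketch: R^g gates only the generic pool E^g,
    while R^t gates the union of E^g and the tumor-specific pool E^t (top-k, k=4)."""

    def __init__(self, dim: int, n_generic: int = 8, n_specific: int = 8, k: int = 4):
        super().__init__()
        self.k = k
        def make_expert():
            return nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.generic_experts = nn.ModuleList(make_expert() for _ in range(n_generic))
        self.specific_experts = nn.ModuleList(make_expert() for _ in range(n_specific))
        self.router_g = nn.Linear(dim, n_generic)               # R^g: generic experts only
        self.router_t = nn.Linear(dim, n_generic + n_specific)  # R^t: both pools

    def forward(self, x: torch.Tensor, task_specific: bool) -> torch.Tensor:
        # x: (batch, tokens, dim); the router is chosen per dataset/task.
        if task_specific:
            experts = list(self.generic_experts) + list(self.specific_experts)
            logits = self.router_t(x)
        else:
            experts = list(self.generic_experts)
            logits = self.router_g(x)
        weights, idx = torch.topk(logits.softmax(dim=-1), self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                              # sparse dispatch per token
            for e, expert in enumerate(experts):
                mask = idx[..., slot] == e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

In this sketch, task_specific=False corresponds to routing with R^g on mixed-tumor-type datasets (e.g., AbdomenCT-1K), and task_specific=True to routing with R^t on tumor-specific datasets.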

4) R2-W1: While both our method and CAT use dual prompts as a high-level design, their prompts serve as visual exemplars, whereas ours encode anatomical prior embeddings for structured context modeling. Moreover, our MoE modules introduce a hierarchical routing strategy (R^g and R^t) to balance common and tumor-specific knowledge, unlike the single-router designs in Switch-ViT and V-MoE.

5) R2-W2.2, R3-W1 (Discussion on Rare Tumor Types): M-Lu (lung tumor from MSD) is one of the smallest cohorts in terms of sample size and serves as a representative "rare tumor" case. Table 1 shows our method achieves 72.10% Dice, outperforming prior SOTA (+19.3 over Med-SAM3D, +20.6 over MA-SAM, +5.2 over ZePT), demonstrating robustness in data-scarce settings.

6) R2-W3.2, R3-W2 (Pretrained Tools & Prior Knowledge Dependency): We acknowledge that relying on external segmentors (e.g., TotalSegmentator) may limit applicability in certain regions (e.g., brain or head & neck), although we believe such tools will become increasingly available. In this work, we focus on abdominal tumors, where such tools are reliable and widely used. As for prior knowledge (e.g., organ, modality, tumor type), it is often accessible through DICOM metadata or clinical context. Our framework offers a practical and generalizable approach for integrating anatomical guidance into pan-tumor segmentation.

7) R2-W2, R3-W3/W4/W5 (Results Analysis): We currently include the core ablations (e.g., with/without prompts or D-MoE). A more thorough analysis of D-MoE settings (e.g., R^t vs. R^g, number of experts) and of MoE baselines is ongoing, and we commit to adding it in the revised draft. While the training cost is significantly reduced, inference efficiency remains unimproved, since all modules stay active. We believe this overhead is acceptable given the performance gains and growing data scale. To provide a more comprehensive evaluation, we will add HD95 and AUROC in the supplementary material, where our method outperforms the baselines.
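For reference, both DSC and HD95 are available as built-in metrics in MONAI; below is a minimal sketch of how they could be computed, with hypothetical one-hot tensors standing in for the paper's actual predictions and labels.

```python
import torch
from monai.metrics import DiceMetric, HausdorffDistanceMetric

dice = DiceMetric(include_background=False, reduction="mean")
hd95 = HausdorffDistanceMetric(include_background=False, percentile=95)

# Hypothetical one-hot tensors of shape (batch, classes, D, H, W):
# class 0 = background, class 1 = tumor.
pred = torch.zeros(1, 2, 32, 32, 32)
pred[:, 1, 8:20, 8:20, 8:20] = 1
pred[:, 0] = 1 - pred[:, 1]
gt = torch.zeros_like(pred)
gt[:, 1, 10:22, 10:22, 10:22] = 1
gt[:, 0] = 1 - gt[:, 1]

dice(y_pred=pred, y=gt)   # accumulate per-batch scores
hd95(y_pred=pred, y=gt)
print(f"DSC:  {dice.aggregate().item():.4f}")
print(f"HD95: {hd95.aggregate().item():.4f}")
```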

We will include all the information in the final paper.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    After reviewing the paper, the associated reviews, and the authors’ rebuttal, several concerns remain unclear to this Area Chair:

    1. Limited Novelty (as noted by R1): The proposed knowledge-driven dual-prompt approach (text + organ masks) appears to closely follow the ideas introduced in the CAT paper.

    2. Potential Data Leakage: There are concerns about pretraining and evaluation overlap. Specifically, datasets such as KiTS and LiTS are used both in pretraining and in the reported evaluation, raising the possibility of patient-level data leakage that is not explicitly addressed.

    3. While the authors promised to include additional ablation studies, introducing new results in the rebuttal or updated submission is not permitted under the review policy.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have done a good job in the rebuttal. All reviewers agree to accept this paper.


