Abstract

Whole slide images (WSIs) are vital in digital pathology, enabling gigapixel tissue analysis across various pathological tasks. While recent advancements in multi-modal large language models (MLLMs) allow multi-task WSI analysis through natural language, they often underperform compared to task-specific models. Collaborative multi-agent systems have emerged as a promising solution to balance versatility and accuracy in healthcare, yet their potential remains underexplored in pathology-specific domains. To address these issues, we propose WSI-Agents, a novel collaborative multi-agent system for multi-modal WSI analysis. WSI-Agents integrates specialized functional agents with robust task allocation and verification mechanisms to enhance both task-specific accuracy and multi-task versatility through three components: (1) a task allocation module that assigns tasks to expert agents using a model zoo of patch- and WSI-level MLLMs, (2) a verification mechanism that ensures accuracy through internal consistency checks and external validation using pathology knowledge bases and domain-specific models, and (3) a summary module that synthesizes the final summary with visual interpretation maps. Extensive experiments on multi-modal WSI benchmarks show WSI-Agents’ superiority over current WSI MLLMs and medical agent frameworks across diverse tasks. Source code is available at https://github.com/XinhengLyu/WSI-Agents.
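The three-stage workflow described above (task allocation, verification, summary) can be pictured as a simple pipeline. The Python sketch below is a hypothetical illustration under stated assumptions, not the authors' implementation (that lives in the linked repository). The keyword router, the majority-vote consistency check, the set-based knowledge base, and every function and model name are assumptions made for illustration; in the real system each step would be an MLLM or LLM agent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical stand-in for an MLLM: (slide_id, question) -> answer text.
Model = Callable[[str, str], str]


@dataclass
class Query:
    slide_id: str
    question: str
    task: str = ""  # set by the task allocation step


def allocate_task(query: Query) -> Query:
    """Toy keyword router standing in for an LLM-based task allocation agent (assumption)."""
    q = query.question.lower()
    for keyword, task in [("diagnos", "diagnosis"), ("treat", "treatment"), ("report", "report")]:
        if keyword in q:
            query.task = task
            return query
    query.task = "morphology"  # default task when no keyword matches
    return query


def run_experts(query: Query, zoo: Dict[str, Model]) -> Dict[str, str]:
    """Each model in the hypothetical zoo answers the routed query."""
    return {name: model(query.slide_id, query.question) for name, model in zoo.items()}


def internal_consistency(answers: Dict[str, str]) -> Dict[str, str]:
    """Keep only the answers that match the most common answer (majority vote as a proxy)."""
    majority, _ = Counter(answers.values()).most_common(1)[0]
    return {name: a for name, a in answers.items() if a == majority}


def external_validation(answers: Dict[str, str], knowledge_base: Set[str]) -> Dict[str, str]:
    """Keep answers found in a toy knowledge base; fall back to the input if none are found."""
    kept = {name: a for name, a in answers.items() if a in knowledge_base}
    return kept or answers


def summarize(query: Query, verified: Dict[str, str], total: int) -> str:
    """Combine the verified answers into one final response string."""
    answer = next(iter(verified.values()))
    return f"[{query.task}] {answer} (supported by {len(verified)} of {total} experts)"


def wsi_agents_sketch(query: Query, zoo: Dict[str, Model], knowledge_base: Set[str]) -> str:
    """Allocation -> expert answers -> internal + external verification -> summary."""
    query = allocate_task(query)
    answers = run_experts(query, zoo)
    verified = external_validation(internal_consistency(answers), knowledge_base)
    return summarize(query, verified, total=len(answers))


# Usage with dummy "models" that return fixed strings (purely illustrative).
zoo = {
    "patch_mllm_a": lambda slide, q: "adenocarcinoma",
    "patch_mllm_b": lambda slide, q: "adenocarcinoma",
    "wsi_mllm_c": lambda slide, q: "lymphoma",
}
kb = {"adenocarcinoma", "lymphoma"}
print(wsi_agents_sketch(Query("slide_001", "What is the diagnosis?"), zoo, kb))
# [diagnosis] adenocarcinoma (supported by 2 of 3 experts)
```

Majority voting is used here only as the simplest stand-in for an internal consistency check; the paper's own verification agents, prompts, and knowledge base may work quite differently.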

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0994_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/XinhengLyu/WSI-Agents

Link to the Dataset(s)

WSI-Bench: https://github.com/XinhengLyu/WSI-LLaVA
WSI-VQA: https://github.com/cpystan/WSI-VQA

BibTex

@InProceedings{LyuXin_WSIAgents_MICCAI2025,
        author = { Lyu, Xinheng and Liang, Yuci and Chen, Wenting and Ding, Meidan and Yang, Jiaqi and Huang, Guolin and Zhang, Daokun and He, Xiangjian and Shen, Linlin},
        title = { { WSI-Agents: A Collaborative Multi-Agent System for Multi-Modal Whole Slide Image Analysis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        page = {683 -- 693}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose WSI-Agents, a novel collaborative multi-agent system for multi-modal WSI analysis. WSI-Agents integrates specialized functional agents with task allocation and verification mechanisms to enhance both task-specific accuracy and multi-task versatility. Compared to previous MLLMs and general medical agents, WSI-Agents achieves superior performance. The authors validate their method through both quantitative and qualitative comparisons on the WSI-Bench and WSI-VQA datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clarity: The paper is well-organized, straightforward, and easy to follow.
    • Novelty: The proposed multi-agent system is capable of performing various tasks simultaneously, addressing the limitation that generative models typically perform poorly on classification tasks. Additionally, it provides a pipeline that leverages multiple advanced MLLMs concurrently.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Writing:
      • Some MLLMs in the model zoo, such as Quilt-LLaVA, can only process normal-sized images. How is this limitation addressed in the system? Does it generate descriptions for all patches in a WSI?
      • The authors mention constructing a knowledge database, but no details are provided in the experimental section.
      • The evaluation metric for Table 1 is not specified.
    • Illustration: In Fig. 2, the system architecture is not clearly presented. Although the blocks are shown separately, they actually work in coordination. It is recommended to improve the figure by adding arrows to illustrate how different agents interact.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • It is recommended to specify the hardware used for running the entire system, and to provide memory and time comparisons against previous methods. Given that in Tables 2 and 3 the results are similar to or worse than those of single MLLMs for some metrics, is it worthwhile to adopt the multi-agent system if its computational cost is significantly higher?
    • The paper states that the interpretation map is generated by integrating outputs from multiple patch-level WSI models. Why is this not implemented as a separate agent in the system? Is this step manually performed?
    • A more detailed description of each component in the system would strengthen the paper. For instance, the prompts used for each agent, an end-to-end example of processing a WSI with intermediate input/output, details of the knowledge base, and the model used for feature embedding should be included.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The system architecture is not presented clearly enough.
    • The paper lacks some important details, for example, how the MLLMs handle WSIs (do they generate output for all patches or only for a few selected patches?), and what the knowledge database looks like and how it was constructed.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The most significant contribution is that the authors developed a comprehensive multi-agent system for multi-modal WSI analysis. The article is sufficiently novel, and the design of the multi-agent system is scientific and rigorous. I would like to see more such work in this field exploring how to integrate and exploit existing powerful models at a lower cost.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The manuscript has a clear significance in developing a multi-agent system for multi-modal WSI analysis by exploiting existing pretrained models and solving multiple clinical tasks at a lower computational cost.
    2. The proposed WSI-Agents outperforms most of the existing multi-modal methods and multi-agent methods across multiple tasks.
    3. The authors designed comprehensive ablation studies to demonstrate the necessity of each agent or agent group.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. As the task agent makes the initial decision, I would like to know which LLM or model serves as the task agent and whether the authors verify the quality of its decisions, since they form the basis of all subsequent steps. Also, which LLM or model is used as the summarizing and reasoning agent?
    2. Is the MLLM Zoo the same for each type of expert agent? If so, why not design different MLLM Zoos for various expert agents? Each expert agent might need a specific capability for handling different tasks.
    3. More details of datasets could be added in the dataset section, e.g., the size of the datasets and the number of cases for each task.
    4. I would recommend that the authors consider adding subtitles to clarify the corresponding order of each sub-figure in the entire WSI-Agents workflow in Figure 2.
    5. More content related to future work and the potential limitations of the proposed WSI-Agents can be added in the conclusion section.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The proposed multi-agent system has a clear functional design for each agent, allowing the agents to collaboratively produce an accurate answer to the query.
    2. The strong experimental results clearly demonstrate the capability of the proposed multi-agent system.
    3. The authors designed comprehensive ablation studies to demonstrate the optimal design of the workflow.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contribution of this paper is the proposal of WSI-Agents, a novel collaborative multi-agent framework specifically designed for multi-modal Whole Slide Image (WSI) analysis. This system directly addresses the critical challenge of balancing task-specific accuracy and multi-task versatility in digital pathology, a limitation often observed in existing WSI MLLMs or general medical agents. The framework achieves this through a structured integration of three key components: a specialized Task Allocation Module to assign tasks to expert agents leveraging a model zoo, a robust Verification Mechanism (incorporating internal consistency checks, external knowledge validation using pathology knowledge bases, and consensus checks with domain-specific foundation models), and a Summarization Module to synthesize coherent, validated, and interpretable final outputs with visual interpretation maps. The paper demonstrates the system’s superiority over current methods on established benchmarks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper exhibits several major strengths:

    1. Novel and Comprehensive Framework Formulation: The core strength lies in the WSI-Agents framework itself. Applying a collaborative multi-agent system specifically tailored for the complexities of multi-modal WSI analysis is a novel approach in this domain. Unlike general medical agents, it incorporates pathology-specific knowledge and WSI processing capabilities deeply into its verification steps. The framework’s design, which systematically integrates task allocation, multi-faceted verification (internal consistency, external knowledge, and foundation model consensus), and summarization, is comprehensive and well-thought-out. This structured collaboration is particularly interesting as it offers a principled way to harness the strengths of specialized models (both patch and WSI-level MLLMs) and diverse knowledge sources, aiming to mitigate the weaknesses of relying on a single large model or insufficient domain expertise.

    2. Particularly Strong Evaluation: The paper provides compelling evidence of the framework’s effectiveness through extensive experiments on established and relevant benchmarks (WSI-Bench and WSI-VQA). WSI-Agents demonstrably outperforms existing state-of-the-art WSI MLLMs and relevant medical agent frameworks by significant margins across a diverse set of pathology tasks (morphology analysis, diagnosis, treatment planning, report generation, VQA). The inclusion of both quantitative metrics and qualitative examples (Table 4) effectively validates the design choices and showcases the practical advantage.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper has a few minor weaknesses, primarily concerning reproducibility and clarity:

    1. Lack of Specific Implementation Details: The paper lacks sufficient clarity regarding the specific models or methods underpinning certain agent components (Logic, Summarizing, Reasoning agents, and the full list for the Consensus agent zoo). This ambiguity could potentially hinder the exact reproducibility of the work. Clarifying the model choices or types would be beneficial.

    2. Absence of Open-Source Code and Demo (at time of review): Although the abstract and Section 3 state that the source code “is to be released”, its unavailability at the time of review is a weakness. Providing access to the codebase and potentially a demo would significantly facilitate reproducibility, allow the community to verify the implementation details, and accelerate the adoption and extension of this promising framework. While the intention to release is noted, immediate availability strengthens a paper considerably.

    These points relate mainly to descriptive clarity and accessibility rather than fundamental flaws in the methodology or evaluation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My recommendation is Accept.

    The major factors leading to this positive assessment are:

    1. Significance and Novelty of the Contribution: The paper introduces a novel and well-structured collaborative multi-agent framework (WSI-Agents) specifically designed for the challenging domain of multi-modal WSI analysis. It directly addresses the important accuracy-versatility trade-off, offering a conceptually sound and innovative approach.
    2. Strong Empirical Validation: The comprehensive experiments demonstrate substantial performance improvements over relevant state-of-the-art methods across multiple tasks and datasets. This strong empirical evidence effectively supports the claims and validates the effectiveness of the proposed framework.
    3. Relevance to the Field: The work addresses a critical need in digital pathology for more accurate, versatile, and interpretable AI tools, making a valuable contribution to the field.

    While there are minor weaknesses concerning the clarity of certain implementation details and the current unavailability of the source code, these do not fundamentally undermine the core ideas, the reported results, or the overall significance of the work. The paper’s strengths, particularly the novelty of the dedicated WSI multi-agent framework and its demonstrated effectiveness, significantly outweigh these limitations. The authors have stated their intention to release the code, which somewhat mitigates that concern. The paper presents a high-quality study with clear potential impact and is suitable for acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

N/A




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


