Abstract

Lymphoma subtype classification has a direct impact on treatment and outcomes, necessitating models that are both accurate and explainable. This study proposes a novel explainable Multi-Instance Learning (MIL) framework that identifies subtype-specific Regions of Interest (ROIs) from Whole Slide Images (WSIs) while integrating cell-distribution and image features. Our framework simultaneously addresses three objectives: (1) indicating appropriate ROIs for each subtype, (2) explaining the frequency and spatial distribution of characteristic cell types, and (3) achieving accurate subtyping using both cell-distribution and image modalities. Our method fuses cell graph and image features extracted for each patch in a WSI via a Mixture-of-Experts-based approach and classifies subtypes within an MIL framework. Experiments on a dataset of 1,233 WSIs demonstrate that our approach achieves state-of-the-art accuracy compared with ten other methods and provides region- and cell-level explanations that align with a pathologist’s perspective.
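A minimal sketch of the fusion idea the abstract describes: a gating network mixes per-patch image and cell-graph experts, and patch logits are pooled with class-wise attention in the style of AdditiveMIL. All layer sizes, names, and the exact pooling are hypothetical illustrations, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MoEFusionMIL(nn.Module):
    """Illustrative sketch: gate per-patch image and cell-graph experts,
    then aggregate patch logits with class-wise (AdditiveMIL-style) attention.
    All dimensions and names are hypothetical."""
    def __init__(self, dim_img=512, dim_graph=128, dim=256, n_classes=3):
        super().__init__()
        self.img_expert = nn.Linear(dim_img, dim)
        self.graph_expert = nn.Linear(dim_graph, dim)
        # gating network: per-patch mixture weights over the two experts
        self.gate = nn.Sequential(nn.Linear(dim_img + dim_graph, 2), nn.Softmax(dim=-1))
        self.attn = nn.Linear(dim, n_classes)        # class-wise attention scores
        self.classifier = nn.Linear(dim, n_classes)  # per-patch class logits

    def forward(self, x_img, x_graph):
        # x_img: (n_patches, dim_img); x_graph: (n_patches, dim_graph)
        g = self.gate(torch.cat([x_img, x_graph], dim=-1))                       # (n, 2)
        h = g[:, :1] * self.img_expert(x_img) + g[:, 1:] * self.graph_expert(x_graph)
        a = torch.softmax(self.attn(h), dim=0)        # per-class attention over patches
        logits = (a * self.classifier(h)).sum(dim=0)  # additive, class-wise pooling
        return logits, a  # slide-level logits, patch-level class-specific attention

model = MoEFusionMIL()
logits, attn = model(torch.randn(100, 512), torch.randn(100, 128))
```

The returned per-class attention map is what would be thresholded to highlight subtype-specific ROIs in a WSI.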

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1260_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/mdl-lab/Explainable-Malignant-Lymphoma-Classifier

Link to the Dataset(s)

N/A

BibTex

@InProceedings{NisDai_Explainable_MICCAI2025,
        author = { Nishiyama, Daiki and Miyoshi, Hiroaki and Hashimoto, Noriaki and Ohshima, Koichi and Hontani, Hidekata and Takeuchi, Ichiro and Sakuma, Jun},
        title = { { Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
        pages = {321 -- 331}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes an explainable multimodal framework, termed WEG-MoE (Weak-Expert-based Gating Mixture-of-Experts), for the classification of malignant lymphoma subtypes from Whole Slide Images (WSIs). The core contribution lies in integrating patch-level image features with cell-level spatial distribution features derived from cell graphs within a Multi-Instance Learning (MIL) paradigm. The framework aims to achieve both high classification accuracy and enhanced explainability by: 1) identifying subtype-specific Regions of Interest (ROIs) using class-wise attention scores, and 2) providing cell-level explanations regarding the frequency and spatial arrangement of characteristic cell types within these identified ROIs, leveraging the constructed labeled cell graphs. The WEG-MoE mechanism is specifically designed to fuse these two modalities while attempting to preserve explainability derived from both sources.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Clinical Relevance and Problem Formulation: The paper addresses a clinically significant problem – the accurate subtyping of malignant lymphoma – where explainability is crucial for potential clinical adoption. The authors correctly identify the limitations of existing methods and formulate the problem in a way that reflects pathologists’ diagnostic workflow, considering both tissue architecture (via images) and cellular composition/distribution (via cell graphs).
    2. Focus on Multi-Level Explainability: A key strength is the explicit goal and demonstration of providing explanations at multiple levels. Beyond standard attention maps for ROI localization (region-level), the framework leverages cell graphs to offer insights into the underlying cellular characteristics (cell frequency, adjacency) within important regions. This attempt to provide richer, more clinically relevant explanations is commendable and goes beyond many standard WSI classification approaches.
    3. Conceptually Sound Framework Design: The core idea of fusing image features and cell graph features using a mechanism like MoE within an MIL framework is logical for this task. It directly attempts to capture the multi-faceted information pathologists use. The design choice to generate class-specific attention (via AdditiveMIL) is appropriate for distinguishing regions relevant to different subtypes.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Lack of Reproducibility due to Private Dataset: The most significant weakness is the reliance on a private dataset (1,233 WSIs). While the authors state the code is available, the inability for the research community to access the data prevents independent verification of the results, comparison against the reported benchmarks, and building upon this work directly. This severely limits the paper’s impact and contribution, as reproducibility is a cornerstone of scientific progress.
    2. Marginal Performance Improvement: While the proposed WEG-MoE achieves the highest reported accuracy and mean AUC, the improvement over several strong multimodal baselines (e.g., simple concatenation [23], standard MoE [15]) is marginal (e.g., accuracy improvement from 0.907 to 0.911, mean AUC from 0.975 or 0.972 to 0.977). Given the added complexity of constructing labeled cell graphs, training a cell classifier, and utilizing GNNs, the practical significance of this slight performance gain is questionable and not adequately discussed.
    3. Unaddressed Annotation Cost vs. Benefit Trade-off: The method requires significant annotation effort beyond WSI-level labels. Specifically, it necessitates cell-level annotations (as suggested by the description of creating training data via pathologist labeling on a t-SNE map, Fig 2A) to train the cell-type classifier, which is a prerequisite for generating the labeled cell graphs. This represents a substantial, expert-intensive cost. The paper completely fails to discuss or analyze the trade-off between this high annotation burden and the marginal performance gains/enhanced explainability achieved. It is unclear if the benefits justify the costs compared to methods relying solely on image features or WSI-level labels. This omission hinders the assessment of the method’s practical feasibility and scalability.
    4. Limited Discussion on Generalizability: The experiments are conducted on data from a single institution and scanner. While this is common, the lack of external validation or discussion on potential domain shift issues (e.g., due to staining variations, different scanners) makes it difficult to gauge how well the method might generalize to other clinical settings.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My recommendation is Weak Reject.

    The major factors leading to this recommendation are the critical weaknesses that significantly limit the paper’s contribution and potential impact, despite its interesting ideas:

    1. Non-Reproducibility: The use of a private dataset is a fundamental flaw that prevents verification and hinders the community from building upon or comparing against this work reliably. This is the primary factor limiting the paper’s contribution.
    2. Questionable Cost-Benefit: The method introduces considerable complexity and requires expensive cell-level annotations. However, the demonstrated performance improvement over simpler fusion methods is marginal, and the paper provides no discussion on whether the enhanced explainability justifies this significant cost. This lack of analysis makes it difficult to evaluate the practical value proposition of the proposed framework.
    3. Limited Performance Gains: The small margin of improvement over existing techniques raises questions about the practical significance of the proposed WEG-MoE method, especially when weighed against its complexity.

    While the paper addresses an important clinical problem with a conceptually appealing framework focused on multi-level explainability (strengths), the severe limitations regarding reproducibility, the unaddressed cost-benefit trade-off of the required annotations, and the marginal performance gains prevent a positive recommendation. The work presents interesting concepts, but in its current form, its contribution to the field is significantly hampered by these issues.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    My concern about reproducing their results, model generalization and method novelty are not addressed after the rebuttal.



Review #2

  • Please describe the contribution of the paper

    This paper presents a novel explainable classifier for malignant lymphoma using both cell graph and image features. The authors employ AdditiveMIL and TransMIL to model per-instance attention and interactions among patches within WSIs. These features are then fused using WEG-MoE (Weak-Expert-Based Gating Mixture of Experts), which adaptively integrates graph-based and image-based information by weighing their contributions. This enables subtype-specific ROI identification, cell-type distribution explanation, and high-accuracy classification across three subtypes: DLBCL, FL, and Reactive, evaluated using five-fold cross-validation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The paper introduces a novel WEG-MoE architecture that balances contributions from the image and cell graph modalities, particularly addressing the tendency of naive multimodal models to over-rely on image features.
    2. Strong emphasis on explainability: the use of AdditiveMIL allows class-specific attention per patch, while cell-type frequency and adjacency provide pathologist-aligned insights.
    3. High classification performance with five-fold cross-validation, outperforming multiple strong baselines.
    4. The work is claimed to be the first application of cell graphs to malignant lymphoma subtyping.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. While WEG-MoE is thoughtfully designed, the overall methodological novelty is incremental. The approach mainly builds on existing MIL frameworks (AdditiveMIL, TransMIL) in a well-engineered way, rather than introducing fundamentally new algorithms.
    2. The authors mention that WEG-MoE encourages better learning from graph features by setting image-based predictions to zero (p⁽ˣ⁾ = 0) during training. However, this raises the question: why not train two separate unimodal models and fuse their outputs at inference? A more thorough justification for the gating design choice is needed.
    3. The paper lacks sufficient explanation about the construction of cell graphs. How are they created? Is there any sampling strategy to manage memory/computation given the potentially huge number of cells in WSIs? If not, how is computational efficiency handled?
    4. The authors emphasize explainability as a core motivation, yet their discussion is limited to image-level attention and cell frequency distributions. There are existing graph explainability methods (e.g., GNNExplainer [1], PGExplainer). It would be valuable to understand why such methods were not applied to interpret the cell graph representations.
    5. It is unclear why a tissue-level graph branch was not considered. Prior work such as HACT-Net highlights the complementary value of hierarchical modeling using both cell- and tissue-level graphs. Authors should clarify whether this was considered or if there were limitations preventing its inclusion.
    6. There is no external validation, and all results are based on internal cross-validation. This limits our understanding of the model’s generalizability across institutions or scanners.
    7. Competing methods also show comparable AUC and accuracy, and the performance margins are relatively small. The paper does not clearly explain which module(s) in their pipeline contribute most to this performance gain.

    [1]Jaume, Guillaume, et al. “Quantifying explainers of graph neural networks in computational pathology.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The methodological novelty is modest, several design choices are under-justified, and key aspects such as cell graph construction, explanation of the cell graphs, and external validation are insufficiently addressed.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors mostly discussed and clarified my objection and have a clear understanding of the subject matter.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel explainable Multi-Instance Learning (MIL) framework for malignant lymphoma subtyping using Whole Slide Images (WSIs). The framework integrates cell distribution characteristics and image information to identify subtype-specific Regions of Interest (ROIs). The code is released.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper addresses an important question: how to achieve both high accuracy and sufficient explainability in malignant lymphoma subtype classification from Whole Slide Images (WSIs), which is critical for treatment strategies and patient outcomes.  

    The performance gain is remarkable (a 1% improvement over baselines whose accuracy already exceeds 90%). The design of WEG-MoE applies a weak-expert gating strategy, training the gating function on graph features to better utilize both modalities. It inspires further research into weakly-supervised gating mechanisms for multimodal fusion.

    The method provides both region-level and cell-level explanations that align with pathologists’ perspectives, enhancing clinical trust and interpretability.

    They released their code.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    While the overall motivation is novel, some individual components are not. For instance, the GIN graph feature extractor and the MoE [15] are adopted from existing work. The novelty lies in how these components are combined and adapted.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Code released. Important question and clear demonstration.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their constructive feedback and address their concerns below.

(1) Reliance on a single and private dataset (R2, R3) Reproducibility issue: Unfortunately, ethical constraints prevent the public release of our dataset. There are currently three public lymphoma datasets: TCIA’s DLBCL-MORPHOLOGY and TCGA-DLBC (containing only DLBCL subtype WSIs), and a Kaggle dataset (with multiple subtypes but only cropped regions, not WSIs). Explainable lymphoma diagnosis requires simultaneously addressing two critical tasks: (1) accurately identifying subtype-specific lesions within WSIs, and (2) precisely determining the spatial distribution of characteristic cells exclusively present in these lesions. Thus, since the TCIA and TCGA datasets lack multi-subtype differentiation, and the Kaggle dataset lacks full WSIs, we cannot evaluate the effectiveness of our approach on these datasets. While we agree on the importance of public dataset validation, we ask the reviewers to consider that our method specifically emulates pathologists’ evidence-based diagnostic reasoning, making thorough evaluation with existing public datasets particularly challenging. Finally, recognizing the importance of research community advancement, we commit, upon acceptance, to publicly releasing our trained model weights, enabling verification by researchers with appropriate lymphoma WSIs.

Generalizability issue: Regarding the single-dataset evaluation concern, our data underwent uniform staining and scanning, minimizing variations that would necessitate additional domain shift assessments. As mentioned above, publicly available datasets cannot be used for external validation. Meanwhile, methods such as Hashimoto et al. (CVPR 2020) could further improve the generalization of our image-domain representations beyond a homogeneous dataset. Also, our graph-domain representations use discretized information that is inherently less susceptible to acquisition variations.

(2) Cell graph construction and annotation by experts (R2, R3) Expert annotation is only required to create cell classifier training data. We used HoVerNet’s public implementation for cell segmentation without expert input. The annotation burden was minimized by t-SNE visualization, grouping similar cells together and allowing quick boundary definition, taking only minutes. No other expert annotations were needed. In our construction of cell graphs, all cells were used as nodes without sampling. The processing time, approximately 1 hour for 10 WSIs from WSI scanning to prediction output, is clinically acceptable, as our institution typically receives about 10 requests daily. This allows overnight processing without disrupting clinical workflow.
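The construction described above (every detected cell becomes a node, no sampling) and the cell-level statistics the paper reports (type frequencies, type-to-type adjacency) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the radius threshold is a hypothetical choice, the pairwise-distance computation is O(n²) and would need a spatial index for WSI-scale cell counts, and in the actual work segmentation comes from HoVerNet with types from a trained cell classifier.

```python
import numpy as np

def build_cell_graph(coords, radius=30.0):
    """Labeled cell graph sketch: nodes are all detected cells (no sampling,
    as in the rebuttal); an edge joins two cells whose centroids lie within
    `radius` pixels. The threshold is a hypothetical choice."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (d < radius) & ~np.eye(len(coords), dtype=bool)  # no self-loops
    edges = np.argwhere(np.triu(adj))                       # each edge once
    return adj, edges

def cell_level_statistics(types, adj, n_types):
    """Cell-type frequencies and type-to-type adjacency counts: the kind of
    population-level statistics reported as cell-level explanations."""
    freq = np.bincount(types, minlength=n_types) / len(types)
    pair_counts = np.zeros((n_types, n_types), dtype=int)
    for i, j in np.argwhere(np.triu(adj)):
        pair_counts[types[i], types[j]] += 1
        pair_counts[types[j], types[i]] += 1  # keep the matrix symmetric
    return freq, pair_counts

rng = np.random.default_rng(0)
coords = rng.uniform(0, 200, size=(50, 2))   # synthetic cell centroids
types = rng.integers(0, 3, size=50)          # synthetic cell-type labels
adj, edges = build_cell_graph(coords)
freq, pairs = cell_level_statistics(types, adj, n_types=3)
```

With millions of cells per WSI, the same statistics would be computed per patch (or per attended ROI) using a k-d tree or k-NN graph rather than a dense distance matrix.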

(3) Methodological novelty and limited performance improvement (R1, R2, R3) While individual components have parallels in existing research, our novelty lies in our innovative multimodal fusion approach. WEG-MoE uniquely employs gating networks trained on a weak expert, showing that high performance and explainability can coexist. As R1 noted, modest performance gains become challenging when accuracy exceeds 90%. Even marginal gains are significant in clinical diagnosis. Our improvement across metrics represents a meaningful advancement for clinical applications. To clarify R3’s misunderstanding, as stated on page 6 after Eq.7, we train separate unimodal models independently before fusing predictions during inference, aligning with R3’s suggestion.
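One possible reading of the weak-expert gating described above, sketched below: the gating network is trained with the image expert's prediction zeroed out (p⁽ˣ⁾ = 0), forcing it to learn mixture weights from the graph modality, while at inference the independently trained unimodal predictions are fused with those weights. Names, shapes, and the exact training protocol are hypothetical.

```python
import torch
import torch.nn as nn

class WeakExpertGate(nn.Module):
    """Sketch of weak-expert gating: the gate produces mixture weights from
    graph features; during gate training the image prediction is silenced
    so the gate cannot simply defer to the stronger image modality."""
    def __init__(self, dim_graph=128, n_classes=3):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim_graph, 2), nn.Softmax(dim=-1))

    def forward(self, h_graph, p_img, p_graph, weak_training=False):
        w = self.gate(h_graph)               # (batch, 2) modality weights
        if weak_training:
            p_img = torch.zeros_like(p_img)  # image expert zeroed: p_img = 0
        return w[:, :1] * p_img + w[:, 1:] * p_graph

gate = WeakExpertGate()
# unimodal models are trained separately; here their outputs are random stand-ins
p = gate(torch.randn(4, 128), torch.randn(4, 3), torch.randn(4, 3), weak_training=True)
```

This also illustrates the rebuttal's answer to the "why not two unimodal models?" question: the unimodal experts are indeed trained independently; only the fusion weights are learned, under the weak-expert constraint.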

(4) Design Choice Justifications (R3) Post-hoc GNN explainability methods provide qualitative instance-specific explanations rather than the quantitative population-level statistics needed clinically. Pathologists require cell frequencies and distributions as evidence for diagnostic decisions. Tissue-level graphs have limited utility for lymphoma due to poor tissue contrast compared to breast or colon tissue, evidenced by HACT-Net’s lower performance in Table 1. Our method addresses these limitations.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    This paper presents WEG-MoE, a mixture-of-experts framework that combines image patch features and cell-level spatial information to classify lymphoma subtypes from whole slide images. It not only improves classification accuracy but also highlights important regions and explains cell-type patterns contributing to each prediction. All reviewers appreciate the methodological contribution, although they find the novelty to be incremental (R2 & R3), or consider it a novel combination of existing individual components (R1 also sees this as a weakness). The most significant weakness, as pointed out by R2 and R3, is the reliance on a single and private dataset, which makes it difficult to assess the method’s generalizability and reproducibility. Other main issues include unclear cell graph construction—specifically, whether it requires laborious expert annotation (R2, R3), marginal improvement (R2), and under-justified design choices (R3). The authors need to address these issues in their rebuttal.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This work presents a novel and well-justified WEG-MoE architecture that significantly improves classification accuracy while providing interpretable, clinically aligned explanations at both region and cell levels. The method addresses a critical need in computational pathology and introduces promising directions for future research in multimodal fusion and explainability.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


