

Abstract

Automated pathology report generation from Whole Slide Images (WSIs) faces two key challenges: (1) lack of semantic content in visual features and (2) inherent information redundancy in WSIs. To address these issues, we propose a novel Historical Report Guided Bi-modal Concurrent Learning Framework for Pathology Report Generation (Bi-Gen) that emulates pathologists’ diagnostic reasoning. It consists of: (1) a knowledge retrieval mechanism that provides rich semantic content by retrieving WSI-relevant knowledge from a pre-built medical knowledge bank through matching high-attention patches, and (2) a bi-modal concurrent learning strategy, instantiated via a learnable visual token and a learnable textual token, that dynamically extracts key visual features and retrieved knowledge, where weight-shared layers enable cross-modal alignment between visual features and knowledge features. Our multi-modal decoder integrates both modalities to generate comprehensive diagnostic reports. Experiments on the PathText (BRCA) dataset demonstrate our framework’s superiority, achieving state-of-the-art performance with a 7.4% relative improvement in NLP metrics and a 19.1% improvement in classification metrics for Her-2 prediction over existing methods. Ablation studies validate the necessity of the proposed modules, highlighting the method’s ability to provide WSI-relevant, semantically rich content and to suppress information redundancy in WSIs. Code is publicly available at https://github.com/DeepMed-Lab-ECNU/BIGen.
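The retrieval step described in the abstract (select high-attention patches, then match them against a pre-built knowledge bank) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of cosine similarity, and the top-k parameters are hypothetical, and the toy vectors stand in for the embeddings that, according to Review #3, the framework obtains from PLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_knowledge(patch_feats, attn_scores, bank_feats, bank_sentences,
                       k_patches=2, k_sentences=1):
    """Hypothetical retrieval: top-k attended patches -> most similar bank sentences.

    patch_feats:    one embedding per WSI patch (assumed precomputed)
    attn_scores:    one attention score per patch
    bank_feats:     one embedding per knowledge-bank sentence
    bank_sentences: the sentences themselves, aligned with bank_feats
    """
    # Keep only the k patches the model attends to most strongly.
    order = sorted(range(len(attn_scores)), key=lambda i: attn_scores[i], reverse=True)
    retrieved = []
    for i in order[:k_patches]:
        # Score every bank entry against this patch and keep the best matches.
        sims = [cosine(patch_feats[i], b) for b in bank_feats]
        best = sorted(range(len(bank_feats)), key=lambda j: sims[j], reverse=True)
        for j in best[:k_sentences]:
            if bank_sentences[j] not in retrieved:  # avoid duplicate knowledge
                retrieved.append(bank_sentences[j])
    return retrieved


if __name__ == "__main__":
    # Toy 2-D embeddings; real ones would come from a vision-language encoder.
    patches = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.9]]
    attention = [0.9, 0.1, 0.5]
    bank = [[1.0, 0.0], [0.0, 1.0]]
    sentences = ["sentence A", "sentence B"]  # placeholder bank entries
    print(retrieve_knowledge(patches, attention, bank, sentences))
```

With these toy inputs, patches 0 and 2 are selected and each matches a different bank entry, so both placeholder sentences are returned.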

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2364_paper.pdf

SharedIt Link: https://rdcu.be/eHdSS

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04978-0_33

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/DeepMed-Lab-ECNU/BIGen

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{ZhaLin_Historical_MICCAI2025,
        author = { Zhang, Ling AND Yun, Boxiang AND Li, Qingli AND Wang, Yan},
        title = { { Historical Report Guided Bi-modal Concurrent Learning for Pathology Report Generation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {343 -- 352}
}

Reviews

Review #1

Please describe the contribution of the paper

For the first time, an explicit knowledge retrieval mechanism is introduced into the pathology report generation task to enrich the semantic content of diagnostic reports, establishing an interpretable cross-modal knowledge transfer paradigm.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. It is the first work to introduce an explicit knowledge retrieval mechanism into the pathology report generation task, enriching the semantic content of diagnostic reports and establishing an interpretable cross-modal knowledge transfer paradigm.
2. It proposes a bi-modal concurrent learning strategy that dynamically extracts key features from WSIs and from the retrieved knowledge, effectively suppressing information redundancy while enhancing pathological semantic learning.
3. Experiments on the PathText (BRCA) dataset demonstrate that the proposed method significantly outperforms existing methods in report generation quality and Her-2 prediction metrics.
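As a rough illustration of the bi-modal concurrent learning strategy, the sketch below pools each modality with its own learnable query token while both modalities pass through one shared projection, which is one plausible reading of the "weight-shared layers" mentioned in the abstract. Single-head scaled dot-product attention, fixed (untrained) token values, and all function names are assumptions made for illustration only.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_pool(query, feats, W_shared):
    """Attention pooling of `feats` by a single query token (hypothetical).

    Every feature is first projected with W_shared; because the same matrix
    is used for both modalities, it acts as the cross-modal alignment link.
    """
    proj = [matvec(W_shared, f) for f in feats]
    d = len(query)
    scores = [sum(q * p for q, p in zip(query, pr)) / math.sqrt(d) for pr in proj]
    weights = softmax(scores)  # how much each patch / sentence contributes
    pooled = [sum(w * pr[k] for w, pr in zip(weights, proj)) for k in range(d)]
    return pooled, weights


def bimodal_concurrent(visual_token, visual_feats, text_token, text_feats, W_shared):
    """Pool visual and knowledge features concurrently with one shared projection."""
    visual_summary, _ = token_pool(visual_token, visual_feats, W_shared)
    text_summary, _ = token_pool(text_token, text_feats, W_shared)
    return visual_summary, text_summary
```

In training, the two tokens and `W_shared` would be learned parameters; here they are plain lists so the data flow is easy to follow. A query token aligned with one feature concentrates almost all attention weight on it, which is the mechanism by which redundant patches are down-weighted.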
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The paper does not adequately address how to establish an effective correspondence between the WSI itself and semantic features.
2. The semantic features of a patch cannot be directly equated with the semantic features of the whole WSI; the relationship between patches and the WSI should be considered.
3. The theoretical description of the bi-modal concurrent learning strategy is insufficient, and the novelty of the proposed method is not clearly articulated.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The use of historical diagnostic report knowledge to guide current WSI report generation establishes an interpretable cross-modal knowledge transfer paradigm and helps enhance the interpretability of report generation.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

1. Interesting Topic: The topic of slide-level report generation is highly relevant and significant, yet existing research in this area is scarce. While there are related works such as MIGen and HistGen, slide-level report generation remains underexplored and is certainly worth further investigation.
2. Knowledge Retrieval Module: The paper proposes a knowledge retrieval module designed to fully exploit the semantic information embedded in existing reports. This approach can significantly enrich the diagnostic information within the generated reports, thereby enhancing their clinical utility and accuracy.
3. Bi-modal Concurrent Strategy: The introduction of a bi-modal concurrent strategy is particularly noteworthy. This strategy dynamically extracts key patches from Whole Slide Images (WSIs) and their corresponding semantic features from the knowledge bank. Given the substantial size of WSIs, this method effectively reduces redundancy in the feature processing stage, streamlining the overall workflow and improving computational efficiency.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The knowledge bank improves semantic generation, and the concurrent strategy substantially reduces redundancy. These two modules make the model more interpretable, which is very important in clinical practice.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

My primary concern lies in the experimental design. As far as we are aware, both MIGen and HistGen utilize ResNet as their visual feature extractor. In contrast, this work employs UNI, which is a significantly more advanced and specialized model tailored for the pathology domain. This discrepancy in the choice of feature extractor may introduce a bias in the comparison, potentially rendering the results somewhat inequitable. To address this issue and ensure a more balanced evaluation, I recommend conducting additional experiments where all models are equipped with the same visual feature extractor. This would allow for a more accurate and fair comparison of the methods, thereby providing a clearer assessment of their respective strengths and contributions.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The novelty of the proposed modules.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The paper introduces a Historical Report Guided Bi-modal Concurrent Learning Framework (BiGen) for automated pathology report generation from Whole Slide Images (WSIs), addressing two key challenges: semantic poverty in visual features and information redundancy in WSIs.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The integration of a knowledge retrieval module with a bi-modal learning strategy is innovative. By leveraging historical reports via PLIP (a visual-language foundation model), the approach explicitly injects clinical semantics into visual features, addressing a critical gap in prior WSI-based report generation methods that rely solely on visual self-attention (e.g., HistGen, MI-Gen).
2. Comprehensive experiments validate the necessity of each component (knowledge retrieval, bi-modal tokens, weight sharing). Visualizations (e.g., attention heatmaps highlighting tumor tissues) and qualitative analysis of retrieved knowledge demonstrate interpretability, showing how the model aligns visual patches with relevant clinical terms.
3. The problem formulation directly addresses a high-impact clinical need: reducing pathologist workload by automating report generation for WSIs.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The knowledge bank is constructed from sentence-level reports in the training set, which may not capture the full complexity of clinical context.
2. The paper does not discuss how to handle out-of-distribution cases (e.g., rare pathologies not present in the knowledge bank), limiting generalizability.
3. Experiments are exclusively conducted on the PathText (BRCA) dataset, focusing on breast cancer. The framework’s performance on other cancer types (e.g., lung, prostate) or tissue types is untested.
4. The knowledge retrieval process involves patch selection (top-k patches), spatial partitioning, and similarity calculations, which may be computationally intensive for large WSIs. Though the paper optimizes via spatial averaging (reducing patch count), scalability to real-world clinical settings with massive WSIs is not explicitly addressed.
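The spatial averaging mentioned above as a way to reduce patch count could look roughly like the following. The grid-cell partition, the cell size, and the function name are hypothetical; the sketch only shows how averaging features within spatial cells shrinks the number of patches that enter the similarity calculation.

```python
from collections import defaultdict


def spatial_average(coords, feats, cell=2):
    """Average patch features inside square grid cells (hypothetical scheme).

    coords: (x, y) grid position of each patch, in patch units
    feats:  feature vector of each patch, aligned with coords
    cell:   side length of a cell, in patches
    Returns a dict mapping each cell index to its mean feature vector.
    """
    groups = defaultdict(list)
    for (x, y), f in zip(coords, feats):
        # Integer division assigns each patch to a cell of the coarser grid.
        groups[(x // cell, y // cell)].append(f)
    pooled = {}
    for key, members in groups.items():
        dim = len(members[0])
        pooled[key] = [sum(m[k] for m in members) / len(members) for k in range(dim)]
    return pooled
```

With a cell size of 2, five patches at grid positions (0,0), (1,0), (0,1), (2,0), and (3,3) collapse into three averaged features, so only three similarity computations per knowledge-bank entry remain. Larger cells trade spatial detail for fewer computations, which is the scalability trade-off the reviewer raises.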
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a highly innovative and well-evaluated framework that addresses long-standing challenges in pathology report generation. I would like to see the authors addressing the limitations in rebuttal.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

N/A

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A


Historical Report Guided Bi-modal Concurrent Learning for Pathology Report Generation

Author(s):