Abstract

Biomedical literature serves as a critical repository for cutting-edge research achievements, encompassing substantial statistically validated biological knowledge. However, the dispersed storage and unstructured characteristics of such literature significantly hinder manual acquisition efficiency while increasing error susceptibility. To address these challenges, this study proposes an intelligent literature knowledge mining platform. Three core innovations distinguish this research: (1) The development of an extensible literature collection-parsing-structuring framework based on a “literature tree” architecture (ECPS-LitTree), which facilitates dynamic HTML report generation and full-cycle data management, offering a novel solution for aggregating knowledge from cross-source, heterogeneous literature; (2) The design of a configurable requirement customization framework (CRC) that combines named entity recognition (NER) technology with user-configurable mining templates to enable personalized knowledge extraction; (3) The implementation of an integrated online platform, providing comprehensive services including visual analytics, interactive search, and batch data export functionalities. Experimental validation demonstrates that the platform surpasses existing mainstream tools in literature retrieval success rate, processing efficiency, and knowledge extraction volume. The platform’s flexible configurability exhibits broad applicability across multiple biomedical domains, offering researchers a reliable intelligent tool for knowledge discovery. The platform is publicly and freely accessible at https://medseeker.genemed.tech/.
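
As a rough illustration of what a user-configurable mining template in the CRC framework might look like, a minimal Python sketch follows. The schema, field names, regex pattern, and ontology list are assumptions for illustration only; the paper's actual template syntax (regular expressions combined with ontologies) is not reproduced here.

    # Hypothetical CRC-style mining template; field names and schema are
    # illustrative assumptions, not the platform's actual configuration format.
    import re

    TEMPLATE = {
        "entity_type": "gene_variant",
        # regex for a gene symbol followed by an HGVS-like cDNA change
        "pattern": re.compile(r"\b[A-Z0-9]+\s+c\.\d+[ACGT]>[ACGT]\b"),
        # stand-in for an ontology/gene list the template is restricted to
        "ontology_terms": {"BRCA1", "TP53"},
    }

    def extract(text: str, template: dict) -> list[str]:
        """Return spans matching the regex whose gene symbol is in the ontology."""
        hits = []
        for match in template["pattern"].finditer(text):
            gene = match.group(0).split()[0]
            if gene in template["ontology_terms"]:
                hits.append(match.group(0))
        return hits

    # Example: only the ontology-listed gene is retained.
    sample = "Carriers of BRCA1 c.68A>G progressed, whereas XYZ9 c.12C>T did not."
    print(extract(sample, TEMPLATE))  # ['BRCA1 c.68A>G']

In the platform itself, such templates are combined with NER output rather than applied to raw text alone, per the abstract; the exact interface is not specified here.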

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0638_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{YuaXin_Configurable_MICCAI2025,
        author = { Yuan, Xinpan and Li, Bozhao and Zhao, Guihu and Wang, Yueming and Hua, Liujie and Kuang, Junhua and Chen, Jianguo and Xie, Shaomin and Li, Gan},
        title = { { Configurable Platform for Biomedical Literature Mining via Multimodal-Driven Extraction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {88--98}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a platform for biomedical literature mining with multimodal extraction. Its features include a tree-based knowledge graph, configurable requirement-driven NER, and an online platform.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Tree-based literature search has a very clear, interpretable structure and is easily visualizable.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper neither compares against nor mentions LLM-based literature search.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    This paper does not seem to be relevant to MICCAI.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper doesn’t seem to be relevant to the topic of this conference.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    This isn’t an area that I am an expert in. If the other two reviewers accept, I would accept as well; but if the other two reviewers split one accept and one reject, I will reject as well. Most of my comments stand and were not sufficiently addressed.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a comprehensive biomedical literature mining framework (ECPS-LitTree + CRC) that addresses the limitations of traditional literature mining tools through multimodal parsing, configurable entity recognition, and an interactive visualization platform. The core contributions focus on methodological innovations (e.g., cross-source heterogeneous data aggregation via a “literature tree” and dynamic template configuration mechanisms), qualifying it as a methodological contribution.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Innovation: ECPS-LitTree’s multimodal parsing mechanism (text + image semantic-guided segmentation) and literature tree storage architecture for biomedical literature mining. (2) Practical Value: Supports clinical variant classification under ACMG standards, providing an automated evidence extraction tool for precision medicine with clear translational potential.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) Validation is limited to a small corpus of 800 literature entries (including PubMed Central and user-uploaded PDFs), and the compared baselines are all platforms. Necessary quantitative assessments are lacking, and the results in Fig. 4 do not demonstrate performance superiority. (2) Insufficient functional comparisons with existing platforms and lack of quantitative evaluation of clinical diagnostic performance (e.g., variant classification accuracy). (3) The threshold for “semantic relevance scoring” (Formula 8, θ=0.4) lacks theoretical justification, and sensitivity analysis is missing. (4) Model parameters (e.g., BioBERT fine-tuning strategies) and user-configurable template examples are not disclosed, impacting reproducibility. (5) No discussion of ECPS-LitTree’s adaptability to non-biomedical domains (e.g., chemistry, materials science), limiting the framework’s generalizability.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Strengths: a complete platform design (from parsing to visualization) that may be practical. Weaknesses: lack of innovation, limited validation on a small-scale dataset, incomplete technical details, and unquantified clinical impact.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contribution of this paper is the development of an integrated platform for biomedical literature mining that addresses the challenges of extracting knowledge from heterogeneous, unstructured sources. Specifically, the paper introduces the ECPS-LitTree framework, a configurable requirement customization framework, integration of multiple mining approaches, and an interactive visualization interface. The significance of this contribution is demonstrated through experiments showing that the platform achieves superior performance in literature retrieval success rates (94.75% vs. 42% for PubTator 3.0), processing efficiency, and knowledge extraction volume compared to existing tools, while offering greater configurability and flexibility for biomedical researchers.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The multi-source strategy expands coverage to 94.6% of open-source biomedical literature types, significantly outperforming single-database approaches. The CRC framework introduces a novel template mechanism that dynamically combines regular expressions with ontologies. This approach overcomes the rigidity of traditional NER systems by allowing researchers to define custom entity types and attributes. The system uniquely integrates vision-based Detectron2 segmentation with text parsing.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Most technical components rely on existing models (BioBERT, LayoutParser, BERN2, etc.) rather than introducing novel algorithms; the LayoutParser model [26] is used with minimal modifications for document segmentation. The PubTator 3.0 comparison emphasizes retrieval rates but lacks a detailed comparison of entity recognition quality. Validation of the effectiveness of the graph attention networks in relevance scoring is missing, as are examples showing where the system struggles compared to human experts.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper represents a valuable contribution through its comprehensive integration of techniques to address an important problem in biomedical research. Its primary strength lies in engineering a complete solution rather than advancing fundamental algorithms. The demonstrated performance improvements are substantial, but methodological limitations and evaluation gaps prevent it from receiving a higher score. It stands as a solid, above-average contribution that would be strengthened with more rigorous evaluation of extraction quality and component-level analysis.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    This appears to be a solid engineering/systems contribution with clear clinical value, but limited methodological novelty. Success likely depends on whether the venue values practical clinical tools over algorithmic innovation, and how convinced reviewers are by the efficiency claims. The rebuttal is well-written and addresses concerns directly, which improves chances, but fundamental limitations around novelty and evaluation scope remain.




Author Feedback

#638 We sincerely thank the reviewers for their constructive feedback and recognition of our core contributions: (1) the methodological innovation of the ECPS-LitTree and CRC frameworks (Reviewers #2, #3, #4); (2) the platform’s clinical utility and translational potential (Reviewers #2, #3). Below, we address shared concerns followed by detailed responses to individual comments. The platform code and URL will be released upon manuscript acceptance.

  1. (Reviewers #2, #3) Why compare only PubTator 3.0? As shown in Table 1, our study functionally compared PubTator 3.0, GPDminer, and other tools to highlight our platform’s dynamic configurability and visualization. PubTator 3.0 was selected for quantitative benchmarking as a domain-standard tool (SOTA model AIONER, validated across prior studies), ensuring objective evaluation of incremental value (Section 3.4). Results—94.75% full-text retrieval (vs. 42% for PubTator 3.0) and 40,912 correctly extracted clinical items (vs. 25,081, Figure 4)—demonstrate the framework’s superiority in real-world tasks.
  2. (Reviewers #2, #3) Reliance on existing models: Our contribution focuses on framework design (literature tree architecture, modular templates) and clinical implementation, not model optimization. Using BioBERT ensures reliability (validated in clinical scenarios), while modularity supports future integration of advanced models (e.g., GPT-4o). Systemic innovation and clinical decision support (Table 1, Figure 4C) exceed the capabilities of isolated tools.
  3. (Reviewers #2, #3) Lack of GAT validation and sensitivity analysis: The threshold θ=0.4 was empirically optimized (Section 3.1) to balance precision/recall, validated by full-text extraction performance (Section 3.4); a toy illustration of the thresholding step is sketched after this list. While sensitivity details are omitted, GAT’s structural encoding (Equation 8) is indirectly verified: simpler embeddings (e.g., BioBERT alone) cannot achieve comparable cross-domain consistency. Stable biomedical results (Figure 4C) confirm robustness.
  4. (Reviewers #2, #4) Insufficient technical details: (1) Model parameters: BioBERT uses pre-trained weights without fine-tuning (Section 2.4); parameters follow official repositories. (2) Template configuration: syntax rules (regex + ontologies) are detailed in Section 2.3; clinical gene templates are provided in the codebase. (3) Multimodal parsing (Figure 2C) and the data engine (Equation 9) are described in Section 2.2; code ensures reproducibility.

     Response to Reviewer #2. Q1/Q2 (Dataset scale; clinical impact): (1) Experiments used 800 ACMG-rated clinical articles (Section 3.4), simulating real-world workflows. (2) The platform reduced evidence-mining time from 10 to 3.5 minutes/article (a 65% efficiency gain, Section 3.3); extraction volume (Figure 4C) and the 94.75% retrieval success validate clinical value. Q5 (Cross-domain applicability): (1) Core innovations (literature tree, CRC) support multimodal parsing of literature (Equations 3–4) for chemistry/materials science. (2) CRC supports integrating domain-specific ontologies (e.g., ChEBI) via APIs; new regex templates enable dynamic adaptation.

     Response to Reviewer #3. Q2 (System limitations): The 94.75% parsing success rate (Section 3.2) accounts for failure cases. Limitations stem from data constraints (abstract-only access) and dense formatting (table/formula errors). Multiple parsers (SCIPDF + PyMuPDF) enhance robustness.

     Response to Reviewer #4. Q1 (MICCAI relevance): (1) The platform integrates multimodal parsing (Figures 2A–B) with clinical decision optimization (Section 3.3), aligning with MICCAI’s focus on computer-assisted medical analysis. (2) Prior MICCAI works (e.g., NeuroConText [2024]) validate the value of biomedical literature processing. Q2 (Literature retrieval and LLMs): (1) The platform integrates PMC/BMC sources (Section 2.1) for PMID/PDF acquisition and parsing (Figure 2A). (2) LLMs were excluded due to task-specific uncertainties but are supported by the architecture (e.g., GPT-4o integration)—a future research direction.
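
As a minimal sketch of the relevance-thresholding step referenced in item 3 above (θ = 0.4), the snippet below filters candidate passages by a similarity score against a query embedding. This is an assumption-laden stand-in: the paper's actual score comes from BioBERT embeddings with a graph attention network (its Equation 8), which is not reproduced here; a plain cosine similarity over placeholder vectors is used instead.

    # Illustrative thresholding only; the real relevance score in the paper is
    # GAT-based (Equation 8), not a raw cosine similarity over random vectors.
    import numpy as np

    THETA = 0.4  # relevance threshold reported in the rebuttal

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def filter_passages(query_vec, passage_vecs, theta=THETA):
        """Keep the indices of passages whose score meets the threshold."""
        return [i for i, p in enumerate(passage_vecs) if cosine(query_vec, p) >= theta]

    rng = np.random.default_rng(0)
    query = rng.normal(size=64)
    passages = [rng.normal(size=64) for _ in range(5)]
    print(filter_passages(query, passages))  # indices of retained passages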




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces a biomedical literature mining platform with practical features and some engineering innovation. However, the work lacks methodological novelty, as it mainly integrates existing models rather than proposing new algorithms. The evaluation is limited, with small-scale validation and insufficient quantitative analysis of clinical impact. The paper does not compare its approach with recent large language model-based methods.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


