Abstract

Harnessing the robust capabilities of Large Language Models (LLMs) for narrative generation, logical reasoning, and common-sense knowledge integration, this study delves into utilizing LLMs to enhance automated radiology report generation (R2Gen). Despite the wealth of knowledge within LLMs, efficiently triggering relevant knowledge within these large models for specific tasks like R2Gen poses a critical research challenge. This paper presents KARGEN, a Knowledge-enhanced Automated radiology Report GENeration framework based on LLMs. Utilizing a frozen LLM to generate reports, the framework integrates a knowledge graph to unlock chest disease-related knowledge within the LLM to enhance the clinical utility of generated reports. This is achieved by leveraging the knowledge graph to distill disease-related features in a designed way. Since a radiology report encompasses both normal and disease-related findings, the extracted graph-enhanced disease-related features are integrated with regional image features, attending to both aspects. We explore two fusion methods to automatically prioritize and select the most relevant features. The fused features are employed by the LLM to generate reports that are more sensitive to diseases and of improved quality. Our approach demonstrates promising results on the MIMIC-CXR and IU-Xray datasets. Our code will be available on GitHub.
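
The data flow described in the abstract (graph-enhanced disease features combined with regional image features before reaching the LLM) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the weight-free propagation step standing in for a Graph Convolutional Network layer, and the use of plain concatenation as the fusion step. It is not the authors' implementation, and concatenation is not necessarily either of the two fusion methods explored in the paper.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(adj_norm, feats):
    """One weight-free propagation step: each node mixes its neighbours' features."""
    n, d = len(feats), len(feats[0])
    return [[sum(adj_norm[i][k] * feats[k][c] for k in range(n))
             for c in range(d)]
            for i in range(n)]


def fuse_by_concatenation(disease_feats, regional_feats):
    """Hypothetical fusion: stack both feature sets into one token sequence."""
    return disease_feats + regional_feats


# Toy example: two connected disease nodes, one regional image token.
adj = [[0, 1], [1, 0]]
disease = propagate(normalize_adjacency(adj), [[1.0, 0.0], [0.0, 1.0]])
fused = fuse_by_concatenation(disease, [[2.0, 2.0]])
```

In this toy setting the two connected nodes end up with identical features after one step, which shows how a graph edge lets related diseases share evidence.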

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0877_paper.pdf

SharedIt Link: https://rdcu.be/dV17W

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72086-4_36

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

https://drive.google.com/file/d/1DS6NYirOXQf8qYieSVMvqNwuOlgAbM_E/view

BibTex

@InProceedings{Li_KARGEN_MICCAI2024,
        author = { Li, Yingshu and Wang, Zhanyu and Liu, Yunyi and Wang, Lei and Liu, Lingqiao and Zhou, Luping},
        title = { { KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {382 -- 392}
}

Reviews

Review #1

Please describe the contribution of the paper

In this paper, the authors propose a Knowledge-enhanced Automated Radiology Report Generation framework (KARGEN). This proposed framework integrates disease-related features and regional image features to generate radiology reports more accurately. Additionally, the proposed method incorporates domain knowledge, which can improve the model’s performance.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

First, the proposed method incorporates domain knowledge, which not only improves the model’s performance but also makes the results more reliable.

Second, the proposed method develops a module to fuse disease-related features and regional image features.

Third, the proposed method is verified on two public datasets, and the results demonstrate its effectiveness.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

First, in both datasets some patient samples have multiple images corresponding to a single report. How did the authors handle this situation?

Second, for the IU-Xray dataset, were the results of the state-of-the-art models copied directly from the references? Can the authors confirm that they used the same training and test sets as those works?

Third, the abstract mentions using a frozen LLM to generate reports, but the main manuscript does not state how the LLM was used, e.g., partial fine-tuning, full fine-tuning, or no training at all. Additionally, why did the authors choose not to fine-tune the LLM for report generation? Was this due to computational resource constraints?

Fourth, what is the generalization ability of this model?
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

(1) Please verify the generalization ability of this model. (2) Please address the main weaknesses of the paper.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The main strengths and main weaknesses of the paper.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The paper uses LLMs for automated radiological report generation from chest x-ray images. The primary novelty is to incorporate a knowledge graph linking different disease groups to guide the model in producing relevant output.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Timely and well motivated.

Nicely written and presented.

Evaluation is pertinent and appears to compare against existing state of the art with increased performance.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Unclear how significant the increases in performance metric are.

Could do with explaining the set of metrics utilised and what exactly they convey to the reader.
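
One common way to probe whether a small metric gap is meaningful is a paired bootstrap over per-report scores. The sketch below is illustrative only: the function name `paired_bootstrap_win_rate` and the score lists are hypothetical, and it is not taken from the paper.

```python
import random


def paired_bootstrap_win_rate(scores_a, scores_b, n_resamples=2000, seed=0):
    """Fraction of resampled test sets on which system A's total score beats B's.

    scores_a and scores_b are per-report scores (e.g. a sentence-level metric)
    for the same reports in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same reports")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample report indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_resamples


# Hypothetical per-report scores for two systems on three reports.
win_rate = paired_bootstrap_win_rate([0.5, 0.6, 0.7], [0.4, 0.5, 0.6])
```

A win rate close to 1.0 suggests the improvement holds consistently across resampled test sets, while a value near 0.5 suggests the gap could be noise.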
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

no
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
This is a nicely presented paper addressing a timely challenge. The idea is sensible and the results seem promising. I have just a few comments and suggestions:
- For the primarily imaging audience at MICCAI, it would help a lot to give some introduction to the set of metrics used to evaluate the model output. What features of the output text do they evaluate? Are they all essentially the same, or do they examine different aspects? How do we interpret improvement in score?
- How significant are the score improvements observed? The differences in absolute values appear fairly small over competing techniques, but the single qualitative example appears to show substantially improved performance over a baseline. How typical is this result? How do we interpret score improvements of a few %?
- It is unclear from the ablation study what the effect of the knowledge graph is. It would be great to see how much perturbations of the knowledge graph affect performance, e.g. if it were entirely randomised, or if edges / nodes were deleted at random. What is the dependence on the precise structure of the graph? (That structure seems rather arbitrary, only grouping diseases of the same organ rather than capturing co-occurrence of different diseases, so it is unclear precisely why it is useful.)
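
The graph perturbations suggested in the last point could be generated along the following lines. This is a minimal sketch under stated assumptions: the helper names `drop_edges` and `randomize_edges` are hypothetical, and the toy node and edge lists are invented for illustration and are not the knowledge graph used in the paper.

```python
import random


def drop_edges(edges, fraction, seed=0):
    """Return a copy of the edge set with a random fraction of edges deleted."""
    rng = random.Random(seed)
    ordered = sorted(edges)  # fixed order so the seed gives reproducible output
    n_keep = len(ordered) - round(fraction * len(ordered))
    return set(rng.sample(ordered, n_keep))


def randomize_edges(nodes, n_edges, seed=0):
    """Return a random undirected graph with the same nodes and edge count."""
    rng = random.Random(seed)
    pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    return set(rng.sample(pairs, n_edges))


# Hypothetical toy graph: organ nodes linked to findings.
nodes = ["lung", "heart", "atelectasis", "pneumonia", "cardiomegaly"]
edges = {
    ("lung", "atelectasis"),
    ("lung", "pneumonia"),
    ("heart", "cardiomegaly"),
    ("lung", "heart"),
}

half_graph = drop_edges(edges, fraction=0.5)
random_graph = randomize_edges(nodes, n_edges=len(edges))
```

Retraining or re-evaluating with `half_graph` and `random_graph` in place of the original structure would indicate how much the reported gains depend on the specific edges.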
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

See above - I quite liked it.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

This paper introduces KARGEN, a new framework to integrate disease-related features extracted from chest X-ray images with regional image features through the use of a medical knowledge graph. The primary focus is on enhancing the quality and accuracy of automated radiology reports by leveraging the vast information available in medical knowledge graphs and the advanced natural language processing capabilities of LLMs. KARGEN significantly outperforms existing methods in the task of radiology report generation.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The main strengths of this paper are:
- a novel framework that integrates Large Language Models with a medical knowledge graph. It extracts disease-related features from chest X-ray images and integrates them with regional image features, which results in the generation of more sensitive and high-quality radiology reports.
- the evaluation is robust and comprehensive. The authors compare their framework with several state-of-the-art frameworks
- the authors test their framework on two state-of-the-art datasets
- manuscript is well structured and written
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

This is a very comprehensive study and a very good contribution to the field. The only concerns that I see are related to how the knowledge graphs are maintained and updated, since the paper does not provide details about this.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

The existing literature has extensively explored automatic report generation from medical images, underscoring the significance of advancements in this area. The results presented in Table 1 clearly demonstrate that KARGEN has surpassed all other state-of-the-art models in performance, marking a significant contribution to the field. The robustness of the framework is evident, and the conducted ablation studies further confirm its effectiveness, showcasing the framework’s ability to generate clinically useful radiology reports by leveraging Large Language Models and a knowledge graph. However, for future iterations and to enhance the generalizability of this framework, it would be beneficial to provide more detailed insights into the creation, maintenance, and updating processes of the knowledge graph. Understanding these processes is crucial for ensuring the framework remains relevant and accurate as medical knowledge evolves.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Strong Accept — must be accepted due to excellence (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The paper presents a novel framework that for the first time integrates a medical domain knowledge graph with Large Language Models for R2Gen. This integration is crucial as it activates relevant knowledge within LLMs, leveraging the wealth of knowledge within these models and the specific, structured information contained in the knowledge graph.
- The framework outperforms all other state-of-the-art models in generating radiology reports, as evidenced by its performance on the MIMIC-CXR and IU-Xray datasets.
- The ablation studies conducted provide clear evidence of the individual contributions of the framework’s components, including the knowledge-enhanced disease-related features, Graph Convolutional Network, and fusion methods.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

N/A

Meta-Review

Meta-review not available, early accepted paper.


KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models

Author(s):