Abstract

The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces noise in the subsequent cross-modal alignment module, which aligns X-ray images with the factual entity sequences in reports, thereby improving the precision of cross-modal alignment and further aiding the model in the gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients, which in turn helps the text decoder produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics. The code is available at https://github.com/mk-runner/SEI.
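To make the retrieval step described above concrete, the following is a minimal sketch, assuming the image encoder from the alignment stage is already trained and that embeddings of historical factual entity sequences have been pre-computed; all names (retrieve_similar_cases, case_bank_embeddings, etc.) are hypothetical illustrations and are not taken from the SEI codebase.

    # Minimal sketch (hypothetical names, not the SEI implementation):
    # gradient-free retrieval of similar historical cases via cosine
    # similarity between an aligned X-ray embedding and a pre-computed
    # bank of report (factual entity sequence) embeddings.
    import torch
    import torch.nn.functional as F

    def retrieve_similar_cases(image_embedding, case_bank_embeddings, case_bank_reports, n=1):
        """Return the n historical reports whose embeddings are closest to the query image.

        image_embedding:      (d,) tensor from the trained image encoder
        case_bank_embeddings: (N, d) tensor of historical report embeddings
        case_bank_reports:    list of N report strings (the historical cases)
        """
        query = F.normalize(image_embedding.unsqueeze(0), dim=-1)   # (1, d)
        bank = F.normalize(case_bank_embeddings, dim=-1)            # (N, d)
        similarity = query @ bank.T                                 # (1, N) cosine similarities
        top = torch.topk(similarity.squeeze(0), k=n).indices        # indices of the n closest cases
        return [case_bank_reports[i] for i in top.tolist()]

    # Toy usage with random embeddings standing in for real encoder outputs.
    d, N = 512, 1000
    bank = torch.randn(N, d)
    reports = [f"historical report {i}" for i in range(N)]
    query = torch.randn(d)
    print(retrieve_similar_cases(query, bank, reports, n=1))

Because this lookup is a similarity search over fixed embeddings, no gradients flow through it, which is what makes the retrieval of similar historical cases cheap to apply at both training and inference time.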

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1768_paper.pdf

SharedIt Link: https://rdcu.be/dV1WD

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72384-1_41

Supplementary Material: N/A

Link to the Code Repository

https://github.com/mk-runner/SEI

Link to the Dataset(s)

https://physionet.org/content/mimic-cxr/2.0.0/

BibTex

@InProceedings{Liu_Structural_MICCAI2024,
        author = { Liu, Kang and Ma, Zhuoqi and Kang, Xiaolu and Zhong, Zhusi and Jiao, Zhicheng and Baird, Grayson and Bai, Harrison and Miao, Qiguang},
        title = { { Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {433--443}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    To generate more accurate reports from radiology images, this paper proposes a Structural Entities extraction and patient indications Incorporation (SEI) framework. In this framework, the authors design a cross-modal alignment module for better text-image semantic alignment. Besides, when generating reports, they incorporate similar historical cases and patient-specific indications to further improve performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea of incorporating historical cases to further enhance the performance and accuracy of report generation is interesting and promising.
    2. The challenges addressed, namely text-image alignment and patient-specific indications, are vital.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In Table 2, the performance improves slightly or even decreases with the proposed indications (SEI-1), especially for the clinical efficacy metrics (i.e., RG: 0.241 -> 0.249; CX5: 0.542 -> 0.545; CX14: 0.474 -> 0.460).

    2. In Table 1, some NLG evaluation metrics are missing, such as BLEU-1, BLEU-3, and CIDEr. CIDEr is especially significant because it better reflects the quality of the generated reports [1], and many existing methods, such as the baseline RGRG [2], also report it. It would be better to provide these results, which would make this paper more convincing.

    [1] Vedantam, R., Lawrence Zitnick, C., Parikh, D.: CIDEr: Consensus-based image description evaluation. CVPR, 2015.

    [2] Tanida, T., Müller, P., Kaissis, G., Rueckert, D.: Interactive and explainable region-guided radiology report generation. CVPR, 2023.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to Weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of using similar and historical cases to enhance the performance is interesting. However, the results cannot demonstrate the effectiveness of the proposed modules. That is what I am concerned about.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The SEI method exhibits remarkable proficiency in mitigating noise during the cross-modal alignment process, thus enabling seamless retrieval of analogous historical cases without relying on gradients. Its exceptional performance on the MIMIC-CXR dataset is particularly noteworthy, showcasing substantial enhancements in both natural language generation and clinical efficacy metrics compared to the state-of-the-art approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) This paper presents a structural entities extraction method to retrieve factual entity sequences from reports. (2) This paper introduces a cross-modal fusion network to amalgamate similar historical cases, patient-specific indications, and imaging data. This incorporates empirical information from similar cases and helps comprehend the examination intentions of patients. (3) This paper contrasts its methodology with those of recently published articles.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Is the generation of imaging reports commonly utilized in clinical practice? I am curious about the time and efficiency differences between doctors directly composing imaging reports and the process of AI generating them first, followed by doctors reviewing and refining them. (2) The symbols in formula (1) need to be defined and explained more clearly.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Unfortunately, no code seems available to replicate or build upon the proposed approach.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The author should consider addressing the weaknesses above in order to further strengthen the paper.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on the strengths and weaknesses of the paper, I would recommend weak accept. This paper presents the structural entities extraction approach for extracting factual entity sequences from medical reports. Additionally, it introduces a cross-modal fusion network aimed at directing the text decoder’s attention towards discriminative features within X-ray images, while also incorporating historical empirical information from comparable cases. The paper presents experimental results that demonstrate improved performance compared to existing similar methods.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    Authors propose an approach to generate radiology reports from chest X-ray scans. The approach is based on two main components. First, the method employs a multi-modal alignment of images and texts, where important words in the reports are selected using structural entities extraction. Second, the text decoder component uses the pre-trained multi-modal model, similar historical cases from gradient-free retrieval, and patient-specific information to generate high-quality reports. The approach is validated on the MIMIC-CXR dataset in different scenarios compared to recent and relevant comparison methods. Ablation studies on the different components of the proposed methods are also performed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Overall, this paper is well-written, and the organization is clear.
    • The methodology is sound and provides interesting methodological contributions to solve the challenging radiology report generation task.
    • Robust evaluation demonstrating overall outperforming results in comparison to numerous relevant SOTA approaches specifically designed for this task.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The approach is based on numerous technical components, and the overall framework appears quite intricate. The description and justification of some specific design choices could be further explained.
    • The authors provide little discussion of the results or of the limitations of the proposed method. Adding such a discussion would benefit the paper (e.g., the method is evaluated on only one dataset; would it work for other imaging modalities or medical applications? How does it depend on the quality/amount of data?).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • Code availability (upon acceptance?) would be a big plus.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Major:

    • Section 1, paragraph 3: the methodology description in this part of the manuscript lacks clarity. I think this part needs some rephrasing to better explain the components of the proposed approach.
    • Figure 1: Overall, the text is too small, and the figure is difficult to understand at first sight and on its own. One missing piece of information is perhaps the order in which all these components are used.
    • Section 2 & Figure 1: Output rules are depicted but not discussed at all in the main text, even though they are an important design choice (to be further justified) for handling missing modalities/inputs. Also, I am curious about the ratio of missing inputs in the dataset. How does it impact training and performance?
    • Section 3, implementation details: The initialization of the encoders/decoders in such multimodal approaches is crucial, especially considering the number of available models in this domain and the fact that medical datasets are significantly smaller for training them. Yet the authors used ImageNet pretraining (not aligned to text; why not CLIP weights?) for the image encoder and SciBERT (specific to scientific text but not to the biomedical domain) for the text encoder. Does this have a significant impact on performance?
    • Section 3: the authors present results with SEI-0 and SEI-1; while this is, to me, an interesting contribution to the methodology, I was expecting SEI-n with n>1. How does it impact performance?
    • Section 3, Table 2: I am confused about the difference between “+similar historical cases”, “+indications (SEI-0)”, and “+indications (SEI-1)”, i.e. lines 3, 4, 5, as the caption says “SEI-n represents our SEI incorporated with information from n similar historical cases.” What is the actual difference between line 3 and line 5? Is this information used during training and/or only at testing?

    Minor: Figure 2: Overall, the text is too small. What is the “new/non-matching color” in the reference report? A few other examples would be welcome (in supplementary materials?), perhaps including some failure cases or examples with missing information.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper is well-written, with precise objectives and relation to previous work, clear methodology, and sound evaluation. A few points (see weaknesses and major comments) could improve the quality of the manuscript. However, the strengths outweigh those modest weaknesses, so I recommend a Weak Accept score.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors provided some clarifications and answers to my comments in the rebuttal. I am satisfied with most of the points. However, the instability of SEI-n for n>1 and the fact that, in the absence of indications, the authors observe optimal performance at n=5 raise some further concerns about the soundness of the proposed approach and its evaluation. So I maintain the same score, 4, Weak Accept.




Author Feedback

We thank all the reviewers for their valuable comments; the concerns are addressed in the following.

R1-Q1 and R5: Clarification of the ablation study in Table 2. The ablation study effectively demonstrates the positive effects of similar historical cases (SHC) and indications on model performance. Specifically, (c) and (d) in Table 2 denote SEI-1 without indications and SEI-1 without SHC, respectively. The key difference between (c) and (e) lies in whether they use indications in the report generation module. These clarifications will be added in our final version. We observe that (c) and (d) exhibit improvements across all metrics when compared to (a) and (b), implying that using either SHC or indications independently can enhance performance. While the CX14 metric in (e) shows a decrease compared to (c), the sum of all metrics in (e) exhibits an increase of 4.9%. This indicates that the combined use of SHC and indications, namely SEI-1, leads to further enhancements in performance. Also, the experiments in Table 1 prove the effectiveness of SEI-1.

R4 and R5: Code availability. We use the widely recognized public dataset MIMIC-CXR. Additionally, we have made our code and checkpoints accessible anonymously at anonymous.4open.science/r/SEI-14B8. These ensure that our experiments are reproducible.

R4-Q1: Our model’s efficiency. AI can quickly generate high-quality draft reports, thereby allowing doctors to focus on analyzing more complex cases and enhancing overall efficiency. According to Reference [1], AI can reduce the diagnostic time for chest X-rays by 10% compared to radiologists. Our model, on average, generates a report in just 7 seconds, providing doctors with timely and high-quality drafts for review, as shown in Fig. 2.

R5: Implementation details of encoders. As our contributions focus on cross-modal alignment and on integrating SHC and indications, we use general encoders to demonstrate that performance improvements are due to our design, not to domain-specific pre-trained models. This allows us to clearly attribute the improvements to our innovative design. Using domain-specific pre-trained models might further improve generation quality; we will explore this in the future.

R5: SEI-n performance. When n>1, the performance of SEI-n declines compared to SEI-1. Our analysis suggests that this reduction comes from the instability in the feature space, which is likely caused by missing inputs, as well as interference between SHC and indications. Interestingly, in the absence of indications, we observe optimal performance at n=5.

R5: Our model’s limitations. Our model does not incorporate the patient’s temporal and multi-view information. However, it can be fine-tuned to accommodate other 2D medical images with about 1000 cases. We will emphasize these limitations in the final version.

R1-Q2: Adding additional NLG metrics. Due to space constraints, we were unable to include all NLG results. Additionally, the NLG metrics provided in our draft are adequate for assessing the lexical similarity between reports. It’s important to note that this task places greater emphasis on the CE metrics.

R5: The ratio of missing inputs in Fig. 1. In SEI-1, 42.2% of the testing set lacks indications. Compared to samples without indications, the sum of all metrics for samples with indications increases by 12.9%.

R5: Non-matching color in Fig. 2. The non-matching color in the reference report indicates missing information. Additional examples will be included if supplementary materials are available.

R4-Q2 and R5: Other issues. A detailed description of Eq. (1), the technical components, and paragraph 3 in Section 1 will be provided in our final version. Additionally, we will include explanations of the output rules in Fig. 1 and enhance the clarity of Figs. 1 and 2.

[1] Ahn JS, et al. Association of Artificial Intelligence–Aided Chest Radiograph Interpretation With Reader Performance and Efficiency. JAMA Network Open. 2022.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The use of structural entity extraction and similar case retrieval is inspirational and would impact future medical imaging research that involves natural language processing and exploits similar cases in large-scale datasets.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The use of structural entity extraction and similar case retrieval is inspirational and would impact future medical imaging research that involves natural language processing and exploits similar cases in large-scale datasets.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents an interesting method that uses multi-modal alignment of images and texts, and historical cases to improve report generation. The reviewers have raised several points requiring further clarification to improve the overall quality of the paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    This paper presents an interesting method that uses multi-modal alignment of images and texts, and historical cases to improve report generation. The reviewers have raised several points requiring further clarification to improve the overall quality of the paper.


