Abstract

Radiology report generation (RRG) is an emerging field that aims to automatically generate free-text clinical descriptions of radiographic images, incorporating temporal disease progression. However, existing methods rely on coarse-grained image representations and lack explicit mechanisms to integrate patients’ historical information. To address these limitations, we propose a novel framework Diff-RRG that introduces longitudinal disease-wise patch Difference as guidance for large language model (LLM)-based Radiology Report Generation, aligning with the real-world diagnostic process. Our approach extracts disease-wise difference maps to identify fine-grained patches associated with specific diseases and to capture the difference between consecutive radiographs. Such information is fed into the LLM to provide direct guidance on disease progression. Accordingly, the resulting generated reports can be explained by pinpointing the related regions in the image, thereby enhancing explainability. In the extensive experiments, we have achieved state-of-the-art performance in most of the natural language generation and clinical efficacy metrics on the Longitudinal-MIMIC dataset. Our code is available at https://github.com/ku-milab/Diff-RRG.
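
As a rough illustration of the patch-level difference idea described in the abstract, the sketch below scores each patch by the cosine distance between its embedding in the prior image and in the current image, then keeps the patches whose score exceeds a threshold. This is not the authors' implementation (the linked repository is the reference for that). The function names, the choice of cosine distance, the threshold value, and the toy embeddings are all assumptions made for illustration, and the disease-wise aspect of the method is omitted.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: an all-zero patch is treated as unchanged.
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def patch_difference_map(prior_patches, current_patches):
    """Per-patch change score between two aligned lists of patch embeddings."""
    if len(prior_patches) != len(current_patches):
        raise ValueError("prior and current must have the same number of patches")
    return [cosine_distance(p, c) for p, c in zip(prior_patches, current_patches)]


def select_changed_patches(diff_map, threshold=0.5):
    """Indices of patches whose change score exceeds the (assumed) threshold."""
    return [i for i, score in enumerate(diff_map) if score > threshold]


# Toy 2-D embeddings for three patches (hypothetical values).
prior = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
current = [[1.0, 0.0], [1.0, 0.0], [2.0, 2.0]]

diff_map = patch_difference_map(prior, current)
changed = select_changed_patches(diff_map)
```

In this toy input only the second patch changes direction, so it is the only one selected. The third patch is merely rescaled, which cosine distance ignores.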

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4138_paper.pdf

SharedIt Link: https://rdcu.be/eHwWg

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04981-0_15

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/ku-milab/Diff-RRG

Link to the Dataset(s)

https://physionet.org/content/mimic-cxr-jpg/2.1.0/

BibTex

@InProceedings{YunHan_DiffRRG_MICCAI2025,
        author = { Yun, Hannah AND Maeng, Junyeong AND Kang, Eunsong AND Suk, Heung-Il},
        title = { { Diff-RRG: Longitudinal Disease-wise Patch Difference as Guidance for LLM-based Radiology Report Generation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15966},
        month = {September},
        pages = {152--161}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper introduces longitudinal disease-wise patch Difference as guidance for large language model (LLM)-based Radiology Report Generation (Diff-RRG), which attempts to align with the real-world diagnostic process. The method develops Disease-wise Difference Map extraction (DDM) and Disease Progression Guidance (DPG). DDM generates disease-wise difference maps by analyzing differences between current and prior chest X-ray images at the patch level. DPG ensures that the generated reports preserve clinical context while accurately capturing the temporal dynamics of disease evolution.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper tries to take advantage of the temporal dynamics in longitudinal images to capture morphological changes in chest X-ray images for report generation. The motivation is great.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Inconsistent notation usage - Particularly between the term Sij in Equation (2) and si, ss in Equation (3), which creates ambiguity.
2. Formatting inconsistency - The textual description of Equation (2) does not fully align with its mathematical formulation. Equation (2) appears to lack a closing parenthesis before the “>” operator in its current presentation.
3. Clarification needed for Epos - The role and design rationale of the positional encoding term Epos in Equation (3) require explicit elaboration.
4. Labeling inconsistency in DPC section - The use of “label -2” alongside labels 0, -1, and 1 is semantically distinct and risks confusion. We recommend employing a different labeling convention for this category to ensure conceptual clarity.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Method descriptions and method design should be improved.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The paper’s main contribution is the novel Diff-RRG framework for radiology report generation, which explicitly captures fine-grained disease progression from longitudinal chest X-ray data. This is achieved through two key components: Disease-Wise Patch Difference, which captures fine-grained spatial details in disease-relevant regions, and Disease Progression Guidance, which provides explicit supervision based on disease progression. The proposed method achieves superior performance on the Longitudinal-MIMIC dataset.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The writing is clear and well-organized, making the paper easy to follow. The authors design solid experiments and present both quantitative and qualitative results. The model’s explainability makes it well-suited for clinical applications. Additionally, the proposed DDM and DPG modules effectively address the research gap by capturing fine-grained details and providing direct guidance for modeling disease progression.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

In the results section, Diff-RRG shows only marginal improvements compared to HC-LLM, particularly on NLG metrics, which warrants further explanation. Additionally, the incremental gains from adding the DDM and DPG modules are limited and should be discussed in more detail.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The writing is clear and easy to follow, and the motivation is well stated. While the proposed method is appropriate for addressing the identified problem, the results do not strongly support the core idea and require further discussion.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This paper presents a novel longitudinal radiology report generation (RRG) framework, Diff-RRG, by incorporating longitudinal disease effect analysis into LLM-based RRG. Specifically, the authors propose two key components named Disease-wise Difference Map extraction (DDM) and Disease Progression Guidance (DPG), to compute the fine-grained patch-level differences between prior and the current images, and classify disease progression status as well as use that for prompt guidance for LLM respectively. Experimental results on the Longitudinal-MIMIC dataset show that the proposed model can outperform other single input and longitudinal input RRG baselines over NLG and CE metrics.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is overall well-written with organized structure and clear presentation; The quality of the paper is relatively high especially with regard to the comprehensive experimental results section; The topic of RRG by incorporating longitudinal disease information as guidance is novel and relevant for the community
- The authors propose two methodology novelties, which are reasonable design choices to provide patch-level progression-aware guidance to generate more clinically meaningful reports
- The authors have conducted extensive experiments for evaluation on the Longitudinal MIMIC-CXR dataset; the model shows general improvements compared with other baselines, and the ablation studies show the effectiveness of the proposed modules
- Qualitative comparisons (Fig.2.) and model explainability (Fig.3.) are provided to show that Diff-RRG is superior to HC-LLM, and also the model is able to identify the connection between the visual findings and the corresponding text
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- It would be beneficial to include the performance of disease progression classifier (Section 2.3), as it will directly supervise the LLM via prompts
- The proposed model is only evaluated on a single dataset, which raises the concern about the generalizability; I wonder if it will be feasible to test the model on different longitudinal cohorts to serve as external validation set for further strengthening the paper results
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is overall a good work that is novel and relevant for the direction of medical report generation. The authors have conducted comprehensive evaluation (although on a single dataset) for both quantitative and qualitative results. The authors have also provided link to code. I recommend acceptance of the paper.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We sincerely appreciate all reviewers’ constructive feedback and insightful comments.

R#1
Q1: Inconsistent notation usage
A1: S_ij in Eq. 2 and s_i, s^-1 in Eq. 3 denote conceptually distinct notations. Nevertheless, we acknowledge that using similar uppercase and lowercase letters may cause ambiguity. For clarity, we will replace s_i and s^-1 with more descriptive notations.
Q2: Formatting inconsistency
A2: We will correct the missing closing parenthesis in Eq. 2 in the final version.
Q3: Clarification for E_pos
A3: E_pos is a learnable positional encoding that embeds disease identity and order, enabling the model to leverage disease relationships in the codebook.
Q4: Labeling progression labels
A4: “Label -2” indicates cases where no valid patches are selected for a disease, reflecting clinical scenarios in which physicians find no observable evidence to assess progression. To clearly distinguish this state from worsening, stable, and improving categories, we will change this output label to “N/A”.

R#2
Q1: Limited improvements over HC-LLM in NLG metrics
A1: As the NLG metric primarily reflects linguistic naturalness, most methods exhibit comparable performance, and our proposed approach is no exception. This, in turn, indicates that our method is capable of generating linguistically natural reports. However, we would like to draw attention to the significant differences observed in clinical efficacy metrics, which are more critical and should be emphasized in medical applications. Specifically, our model achieves significant gains in clinical efficacy (CE) metrics (Table 1), demonstrating a notable 2.6% improvement in F1-score compared to HC-LLM. Furthermore, qualitative results (Fig. 2) highlight that our model generates more progression-aware and clinically relevant reports, going beyond mere textual similarity as measured by NLG scores. In addition, unlike HC-LLM, our method uniquely provides explainability by visualizing disease-relevant patches.
Q2: Incremental gains of DDM and DPG modules
A2: Compared to the baseline with prior image and report (Table 1), the DDM module improves BLEU-4 by 0.6% and F1-score by 1.6% by accurately identifying fine-grained disease-relevant patches, allowing the model to focus on clinically meaningful regions. The DPG module further increases BLEU-4 by 0.4% and F1-score by 0.6% by explicitly modeling disease progression states, guiding the LLM to generate progression-aware reports. In the RRG domain, such gains are considered meaningful due to the task’s complexity and clinical sensitivity. Beyond numerical improvements, the DDM also contributes to the model’s explainability (Fig. 3).

R#3
Q1: The performance of the disease progression classifier
A1: The classifier operates as an auxiliary module within the DPG, generating progression prompts to guide the report generator. Its effectiveness is thus reflected indirectly in the improvements in NLG and CE metrics observed after adding the DPG (Table 1). We appreciate the reviewer’s insightful suggestion to report its performance explicitly. In our evaluation, the classifier achieved a test F1-score of 0.84, supporting its reliable progression guidance. We will add the related content in the camera-ready version if accepted.
Q2: Generalizability concerns
A2: Recent works, including [8,18], also evaluate their methods solely on the Longitudinal-MIMIC dataset, which is the largest publicly available benchmark for longitudinal RRG. Although another dataset exists in [14], it contains only a limited number of diseases and is rarely used in this field. Therefore, we did not pursue a direct comparison. We acknowledge the importance of external validation and plan to extend our study to additional cohorts in future work.
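
The relabeling promised in the response to R#1 Q4 can be sketched as follows: a disease with no selected patches is reported as “N/A” instead of being given a numeric label next to the three progression categories. This is a minimal sketch and not the authors' code. The integer-to-category mapping is an assumption (the feedback names the categories but does not say which integer encodes which), and the prompt wording, function names, and disease names are hypothetical.

```python
# Assumed mapping: the source does not state which integer encodes which category.
PROGRESSION_NAMES = {-1: "improving", 0: "stable", 1: "worsening"}
NO_EVIDENCE = "N/A"  # replaces the former "label -2"


def progression_label(code, num_selected_patches):
    """Return a readable progression label for one disease."""
    if num_selected_patches == 0:
        # No valid patches were selected, so progression cannot be assessed.
        return NO_EVIDENCE
    return PROGRESSION_NAMES[code]


def build_progression_prompt(predictions):
    """Build a prompt string from {disease: (code, num_selected_patches)}."""
    parts = [
        f"{disease}: {progression_label(code, n)}"
        for disease, (code, n) in sorted(predictions.items())
    ]
    return "Disease progression: " + "; ".join(parts)


# Hypothetical classifier outputs for two diseases.
prompt = build_progression_prompt({"pneumonia": (0, 0), "edema": (1, 4)})
```

Keeping “N/A” as a separate string rather than a fourth integer means it cannot be mistaken for a point on the improving/stable/worsening scale, which is the confusion Reviewer #1 raised.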

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The paper is well written, and the method is reasonably designed and novel. However, one issue remains to be resolved: the experiments do not support the claim well.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

All reviewers are leaning towards acceptance. While there are plenty of papers on chest X-rays, modeling the temporal evolution of a disease is a good addition to reflect in the conference.

Diff-RRG: Longitudinal Disease-wise Patch Difference as Guidance for LLM-based Radiology Report Generation

Author(s):