Abstract
Image-to-text radiology report generation aims to produce comprehensive diagnostic reports by leveraging both X-ray images and historical textual data. Existing retrieval-based methods focus on maximizing similarity scores, leading to redundant content and limited diversity in generated reports. Additionally, they lack sensitivity to medical domain-specific information, failing to emphasize critical anatomical structures and disease characteristics essential for accurate diagnosis. To address these limitations, we propose a novel retrieval-augmented framework that integrates exemplar radiology reports with X-ray images to enhance report generation. First, we introduce a diversity-controlled retrieval strategy to improve information diversity and reduce redundancy, ensuring broader clinical knowledge coverage. Second, we develop a comprehensive medical lexicon covering chest anatomy, diseases, radiological descriptors, treatments, and related concepts. This lexicon is integrated into a weighted cross-entropy loss function to improve the model’s sensitivity to critical medical terms. Third, we introduce a sentence-level semantic loss to enhance clinical semantic accuracy. Evaluated on the MIMIC-CXR dataset, our method achieves superior performance on clinical consistency metrics and competitive results on linguistic quality metrics, demonstrating its effectiveness in enhancing report accuracy and clinical relevance.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3102_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{ZhaBao_SemanticAware_MICCAI2025,
author = { Zhang, Baochang and Jia, Chen and Liu, Shuting and Schunkert, Heribert and Navab, Nassir},
title = { { Semantic-Aware Chest X-ray Report Generation with Domain-Specific Lexicon and Diversity-Controlled Retrieval } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15965},
month = {September},
pages = {618 -- 627}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes DrLS, a novel framework for chest X-ray report generation, integrating a diversity-controlled retrieval strategy (utilizing Determinantal Point Processes, DPP), a domain-specific lexicon-weighted cross-entropy loss, and a sentence-level semantic loss. The goal is to enhance diversity, clinical accuracy, and semantic coherence in automatically generated medical reports. Experiments on the MIMIC-CXR dataset demonstrate superior performance compared to state-of-the-art methods.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper innovatively employs DPP for controlling retrieval diversity, introduces a specialized chest-specific medical lexicon, and develops a novel sentence-level semantic loss, effectively enhancing the clinical relevance of generated reports.
- Authors thoroughly demonstrate the contributions of each proposed component through detailed experiments, offering empirical support.
- Demonstrated state-of-the-art results on well-established clinical consistency benchmarks and competitive performance in linguistic metrics on MIMIC-CXR.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- While the baseline methods used for comparison are representative, the inclusion of additional recent approaches, such as [1,2,3,4,5], would strengthen the robustness of the comparative evaluation.
- There is no analysis provided for the experimental data corresponding to the different M_gt presented in Table 1. What is the significance of listing these results?
- According to the data in Table 2, the performance improvement brought by L_sls is limited. Besides, why is the ablation study conducted under the M_gt=100 setting instead of Cpl.?
- λ_med controls the weight of the lexicon, thereby influencing the model’s output of medical domain-specific information. However, the experimental section lacks an ablation study on this parameter.
- Do L_sws in Figure 1 and L_sls in Equation 6 represent exactly the same meaning? If they do, it would be best to unify their expressions.
[1] KIA: Knowledge-Guided Implicit Vision-Language Alignment for Chest X-Ray Report Generation
[2] Simulating doctors’ thinking logic for chest X-ray report generation via Transformer-based Semantic Query learning
[3] PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
[4] A medical semantic-assisted transformer for radiographic report generation
[5] Bootstrapping large language models for radiology report generation
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The manuscript proposes inspiring research ideas building on the baseline method SEI. However, the specific issues mentioned in the weaknesses need to be addressed in the rebuttal, especially the concern that the performance improvements of the proposed contributions are trivial, so I have set my initial score to 3.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The authors propose a retrieval-enhanced report generation method that integrates exemplar radiology reports with X-ray images, achieving outstanding performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Clear expression and well-structured presentation.
- Comprehensive comparative experiments.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Equation (1) defines the DPP probability but does not discuss its computational complexity, particularly the determinant calculation of large-scale submatrices (see the sketch after this list). A more efficient strategy and a comparison with alternative optimization methods are recommended.
- The vocabulary weighting strategy employs only binary weighting, neglecting the varying importance of medical terms. A finer-grained weighting approach, such as TF-IDF or medical knowledge graphs, could enhance sensitivity to critical medical information.
- DPP prioritizes diversity, whereas clinical retrieval emphasizes relevance. The consistency between DPP-selected subsets and expert choices should be analyzed, and weighted or constrained DPP optimization should be considered to improve clinical applicability.
- The domain adaptability of the SEI pre-trained model remains unverified, and the sim(., .) measure may suffer from scale inconsistencies.
- Experiments are conducted on a single dataset, lacking an evaluation of generalization across multiple datasets.
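For readers unfamiliar with the formulation criticized above, here is a minimal numpy sketch of the normalized-determinant form of a DPP subset probability (the shape Equation (1) presumably takes). The kernel built from unit-normalized exemplar embeddings (`feats`) is a hypothetical stand-in for the paper's actual kernel; the sketch only illustrates why scoring one subset requires a |S|×|S| determinant, plus an N×N determinant for the normalizer.

```python
import numpy as np

def dpp_subset_log_prob(L: np.ndarray, subset: list[int]) -> float:
    """Log-probability of `subset` under an L-ensemble DPP:
    log P(S) = log det(L_S) - log det(L + I).
    det(L_S) costs O(|S|^3); the normalizer det(L + I) is O(N^3)
    but only needs to be computed once per kernel."""
    n = L.shape[0]
    L_S = L[np.ix_(subset, subset)]
    _, logdet_s = np.linalg.slogdet(L_S)
    _, logdet_z = np.linalg.slogdet(L + np.eye(n))
    return logdet_s - logdet_z

# Hypothetical kernel from unit-normalized report embeddings:
# det(L_S) grows when the selected exemplars are mutually dissimilar.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 128))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
L = feats @ feats.T + 1e-6 * np.eye(50)  # positive semi-definite similarity kernel
print(dpp_subset_log_prob(L, [0, 3, 7]))
```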
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The adequacy of the experiments and the innovativeness of the methodology.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper addresses the task of image-to-text radiology report generation, aiming to generate accurate and comprehensive diagnostic reports by leveraging both X-ray images and historical report data. The authors identify key limitations in existing retrieval-based methods and propose a new framework called DrLS, which integrates:
- Diversity-Controlled Retrieval Strategy: A novel retrieval method that enhances information diversity and reduces redundancy in retrieved exemplars, ensuring broader clinical knowledge coverage.
- Medical Lexicon-Weighted Loss: The development and use of a comprehensive medical lexicon (covering anatomy, diseases, radiological findings, treatments, etc.) integrated into a weighted cross-entropy loss to boost the model’s attention to domain-specific terminology.
- Sentence-Level Semantic Loss: A semantic alignment mechanism at the sentence level to improve factual consistency and clinical relevance in the generated reports (a minimal sketch of these two loss terms is given after this list).
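As a rough illustration of how the last two components could be wired together, the following PyTorch sketch implements a binary lexicon-weighted cross-entropy and a cosine-based sentence-level semantic loss. The names (`lexicon_ids`, `lam_med`) and the one-to-one sentence pairing are assumptions made for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def lexicon_weighted_ce(logits, targets, lexicon_ids, lam_med=2.0, ignore_index=-100):
    """Token-level cross-entropy where tokens from a medical lexicon get a
    larger weight (binary weighting; graded weights such as TF-IDF scores
    could be substituted). logits: (B, T, V); targets: (B, T);
    lexicon_ids: 1-D tensor of vocabulary ids in the lexicon (assumed input)."""
    ce = F.cross_entropy(logits.transpose(1, 2), targets,
                         ignore_index=ignore_index, reduction="none")  # (B, T)
    is_med = torch.isin(targets, lexicon_ids)            # mask of lexicon tokens
    weights = 1.0 + (lam_med - 1.0) * is_med.float()     # lam_med for lexicon, 1 otherwise
    valid = (targets != ignore_index).float()
    return (weights * ce * valid).sum() / valid.sum().clamp(min=1.0)

def sentence_semantic_loss(gen_sent_emb, ref_sent_emb):
    """Sentence-level semantic loss: 1 - cosine similarity between embeddings
    of paired generated and reference sentences (pairing is assumed given)."""
    sim = F.cosine_similarity(gen_sent_emb, ref_sent_emb, dim=-1)
    return (1.0 - sim).mean()
```

A combined objective in this spirit would be something like `loss = lexicon_weighted_ce(...) + lambda_sls * sentence_semantic_loss(...)`, with the trade-off weights tuned on validation data.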
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A Novel Retrieval Strategy: the authors introduce a diversity-controlled approach to retrieval that reduces content repetition and enriches report diversity.
- The authors build a specialized medical lexicon and incorporate it into the loss function to prioritize important clinical terms during generation.
- Experiments on the MIMIC-CXR dataset show that DrLS achieves superior performance in clinical consistency metrics while maintaining competitive linguistic quality.
- The proposed framework shows potential for real-world deployment in clinical settings, and the code will be released to support reproducibility and further research.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Limited Evaluation on a Single Dataset: The model is only evaluated on the MIMIC-CXR dataset. This raises concerns about generalizability to other datasets or modalities (e.g., CT, MRI, multi-view X-rays). No cross-dataset validation or domain adaptation is mentioned.
- Reliance on Pre-existing Reports for Retrieval: The retrieval-based approach may still inherit biases or errors present in historical reports. The framework depends on the availability and quality of exemplars, which may not be consistent across institutions.
- Lexicon Dependency: The effectiveness of the Medical Lexicon-Weighted Loss is tied to the completeness and accuracy of the lexicon. If the lexicon is incomplete or overly focused on chest-related terms, it may limit scalability to other body parts or specialties.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The manuscript is well-structured, with high-quality figures, rigorous logical flow, and fluent writing. It presents a clear and coherent narrative, with no evident shortcomings.
- The proposed methodology exhibits a satisfactory level of innovation and offers considerable potential for future extension and application.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We sincerely thank all reviewers for their thoughtful and constructive feedback. We are encouraged by the recognition of the novelty of our approach (R1, R2). Reviewers appreciated the clear structure and organization of the paper (R1, R3), as well as its rigorous experimentation and comprehensive evaluation (R1, R3). Our work was noted as a methodologically meaningful contribution (R2, R3) with strong potential for clinical applicability (R2) and real-world impact (R3). We are also pleased that our efforts toward reproducibility were acknowledged (R1, R2, R3). Below, we address all key comments and clarify any misunderstandings to further improve the quality and clarity of our paper.
Additional Baselines (R1): Thanks for sharing additional recent baselines for comparison. While our current baselines (SEI, CGPT2, M2KT, RGRG, etc.) represent strong and diverse state-of-the-art methods, we agree that further comparison is also valuable. We will further improve the comparative experimental design in a future journal extension.
Analysis of M_gt and Ablation on Cpl. (R1): Thank you for the valuable reminder. The M_gt truncation levels reflect report lengths commonly seen in clinical practice; our aim was to demonstrate robustness across varying lengths. The ablation studies used M_gt=100 to balance length consistency and evaluation stability. In the revised version, we will provide a clearer explanation of this parameter to help readers better understand the experimental results.
Hyperparameter selection of λ_med (R1): We have conducted experimental analyses to evaluate the impact of λ_med on medical term coverage and overall generation quality. We acknowledge that this aspect can be better explained in the paper, and we appreciate the constructive feedback. We will use it to further optimize λ_med and incorporate this analysis into future extension work.
Generalizability Beyond MIMIC-CXR (R2, R3): MIMIC-CXR is currently the largest publicly available chest X-ray dataset and is widely used for benchmarking. We agree on the importance of generalization and will add a discussion of it to the paper.
DPP Efficiency (R2): In our implementation, we leverage the submodular property of DPP and apply a greedy algorithm to approximate the maximization of the subset determinant, thereby avoiding the high computational cost of directly computing determinants of large submatrices. Additionally, we adopt a low-rank feature approximation strategy when constructing the DPP kernel matrix to further improve efficiency. These designs allow us to maintain exemplar diversity while effectively controlling computational overhead. We also appreciate your suggestion regarding efficiency and alternative optimization strategies.
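To make the described strategy concrete, here is a minimal numpy sketch of greedy log-determinant subset selection over a low-rank kernel, in the spirit of the greedy DPP approximation mentioned above. It is a reference implementation under assumed inputs (unit-normalized exemplar-report embeddings `B`), not the authors' code; efficient variants avoid recomputing determinants by updating a Cholesky factor incrementally.

```python
import numpy as np

def greedy_dpp_select(L: np.ndarray, k: int) -> list[int]:
    """Greedy approximation of the DPP MAP problem: repeatedly add the item
    that maximizes log det of the selected submatrix. This simple version
    recomputes the determinant for every candidate (roughly O(N * k^4));
    incremental Cholesky updates remove that redundancy."""
    selected: list[int] = []
    for _ in range(k):
        best_i, best_logdet = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            cand = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(cand, cand)])
            if sign > 0 and logdet > best_logdet:
                best_i, best_logdet = i, logdet
        if best_i is None:  # no candidate keeps the submatrix positive definite
            break
        selected.append(best_i)
    return selected

# Hypothetical usage: low-rank kernel L = B @ B.T from d-dimensional
# exemplar-report embeddings (d << N), as suggested above.
rng = np.random.default_rng(0)
B = rng.normal(size=(200, 64))
B /= np.linalg.norm(B, axis=1, keepdims=True)
L = B @ B.T + 1e-6 * np.eye(200)
print(greedy_dpp_select(L, k=5))
```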
Suggestions (R2, R3): Thanks for your constructive suggestions on lexicon dependency, the lexicon weighting strategy, and the diversity-controlled retrieval strategy. We will explore more robust and generalizable lexicon construction methods in future work, including incorporating knowledge graphs and dynamic term weighting strategies such as TF-IDF. For the retrieval strategy, we will investigate integrating relevance-aware or expert-guided constraints into the DPP framework to better align with clinical decision-making patterns.
Correction (R1): Thank you for noticing that. L_sws in Figure 1 and L_sls in Equation (6) refer to the same semantic loss term. We will correct it.
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
All reviewers see the technical merit of the approach.