Abstract

Data scarcity and privacy concerns limit the availability of high-quality medical images for public use, which can be mitigated through medical image synthesis. However, current medical image synthesis methods often struggle to accurately capture the complexity of detailed anatomical structures and pathological conditions. To address these challenges, we propose a novel medical image synthesis model that leverages fine-grained image-text alignment and anatomy-pathology prompts to generate highly detailed and accurate synthetic medical images. Our methodology integrates advanced natural language processing techniques with image generative modeling, enabling precise alignment between descriptive text prompts and the synthesized images’ anatomical and pathological details. The proposed approach consists of two key components: an anatomy-pathology prompting module and a fine-grained alignment-based synthesis module. The anatomy-pathology prompting module automatically generates descriptive prompts for high-quality medical images. To further synthesize high-quality medical images from the generated prompts, the fine-grained alignment-based synthesis module pre-defines a visual codebook for the radiology dataset and performs fine-grained alignment between the codebook and generated prompts to obtain key patches as visual clues, facilitating accurate image synthesis. We validate the superiority of our method through experiments on public chest X-ray datasets and demonstrate that our synthetic images preserve accurate semantic information, making them valuable for various medical applications.
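As a rough illustration of the fine-grained alignment step described in the abstract (matching generated prompts against a pre-defined visual codebook to obtain key patches as visual clues), here is a minimal hypothetical sketch. The function name `select_key_patches`, the cosine-similarity scoring, and the `top_k` parameter are illustrative assumptions, not the paper's actual implementation, and it presumes prompt and patch embeddings are already available:

```python
import numpy as np

def select_key_patches(prompt_emb, codebook, top_k=4):
    """Hypothetical sketch: pick the codebook patches most similar
    to a prompt embedding, scored by cosine similarity."""
    # Normalize rows so a plain dot product equals cosine similarity.
    p = prompt_emb / np.linalg.norm(prompt_emb)
    c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    sims = c @ p                       # (num_patches,) similarity scores
    top = np.argsort(-sims)[:top_k]    # indices of the best-matching patches
    return top, sims[top]
```

The selected patch indices would then serve as the "visual clues" conditioning the synthesis module.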

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3619_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Che_Medical_MICCAI2024,
        author = { Chen, Wenting and Wang, Pengyu and Ren, Hui and Sun, Lichao and Li, Quanzheng and Yuan, Yixuan and Li, Xiang},
        title = { { Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a medical image synthesis model that leverages fine-grained image-text alignment and anatomy-pathology prompts to generate highly detailed and accurate synthetic medical images. They validate the superiority of their method through experiments on public chest X-ray datasets and demonstrate that their synthetic images preserve accurate semantic information, making them valuable for various medical applications.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A medical image synthesis model that leverages fine-grained image-text alignment and anatomy-pathology prompts to generate highly detailed and accurate synthetic medical images.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed method shares significant similarities with the approach presented in [1], with the main difference being the replacement of the LLM for generating reports with GPT-4. The two key components, namely the anatomy-pathology prompting module and the fine-grained alignment-based synthesis module, have already been introduced in [1]. This raises questions about the novelty and originality of the contributions made in this paper.

    The authors fail to provide a direct comparison between their method and the approach presented in [1]. This is a significant omission, as [1] reports impressive results on the MIMIC-CXR and OpenI datasets, achieving FID scores of 1.0916 and 1.5938, respectively. Moreover, the paper reports inconsistent results for the LLM-CXR method. According to [1], LLM-CXR achieves FID scores of 2.1788 and 1.6597 on MIMIC-CXR and OpenI, respectively. However, Table 1 in this paper reports FID scores of 11.9873 and 5.9869 for the same method and datasets. These discrepancies raise concerns about the reliability and fairness of the comparisons made in the paper.

    The paper heavily relies on GPT-4, a large language model that may limit the reproducibility of the proposed method. GPT-4 is a proprietary model with limited access, making it difficult for other researchers to replicate and build upon the work presented in this paper. Additionally, the token cost associated with using GPT-4 is very high, which could hinder the practical application and widespread adoption of the proposed method, especially in resource-constrained settings.

    [1] Fine-Grained Image-Text Alignment in Medical Imaging Enables Cyclic Image-Report Generation
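For context on the FID numbers compared above: FID is the Fréchet distance between Gaussians fitted to feature statistics (typically Inception activations) of real and synthetic images, so differences in input data or feature extraction can legitimately shift the scores. The sketch below implements only the textbook formula over precomputed feature vectors; the helper names and the symmetric square-root trick are my own choices, not the evaluation pipeline of either paper:

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)          # clamp tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets,
    i.e. the quantity behind FID (the feature extractor is out of scope)."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    ca = np.cov(feats_a, rowvar=False)
    cb = np.cov(feats_b, rowvar=False)
    # tr(sqrtm(Ca @ Cb)) computed in the numerically stable symmetric form
    ca_half = _sqrtm_psd(ca)
    tr_covmean = np.trace(_sqrtm_psd(ca_half @ cb @ ca_half))
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(ca + cb) - 2 * tr_covmean)
```

Identical feature sets give a distance of zero; the score grows as the fitted means and covariances diverge.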

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see weakness.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method shares significant similarities with the approach presented in [1].

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    My concerns are addressed and therefore I decide to raise my rating to weak accept.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a medical image synthesis model that leverages image-text alignment and anatomy-pathology prompts to improve medical image synthesis. The innovative part of this work is the design of the anatomy-pathology prompt and an alignment-based synthesis module. The paper builds anatomy and pathology vocabularies from the MIMIC-CXR dataset. The generated report and the visual codebook are then aligned to obtain matched results for the final LLM to generate the image.

    The results show that this framework decreases the FID score on the MIMIC-CXR and OpenI datasets. The model also shows improvement in anatomy and pathology classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths: 1) novelty in model development, combining prompting, text-image alignment, and image synthesis into a single framework.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Lack of justification for the model choices; more details are needed on the selection of the LLM and VQ-GAN. 2) More demonstration of the uses of this image generator is needed. Other than improving the FID score and serving as data augmentation to improve classification, the paper does not show other applications of this framework.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper needs additional experiments to show the application of this framework.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I don’t think this paper demonstrates strong clinical utility, and it needs more details justifying each component of the framework, such as an ablation analysis.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel medical image synthesis model that addresses the challenges of data scarcity and privacy concerns in the medical imaging field. By leveraging fine-grained image-text alignment and anatomy-pathology prompts, the proposed methodology generates detailed and accurate synthetic medical images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper combines advanced natural language processing techniques with image generative modeling to achieve precise alignment between descriptive text prompts and synthesized medical images’ anatomical and pathological details.

    2. The evaluation results show that the proposed method significantly outperforms real data in terms of accuracy and precision in anatomy and pathology classification.

    3. The fine-grained alignment-based synthesis module is a new feature of the proposed methodology.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper may lack generalizability due to its focus on a specific chest X-ray dataset.

    2. The paper may not thoroughly address potential biases or limitations in the training data used for the synthesis model.

    3. The scalability and efficiency of the proposed model may not be thoroughly discussed.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Consider expanding the scope of study to include a more diverse range of datasets or imaging modalities to improve the generalizability of your methodology.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a novel formulation that combines advanced natural language processing techniques with image generative modeling. The evaluation results show that the proposed method outperforms real data in terms of accuracy and precision in anatomy and pathology classification.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I maintain the accept decision after reading the rebuttal.




Author Feedback

We appreciate the valuable comments from all reviewers and will address them in the revised manuscript.

R#1 Q1: Lack of generalizability A1: Our experiments are mainly conducted on the MIMIC-CXR and IU X-ray datasets. We will conduct more experiments on other CXR datasets and other medical data to prove its generalizability.

Q2: Potential biases or limitations in the training data A2: In our method, the fine-grained alignment-based synthesis module consists of a frozen LLM and a VQ-GAN model to synthesize images, both pre-trained on the CXR dataset. This might limit the synthesis module to the training data. In future work, we will utilize more diverse CXR data to pre-train the synthesis module and mitigate biases or limitations of the training data.

Q3: Scalability and efficiency of our method A3: For scalability, our method includes about 3 billion parameters and can be scaled up by replacing the LLM with a larger model with 7 or 12 billion parameters, making it more powerful in language understanding. For efficiency, it takes about 2.24 seconds for the whole pipeline, indicating its practicability.

R#3 Q1: Concern about novelty A1: There are several differences between our method and [1]. 1) The goal of [1] is to achieve fine-grained image-text alignment through cyclic generation, making the fine-grained alignment more accurate and explainable, while our method focuses on generating high-quality CXR images with accurate anatomical and pathological details. 2) The approaches are different. [1] proposes AdaMatch for patch-word alignment and AdaMatch-cyclic for cyclic generation. Our method innovatively devises anatomy-pathology prompting to automatically generate descriptive and reasonable reports with anatomical structures and pathological conditions, and utilizes an image synthesis module to reconstruct images from the generated reports; this module can be replaced with any suitable synthesis model.

Q2: Performance shown in [1] and our paper A2: There is a difference in the input data between [1] and our paper. [1] directly tests on the original CXR dataset by using the real report as input, while ours uses anatomy-pathology prompting to randomly generate reports and applies those to the synthesis module. This difference in input data means the experimental results cannot be directly compared. Our paper’s quantitative results thus provide a more accurate assessment of the image synthesis performance of different methods.

Q3: Reproducibility and cost of GPT-4 A3: Due to the significant performance of GPT-4 in the medical field, we employ it as an effective tool to generate medical reports. We acknowledge that this may raise issues of reproducibility and high cost. To address this, we explored other effective, open-source large language models such as LLaMA3. Our preliminary experiments showed that LLaMA3 achieves performance comparable to GPT-4. To ensure the reproducibility of our method, we will replace the GPT-4 results with those obtained using LLaMA3.

R#4 Q1: Selection of LLM and VQ-GAN A1: We use the dolly-v2-3b LLM, trained through instruction tuning for high-quality instruction following. Since images are challenging for LLMs to process directly, we use VQ-GAN to encode and decode image tokens, as it is an effective and efficient network for learning the image-token mapping, outperforming VQ-VAE. The current model selection is simple yet effective.
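To illustrate the image-token mapping mentioned above: the core of a VQ-GAN/VQ-VAE quantizer is a nearest-neighbour lookup of encoder latents into a learned codebook. This is a minimal hypothetical sketch (names and shapes are assumptions), omitting the convolutional encoder/decoder and the straight-through gradient estimator used during training:

```python
import numpy as np

def quantize(latents, codebook):
    """Minimal vector-quantization step in the spirit of VQ-GAN/VQ-VAE:
    map each latent vector to its nearest codebook entry (L2 distance).
    Illustrative sketch only; the real model is trained end to end."""
    # (N, D) latents vs (K, D) codebook -> (N, K) squared distances
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    tokens = d.argmin(1)               # discrete token ids, one per latent
    return tokens, codebook[tokens]    # ids and the quantized vectors
```

The discrete token ids are what an LLM can consume and emit; the decoder then maps quantized vectors back to pixels.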

Q2: Other uses of our method A2: Our method aims to generate faithful and useful synthetic medical images. For faithfulness, we recruited radiologists to confirm the practicability of the synthetic images. For usefulness, the synthetic data can be used for medical AI model training, to augment data and avoid privacy issues. We have used synthetic images to improve performance on downstream tasks such as disease diagnosis, COVID-19 prognosis, and report generation.

Q3: Clinical importance A3: Our method can generate medical images with various diseases to support radiology education while avoiding patient privacy concerns.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This is a paper with contrasting reviews. After the rebuttal, R3 who raised the main concern in the first round about comparison to ref [1] as well as computational cost of GPT4 raised their rating to Weak Accept. Considering the reviewers’ comments, rebuttal and ranking against other rebuttal papers, this paper is just ranked around the acceptance borderline in my batch of papers.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



