Abstract

The burgeoning integration of 3D medical imaging into healthcare has led to a substantial increase in the workload of medical professionals. To assist clinicians in their diagnostic processes and alleviate their workload, the development of a robust system for retrieving similar case studies presents a viable solution. While the concept holds great promise, the field of 3D medical text-image retrieval is currently limited by the absence of robust evaluation benchmarks and curated datasets. To remedy this, our study presents a groundbreaking dataset, BIMCV-R, which includes an extensive collection of 8,069 3D CT volumes, encompassing over 2 million slices, paired with their respective radiological reports. Expanding upon the foundational work of our dataset, we craft a retrieval strategy, MedFinder. This approach employs a dual-stream network architecture, harnessing the potential of large language models to advance the field of medical image retrieval beyond existing text-image retrieval solutions. It marks our preliminary step towards developing a system capable of facilitating text-to-image, image-to-text, and keyword-based retrieval tasks. Our project is available at https://huggingface.co/datasets/cyd0806/BIMCV-R.
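As a rough illustration of the dual-stream retrieval pattern the abstract describes, the sketch below projects text and 3D volume features into a shared embedding space and ranks pairs by cosine similarity. This is a minimal sketch only: the encoder dimensions, projection heads, and pooling are illustrative assumptions, not the exact MedFinder architecture.

```python
# Minimal sketch of a dual-stream text-image retrieval model. The encoder
# dimensions, projection heads, and names here are illustrative assumptions,
# NOT the exact MedFinder architecture from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualStreamRetriever(nn.Module):
    def __init__(self, text_dim: int = 768, vision_dim: int = 1024,
                 embed_dim: int = 512):
        super().__init__()
        # Each stream projects its modality into a shared embedding space.
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.vision_proj = nn.Linear(vision_dim, embed_dim)

    def forward(self, text_feats: torch.Tensor,
                volume_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:   (B, text_dim), e.g. pooled output of a text encoder
        # volume_feats: (B, vision_dim), e.g. pooled output of a 3D CT encoder
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.vision_proj(volume_feats), dim=-1)
        # Cosine similarity matrix: rows index reports, columns index volumes.
        return t @ v.t()

# Text-to-image retrieval ranks columns per row; transpose the matrix for
# image-to-text retrieval.
model = DualStreamRetriever()
sim = model(torch.randn(4, 768), torch.randn(4, 1024))
top1 = sim.argmax(dim=1)  # best-matching volume index for each report
```

Keyword-based retrieval can reuse the same text stream by encoding keywords instead of full reports.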

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1194_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1194_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

https://huggingface.co/datasets/cyd0806/BIMCV-R

BibTex

@InProceedings{Che_BIMCVR_MICCAI2024,
        author = { Chen, Yinda and Liu, Che and Liu, Xiaoyu and Arcucci, Rossella and Xiong, Zhiwei},
        title = { { BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a publicly accessible English dataset that pairs 3D CT images with corresponding radiological reports, covering 96 disease types.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The BIMCV-R dataset’s size and scope make it one of the most comprehensive datasets available for 3D medical imaging. The collaboration with clinicians and the focus on practical diagnostic use cases increase the real-world applicability.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While the paper introduces MedFinder, there might be insufficient empirical validation or comparison with other existing methods. The authors also need to discuss the differences from other vision-language models in the field of medical imaging.
    2. The dataset, primarily sourced from one repository, might limit diversity in terms of demographic and pathological variety, potentially introducing bias.
    3. Heavy reliance on large language models may introduce dependencies on specific model architectures and their inherent biases, which might affect the generalizability of the findings. The authors should add a discussion of this kind of influence to the study.
    4. The Similarity Matching Loss is similar to the LV loss in work [1]. The authors should clarify the differences. [1] Li, Z., Li, Y., Li, Q., Wang, P., Guo, D., Lu, L., … & Hong, Q. (2023). LViT: Language meets vision transformer in medical image segmentation. IEEE Transactions on Medical Imaging.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors need to provide more information about the dataset, such as the distribution of organs.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The reproducibility and the clinical significance of the proposed dataset.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks for the rebuttal. My key concerns have been addressed. However, the authors should still further elaborate on the construction process of the dataset, especially how the introduction of bias was avoided.



Review #2

  • Please describe the contribution of the paper

    The paper introduces BIMCV-R for 3D medical imaging diagnostics, which includes 8,069 3D CT volumes paired with their corresponding radiological reports. The dataset is meticulously curated to cover 96 disease types, with radiological reports anonymized and translated into English using GPT-4 for consistency and privacy. The authors then developed MedFinder, a dual-stream network architecture that integrates a text encoder (BiomedCLIP) and a visual encoder to extract features from medical texts and images, respectively. The authors conducted experiments to verify its effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is easy to read and follow, and the experiments are thorough.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The methodological depth is not fully explored; the paper could have provided more insight into the choice of hyperparameters and the rationale behind specific design decisions in the MedFinder architecture, for instance, how $\alpha$ in Eq. 8 is set. Besides, there should be more discussion of the ablation study. Lastly, it may be suggested to provide a limitations section and set clearer expectations for subsequent studies in the field.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the weaknesses part.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The dataset would benefit the field and inspire more work in this area.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns. I am leaning to accept it.



Review #3

  • Please describe the contribution of the paper

    This study presents a groundbreaking dataset, BIMCV-R, which includes an extensive collection of 8,069 3D CT volumes, encompassing over 2 million slices, paired with their respective radiological reports.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This work curates the first publicly accessible English 3D text-image CT dataset, BIMCV-R, inclusive of authentic radiological reports and detailed disease-type diagnoses.
    2. They introduce MedFinder, an exhaustive suite of medical retrieval schemes, including innovative approaches for text-image, image-text, and keyword-image retrieval, a pioneering effort on a real-world dataset.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Does the work conduct any quality control for the 3D text-image CT dataset?
    2. In the experiments, it would be better to explain the parameter settings in MedFinder and conduct a sensitivity analysis.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    see above

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work introduces a collection of 8,069 3D CT volumes and their corresponding radiological reports, offering a valuable resource for future research.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Thank you to all the reviewers and chairs for your comments. Our responses are as follows:

R3 & R6: Explain the rationale behind hyperparameter choices and model design decisions. We selected the hyperparameter $\alpha=1.5$ on the validation set, as it balanced the MSE loss constraining latent-space similarity and the similarity loss affecting matching. Further tuning could improve results.

R3 & R6: Discuss limitations and set clearer expectations for future research. Our approach has limitations, such as information loss from cropping or downsampling 3D volumes and restricted input text tokens. Future work could employ advanced techniques like GPT-4o for processing large 3D data and leverage updated language models supporting longer inputs. BIMCV-R has many applications beyond retrieval, including long-tail classification, image-text pre-training, and grounding.

R4 & R6: Clarify whether the dataset underwent quality control; provide more dataset details. To ensure the quality of the BIMCV-R dataset, we implemented a rigorous multi-step quality control process. For the textual data, we engaged professional translators, followed by GPT-4 refinement and simplification. Medical experts then provided diagnoses, identifying 96 distinct disease types with a long-tailed distribution. We removed excessively long or short diagnoses and will provide more details in the open-source files. Regarding image quality, we conducted manual screening and applied deep learning-based super-resolution and denoising techniques.

R6: Conduct more empirical validation and comparison with existing methods, and discuss differences. Our work is the first to address 3D medical text-image retrieval. Although many video-text retrieval methods exist, such as the CLIP4Clip model we reproduced, there are significant differences between video and medical data:

  1. Video descriptions are often much shorter than medical reports and contain fewer professional terms.
  2. Videos can be input frame by frame, while medical images are usually treated as 3D volumes, limiting the application of advanced video retrieval methods due to memory constraints. We attempted to reproduce the SOTA video retrieval method DRL (Wang et al., 2022) using downsampling, but its R@1 score was lower than our baseline 3D-MIR.

R6: Discuss potential biases and limitations of a single-source dataset. BIMCV is a large-scale database containing diverse data from various ethnicities, age groups, and disease types, despite the long-tailed distribution. To protect patient privacy, we have anonymized the data while ensuring diversity through uniform sampling across ethnicities. We acknowledge the limitation of homogeneous imaging equipment in BIMCV, which relies on the community’s collective efforts to address comprehensively. Nevertheless, we believe BIMCV-R is a valuable resource for advancing 3D medical image retrieval and analysis research.

R6: Discuss the potential influence of large language model dependencies and biases. In our work, we used BiomedCLIP, the most advanced medical text encoder available at the time. Our ablation experiments demonstrated the superiority of BiomedCLIP over CLIP. Although newer models like llama-med have recently emerged, deploying such large-scale models with billions of parameters remains impractical. We acknowledge that biases in large language models are inevitable, and we eagerly anticipate the development of more specialized language models for the medical domain in the future.

R6: Clarify differences between the similarity matching loss and existing methods. We will cite the relevant work in the next version. However, the two differ in several key aspects: 1) purpose (semi-supervised segmentation vs. supervised retrieval), 2) calculation (mask vs. feature-vector similarity), 3) supervision signal (organ shape/location vs. positive/negative pair similarity), and 4) application domain. Despite some high-level resemblance, these differences in design and usage distinguish our $L_{sim}$.
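To make the loss discussion above concrete: Eq. 8 itself is not reproduced on this page, but per the rebuttal it combines an MSE term constraining latent-space similarity with a similarity matching term over positive/negative pairs, weighted by $\alpha=1.5$. The sketch below shows one plausible such combination; the functional forms of both terms are assumptions, not the paper's exact definitions.

```python
import torch
import torch.nn.functional as F

def combined_loss(sim: torch.Tensor, alpha: float = 1.5) -> torch.Tensor:
    """Assumed stand-in for the paper's Eq. 8 -- illustrative only.

    sim: (B, B) cosine similarity matrix between paired reports and
    volumes; diagonal entries correspond to matched (positive) pairs.
    """
    b = sim.size(0)
    # MSE term: pull latent-space similarities toward the 0/1 match pattern.
    target = torch.eye(b, device=sim.device)
    l_mse = F.mse_loss(sim, target)
    # Similarity matching term: contrastive cross-entropy over positives
    # vs. negatives, symmetrized over both retrieval directions.
    labels = torch.arange(b, device=sim.device)
    l_sim = 0.5 * (F.cross_entropy(sim, labels)
                   + F.cross_entropy(sim.t(), labels))
    return l_mse + alpha * l_sim  # alpha = 1.5 selected on the validation set
```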




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Based on the reviewers’ comments, I think that we can accept this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Based on the reviewers’ comments, I think that we can accept this paper.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While all three reviewers see the value in the creation of the dataset, provided it is made open access, it is not clear whether this is the primary contribution of the paper, as there is no separate category for dataset contributions at MICCAI, unlike at other conferences. The paper lacks sufficient details on the dataset itself, e.g., which body parts, how many diseases are covered, how many findings, etc. Without this cataloging, the dataset itself may not be a valuable contribution.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    While all three reviewers see the value in the creation of the dataset, provided it is made open access, it is not clear whether this is the primary contribution of the paper, as there is no separate category for dataset contributions at MICCAI, unlike at other conferences. The paper lacks sufficient details on the dataset itself, e.g., which body parts, how many diseases are covered, how many findings, etc. Without this cataloging, the dataset itself may not be a valuable contribution.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I agree with Meta-reviewer 1. Overall, the paper presents a high-quality study and the dataset will be very useful. The concerns from reviewers (about bias) could potentially be a good research topic for others using the dataset.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    I agree with Meta-reviewer 1. Overall, the paper presents a high-quality study and the dataset will be very useful. The concerns from reviewers (about bias) could potentially be a good research topic for others using the dataset.


