Abstract

Mammography serves as a vital tool for breast cancer detection, with screening and diagnostic modalities catering to distinct patient populations. However, in resource-constrained settings, screening mammography may not be feasible, necessitating reliance on diagnostic approaches. Recent advances in deep learning have shown promise in automated malignancy prediction, yet existing methodologies often overlook crucial clinical context inherent in diagnostic mammography. In this study, we propose a novel approach to integrate mammograms and clinical history to enhance breast cancer detection accuracy. To achieve our objective, we leverage recent advances in foundational models, using ViT for mammograms and RoBERTa for encoding text-based clinical history. Since current implementations of ViT cannot handle large 4K×4K mammography scans, we devise a novel framework to first detect regions of interest and then classify them using a multi-instance-learning strategy, while allowing text embeddings from the clinical history to attend to the visual regions of interest from the mammograms. Extensive experimentation demonstrates that our model, MMBCD, successfully incorporates contextual information while preserving image resolution and context, leading to superior results over existing methods and showcasing its potential to significantly improve breast cancer screening practices. We report an (Accuracy, F1) of (0.96, 0.82) and (0.95, 0.68) on our two in-house test datasets with MMBCD, against (0.91, 0.41) and (0.87, 0.39) by LLaVA, and (0.84, 0.50) and (0.91, 0.27) by CLIP-ViT; both state-of-the-art multi-modal foundational models.
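The abstract describes ROI embeddings acting as instances in a multi-instance-learning bag, with a clinical-history text embedding cross-attending to them before classification. The sketch below illustrates that idea only; it is not the authors' released code, and the module name, dimensions, and single-query design are our own illustrative assumptions.

```python
import torch
import torch.nn as nn

class ROITextFusion(nn.Module):
    """Illustrative sketch (not the authors' implementation): ROI embeddings
    from a vision backbone form a MIL bag, and a text embedding of the
    clinical history cross-attends to them before classification."""

    def __init__(self, img_dim=768, txt_dim=768, n_heads=8):
        super().__init__()
        # Cross-attention: the clinical-history embedding queries the ROI instances.
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=txt_dim, kdim=img_dim, vdim=img_dim,
            num_heads=n_heads, batch_first=True,
        )
        self.classifier = nn.Linear(txt_dim, 1)  # malignancy logit

    def forward(self, roi_embeds, text_embed):
        # roi_embeds: (B, N_rois, img_dim), e.g. ViT embeddings of detected ROIs
        # text_embed: (B, txt_dim), e.g. a RoBERTa sentence embedding
        query = text_embed.unsqueeze(1)                    # (B, 1, txt_dim)
        fused, attn = self.cross_attn(query, roi_embeds, roi_embeds)
        return self.classifier(fused.squeeze(1)), attn     # logit + ROI attention

# Toy usage: a bag of 6 ROIs per mammogram, batch of 2.
model = ROITextFusion()
logit, attn = model(torch.randn(2, 6, 768), torch.randn(2, 768))
print(logit.shape, attn.shape)  # torch.Size([2, 1]) torch.Size([2, 1, 6])
```

The returned attention weights give a per-ROI relevance score, which is what allows qualitative inspection of which regions the clinical history attends to.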

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1311_paper.pdf

SharedIt Link: https://rdcu.be/dVZeg

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72378-0_14

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1311_supp.pdf

Link to the Code Repository

https://mammo-iitd-aiims.github.io/MMBCD

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Jai_MMBCD_MICCAI2024,
        author = { Jain, Kshitiz and Bansal, Aditya and Rangarajan, Krithika and Arora, Chetan},
        title = { { MMBCD: Multimodal Breast Cancer Detection from Mammograms with Clinical History } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {144--154}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work focuses on tackling breast cancer detection in diagnostic scenarios. It presents an ROI-based detection network that extracts regions of interest. A fusion network then combines the mammograms and text, leveraging cross-attention for the fusion.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper has good integrity. The writing is easy to follow.
    2. The paper has a broad literature review regarding multi-modality learning that explores mammograms (images) and history (text).
    3. The paper clearly illustrates the intuition and motivation of tackling this problem.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Several statements/assumptions in this paper are highly questionable: 1. “Non-expert doctors do the screening mammography” is highly unlikely anywhere in the world. Most jurisdictions, such as the US, Canada, the UK, the EU, China, and Australia, have professional radiologists performing mammography screening tasks. In most cases, they are simply a different group from those who perform the diagnostic work. Calling them “non-expert” is very questionable. Also, screening mammograms do come with diagnostic/text reports, though the format may vary from the diagnostic ones. 2. “For mammograms, history is often gathered through a cost-effective questionnaire” is not convincing. Though questionnaires do account for a certain amount of history collection, the majority of a patient’s history is collected through different processes and departments during the patient’s visit to the hospital. 3. “Oversight of screening renders existing deep neural network models unreliable, as they fail to incorporate essential diagnostic features valued by trained radiologists.” The fundamental reason screening mammograms attract more focus is that screening is crucial and the very first step of a patient’s clinical cycle for breast cancer detection. As a result, the number of screening mammograms is often tens of times that of diagnostic ones. Even so, it is hard to see how this situation leads to the claimed “unreliability.”

    Apart from the above clinical facts, the overall novelty of this work is limited. As for ROI extraction, most state-of-the-art detection models can extract regions of interest from mammograms, whether large or small. MIL-based fusion of image and text modalities is a standard pattern in natural and medical imaging. Cross-attention has been extensively applied to cross-modality fusion tasks.

    Lastly, more comparison with mammography-related detection methods is needed, as it is otherwise hard to tell the actual performance of this framework.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Pay more attention to the needs of the clinical side and generate a more solid assumption of this task. Try to build more innovations in modality fusion.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Reject — must be rejected due to major flaws (1)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are some fundamental false statements about this problem from the paper: the expertise of the radiologists who perform mammography screening, the origin of histories/text data of patients, and the actual reasons why most current deep learning models focus on tackling screening mammograms. These statements/facts are not consistent with actual clinical situations.

    Besides, the overall approach needs more novelty regarding the ROI extraction and multimodality fusion. No mammography-related SOTA methods have been compared against. The results are not promising.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Reject — must be rejected due to major flaws (1)

  • [Post rebuttal] Please justify your decision

    I raised five major weaknesses in my initial decision. The authors addressed the first one, which was due to a confusing sentence. I maintain my opinion regarding the other four:

    • Regarding the questionnaire issue, I interviewed seven hospitals from the US, UK, Germany, Japan, and China, each with more than 700 beds. None of their radiology departments has a questionnaire collection step during diagnosis. Even though the authors claimed that it “is routinely collected in our hospital,” I don’t think this is a universally adopted process worldwide. This puts fundamental questions on the applicability of this work.

    • The paper states that current DNNs are unreliable “as they fail to incorporate essential diagnostic features valued by trained radiologists.” Claiming that current DNNs for mammography screening are unreliable because they only incorporate image features is very unconvincing. The FDA has approved several AI systems for mammogram screening that are built as image-only models. I don’t see how they are unreliable if the FDA can grant approval.

    • Regarding the novelty issue, in the rebuttal, the author states, “We did not claim that our basic architecture is novel. However, our technique for integrating clinical history with mammogram data is novel”. If the architecture doesn’t provide enough novelty, I don’t see how the idea of integrating textual and visual features is novel. Applying these well-studied approaches to mammography screening isn’t novel, in my view.

    • For comparison, even though there are no competitive multi-modal SOTA methods, MMBCD should still be compared to existing image-only SOTA mammography screening methods. Otherwise, how can the idea of “incorporating essential diagnostic features makes the DNNs reliable” be validated?

    I consider all the above issues major flaws; as a result, I maintain my overall decision on this work.



Review #2

  • Please describe the contribution of the paper

    The paper presents a novel approach for breast cancer detection that integrates mammographic imaging with clinical history using multimodal models. The authors propose a framework to handle high-resolution mammograms by focusing on ROIs and classifying these using a multi-instance-learning strategy. The approach also integrates text embeddings from clinical histories to guide visual regions in mammogram analysis. The authors demonstrate the experimental performance gains on two in-house datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Integrating textual clinical history with visual mammogram data is novel, and the authors present qualitative analysis of the cross-attention layer results.
    2. The authors proposed a method to maintain and process high-resolution mammogram images without downscaling, potentially preserving important diagnostic details.
    3. The proposed model achieves significant performance gains compared with several state-of-the-art models, which suggests a potential impact on diagnostic practices in mammography.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The model’s performance is only tested on in-house datasets, which may not be representative of its generalization. It would be better if the authors verified their model on benchmark datasets before applying it to real-world datasets.
    2. In the experimental settings, it is not clear how prompts are used during the model’s training and testing phases. This may cause potential label leakage, since the authors include “Cancer:{Yes/No}” in the prompts.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Can the authors provide ablation studies on the text modality, such as a comparison with medical text encoders, or the impact of masking different types of keywords?
    2. Did the authors also finetune the ROI extraction model?
    3. The authors mention “Our detection model is initialized with COCO weights”, but in the Tables the weights are pre-trained on another dataset. It is not clear what the detection model refers to in the implementation details.
    4. How long does it take to run inference on one study?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method using clinical history, and the experiments on in-house data.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors didn’t address my concerns or answer my questions. I will keep my rating.



Review #3

  • Please describe the contribution of the paper

    The paper explores combining advanced vision and language models for the clinical breast cancer detection task, which is a timely and interesting topic. The authors propose a novel multi-modal breast cancer detection model that leverages mammograms and clinical history information. Results on two in-house datasets show the superior performance achieved by incorporating clinical information.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper proposes a novel multi-modal approach leveraging multi-instance learning and a cross-attention module that effectively integrates the ROIs of full-resolution mammograms with textual clinical history to enhance the accuracy of breast cancer detection.
    • The proposed architecture utilizes advanced foundational models, such as FocalNet-DINO for ROI extraction from high-resolution mammograms, ViT for ROI processing, and RoBERTa for textual embeddings of clinical history.
    • Extensive ablation experiments are implemented to demonstrate the improvement of model design and foundational model selection.
    • The design of the method is well presented and several datasets are used for evaluation. The paper is well-written and easy to follow.
    • The paper releases the source code and plans to release in-house dataset 2 in the near future.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • In the comparison experiment, the proposed model is compared with various natural-vision foundational models but lacks a comparison with SOTA breast cancer detection models.
    • The ablation of the ROI extraction is only shown in Fig. S1; the corresponding comparison results (with the ACC, F1, and AUC metrics) appear to be missing.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The contribution of the paper is strong and the experimental validation appears extensive.

    • It would be clearer if the authors could provide some samples of the clinical history reports.
    • In Figure 3, it would be better to provide the ground-truth bounding boxes and corresponding textual information.
    • The dataset is divided based on acquisition dates rather than by patient. The authors should clarify whether the dataset contains multiple exams from the same patients.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Combining advanced vision and language models for the clinical breast cancer detection task is an important and interesting topic. The authors propose a novel multi-modal approach leveraging multi-instance learning and a cross-attention module that effectively integrates the ROIs of full-resolution mammograms with textual clinical history to enhance the accuracy of breast cancer detection. The contribution of the paper is strong and the experimental validation appears extensive.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors seem to focus on addressing other reviewers’ concerns.




Author Feedback

We thank the reviewers for their detailed and valuable feedback. We refer to the weakness section as “W” in our rebuttal below.

R1,W1: “Non-expert doctors do the screening mammography” is highly unlikely anywhere in the world: The said phrase does not appear in our paper. In Section 1 (Screening vs. diagnostic mammography), we wrote, “For non-expert readers, the screening mammography…,” in which, we are referring to “readers of our manuscript” without the medical background. We shall rephrase for better clarity.

R1, W2: “history is gathered through a cost-effective questionnaire” is not convincing: The statement is clearly not intended to generalize to all aspects of medical history taking. Eliciting a medical history is an art, and no questionnaire can compete with it. However, some simple aspects of history, such as whether the patient feels a lump or has a history of discharge, are within the patient’s domain to answer, and this is routinely collected in our hospital. This information is valuable for anyone interpreting these studies and holds potential for making deep neural networks reliable.

R1,W3: “Oversight of screening renders existing deep neural network models unreliable”: Our manuscript does not contain the said phrase. In Section 1 (Screening vs. diagnostic mammography), we wrote, “This oversight renders existing deep neural network models unreliable,” referring to the limitation of DNNs that only use mammograms without patient’s clinical history. We did not imply that screening mammography is unreliable.

R1,W4: “overall novelties of this work are limited”: The comment that cross-attention for cross-modality fusion is standard itself implies that many works use fusion, differing in the details. Similarly, our work uniquely integrates textual clinical history with visual mammogram data, as noted by Reviewers 2 and 3. We did not claim that our basic architecture is novel. However, our technique for integrating clinical history with mammogram data is novel, and outperforms any other detection approach. Additionally, our proposed method allows us to exploit computer vision foundational models, which have so far been used for small-sized natural images, on high-resolution medical images. This was acknowledged as innovative by Reviewer 3. We believe our unique and interpretable approach offers valuable contributions to the community.

R1,W5;R3,W1: more comparison of mammography-related detection methods: To the best of our knowledge, no existing literature proposes a multi-modal DNN for mammography with source code released for comparison. We can show results with image-only models in camera ready, but it will not be fair as our model also uses additional clinical history information.

R2,W1: Application to real-world datasets: Our experimental setup is specific to the medical imaging domain and does not make sense for natural images. In natural image settings, typically, if the image description is available, the image class would also be known, making such a problem irrelevant. If the reviewer could suggest a suitable benchmark dataset for the experiment, we would be happy to perform it.

R2,C1: ablation studies on the text modality: In our experiments we also trained a MedCLIP encoder based on BioClinicalBERT, and found RoBERTa to be the superior encoder. Adhering to the MICCAI guidelines, we do not disclose the numbers here.

R2,C3-5: finetune the ROIs extraction model?: As mentioned in Section 3.1, we annotated a small subset of our training data with bounding boxes to train the detection networks to extract relevant ROIs from the mammograms. All object detection networks mentioned in Table 3 were trained in the same manner.

R2, W2: how prompts are used during training and testing: For our model we do not include “Cancer:{Yes/No}” during the model’s training or testing phase. This is introduced only in the CLIP training paradigm for baseline comparison. This is a standard setting with no label leakage.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents a novel approach for breast cancer detection that integrates mammographic imaging with clinical history using multimodal models. Combining advanced vision and language models for the clinical breast cancer detection task is an important and interesting topic. Though the rebuttal addressed some concerns, some of the concerns raised by the reviewers remain unresolved, especially why there is no comparison with vision-based SOTA. Due to this, I recommend reject.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper has some technical merits in terms of integrating images and texts (in the form of patient clinical history), which is perhaps why it got good scores from two reviewers. As one reviewer pointed out, this paper has major flaws in its clinical scenario descriptions (the first and second paragraphs of the paper), where it is not valid to contrast screening and diagnostic settings as they have distinct clinical functions. To me the paper is not acceptable due to this flaw. Plus, as reviewers pointed out, it also missed some experiments that would better show the added value of the proposed method.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper received mixed reviews, and the criticism relates to the clarity of the clinical task and missing comparisons. This meta-reviewer argues that the paper makes a valuable contribution despite its limitations. In particular, the novelty of the approach to assessing mammography data by integrating clinical information was highlighted by reviewers and ACs. The authors should improve the description and add details as requested in the reviews and AC comments. Overall, the paper makes a valuable contribution from a health equity perspective, which this meta-reviewer felt outweighs the limitations.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


