Abstract

Detecting abnormalities in medical images poses unique challenges due to differences in feature representations and the intricate relationship between anatomical structures and abnormalities. This is especially evident in mammography, where dense breast tissue can obscure lesions, complicating radiological interpretation. Despite leveraging anatomical and semantic context, existing detection methods struggle to learn effective class-specific features, limiting their applicability across different tasks and imaging modalities. In this work, we introduce Exemplar Med-DETR, a novel multi-modal contrastive detector that enables feature-based detection. It employs cross-attention with inherently derived, intuitive class-specific exemplar features and is trained with an iterative strategy. We achieve state-of-the-art performance across three distinct imaging modalities from four public datasets. On Vietnamese dense breast mammograms, we attain an mAP50 of 0.7 for mass detection and 0.55 for calcifications, yielding an absolute improvement of 16 percentage points over the previous state of the art. Additionally, a radiologist-supported evaluation of 100 mammograms from an out-of-distribution Chinese cohort demonstrates a twofold gain in lesion detection performance. For chest X-rays and angiography, we achieve an mAP50 of 0.25 for mass and 0.37 for stenosis detection, improving results by 4 and 7 percentage points, respectively. These results highlight the potential of our approach to advance robust and generalizable detection systems for medical imaging.
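
To make the abstract's two mechanisms concrete, here is a minimal pure-Python sketch under stated assumptions: a class exemplar is formed by mean-pooling the feature-map cells inside a box, and image tokens then cross-attend to the exemplars with single-head scaled dot-product attention. The pooling operator, the absence of learned projections, and the names `box_exemplar` and `cross_attend` are hypothetical simplifications, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # fmap[row][col] is one feature vector


def box_exemplar(fmap: FeatureMap, box: Tuple[int, int, int, int]) -> Vector:
    """Mean-pool the feature vectors inside box = (x0, y0, x1, y1).

    Coordinates are feature-map cells; x1 and y1 are exclusive.
    Mean pooling is an illustrative assumption, not the paper's operator.
    """
    x0, y0, x1, y1 = box
    cells = [fmap[r][c] for r in range(y0, y1) for c in range(x0, x1)]
    if not cells:
        raise ValueError("box covers no feature cells")
    dim = len(cells[0])
    return [sum(v[d] for v in cells) / len(cells) for d in range(dim)]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Sequence[Vector], exemplars: Sequence[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention.

    Each image token (query) attends over the class exemplars, which serve
    as both keys and values; no learned projections are included here.
    """
    dim = len(exemplars[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in exemplars]
        weights = softmax(scores)
        outputs.append([sum(w * e[d] for w, e in zip(weights, exemplars)) for d in range(dim)])
    return outputs
```

In this sketch, exemplars pooled from a lesion box and a background box would be passed together as `exemplars`; a real detector would add learned query/key/value projections, multiple heads, and text features, none of which are modeled here.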

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2054_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: https://papers.miccai.org/miccai-2025/supp/2054_supp.zip

Link to the Code Repository

N/A

Link to the Dataset(s)

Public datasets: https://vindr.ai/datasets/mammo https://physionet.org/content/vindr-cxr/1.0.0/ https://zenodo.org/records/10390295

BibTex

@InProceedings{BhaShe_Exemplar_MICCAI2025,
        author = { Bhat, Sheethal and Georgescu, Bogdan and Panambur, Adarsh Bhandary and Zinnen, Mathias and Nguyen, Tri-Thien and Mansoor, Awais and Elbarbary, Karim Khalifa and Bayer, Siming and Ghesu, Florin-Cristian and Grbic, Sasa and Maier, Andreas},
        title = { { Exemplar Med-DETR: Toward Generalized and Robust Lesion Detection in Mammogram Images and Beyond } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {207--217}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents EM-DETR, a feature-based detection framework tailored for medical imaging. The method leverages an exemplar generation module to extract class-specific representative embeddings that guide the detection process. The authors further enhance the model with additional feature-level loss functions and a multi-stage iterative training strategy. Extensive experiments across multiple datasets demonstrate the effectiveness and generalizability of the proposed method.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method is well-aligned with the specific challenges of lesion detection in medical images, particularly in handling visually subtle abnormalities and reducing false positives caused by confusing anatomical structures. The use of exemplar-based representation and background modeling directly addresses these practical difficulties.
    2. EM-DETR achieves state-of-the-art performance on multiple datasets and imaging modalities, including mammography, chest X-rays, and angiography, with consistent improvements over baselines and strong generalization to out-of-distribution data.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. There is a mismatch between the task-specific framing in the introduction and the general-purpose application of the method in later sections. The authors should clarify whether the method is intended for mammography specifically, or for general lesion detection in medical images. If the latter, the title and introduction should be revised accordingly.
    2. The paper does not clearly explain how the regions corresponding to each class—used to extract Xk and Pk—are obtained during training. Is this based on ground truth bounding boxes? Moreover, it remains unclear how such class-specific regions are defined during inference, where ground truth is not available. A clearer description or illustration of how Xk and Pk are derived in both training and inference would significantly improve the clarity and reproducibility of the method.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall recommendation is based on the limited clarity of key methodological components and the inconsistency between the stated task focus and the general-purpose evaluation.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The paper would benefit from improved logical flow and clearer transitions between sections, as the current structure at times obscures the core ideas and makes the method harder to understand.



Review #2

  • Please describe the contribution of the paper

    The authors focus on the relevant topic of lesion detection in mammography. In particular, the authors target the analysis of mammograms of dense breasts. The proposed method relies on a transformer-based deep learning network with an integrated memory bank. The algorithm is trained using a multi-stage strategy that improves the detection performance for both calcifications and masses. The improvement is demonstrated on different datasets (VinDR and CMMD) using the mAP metric.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The targeted topic is clinically relevant, and the proposed method fits within the state of the art. The proposed method appears novel in its combination of a Transformer and a memory bank, and in its engineered multi-stage approach. Overall, the presentation is clear, allowing the reader to understand the proposed method. Finally, annotating the CMMD dataset is also a valuable piece of work.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper lacks some details needed for a good understanding of the contribution. The authors propose a method for detecting masses and calcifications in mammograms. This is a challenging task requiring particular attention to spatial resolution and data distribution. However, the authors reference the work of Panambur et al. [22] for the pre-processing technique, and that work uses mammograms of 224x224 pixels, a substantially low resolution for detecting findings as small as clusters of malignant calcifications. Hence, in that scope, the presented results are questionable. Similarly, the paper does not provide sufficient details about the breast density distribution in the datasets used. This makes it more difficult to understand how the proposed method deals with dense breasts.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    I would like the authors to develop some aspects of the method and the experimental setup.
    - Could the authors better describe the pre-processing technique applied to the mammography images? Additionally, Fig. 2b depicts the MLO image in a distorted way. Is this an issue of presentation or of the pre-processing?
    - The authors highlight the improved performance in dense breasts. Could the authors provide more details on the composition of the datasets? Moreover, could the authors indicate how the CMMD samples were selected?
    - Fig. 4a shows the t-SNE representation of the CMMD dataset detections, which appears perfect (no masses). However, in the supplementary material the results are not all perfect, and some false positives and false negatives are present. Could the authors comment?
    - On a minor note, I suggest that the authors proofread the paper to reduce repetitive sentences and typos.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper appears to be appealing conference material; however, the lack of methodological and experimental details may leave a knowledgeable reader puzzled. I would suggest that the authors provide more details to make the paper clearer.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I would like to thank the authors for the rebuttal.

    The provided details cover some aspects of the reviews and make some parts clearer. The overall presentation of the experiments still appears somewhat confusing: in particular, I remain puzzled by the pre-processing technique, which may prevent the segmentation of calcifications. However, I believe the paper may be accepted for its worthwhile material.



Review #3

  • Please describe the contribution of the paper

    On the methodological side, the paper proposes a novel modification of Grounding DINO [14]. Instead of “language-guided query selection”, Med-DETR uses learnable class-wise tokens as class-specific exemplars to guide detection. The authors also introduce a 3-stage training scheme, with the latter stages focusing on distinguishing findings from background and from false positives in anatomically similar areas. On the application side, the paper contributes a comprehensive evaluation of the proposed method on mammography (VinDR-Mammo and CMMD), chest X-ray (VinDR-CXR), and angiography (ARCADE).

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel formulation: The paper proposes a simple but effective modification to the Grounding DINO framework. This is an intuitive and easily implementable enhancement tailored for the structure of medical images.
    2. Comprehensive evaluation: The experimental section is a standout, with strong comparisons across four datasets and three modalities (mammography, chest X-ray, angiography). The authors benchmark against strong baselines and demonstrate state-of-the-art performance.
    3. Good ablation studies: The ablations are informative and well-presented.
    4. Clarity and structure: The paper is well-written, logically organized, and easy to follow. Figures and tables are effective in communicating both methodology and results.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Reproducibility concerns: The paper lacks certain important implementation details. In particular, the construction of the text prompts for the text encoder is not described. Further, I am confused by the training procedure of stage II-III in the context of usual detector training. Are we synthesizing inputs by mixing normal background and abnormal patches from anatomically similar regions?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good methodological contribution, great evaluation. Will change my rating to 5/6 if reproducibility concern is addressed in rebuttal.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I maintain my rating from the first round.
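
Review #3 summarizes the method as replacing Grounding DINO's language-guided query selection with learnable class-wise tokens. In Grounding DINO, each image token is scored by its maximum similarity to the prompt tokens and the top-k tokens become decoder queries. The hypothetical sketch below shows that selection step with class tokens standing in for text tokens; dot-product similarity, the tie-breaking rule, and the name `select_queries` are assumptions, not code from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def select_queries(image_tokens: Sequence[Vector],
                   class_tokens: Sequence[Vector],
                   k: int) -> List[int]:
    """Return indices of the k image tokens most similar to any class token.

    Mirrors the shape of Grounding DINO's language-guided query selection
    (max similarity over prompt tokens, then top-k over image tokens), with
    hypothetical class tokens standing in for text tokens.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    best = [max(dot(t, c) for c in class_tokens) for t in image_tokens]
    # Sort by descending score; ties keep the lower index first.
    order = sorted(range(len(image_tokens)), key=lambda i: (-best[i], i))
    return order[:k]
```

For example, with a single "lesion" token, only image tokens aligned with it are promoted to queries, so background-like tokens are never decoded unless a background token is also supplied.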




Author Feedback

We thank the reviewers for their constructive feedback and for highlighting the novelty of our proposed method for example-based medical lesion detection. Our approach introduces a novel multi-modal contrastive detector that enables feature-level lesion detection. The submission was deemed intuitive, well-structured (R1,R2,R3) and clearly presented (R3). The experiments were considered “standout”, with strong comparisons across 4 datasets and 3 modalities (R1,R2), achieving SOTA results. The method was also recognized for its clinical relevance (R3) and alignment with specific challenges in lesion detection (R1). Below, we address their comments, which further strengthen our study:

Unclear scope of the paper:

1. (R1) While developed for mammography, the method shows potential in other 2D medical imaging domains. We will clarify that the primary focus remains mammography and present cross-domain results as preliminary evidence. General-purpose validation will be the focus of future work.

Lack of clarity in methodology:

1. (R1) Inference workflow: Xk and Pk are derived from the ground-truth boxes of the respective classes [Exemplar Generation, Sec. 2, Fig. 1]. During inference, Xk and Pk are not available, and no loss is computed. Predictions are based solely on X’, P’, the stored ek, and tk (from the text prompts). We will mark the inference path with dotted lines in the overview (Fig. 1) and clarify both points upon acceptance.

2. (R2) Text prompt construction: As indicated in the overview (Fig. 1) and in Exemplar Generation (Sec. 2), text prompts are literal class names (e.g., “mass”, “stenosis”, “background”), based on ground-truth or generated boxes, similar to Grounding DINO [14]. We will state this explicitly.

3. (R2) Clarity of stage-wise training: In Stage II, background annotations depend on data availability: for mammograms and CXR (~50% normal images in training), we sample 8 random boxes from normal images. Additionally, for CXR, we randomly sample from lesion locations observed in the training data to learn anatomical priors. In contrast, because the stenosis dataset lacks normal images, backgrounds are randomly selected from outside the annotated regions. In Stage III, the Stage II model is evaluated on the training data, and its top 8 false positives are selected as background. In both stages, the generated annotations (created offline, once) progressively improve contrastive learning between confusing regions [Fig. 4b]. This is described briefly [Iterative training strategy, Sec. 2] and will be revised for clarity.

4. (R3) Mismatch between t-SNE and supplementary examples: t-SNE plots of memory bank embeddings (ek1/ek0) show a clear separation of lesion (m) vs. background (b). Dense tissue regions may be misclassified at inference if their similarity to ek1 (m) exceeds their similarity to ek0 (b). Test embeddings from the out-of-distribution CMMD predictions (supplement) were not used in Fig. 4a. We will clarify this in the text and caption.
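
The rebuttal's inference description (predictions rely on stored exemplars ek, with no ground-truth box) and its explanation of dense-tissue errors (a region is mislabeled when it is more similar to ek1 than to ek0) can be illustrated with a nearest-exemplar rule. This is a hypothetical sketch: cosine similarity, the exponential-moving-average update in `update_bank`, and all names are assumptions for illustration; the paper's actual memory-bank update is not described here.

```python
import math
from typing import Dict, List, Sequence

Bank = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def update_bank(bank: Bank, label: str, exemplar: Sequence[float],
                momentum: float = 0.9) -> None:
    """Hypothetical training-time update: EMA of a class's stored exemplar."""
    if label not in bank:
        bank[label] = list(exemplar)
    else:
        bank[label] = [momentum * old + (1.0 - momentum) * new
                       for old, new in zip(bank[label], exemplar)]


def classify(bank: Bank, embedding: Sequence[float]) -> str:
    """Inference: no ground-truth box is needed; pick the most similar stored exemplar."""
    return max(bank, key=lambda label: cosine(bank[label], embedding))
```

With a lesion exemplar [1.0, 0.0] and a background exemplar [0.0, 1.0], a dense-tissue embedding such as [0.6, 0.4] lies closer to the lesion exemplar and would be labeled a lesion, which is the failure mode the rebuttal describes even when the stored exemplars themselves are well separated.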

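The Stage II and Stage III background annotations described in the rebuttal can be sketched as two offline helpers: one draws random boxes that do not overlap any annotated lesion (with no lesions on a normal image), and one keeps the top-scoring predictions that match no lesion. The box size, the zero-overlap criterion, the 0.5 IoU threshold for calling a prediction a false positive, and all function names are assumptions; the CXR sampling from observed lesion locations is not modeled.

```python
import random
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_background_boxes(width: float, height: float, lesions: Sequence[Box],
                            n: int = 8, box_size: float = 64.0,
                            rng: Optional[random.Random] = None,
                            max_tries: int = 10_000) -> List[Box]:
    """Stage II sketch: n random square boxes with zero overlap with any lesion.

    For a normal image, pass lesions=[]; for a dataset without normal images,
    the non-overlap check keeps samples outside annotated regions.
    """
    if box_size > min(width, height):
        raise ValueError("box_size exceeds the image")
    rng = rng or random.Random(0)
    boxes: List[Box] = []
    for _ in range(max_tries):
        if len(boxes) == n:
            break
        x0 = rng.uniform(0.0, width - box_size)
        y0 = rng.uniform(0.0, height - box_size)
        cand = (x0, y0, x0 + box_size, y0 + box_size)
        if all(iou(cand, les) == 0.0 for les in lesions):
            boxes.append(cand)
    return boxes


def top_false_positives(predictions: Sequence[Tuple[Box, float]],
                        lesions: Sequence[Box],
                        k: int = 8, iou_thr: float = 0.5) -> List[Box]:
    """Stage III sketch: the k highest-scoring predictions that match no lesion."""
    fps = [(score, box) for box, score in predictions
           if all(iou(box, les) < iou_thr for les in lesions)]
    fps.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in fps[:k]]
```

Because both helpers run once over the training set, their outputs can be stored as extra "background" annotations, matching the rebuttal's note that the generated annotations are created offline.
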
Incomplete evaluation details:

1. (R3) Dataset clarification: We use East Asian mammography datasets (VinDR, CMMD) with predominantly high breast density. VinDR [21] contains 76.5% density C and 13.5% density D cases [Data description, Sec. 3]. CMMD [6] is a Chinese dataset with 3,728 samples and no density labels, from a population known to exhibit high breast density. We will cite additional medical literature.

2. (R3) We will state that the 100 CMMD images used for clinical evaluation were randomly selected.

3. (R3) Data preprocessing is performed at full resolution and only crops the excess background outside the breast area, without downscaling. During training, we apply MMDetection's [5] default multi-scale random-resize augmentation (widths 480–800 px, fixed height 1333, aspect ratio preserved). Figures were resized for presentation only. We will clarify this in the Experimental details and the figure caption.
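
The multi-scale augmentation described above can be approximated with the arithmetic below. It follows our understanding of how an aspect-preserving resize treats a (long, short) scale pair in MMCV/MMDetection: the image is scaled by the largest factor that keeps the long edge at most 1333 px and the short edge at most the sampled target. Reading "widths 480–800 px" as short-edge targets, the discrete 32 px step, and the names `keep_ratio_size` and `random_resize` are assumptions; the authors' exact configuration is not given.

```python
import random
from typing import Optional, Tuple

# Assumed short-edge targets: 480, 512, ..., 800 px in steps of 32.
SHORT_TARGETS = list(range(480, 801, 32))
LONG_MAX = 1333


def keep_ratio_size(height: int, width: int, short_target: int,
                    long_max: int = LONG_MAX) -> Tuple[int, int]:
    """Return (new_height, new_width) after an aspect-preserving resize.

    The scale is the largest factor keeping the long edge <= long_max and the
    short edge <= short_target, rounded to the nearest pixel.
    """
    factor = min(long_max / max(height, width), short_target / min(height, width))
    return int(height * factor + 0.5), int(width * factor + 0.5)


def random_resize(height: int, width: int,
                  rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Pick one short-edge target at random, then resize with the rule above."""
    rng = rng or random.Random()
    return keep_ratio_size(height, width, rng.choice(SHORT_TARGETS))
```

Under this rule, a tall crop of a hypothetical 3000×1400 px comes out at 1333×622 px whenever the sampled target is 640 px or more, because the 1333 px cap binds before the short-edge target does.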




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces Exemplar Med-DETR, a novel exemplar-based detection framework for lesion localization in mammography. The reviewers found the methodological contribution compelling, particularly its intuitive modification of Grounding DINO through learnable class-wise tokens and a multi-stage training strategy. The method demonstrated strong performance across multiple datasets and imaging modalities. R1 raised concerns about the clarity of methodological details—especially regarding inference, training procedures, and the framing of the paper’s scope. These issues were thoroughly addressed in the rebuttal, with clarifications provided on exemplar generation, dataset characteristics, and the preprocessing pipeline.

    Although minor concerns remain, the paper’s overall contribution, novelty, and practical relevance make it a valuable addition to the MICCAI community. I recommend acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


