Abstract

Multiple Instance Learning (MIL) is a powerful weakly supervised learning framework for high-resolution medical images, but its application to mammographic breast cancer (BC) diagnosis overlooks instance interactions and the multi-scale nature of BC lesions. In this work, we propose a novel Feature Pyramid Network (FPN)-MIL model for BC classification and detection in high-resolution mammograms, integrating: (1) an FPN-based instance encoder that enables multi-scale analysis across different receptive-field granularities while operating on single-scale input patches; (2) deeply supervised scale-specific instance aggregators that support conventional attention-based (AbMIL) or transformer-based (SetTrans) mechanisms; (3) an attention-based multi-scale aggregator that dynamically combines scale-specific features, improving robustness to lesion scale variability. Our experiments show that FPN-MIL is superior to conventional single- and multi-scale patch-based MIL models, with FPN-SetTrans outperforming baselines in calcification classification and detection, while FPN-AbMIL performs best for mass classification.
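For readers unfamiliar with attention-based MIL pooling (AbMIL), the scale-specific instance aggregation mentioned above can be illustrated with a minimal NumPy sketch. This is an illustrative re-implementation of standard gated-free attention pooling, not the authors' code (see the repository linked below for that); all names, shapes, and values here are arbitrary assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def abmil_pool(H, V, w):
    """Attention-based MIL pooling over one bag of instance embeddings.

    H: (n_instances, d)  instance features for one bag (e.g., one scale)
    V: (d, h)            attention projection matrix
    w: (h, 1)            attention scoring vector
    Returns the bag embedding (d,) and per-instance attention weights (n_instances,).
    """
    scores = np.tanh(H @ V) @ w      # (n, 1) unnormalized attention scores
    a = softmax(scores.ravel())      # (n,)  normalized over instances
    z = a @ H                        # (d,)  attention-weighted bag feature
    return z, a

# Toy bag: 6 patches with 16-dim features (illustrative only).
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 16))
V = rng.normal(size=(16, 8))
w = rng.normal(size=(8, 1))
z, a = abmil_pool(H, V, w)
print(z.shape, round(a.sum(), 6))    # (16,) 1.0
```

The attention weights `a` are what make such aggregators interpretable: they quantify each instance's (patch's) contribution to the bag-level prediction and can be rendered as a heatmap.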



Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1992_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/marianamourao-37/Multi-scale-Attention-based-MIL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{MouMar_Multiscale_MICCAI2025,
        author = { Mourão, Mariana and Nascimento, Jacinto C. and Santiago, Carlos and Silveira, Margarida},
        title = { { Multi-scale Attention-based Multiple Instance Learning for Breast Cancer Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {368--378}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes to improve two aspects of current MIL applications in mammography: (1) the multi-scale nature of lesions being overlooked, and (2) the lack of instance interaction. To address (1) while maintaining the patch input size, the authors propose using an FPN to extract multi-scale features, thereby obtaining multi-scale bag representations. A bag loss is applied to each multi-scale bag and to an overall bag that aggregates across all scales. For (2), the authors simply adopt attention-based modules from previous work in their bag feature aggregation module (for each scale and across scales).

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Simple intuitive idea: The use of FPN to derive multi-scale features from fixed-size patches is a clean idea. FPN is common in object detection frameworks; however, adopting it to create multi-scale features for each patch is relatively novel.
    2. Modularity: The proposed framework is modular and compatible with existing MIL models.
    3. Improved performance over single-scale equivalent: The proposed multi-scale MIL approach shows improved classification and detection performance on the VinDr-Mammo dataset compared to simple single-scale MIL baselines.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Unfair/Weak Experimental Setup: The comparisons with single-scale patch MIL (SSP-MIL) baselines are potentially misleading. FPN-MIL operates on higher-resolution patches (512×512), while SSP-MIL is evaluated on 256×256 patches. This difference could significantly affect performance and confound the benefits of multi-scale representation. Furthermore, most experiments use a frozen feature encoder; results may be very different in a full fine-tuning setting. For example, the feature encoder is taken from MammoCLIP, which was trained at low resolution, whereas the FSOD baseline with RetinaNet usually benefits from high-resolution training (1300×800 and above).
    2. Lack of Strong Baselines: The paper lacks comparison with stronger MIL-based methods previously applied to mammography, including the cited [23], which the authors themselves criticize but do not compare against directly. A more compelling evaluation would substitute the single-scale MIL module in [23] with the proposed multi-scale aggregator to isolate its value.
    3. Limited Novelty in Broader Context: The methodology is not novel when put in the context of object detectors. For example, a previous work [*] proposed a multi-view and multi-instance learning detector for mammograms. The existing use of an FPN there means features are extracted at multiple scales, and instance interaction is enabled through attention between proposal boxes in the detection head. The difference is that the authors of [*] focus on the WSOD scenario. [*] Truong Vu, Yen Nhi, et al. “M&m: Tackling false positives in mammography with a multi-view and multi-instance learning sparse detector.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023.
    4. Lack of ablation studies: (1) To support the claimed benefit of multi-scale reasoning, it would have been better to include ablation studies showing the performance improvement as different feature scales are added, and how they impact AP on small, medium, and large objects. (2) To support the claimed benefits of the instance aggregator, there should have been a baseline where the instance aggregator is a simple MIL pooling as opposed to attention-based MIL. The authors attempt to do this in Table 2 for the multi-scale aggregation by comparing with simply concatenating features (i.e., no feature interaction), yet I think the drop in performance there is less about instance interaction than about the larger feature dimension of concatenation making training more difficult.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is intuitive, simple and can be applied to various existing models that adopt MIL. However, paper is weak in experimental results, such as unfair comparison with baseline, lack of comparison with stronger previous work, and lack of ablation studies to prove the main claimed benefits of multi-scale reasoning and instance interaction.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The paper is of good quality and I’m satisfied with rebuttal.



Review #2

  • Please describe the contribution of the paper

    This study proposes a novel model for breast cancer (BC) classification and detection in high-resolution mammograms.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Experiments were conducted on public datasets, easing reproducibility, and the concrete implementation is clear. 2) The paper is well-written and easy to follow. The description of the method is very detailed.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The novelty of this work is a bit limited for me.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I find this work to have limited novelty because of the lack of original modules. However, I lean towards accepting this work since the method is fairly well benchmarked.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    After reviewing the authors’ rebuttal, they have successfully addressed my concerns regarding novelty. The revisions made are satisfactory, and the paper now meets the necessary standards for acceptance.



Review #3

  • Please describe the contribution of the paper

    This work introduces an embedded FPN-MIL model with multi-scale encoding, scale-specific and attention-based aggregators, achieving state-of-the-art breast cancer classification and localization.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is overall well-structured. The methodology is clearly explained, and the diagram is easy to follow. The authors provide implementation details and describe the data used, which enhances the reproducibility of their study. The separate analysis of masses and calcifications in their metrics is particularly well thought out.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    This paper is of good quality; I don’t see any major weaknesses.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1) The ease of understanding of their study due to the clarity and good structure 2) The quality of the analysis and discussion around the calcifications and masses

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their feedback and respond to their main concerns. 1. Novelty: While we do not introduce new architectural modules, as noted by R1, our work proposes a novel embedding-based MIL framework that combines known DL methods to address three MIL challenges in MBCD: 1) multi-scale ROIs; 2) instance ambiguity; 3) instance interactions. As R2 acknowledged, our framework is modular and adaptable, leveraging the FPN (seldom used outside object detection) and SetTrans to innovatively tackle MIL challenges 1) and 3) in MBCD, respectively. This modular adaptability also enabled a lesion-specific analysis particularly valued by R3, showing that calcifications and masses benefit from different instance aggregators. Regarding the M&M object detection work mentioned by R2, it uses bounding-box supervision to learn region proposals and an auxiliary MIL loss to include unlabeled negative images and perform breast-level classification. In contrast, our framework is fully weakly supervised, using only image-level labels.

2. Ablations: While progressive scale ablations were not conducted to study the benefits of multi-scale reasoning, as R2 mentioned, our model uses scale-specific instance aggregators that generate heatmaps, enabling qualitative and quantitative assessment of each scale: the small scale captures fine detail crucial for small lesions; the medium scale balances detail and context, beneficial for medium lesions; the large scale captures broader patterns effective for large lesions. Due to space constraints, we prioritized comparing FPN- vs. MSP-based instance encoders, corroborating FPN’s improved receptive-field granularity and performance. For instance aggregation, simpler MIL methods (e.g., mean/max-pooling) were not used, as R2 suggested, since they lack interpretability, whereas AbMIL and SetTrans quantify instance relevance to bag classification via localized attention and self-attention, respectively. Regarding the multi-scale aggregator, R2 suggests that concatenating scale-specific bag features underperforms due to increased dimensionality complicating training. However, the attention-based aggregator has more parameters and is arguably harder to train, yet performs better. The reported detection performance drop with the concat approach for specific lesion sizes supports the claim that it dilutes scale-specific discriminative information, as noted in prior MIL works [7, 14, 27] cited in our paper.
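The dimensionality argument above can be made concrete with a hedged NumPy sketch contrasting the two multi-scale aggregation strategies discussed: concatenation grows the bag-feature dimension with the number of scales, while an attention-based combination keeps it fixed while learning per-scale weights. All names, shapes, and values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def concat_fusion(bag_feats):
    """Concatenate scale-specific bag features: output dim grows with #scales."""
    return np.concatenate(bag_feats)               # (n_scales * d,)

def attention_fusion(bag_feats, V, w):
    """Attention-weighted sum of scale-specific bag features: output dim stays d."""
    Z = np.stack(bag_feats)                        # (n_scales, d)
    a = softmax((np.tanh(Z @ V) @ w).ravel())      # (n_scales,) learned scale weights
    return a @ Z, a                                # (d,) fused bag feature

# Three toy scale-specific bag features (e.g., small/medium/large receptive fields).
rng = np.random.default_rng(1)
d, n_scales = 16, 3
feats = [rng.normal(size=d) for _ in range(n_scales)]
V, w = rng.normal(size=(d, 8)), rng.normal(size=(8, 1))

print(concat_fusion(feats).shape)                  # (48,)
fused, a = attention_fusion(feats, V, w)
print(fused.shape, round(a.sum(), 6))              # (16,) 1.0
```

Note that the fused feature preserves the per-scale dimension `d`, and the weights `a` indicate how much each scale contributes for a given image, which is what allows the aggregator to adapt to lesion scale variability.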

3. Baselines: R2 criticizes the patch-size discrepancy between our FPN-MIL models and the SSP-MIL baselines, as well as the lack of comparison to other SOTA MIL models in MBCD. SSP-MIL was also tested with different patch sizes: 128, 256, 384, and 512. The 512×512 input proved too coarse for SSP-MIL, degrading heatmap quality and lesion detection. The commonly used 256×256 patches offered the best classification/detection trade-off and were thus the only ones we reported. In contrast, our FPN-MIL models use 512×512 inputs to enable a comprehensive multi-scale analysis across different receptive-field granularities. While broader SOTA comparisons are a valuable future research direction, we focused on AbMIL and SetTrans to ensure consistent instance aggregation across models, facilitating comparisons with SSP-MIL and aligning with our broader goal of addressing instance ambiguity and interactions.

4. Experimental Setup: R2 raises concerns about the frozen backbone. Frozen backbones are standard in MIL to reduce the computational cost of processing large bags, in our case due to the pixel-level instances extracted at multiple scales. We chose MammoCLIP’s image encoder for its strong vision-language pretraining tailored to MBCD. Importantly, all models (ours and baselines) share this frozen backbone, ensuring consistent benchmarking and fair comparison, as R1 and R3 acknowledged. Future work could explore full fine-tuning.

We will consider the reviewers’ constructive feedback in future work. Code will be made available upon acceptance.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper receives an initial review of 1R (R2 with score 3) and 2A (R1 with score 4, R3 with score 5). After rebuttal, all reviewers change to Accept, with R2 stating “The paper is of good quality and I’m satisfied with rebuttal” and R1 noting “they have successfully addressed my concerns regarding novelty.” The main concerns raised were: 1) Limited novelty - both R1 and R2 pointed out the lack of original modules, with R2 noting the methodology is not novel in the context of object detectors and citing prior work using FPN for multi-scale features in mammography, 2) Unfair experimental setup - R2 highlighted that FPN-MIL uses 512×512 patches while baseline SSP-MIL uses 256×256 patches, potentially confounding results, 3) Lack of strong baselines and insufficient ablation studies - R2 criticized the absence of comparison with stronger MIL methods and proper progressive scale ablations. However, the authors’ rebuttal successfully addressed these concerns, arguing their framework’s modularity and adaptability, justifying the patch size differences, and explaining their experimental choices, leading to final acceptance from all reviewers.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


