Abstract

Accurate segmentation of ovarian tumors from medical images is crucial for early diagnosis, treatment planning, and patient management. However, the diverse morphological characteristics and heterogeneous appearances of ovarian tumors pose significant challenges to automated segmentation methods. In this paper, we propose MBA-Net, a novel architecture that integrates the powerful segmentation capabilities of the Segment Anything Model (SAM) with domain-specific knowledge for accurate and robust ovarian tumor segmentation. MBA-Net employs a hybrid encoder architecture, where the encoder consists of a prior branch, which inherits the SAM encoder to capture robust segmentation priors, and a domain branch, specifically designed to extract domain-specific features. The bidirectional flow of information between the two branches is facilitated by the robust feature injection network (RFIN) and the domain knowledge integration network (DKIN), enabling MBA-Net to leverage the complementary strengths of both branches. We extensively evaluate MBA-Net on the public multi-modality ovarian tumor ultrasound dataset and the in-house multi-site ovarian tumor MRI dataset. Our proposed method consistently outperforms state-of-the-art segmentation approaches. Moreover, MBA-Net demonstrates superior generalization capability across different imaging modalities and clinical sites.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1538_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1538_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Gao_MBANet_MICCAI2024,
        author = { Gao, Yifan and Xia, Wei and Wang, Wenkui and Gao, Xin},
        title = { { MBA-Net: SAM-driven Bidirectional Aggregation Network for Ovarian Tumor Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work adapts the Segment Anything Model (SAM) for ovarian tumor segmentation by adding a CNN module that communicates bidirectionally with SAM. The authors validate their work on an open-source dataset and an in-house dataset, and also provide cross-modality and cross-site analyses. They show numerically better performance in almost all cases.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work appears to be the first application of the Segment Anything Model to ovarian tumor segmentation. The authors describe their modifications to the vanilla SAM model in detail. During validation, the authors compare their model to other common and state-of-the-art segmentation models and achieve numerically better performance in most cases. This result holds both for an open-source dataset and for an in-house dataset. The authors conduct an ablation study to verify the efficacy of their proposed changes. The figures are clear and useful. In some categories of the cross-modality and cross-site analyses, the improvement is large.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Without describing the limitations of existing state-of-the-art segmentation models (which perform similarly to the proposed method in most cases), the paper lacks strong clinical motivation. The authors did not perform statistical testing to show that MBA-Net significantly outperformed the other methods. The authors did not compare their work to a vanilla SAM, and the ablation study does not include a variant that is equivalent to vanilla SAM. Additionally, the authors did not discuss a medical variant of SAM (MedSAM) [1]. MedSAM was published as a preprint on arXiv prior to publication in Nature Communications, and v2 (July 2023) references the open-source repository.

    [1] Ma, J., He, Y., Li, F. et al. Segment anything in medical images. Nat Commun 15, 654 (2024). https://doi.org/10.1038/s41467-024-44824-z
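
Because the weaknesses above note the absence of statistical testing, here is a minimal sketch of a paired Wilcoxon signed-rank test, the test the authors report in their rebuttal. It assumes per-case Dice scores for two methods evaluated on the same cases; the function name, the hypothetical scores, and the use of a normal approximation are illustrative choices, not the authors' implementation. For small samples or many ties, an exact or tie-corrected version (for example `scipy.stats.wilcoxon`) would be more appropriate.

```python
import math
from typing import List, Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired Wilcoxon signed-rank test (two-sided, normal approximation).

    Returns (W_plus, p_value). Zero differences are discarded and tied absolute
    differences receive average ranks; no tie or continuity correction is applied.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank the absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # W+ is the rank sum of the positive differences.
    w_plus = float(sum(r for r, d in zip(ranks, diffs) if d > 0))
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, p


# Hypothetical per-case scores: every case improves, each by a different amount.
baseline = [10 * i for i in range(1, 11)]
proposed = [b + i for i, b in enumerate(baseline, start=1)]
w, p = wilcoxon_signed_rank(proposed, baseline)
print(w, p)  # W+ = 55.0, the maximum for n = 10; p is roughly 0.005
```

When every paired difference favors one method, W+ reaches its maximum n(n+1)/2; a per-setting test like this, repeated over all settings, is what a count such as "17/22 settings" summarizes.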

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    To aid reproducibility, the authors could add a reference to their code repository. Implementation details, such as the programming language and framework, are missing. Information about the in-house dataset is also missing. The authors could include Supplementary Material to clarify their methods further and aid reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please justify why the authors did not compare their technique to SAM and MedSAM.
    2. Please conduct statistical testing between the MBA-Net results and the results of the other SOTA models.
    3. In Figure 2, consider showing more than one case to lessen the effect of cherry-picking.
    4. Please provide additional details about the in-house dataset. What is the breakdown of the different tumor types, and per center?
    5. To improve clarity, it would be helpful to include the breakdown of the tumor types in the public dataset as well (even though it is publicly available)
    6. In the cross-site analysis, the model was trained on data from one center and evaluated on the others. What was the actual split in terms of number of images in the training and testing sets? Same comment applies to the cross-modality analysis of the ultrasound dataset.
    7. Please elaborate on the results in addition to presenting them. For example, in the cross-modality experiment, all models are trained on data from 1 out of 5 centers. The relatively poorer performance of the other SOTA models compared to MBA-Net may simply be due to the small training set, since the SOTA models are trained from random weights, whereas MBA-Net uses pretrained weights from SAM. MBA-Net would therefore be more resilient to a small training set, which is why a comparison to vanilla SAM or MedSAM is crucial.
    8. Some of the technical details could be put in a Supplementary Material, and therefore there would be more room for discussion.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the authors are the first to apply SAM to ovarian tumor segmentation, there is limited clinical motivation as to why existing techniques are insufficient. There are important details missing from the methods section, specifically with regard to descriptions of the data and how it is used. They did not compare their approach to either vanilla SAM or MedSAM. The authors provide a detailed description of their model, but this is at the expense of providing a discussion section. The experimental results are merely presented, but not interpreted in detail.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    This reviewer has raised their score following a strong rebuttal. The results are impressive, especially with the inclusion of statistical testing.

    However, clinical motivation remains a weakness of the work. It is not clear what manual segmentations are currently used for, and how a (in most cases small) increase in automatic segmentation accuracy will benefit patients with ovarian tumors. For example, ultrasound is primarily used in the diagnosis of ovarian cancer. At this stage, it is unknown whether the patient actually has a malignant mass, so this is a classification problem rather than a segmentation problem. For both the MRI and ultrasound analyses, the authors did not assess model performance on scans of patients without cancer. MBA-Net may be more likely than the other models to “hallucinate” tumors in these cases, since it is based on a foundation model. In the rebuttal, the authors state that their work may enable radiomic analysis, but they do not cite existing work. Additionally, radiomics is losing ground to deep-learning methods, which are generally more powerful and do not require segmentations [1].
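
The suggested evaluation on tumor-free scans could be operationalized as a case-level false-positive rate: the fraction of negative scans on which a model predicts any tumor at all. Dice is uninformative here because the ground truth is empty. A minimal sketch follows; the `min_pixels` threshold, the function names, and the toy masks are hypothetical and are not part of the paper or the review.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 2-D binary mask; 1 marks a predicted tumor pixel


def predicts_tumor(mask: Mask, min_pixels: int = 1) -> bool:
    """True if the mask has at least `min_pixels` foreground pixels."""
    return sum(sum(row) for row in mask) >= min_pixels


def negative_scan_fp_rate(masks: Sequence[Mask], min_pixels: int = 1) -> float:
    """Fraction of tumor-free scans on which the model still predicts a tumor."""
    if not masks:
        raise ValueError("need at least one tumor-free scan")
    flagged = sum(predicts_tumor(m, min_pixels) for m in masks)
    return flagged / len(masks)


# Hypothetical predictions on three tumor-free scans; one contains a single false pixel.
empty = [[0, 0], [0, 0]]
speck = [[0, 1], [0, 0]]
print(negative_scan_fp_rate([empty, speck, empty]))  # 1 of 3 scans flagged
print(negative_scan_fp_rate([empty, speck, empty], min_pixels=2))  # 0.0: speck ignored
```

Comparing this rate across models on the same negative scans would directly test whether a foundation-model backbone hallucinates more often.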

    The reviewer also feels that the paper should include a more detailed discussion of MedSAM. Although it is true that MedSAM requires an input prompt, there are ways to automate this. For example, one could use an automatic box prompt, which the authors themselves use. Alternatively, the prompt could be created from the segmentations predicted by another model (such as U-Net). Additionally, it is unclear why the authors use SAM as the backbone rather than MedSAM (which is also open source and has public weights), as one would expect MedSAM to be better suited for medical image segmentation.

    [1] Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer. 2018 Aug;18(8):500-510. doi: 10.1038/s41568-018-0016-5. PMID: 2977717
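
The suggestion of building a MedSAM box prompt from another model's segmentation reduces to taking the bounding box of a predicted mask. The sketch below assumes boxes in (x_min, y_min, x_max, y_max) pixel coordinates with an optional padding margin (a hypothetical parameter); it does not show how the box is passed to SAM or MedSAM, and it is neither the authors' nor MedSAM's code.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def mask_to_box(mask: Sequence[Sequence[int]], pad: int = 0) -> Optional[Box]:
    """Tight bounding box around the foreground of a 2-D binary mask, optionally padded.

    Returns None for an empty mask, since no prompt can be derived from it.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not rows:
        return None
    height, width = len(mask), len(mask[0])
    x_min, x_max = min(cols), max(cols)
    y_min, y_max = min(rows), max(rows)
    # Pad the box and clamp it to the image so the prompt stays in bounds.
    return (max(x_min - pad, 0), max(y_min - pad, 0),
            min(x_max + pad, width - 1), min(y_max + pad, height - 1))


# Hypothetical U-Net output with two foreground pixels on a 5 x 6 image.
unet_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(mask_to_box(unet_mask))  # (2, 1, 4, 3)
print(mask_to_box(unet_mask, pad=1))  # (1, 0, 5, 4)
```

Padding gives the prompted model some slack when the first-stage mask under-segments the tumor; returning None lets a pipeline skip prompting when the first model finds nothing.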



Review #2

  • Please describe the contribution of the paper

    The paper proposes a segmentation architecture that combines the Segment Anything Model with domain-specific knowledge for a specific application and demonstrates the architecture for ovarian tumor segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes an architecture that combines two branches, adding domain-specific information to the capabilities of the SAM model to adapt it to a specific segmentation problem. Dedicated integration networks are designed for the bidirectional exchange between the branches. The model is evaluated on two different modalities, using multi-center and multimodal data. An ablation study demonstrates the contribution of each integration network.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The technical description of the architecture is detailed, although some specific design decisions, such as the input size of the domain branch or the points at which knowledge is injected, are not explained. The training and the design of the experiments are not well documented:

    • The ultrasound database is an open-source database referenced in the text, but the authors do not report the final number of images in the paper. Data augmentation is applied, but the authors do not explain why it is needed or how much it increases the number of images.
    • Details on the cross-modality results are not provided. The network is trained on standard ultrasound and tested on contrast-enhanced ultrasound data; the need for this setting should be stated. This is especially important because the largest improvement of the model on ultrasound is obtained in this cross-modality scenario.
    • The MRI dataset lacks details (sequences, scanners, field strength, etc.). The authors present cross-site validation results, but no details about these experiments are provided. This is especially important because the largest improvement of the model is obtained in this setting.

    The authors use the Dice score as the segmentation evaluation metric. For ultrasound, Figure 2a) suggests that the Dice score alone may not capture the differences between results. In the standard ultrasound experiments, the proposed architecture beats previous methods by around 2%, and it is not clear how significant that is for the clinical problem, as the diagnostic protocol is not discussed.
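
For reference, the Dice score is 2|A∩B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The sketch below computes it on flattened binary masks in plain Python; treating two empty masks as a perfect match is an assumed convention. The toy masks illustrate the point about Dice alone: two predictions with very different kinds of error can receive the same score.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Hypothetical 1-D "masks": both predictions get 3 of 4 tumor pixels and 1 false pixel.
target = [0, 1, 1, 1, 1, 0, 0, 0]
boundary_shift = [1, 1, 1, 1, 0, 0, 0, 0]  # error stays at the tumor border
distant_blob = [0, 1, 1, 1, 0, 0, 0, 1]  # spurious pixel far from the tumor
print(dice(boundary_shift, target), dice(distant_blob, target))  # 0.75 0.75
```

A boundary-distance metric such as the Hausdorff distance would separate these two cases, which is one reason segmentation papers often report it alongside Dice.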

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    There are not enough details about the data used in the different experiments. For the multi-site evaluation on the MRI dataset, the only information given is the number of sites.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper would benefit from more details about certain design choices (e.g., input image size, points of knowledge injection) and especially about the design of the evaluation experiments: number of images and separation into different sets. This matters most for the cross-modality and multi-site experiments, in which the proposed model gives the best results.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main idea of combining SAM with domain-specific knowledge is interesting and the architecture is well described, but neither the choice of the specific domain nor the experiments are justified well enough to demonstrate the need for the development.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have given clear answers to my concerns. Although there is the general concern about how a given increase in segmentation accuracy really improves clinical outcomes, I think the paper is technically good and interesting for the MICCAI audience. The authors have provided statistical evidence confirming the improvement of their model. They have also provided further details on the experiments.



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors introduce a new architecture for segmenting ovarian tumors. The network uses a hybrid encoder structure and combines the advanced segmentation abilities of SAM with domain-specific knowledge through bidirectional feature aggregation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The novelty of this paper lies in the hybrid encoder structure, comprising a prior branch that inherits the SAM encoder to capture strong segmentation priors and a domain branch tailored to extract domain-specific features.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Inconsistency in terminology usage

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall, the paper is well-written. However, the following changes are suggested to further improve its quality:

    1) What does MBA-Net stand for? It is suggested to add a bit more detail about MBA-Net in the introduction.

    2) Please ensure consistency in terminology usage throughout the text and avoid unnecessary repetition of phrases or concepts.

    3) Enlarge the text font in Figure 1.

    4) Please provide more description of Figure 2 in results section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the authors introduce a new architecture for segmenting ovarian tumors. Also, the paper is well-written and meets the standards of the conference.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have clearly addressed the concerns I raised earlier.




Author Feedback

We warmly thank the reviewers for their positive/constructive comments. They say that our method is “novel” (R3&R4), “an interesting one” (R5), and “well-written” (R3). Here we address the main points in their reviews. 

  1. Lack of comparison with SAM and MedSAM (R4): SAM and MedSAM are designed for interactive segmentation, requiring user-provided prompts (e.g., points or boxes) to specify regions of interest. In contrast, MBA-Net is a fully automatic end-to-end framework that does not rely on such localization information. A direct comparison would be unfair, as the additional input of ROI locations gives SAM and MedSAM a huge advantage. Moreover, SAM does offer an automatic mode, but it performs very poorly on medical images. Our ablation study shows that removing the DKIN and RFIN modules decreases performance, demonstrating our approach’s unique advantages beyond using SAM’s pretrained weights. 
  2. Clinical motivations and contributions (R4): Existing methods face two major challenges: 1) the significant heterogeneity of ovarian tumors across various histological subtypes; and 2) the limited availability of training samples for relatively rare subtypes. These challenges are evident from the suboptimal U-Net performance, especially for high-grade serous (US: 83.16%, cross-modality: 48.85%) and mucinous cyst (MRI: 82.09%, cross-site: 69.75%), which are much lower compared to other subtypes. MBA-Net addresses these limitations by integrating SAM’s robust features with domain knowledge, leading to significant improvements in these challenging subtypes (4% and 20% increase for high-grade serous, and a 3% and 6% increase for mucinous cyst). The results validate our clinical motivation. We will rephrase the corresponding sentences in section 1 to clarify this. 
  3. Dataset descriptions (R4&R5): We apologize for the lack of details regarding the datasets. Following the order in Table 2, the numbers of cases for the five tumor types in the in-house MRI dataset are 211, 31, 70, 97, and 84, respectively. MRI sequences include T2-weighted images, acquired on 1.5T and 3T scanners (Siemens Avanto 1.5T, GE Twinspeed 3.0T, GE 750W 3.0T, Philips Ingenia 3.0T, etc.). We employed stratified sampling to keep the proportion of each tumor type consistent. For cross-site experiments, we used data from one center (99 patients) for training and the remaining centers (394 patients) for testing. Details about the public US dataset will be added. We will include these details as two tables in the appendix to ensure clarity. 
  4. Discussion of results (R3&R4&R5): First, MBA-Net achieved a 4% increase in performance for the more challenging subtypes (high-grade serous). Second, by enabling precise segmentation, our method facilitates the development of advanced tools, such as radiomics, which can potentially enhance the management of ovarian cancer. We will discuss the clinical implications of improved accuracy to highlight our method’s practical value, particularly for the cross-modality/site experiments. The superior performance in these settings may be attributed to SAM’s robust features. We will also provide more description of Figure 2. 
  5. Lack of statistical testing (R4): Wilcoxon signed-rank tests comparing MBA-Net with other methods showed statistically significant improvements (p<0.05) in 17/22 settings. In the 2 settings where MBA-Net was not the best, differences were not statistically significant. This will be added to Table 1. 
  6. Tables, Figures, and writing (R3&R4&R5): We will also address the points mentioned by R3 and R4, such as providing implementation details, ensuring consistency in terminology usage, and optimizing figures and tables. If space permits, we will explain some of the design choices as suggested by R5. We have prepared the code repository and will provide a link to the code in the revised version. We would like to express our gratitude to the reviewers for their valuable feedback and suggestions! :-)
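
The stratified sampling mentioned in the rebuttal can be sketched as a per-type split that shuffles each tumor type separately and sends the same fraction of each type to the test set. The type names and the 20% test fraction below are hypothetical; only the per-type counts (211, 31, 70, 97, 84) come from the rebuttal. These counts sum to 493, matching the 99 + 394 patients in the cross-site split, assuming one case per patient.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], test_fraction: float,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train, test) case indices with each label's share kept roughly constant."""
    by_label: Dict[str, List[int]] = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    rng = random.Random(seed)  # seeded for a reproducible split
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_label):
        idxs = list(by_label[label])
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # per-type test quota
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


# Per-type case counts from the rebuttal; the type names are placeholders.
counts = {"type_1": 211, "type_2": 31, "type_3": 70, "type_4": 97, "type_5": 84}
labels = [name for name, n in counts.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
print(len(train_idx), len(test_idx))  # 395 98
```

Rounding each quota separately keeps even the rarest type (31 cases) represented in both sets; a plain random split of 493 cases offers no such guarantee.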




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Agree with meta-reviewer 3

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Agree with meta-reviewer 3



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers concurred to accept the paper. The rebuttal gave clear answers to the reviewers’ concerns.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    All reviewers concurred to accept the paper. The rebuttal gave clear answers to the reviewers’ concerns.


