Abstract

Deep learning has enabled great strides in abdominal multi-organ segmentation, even surpassing junior oncologists on common cases or organs. However, robustness on corner cases and complex organs remains a challenging open problem for clinical adoption. To investigate model robustness, we collected and annotated the RAOS dataset comprising 413 CT scans (~80k 2D images, ~8k 3D organ annotations) from 413 patients, each with 17 (female) or 19 (male) labelled organs manually delineated by oncologists. Based on clinical information, we grouped the scans into 1) diagnosis/radiotherapy (317 volumes), 2) partial excision without any organ fully missing (22 volumes), and 3) excision with an entire organ missing (74 volumes). RAOS provides a potential benchmark for evaluating model robustness, including organ hallucination. It also includes several organs that are rarely available in public datasets, such as the rectum, colon, intestine, prostate and seminal vesicles. We benchmarked several state-of-the-art methods on these three clinical groups to evaluate performance and robustness. We also assessed cross-generalization between RAOS and three public datasets. This dataset and comprehensive analysis establish a potential baseline for future robustness research.
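The organ-hallucination evaluation described above can be sketched as follows. This is a hypothetical illustration, not the paper's actual code: the function name, argument layout, and the `min_voxels` tolerance are all assumptions. The idea is simply that, for a patient whose organ was fully excised (group 3), any voxels the model predicts for that organ count as a hallucination.

```python
import numpy as np

def hallucination_ratio(predictions, excised_labels, min_voxels=0):
    """Fraction of (scan, organ) pairs where the model predicts an
    organ that was surgically removed.

    predictions    : list of 3D integer label maps, one per scan
    excised_labels : list of sets; excised_labels[i] holds the label
                     ids of organs absent from scan i
    min_voxels     : predictions at or below this size are ignored
    """
    hallucinated = total = 0
    for pred, absent in zip(predictions, excised_labels):
        for label in absent:
            total += 1
            if np.count_nonzero(pred == label) > min_voxels:
                hallucinated += 1
    return hallucinated / total if total else 0.0

# Toy example: one scan with the organ of label 3 removed, but the
# model still predicts a label-3 voxel, so the ratio is 1.0.
pred = np.zeros((4, 4, 4), dtype=int)
pred[1, 1, 1] = 3
print(hallucination_ratio([pred], [{3}]))  # -> 1.0
```

A small `min_voxels` threshold could be used to ignore isolated noise voxels; the appropriate tolerance would depend on the evaluation protocol.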

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1633_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/Luoxd1996/RAOS

Link to the Dataset(s)

https://github.com/Luoxd1996/RAOS

BibTex

@InProceedings{Luo_Rethinking_MICCAI2024,
        author = { Luo, Xiangde and Li, Zihan and Zhang, Shaoting and Liao, Wenjun and Wang, Guotai},
        title = { { Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper contributes a new dataset that addresses a key gap in existing medical imaging datasets such as BTCV, MSD and AbdomenCT-1K. The paper verifies that algorithm performance drops significantly under shifts in the image distribution (post-surgical scans with partially or fully missing organs), a setting that has not been studied due to the lack of such a dataset. Hence, the authors collect and annotate a dataset that allows them to study these differences precisely.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is very well written, with clear motivation and a well-posed problem statement
    • The dataset description is spot on
    • The dataset has been benchmarked with almost all publicly available segmentation models, ensuring a strong set of baseline results for future comparison
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The word robustness is used throughout this paper without being defined explicitly. It has many meanings; for example, in lung segmentation, a collapsed lung can be an out-of-domain instance, and since the model has not accounted for such an instance during training (by data augmentation or otherwise), it would naturally fail.
    • The paper states that only 23.2% of the images correspond to the main motivation behind this dataset, and does not provide additional information on this subset with respect to its CECT/non-CECT composition.
    • The paper states that the dataset gains extra reliability because the delineations were produced entirely by oncologists, but gives no information on the total human hours spent on dataset curation.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • The paper does not state specifically if the dataset will be released/made public in the future
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The paper is easy to follow; however, it appears to have been written at the last minute, with all the results presented together.
    • There are two key things missing from the paper (which, at least for this reviewer, has no supplementary material):
      1. Representative images showcasing the stated limitations, with results on SetA, SetB and SetC.
      2. A more concrete plan for dataset adoption: given that the key motivation of this paper is represented by only 23-24% of the data, there is no specific plan for incorporating these examples directly into training unless cross-validation splits or predefined folds are provided.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper does describe a very specific out-of-domain scenario that can confuse segmentation models. However, it is already well known that neural networks learn an underlying distribution from the training data, and hence a new out-of-distribution example will confuse them. This dataset is a valuable addition to the field of medical imaging; however, without more information on actual cases where the networks' struggles can be seen, and without plans for releasing the dataset for study, this is a difficult rating choice.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors have built an abdominal organ segmentation dataset with clinically challenging cases and annotations for 19 organs, presenting a more clinical and challenging dataset than previous ones and enabling robustness evaluation in clinically challenging scenarios. They established the RAOS benchmark by investigating the performance of state-of-the-art methods on clinically challenging cases and introducing the organ hallucination ratio to measure these methods’ robustness on patients with resected organs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The dataset is very valuable.
    2. The experiments are comprehensive.
    3. The implementation details are thorough.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It is not mentioned whether the dataset is publicly available, nor is there any information about the release date if it is to be made public.
    2. The incorrect template was used for the paper.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    see weakness

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    no further comments

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is well-executed in terms of motivation, experimental design and results, and organizational structure. I recommend a weak accept.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents a novel dataset for abdominal multi-organ segmentation, encompassing various organs and different clinical scenarios, and establishes a robust benchmark to evaluate the performance of recent medical segmentation models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Provides highly practical data, with a large volume and consideration of patients at different clinical stages, offering valuable resources for addressing real-world clinical issues.
    2. The dataset is meticulously annotated, covering both common and uncommon organs, enhancing its utility for medical research.
    3. The paper is clear and understandable. Extensive and detailed experiments validate the dataset’s practicality.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    None

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    None

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a highly practical and novel dataset, accompanied by detailed annotations, which can significantly contribute to the advancement of research in medical segmentation. Its provision of extensive resources and thorough validation make it a valuable asset for the medical imaging community.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank all reviewers and the meta-reviewer for their positive and constructive comments. We believe this constructive feedback will help us improve the quality of the paper and promote further study of robustness and generalization in segmentation.




Meta-Review

Meta-review not available, early accepted paper.
