Abstract

Nuclei semantic segmentation is a key component for advancing machine learning and deep learning applications in digital pathology. However, most existing segmentation models are trained and tested on high-quality data acquired with expensive equipment, such as whole slide scanners, which are not accessible to most pathologists in developing countries. These pathologists rely on low-resource data acquired with low-precision microscopes, smartphones, or digital cameras, which have different characteristics and challenges than high-resource data. Therefore, there is a gap between the state-of-the-art segmentation models and the real-world needs of low-resource settings. This work aims to bridge this gap by presenting the first fully annotated African multi-organ dataset for histopathology nuclei semantic segmentation acquired with a low-precision microscope. We also evaluate state-of-the-art segmentation models, including spectral feature extraction encoder and vision transformer-based models, and stain normalization techniques for color normalization of Hematoxylin and Eosin-stained histopathology slides. Our results provide important insights for future research on nuclei histopathology segmentation with low-resource data.



Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2801_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/zerouaoui/AMONUSEG

Link to the Dataset(s)

https://github.com/zerouaoui/AMONUSEG

BibTex

@InProceedings{Zer_AMONuSeg_MICCAI2024,
        author = { Zerouaoui, Hasnae and Oderinde, Gbenga Peter and Lefdali, Rida and Echihabi, Karima and Akpulu, Stephen Peter and Agbon, Nosereme Abel and Musa, Abraham Sunday and Yeganeh, Yousef and Farshad, Azade and Navab, Nassir},
        title = { { AMONuSeg: A Histological Dataset for African Multi-Organ Nuclei Semantic Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors of this paper present AMONuSeg, a new dataset for histopathology nuclei semantic segmentation acquired with a low-precision microscope and evaluate the performance of different state-of-the-art (SOTA) methods on it under different color normalization techniques.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The dataset presented is novel and may have different characteristics and challenges than the currently available datasets, since it has been acquired with a low-precision microscope. This dataset might be helpful for building models that generalize better than those trained on the existing high-quality datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The need for this new dataset has not been experimentally proved. The authors’ hypothesis that models trained on existing datasets struggle to “generalize across varied patient populations and pathology species” is not supported by experimental results.

    The comparison with other public datasets is limited to three, while other, larger datasets are missing.

    The evaluation of SOTA methods using the AMONuSeg dataset lacks some relevant methods such as Cellpose (Stringer et al., 2021) and StarDist (Schmidt et al., 2018).

    All results of the evaluated methods yield a very similar Dice score. This can be due to problems in the annotations or a lack of further training to optimize each model.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors do not mention the range of hyper-parameters considered nor the method to select the best hyper-parameter configuration.

    A description of the computer infrastructure (hardware and software) used is not provided.

    There is no analysis of situations in which the method failed.

    There is no description of the memory footprint, the average runtime for each result, or the estimated energy cost.

    There is no analysis of the statistical significance of reported differences in performance between methods (a minimal illustrative sketch of such a test is given at the end of these comments).

    The results are described with the mean, but their variation (variance, standard deviation) is not provided.

    The specific evaluation metrics and/or statistics used to report results are correctly referenced.
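
    A minimal sketch of the kind of paired significance test referred to above, assuming hypothetical per-image Dice scores for two methods on the same test images (the values and names are placeholders, not results from the paper):

```python
# Hypothetical illustration: paired significance test on per-image Dice scores.
# The score arrays are placeholders, not values reported in the paper.
import numpy as np
from scipy.stats import wilcoxon

# Per-image Dice scores of two segmentation methods on the same test images.
dice_method_a = np.array([0.81, 0.84, 0.79, 0.83, 0.80, 0.85])
dice_method_b = np.array([0.80, 0.83, 0.80, 0.82, 0.79, 0.84])

# Paired, non-parametric test of whether the per-image differences are centered at zero.
statistic, p_value = wilcoxon(dice_method_a, dice_method_b)
print(f"Wilcoxon statistic = {statistic:.3f}, p-value = {p_value:.3f}")
```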

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    An effort should be made with regard to reproducibility and evaluation. More specifically, the authors should provide a better description of the range of hyper-parameters considered for every method, the number of training and evaluation runs, validation results, etc. In that sense, I recommend following the good practices proposed by Dodge et al. (“Show your work: Improved reporting of experimental results”, 2019).

    As mentioned before, the need for the dataset has to be justified with experimental results showing current methods lack generalization on the specific context provided by AMONuSeg.

    In the same direction, two of the most important current tools for cell and nuclei segmentation, Cellpose (Stringer et al., 2021) and StarDist (Schmidt et al., 2018), should be included in the report and the evaluation.

    A fast search for public datasets for nuclei semantic segmentation on H&E images yielded more results than the three referenced in the paper (MoNuSeg, CryoNuSeg and TNBC). The exclusion criteria of other datasets should be clarified. See for instance Hou, L., Gupta, R., Van Arnam, J.S. et al. Dataset of segmented nuclei in hematoxylin and eosin stained histopathology images of ten cancer types. Sci Data 7, 185 (2020). https://doi.org/10.1038/s41597-020-0528-1

    Evaluation results based on the Dice score should be provided with their variance or standard deviation. In any case, it seems clear from Table 2 that all methods perform similarly with the current training configuration, regardless of the preprocessing applied. This fact should be further analyzed and explained. In that sense, the conclusion stating that “the segmentation of nuclei may not be improved due to the challenge of the small size of nuclei of H&E-stained histopathology images” is not supported by any experimental evidence and should be removed or better justified.

    Minor comments:

    • Revise the text for typos.
    • The “Dice” score should be written with a capital D.
    • Table 1 is missing the number of nuclei per dataset, a crucial piece of information.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The need for the proposed dataset should be supported by experiments. The comparison with the state of the art needs to be greatly improved both in terms of the dataset and the methods.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    The justifications given for the lack of comparison of SOTA methods with Cellpose and StarDist on the presented dataset are not convincing. Cellpose and StarDist have been compared before, but not on this dataset.

    Standard deviation values can be provided even when cross-validation is performed, and other segmentation metrics can also be used. There is no justification for their absence.
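
    As a purely illustrative sketch of this point, fold-level standard deviations can be reported alongside the cross-validated mean; the per-fold Dice values below are hypothetical, not the paper's results:

```python
# Hypothetical illustration: reporting Dice as mean +/- std over cross-validation folds.
# The per-fold scores are placeholders, not the paper's results.
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient for two binary masks given as 0/1 NumPy arrays."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example on tiny masks: 2*2 / (3 + 2) = 0.8
example = dice_score(np.array([[0, 1], [1, 1]]), np.array([[0, 1], [1, 0]]))

# One array of per-image Dice scores per fold of a three-fold cross-validation.
fold_dice = [
    np.array([0.82, 0.79, 0.85]),  # fold 1
    np.array([0.80, 0.84, 0.81]),  # fold 2
    np.array([0.83, 0.78, 0.86]),  # fold 3
]
fold_means = np.array([scores.mean() for scores in fold_dice])
print(f"Dice = {fold_means.mean():.3f} +/- {fold_means.std(ddof=1):.3f} (mean +/- std over folds)")
```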



Review #2

  • Please describe the contribution of the paper

    This manuscript presents a new (public) histopathology data set and validates a range of the state-of-the-art pre-processing and segmentation methods on this data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • New data set, with several state-of-the-art methods validated on it

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • Lack of novelty
    • Added value of the new data set is not demonstrated
    • Structure of the manuscript, presentation, and language need improvement

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors present their data sets as public, but provide no further information or links in this regard. Details on the implementation of the stain normalization techniques are also not provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. This manuscript needs an additional language check.

    2. The authors failed, in my opinion, to demonstrate the added value of this new data set. In the Introduction they mention “generalization across varied patient populations and pathology species”, but I do not see how this was improved using this new data set. Moreover, in the first sentence the authors claim this data set to be the “first public fully annotated … dataset”, which is not accurate, as further on the same page they list three other similar histopathology data sets. It looks like the authors meant “the first … African dataset”. Finally, comparing the magnification factors reported in Table 1, I conclude that the magnification (and hence the resolution) of the presented data is more than 6 times higher than that of the other three public data sets, which contradicts the claim about the “low-resource” setting.

    3. The authors did not provide any pointers to the data set itself, and no information on that regard. I guess they are planning to do it after the (eventual) acceptance.

    4. I did not understand how the unsupervised segmentation with Fiji contributes to the annotation process, as the phrase about the “first guide” is too vague. The illustration presented in Figure 1 also does not add to understanding, as it is unclear to me exactly what message this image is intended to convey. It is also unclear why in step #3 the annotations of “annotator 1” were validated, while the previous step was executed by two annotators. Finally, in the next step, the validation was performed by the “three expert pathologists”; I presume two of them were the “annotators” performing step #2, but this part is confusing.

    5. The structure of this manuscript can be improved. In particular, a large part of Section 3.2 would fit better in the Introduction section.

    6. Please define the “TCGA” abbreviation present in Table 1.

    7. Please use the first capital letter in the “Dice score”.

    8. Page 4: “5 μ” should be “5 μm” I guess

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Very straightforward analysis. The new data set might be of certain value, which, unfortunately, was not demonstrated in this manuscript.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This study introduces the first annotated African multi-organ dataset for histopathology nuclei segmentation using low-precision microscopes, evaluates advanced models, and explores color normalization techniques to improve segmentation in low-resource settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The interesting points to note in this work are:

    1. The diversity of the dataset adds value, as not many samples are acquired from the African continent in a low-resource setting.

    2. 250x magnification can produce high-quality, well-delineated cell boundaries and can help identify cell patterns at a fine-grained microscopic scale. This type of dataset can aid further research by helping models learn richer boundary features for segmentation.
    3. The tissue sectioning (the fixation, dehydration, and infiltration) process seems fine.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This reviewer is not convinced by the study on the various stain normalizations. This reviewer is unable to map the relation between benchmarking SOTA models and the multiple stain normalization techniques. From Table 2, it is quite clear that, with and without normalization, the original dataset achieves a Dice score of more than 80% on all the models. Why use normalization?

    Also, Macenko has α and β parameters which do influence the intensity of the slide and eventually affect the performance. If standard parameters are used in your study, where exactly is the performance drop seen, even with the application of the normalization methods?
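
    For reference, a minimal, generic sketch of Macenko stain-vector estimation is given below (not the implementation used in the paper): β removes near-transparent pixels by thresholding their optical density, and α sets the robust percentiles that pick the two extreme stain directions; the remaining normalization steps (concentration estimation and reconstruction against a reference) are omitted.

```python
# Generic sketch of Macenko stain-vector estimation (not the paper's implementation).
# alpha and beta play the roles described in Macenko et al. (2009); sign conventions
# and the subsequent concentration/reconstruction steps are omitted for brevity.
import numpy as np

def macenko_stain_vectors(rgb, alpha=1.0, beta=0.15, Io=240):
    """Estimate the two dominant stain directions of an H&E image.

    rgb:   uint8 array of shape (H, W, 3)
    alpha: percentile (in %) defining the extreme stain directions
    beta:  optical-density threshold that removes near-transparent pixels
    Io:    transmitted-light intensity used in the optical-density conversion
    """
    od = -np.log((rgb.reshape(-1, 3).astype(float) + 1.0) / Io)  # optical density
    od = od[np.all(od > beta, axis=1)]                           # drop background pixels

    # Plane spanned by the two largest eigenvectors of the OD covariance matrix.
    _, eigvecs = np.linalg.eigh(np.cov(od.T))
    plane = od @ eigvecs[:, 1:3]

    # Angles of the projected pixels; the alpha and (100 - alpha) percentiles give
    # the two extreme directions, interpreted as the Hematoxylin and Eosin vectors.
    phi = np.arctan2(plane[:, 1], plane[:, 0])
    phi_min, phi_max = np.percentile(phi, alpha), np.percentile(phi, 100 - alpha)
    v1 = eigvecs[:, 1:3] @ np.array([np.cos(phi_min), np.sin(phi_min)])
    v2 = eigvecs[:, 1:3] @ np.array([np.cos(phi_max), np.sin(phi_max)])
    return np.stack([v1, v2], axis=1)  # 3 x 2 stain matrix
```

    Complete implementations of this pipeline, including the reconstruction step, exist in open-source stain-normalization toolkits.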

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    I believe this is a dataset paper and thus have no further comments (some benchmarking is done on a few SOTA methods, and the details are elucidated in the paper).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It would be great if you answered the above questions regarding why the normalization methods drop in performance. Also, if the authors find notable details in the dataset, it would be great to mention them (for example, the number of cells extracted using automatic versus manual annotation); although this is clearly visible, detailed cell statistics would be worth noting.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The normalization analysis is not convincing. As the dataset seems to have diversity and good scope due to its expert annotations, further statistical analysis of the images should be done to improve the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their constructive feedback. We are delighted that the reviewers found the proposed dataset novel and recognized its value for diversity, offering African tissue slides acquired in a low-resource setting.

Dataset Value (R1, R3, R4): Since the annotated dataset is our major contribution, we acknowledge the reviewers' concerns and will provide additional justification. Although the images were acquired at the 250X magnification factor (MF), it is important to note that by low-resource we refer to the use of a digital microscope rather than a whole slide scanner. This makes AMONuSeg the first African multi-organ nuclei segmentation dataset, aiding the research community in adapting their models by studying the domain distribution of a diverse population. We understand the importance of including the number of nuclei and will update our table accordingly. We chose to compare our dataset with the three reported public datasets due to their similarity in terms of sample numbers. Additionally, the phrase 'generalization across varied patient populations and pathology species' will be modified by deleting the pathology species part.

Data and Code (R1, R3, R4): The annotated dataset and modified Y-Net model will be made publicly available with ethical approval. Code for the methods we assessed is available in the original papers' GitHub repositories to support reproducibility and transparency.

Annotation Process (R3): We ensured that the annotation process was rigorous, as our main goal is to provide a well-annotated dataset that can benefit the research community. Two annotators were involved: a data scientist (A1) and a postdoctoral researcher (A2), trained by an expert pathologist in nuclei annotation. Initially, the two annotators used the automatic segmentation tools of Fiji (ImageJ) to generate preliminary annotations of the tissue slides, providing a rough estimate of nuclei locations and simplifying the initial annotation process. Because the automatic annotations were inaccurate, manual annotations by both annotators were necessary. A2 validated the annotations made by A1. To ensure the highest quality, three expert pathologists reviewed and validated the annotations, and a final validation was conducted by a fourth pathologist prior to the rebuttal to ensure that the annotations met the highest standards. Regarding Fig. 1, its purpose was to visually depict the difference between the automatic and manual annotations.

Novelty & Analysis (R1, R3, R4): Our study focused on spectral-feature and transformer-based segmentation models on the AMONuSeg dataset, excluding Cellpose and StarDist as they were previously compared [1]. We reported mean values from a three-fold cross-validation to ensure robustness, which is why standard deviations were not provided. Our analysis focused on the Dice score for segmentation accuracy; the other metric used can be added in the appendix. We shared hyperparameters in Sec. 4.1 and will add more details for reproducibility. We will revise our conclusion to ensure our findings are robust and well supported. [1] 10.1016/j.cdev.2022.203806

Stain Normalization (R4): Stain normalization is essential for removing staining variations and enhancing the generalizability of segmentation models on histology images. While the original dataset performs well without normalization, the goal of the experiment is to study the impact of different normalization techniques. We recognize the influence of parameters such as alpha and beta in the Macenko method and emphasize their optimization for optimal results. We propose moving these results to the appendix. This study initiates ongoing research on the effects of non-standardized staining protocols on model performance. Addressing these challenges is key to progress in histology image analysis and to improving model adaptability.

Minor Comments (R1, R3): We will revise the final version to correct typos and define any missing acronyms, ensuring clarity.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper lacks novelty and fails to establish the added value of the dataset. It also suffers from structural problems and deficiencies in clarity and language. This is in addition to the inadequacies in experimental validation and methodological detail pointed out by the reviewers.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper addresses the gap between state-of-the-art nuclei segmentation models and the needs of pathologists in developing countries who rely on low-resource data. It introduces the first fully annotated African multi-organ dataset for histopathology nuclei segmentation, acquired with a low-precision microscope. The idea of considering microscopy resolution limitations in low-resource countries is indeed interesting and inspiring. However, it is argued that the claims are not well justified with experimental evidence. Other issues, including intricate writing and paper organization, insufficient comparisons to existing datasets and methods, and the unconvincing results on stain normalization, further downgrade the quality of this paper. It appears that the manuscript needs more revision before it is ready for publication.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper provides a new African dataset for nuclei semantic segmentation from histology images, and provides a comparison of several existing methods on this dataset.

    The reviewers provide several criticisms regarding the choice of baseline models as well as whether the dataset truly describes a low-resource setting, given that the images are actually quite high quality. Nevertheless, I would like to champion this paper: there is a general lack of publicly available data from Africa, which makes it incredibly hard to monitor the performance of developed algorithms on African data, even when one actually wants to do so. Even if the magnification factor of the images is greater than that found in the Western datasets, it remains true that the population is different, and this is crucial to even be able to monitor the fairness of developed algorithms. Thus, this dataset is a great addition to the MICCAI community from a health equity angle.

    The authors should take the reviewers’ comments into account, and please also make sure to include as much demographic meta-information as possible (ethnicity, gender, age, etc.) about the subjects in the dataset; this will aid any further fairness analysis of algorithms using this dataset.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



