Abstract

Embryo selection is a critical step in the process of in-vitro fertilisation in which embryologists choose the most viable embryos for transfer into the uterus. In recent years, numerous works have used computer vision to perform embryo selection. However, many of these works have neglected the fact that the embryo is a 3D structure, instead opting to analyse embryo images captured at a single focal plane. In this paper we present a method for the 3D reconstruction of cleavage-stage human embryos. Through a user study, we validate that our reconstructions align with expert assessments. Furthermore, we demonstrate the utility of our approach by generating graph representations that capture biologically relevant features of the embryos. In pilot experiments, we train a graph neural network on these representations and show that it outperforms existing methods in predicting live birth from euploid embryo transfers. Our findings suggest that incorporating 3D reconstruction and graph-based analysis can improve automated embryo selection.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1468_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1468_supp.pdf

Link to the Code Repository

https://github.com/chlohe/embryo-graphs

Link to the Dataset(s)

N/A

BibTex

@InProceedings{He_Embryo_MICCAI2024,
        author = { He, Chloe and Karpavičiūtė, Neringa and Hariharan, Rishabh and Jacques, Céline and Chambost, Jérôme and Malmsten, Jonas and Zaninovic, Nikica and Wouters, Koen and Fréour, Thomas and Hickman, Cristina and Vasconcelos, Francisco},
        title = { { Embryo Graphs: Predicting Human Embryo Viability from 3D Morphology } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a method for determining embryo suitability for implantation in IVF treatment. This is achieved by applying a graph neural network in conjunction with cell segmentation and 3D reconstruction of blastomeres. The novelty of this paper lies in the introduction of Stack NMS, a cell segmentation method which, unlike traditional NMS, allows bounding boxes to overlap (since cells can physically sit on top of one another), and in a pipeline for the morphological assessment of cells that uses a graph neural network to select viable embryos in a manner similar to how embryologists currently work.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Presents the first known case of using graph neural networks to determine embryo suitability for treatment during IVF
    • Presents a new pipeline for 3D segmentation of cells that is a variation of non-maximal suppression (NMS)
    • Outperforms similar methods that use a typical CNN backbone, as well as a logistic regression based on the previously established commercial product KIDScore D3, across the board on general classification measures such as accuracy, precision, and AUC, with significantly larger margins on F1 and recall
    • Clinically relevant application with clear aims and goals; the graph neural network is inherently more explainable than prior methods, which may enable greater understanding of embryo survival chances in future work
    • Able to achieve good results on a relatively small dataset
    • The authors demonstrate good scientific design with limited data; the paper is clearly laid out and explained in a straightforward manner
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors only present their results in the context of an internal dataset. To their credit, this is comprised of a number of different sites across America, Belgium, and France. However, this makes it difficult to fairly compare against models that were trained outside of this singular context. Due to the nature of the data, it can be extremely difficult to acquire from a clinical standpoint, but the paper makes no mention of whether other datasets exist. I would like the authors to state whether other datasets exist and, if they do, whether attempts were made to acquire them. Conversely, if no other datasets exist, please explain the difficulty of acquiring and releasing such data.
    • The paper presents a pipeline for only a single type of Hoffman modulation contrast (HMC) microscope built into the embryo incubator. Could the authors please clarify how widely these incubators are used within the field? Are they ubiquitous, or would the methodology also work in other incubators whose microscopes capture different numbers of focal planes, or would up-/down-sampling be required to work with their method?
    • Many assumptions appear to be made during the 3D cell reconstruction step. How much influence does the user have on the generation of these reconstructions when determining the “user-defined intervals” in Step 2? Could the authors please comment on how involved this process is for the user, and does it significantly impact the results?
    • The paper makes reference to self- and semi-supervised methods, alongside transformer-based models, but only compares against a basic ResNet50 CNN and a logistic regression. Is a logistic regression on the commercially available example (KIDScore D3) a fair comparison when more advanced classical machine learning methods exist, such as SVMs and random forests, or more recent examples such as AdaBoost and XGBoost? Is logistic regression the intended method for determining suitability with this software? I would like the authors to clarify whether comparisons were made against any other machine learning models, or against a transformer-based model such as ViT.
    • I would appreciate clarification on whether the embryologists who rated the reconstructions were the same as those who annotated them (page 6).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors have promised to make code available on acceptance but make no mention of releasing the associated private embryo dataset. Without the data, these results would be impossible to replicate exactly, though the released code would enable future researchers to apply the methodology to new embryos. I encourage the authors to make their dataset available as well, if possible, to allow future algorithms to be compared effectively, though it is understandable if there are significant challenges to releasing this form of data.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I would like to thank the authors for writing clearly and concisely, in a manner easily accessible to the reader. I believe this clinical problem to be very relevant and am interested to see where this work may lead. The authors have done a thorough job of analysing the data presented and show clear improvements over the other methods presented. I believe this work belongs in the CAI track as it presents a novel MIC approach to solving an unmet CAI need and provides a cost-effective alternative to an otherwise expensive solution. I would like to make some minor constructive comments that are not necessarily weaknesses but that I would recommend addressing:

    • The title of the paper does not currently provide any context about the content of the paper; would the authors consider expanding it?
    • Figure 3 in the supplementary materials is very useful for understanding the cell reconstruction; would the authors please consider moving it to the main body of the paper if the extra half page offered by the review process allows?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors of this work present a relevant and significant clinical problem in a clear and easy-to-follow manner and have applied a novel form of graph-based neural network to determine embryo suitability. This is a very useful application and could lead to future work which helps to understand this suitability, improving future IVF treatment and reducing costs by lowering the failure rate of implantation. Graph neural networks are an appropriate choice here as they offer greater explainability in a clinical setting. However, in its current form, the paper seems limited in scope by only applying the method to an internal dataset collected on one type of microscope/scanner. It also contains only limited comparisons to other machine learning methods despite referring to a wide range of them. I would like the authors to address whether they believe these methods could have greater applicability, in reference to the weaknesses I perceived above.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel GNN method to tackle the embryo selection problem in clinical IVF, a crucial task in the daily clinical setting. A pipeline consisting of cell segmentation, 3D reconstruction, and graph representation is proposed and shown to achieve state-of-the-art performance. While most previous work focuses on 2D information, this paper leverages 3D information, which is more consistent with the nature of the embryo.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is easy to follow; details such as the construction of the 3D mesh, the architecture of the GNN, and the description of the evaluation are thorough and well written.
    2. Although plenty of works apply deep learning to the embryo selection task, most of them work with one or only a few focal planes. However, the embryo itself has rich 3D information which clinicians use during their inspection. Therefore, this paper's proposal to conduct 3D reconstruction before selection is well motivated. Additionally, the use of a GNN appears to be another novelty of this paper compared to prior art.
    3. The proposed Stack NMS algorithm makes sense in terms of the biology. In standard object detection algorithms, two overlapping predictions are often seen as invalid; however, cells on top of each other are normal in the embryo selection setting. The proposed algorithm accounts for this important aspect.
    4. I like the part where the authors conducted a user study to evaluate the correctness of the 3D reconstruction. IVF is a high-stakes clinical task, so using a human-in-the-loop approach is important.
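
The idea behind Stack NMS, as described in point 3, can be sketched as follows. This is a hypothetical reconstruction based only on the review's description, not the authors' implementation: the names `stack_nms`, `iou_thr`, and `z_window`, and the rule of suppressing overlaps only within a small window of focal planes, are all illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def stack_nms(dets, iou_thr=0.5, z_window=1):
    """Greedy NMS that only suppresses a detection when it overlaps a kept
    detection on a nearby focal plane (within z_window). Overlapping boxes
    on distant planes are assumed to be different, stacked cells and kept.
    dets: list of (box, score, z) where z indexes the focal plane."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, z in dets:
        if all(abs(z - kz) > z_window or iou(box, kbox) < iou_thr
               for kbox, _, kz in kept):
            kept.append((box, score, z))
    return kept
```

Under classic NMS the focal-plane index would be ignored and any strongly overlapping pair collapsed to one detection, which is exactly what discards vertically stacked cells.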
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors do not seem to mention how the window size for deciding whether to suppress an overlapping detection is selected.
    2. In Table 1, classic NMS scores significantly higher than the proposed Stack NMS; do the authors have any explanation for this result?
    3. The authors compare their GNN method to a logistic-regression-based method and a CNN-based method. However, recent advances in vision transformers should also be considered as baselines for comparison.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be great if the authors can make their dataset publicly available for the research community to follow this research direction.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I have several recommendations for the authors to consider:

    1. The title of the paper feels too short; the authors should consider lengthening it by including technical features such as “3D reconstruction” or the task, e.g. “embryo selection”.
    2. Please consider the weaknesses mentioned above.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on my evaluation above, I believe the strengths of this paper outweigh its weaknesses. Therefore, I recommend weak accept.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose methodological advancements for the automatic analysis of few-cell embryos for in vitro fertilization (IVF): 1. a new single-cell segmentation and reconstruction method that takes the 3D image information into account; 2. a graph-based method for predicting IVF success.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Highly relevant problem
    • Clear description of problem and approach
    • Using 3D information has been avoided so far - authors address a gap
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Data split for training and testing is a bit odd.
    • Robustness is not addressed
    • It is not clear where the improvement in prediction performance comes from, and how much the new segmentation method contributes
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be great if the authors could release one 3D image from each data set

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The title could be a bit more specific; the current one can easily be misunderstood
    • References should be ordered, starting with [1]
    • First paragraph: “Perhaps … “ please be more specific
    • Please cite “the few works that do consider multiple (typically up to three) focal planes”
    • Chapter 2: I wouldn’t call different cancers “domains”; “diseases” would fit better
    • Fig 1 is nice; could you make the focal stacks a little larger so that the reader can appreciate the image quality? Also, please specify how many focal planes you use in a typical 3D image.
    • Also Fig 1: What is NMS?
    • Can you use intensities or texture as additional node features?
    • The data description has to be part of the main article, please move there.
    • Segmentation benchmarking: Where does the ground truth come from?
    • Segmentation: Please show a few exemplary images to get an idea of performance and possible issues.
    • Fig 2: Please add information on hours post insemination and the scores given (it would be interesting to show all scores, not just the mean). How are the examples ordered now? I suggest showing the distribution of all scores.
    • Please comment: since the AUC performance of the total contacts method and the GNN is similar, is the number of contacts, i.e. the embryo compactness, the most telling feature?
    • Please discuss more: why is the GNN preferable? Why does a high F1 matter (if it does)?
    • Please make clearer: is Table 1 based on your segmentations and 3D reconstructions? How would the total contacts method perform with the other NMS methods (e.g. classic NMS)?
    • Fig 3 is nice; include it in the main text if possible
    • Table 3 has to be part of the main text. It raises questions, though:
      • Why is the testing set differently balanced, with a strong Belgian bias?
      • Would the segmentation model also work on an unseen dataset?
      • Why is the GNN experiment (I assume this is the prediction experiment) only done on the American cohort?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces two methodological advancements, and it is not clear to me how the two contribute independently. If the new segmentation and 3D reconstruction make the difference, and a simple logistic-regression-based prediction outperforms previous approaches, then the fancy GNN is not needed.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We are glad that the reviewers found our submission to be well-written and novel, and we thank them for their constructive comments. We have reviewed the manuscript carefully and addressed many of the comments on wording and missing details as appropriate. We further address the major questions and comments below.

[R1, R4, R5] Paper title needs to be more detailed. We appreciate that the title may have been a bit vague. We have updated the title to “Embryo Graphs: Predicting Human Embryo Viability from 3D Morphology”.

[R1, R4] Why not benchmark against vision transformers / other ML models? We chose to compare our results with ResNet because CNNs such as ResNet form the basis of most recent works in automated embryo selection. Using a ViT would make it harder to disentangle performance gains arising from the ViT itself from those arising from our contributions. Moreover, Kromp et al. [22] demonstrated that vision transformers (DeiT and Swin) are not universally superior to standard CNN architectures across embryo tasks.

[R1, R5] Why can’t we release the dataset? Are there any open datasets that we could have used? The dataset was obtained from 3 clinics across 3 different countries. The terms of the contracts we have with the clinics unfortunately mean that we cannot release the data publicly. Though there are open embryo datasets (such as that of [22]), none of them, to our knowledge, come with the genetic data needed for our experiments.

[R1] The pipeline was developed using a specific kind of embryo incubator (Embryoscope). How widespread are they? Can the pipeline be used with other incubators with different numbers of focal planes?

The Embryoscope is among the most widely used timelapse incubators in the field. As such, our work and many others have focused on data captured on Embryoscopes to improve translatability to the clinic. While variations in the number of focal planes can be handled by up/down-sampling with the Super-Focus method (which, as noted in the manuscript, we use as a pre-processing step), we cannot say for certain that the pipeline would work with other incubators with different optical setups. However, it is encouraging that some other deep learning systems, such as the commercially available CHLOE, can be used out of the box with other incubators such as the Geri.

[R1] How much influence does the user have on the generation of reconstructions when determining the “user-defined intervals” in Step 2? The “user-defined intervals” define the spacing between layers in the generated mesh. The smaller the interval, the more vertices the mesh has and the smoother it looks. In the software, the interval is simply a parameter with a default value which the user can, but does not need to, change. In hindsight, it may be better to refer to it as a “fixed interval”; we have updated this in the manuscript.
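
The effect of this interval can be illustrated with a minimal sketch. This is not the authors' reconstruction code; it assumes, for illustration only, a spherical blastomere sliced into horizontal mesh layers, and the names `layer_heights` and `layer_radii` are hypothetical.

```python
import math

def layer_heights(radius, interval):
    """z positions of horizontal mesh layers spanning a sphere of the given
    radius, spaced at a fixed interval (standing in for the paper's
    'user-defined interval'). Smaller intervals produce more layers."""
    n = int(2 * radius / interval)
    return [-radius + i * interval for i in range(n + 1)]

def layer_radii(radius, interval):
    """Circle radius of each layer's cross-section; more layers mean more
    vertices and hence a smoother reconstructed surface."""
    return [math.sqrt(max(radius**2 - z**2, 0.0))
            for z in layer_heights(radius, interval)]
```

Halving the interval roughly doubles the number of layers, which matches the feedback's point that the parameter only trades mesh smoothness against vertex count rather than altering the shape being reconstructed.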

[R5] Why was the GNN only evaluated on the American dataset? The American dataset was the only dataset that had genetic data which was necessary to determine which embryos could be included.

[R5] Why are there more Belgian embryos in the test set? As all clinics used the same imaging hardware, the datasets were combined and split based on the number of cells in each embryo rather than the source clinic.

[R5] The GNN and total contacts have similar AUCs; why is the GNN preferable, and does this mean that the number of contacts is the most important feature? Though the GNN achieves an AUC similar to that of the total contacts method, it is better in terms of recall and F1. Moreover, the GNN allows additional features, such as cytoplasmic texture, to be integrated relatively straightforwardly. The results do seem to suggest that the number of contacts is the most important feature we examined. There is a biological basis for this: more contact leads to improved intercellular communication.
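
The total contacts baseline discussed above amounts to counting edges in the embryo graph. The sketch below is an illustrative reconstruction, not the authors' code: it assumes contacts have already been detected (e.g. as touching reconstructed meshes) and are given as cell-index pairs, and `total_contacts` is a hypothetical name.

```python
def total_contacts(contacts):
    """Count blastomere-blastomere contacts, i.e. undirected edges in the
    embryo graph (nodes are cells; an edge links two cells whose
    reconstructed 3D surfaces touch). Duplicate and self pairs are ignored."""
    return len({frozenset(p) for p in contacts if p[0] != p[1]})
```

For intuition, a maximally compact 4-cell embryo in which every cell touches every other has 6 contacts, while a loose chain of 4 cells has only 3, so the count acts as a simple compactness feature.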

[R5] How would the total contact method perform with the other NMSs? Poorly - this is partly what motivated Stack NMS.




Meta-Review

Meta-review not available, early accepted paper.


