Abstract

For the past few years, deep generative models have increasingly been used in biological research for a variety of tasks. Recently, they have proven to be valuable for uncovering subtle cell phenotypic differences that are not directly discernible to the human eye. However, current methods employed to achieve this goal mainly rely on Generative Adversarial Networks (GANs). While effective, GANs suffer from issues such as training instability and mode collapse, and they do not accurately map images back to the model’s latent space, which is necessary to synthesize, manipulate, and thus interpret outputs based on real images. In this work, we introduce PhenDiff: a multi-class conditional method leveraging Diffusion Models (DMs) designed to identify shifts in cellular phenotypes by translating a real image from one condition to another. We qualitatively and quantitatively validate this method on cases where the phenotypic changes are visible or invisible, such as in low concentrations of drug treatments. Overall, PhenDiff represents a valuable tool for identifying cellular variations in real microscopy images. We anticipate that it could facilitate the understanding of diseases and advance drug discovery through the identification of novel biomarkers.
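To make the translation idea concrete, below is a minimal, self-contained sketch of the general principle the abstract describes: a real image is deterministically inverted (DDIM, eta = 0) under its source condition and then regenerated under the target condition. This is not the authors' implementation; the noise schedule, the number of steps, and the DummyEps network are placeholders standing in for a trained class-conditional noise predictor.

    import torch

    T = 50                                          # number of DDIM steps (assumed)
    betas = torch.linspace(1e-4, 0.02, T)           # assumed linear beta schedule
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative alpha_bar_t

    def ddim_step(x, eps, ab_from, ab_to):
        """Deterministic (eta = 0) DDIM update moving x from noise level ab_from to ab_to."""
        x0_pred = (x - torch.sqrt(1 - ab_from) * eps) / torch.sqrt(ab_from)
        return torch.sqrt(ab_to) * x0_pred + torch.sqrt(1 - ab_to) * eps

    @torch.no_grad()
    def translate(eps_model, image, src_class, tgt_class):
        # 1) Inversion: deterministic forward pass x_0 -> x_T, conditioned on the
        #    image's real (source) class.
        x = image
        for t in range(T - 1):
            eps = eps_model(x, t, src_class)
            x = ddim_step(x, eps, alpha_bar[t], alpha_bar[t + 1])
        # 2) Generation: reverse pass x_T -> x_0, conditioned on the target class
        #    to impose its phenotype on the same underlying image.
        for t in reversed(range(T - 1)):
            eps = eps_model(x, t + 1, tgt_class)
            x = ddim_step(x, eps, alpha_bar[t + 1], alpha_bar[t])
        return x

    # Tiny stand-in noise predictor so the sketch runs end to end; a real model
    # would be a class-conditional UNet trained on the microscopy images.
    class DummyEps(torch.nn.Module):
        def __init__(self, n_classes=2):
            super().__init__()
            self.emb = torch.nn.Embedding(n_classes, 8)
            self.net = torch.nn.Conv2d(1 + 8, 1, kernel_size=3, padding=1)

        def forward(self, x, t, c):  # timestep t is ignored by this dummy model
            e = self.emb(torch.tensor([c])).view(1, -1, 1, 1).expand(-1, -1, *x.shape[-2:])
            return self.net(torch.cat([x, e], dim=1))

    model = DummyEps()
    untreated = torch.randn(1, 1, 64, 64)           # placeholder for a real untreated image
    fake_treated = translate(model, untreated, src_class=0, tgt_class=1)
    print(fake_treated.shape)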

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1948_paper.pdf

SharedIt Link: https://rdcu.be/dV1Wh

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72384-1_34

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1948_supp.pdf

Link to the Code Repository

https://github.com/WarmongeringBeaver/PhenDiff

Link to the Dataset(s)

https://bbbc.broadinstitute.org/BBBC021

BibTex

@InProceedings{Bou_PhenDiff_MICCAI2024,
        author = { Bourou, Anis and Boyer, Thomas and Gheisari, Marzieh and Daupin, Kévin and Dubreuil, Véronique and De Thonel, Aurélie and Mezger, Valérie and Genovesio, Auguste},
        title = { { PhenDiff: Revealing Subtle Phenotypes with Diffusion Models in Real Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {358--367}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper explores the use of diffusion models to generate synthetic treated images from real untreated microscopy images of cells.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    It’s a clinically relevant question, as it could save the time and money spent treating samples if the information and features corresponding to their treated counterparts could be synthetically created.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The results aren’t significant. The synthetic treated images do not look like the real treated ones. This is a very challenging comparison, since the samples undergo a physical process when treated that would affect the subsequent imaging; however, this needs to be acknowledged and explored further. The quantitative results comparing generated and real treated images versus untreated images are not statistically significant.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Figures are too small to discern the details that the authors claim in the captions. Some visual indications should be added to highlight the regions discussed. Some obvious clusters in the original image in Fig. 1 also disappeared, but this is not discussed.
    • Results: from the images, the generated treated and real treated images do not look alike, which would be the aim of the method. This is challenging and partly expected (see my previous comments) but should be acknowledged and further explored. Differences from the real untreated images are often (judging by the selected illustrations) visually small, and it is not clear that they are relevant. The authors need to find better and more examples for qualitative assessment, and a way to further assess their results quantitatively. The boxplot differences are not significant, and they do not acknowledge this. They cannot conclude that the “phenotypic changes are reliably reproduced” in light of these results.
    • Move Table 1 closer to its discussion in the text.
    • Add more discussion about the imaging type, relevance, patient demographics, etc.
    • Clinical feasibility discussion is also missing.
    • The step-by-step formal description is good.
    • It’s good that the authors do a comparison with other methods, but have the authors optimised StyleGAN/CycleGAN so that the comparison is fair? The adaptation seems challenging, as acknowledged by the authors, so I wonder if it might not be optimal. Can the authors look at other datasets from the referenced GAN works to establish a fairer comparison?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The results don’t seem significant enough

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces PhenDiff, a novel framework utilizing multiclass conditional diffusion models for translating real cell images across different conditions. The primary objective is to discern subtle phenotypic differences triggered by a perturbation or variations between perturbations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper provides a comprehensive analysis of the synthesis results of PhenDiff. The authors conduct a variety of evaluations including morphology visualizations, analyses of image and feature distributions, FID scores, and real untreated image reconstruction quality.
    2. According to the results presented, PhenDiff achieves superior performance over other approaches.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The novelty of PhenDiff appears limited in the context of existing diffusion-based approaches. It would be beneficial for the authors to clearly highlight the unique aspects of PhenDiff that distinguish it from other diffusion models.
    2. The descriptions of the image inversion and image generation processes in PhenDiff are somewhat vague and require further clarification. Specifically, it would be helpful to detail the starting timestep for the noised images in the reverse process, which is crucial for understanding the methodological workflow.
    3. The manuscript would benefit from a direct comparison of the generated treated images with real treated images, especially given that PhenDiff is based on DDIM.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. It would be beneficial for the authors to clearly highlight the unique aspects of PhenDiff that distinguish it from other diffusion models.
    2. The descriptions of the image inversion and image generation processes in PhenDiff are somewhat vague and require further clarification.
    3. The manuscript would benefit from a direct comparison of the generated treated images with real treated images, especially given that PhenDiff is based on DDIM.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see the strengths and weaknesses of this paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This study presents a diffusion model to identify phenotypic variations in cell images. It uses a noise/denoise approach defined previously in the cited references, and it was tested on public datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Simplicity: no additional parameters are needed. A new application of DDIM (as far as I know).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No technical contribution, neither in the architecture nor in the components of the model.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Although the authors are using the model described in ref. 14, I would recommend including some technical descriptions of the setting and training of the model (even if they provide the GitHub link), mainly for clarity and ease of reading. Furthermore, some hardware details and a comparison of the performance of each model would be welcome.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work is well-organised, clear and concise. The authors compare results with similar recent studies using proper metrics.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all the reviewers for their thoughtful comments and constructive remarks.

Reviewer #1:
1- We apologize for the size of the figures; we will enlarge them in the revised version and will also highlight the regions where the phenotypic variations are visible. Indeed, some clusters of cells do disappear in Fig. 1; this is expected, as the treatment is toxic and kills the cancer cells. Our model has thus faithfully replicated the effect of the treatment on the image.
2- The aim of the method is not to reproduce the treated image shown in the figures, as mentioned. The goal is to see, through image synthesis, what the same cell would look like with and without treatment, which cannot be done experimentally. We therefore expect the generated images of treated cells not to show the same cells as a real treated image, but instead to artificially display the treated phenotype on the real untreated cells. Fig. 2a shows a clear phenotype: the fragmentation of the Golgi apparatus (green channel). In both the generated and real treated images the apparatus is split, whereas it appears aggregated (one big spot) in the untreated images. Similarly, as shown in Fig. 3 and quantified in Fig. 4, the phenotypes related to different treatments are also recapitulated in the synthetic images.
3- We respectfully disagree with the reviewer about the statistical significance of the boxplots. The distributions of treated versus untreated cells are significantly dissimilar, as validated by the p-values of t-tests (p = 1.01×10^-28 between conditions using real images and p = 1.09×10^-14 between conditions using generated images), so the conclusion of the experiment would be the same whether using generated or real images. We will add these p-values to the plot. Furthermore, Fig. 4 shows a strong correlation between the features extracted by CellProfiler from the real and generated treated images, quantitatively confirming that our method faithfully reconstructs biological features.
5- We will move the table and add more information about imaging type, relevance, and patient demographics, as well as clinical feasibility, in the revised version.
6- For CycleGAN, we did not perform any adaptation, as it can translate real images. For Phenexplain (which is based on StyleGAN), we adapted it for real images by implementing the state-of-the-art technique for StyleGAN inversion (iterative refinement of latent codes) and using both the W and W+ spaces to balance image fidelity and editability.
7- The datasets used in the paper are the ones used by Phenexplain, which is the reference method.
8- We also want to emphasize that the source code of our method is publicly available, as is the BBBC021 dataset, ensuring the reproducibility of our paper.

Reviewer #3: We thank the reviewer for the positive feedback. We will add the parameters used to train our model in the revised version.

Reviewer #4: We thank the reviewer for the positive feedback on the analyses we performed in our paper.
1- PhenDiff represents a new application of diffusion models to spot subtle phenotypes in microscopy images via image-to-image translation, which is very useful in biology. Importantly, we do not claim that PhenDiff is a new diffusion model architecture. We will clarify this in our revision.
2- The generation process in PhenDiff produces a new image starting from random noise and a selected class. The inversion is the reverse process, which enables the manipulation of real images. This allows us to see the effect of a treatment on a real untreated image, unlike Phenexplain, which is based on GANs and cannot invert a real image. The inversion runs from x_0 to x_T and the generation runs from x_T to x_0; we will correct this in Fig. 1.
3- We have actually addressed this in Fig. 3. In addition, Fig. 4 compares hundreds of quantitative features measured on both real and generated images of treated cells to demonstrate their strong similarity. We will emphasize this better.
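For context on the significance discussion above, the snippet below shows the kind of two-sample comparison such p-values come from: a per-image feature compared between untreated and treated populations, once with real and once with generated treated images. The feature arrays are synthetic placeholders rather than the paper's data, and Welch's t-test (scipy) is an assumed choice; the rebuttal does not specify the exact test variant.

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    # Placeholder feature values (e.g., one CellProfiler morphology feature per image).
    real_untreated = rng.normal(1.0, 0.2, 500)
    real_treated = rng.normal(1.3, 0.2, 500)
    gen_treated = rng.normal(1.28, 0.25, 500)

    # Welch's two-sample t-test (unequal variances) between conditions.
    t_real, p_real = ttest_ind(real_untreated, real_treated, equal_var=False)
    t_gen, p_gen = ttest_ind(real_untreated, gen_treated, equal_var=False)
    print(f"real treated vs untreated:      p = {p_real:.2e}")
    print(f"generated treated vs untreated: p = {p_gen:.2e}")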




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors addressed the concerns of reviewer 1 about figure sizes and clarity by committing to enlarge the figures and highlight regions where phenotypic variations are visible. They clarified the goal of the method, emphasizing that it is designed to display the same treated phenotype on real untreated cells, which cannot be achieved experimentally, rather than reproducing exact treated images. Statistical significance of the results was robustly defended with additional p-values and strong correlations between real and generated images, as shown in their experiments. The authors promised to add more detailed information about imaging type, relevance, patient demographics, and clinical feasibility to the revised version.

    Reviewers #3 and #4 provided positive feedback on the analysis and the overall methodology. The authors also committed to adding more details about the training parameters and clarifying the novelty of their application rather than claiming new architecture for diffusion models.

    The introduction of PhenDiff represents a novel application of diffusion models to identify subtle phenotypes in microscopy images. The authors effectively demonstrated that their method can faithfully replicate phenotypic changes induced by different treatments. They provided quantitative and qualitative evidence showing that PhenDiff outperforms existing GAN-based methods, addressing limitations such as training instability and poor image inversion quality.

    The authors emphasized the utility of their method in biology, particularly in identifying phenotypic variations that are not experimentally achievable, thus contributing valuable insights for disease understanding and drug discovery.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


