Abstract

Generative modeling seeks to approximate the statistical properties of real data, enabling synthesis of new data that closely resembles the original distribution. Generative Adversarial Networks (GANs) and Denoising Diffusion Probabilistic Models (DDPMs) represent significant advancements in generative modeling, drawing inspiration from game theory and thermodynamics, respectively. Nevertheless, the exploration of generative modeling through the lens of biological evolution remains largely untapped. In this paper, we introduce a novel family of models termed Generative Cellular Automata (GeCA), inspired by the evolution of an organism from a single cell. GeCAs are evaluated as an effective augmentation tool for retinal disease classification across two imaging modalities: Fundus and Optical Coherence Tomography (OCT). In the context of OCT imaging, where data is scarce and the distribution of classes is inherently skewed, GeCA significantly boosts classification performance across 11 different ophthalmological conditions, achieving a 12% increase in the average F1 score compared to conventional baselines. GeCAs outperform diffusion methods that incorporate either UNet or state-of-the-art transformer-based denoising models, under similar parameter constraints. Code is available at: https://github.com/xmed-lab/GeCA.
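
The abstract reports an increase in the average F1 score over 11 conditions. As a minimal sketch of how such an average can be computed for multi-label predictions, the snippet below takes the unweighted (macro) mean of per-label F1 scores. The choice of macro averaging and the function names `f1_per_label` and `macro_f1` are assumptions for illustration; the paper's exact averaging scheme is not stated on this page.

```python
def f1_per_label(y_true, y_pred):
    """F1 for one binary label; returns 0.0 when the label has no TP, FP or FN."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(Y_true, Y_pred):
    """Unweighted mean of per-label F1 (assumed averaging scheme).

    Y_true and Y_pred are lists of rows; each row holds one 0/1 entry per label.
    """
    n_labels = len(Y_true[0])
    scores = [
        f1_per_label([row[j] for row in Y_true], [row[j] for row in Y_pred])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


# Toy example with 3 images and 2 labels (hypothetical data).
Y_true = [[1, 0], [1, 1], [0, 1]]
Y_pred = [[1, 0], [0, 1], [0, 1]]
print(macro_f1(Y_true, Y_pred))
```

Macro averaging gives every condition equal weight regardless of how many images carry it, which is one common choice when the class distribution is skewed.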

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0233_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0233_supp.pdf

Link to the Code Repository

https://github.com/xmed-lab/GeCA

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Elb_An_MICCAI2024,
        author = { Elbatel, Marawan and Kamnitsas, Konstantinos and Li, Xiaomeng},
        title = { { An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work proposes a diffusion-model-based method for high-resolution synthesis of retinal images. A classification task is used to validate the proposed method’s performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The manuscript is easy to understand and follow. The figure clearly shows the whole model structure.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The generated colour fundus images in Fig. 1 look very weird, especially in the optic nerve head area. The irregular shapes of the optic disc and cup do not exist in the real world. This raises a serious concern: the generated images do not follow human anatomy and do not correspond to anything real. The AI model creates new images without clinical knowledge or background. The generated OCT images also look weird; the retinal layers are mixed and blurred and do not show the actual retinal layer structure. The proposed model may not have learned the actual anatomical structure.

    2. The authors validate a proposed super-resolution method through a classification evaluation task. This cannot reflect the actual super-resolution ability of the proposed method. Instead, it could lead the model to learn an unrealistic projection of generated images solely to achieve promising classification performance. This further weakens the explainability of AI methods in the field of medical image analysis and healthcare.

    3. What is the main contribution of the proposed method, apart from merging two existing methods, such as [18] and [8]?

    4. The term ‘gene heredity guidance’ is misleading; I suggest the authors consider another name.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I suggest the authors choose a proper task and dataset to validate the proposed method.

    Some of the terms used in the manuscript need to be reconsidered, as they are misleading.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The validation task is inappropriate for the proposed method and cannot reflect the model’s performance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces Generative Cellular Automata (GeCA) for retinal disease diagnosis using Optical Coherence Tomography (OCT) imaging. GeCA combines Neural Cellular Automata (NCA) with diffusion objectives, improving image generation and disease classification. With Gene Heredity Guidance (GHG), GeCA surpasses existing models like Diffusion Transformers (DiTs) with fewer parameters. By augmenting scarce OCT datasets with synthetic images, GeCA enhances multi-label retinal disease classification by 12 %.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The proposed GeCA model combines Neural Cellular Automata (NCA) with diffusion objectives, tailored specifically for NCA’s unique structure. This integration allows for more efficient image generation while utilizing fewer parameters compared to traditional diffusion-based optimization methods.

    2. The Gene Heredity Guidance (GHG) technique significantly improves GeCA’s image sampling process. By leveraging GHG, GeCA surpasses state-of-the-art models in both image generation and retinal disease classification while utilizing only half the parameters of comparable models like DiTs.

    3. Effective Dataset Expansion for Retinal Disease Classification: GeCA demonstrates its effectiveness not only in generating high-quality synthetic images but also in improving multi-label retinal disease classification.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Insufficient evaluation. The authors only compare with DiT and LDM, ignoring methods specifically designed for fundus and OCT image generation [1][2].

    2. Can this method be used to synthesize data like AS-OCT [3][4] and improve the classification performance on more diseases like glaucoma?

    3. The motivation behind using pix-cell is not clear to me; can you explain more?

    4. The hyperparameter of the classifier-free guidance affects the quality of the generated image significantly. I suggest the authors discuss this hyperparameter.

    [1] Zhao, He, et al. “Supervised segmentation of un-annotated retinal fundus images by synthesis.” IEEE transactions on medical imaging 38.1 (2018): 46-56.

    [2] Shenkut, Dereje, and Vijayakumar Bhagavatula. “Fundus GAN-GAN-based fundus image synthesis for training retinal image classifiers.” 2022 44th annual international conference of the IEEE engineering in Medicine & Biology Society (EMBC). IEEE, 2022.

    [3] Yang, Yifan, et al. “Distinguishing differences matters: Focal contrastive network for peripheral anterior synechiae recognition.” Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24. Springer International Publishing, 2021.

    [4] Fu, Huazhu, Yanwu Xu, Stephen Lin, Xiaoqin Zhang, Damon Wing Kee Wong, Jiang Liu, Alejandro F. Frangi, Mani Baskaran, and Tin Aung. “Segmentation and quantification for angle-closure glaucoma assessment in anterior segment OCT.” IEEE transactions on medical imaging 36, no. 9 (2017): 1930-1938.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    None.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to the main weaknesses section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work addresses an interesting and important task, the proposed method is technically sound, and the authors claim to release their code for reproducibility. However, there are limitations, including insufficient comparison and undemonstrated superiority over SOTA methods. If the authors show that the proposed method can surpass the existing methods, I will consider changing my score.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The work extends the recently proposed Neural Cellular Automata method, which was proposed for image segmentation in resource-constrained environments, to the generative modelling domain. To do so, the authors incorporate a (de)noising diffusion process and adopt an optimization technique from genetic algorithms to enhance it. The presented method is very novel and generates synthetic data with significantly fewer parameters than Diffusion Transformers. Still, the authors rely on diffusion transformer principles to learn the data distribution. The method is applied to generate synthetic data from 2 different imaging modalities: Fundus and OCT. In turn, the authors assess the value of augmenting an in-house and a public OCT dataset, each containing a small number of images and an unbalanced number of images per category, for the downstream task of ocular disease classification. The proposed method outperforms state-of-the-art GAN- and diffusion-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Methodological: The authors make crucial methodological modifications to the Neural Cellular Automata (NCA) architecture by incorporating the Gene Heredity Guidance, and a single diffusion transformer block. This enables the method to learn the data distribution using fewer parameters than the Diffusion Transformers, and generate images of higher resolution than the existing NCA methods, and with higher fidelity. The method presents an alternative to the current state-of-the-art GANs and Diffusion based methods for generative modelling. It is also a novel methodological extension of a previously presented work in MICCAI 2023 [1] where NCA was used for image segmentation.

    [1] Kalkhof, J., Mukhopadhyay, A.: M3d-nca: Robust 3d segmentation with built-in quality control. MICCAI 2023

    Validation and performance: The authors validate their approach in two different imaging modalities, Fundus and OCT. Even though for the fundus images the evaluation uses only generative modelling metrics, both standard ones like KID and FID and newer ones like LPIPS and the GG from FLD, the authors also perform extensive quantitative evaluation on the OCT image classification task by augmenting two different OCT datasets with synthetic images. For both fundus and OCT imaging, the proposed method significantly outperforms existing diffusion and Neural Cellular Automata methods in the quality of the generated images while using fewer parameters. For the downstream task of classification, the authors demonstrate, with statistical significance, that incorporating synthetic data from their method could perform better than incorporating images from diffusion-based methods. The authors use cross-validation at the patient level.
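
Patient-level cross-validation, as mentioned above, means that all images of one patient fall into the same fold, so no patient contributes to both training and testing. The following is a minimal sketch of that idea under stated assumptions: the function name `patient_level_folds` and the round-robin assignment of patients to folds are hypothetical and are not taken from the paper or its code.

```python
def patient_level_folds(patient_ids, n_folds):
    """Yield (train_idx, test_idx) pairs where folds are split by patient.

    patient_ids[i] is the patient identifier of image i. Patients are
    assigned to folds round-robin (an assumption made for this sketch).
    """
    unique_patients = sorted(set(patient_ids))
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique_patients)}
    for k in range(n_folds):
        test_idx = [i for i, pid in enumerate(patient_ids) if fold_of[pid] == k]
        train_idx = [i for i, pid in enumerate(patient_ids) if fold_of[pid] != k]
        yield train_idx, test_idx


# Toy example: six images from four patients (hypothetical identifiers).
ids = ["a", "a", "b", "c", "c", "d"]
for train_idx, test_idx in patient_level_folds(ids, n_folds=2):
    train_patients = {ids[i] for i in train_idx}
    test_patients = {ids[i] for i in test_idx}
    # No patient appears on both sides of the split.
    assert train_patients.isdisjoint(test_patients)
```

Splitting by image instead of by patient would let near-duplicate scans of the same eye leak across the split and inflate the reported scores.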

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Paper organization and presentation of ideas: The organization of the paper is a bit unconventional, with the first figure placed above the abstract and the keywords section missing. The summary of the presented contributions could be improved by rephrasing the contribution section and reordering it to increase the impact of the presented ideas. The order of the pix-cell state parameter definitions underneath equation (1) differs from that in equation (1): Cin, Cgamma, Cout, Ch. Even though there is limited space to expand on the different ideas, I was expecting more explanation of the impact of the choice of parameter M and of the number of HxW cell entities. Also, I had to go through the Attention-based Neural Cellular Automata paper [2] to better understand the purpose of parameter Ch in equation (1).

    The authors do not discuss the limitations of the proposed method.

    [2] Tesfaldet, M. et al. Attention-based Neural Cellular Automata, ArXiv 2022

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) The authors are recommended to place Figure 1 after the abstract, cite it in the text, and replace its contents with examples where existing methods fail to generate fundus/OCT images with high fidelity.

    2) The authors are recommended to rephrase the contribution section to increase its impact by splitting the second point into two: the first describing the GHG sampling, and the second stating that the proposed method uses only half the parameters of DiT. Please remove the bold fonts; if something needs emphasis, use italics instead.

    3) Could the method be conditioned on different pathologies and generate conditional synthetic data? This is an advantage that the latest diffusion-based methods have. Also, fundus images could contain multiple diseases; for example, cataract/myopia could be present together with diabetic retinopathy or age-related macular degeneration, which is usually not reflected in the annotations. How does this absent multi-disease annotation affect the classification performance and the generative process?

    4) The authors are recommended to expand on the explanation of the purpose of parameter Ch. The parameter is explained better in the Attention-based Neural Cellular Automata paper. Could the genetic algorithm be replaced with other metaheuristic algorithms to further optimize the diffusion denoising process?

    5) Please change the order of the parameters under equation (1) to reflect the order in equation (1) and further improve the reading flow.

    6) Given that a major advantage of the method is its reduced resource consumption, what are the training and inference times of the authors’ method compared to existing work? Since the approach is clinically friendly [1], could the method fit the time constraints of current clinical workflows in ophthalmology?

    [1] Kalkhof, J., Mukhopadhyay, A.: M3d-nca: Robust 3d segmentation with built-in quality control.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A novel methodological extension that incorporates bio-inspired approaches and enriches the current schemes of generative models (game theory for GANs and thermodynamics for Diffusion models); it probably represents a new class of algorithms.

    Strong validation in multiple datasets using a wide array of qualitative and quantitative metrics. Clear improvements in performance (resources of memory/metrics) against the existing works in the literature.

    I am certain that the MICCAI community will appreciate the quality of the contributions of this work and that it will spark great discussion.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    The authors covered all my questions, and based on my review I am certain that there are important scientific contributions that warrant a strong acceptance rating for this paper.




Author Feedback

Dear Area Chairs and Reviewers,

We would like to thank you for your time, review, and the opportunity to clarify the points raised regarding our Generative Cellular Automata (GeCA). We appreciate your positive feedback that GeCA is “a very novel methodological extension that incorporates bio-inspired approaches and enriches the current schemes of generative models,” (R3) while clinically tackling “an interesting and important task” (R4) of 11 multi-label conditional OCT generation. Reviewers acknowledged the novelty in merging NCA [18] with diffusion [8], noting that GeCAs “make crucial methodological modifications to the NCA architecture” (R3). Furthermore, our “Gene Heredity Guidance (GHG) technique significantly improves GeCA’s image sampling process” (R4). There is also recognition of the clinical implications of GeCA, with the “Effective Dataset Expansion for Retinal Disease Classification,” (R4) and the “Strong Validation in multiple datasets using a wide array of qualitative and quantitative metrics” (R3). Yet, there are reservations:

i) The AI model creates new images without clinical knowledge (R1): We acknowledge your concern that images generated by AI models may contain unrealistic projections. In response, we highlight that GeCA’s clinical purpose is not to show fake images to clinicians; rather, the aim is to use these images to enhance AI diagnostic ability. Thus we kindly ask you to consider the comparative results of GeCA against baselines for their clinical and technical implications.

a) We demonstrate the significant clinical implications of effectively expanding an 11-multi-label OCT dataset, enhancing AI models significantly (*****). The classification evaluation in Table 2 was conducted exclusively on the real non-fake OCT test-set, underscoring improvements to real-world clinical datasets. The code and the OCT-ML dataset will be released.

b) GeCA demonstrates significant technical contributions over the SOTA baseline, transformers with diffusion, DiT [23], while outperforming it both quantitatively and qualitatively (see Tab. 1 and Fig. 4 for comparison).

ii) The classification and the super-resolution capability of methods (R1): To address any potential misunderstandings, GeCA neither proposes nor employs a super-resolution method, while “our generative model is evaluated on real-world clinical non-fake datasets”.

iii) Comparison with literature & additional priors (R4): Following R4’s advice, we’ll update our manuscript to address why (R4:[1], [2]) don’t compare well with GeCA. While [1,2] “leverage generative models” with fundus segmentation priors, it is promising to replace their GAN with our baseline DiT (ICCV’23) as well as our “novel generative model”, GeCA. Note that [1,2] are inapplicable without vessel priors in our OCT-ML. We hope the clarifications help readjust the rating to accept GeCA.

iv) The motivation behind using ‘pix-cell’ (R4): ‘Pix-cell’ refers to a unique time-state space representation for a ‘pix’el or patch in our ‘cell’ular automata. ‘Pix-cells’ and their interactions capture and propagate fine-grained long-term dependencies, crucial for small structures in medical imaging.

v) CFG & ‘M’ (R3, R4): We observed the same trends in CFG as [9,23] and opted for 1.5, the default in [23]. We set ‘M’ during training and inference to 12, the number of DiT layers. Notably, our ablation studies on ‘M’ in Supp. Fig. 6 reveal that GeCA demonstrates novel zero-shot inference capabilities, while an optimal ‘M’ in inference can exceed the results presented.

vi) Future work (R1, R3, R4): Though we showed our novel GeCA in the application of multi-label conditional generation of OCT, GeCA’s potential in medical imaging is yet to be explored. The scarce AS-OCT data (Fu et al. MedIA 2020, Yang et al. Biomed. Opt. 2023) offers a promising application (R4). Extensions for GeCA are twofold: selective metaheuristic guidance to further optimize the image sampling (R3) and schedulers for denoising strength, M, during inference.
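
The rebuttal states that a classifier-free guidance (CFG) scale of 1.5 was used. As a minimal sketch of what that hyperparameter controls, the snippet below applies the commonly used CFG combination, in which the unconditional noise prediction is pushed toward the conditional one by the guidance scale. Whether GeCA applies exactly this form is an assumption; the function name `cfg_combine` and the toy values are hypothetical.

```python
def cfg_combine(eps_cond, eps_uncond, scale):
    """Common classifier-free guidance rule (assumed form, not from the paper):

        eps = eps_uncond + scale * (eps_cond - eps_uncond)

    scale = 1.0 returns the conditional prediction unchanged; larger values
    extrapolate further away from the unconditional prediction.
    """
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy noise predictions for two pixels (hypothetical values).
eps_cond = [1.0, 2.0]
eps_uncond = [0.0, 0.0]
print(cfg_combine(eps_cond, eps_uncond, scale=1.5))  # [1.5, 3.0]
```

Higher scales typically trade sample diversity for stronger adherence to the conditioning label, which is why a reviewer asked for a discussion of this hyperparameter.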




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Although the technical complexity of integrating Neural Cellular Automata (NCA) with diffusion models might limit accessibility and understanding for a broader audience, potentially hindering widespread adoption, and although the SOTA comparison is lacking, the proposal “An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis” is a strong candidate for acceptance due to its innovative approach, significant performance improvements, and potential impact on the field of medical imaging.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors properly responded to all the previous concerns. While two of the previous reviewers failed to give a final recommendation, I think this work is interesting enough to be presented at MICCAI.



