Abstract

The disparity in access to machine learning tools for medical imaging across different regions significantly limits the potential for universal healthcare innovation, particularly in remote areas. Our research addresses this issue by implementing Neural Cellular Automata (NCA) training directly on smartphones for accessible X-ray lung segmentation. We confirm the feasibility of deploying and training these advanced models on five Android devices, improving the accessibility of medical diagnostics and bridging the technology divide to extend the benefits of machine learning in medical imaging to low- and middle-income countries. We further enhance this approach with an unsupervised adaptation method using the novel Variance-Weighted Segmentation Loss (VWSL), which efficiently learns from unlabeled data by minimizing the variance across multiple NCA predictions. This strategy notably improves model adaptability and performance across diverse medical imaging contexts without the need for extensive computational resources or labeled datasets, effectively lowering the participation threshold. Our methodology, tested on three multisite X-ray datasets (PadChest, ChestX-ray8, and MIMIC-III), demonstrates improvements in segmentation Dice accuracy of 0.7 to 2.8% compared to the classic Med-NCA. Additionally, in extreme cases where no digital copy is available and images must be captured by a phone from an X-ray lightbox or monitor, VWSL enhances Dice accuracy by 5-20%, demonstrating the method’s robustness even with suboptimal image sources.
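The core unsupervised signal behind VWSL is the disagreement between repeated stochastic forward passes of the NCA on the same unlabeled image. As a minimal illustrative sketch only (not the authors’ implementation; the names `nca` and `num_passes` are hypothetical), the mean prediction and pixel-wise variance map could be obtained like this in PyTorch:

    import torch

    def prediction_variance(nca, image, num_passes=8):
        """Mean prediction and pixel-wise variance over repeated NCA passes.
        The NCA's random cell-update mask makes each pass differ, so the
        variance highlights pixels where the model is uncertain."""
        preds = torch.stack([torch.sigmoid(nca(image)) for _ in range(num_passes)])
        return preds.mean(dim=0), preds.var(dim=0)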

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1060_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1060_supp.pdf

Link to the Code Repository

https://github.com/MECLabTUDA/M3D-NCA

Link to the Dataset(s)

http://bimcv.cipf.es/bimcv-projects/padchest/

physionet.org/content/mimiciii/1.4/

https://physionet.org/content/chexmask-cxr-segmentation-data/0.4/

https://nihcc.app.box.com/v/ChestXray-NIHCC

BibTex

@InProceedings{Kal_Unsupervised_MICCAI2024,
        author = { Kalkhof, John and Ranem, Amin and Mukhopadhyay, Anirban},
        title = { { Unsupervised Training of Neural Cellular Automata on Edge Devices } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors deploy neural cellular automata on smartphones, enabling on-device training for semantic segmentation using a novel variance-weighted segmentation loss. They evaluate this on various X-ray datasets, as well as on suboptimal image sources such as photographs of viewing boxes or monitors. They show that on-device fine-tuning improves performance by up to 20% in measured DSC.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors address the issue that many practitioners lack access to powerful computing environments for medical imaging but do have access to smartphones. They test how well smartphones can be used, under non-ideal conditions and with suboptimal image sources, to train and test NCA models.

    I also appreciate that the authors tested their model on various image datasets, namely PadChest, ChestX-ray8, and MIMIC-III/ChexMask. Their baselines include U-Net, TransUNet, and nnU-Net. Their evaluation covers five Android devices from different generations and brands.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I think the main weakness is comparing apples with oranges. The authors compare against Med-NCA, an improved version of NCA described earlier in reference [11]. The original NCA needs a significant amount of VRAM, whereas Med-NCA only uses 1/16th of that, around 2.x GB of VRAM for training. The U-Net that the authors describe here has around 38M parameters. Multiple studies have shown that for specific segmentation tasks the parameter count can be reduced drastically while maintaining a high percentage of the performance (e.g., Kist and Döllinger, IEEE Access 2020; Fu et al., IJCARS 2021). Some statements cited with [11] cannot be found in that reference; in it I see that U-Net performs well and that nnU-Net with fewer parameters (low 1-5M range) performs even better. I believe that for a task like segmenting the left and right lung, a low-parameter U-Net could do the job. In addition, such a U-Net should be portable even to an Arduino device using the authors’ TFLite approach and trainable on device. I think this approach and experiment are crucial for making a fair comparison.

    In terms of on-device training and performance, I think the fine-tuning task and the validation on five images are nice, but not sufficient for hard statistics. I would also like to see the offline performance gain and how it compares to the online gain across the different baselines provided.

    In general, I am not fully convinced by the fine-tuning part using the VWSL loss. I would first like to see the rationale for incorporating Dice and Focal loss supported by an ablation study. Moreover, the standard deviations are so large (around 0.110) that the small change of 0.015 is barely statistically significant; I would not call that a major benefit, although the results are at least consistently better. The question is: if I fine-tuned further with the “old”/initial loss, would I get the same or similar results simply because I trained longer?

    Overall, I think the new insights are incremental to [11].

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors claim: “Upon acceptance, we will make our whole framework available.”

    The data stated in 3.3 is not available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In the rebuttal, I would like to see responses to the following points, essentially following the weakness statements:

    A) Please state the advances in this paper compared to [11].

    B) Please provide a fair comparison with small U-Nets deployable to your Android phones, to see whether on-device training of the U-Nets is comparable and/or competitive, and whether your solution outperforms the U-Net one (train, validation, test).

    C) Please perform the on-device fine-tuning and validation experiment on more than five images; around 50 would be a reliable measure.

    D) The VWSL loss needs to be better explained and validated. Please test the core features independently and offline to see their effect. Compare this to fine-tuning without the VWSL loss (is longer training doing the trick?); maybe just reducing the learning rate, or the regularization itself, is the key component for the “better” results.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has some nice ideas, but many of the key facts were known before (ref. [11]) or key claims were not addressed properly (an apples-vs-oranges comparison against non-deployable U-Net variants). I currently tend toward rejecting the paper because major new experiments still need to be done, but I am eager to be convinced by the authors during the rebuttal phase.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces an innovative method that trains Neural Cellular Automata (NCA) models directly on smartphones for X-ray lung segmentation. This approach aims to improve accessibility to advanced diagnostic tools in regions with limited resources. By utilizing the Variance-Weighted Segmentation Loss (VWSL) for unsupervised adaptation, the method enables efficient fine-tuning on unlabeled data. This process enhances model performance across various medical imaging scenarios without requiring extensive computational resources or labeled datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel fine-tuning technique: The incorporation of the Variance-Weighted Segmentation Loss (VWSL) for unsupervised adaptation in Neural Cellular Automata (NCA) models stands out as a notable strength. This approach capitalizes on the inherent adaptability of NCAs, optimizing performance on unlabeled data and facilitating effective model training and adaptation to new domains.

    2. Adaptivity on smartphones: The illustration of training and fine-tuning NCA models directly on smartphones underscores the paper’s strong clinical feasibility aspect. This approach greatly improves accessibility to advanced diagnostic tools in resource-limited settings, demonstrating the practicality and potential impact of decentralized diagnostic solutions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Effectiveness of fine-tuning: The main claim of the paper is that VWSL fine-tuning helps Med-NCA models trained on dataset A generalize to dataset B, as demonstrated in each row of Table 1. However, the performance of the fine-tuned model on dataset B is notably worse than that of Med-NCA models trained directly on B. For example, the first row indicates that the Med-NCA model trained on MIMIC (dataset A) achieves a performance of 0.867 when fine-tuned on PadChest (dataset B), whereas the Med-NCA model trained directly on PadChest attains 0.954, as shown in the last row. Although the fine-tuned model exhibits slightly better performance for the case A = ChestX8 (0.955 vs. 0.954), this seems like an a posteriori conclusion. There is no clear indication of which dataset one should train the initial model on before fine-tuning, which raises questions about the optimal training strategy.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I would recommend that the authors address the weaknesses mentioned above in their rebuttal. It would be valuable to derive a smart strategy on how to choose a dataset for initial model training and subsequent fine-tuning to achieve the best cross-dataset generalization performance. This discussion could provide insights into optimizing the training process for Med-NCA models, ensuring robust performance across diverse datasets while maintaining generalization to unseen data.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty and practicality of the paper are commendable. However, further verification is needed to assess the effectiveness of the fine-tuning process.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors’ feedback has effectively addressed my concerns, so I would like to raise my score to 5 and recommend acceptance.



Review #3

  • Please describe the contribution of the paper

    The paper presents a framework for Neural Cellular Automata designed for training and inference on low-resource devices for X-ray lung segmentation. Additionally, it introduces VWSL (Variance-Weighted Segmentation Loss), a novel loss utilized for model adaptation on unlabeled data, thereby making the model robust to distribution shifts in the target domain. The results demonstrate that the performance drop of the lightweight baseline relative to U-Net-like architectures can be reduced.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper introduces a novel approach for fine-tuning a lung segmentation model on edge devices.
    • The adaptability of the models is crucial, and the novel loss function intriguingly exploits the intrinsic characteristics of NCA.
    • The baseline shows a significant performance drop compared to U-Net architectures; this approach reduces that drop.
    • The evaluation is comprehensive.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper lacks comparison with other domain adaptation techniques.
    • The reliability of the test set is questionable (see details).
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Authors state to release code upon acceptance, all the evaluation datasets are publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    While it is interesting to use only 50 samples for training and adaptation, we feel that using only 50 samples for evaluation might not fully assess the model’s general ability to segment the images. Additionally, it is unclear how smartphone images affect the contribution in a real-world scenario. What evaluations can be performed from such segmentations? Minor: Table 1 is not totally clear; why are ‘-’ results not reported?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper innovatively applies neural cellular automata (NCA) for X-ray lung segmentation on mobile devices, introducing model adaptation within the architecture.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thank you for taking the time to read the review and provide your responses. While the contribution of the paper is somewhat limited, I find the novelty of the application field interesting and potentially valuable to the community. However, the main limitation remains the comparison with the state of the art. A more thorough comparison would significantly strengthen the paper.




Author Feedback

We thank the reviewers for their feedback and appreciate their recognition of our novel fine-tuning technique, the training on smartphones for low-resource environments, and our comprehensive evaluation of various datasets, which together highlight the practical significance of our research.

Our work, submitted to the health equity track, aims to democratize medical imaging by training Neural Cellular Automata (NCA) across multiple mobile devices. This approach not only makes medical AI technologies accessible but also evaluates their performance and adaptability in varied real-world settings.

(R3) Advances in comparison to Med-NCA [11]: This work’s major technical contribution is the introduction of the Variance-Weighted Segmentation Loss (VWSL). This novel unsupervised training scheme leverages the random activations within NCAs to reduce the influence of domain shifts on prediction quality. Additionally, our experiments confirm the feasibility of NCA training on various Android devices, underscoring its potential as a means of democratizing medical AI technologies.
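For context, the “random activations” mentioned here are the stochastic cell-update mask applied at every NCA step. A minimal sketch of one such step in PyTorch (illustrative only; the channel counts and fire rate are assumptions, not the paper’s configuration):

    import torch
    import torch.nn as nn

    class NCAStep(nn.Module):
        """One NCA update: perceive the 3x3 neighborhood, compute a state
        update, and apply it only to a random subset of cells."""
        def __init__(self, channels=16, hidden=64, fire_rate=0.5):
            super().__init__()
            self.perceive = nn.Conv2d(channels, hidden, kernel_size=3, padding=1)
            self.update = nn.Conv2d(hidden, channels, kernel_size=1)
            self.fire_rate = fire_rate

        def forward(self, state):
            delta = self.update(torch.relu(self.perceive(state)))
            # Stochastic 'fire' mask: each cell updates with probability
            # fire_rate, which is what makes repeated forward passes differ.
            mask = (torch.rand_like(state[:, :1]) < self.fire_rate).float()
            return state + delta * mask

Because the same trained model yields slightly different segmentations on each pass, the spread of those predictions can serve as an uncertainty signal without any labels.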

(R3) Additional minimal U-Net experiments: While comparing with minimal U-Nets could offer further validation, Med-NCA’s performance (26k parameters), which matches a 37M-parameter U-Net, already demonstrates its effectiveness. The literature, including [11], shows that reducing parameters results in performance drops of varying degrees (e.g., 10% worse Dice on the prostate segmentation task with a 6.3M-parameter U-Net). This gap can be narrowed with AutoML strategies, as the 1.9M-parameter nnU-Net [11] on the small hippocampus dataset (64x64 images) shows. However, this is not directly comparable to our context of 256x256 X-ray images, where nnU-Net autoconfigures to 30M parameters.

(R3) Fine-tuning and validation should use more than 5 images: The results presented in Table 1 and Figure 6 are indeed based on models fine-tuned and tested on 50 images each, ensuring robust measurements. Figure 5 shows training on 50 images and fine-tuning on 5 to demonstrate VWSL’s effectiveness with minimal data, a critical aspect of real-world application.

(R3) Clarification of the VWSL loss (added to Section 2.2): VWSL adapts Med-NCA to new domains without supervision by generating mean predictions and variance maps from multiple forward passes. It combines the Dice Similarity Coefficient and Focal Loss, weighted by the pixel-wise variance, which acts as a surrogate loss and prevents the trivial solution of setting all outputs to zero. Fine-tuning involves 100 additional epochs on top of the initial 1500, with the impact of different variance-minimization weightings evaluated in the ablation study in Table 2.
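One possible reading of this description, written as a hedged sketch rather than the authors’ exact formulation (the pseudo-target construction, the weighting scheme, and gamma are assumptions; the paper credits the variance weighting with preventing the trivial all-zero solution, and its exact mechanism may differ from this sketch):

    import torch

    def vwsl(preds, eps=1e-6, gamma=2.0):
        """Sketch of a variance-weighted Dice + Focal loss.
        preds: (num_passes, B, 1, H, W) sigmoid outputs from repeated
        stochastic NCA passes on one unlabeled batch."""
        mean_pred = preds.mean(dim=0)
        var = preds.var(dim=0)
        # Trust low-variance (consensus) pixels more.
        weight = 1.0 - var / (var.max() + eps)
        # Pseudo-target derived from the mean prediction.
        pseudo = (mean_pred > 0.5).float()

        # Variance-weighted soft Dice between mean prediction and pseudo-target.
        inter = (weight * mean_pred * pseudo).sum()
        dice = 1.0 - (2.0 * inter + eps) / ((weight * (mean_pred + pseudo)).sum() + eps)

        # Variance-weighted binary focal term with focusing parameter gamma.
        p_t = pseudo * mean_pred + (1.0 - pseudo) * (1.0 - mean_pred)
        focal = (weight * (1.0 - p_t) ** gamma * -torch.log(p_t + eps)).mean()
        return dice + focal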

(R1) Unsupervised fine-tuning performs worse than supervised training on the dataset: Our work demonstrates that VWSL fine-tuning enables Med-NCA models to generalize better across datasets. While there is a gap compared to models trained directly on the target domain, the average improvements of 0.7-2.8% Dice across the experiments in Table 1 and individual gains of up to 24% Dice in Figure 4 validate the potential of fine-tuning, which is crucial when ground-truth data is unavailable.

(R1) Optimal training strategy: In real-world scenarios, the choice of the initial training dataset is predetermined by the available data. Our study shows that regardless of the initial dataset, VWSL fine-tuning consistently enhances model performance across all tested configurations, demonstrating its robustness and adaptability.

(R5) Clarification of Table 1: The ‘-’ entries in Table 1 indicate in-distribution scenarios where training and test data originate from the same source. Since no domain adaptation is needed there, these cases do not present the challenges our work addresses.

(R5) Impact of smartphone images: In environments without digital X-rays, such as rural areas in LMICs, smartphone photographs make it possible to digitize X-rays for remote diagnosis and disease monitoring.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper proposes a novel and interesting application and is a suitable topic for the health equity track. The rebuttal sufficiently responded to the reviewers’ concerns (especially R3). I would recommend minorly updating the camera-ready version with the clarifications provided in the rebuttal, specifically about the advancements over MedNCA and VWSL Loss.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


