Abstract

To make medical datasets accessible without sharing sensitive patient information, we introduce a novel end-to-end approach for generative de-identification of dynamic medical imaging data. Until now, generative methods have faced constraints in terms of fidelity, spatio-temporal coherence, and the length of generation, failing to capture the complete details of dataset distributions. We present a model designed to produce high-fidelity, long, and complete data samples with near-real-time efficiency and explore our approach on a challenging task: generating echocardiogram videos. We develop our generation method based on diffusion models and introduce a protocol for medical video dataset anonymization. As an exemplar, we present EchoNet-Synthetic, a fully synthetic, privacy-compliant echocardiogram dataset with paired ejection fraction labels. As part of our de-identification protocol, we evaluate the quality of the generated dataset and propose using clinical downstream tasks as an additional quality measure, on top of widely used but potentially biased image-quality metrics. Experimental outcomes demonstrate that EchoNet-Synthetic achieves comparable dataset fidelity to the actual dataset, effectively supporting the ejection fraction regression task. Code, weights and dataset are available at https://github.com/HReynaud/EchoNet-Synthetic.
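For orientation, here is a minimal structural sketch of the generation protocol the abstract describes: an image diffusion model proposes a conditioning frame, a privacy filter rejects near-duplicates of the training data, and an ejection-fraction-conditioned video diffusion model animates the surviving frames. Every function below is a hypothetical stub for illustration, not the released implementation; see the linked repository for the actual code.

```python
# Structural sketch only: all functions are stubs standing in for the
# paper's LIDM, privacy filter, and LVDM components.
import numpy as np

rng = np.random.default_rng(0)

def sample_image_ldm() -> np.ndarray:
    """Stub for the latent image diffusion model (LIDM)."""
    return rng.uniform(0, 1, size=(112, 112))

def passes_privacy_filter(image: np.ndarray) -> bool:
    """Stub for the re-identification-based rejection step.
    Table 2 of the paper reports roughly 11% rejected samples."""
    return rng.uniform() > 0.1

def sample_video_ldm(image: np.ndarray, ef: float, n_frames: int) -> np.ndarray:
    """Stub for the EF-conditioned latent video diffusion model (LVDM)."""
    return np.repeat(image[None], n_frames, axis=0)

def generate_sample(ef: float, n_frames: int = 64) -> tuple[np.ndarray, float]:
    """Keep sampling anatomies until one passes the privacy filter,
    then animate it at the requested ejection fraction."""
    while True:
        image = sample_image_ldm()
        if passes_privacy_filter(image):
            return sample_video_ldm(image, ef, n_frames), ef

video, label = generate_sample(ef=0.55)
print(video.shape, label)  # (64, 112, 112) 0.55
```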

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0158_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0158_supp.pdf

Link to the Code Repository

https://github.com/HReynaud/EchoNet-Synthetic

Link to the Dataset(s)

https://echonet.github.io/dynamic/
https://echonet.github.io/pediatric/

BibTex

@InProceedings{Rey_EchoNetSynthetic_MICCAI2024,
        author = { Reynaud, Hadrien and Meng, Qingjie and Dombrowski, Mischa and Ghosh, Arijit and Day, Thomas and Gomez, Alberto and Leeson, Paul and Kainz, Bernhard},
        title = { { EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an approach to generate echocardiography videos (2D frames + time), using several diffusion models together with a so-called privacy filter. The synthetic videos are then used in a downstream task to predict ejection fraction values.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this work is the introduction of a pipeline able to generate frame- and time-consistent echocardiography videos, which are actually useful in downstream tasks, such as the ejection fraction prediction. The authors also provide an anonymized link to the privacy-preserved synthetic dataset they generated, sharing it with the research community.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of the paper lies in the way the authors explain the pipeline: although the pipeline itself is strong, the clarity of the text and of the methodology section does not help the readability of the paper. Also, the clinical application of the synthetic dataset is not novel, as the authors use it to predict the ejection fraction (several works have already addressed this task). A final weakness is linked to the complexity of the model, which complicates a detailed analysis of the results, i.e. the authors do not explore all the variables that could actually affect the final quality/utility of the synthetic images.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) My main overall concern is related to the usage of the word “privacy”. The point of the pipeline is to make sure the model doesn’t memorize the images in the training dataset. There’s a solid difference between this and claiming to preserve privacy across the synthetic dataset. The public EchoNet dataset is already anonymized, therefore the privacy of the subjects was already protected (for example, name, gender, age, data origin, …). The authors’ work doesn’t anonymize the synthetic dataset, but rather ensures there are no “repetitions” of the real images among the synthetic ones, i.e. it avoids memorization of specific image pixel values.
    2) The state of the art on using both GANs and DDMs to generate echocardiography images should be explained further, given that this is the aim of the work. For example, read/add these three references: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10049068 https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9893790 https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9324763
    3) Figure 1, where the pipeline is explained, is quite confusing. For example, in chapter 2 the authors start by training the VAE, but in figure 1 this seems to be the last step. Consider numbering the different training stages to facilitate the readability of the diagram, or reordering.
    4) The complete chapter 2 (Methodology) needs to be clarified.
    5) In chapter 2, to keep consistency throughout the text, spell out the VAE acronym (Variational Autoencoder).
    6) In chapter 2, video generation, 2nd paragraph: make clear whether the input image is 2D or 3D, and what the 3rd dimension of the latent tensors is.
    7) In chapter 2, video generation, 2nd paragraph: you write “After the VAE is trained (…)”. Since your VAE also generates a Gaussian latent space, please clarify why you used it instead of the equally Gaussian latent space created by the forward path of the diffusion model.
    8) In chapter 2, final paragraph of the video generation section: you write “(…) We start by sampling Gaussian noise (…)”. Add this step to the diagram in figure 1.
    9) In chapter 2, final paragraph of the video generation section, regarding the LVDM: it should output a video that already looks like the training dataset. This makes me question the utility of the VAE once again.
    10) In chapter 2, video stitching section: it’s not clear how the LIDM 2D image, the Gaussian noise, and the random LVEF value are inputted to the LVDM (yellow block in fig. 1) in the whole video-stitching pipeline, including the overlapping chunks.
    11) This last detail, i.e. the overlapping chunks, is only briefly described in chapter 3; it should be covered in the methodology chapter.
    12) In chapter 2, evaluation of downstream tasks section: you write “To address this (…) de-identification protocol”. You shouldn’t call it a de-identification protocol, as the data in EchoNet is already de-identified. The privacy-filtering operation is clever; however, it is nothing more than making sure that the LIDM is not memorizing images from the training dataset. I strongly suggest rephrasing this step to, for example, “memorization sanity check” (a sketch of such a check is given after this list). Privacy filtering suggests that there are identifiers such as name, gender, DOB, and so on, in the training dataset, and this is not the case.
    13) Across tables 1, 2, 3 and 4: highlight the best results, add PSNR units, and also report the standard deviation where applicable (for example SSIM, PSNR, MSE, …).
    14) In chapter 3, model evaluation, 4th paragraph: did you explore what happens, in terms of the memorization sanity check (what you call your privacy-filtering step), with the videos conditioned on real images? You may have eliminated the memorization effect after the LIDM, but you might bring it back when you use a real image to condition the LVDM. Also, what would the results be if you did this filtering after the LVDM instead of the LIDM?
    15) Regarding the results in figure 2: the speckle pattern in your images is not very realistic. This is a main concern when generating echocardiography images, so dedicate a few sentences to discussing it.
    16) Regarding future work: did you consider training a single LIDM? Explore what would happen if you merged datasets with different views and characteristics (different vendors, for example).
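Items 1), 12) and 14) above frame the privacy filter as, in essence, a memorization check: embed each generated conditioning frame, find its nearest neighbour in the training set, and reject the sample if the similarity exceeds a threshold. Below is a minimal sketch of that thresholding logic; the embedding source, the cosine metric, and the 0.95 threshold are illustrative assumptions, whereas the paper uses a learned re-identification model (Packhäuser et al., 2022).

```python
# Minimal memorization sanity check: reject generated samples whose nearest
# training-set neighbour is too similar in embedding space. Embeddings,
# metric, and threshold are illustrative, not the authors' implementation.
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def filter_memorized(gen_emb: np.ndarray,
                     train_emb: np.ndarray,
                     threshold: float = 0.95) -> np.ndarray:
    """Return indices of generated samples that pass the check, i.e. whose
    best training-set match stays below the similarity threshold."""
    sim = cosine_similarity_matrix(gen_emb, train_emb)
    nearest = sim.max(axis=1)  # best match per generated sample
    return np.flatnonzero(nearest < threshold)

# Toy usage: 128-d embeddings from a (hypothetical) re-identification network.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 128))
gen_emb = rng.normal(size=(200, 128))
gen_emb[:5] = train_emb[:5]  # plant 5 "memorized" samples
kept = filter_memorized(gen_emb, train_emb)
print(f"kept {len(kept)}/200 samples, rejected {200 - len(kept)}")
```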

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The lack of clarity and the usage of the word “privacy” in a context which is not the most appropriate, are my main concerns. There’s a possibility for the paper to be accepted, but the comments need to be taken into account.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposed a pipeline for generating synthetic echocardiogram videos based on latent diffusion models. The proposed method can generate privacy-preserving data that are useful for training models for downstream tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper proposed a novel pipeline for generating synthetic echocardiogram videos based on latent diffusion models. It adds a privacy filter step to prevent leakage of the original data.
    2. The proposed method is computationally efficient compared to the baselines since it mainly operates in the latent space. In addition, the authors proposed a video stitching strategy for generating long videos (see the stitching sketch after this list).
    3. Experimental results show the superiority of the proposed method compared to the baselines. The value of the synthetic data is also validated by the downstream task.
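The video-stitching strategy mentioned in point 2 can be illustrated with a short sketch: consecutive chunks share a fixed number of frames, and the shared region is blended so the long video stays temporally smooth. The chunk length, overlap width, and linear cross-fade used here are assumptions for illustration only; the paper instead conditions each new LVDM chunk on frames of the previous chunk at sampling time.

```python
# Illustrative stitching of overlapping video chunks by linear cross-fade.
# This is a simple stand-in for the paper's conditioning-based stitching.
import numpy as np

def stitch_chunks(chunks: list[np.ndarray], overlap: int) -> np.ndarray:
    """Concatenate chunks of shape (T, C, H, W), cross-fading the
    `overlap` frames shared by each pair of consecutive chunks."""
    video = chunks[0]
    for nxt in chunks[1:]:
        w = np.linspace(0.0, 1.0, overlap)[:, None, None, None]  # fade-in weights
        blended = (1.0 - w) * video[-overlap:] + w * nxt[:overlap]
        video = np.concatenate([video[:-overlap], blended, nxt[overlap:]], axis=0)
    return video

# Toy usage: three 16-frame chunks with a 4-frame overlap -> 40 frames total.
rng = np.random.default_rng(0)
chunks = [rng.normal(size=(16, 4, 14, 14)) for _ in range(3)]
long_video = stitch_chunks(chunks, overlap=4)
print(long_video.shape)  # (40, 4, 14, 14)
```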
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The motivation for de-identifying echocardiogram data is a little unclear to me, since it seems that echocardiogram videos do not contain personally identifiable information. I guess it is useful when sharing the original dataset is not feasible due to privacy concerns/regulations and the synthetic data can be used as a substitute.
    2. The proposed pipeline seems to be a combination of existing techniques such as VAE, LDMs, and re-identification model [21], which limits its technical novelty.
    3. The ejection fraction serves as an input condition for the LVDM. Nonetheless, there is no guarantee that the resulting video output will accurately reflect this ejection fraction, potentially leading to errors for downstream model training.

    Typo: In Table 2 “Rejected Samples” column, 11,25% should be 11.25%?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the weakness section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is an interesting application of latent video diffusion models in the medical imaging domain, and demonstrates promising results. I have some minor concerns listed in the weakness section.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes an end-to-end framework for generating echocardiogram videos with dataset anonymization. The basic framework is built upon the latent video diffusion model, and a privacy filtering module is designed to remove ‘memorized’ samples. This framework is generally applicable for medical video generation with attempts to mitigate privacy concerns.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper addresses an important task, and in my opinion, medical video generation with controlled privacy issues is an under-explored topic.
    • The paper is technically sound and easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The framework can be decomposed into several modules, namely video generation, stitching, and privacy filtering. Among them, video generation and stitching are well explored in the video generation literature, and these modules are nothing new compared to video generation and other sequential-generation works with diffusion models. Moreover, while the paper argues that privacy preservation is important (as shown in the title, abstract, and introduction), the corresponding module (filtering) is quite simple and orthogonal to the video generation framework, meaning that for other types of data we could also naively filter out overly similar samples with distance thresholding. This module therefore has only a weak relation to the video generation framework, so the overall novelty is a concern for this paper.
    • Regarding the evaluation of the method, although metrics before and after filtering and rejection ratios are shown, the evaluation does not adequately demonstrate the effectiveness of the filtering. Readers have no idea which generated samples are ‘memorized’ and should be excluded. Perhaps more qualitative results could be presented to illustrate how the privacy-preserving module works.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In the related works section, it is mentioned that “VideoLDM extends LDMs to high-resolution video generation by turning pre-trained image LDMs into video generators by inserting temporal layers.” It may be better to say “Make-A-Video and VideoLDM extend…”, since Make-A-Video [1] did the same thing and came out much earlier.

    Since, in the medical domain, 3D medical images are similar to videos, it would also be great to include some works [2, 3] on 3D diffusion models for medical image generation in the related works section.

    [1] Singer, Uriel, et al. “Make-A-Video: Text-to-video generation without text-video data.” arXiv preprint arXiv:2209.14792 (2022).
    [2] Zhu, Lingting, et al. “Make-A-Volume: Leveraging latent diffusion models for cross-modality 3D brain MRI synthesis.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023.
    [3] Friedrich, Paul, et al. “WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis.” arXiv preprint arXiv:2402.19043 (2024).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The topic is important and under-explored. The main issues of the papers are the novelty and evaluation. Overall the paper is satisfactory.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all the reviewers for their constructive feedback and their appreciation of our work.

Response to R1’s comments:

  • We appreciate Reviewer 1’s detailed and precise feedback on our methodology section. We will ensure that the camera-ready version of the paper provides a clearer explanation of our methods.
  • Our definition of ‘privacy’ is aligned with recent works such as Packhäuser et al. (2022). We will explicitly mention this and elaborate on our privacy framework in the final manuscript.
  • We agree that additional relevant literature on echocardiography GANs and DDMs should be discussed. The suggested references will be incorporated into the revised paper.
  • To improve readability, we will highlight the best results in all tables in the final version.

Response to R3’s comments:

  • We used the EchoNet datasets, even though they are already anonymized, to ensure that our pipeline is fully reproducible. Exploring this approach on a private dataset would have limited its open-source release.
  • The primary methodological innovation of our work lies in the pipeline itself. We will elaborate on this in the contribution section of the final manuscript. Additional contributions include domain specific open-source models and synthetic data, which we believe will be valuable for future research.
  • Due to space constraints, we did not show the accuracy of the ejection fraction in the echocardiograms generated with the LVDM, although this can be done using a pre-trained regression model, as in Reynaud et al. (2023) (a minimal sketch of such a check follows this list). We believe that, in this work, the downstream task is sufficient proof of the LVDM’s precision.
  • The typo in Table 2 will be corrected.
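The ejection fraction check mentioned in the third bullet could, under stated assumptions, look like the following sketch: regress the ejection fraction from each generated video with a pretrained model (as in Reynaud et al., 2023) and report the error against the conditioning value. `ef_regressor` here is a placeholder stub, not a real network such as the EchoNet-Dynamic regressor it would stand in for.

```python
# Hedged sketch: compare the EF value used to condition the LVDM with the
# EF regressed from the generated video by a (stubbed) pretrained model.
import numpy as np

def ef_regressor(video: np.ndarray) -> float:
    """Placeholder for a pretrained EF regression network."""
    return float(video.mean())  # stub: NOT a real EF estimate

def conditioning_error(videos: list[np.ndarray],
                       target_efs: list[float]) -> float:
    """Mean absolute error between conditioned and regressed EF values."""
    preds = [ef_regressor(v) for v in videos]
    return float(np.mean([abs(p - t) for p, t in zip(preds, target_efs)]))

# Toy usage with random "videos" of shape (frames, H, W).
rng = np.random.default_rng(0)
videos = [rng.uniform(0, 1, size=(64, 112, 112)) for _ in range(8)]
targets = list(rng.uniform(0.2, 0.8, size=8))
print(f"EF conditioning MAE: {conditioning_error(videos, targets):.3f}")
```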

Response to R4’s comments:

  • Our latent video stitching approach is, to the best of our knowledge, novel, and provides excellent long-term temporal consistency due to precise model conditioning at test time. While the other components of our pipeline adapt existing state-of-the-art methods to our specific task, the combination of these methods to produce effective surrogate datasets is itself novel. Additionally, our privacy-filtering technique efficiently extends recent image-based methods, such as Packhäuser et al. (2022), to video data by filtering on conditioning frames rather than full videos.
  • Given the space constraints, we cannot include more qualitative results within the main body of the paper. However, we will add several examples, including frames identified as “too similar to the original data” and those that passed the filters, in the supplementary materials of the camera-ready version.

Reference:

  1. Packhäuser, K., Gündel, S., Münster, N., Syben, C., Christlein, V., Maier, A. (2022). Deep learning-based patient re-identification is able to exploit the biometric nature of medical chest x-ray data. Scientific Reports, 12(1), 14851.
  2. Reynaud, H., Qiao, M., Dombrowski, M., Day, T., Razavi, R., Gomez, A., Leeson, P., Kainz, B.: Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis. In: MICCAI. pp. 142–152 (2023)




Meta-Review

Meta-review not available, early accepted paper.


