Abstract

Medical imaging data contain sensitive patient information requiring strong privacy protection. Many analytical setups require data to be sent to a server for inference purposes. Homomorphic encryption (HE) provides a solution by allowing computations to be performed on encrypted data without revealing the original information. However, HE inference is computationally expensive, particularly for large images (e.g., chest X-rays). In this study, we propose an HE inference framework for medical images that uses VQGAN to compress images into latent representations, thereby significantly reducing the computational burden while preserving image quality. We approximate the activation functions with lower-degree polynomials to balance the accuracy and efficiency in compliance with HE requirements. We observed that a downsampling factor of eight for compression achieved an optimal balance between performance and computational cost. We further adapted the squeeze and excitation module, which is known to improve traditional CNNs, to enhance the HE framework. Our method was tested on two chest X-ray datasets for multi-label classification tasks using vanilla CNN backbones. Although HE inference remains relatively slow and introduces minor performance differences compared with unencrypted inference, our approach shows strong potential for practical use in medical images.
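To make the effect of the downsampling factor concrete, the following sketch counts how many values would have to be encrypted for a 256×256 input at several factors. The 256×256 resolution comes from the reviews on this page; the single input channel and the latent channel count of 4 are assumptions chosen only for illustration and are not stated here.

```python
def latent_shape(height, width, factor, latent_channels):
    """Shape (channels, height, width) after downsampling each side by `factor`."""
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (latent_channels, height // factor, width // factor)


def num_values(shape):
    """Total number of scalar values in a (channels, height, width) tensor."""
    channels, height, width = shape
    return channels * height * width


IMAGE_SHAPE = (1, 256, 256)  # assumed single-channel chest X-ray
LATENT_CHANNELS = 4          # hypothetical value, not taken from the paper

for factor in (4, 8, 16):
    shape = latent_shape(256, 256, factor, LATENT_CHANNELS)
    ratio = num_values(IMAGE_SHAPE) / num_values(shape)
    print(f"f={factor}: latent shape {shape}, "
          f"{num_values(shape)} values, {ratio:.0f}x fewer than the image")
```

Under these assumptions, a factor of eight shrinks each spatial side from 256 to 32, so every convolution evaluated under encryption touches 64 times fewer spatial positions than it would on the raw image.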

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0621_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/jongdory/Latent-HE

Link to the Dataset(s)

CheXpert dataset: https://stanfordmlgroup.github.io/competitions/chexpert/

NIH dataset: https://www.kaggle.com/datasets/nih-chest-xrays/data

BibTex

@InProceedings{KimJon_Privacy_MICCAI2025,
        author = { Kim, Jonghun and Jo, Gyeongdeok and Ra, Sinyoung and Park, Hyunjin},
        title = { { Privacy Preserving Chest X-ray Classification in Latent Space with Homomorphically Encrypted Neural Inference } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15973},
        month = {September},
        pages = {497 -- 507}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a privacy-preserving framework for homomorphic encryption (HE) inference in chest X-ray image classification. The proposed approach employs a VQGAN model to compress input images, thereby reducing the computational overhead. To comply with the limitations of HE, the authors approximate activation functions using low-degree polynomials. Additionally, the squeeze-and-excitation (SE) module is incorporated to enhance classification performance. The framework is evaluated using four CNN architectures on two chest X-ray datasets, and the results are encouraging.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper provides a comprehensive evaluation of the proposed method using two X-ray datasets and four CNN architectures.
    2. The method is applied to high-resolution images (256×256), addressing a key limitation in prior studies, which typically focus on low-resolution inputs (e.g., 32×32).
    3. The manuscript is well written.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The rationale for selecting VQGAN for compression should be better justified, possibly with a comparison with alternative compression methods.
    2. The method is not compared against existing approaches.
    3. The CNN architectures used for evaluation (e.g., LeNet, VGGNet) are relatively dated.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Please include statistical evidence in the Abstract and Conclusion so that readers can more easily assess the research work.
    2. Please include a bulleted list in the Introduction section to clearly outline the primary objectives of the research work. While this may not directly enhance readability, it will aid in comprehending the main aims of the study.
    3. Please revise the title of section 5.
    4. Most of the references are outdated, with only one citation from 2024 and one from 2023. Please incorporate and discuss more recent studies.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental evaluation is limited to relatively outdated architectures. Additionally, the lack of comparison with existing approaches makes it difficult to contextualize the contribution and assess its relative performance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Due to the lack of comparison with existing HE methods, which makes it challenging to contextualize the contribution and assess its relative performance, I maintain my rejection.



Review #2

  • Please describe the contribution of the paper

    The authors introduce two key elements to enable and speed up medical image classification with homomorphic encryption (HE). The first is the combination of squeeze and excitation with polynomial approximations, which allows nonlinear activation functions such as ReLU and Sigmoid to be implemented. The second is to compress the image into a latent representation prior to HE to minimize the computational cost of DL operations in the encrypted domain. Experimental results demonstrate the validity of the approach and the importance of its components.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a valid solution for HE inference that allows privacy-preserving processing of 2D medical images. The paper is well written, and the experiments are informative about the impact of the two key steps on inference time and X-ray classification performance.

    The use of polynomials to allow implementing non-linear activation functions is original.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The application scenario is underdeveloped and insufficiently discussed. In particular, the approach requires a VQGAN on the client side, which significantly increases complexity. It also raises the question of why the classifier should sit on the server side rather than on the client side with, e.g., weights shared via distributed learning. While we can deduce that the approach would allow large-scale training on the server side, this remains implicit, and a description of a clear application scenario would be important to better understand the value of the approach. In addition, a discussion of the acceptability of the inference times reported in Table 3 and Fig. 5A would be important, for example to compute the total training time with 500 images. Discussing potential extensions to 3D imaging would also be important.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper contains some original contributions, is well written and provides adequate experiments. Extended discussions are needed to justify the value of the proposed approach.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors partially clarified my concerns. I still think that the application scenario is unclear. It is crucial that the authors clarify the application scenario and incorporate the other major criticisms addressed in the rebuttal.



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel framework for enabling homomorphic encryption (HE) inference on high-resolution medical images, which is a technically challenging task. The contribution lies in: (1) compressing images with VQGAN to reduce the computational load, and (2) adapting squeeze-and-excitation modules for encrypted CNNs.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel Use of VQGAN for Encrypted Inference. A key strength of the paper is the innovative integration of VQGAN to compress high-resolution medical images (chest X-rays) into latent representations before encryption.

    2. Methodical Adaptation of CNNs for HE Compatibility. Standard CNNs are adapted by approximating nonlinear activation functions (ReLU, Sigmoid) with low-degree polynomials.
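The polynomial substitution mentioned in the second strength can be illustrated with a small sketch. It fits low-degree polynomials to ReLU and Sigmoid by ordinary least squares and evaluates them using only additions and multiplications, which are the operations HE schemes support. The degrees (2 and 3), the interval [-4, 4], and the least-squares fitting itself are assumptions made for illustration; this page does not state which approximation the paper uses.

```python
import math


def fit_polynomial(func, degree, lo, hi, samples=401):
    """Least-squares fit of func on [lo, hi].

    Returns coefficients c with func(x) ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    n = degree + 1
    # Normal equations A c = b, with A[j][k] = sum(x^(j+k)) and b[j] = sum(f(x) x^j).
    A = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(func(x) * x ** j for x in xs) for j in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        acc = b[row] - sum(A[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = acc / A[row][row]
    return coeffs


def evaluate(coeffs, x):
    """Horner evaluation: uses only additions and multiplications."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def max_abs_error(func, coeffs, lo, hi, samples=401):
    """Largest absolute deviation between func and the polynomial on a grid."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(func(x) - evaluate(coeffs, x)) for x in xs)


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


LO, HI = -4.0, 4.0  # assumed input range; inputs outside it are approximated poorly
relu_coeffs = fit_polynomial(relu, 2, LO, HI)
sigmoid_coeffs = fit_polynomial(sigmoid, 3, LO, HI)

print("ReLU    degree 2:", relu_coeffs, "max error", max_abs_error(relu, relu_coeffs, LO, HI))
print("Sigmoid degree 3:", sigmoid_coeffs, "max error", max_abs_error(sigmoid, sigmoid_coeffs, LO, HI))
```

The sketch makes the trade-off visible: a lower degree means fewer multiplications under encryption but a larger approximation error, and the error grows quickly once activations leave the fitted interval.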

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Encrypted Inference Remains Impractically Slow. Despite compression using VQGAN, the inference time remains extremely high (e.g., ~92 seconds/sample for ResNet20 with SE at f=8), which severely limits the method’s feasibility in real-time clinical applications.

    2. The study focuses exclusively on chest X-rays.

    3. No Open Access to Code.

    4. The paper emphasizes performance and compression trade-offs but lacks a quantitative analysis of privacy guarantees or HE parameter settings such as key sizes and noise budgets.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a methodologically novel and practically relevant framework for enabling privacy-preserving inference on high-resolution medical images using homomorphic encryption (HE). It addresses key computational challenges by:

    Introducing VQGAN-based compression to reduce HE computation load.

    Implementing low-degree polynomial approximations for compatibility with HE schemes (CKKS).

    Adapting SE modules to boost encrypted CNN performance.

    Validating across two large and clinically relevant chest X-ray datasets (CheXpert and NIH), achieving results close to unencrypted models.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors’ rebuttal adequately addresses the primary concerns raised during the review process. They provide clear technical clarifications regarding the role of VQGAN, the training/inference split, and the practical reasons for client-side compression. Their justification for model and baseline selection aligns with standards in the homomorphic encryption (HE) literature, and the choice of VQGAN is well-motivated given the need for reconstruction quality in clinical settings.

    While limitations remain, particularly around inference efficiency and generalization to 3D, the paper makes a strong practical contribution by demonstrating a viable HE inference pipeline for medical images.




Author Feedback

We sincerely appreciate your efforts to review our paper. We hope to address your concerns and misunderstandings and further improve our work. We will clarify these issues in the text.

R1. Training and inference: Both VQGAN and the classifier are trained in unencrypted environments. Only inference runs over encrypted data.

Client-side VQGAN: Performing VQGAN image compression on a single image is feasible on a typical CPU and completes quickly (less than 10 seconds). In contrast, if VQGAN were run server-side on encrypted 256×256 images, the computational cost and latency would grow exponentially. Supporting VQGAN operations under homomorphic encryption (HE) would also require building it from simple add/multiply modules, which is impractical. Distributing the pretrained compression model to the client keeps inference times reasonable.

Distributed learning and HE: Distributed learning enables parallelized model training across multiple nodes, while our use case focuses on secure, encrypted inference. These approaches are complementary, and our framework can integrate distributed learning for large-scale training if desired.

Training time and 3D images: The model is trained unencrypted on a GPU. During inference under HE, the decrypted outputs approximate the predictions of the original model. Thus, training time remains the same as for conventional (unencrypted) models. Although the heavy computational cost of HE inference makes 3D extension challenging, ongoing advances may enable its future application to 3D medical imaging.

R2-1. Compression methods. VQGAN was chosen because it provides effective compression and high-quality image reconstruction at high compression rates [1,2]. A high compression rate is relevant in our HE setting due to computation constraints. While pretrained encoders like ResNet can perform image compression, they are essentially designed as feature extractors and are not necessarily optimized for image reconstruction. Our key requirement is to recover the original image well for human inspection. We believe this ability is more important, and thus VQGAN was chosen.

R2-2,3. Limited comparison. In recent work on HE, all CNNs are deliberately simple due to the prohibitive cost of HE inference [3, 4]. Architectural novelty is not the main focus of HE studies. So our baselines, drawn directly from these studies, are both standard and adequate. More importantly, our contribution is to demonstrate a full HE inference pipeline for medical images, which we have clearly shown. Compared to other HE studies [3, 4], we believe our comparisons are sufficient.

R3-1. Slow inference. We acknowledge, even with VQGAN compression, that HE inference remains computationally expensive. Ongoing research into optimized algorithms and hardware support is actively addressing this, and we believe that these advances will make real-time clinical use feasible.

R3-2. While our experiments focus on chest X-rays, our pipeline does not assume chest X-rays. Thus, it can be applied to other organs and 2D modalities.

R3-3. We will release our code upon acceptance.

R3-4. HE parameters must be tailored to the multiplication depth of each network. Since each network compared had different depths, LeNet, VGGNet, HefNet, and ResNet used different key sizes and noise budgets. We selected our settings based on the optimal combinations from previous work [3, 4]. Due to space limitations, we will include these details in our code release.
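The dependence of HE parameters on multiplication depth can be sketched with a simple level count. The accounting below is a simplified assumption, not the authors' procedure: each linear layer (convolution or fully connected, with plaintext weights) is charged one level, and a degree-d polynomial activation is charged the minimum number of sequential multiplications needed to reach the d-th power, which is ceil(log2(d)). Real CKKS implementations may charge extra levels for coefficient multiplications or use bootstrapping, and the two layer lists are hypothetical rather than the architectures compared in the paper.

```python
def activation_depth(degree):
    """Minimum sequential multiplications needed to reach x**degree."""
    if degree < 1:
        raise ValueError("polynomial degree must be at least 1")
    # (degree - 1).bit_length() equals ceil(log2(degree)) for degree >= 1.
    return (degree - 1).bit_length()


def required_levels(layers):
    """Sum the levels consumed by a list of ("linear",) or ("poly", degree) layers."""
    total = 0
    for layer in layers:
        kind = layer[0]
        if kind == "linear":
            total += 1  # assumed: one level per plaintext-weight multiplication
        elif kind == "poly":
            total += activation_depth(layer[1])
        else:
            raise ValueError(f"unknown layer kind: {kind!r}")
    return total


# Hypothetical networks for illustration only.
shallow = [("linear",), ("poly", 2), ("linear",), ("poly", 2), ("linear",)]
deeper = shallow[:-1] * 3 + [("linear",)]

for name, net in (("shallow", shallow), ("deeper", deeper)):
    print(f"{name}: {len(net)} layers, {required_levels(net)} levels required")
```

Because the ciphertext modulus must accommodate every level consumed before decryption, a deeper network generally needs a longer modulus chain and therefore larger keys, which is consistent with the different settings per network described above.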

[1] Esser, Patrick et al. “Taming transformers for high-resolution image synthesis.” CVPR 2021
[2] Rombach, Robin et al. “High-resolution image synthesis with latent diffusion models.” CVPR 2022
[3] Lee, Eunsang et al. “Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed parallel convolutions.” ICML 2022
[4] Ran, Ran et al. “SpENCNN: orchestrating encoding and sparsity for fast homomorphically encrypted neural network inference.” ICML 2023




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The authors should clarify the points raised by the reviewers in their rebuttal.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


