

Abstract

The growing volume of high-resolution Whole Slide Images in digital histopathology poses significant storage, transmission, and computational efficiency challenges. Standard compression methods, such as JPEG, reduce file sizes but often fail to preserve fine-grained phenotypic details critical for downstream tasks. In this work, we repurpose autoencoders (AEs) designed for Latent Diffusion Models as an efficient learned compression framework for pathology images. We systematically benchmark three AE models with varying compression levels and evaluate their reconstruction ability using pathology foundation models. To further enhance reconstruction fidelity, we introduce a fine-tuning strategy that optimizes a pathology-specific learned perceptual metric. We validate our approach on downstream tasks, including segmentation, patch classification, and multiple instance learning, showing that replacing images with AE-compressed reconstructions leads to minimal performance degradation. Additionally, we propose a K-means clustering-based quantization method for AE latents, improving storage efficiency while maintaining reconstruction quality.
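The abstract evaluates reconstructions with pathology foundation models and fine-tunes against a learned perceptual metric; the reviews below refer to this as UNI embedding similarity. A minimal sketch of such a metric follows, assuming it is the mean cosine similarity between foundation-model embeddings of each original patch and its reconstruction. The page does not give the exact definition or the full training loss: `perceptual_loss` (1 minus the similarity) is a hypothetical objective, not the authors' code, and plain vectors stand in for the outputs of the embedding model, which is not reproduced here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embedding_similarity(orig_embs: Sequence[Sequence[float]],
                         recon_embs: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity over (original, reconstruction) embedding pairs.

    In the paper's setting the vectors would come from a pathology foundation
    model such as UNI; here they are plain lists (assumption).
    """
    if len(orig_embs) != len(recon_embs) or not orig_embs:
        raise ValueError("need a non-empty batch of matching pairs")
    sims = [cosine_similarity(o, r) for o, r in zip(orig_embs, recon_embs)]
    return sum(sims) / len(sims)


def perceptual_loss(orig_embs, recon_embs) -> float:
    """Hypothetical fine-tuning objective: 0 when embeddings align perfectly."""
    return 1.0 - embedding_similarity(orig_embs, recon_embs)
```

Cosine similarity ignores embedding magnitude, so a reconstruction scores 1.0 whenever its embedding points in the same direction as the original's. This is one plausible reason to prefer it as a perceptual score over a raw distance.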

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1570_paper.pdf

SharedIt Link: https://rdcu.be/eHwOg

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04937-7_42

Supplementary Material: Not Submitted

Link to the Code Repository

https://huggingface.co/collections/StonyBrook-CVLab/pathology-fine-tuned-aes-67d45f223a659ff2e3402dd0

Link to the Dataset(s)

TCGA: https://portal.gdc.cancer.gov/
NCT-CRC: https://zenodo.org/records/1214456
BCSS: https://bcsegmentation.grand-challenge.org/
CRAG: https://github.com/XiaoyuZHK/CRAG-Dataset_Aug_ToCOCO

BibTex

@InProceedings{YelSri_Pathology_MICCAI2025,
        author = { Yellapragada, Srikar AND Graikos, Alexandros AND Triaridis, Kostas AND Li, Zilinghan AND Nandi, Tarak Nath AND Madduri, Ravi K. AND Prasanna, Prateek AND Saltz, Joel AND Samaras, Dimitris},
        title = { { Pathology Image Compression with Pre-trained Autoencoders } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {442 -- 452}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper addresses the significant storage and transmission demands of digital pathology by repurposing pre-trained latent diffusion model (LDM) autoencoders for pathology image compression. The authors benchmark three such autoencoders and demonstrate their superior fidelity over traditional JPEG compression. They propose a fine-tuning strategy that aligns reconstructions with pathology-specific perceptual features using foundation models, and introduce a k-means-based latent quantization method to further reduce file size without compromising diagnostic utility.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The use of LDM autoencoders for high-fidelity compression of pathology images is well motivated and clearly articulated, leveraging pretrained models for effective repurposing.
- The proposed fine-tuning strategy using a pathology-specific perceptual loss (UNI encoder similarity) is simple yet effective, leading to meaningful boosts in reconstruction quality and downstream task performance.
- The authors demonstrate thorough quantitative evaluation across multiple downstream tasks with realistic patch-based pipelines.
- k-means-based latent quantization is a practical and technically sound addition that improves storage efficiency while outperforming naive int8 quantization.
- The results show that AE-compressed images can reach performance parity with raw inputs in classification and segmentation, highlighting the practical viability of learned compression.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The novelty lies primarily in applying existing components (LDM AEs, perceptual losses, k-means quantization) to a new use case; while practical and impactful, the conceptual innovation is limited.
- The impact of AE-based compression on real-time workflows is briefly mentioned but not quantitatively analyzed. Would AE decoding latency limit clinical adoption?
- Although the paper compares to JPEG, other domain-specific learned compression methods in digital pathology (e.g. low-rank decomposition) are not included in the comparison at all.
- The method relies on pre-existing large-scale training (e.g., LDMs), which may not be accessible for domain-specific retraining or adaptation in smaller institutions.
- No qualitative analysis or failure cases are shown for edge cases (e.g., rare subtypes or heavy staining artifacts), which could impact generalizability.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Include discussion of related pathology-specific compression approaches (such as [10], [11]) in the comparative analysis.
- Definitely consider adding timing benchmarks or inference cost of decoding AEs (in ms) versus JPEG to quantify the mentioned limitation.
- The paper could be further strengthened by providing a short qualitative analysis on failure cases, or discussing whether certain tissue types are harder to reconstruct faithfully.
- In Table 1, it might be helpful to show a direct comparison row of JPEG vs. each AE at equivalent file size to make practical differences clearer.
- While embedding similarity correlates with performance, it would be helpful to include scatterplots or correlation coefficients (between similarity and task accuracy) to further support the claim.
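The last suggestion can be made concrete by correlating the embedding similarity of each compression setting with its downstream metric across settings. A minimal pure-Python sketch of Pearson's r and Spearman's rank correlation follows. It assumes one (similarity, task metric) pair per AE or JPEG setting; it is not taken from the paper, and any inputs used with it are placeholders, not reported results.

```python
import math
from typing import List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))
```

With only a handful of settings (three AEs plus a few JPEG quality levels), Spearman's rho is the more conservative choice because it only assumes a monotonic relationship, not a linear one.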
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well executed, technically sound, and makes a contribution by adapting foundation models as autoencoders for digital pathology compression. The integration of perceptual fine-tuning and k-means quantization is both practically useful and modestly novel. The scope and depth of empirical validation justify an accept recommendation but I would like to see the mentioned improvements.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have answered my questions. The paper is a modestly valuable contribution in my opinion. Not outstanding but solid. Thus I recommend acceptance.

Review #2

Please describe the contribution of the paper

Against the background of the high storage, transmission, and computational costs of histopathology WSIs, this paper repurposes the autoencoders (AEs) of Latent Diffusion Models as compressors. To evaluate performance, the paper reports the compression and reconstruction abilities of the method. In addition, it introduces a fine-tuning strategy to enhance reconstruction fidelity and a K-means clustering-based quantization method to improve storage efficiency. It evaluates three models (Stable Diffusion 1.5, Stable Diffusion 3, and DC-AE) and demonstrates minimal performance degradation on various downstream tasks, showing that this approach outperforms JPEG in both perceptual quality and downstream performance.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This work explores how well pretrained LDM autoencoders work as compressors for histopathology images. Although the pretraining distribution does not contain histopathology data, the AEs achieve high compression rates while preserving essential phenotypic details.

This work evaluates the performance of three AEs in terms of multiple compression rates, reconstruction fidelity, and downstream tasks, showing that the pretrained LDM autoencoders perform comparably to JPEG.

This work shows how fine-tuning can further improve reconstruction fidelity and how k-means-based quantization of the AE latents can further improve storage efficiency.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

As the paper mentions, AE decompression is slower than JPEG; the paper would benefit from including time/memory benchmarks in the experiments. Also, the fine-tuning cost is heavy (120,000 iterations on 8 GPUs).
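Both R1 and R2 ask for latency figures, and R2 also asks for memory figures. A hedged sketch of a CPU timing and memory harness using only the standard library follows. `time_ms` reports the median of repeated `time.perf_counter` measurements after a warm-up, and `peak_kib` uses `tracemalloc`, which only sees Python-heap allocations. zlib is a stand-in codec here, not the paper's AE or JPEG pipeline. To time a GPU decoder, the timed function would need to call `torch.cuda.synchronize()` before returning, because CUDA kernels run asynchronously. GPU memory would be read with `torch.cuda.max_memory_allocated()` rather than `tracemalloc`.

```python
import statistics
import time
import tracemalloc
import zlib
from typing import Callable


def time_ms(fn: Callable[[], object], repeats: int = 50, warmup: int = 5) -> float:
    """Median wall-clock latency of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def peak_kib(fn: Callable[[], object]) -> float:
    """Peak Python-heap allocation (KiB) during fn(); does not see GPU memory."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 1024


if __name__ == "__main__":
    # Stand-in payload the size of an uncompressed 256x256 RGB patch (196,608 bytes).
    raw = bytes(range(256)) * (256 * 3)
    blob = zlib.compress(raw)
    print(f"encode: {time_ms(lambda: zlib.compress(raw)):.3f} ms")
    print(f"decode: {time_ms(lambda: zlib.decompress(blob)):.3f} ms")
    print(f"decode peak memory: {peak_kib(lambda: zlib.decompress(blob)):.1f} KiB")
```

Reporting the median rather than the mean keeps a few slow outliers (for example, first-call allocation or OS scheduling noise) from dominating a per-patch latency figure.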

The JPEG comparison is missing from Table 2; without it, Table 2 is not symmetric with Table 3, which shows the comparison with the traditional JPEG method.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

This paper quantitatively analyzes how well pretrained LDM AEs can serve as compressors for histopathology images. Although the components are adapted rather than invented, the topic is valuable: when Stable Diffusion is fine-tuned on histopathology images, the encoder-decoder (AE) is commonly kept frozen, and this paper shows that the degradation from not training the AE is negligible.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Efficient storage and transmission of pathology images is a growing need in digital pathology. This method addresses this real-world bottleneck and demonstrates its performance from multiple angles, showing its potential for the pathology community.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors’ rebuttal reasonably answered my main question.

Review #3

Please describe the contribution of the paper

This paper proposes a compression framework for pathology images using fine-tuned autoencoders, with an emphasis on preserving clinically relevant features. Specifically, the authors benchmark multiple AEs (SD-1.5, SD-3, DC-AE), propose a decoder-only fine-tuning strategy using a pathology-specific perceptual loss based on the UNI foundation model, and introduce a K-means quantization method to further reduce storage overhead. They demonstrate that the method maintains performance on downstream tasks such as segmentation and classification with minimal degradation, outperforming standard JPEG compression at comparable file sizes.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) Storage-efficient quantization strategy: K-means clustering applied to AE latents is a simple yet effective alternative to static int8 quantization. Mapping to an 8-bit space while maintaining downstream performance is a noteworthy contribution and could be adopted by other methods that deal with learned latents. 2) Pathology-specific adaptation: Fine-tuning with UNI embedding similarity as a perceptual loss is an interesting way to align decoded reconstructions with pathology-relevant features while improving reconstruction quality. 3) Strong evaluation pipeline with multiple tasks, including classification and segmentation, across different datasets. 4) Since JPEG remains the standard for pathology image compression, this work offers a timely and compelling case for adopting learned alternatives.
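The sketch below illustrates the contrast R3 draws between static int8 quantization and k-means on AE latents. It compares a naive min-max int8 mapping with a 1-D k-means codebook of 256 centroids, so both store one byte per latent value. This is an illustration built on assumptions, not the paper's procedure: this page does not state whether the authors cluster scalar values or vectors, use a per-image or shared codebook, or initialize differently. Quantile initialization and Lloyd iterations are standard choices. Because Lloyd updates never increase the squared error, the codebook can only improve on its initialization. On skewed or heavy-tailed values, evenly spaced int8 levels spend most of their range on a few outliers, so the codebook typically yields much lower error.

```python
import bisect
import random
from typing import List, Sequence, Tuple


def uniform_int8(values: Sequence[float]) -> Tuple[List[int], float, float]:
    """Naive baseline: affine min-max mapping onto 256 evenly spaced levels."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # constant input: any scale works
    codes = [min(255, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale


def uniform_decode(codes: Sequence[int], lo: float, scale: float) -> List[float]:
    return [lo + c * scale for c in codes]


def _nearest(centroids: List[float], v: float) -> int:
    """Index of the centroid closest to v; centroids must be sorted."""
    i = bisect.bisect_left(centroids, v)
    if i == 0:
        return 0
    if i == len(centroids):
        return i - 1
    return i if centroids[i] - v < v - centroids[i - 1] else i - 1


def kmeans_codebook(values: Sequence[float], k: int = 256, iters: int = 20) -> List[float]:
    """1-D Lloyd's k-means with centroids initialized at evenly spaced quantiles."""
    if k < 2 or not values:
        raise ValueError("need k >= 2 and at least one value")
    data = sorted(values)
    n = len(data)
    centroids = [data[round(i * (n - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        centroids.sort()
        sums, counts = [0.0] * k, [0] * k
        for v in data:  # assignment step
            j = _nearest(centroids, v)
            sums[j] += v
            counts[j] += 1
        # update step; empty clusters keep their previous centroid
        updated = [sums[j] / counts[j] if counts[j] else centroids[j] for j in range(k)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def kmeans_encode(values: Sequence[float], codebook: List[float]) -> List[int]:
    return [_nearest(codebook, v) for v in values]  # indices 0..k-1, one byte for k=256


def kmeans_decode(codes: Sequence[int], codebook: List[float]) -> List[float]:
    return [codebook[c] for c in codes]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy heavy-tailed "latents": mostly small values plus a few large outliers.
    latents = [rng.gauss(0.0, 1.0) for _ in range(5000)] + [40.0, -35.0, 60.0]
    codes, lo, scale = uniform_int8(latents)
    print("uniform int8 MSE:", mse(latents, uniform_decode(codes, lo, scale)))
    codebook = kmeans_codebook(latents)
    print("k-means MSE:", mse(latents, kmeans_decode(kmeans_encode(latents, codebook), codebook)))
```

A codebook of 256 float32 centroids adds 1 KiB on top of one byte per latent value. Whether that overhead is paid per image or amortized over a dataset depends on the design choice noted above.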
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Not in any particular order: 1) While results with JPEG-10/50 are included, it is unclear which specific pathological structures are preserved or enhanced by the learned representation, especially in clinical pathology contexts. For clinical translation of this method, the paper would benefit from a deeper analysis of what is gained over JPEG, particularly in terms of diagnostic features. 2) The fine-tuning gains are compelling, especially on the embedding similarity metrics, but it is worth asking whether quantization in image space using non-domain-specific AEs might achieve similar results. Additionally, the authors might consider whether training or adapting the encoder, perhaps with lightweight approaches such as 2D LoRA, could yield greater pathology-specific gains without incurring high computational costs.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a practical and technically solid framework for pathology image compression using pre-trained autoencoders, with well-motivated modifications including decoder-only fine-tuning and latent-space quantization. The results are promising and clinically relevant. However, the precise clinical or semantic benefits of the proposed compression are not clearly delineated beyond embedding similarity, task accuracy, and segmentation performance. A more detailed analysis of the exact pathological features preserved through the perceptual loss, and a clearer comparison of what is lost or retained with JPEG, would strengthen the paper’s overall impact and support its case for clinical adoption.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I think this study presents a methodological advancement on a relevant topic worthy of presentation at MICCAI. Although the authors state that evaluating their models on real-world clinical tasks is out of scope for the current version, their experiments show that they obtain pathology-specific gains.

Author Feedback

We thank the reviewers for recognizing our work as “well motivated and technically solid” (R1, R3), addressing a “real-world bottleneck” (R2), and describing our k-means quantization as “practical and technically sound” (R1, R3) and a “noteworthy contribution” (R3). Our primary novelty is the effective repurposing of autoencoders for pathology image compression, which we achieve through two key methodological contributions:
- Pathology-specific fine-tuning for reconstruction fidelity by aligning with perceptual features from pathology foundation models.
- K-means clustering-based quantization tailored for AE latents, enhancing storage efficiency while maintaining performance.

Compression time vs JPEG (MR, R1, R2): While our AE approach is slower than JPEG (5 ms on GPU vs 0.8 ms on CPU to encode/decode a 256x256 image), this is justified by its superior reconstruction quality. Table 1 shows that our fine-tuned DC-AE (2 KB) achieves UNI similarity comparable to JPEG-50 (15 KB) while offering ~7.5x storage savings. Further optimization of the AE implementations could also improve encoding/decoding speed.

AE-based compression and real-time workflows (R1): Storing slides requires provisioning several PBs of storage annually, typically on cold (glacier-class) or tertiary storage. For such high-latency storage, retrieval time far exceeds decoding time. Drastically reducing storage requirements can make lower-latency storage financially practical. Finally, a primary use of archived slides is research and model training, where encoding/decoding latency is not a major issue.

Test set/generalization (R1, MR): Table 1 uses images from a held-out test set. For Table 3, BCSS uses labeled images from TCGA-BRCA. To fine-tune the AEs, we randomly sampled images from TCGA-BRCA/CRC/PRAD. While we did not explicitly ensure that the training patches do not overlap with the BCSS WSIs, the potential for overlap is minimal. In addition, NCT-CRC (Table 2) and CRAG (Table 3) are out-of-distribution for the fine-tuned AEs, demonstrating robust generalization.
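The meta-review's memorization concern, together with the acknowledged possibility of overlap, could be settled by excluding from the fine-tuning pool every TCGA patient who also appears in BCSS. A minimal sketch follows. It assumes that each training patch and each BCSS image can be traced back to a TCGA barcode, and that the participant is identified by the barcode's first three dash-separated fields (e.g. `TCGA-XX-XXXX`). The barcodes in the example are made up, and `drop_overlapping` is a hypothetical helper.

```python
from typing import Iterable, List, Set


def patient_id(tcga_barcode: str) -> str:
    """Participant part of a TCGA barcode: the first three dash-separated fields."""
    parts = tcga_barcode.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {tcga_barcode!r}")
    return "-".join(parts[:3])


def overlapping_patients(train_ids: Iterable[str], test_ids: Iterable[str]) -> Set[str]:
    """Patients that appear in both the fine-tuning pool and the evaluation set."""
    return {patient_id(b) for b in train_ids} & {patient_id(b) for b in test_ids}


def drop_overlapping(train_ids: Iterable[str], test_ids: Iterable[str]) -> List[str]:
    """Hypothetical helper: keep only training slides from patients unseen at test time."""
    test_patients = {patient_id(b) for b in test_ids}
    return [b for b in train_ids if patient_id(b) not in test_patients]


if __name__ == "__main__":
    # Made-up barcodes for illustration only.
    train = ["TCGA-XX-0001-01Z-00-DX1", "TCGA-YY-0002-01Z-00-DX1"]
    bcss = ["TCGA-XX-0001-01A-01-TS1"]
    print(overlapping_patients(train, bcss))
    print(drop_overlapping(train, bcss))
```

Filtering at the patient level rather than the slide level is the stricter choice, because one patient can contribute several slides (e.g. diagnostic and tissue slides) with visually similar tissue.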

Qualitative Analysis of Preserved Features (R1, R3): In Fig. 1, we compare DC-AE reconstructions with JPEG. The learned compression better preserves cell structures, while low-quality JPEG introduces block artifacts. Downstream classification and segmentation performance also indicates that critical pathology features are better maintained. A qualitative study by expert pathologists, while valuable, is a separate undertaking.

Comparison against TIFF formats (MR): OME-TIFF is a container that supports various compression schemes. With lossless LZW, we obtain the following, compared with our AEs:

- TIFF-LZW: 192 KB (1.0 UNI similarity)
- Fine-tuned DC-AE: 2 KB (0.86 UNI similarity, 96x smaller)
- SD-3 VAE: 16 KB (0.972 UNI similarity, 12x smaller)

Our work focused on lossy compression, so JPEG was the primary baseline.
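These sizes are consistent with simple latent-size arithmetic. The sketch below reproduces the 2 KB, 16 KB, 96x, and 12x figures under three assumptions: a 256x256 RGB patch; latents stored at one byte per value, as with the 8-bit quantization discussed in the reviews; and the commonly published configurations of these AEs (SD-1.5 VAE: 8x downsampling, 4 channels; SD-3 VAE: 8x, 16 channels; DC-AE: the f32c32 variant). The exact DC-AE variant and storage format are not stated on this page. An uncompressed 256x256 RGB patch is 196,608 bytes (192 KiB), the same figure quoted for TIFF-LZW.

```python
RAW_BYTES = 256 * 256 * 3  # 8-bit RGB patch: 196,608 bytes = 192 KiB

# Assumed configurations: (spatial downsampling factor, latent channels).
AE_CONFIGS = {
    "SD-1.5 VAE (f8c4)": (8, 4),
    "SD-3 VAE (f8c16)": (8, 16),
    "DC-AE (f32c32)": (32, 32),
}


def latent_bytes(height: int, width: int, factor: int, channels: int,
                 bytes_per_value: int = 1) -> int:
    """Storage for a (height/factor) x (width/factor) x channels latent grid."""
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (height // factor) * (width // factor) * channels * bytes_per_value


if __name__ == "__main__":
    for name, (f, c) in AE_CONFIGS.items():
        size = latent_bytes(256, 256, f, c)
        print(f"{name}: {size / 1024:.0f} KiB, {RAW_BYTES / size:.0f}x smaller than raw RGB")
    # JPEG-50 vs fine-tuned DC-AE, using the file sizes quoted in the rebuttal (KB).
    print(f"JPEG-50 / DC-AE: {15 / 2:.1f}x")
```

Storing latents as float16 instead (`bytes_per_value=2`) would double every latent size and halve the ratios. This is why 8-bit latent quantization matters for the headline savings.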

Comparison with domain-specific learned compression methods (R1): We use AEs pretrained on billions of diverse images, in contrast to [10], [11], which are trained on substantially smaller datasets. Our subsequent pathology-specific fine-tuning significantly improves performance (Tables 1, 2) without sacrificing generalizability (decoder-only fine-tuning). Direct comparison with [10, 11] was not possible because their code is incomplete or unavailable. Since our AEs are built on top of widely used diffusion AEs, we plan to release the weights immediately. We will expand this discussion in the revised manuscript.

AE vs JPEG at equivalent file size (R1): Figure 2 (left) already provides this comparison, showing that our AEs achieve higher UNI similarity than JPEG at any given file size.

JPEG comparison missing in Table 2 (R2): We will add these numbers to the revised manuscript.

Training the encoder with LoRA (R3): Decoder-only fine-tuning is sufficient to improve pathology-specific metrics, while preserving the latent space of the pre-trained encoder, maintaining generalizability.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

The authors should address the reviewers’ concerns, particularly by clarifying the advantages of their method over standard image compression formats such as JPEG. Specifically, memory usage, image quality, and compression time should be compared in more detail. Furthermore, it should be clearly stated whether the results reported in Tables 1 and 3 are obtained on the training or the test set. Given that deep neural networks are prone to memorization, only generalization performance on unseen test data should be considered when evaluating effectiveness. Finally, the authors should explain why they compared only with JPEG and not with TIFF-based formats (especially OME-TIFF).
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

All three reviewers recommend acceptance, citing strong evaluation, clear writing, and practical relevance. While the paper primarily adapts existing components, it demonstrates credible utility through fine-tuning with pathology-specific perceptual losses and k-means-based latent quantization. The authors addressed minor concerns raised during the rebuttal, and the combination of effective design choices and thorough downstream validation makes the paper a valuable contribution to the MICCAI community.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A


Pathology Image Compression with Pre-trained Autoencoders

Author(s):