Abstract

Optical coherence tomography (OCT) image analysis plays an important role in the field of ophthalmology. Successful analysis models currently rely on large datasets, which can be challenging to obtain for certain tasks. The use of deep generative models to create realistic data emerges as a promising approach. However, due to limitations in hardware resources, it is still difficult to synthesize high-resolution OCT volumes. In this paper, we introduce a cascaded amortized latent diffusion model (CA-LDM) that can synthesize high-resolution OCT volumes in a memory-efficient way. First, we propose non-holistic autoencoders to efficiently build a bidirectional mapping between high-resolution volume space and low-resolution latent space. In tandem with autoencoders, we propose cascaded diffusion processes to synthesize high-resolution OCT volumes with a global-to-local refinement process, amortizing the memory and computational demands. Experiments on a public high-resolution OCT dataset show that our synthetic data have realistic high-resolution and global features, surpassing the capabilities of existing methods. Moreover, performance gains on two downstream fine-grained segmentation tasks demonstrate the benefit of the proposed method in training deep learning models for medical imaging tasks. The code is publicly available.
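
To give a rough sense of the memory argument in the abstract, the following back-of-the-envelope sketch compares the raw storage of one 512^3 volume with that of a much smaller latent. The downsampling factor of 8 and the 4 latent channels are illustrative assumptions, not values taken from the paper, and `volume_bytes` is a hypothetical helper.

```python
def volume_bytes(side, channels=1, bytes_per_value=4):
    """Bytes needed to store one dense cubic volume (float32 by default)."""
    return channels * side ** 3 * bytes_per_value


# One single-channel 512^3 volume stored as float32.
full = volume_bytes(512)

# Hypothetical latent: 8x downsampling per axis, 4 channels (assumed values).
latent = volume_bytes(512 // 8, channels=4)

print(f"full volume: {full / 2**20:.0f} MiB")
print(f"latent:      {latent / 2**20:.0f} MiB")
print(f"ratio:       {full // latent}x")
```

These figures cover a single stored tensor only. Intermediate feature maps inside a network typically have many more channels, so the working memory at full resolution is a large multiple of this, which is the kind of cost that operating in a low-resolution latent space is meant to avoid.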

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1438_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1438_supp.pdf

Link to the Code Repository

https://github.com/nicetomeetu21/CA-LDM

Link to the Dataset(s)

https://ieee-dataport.org/open-access/octa-500

BibTeX

@InProceedings{Hua_Memoryefficient_MICCAI2024,
        author = { Huang, Kun and Ma, Xiao and Zhang, Yuhan and Su, Na and Yuan, Songtao and Liu, Yong and Chen, Qiang and Fu, Huazhu},
        title = { { Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a cascaded amortized latent diffusion model for the synthesis of high resolution retinal OCTs in a memory-efficient way. The method is a combination of a set of non-holistic autoencoders to map the image volumes to a low-resolution space and a cascaded diffusion model for the synthesis of the images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors tackle a crucial task of image generation for 3D images in high resolution in a memory efficient way.
    • The approach is, as far as I am concerned, novel.
    • The visual results of generated images are very convincing.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The approach is very complex and hard to follow from the paper. The organization and graphical presentation of the methodological part could be significantly improved.
    • The method requires less memory for training and inference, however the synthesis of a volume takes around 10 min, which is probably explained by the complexity of the model.
    • The co-existence of the autoencoders and diffusion models is not explained thoroughly (e.g. are they trained separately?).
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors mention “The code is public available.”, but no link is given. I hope a link will be given upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The methods section needs to be organized more neatly. Since the approach is very complex, the explanations in the methods section do not do it justice. It seems more like a list of different methods without making the connection between them clear, thus, it is hard to follow. The graphical abstract could also be improved, since it is a little chaotic.
    • The generated images look very promising, however checkerboard artifacts are visible on most of the projections. A discussion on this might be beneficial to the paper.
    • The segmentation result improvement seems to be minimal, especially for the layer segmentation task (Dice: 0.941 vs. 0.944). Are the results statistically significant?
    • A comparison to a fully slice-wise or patch-wise method would additionally show the advantages of generating the whole volume at low resolution.
    • Are the comparison approaches chosen carefully? LVD seems to be originally designed for videos.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method yields very promising results, however it is a very complex multipart approach that cannot be fully comprehended by reading the methods section.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A
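
Review #1 asks whether the Dice improvement (0.941 vs. 0.944) is statistically significant. One common way to check this is a paired sign-flip permutation test on per-case Dice scores. The sketch below is a minimal illustration of that idea, not the authors' evaluation: the function name `paired_permutation_test` and the per-case scores are hypothetical and are not results from the paper.

```python
import random


def paired_permutation_test(a, b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on the mean paired difference."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis, each paired difference is equally
        # likely to have either sign, so flip signs at random.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-case Dice scores, NOT results from the paper.
baseline = [0.938, 0.942, 0.940, 0.945, 0.939, 0.941, 0.943, 0.940]
augmented = [0.942, 0.944, 0.943, 0.946, 0.943, 0.945, 0.944, 0.945]

p_value = paired_permutation_test(augmented, baseline)
print(f"p = {p_value:.4f}")
```

A paired test is used here because both models would be evaluated on the same test cases, so a small but consistent per-case gain can be significant even when the mean difference looks minimal.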



Review #2

  • Please describe the contribution of the paper

    This paper proposed a cascaded synthesis strategy to synthesize high-resolution OCT images that can create realistic OCT images as well as provide efficient synthetic augmentation for downstream tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper has several strengths:

    1. The resolution of synthesized OCT has reached 512^3, surpassing previously achieved synthesis resolutions.

    2. The proposed CA-LDM is a novel synthesis pipeline.

    3. The modules proposed in this paper are also novel, including the Non-holistic Autoencoders and the 3D adapter that links the 3D and 2D information.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I didn’t identify any major weaknesses in this paper. For an image synthesis paper, it covers all the essential aspects and basic components, including a novel synthesis pipeline, high synthetic resolution, image quality assessment, and image utility assessment. However, there are many details lacking in this paper, and here are several aspects that could be improved:

    1. The method section is not clearly written, making it difficult to follow the architecture. According to the reviewer, the authors proposed first synthesizing a 3D latent expression, then projecting this 3D latent expression to a high-resolution encoded space, and finally obtaining volume information slice by slice with the decoder. How was each part of this complex network optimized? For example, what is the optimization function of NHAE? When NHAE is trained, is the 3D adapter optimized together?
    2. Doesn’t NHAE+Diff3D equal LDM3D? How was information obtained from NHAE+Diff3D+Diff2D without a multislice decoder? What is the decoder in this method?
    3. The synthesis pipeline appears to be a conditional synthesis pipeline. Where did the author add the segmentation condition? Is the segmentation condition encoded in the same way?
    4. Are the images shown in Figure 2 synthesized under specific conditions? If so, are other images synthesized, such as LVD or medical diffusion, under the same conditions? It should be a fair comparison.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would also be helpful to provide their implementation of LVD in 3D medical image synthesis.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see my comments in the previous section. Overall this paper lacks sufficient details regarding its method implementation and how the methods for comparison were implemented.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper should be accepted as long as the comparison algorithms and the ablation studies are clearly described. Without this, it is hard to justify that this is a fair comparison. The conditional synthesis strategy should also be described with details.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces CA-LDM, a memory-efficient method for synthesizing high-resolution OCT volumes. It utilizes NHAE for compression and decompression, and cascaded diffusion processes for global consistency and high-resolution details. CA-LDM outperforms existing methods, achieves a resolution of 512^3, and demonstrates superior performance on a public dataset. It benefits downstream segmentation tasks and addresses limited memory capacity. The evaluation includes quantitative metrics and visual comparisons. The method shows potential for fine-grained segmentation tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces the cascaded amortized latent diffusion models (CA-LDM) method, which is a novel approach to synthesize high-resolution OCT volumes. The CA-LDM method utilizes non-holistic autoencoders (NHAE) to efficiently compress and decompress high-resolution volumes into low-resolution representations, allowing for memory-efficient synthesis of high-quality volumetric images. The CA-LDM method addresses the challenge of limited memory capacity by amortizing the memory and computational demands of synthesizing whole OCT volumes to different diffusion processes in latent spaces. The experimental results demonstrate that the CA-LDM method outperforms existing methods in terms of generating realistic high-resolution OCT volumes. The synthetic data generated by CA-LDM exhibit high-resolution and global features that surpass the capabilities of previous methods. The paper demonstrates the potential benefits of the CA-LDM method in training deep learning models for medical imaging tasks, specifically in downstream fine-grained segmentation tasks. The paper provides a comprehensive evaluation of the CA-LDM method, comparing it with existing 3D data synthesis models. The quantitative evaluation metrics, such as Fréchet inception distance (FID) and total variation (TV), demonstrate the high quality of the synthesized volumetric images. The visual comparisons and ablation studies further support the superiority of the CA-LDM method. The paper evaluates the CA-LDM method on a public high-resolution OCT dataset, enhancing the credibility of the results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. So can these models still produce 512^3 volumes if I have enough large GPUs in parallel? Please indicate exactly how much more GPU memory these older models would typically need for 512^3, given the statement: “Despite the high fidelity can be achieved, these methods are limited to synthesizing volumes of up to 256^3 in size, due to the substantial memory requirements for the training and inference phases.” Also, Fig. 3 indicates that the memory use of LVD and CA-LDM is definitely tolerable, which conflicts with the above statement. Please explain.

    2. Please provide a reference for this statement: “which lack critical fine-grained or long-term information.”

    3. Has a clinical reader verified the integrity of the generated images? The authors should provide mean opinion score (MOS) results from human raters. In particular: in Fig. 4, the retinal background does not look quite right (some grid-like shades); in Fig. S2, synthetic OCT, row 2, col 2: is this clinically expected?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Abstract: “The code is public available.” However, the code cannot be located.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. So can these models still produce 512^3 volumes if I have enough large GPUs in parallel? Please indicate exactly how much more GPU memory these older models would typically need for 512^3, given the statement: “Despite the high fidelity can be achieved, these methods are limited to synthesizing volumes of up to 256^3 in size, due to the substantial memory requirements for the training and inference phases.” Also, Fig. 3 indicates that the memory use of LVD and CA-LDM is definitely tolerable, which conflicts with the above statement. Please explain.

    2. Please provide a reference for this statement: “which lack critical fine-grained or long-term information.”

    3. Has a clinical reader verified the integrity of the generated images? The authors should provide mean opinion score (MOS) results from human raters. In particular: in Fig. 4, the retinal background does not look quite right (some grid-like shades); in Fig. S2, synthetic OCT, row 2, col 2: is this clinically expected?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Nice method with interesting results and novel ideas. However, I question the integrity of the generated images and recommend appending MOS results from clinical readers. Some statements about past methods were not backed by justification or references. Minor deficiency.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A
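
Review #3 cites total variation (TV) as one of the quantitative metrics. The exact formulation used in the paper is not reproduced on this page, so the sketch below assumes the common anisotropic definition: the sum of absolute differences between neighbouring voxels along each axis. The function name `total_variation_3d` and the toy volumes are hypothetical.

```python
def total_variation_3d(vol):
    """Anisotropic TV of a 3D volume given as nested lists [z][y][x]."""
    depth, height, width = len(vol), len(vol[0]), len(vol[0][0])
    tv = 0.0
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                v = vol[z][y][x]
                # Forward differences along each axis, skipping the border.
                if z + 1 < depth:
                    tv += abs(vol[z + 1][y][x] - v)
                if y + 1 < height:
                    tv += abs(vol[z][y + 1][x] - v)
                if x + 1 < width:
                    tv += abs(vol[z][y][x + 1] - v)
    return tv


# Toy 4x4x4 volumes (hypothetical): a constant volume and a 3D checkerboard.
flat = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
checker = [[[float((x + y + z) % 2) for x in range(4)]
            for y in range(4)] for z in range(4)]

print(total_variation_3d(flat))
print(total_variation_3d(checker))
```

Under this definition a perfectly smooth volume scores zero, while high-frequency patterns such as grid-like or checkerboard structure raise the value, so TV is best read alongside a reference value computed on real volumes.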




Author Feedback

N/A




Meta-Review

Meta-review not available, early accepted paper.


