Abstract

Optical coherence tomography (OCT) images are often acquired as highly anisotropic volumes, where the scanning step is dense along the fast axis but sparse along the slow axis. This anisotropy complicates image analysis tasks such as image registration for longitudinal alignment. To create more isotropic volumes, bicubic interpolation can be used along the slow axis, but it generally produces blurry features. Registration-based interpolation can reduce blurriness, but often fails to generate realistic OCT images. Deep generative models can sample realistic images, but lack the structural consistency constraints required for interpolation. In this paper, we propose an unsupervised image interpolation method that combines registration-based interpolation with a deep generative model to overcome their individual limitations and improve the structural accuracy and realism of interpolated OCT images. We compare the proposed method with both bicubic and registration-based interpolation on real OCT datasets, and show that it achieves the best interpolation performance.
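
To make the baseline concrete, the sketch below upsamples only the sparse slow axis of a hypothetical Spectralis-like volume (49 B-scans of 496 x 512 pixels; the shapes are illustrative assumptions) with cubic-spline interpolation via SciPy. It illustrates only the naive interpolation mentioned above, not the registration or generative components of the proposed method.

    import numpy as np
    from scipy.ndimage import zoom

    # Hypothetical anisotropic OCT volume: 49 B-scans along the sparse slow
    # axis, each 496 x 512 pixels along the dense axial and fast axes.
    volume = np.random.rand(49, 496, 512).astype(np.float32)

    # Cubic-spline upsampling along the slow axis only; this is the kind of
    # naive interpolation that tends to blur features between B-scans.
    upsampled = zoom(volume, zoom=(97 / 49, 1, 1), order=3)
    print(upsampled.shape)  # (97, 496, 512)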

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4921_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WeiShu_Unsupervised_MICCAI2025,
        author = { Wei, Shuwen and Remedios, Samuel W. and Bian, Zhangxing and Wang, Shimeng and Chen, Junyu and Liu, Yihao and Jedynak, Bruno and Liu, Tin Y. A. and Saidha, Shiv and Calabresi, Peter A. and Prince, Jerry L. and Carass, Aaron},
        title = { { Unsupervised OCT image interpolation using deformable registration and generative models } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15963},
        month = {September},
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an unsupervised image interpolation method that combines registration-based interpolation with a deep generative model to overcome their individual limitations and improve the structural accuracy and realism of interpolated OCT images. Compared to both bicubic and registration-based interpolation on real OCT datasets, the proposed method demonstrates superior interpolation performance.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors introduce an unsupervised image interpolation approach that combines registration-based interpolation with a deep generative model to overcome their individual limitations and enhance the structural accuracy and realism of interpolated OCT images.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Missing Experiments: The experimental evaluation lacks breadth and depth. Specifically:

    1. The interpolation baselines (bicubic, FW, BW) are not clearly attributed to existing literature, nor is it explained whether they represent state-of-the-art methods or basic benchmarks.
    2. The ablation study mainly evaluates the presence or absence of the label input to the DDPM, but does not examine the effect of other architectural design choices or parameter settings.
    3. There are no comparisons with more advanced or recent interpolation techniques in the context of OCT or volumetric medical imaging.

    Limited Novelty: The key components appear to be adapted from existing models:

    1. The registration network appears to be based on VoxelMorph, and the generative component adopts a DDPM structure. The paper does not clearly explain whether any architectural or training modifications were made, and if so, how they contribute to improved performance.

    Lack of Clarity:

    1. The interpolation model architecture is not clearly presented.
    2. The description of training setup and implementation details is too brief for reproduction or deeper understanding.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the proposed method shows promising results and introduces an unsupervised image interpolation approach, I have concerns regarding its novelty and the lack of comprehensive evaluation. Given the quality of the results, I lean toward reject.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    In this study, the authors propose an unsupervised interpolation method to synthesize OCT B-scans for OCT volumes. The method is designed for OCT protocols in which the number of B-scans is limited but the image quality is good. The proposed pipeline primarily consists of a registration model and generative diffusion models. A common challenge in selecting OCT imaging protocols is the trade-off between acquiring high-quality B-scans, which is often preferred by retinal specialists, and obtaining better spatial or volumetric coverage, which is more useful for glaucoma specialists or neuro-ophthalmologists, who often prefer better visualization of optic nerve bundle defects in RPE en-face images or RNFL/GCC retinal thickness maps. The proposed method aims to address this issue by offering a solution that balances image quality and contextual detail.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method addresses a practical and clinically relevant problem in OCT imaging.
    2. The use of unsupervised learning in the proposed method is a more appropriate choice than traditional supervised learning or naïve interpolation methods for this task.
    3. The experimental design and comparative evaluations are meaningful and relevant.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The qualitative and quantitative improvements appear marginal given the complexity of the method.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall design of the proposed method is thoughtful. The authors make good use of diffusion models to generate intermediate B-scans between higher-quality adjacent ones, supplemented by a registration model and segmentation of retinal vessels and layers. The unsupervised approach is particularly well-suited to this application, where ground truth can be difficult to obtain for the retrospective datasets. However, some limitations may impact the overall strength of this study:

    1. The dataset details are not clearly described. It is unclear how many subjects or eyes are included in the 377 and 15 Spectralis OCT volumes in the training and test datasets, and what health conditions they represent.
    2. While the proposed design appears reasonable, the qualitative and quantitative improvements seem modest. For example, in Fig. 5, discontinuities remain noticeably visible compared to the 97 B-scan volume. If this is the best-case result, it is not particularly promising.
    3. Although Fig. 3 and Fig. 5 show improved vessel continuity in the RPE en-face images, motion artifacts persist in the en-face images. It would enhance the impact of the method if it could also address motion artifacts or even synthesize new B-scans to replace the B-scans with poor signal quality.
    4. With the growing availability of swept-source OCT (SS-OCT) systems, which provide higher scanning speed, better image quality, and wider field of view, the practical advantage of this approach may diminish. The value of the proposed solution could be limited by the hardware advancement.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I am satisfied with the authors’ responses to the reviewers’ comments.



Review #3

  • Please describe the contribution of the paper

    This paper addresses the generation of synthetic, interpolated OCT data to account for the typical anisotropy of sampled OCT volumes (some axes are sampled more densely than others), using the power of denoising diffusion probabilistic models (DDPMs). In particular, the authors use a conditional autoencoder DDPM as the deep generative model. The variance is preserved, and the sampling strategy is that of a denoising diffusion implicit model (DDIM), which allows a deterministic mapping and large jumps between the latent space and the data space.
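
    For reference, a minimal sketch of the deterministic DDIM update (eta = 0) described here, written in plain NumPy; the conditional denoising network and conditioning used in the paper are abstracted behind the eps_pred input and are not reproduced.

        import numpy as np

        def ddim_step(x_t, eps_pred, abar_t, abar_prev):
            # x_t: noisy sample at timestep t; eps_pred: noise predicted by the
            # (conditional) denoising network; abar_*: cumulative noise-schedule
            # terms (alpha-bar) at the current and previous timesteps.
            x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
            # Deterministic move to the earlier timestep (no noise is added),
            # which is what permits large jumps between latent and data space.
            return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred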

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Comparing five different interpolation algorithms (bicubic, forward warp, backward warp, proposed (no label), and proposed (label)), the authors show that their proposed conditional autoencoder DDPM yields significantly better results than the rest for generation across all retinal layers. The paper’s strength is in the novelty of its methodology; in particular, the authors use a conditional autoencoder DDPM as the deep generative model. The variance is preserved, and the sampling strategy is that of a denoising diffusion implicit model (DDIM), which allows a deterministic mapping and large jumps between the latent space and the data space.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The weaknesses of the paper are the small dataset size (49 B-scans) and the quality of Figs. 3 and 5, which appear to be of quite low resolution and are difficult to fully interpret. Pursuing the future direction of showing whether the proposed interpolation method would be beneficial for retinal OCT volumetric registration in longitudinal analyses would also improve the paper.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper’s strength is in the novelty of its methodology; in particular, the authors use a conditional autoencoder DDPM as the deep generative model. The variance is preserved, and the sampling strategy is that of a denoising diffusion implicit model (DDIM), which allows a deterministic mapping and large jumps between the latent space and the data space. The weakness of the paper seems to be the small dataset size (49 B-scans) and the quality of Figs. 3 and 5, which appear to be of quite low resolution and are difficult to fully interpret.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Concerns have been adequately addressed.




Author Feedback

We thank the reviewers for their helpful comments and insights and address them below. Due to space limitations, we can only modify a small amount of material to respond to reviewers’ suggestions.

R1C1: Small dataset size (49 B-scans)

A: The retinal OCT dataset contains 377 Spectralis volumes that have 49 B-scans in each volume, and 15 Spectralis volumes that have 97 B-scans in each volume. Therefore, the total number of B-scans is 19,928. We will add “in each volume” to the first sentence in Sec. 3.
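
For the record, the stated total follows directly from these counts:

    377 \times 49 + 15 \times 97 = 18{,}473 + 1{,}455 = 19{,}928 \text{ B-scans.}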

R1C2: Application to registration and longitudinal analysis

A: We do raise this point in our closing sentence; however, it is important in initial work such as this to quantify the proposed method as thoroughly as possible before considering subsequent tasks. For this reason, and due to manuscript length restrictions, we have restricted our analysis to this quantification. (Also see R2C1.)

R2C1: Improvements appear marginal

A: Due to the lack of available data, we are limited in the depth of the analyses and comparisons we can complete, such as a longitudinal analysis that might better showcase the potential of our proposed work (see R1C2). We note that longitudinal analyses of OCT data are concerned with small sub-pixel changes in thickness; for those analyses, these “marginal” improvements are critical.

R2C2: Data details

A: The 377 Spectralis volumes come from 178 subjects, representing 296 unique eyes. The 15 Spectralis volumes come from 10 subjects, representing 15 unique eyes. We will modify Sec. 3: Dataset to incorporate these details.

R2C3: Motion Artifacts

A: We agree that correcting for motion artifacts is an interesting idea and would be useful. We note there has been some work on this concept within the literature and this is an excellent avenue for potential future work.

R2C4: Improving scanner hardware

A: Obsolescence is a criticism that can be leveled at any technology. However, we believe that, regardless of the state of the hardware, our approach will remain applicable and can potentially provide improved OCT images.

R3C1: Baselines and their Attributions

A: The Forward (FW) and Backward (BW) Warp baselines are defined in Sec. 3: Comparison methods; they represent ablations of our proposed approach. We appreciate R3 highlighting our failure to properly cite bicubic interpolation; we will add a citation to the classical B-spline interpolation approach in the final manuscript.

R3C2: Ablation

A: We note that FW and BW (see R3C1) do not use the generative model, only the registration model from our proposed architecture; therefore, they represent ablations of our proposed method.
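
As a rough illustration of such a registration-only baseline (a generic sketch, not the exact FW/BW implementation in the paper), an intermediate B-scan can be approximated by applying a fraction of a dense displacement field obtained from any deformable registration method:

    import numpy as np
    from scipy.ndimage import map_coordinates

    def partial_warp(bscan, disp, alpha=0.5):
        # bscan: (H, W) adjacent B-scan; disp: (2, H, W) displacement field
        # (in pixels) mapping it onto its neighbouring B-scan, e.g. produced
        # by a registration network.  Scaling the field by alpha is only a
        # first-order approximation of the true intermediate deformation.
        h, w = bscan.shape
        grid = np.mgrid[0:h, 0:w].astype(np.float64)  # identity sampling grid
        return map_coordinates(bscan, grid + alpha * disp, order=3, mode="nearest")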

R3C3: Other Comparisons

A: R3 is correct that we do not present other comparison methods. This is for two reasons: 1) there is a dearth of unsupervised interpolation methods for OCT images; and 2) we have tried many supervised interpolation techniques, but because of the lack of ground truth data that is dense along both the OCT slow axis and the OCT fast axis, they fail to capture a continuous change of the anatomy across the OCT volume. This leads such supervised methods to perform poorly on this task, so we decided not to present those results.

R3C4: Lack of clarity

A: We note that both R1 & R2 found the paper to be clear. We hope that our responses have also helped improve the clarity.

R3C5: Architecture, Training Setup, & Reproducibility

A: As we plan to make our code available, we anticipate that other researchers will be able to replicate our results.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


