Abstract
Optical Coherence Tomography Angiography (OCTA) and its derived en-face projections provide high-resolution visualization of the retinal and choroidal vasculature, which is critical for the rapid and accurate diagnosis of retinal diseases. However, acquiring high-quality OCTA images is challenging due to motion sensitivity and the high costs associated with software modifications for conventional OCT devices. Moreover, current deep learning methods for OCT-to-OCTA translation often overlook the vascular differences across retinal layers and struggle to reconstruct the intricate, dense vascular details necessary for reliable diagnosis. To overcome these limitations, we propose XOCT, a novel deep learning framework that integrates Cross-Dimensional Supervision (CDS) with a Multi-Scale Feature Fusion (MSFF) network for layer-aware vascular reconstruction. Our CDS module leverages 2D layer-wise en-face projections, generated via segmentation-weighted z-axis averaging, as supervisory signals to compel the network to learn distinct representations for each retinal layer through fine-grained, targeted guidance. Meanwhile, the MSFF module enhances vessel delineation through multi-scale feature extraction combined with a channel reweighting strategy, effectively capturing vascular details at multiple spatial scales. Our experiments on the OCTA-500 dataset demonstrate XOCT’s improvements, especially for the en-face projections, which are significant for the clinical evaluation of retinal pathologies, underscoring its potential to enhance OCTA accessibility, reliability, and diagnostic value for ophthalmic disease detection and monitoring. The code is available at https://github.com/uci-cbcl/XOCT.
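For concreteness, the following is a minimal sketch of the segmentation-weighted z-axis averaging that the abstract describes for generating layer-wise en-face projections. The array shapes, the soft-mask convention, and the function name are assumptions for illustration, not the authors' implementation (which is available in the linked repository).

```python
import numpy as np

def enface_projection(volume, layer_mask):
    """Segmentation-weighted z-axis average for one retinal layer.

    volume:     (Z, H, W) OCTA intensities.
    layer_mask: (Z, H, W) values in [0, 1], soft membership of each
                voxel in the layer (from retinal layer segmentation).
    Returns a (H, W) en-face projection for that layer.
    """
    weighted = volume * layer_mask
    denom = layer_mask.sum(axis=0) + 1e-8   # avoid division by zero
    return weighted.sum(axis=0) / denom      # average along the z-axis
```

One such 2D projection per retinal layer can then serve as the fine-grained supervisory signal described for the CDS module.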
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4977_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/uci-cbcl/XOCT
Link to the Dataset(s)
https://ieee-dataport.org/open-access/octa-500
BibTex
@InProceedings{KhoPoo_XOCT_MICCAI2025,
author = { Khosravi, Pooya and Han, Kun and Wu, Anthony T. and Rezvani, Arghavan and Feng, Zexin and Xie, Xiaohui},
title = { { XOCT: Enhancing OCT to OCTA Translation via Cross-Dimensional Supervised Multi-Scale Feature Learning } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15963},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
This study introduces XOCT, a deep learning framework that improves Optical Coherence Tomography Angiography (OCTA) imaging. XOCT combines Cross-Dimensional Supervision (CDS) and Multi-Scale Feature Fusion (MSFF) to enhance vascular reconstruction by targeting distinct retinal layers and refining vessel delineation.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This approach effectively captures detailed vascular structures across different retinal layers by combining 2D and 3D information. It enhances the accuracy of vascular reconstruction while maintaining the fine details necessary for reliable diagnosis in retinal imaging.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The authors use a kernel size of only 5, which makes me doubtful about their claim that the proposed method can capture global vascular connectivity. Why not use self-attention instead?
- The method utilizes multiple loss functions, yet the balancing of these functions is not adequately addressed. The lack of detailed explanation and experimental validation is a significant gap. While space constraints may be a factor, this remains a critical issue in the training process.
- The method is validated on only one dataset, which raises concerns about its generalizability.
- The absence of detailed training experiments makes reproducing the method challenging.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The method lacks comprehensive ablation studies and the necessary details for reproducibility.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors’ rebuttal sufficiently addresses my previous concerns.
Review #2
- Please describe the contribution of the paper
This paper makes two main contributions to the OCT-to-OCTA translation task: a Multi-Scale Feature Fusion (MSFF) network and Cross-Dimensional Supervision (CDS). The goal of the MSFF is to integrate global and local information from multiple scales, employing isotropic and anisotropic convolution kernels for local features adapted to vessel structures, and depth-wise large-kernel convolutions for global vessel connectivity. To preserve vascular details in OCTA imaging, the paper also applies CDS by adding a retinal-layer-segmentation-weighted projection module. The paper further combines multiple loss functions (pixel-wise, adversarial, and perceptual, in both 2D and 3D) to enhance vessel fidelity across dimensions.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well-organized and concise. The use of specialized modules to enhance vessel consistency across multiple dimensions presents an interesting strategy for improving OCT-OCTA translation. Notably, the paper includes both quantitative evaluations—covering the entire volume and various projections—and qualitative assessments of the generated vessel structures. The qualitative results are particularly compelling, demonstrating strong potential for high-quality OCTA reconstruction from OCT data. Furthermore, the ablation study highlights significant improvements in the en-face projections due to the proposed modules.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
It is important to note that, although the use of MSFF in OCT-OCTA translation is novel, the paper lacks sufficient background and references related to existing multi-scale fusion approaches. Additionally, the key differences between the proposed method and other multi-scale strategies (e.g., [1]) are not clearly articulated. While the reported results show noticeable improvements over the baselines, there are inconsistencies when compared to recently accepted work, such as MuTri at CVPR 2025 [2], which uses the same datasets. These discrepancies also extend to the original TransPro paper [3]. In particular, the MAE scores for 3D evaluation presented in Table 1 of this paper differ from those reported in Table 1 of [2] for TransPro and Pix2Pix3D. A clearer explanation is needed to address this inconsistency. Furthermore, it remains unclear why the proposed modules show little to no improvement in the 3D evaluation metrics, whereas noticeable gains are observed in 2D evaluations.
[1] Sun, L., Shao, W., Zhu, Q., Wang, M., Li, G., & Zhang, D. (2023). Multi-scale multi-hierarchy attention convolutional neural network for fetal brain extraction. Pattern Recognition, 133, 109029.
[2] Chen, Z., Wang, H., Ou, C., & Li, X. (2025). MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation. arXiv preprint arXiv:2504.01428.
[3] Li, S., Zhang, D., & Li, X. (2023). Vessel-Promoted OCT to OCTA Image Translation by Heuristic Contextual Constraints. Published online March 12, 2023.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper presents a novel application of multi-scale fusion and cross-dimensional supervision leveraging retinal layers from OCT in the OCT-OCTA translation task. The combined use of these two modules appears well-motivated and demonstrates improvements over baseline methods. However, there are discrepancies in the reported baseline MAE values when compared to two other papers addressing the same task using identical datasets (see Weaknesses). Further clarification from the authors would be valuable to resolve these inconsistencies and enhance confidence in the proposed approach.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have adequately addressed the concerns regarding MAE discrepancies with prior work, the novelty of the MSFF module, and the rationale for the limited improvements in 3D metrics. These clarifications should be incorporated into the final version of the paper to enhance transparency and ensure a more complete understanding of the contributions.
Review #3
- Please describe the contribution of the paper
The paper presents image-to-image translation between OCT and OCTA images by implementing XOCT, a novel deep learning framework combining cross-dimensional supervision and multi-scale feature fusion (MSFF) for reconstructing layer-wise vasculature information. The model leverages en-face projections from segmentation-weighted z-axis averaging, enabling the network to learn unique representations for each retinal layer, while the MSFF module allows for multi-scale feature extraction.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Strengths include the multi-pronged model architecture employing layer segmentations for cross-dimensional supervision, multi-scale feature fusion, and a composite loss made up of an L1 loss for pixel-wise accuracy, an adversarial loss for anatomical realism, and a perceptual loss for high-level structural fidelity.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The method is compared to multiple SOTA models and achieves enhanced performance on all metrics except those computed on full 3D volumes. The OCTA-500 dataset is relatively small; validation on another dataset would strengthen the paper’s generalization claims.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Strengths include the multi-pronged model architecture employing layer segmentations for cross-dimensional supervision, multi-scale feature fusion, and a composite loss made up of an L1 loss for pixel-wise accuracy, an adversarial loss for anatomical realism, and a perceptual loss for high-level structural fidelity. The method is compared to multiple SOTA models and achieves enhanced performance on all metrics except those computed on full 3D volumes. The OCTA-500 dataset is relatively small; validation on another dataset would strengthen the paper’s generalization claims.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Concerns adequately addressed.
Author Feedback
We thank the reviewers (R1, R2, R3) for their positive feedback on XOCT’s novelty, high vasculature fidelity, and strong qualitative results. We will revise accordingly. [R3] 3D MAE Discrepancy Compared to TransPro/MuTri: The higher 3D MAE (0.078) reported in TransPro and adopted by MuTri stems from an implementation issue in their evaluation: MAE was computed on the uint8 data type, where unsigned subtraction wraps around (e.g., abs(1 - 2) evaluates to 255). We replicated this by applying uint8-based MAE to TransPro’s output and obtained a similarly inflated value (0.079). In contrast, we compute MAE in float32 and obtain much smaller values (3.3/255 = 0.013 for TransPro), ensuring a fair and accurate comparison.
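The wraparound behavior described above is easy to reproduce. The NumPy snippet below, with made-up toy values, shows why computing MAE directly on uint8 arrays inflates the score, while casting to float32 first gives the true per-voxel error.

```python
import numpy as np

pred = np.array([1, 10, 200, 50], dtype=np.uint8)   # toy "prediction"
gt   = np.array([2,  9, 199, 51], dtype=np.uint8)   # toy "ground truth"

# uint8 subtraction wraps around silently: 1 - 2 == 255, not -1,
# and np.abs is a no-op on unsigned values, so the error is inflated.
mae_uint8 = np.mean(np.abs(pred - gt))               # 128.0

# Casting to float32 before subtracting yields the true absolute error.
mae_float = np.mean(np.abs(pred.astype(np.float32) - gt.astype(np.float32)))  # 1.0
print(mae_uint8, mae_float)
```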
[R1, R2] Dataset/Generalization: OCTA-500 is currently the only public dataset available for 3D OCT-to-OCTA translation and is also the only dataset used by prior works such as MultiGAN, TransPro, etc. It includes both 3mm and 6mm subsets, and we evaluate on both to demonstrate generalization across acquisition settings. We are collecting a private dataset for broader validation and plan to release it. This direction will be emphasized in revision.
[R2] Reproducibility: Due to space limits, we provide additional training and loss configuration details here. Training: XOCT is based on a modified 3D Pix2Pix backbone. The generator (G_3D) and discriminators (D_3D, D_2D) are trained adversarially using the Adam optimizer (lr 1e-4, batch size 1, 300 epochs). Loss Configuration: Following TransPro, we assign equal weights to the 2D/3D adversarial losses and equal weights to the 2D/3D L1 losses to reduce hyperparameter tuning. A grid search over L1 weights [1, 2, 5, 10, 20] and perceptual weights [0.5, 1, 2, 5] (with the adversarial weight fixed at 1) showed best results with L1 = 10 and Perceptual = 1. We will include these details and results in the revision and release the code and pretrained model upon acceptance.
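A minimal PyTorch sketch of the loss weighting described above is shown below. The function signature, tensor names, and the passed-in perceptual() callable are hypothetical stand-ins, not the authors' code; only the weights (L1 = 10, perceptual = 1, adversarial = 1, shared across the 2D and 3D terms) follow the reported configuration.

```python
import torch
import torch.nn.functional as F

W_L1, W_PERC, W_ADV = 10.0, 1.0, 1.0  # from the reported grid search

def composite_g_loss(fake_3d, real_3d, fake_proj, real_proj,
                     d3d_logits, d2d_logits, perceptual):
    """Generator loss: weighted sum of L1, perceptual, and adversarial
    terms, each applied in both 3D (volume) and 2D (en-face projection)."""
    l1 = F.l1_loss(fake_3d, real_3d) + F.l1_loss(fake_proj, real_proj)
    perc = perceptual(fake_proj, real_proj)  # assumed 2D perceptual network
    # Non-saturating GAN term: the generator wants D to predict "real" (ones).
    adv = (F.binary_cross_entropy_with_logits(d3d_logits, torch.ones_like(d3d_logits))
           + F.binary_cross_entropy_with_logits(d2d_logits, torch.ones_like(d2d_logits)))
    return W_L1 * l1 + W_PERC * perc + W_ADV * adv
```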
[R2] Kernel Size for Vascular Connectivity: Vascular connectivity is key for accurate OCTA reconstruction, and a large kernel is preferred to capture it. A network with kernel size 5 provides a sufficiently large receptive field relative to the input region. Larger kernels (7, 9) offered marginal MAE improvements (19.14, 19.09 on 3M Full_Proj) but increased computation due to cubic scaling. We selected kernel size 5 as a balance between accuracy and efficiency; ablation results will be included in the revision. We also tested Swin Transformers as a self-attention alternative, but they underperformed in capturing fine, dense vessels, likely due to limitations in modeling local patterns with the transformer structure. Full 3D self-attention also incurs prohibitive computational cost. We will tone down the global-connectivity claim in the revision and clarify these design choices.
[R3] MSFF Background/Novelty: Our MSFF module is tailored to OCTA reconstruction. It uses 1D directional kernels (3×1×1, 1×3×1, 1×1×3) as edge detectors to capture elongated vessels and larger 3D kernels (5×5×5) for broader vessel connectivity. In contrast to prior works (e.g., Sun et al.), which focus on segmentation and use different kernel designs, our MSFF is specifically designed for 3D vascular reconstruction. We will expand Sec. 3.2 to highlight this discussion.
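One possible PyTorch arrangement of the kernel design described in the two preceding paragraphs (1D directional kernels plus a depth-wise 5×5×5 kernel) is sketched below. The branch layout, channel counts, and the 1×1×1 fusion step standing in for channel reweighting are guesses for illustration; the actual MSFF module is defined in the paper and code.

```python
import torch
import torch.nn as nn

class MSFFBranches(nn.Module):
    """Illustrative multi-branch block: 1D directional kernels for
    elongated vessel edges plus a depth-wise 5x5x5 kernel for broader
    connectivity, fused with a 1x1x1 convolution."""
    def __init__(self, ch):
        super().__init__()
        self.kz = nn.Conv3d(ch, ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.ky = nn.Conv3d(ch, ch, kernel_size=(1, 3, 1), padding=(0, 1, 0))
        self.kx = nn.Conv3d(ch, ch, kernel_size=(1, 1, 3), padding=(0, 0, 1))
        # Depth-wise (groups=ch) large kernel keeps the cost of a 5x5x5
        # receptive field manageable despite cubic scaling.
        self.k5 = nn.Conv3d(ch, ch, kernel_size=5, padding=2, groups=ch)
        self.fuse = nn.Conv3d(4 * ch, ch, kernel_size=1)

    def forward(self, x):
        feats = torch.cat([self.kz(x), self.ky(x), self.kx(x), self.k5(x)], dim=1)
        return self.fuse(feats)
```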
[R1, R3] 3D Improvements: The strong 2D improvements arise from CDS, which provides layer-wise guidance aligned with retinal anatomy. This improves vascular continuity in the projections and yields better 2D metrics. In 3D, we use the same losses as Pix2Pix3D and BBDM3D*, but the baseline models tend to generate blurred vasculature to reduce voxel-wise error, owing to the intensity distribution bias. In contrast, our method produces sharper, anatomically faithful vessels via CDS and MSFF. However, 3D MAE, dominated by overall intensity differences, may not fully reflect these structural gains. Despite similar 3D MAE scores, our qualitative results clearly show improved vessel clarity. We will clarify this and consider structure-aware metrics as future work.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
I strongly recommend rejecting this paper. First of all, the paper wants to predict vessel contrast from a non-contrast image, which is generally prone to hallucination. The paper does not provide an argument for why the technique should work at all. In the results, vessels are missing in all OCT->OCTA images, clear evidence of hallucination. Clinically relevant cases and pathologies, such as microaneurysms, are not explored. The claim that reference 2 mentions the high cost of the technology is wrong; the claim does not even appear in that reference. OCTA is simply a repeated OCT scan with subsequent noise analysis, simple to capture with a standard OCT device. So why hallucinate what can easily be measured?
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A