Abstract

Cardiovascular disease (CVD) remains the leading cause of death worldwide, requiring urgent development of effective risk assessment methods for timely intervention. While current research has introduced non-invasive and efficient approaches to predict CVD risk from retinal imaging with deep learning models, the commonly used fundus photographs and Optical Coherence Tomography (OCT) fail to capture detailed vascular features critical for CVD assessment compared with OCT angiography (OCTA) images. Moreover, existing methods typically classify CVD risk only as high or low, without providing a deeper analysis of CVD-related blood factor conditions, thus limiting prediction accuracy and clinical utility. We therefore propose a novel multi-purpose paradigm of CVD risk assessment that jointly performs CVD risk and CVD-related condition prediction, aligning with clinical experience. Based on this core idea, we introduce OCTA-CVD, the first OCTA dataset for CVD risk assessment, and a Vessel-Aware Mamba-based Prediction model with Informative Enhancement (VAMPIRE) based on OCTA enface images. Our proposed model aims to extract crucial vascular characteristics through two key components: (1) a Mamba-Based Directional (MBD) Module that captures fine-grained vascular trajectory features and (2) an Information-Enhanced Morphological (IEM) Module that incorporates comprehensive vessel morphology knowledge. Experimental results demonstrate that our method surpasses standard classification backbones, OCTA-based detection methods, and ophthalmologic foundation models. Our code and the collected OCTA-CVD dataset are available at https://github.com/xmed-lab/VAMPIRE.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3405_paper.pdf

SharedIt Link: https://rdcu.be/eG4Ee

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05182-0_63

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/xmed-lab/VAMPIRE

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WanLeh_VAMPIRE_MICCAI2025,
        author = { Wang, Lehan AND Wang, Hualiang AND Ou, Chubin AND Chen, Lushi AND Liang, Yunyi AND Li, Xiaomeng},
        title = { { VAMPIRE: Uncovering Vessel Directional and Morphological Information from OCTA Images for Cardiovascular Disease Risk Factor Prediction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {649--659}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes a novel framework for assessing the risk of cardiovascular disease, as well as individual risk factors, based on OCTA images. The two main technical pillars are a Mamba-based model for extracting features from vascular trajectories and an “information-enhanced morphological” module which uses a vision-language model to extract descriptions of vascular morphologies. Results on an internal dataset are compared to several standard architectures, as well as to recent domain-specific foundation models. A smaller selection of competitors is also evaluated on an external dataset, and an ablation investigates the relative contributions of the two main modules described above.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper addresses an interesting task that is relevant to MICCAI
- The proposed framework involves modern methodology that has not previously been applied to the problem at hand
- Results suggest a benefit over the previous state of the art, and of both proposed building blocks
- In addition to their technical contribution, authors collected a new dataset that they promise to make available to the scientific community
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- A key idea of the proposed approach is to trace segmented vessels to put the corresponding image patches into a linear order in which they can be presented to a state space model (SSM). It is not really discussed how this approach deals with bifurcations in the vessel tree, how the intermediate background patches that separate the branches should be chosen, and whether it is important to always present the different arcades in the same order (and, if so, how this is ensured).
- Similarly, even though the ablation demonstrates a benefit of the proposed “vessel-following” strategy over using Mamba with simple linear or diagonal scanning, it does not provide a direct comparison in which the same information is processed by a non-SSM model (e.g., a transformer) that would not require imposing a somewhat artificial linear ordering in the first place.
- The second key idea is to add information from a vision-language model. The description of this is very brief; e.g., the use of “learnable prompts” is mentioned, but it is not clear how those prompts are learned or what the resulting prompts look like.
- Similarly, even though the ablation suggests a small benefit from the proposed IEM in terms of final F1 scores and AUC, I would be curious to learn more about how accurate the descriptions it provided actually were.
- Even though a comparison to many competing methods is provided, it is unclear how fair that comparison is in terms of the effort that has gone into hyperparameter tuning, number of parameters, computational effort, etc.
- Some technical details (e.g., the exact post-processing of vessel segmentation maps in 2.2) are glossed over; however, this should not be such a big concern given that the authors promise to make code available.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

Please double-check the manuscript for typos, e.g., “trajetories”
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Even though this submission addresses an interesting problem, I see it at the borderline for the MICCAI main conference. In particular, I am not fully convinced of the proposed vessel-following strategy, which is at the core of this work. Focusing on vessels and including a positional encoding without imposing an (in my opinion artificial) linear ordering still seems like a simpler and possibly more effective alternative. However, considering the overall level of contribution (technical and data) and experimental effort in this work, I believe authors should be given the opportunity of a rebuttal.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

Given the massive increase in trainable parameters that is mentioned in the rebuttal, I am not convinced that the reported benefit compared to the baseline is really due to architectural innovations, and not simply due to the overall increase in network capacity. I believe this should be confirmed by adding a fair comparison to a matched (more powerful) baseline before publication.

Review #2

Please describe the contribution of the paper

This work aims to predict the 10-year risk of cardiovascular disease (CVD) as well as four of its blood-test-related risk factors (glucose, cholesterol, triglycerides, and blood pressure) from retinal OCTA images, which are acquired in a non-invasive manner. Predicting these risk factors in addition to the binary low/high 10-year CVD risk improves the performance from 83.3% to 87.5%. A new architecture comprising VAMPIRE blocks is proposed. Each VAMPIRE block is similar to a transformer layer, with the self-attention block replaced by Mamba and the feed-forward layer replaced by cross-attention with a text embedding that describes the vessel morphology. The key contributions are: (a) a new ordering of the tokens in the Mamba layer, which follows the main vessel branches and was found to outperform linear and diagonal token ordering; (b) the feed-forward layer of the transformer is replaced by cross-attention with text embeddings from a frozen text encoder with learnable prompts, which takes a ChatGPT-generated text describing the vessel morphology as input. The text generation was guided by output predictions from an OCTA disease classifier.
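To make the cross-attention step mentioned in this summary concrete, the following is a minimal NumPy sketch of single-head cross-attention in which image tokens act as queries and text embeddings act as keys and values. It is an illustration only and not the authors' IEM implementation: the function name, the shapes, and the random weights are all assumptions chosen for the example.

```python
import numpy as np

def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: image tokens attend to text tokens."""
    q = image_tokens @ w_q                         # (N_img, d) queries from image patches
    k = text_tokens @ w_k                          # (N_txt, d) keys from text embeddings
    v = text_tokens @ w_v                          # (N_txt, d) values from text embeddings
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (N_img, N_txt) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the text tokens
    return weights @ v, weights

# Hypothetical sizes and random inputs, for illustration only.
rng = np.random.default_rng(0)
d = 8
image_tokens = rng.normal(size=(16, d))  # stand-in for patch features
text_tokens = rng.normal(size=(5, d))    # stand-in for text-encoder outputs
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
```

Each image token receives a weighted mixture of the text values, so the cost of this step scales with the product of the number of image tokens and the number of text tokens.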
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed method explores two main contributions: (a) a new ordering of the tokens in the Mamba layer, which follows the main vessel branches and was found to outperform linear and diagonal token ordering; (b) the feed-forward layer of the transformer is replaced by cross-attention with text embeddings from a frozen text encoder with learnable prompts, which takes a ChatGPT-generated text describing the vessel morphology as input. Extensive evaluation in the results section has successfully justified the utility of both these contributions in improving performance. Additionally, predicting the blood-test-based risk factors alongside the binary CVD risk has been shown to improve CVD risk performance as well as the model’s interpretability.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The method requires many pre-requisite algorithms such as vessel segmentation to guide the ordering of tokens in Mamba and a disease classifier to generate the text required for cross-attention in the IEM module.
2. The proposed method replaces the Self-attention layer in each transformer block with Mamba which is computationally efficient (O(N log N) instead of O(N^2)). However, the IEM block comprising cross attention with a text encoder with learnable prompts makes the proposed architecture much more computationally expensive than a standard transformer block which employs a feed forward layer instead.
3. The Mamba-based directional module proposes to order the input tokens by following the main vessel segmentations in the image. This refinement to the token ordering is very specific to OCTA images and cannot be easily generalized to other anatomical structures which do not have extensive vasculature. Moreover, the dependence of performance on the ordering of tokens might be the result of using a uni-directional Mamba. An evaluation of the effect of ordering for a bidirectional architecture such as bi-Mamba should have been performed, as such architectures may be less sensitive to the token ordering.
4. It is not clear what retinal disease classes are detected by the classifier which is used to generate the text with ChatGPT. Are these disease classes in any way related to CVD?
5. The method performs cross-attention with text embeddings of synthetic text descriptions generated by ChatGPT, prompted with predictions from a disease classifier. Can this ChatGPT text generation be avoided, with cross-attention performed directly on learnable embeddings corresponding to each disease class predicted by the classifier?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

How is the GT for the 10-year risk of CVD obtained? Is it obtained by actually assessing a patient’s visit after 10 years, or based on a rule-based calculation using various risk factors from the current visit? If the 10-year risk is calculated from the current risk factors, the age and gender of a patient may play a critical role in evaluating the GT for 10-year risk. A baseline performance of a simple logistic regression using only the age and gender of the patient would have provided a lower bound on the performance, and allowed the readers to gauge the amount of improvement that can be achieved by using the OCTA scans over using age and gender information alone.
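The kind of baseline suggested here could be sketched as follows. This is a minimal, assumption-laden illustration: the data are synthetic, the label rule is invented for the example, and the fitting routine is a plain gradient-descent logistic regression written in NumPy. It says nothing about the actual OCTA-CVD dataset or the numbers reported in the paper.

```python
import numpy as np

def fit_logistic_regression(x, y, lr=0.1, n_steps=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probabilities
        grad = p - y                             # gradient of log loss w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_proba(x, w, b):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Synthetic stand-in data (hypothetical, NOT the OCTA-CVD cohort).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(30, 80, size=n)
sex = rng.integers(0, 2, size=n)
# Invented label rule: risk rises with age and is higher when sex == 1.
logit = 0.15 * (age - 55) + 1.0 * (sex - 0.5)
labels = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Standardize age so that gradient descent behaves well; keep sex as 0/1.
features = np.column_stack([(age - age.mean()) / age.std(), sex])
w, b = fit_logistic_regression(features, labels)
accuracy = ((predict_proba(features, w, b) > 0.5) == labels).mean()
```

In a real comparison the same train/test split and metrics (e.g., F1 and AUC) used for the image models would be applied to this baseline, so that the gain attributable to OCTA features can be read off directly.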
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. This work makes a moderate technical contribution.
2. It solves an important problem, as retinal OCTA has been relatively less explored, especially for the detection of systemic diseases such as CVD.
3. The extensive evaluation justifies the proposed technical contributions.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.
1. The task of predicting future risk of systemic diseases such as CVD from retinal OCTA images is an interesting and open area of research
2. The technical contribution of this paper is moderate but well justified by the experimental results. The main criticisms of the proposed methodology are: (a) Patch ordering based on vessel direction: (i) This is specific to the imaging modality and application and is not a generic method that could be applied to any medical application. (ii) While the performance of Mamba or any auto-regressive model should depend on the ordering of patches, the same is not true for transformer self-attention, where the ordering among patches is irrelevant and spatial information is passed only through positional encodings. Still, the proposed method is able to outperform a standard transformer block, which is surprising. (b) Alignment of text embeddings with image features: Aligning the image-based model’s features with text features is meaningful, but it has several possible flaws: (i) The text is not provided by a clinician but artificially generated from a disease classifier’s predictions, where the classifier is trained on other retinal diseases such as CNV, which affect the vasculature but are not related to cardiovascular disease. Thus the generated text pairs would be very noisy. (ii) This module considerably increases the computational complexity and memory requirements of the proposed architecture. Is this text alignment really required at each of the N blocks? Could simple FC layers be used in some layers and the text alignment only in a few of the N layers (e.g., only on the last-layer features)?
I recommend this paper for a poster presentation at MICCAI.

Review #3

Please describe the contribution of the paper

The authors introduce OCTA-CVD, the first OCTA dataset designed for cardiovascular disease (CVD) risk assessment, along with VAMPIRE—a Vessel-Aware Mamba-based Prediction model. VAMPIRE leverages OCTA enface images to extract critical vascular features using two main components: (1) a Mamba-Based Directional (MBD) Module that captures detailed vascular trajectory patterns and (2) an Information-Enhanced Morphological (IEM) Module that integrates rich vessel morphology information.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The focus on using non-invasive OCTA imaging for early CVD risk stratification has significant potential for real-world clinical application, especially in preventive care settings.
- The introduction of OCTA-CVD as the first dataset specifically designed for cardiovascular disease risk prediction using OCTA imaging fills an important gap in the field and provides a valuable resource for future research.
- The proposed VAMPIRE model integrates both vascular trajectory and morphology-aware modules, offering a well-motivated architecture tailored to the specific characteristics of retinal vasculature in OCTA images. A thorough evaluation and ablation study is performed.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The paper does not clearly describe how OCTA images were processed or annotated for CVD risk assessment. It’s also unclear whether labels are derived from clinical outcomes, risk scores, or proxy indicators.
- The manuscript lacks an explanation of how OCTA images from different retinal layers are treated (e.g., combined, selected, or modeled separately) and whether predictions are made at the eye level or aggregated at the patient level.
- The paper does not include a dedicated limitations section, which would be valuable for providing transparency regarding the potential weaknesses or constraints of the proposed approach.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- In the architecture description, the term “segment” is used when referring to splitting the input image into smaller regions. For clarity and alignment with common terminology in vision transformers and similar models, consider using “split” or “patchify” instead.
- It is unclear whether the positional encodings are added to the patch embeddings or concatenated. Clarifying this would help in understanding how spatial information is incorporated. Additionally, if the final patch embedding dimension is denoted as D, does this value include the positional encoding component?
- OCTA images often include enface views of different retinal layers. How are these layers treated in the model? Are they concatenated into a single multi-channel image, processed separately and then fused later, or predicted on individually?
- Are both eyes from each patient included in the dataset, and if so, how is that handled? Specifically, are predictions made at the eye level or aggregated to produce patient-level predictions?
- Are the CNN backbones pretrained on ImageNet before fine-tuning on OCTA-CVD, or are they trained from scratch? Also, are the same hyperparameters (e.g., learning rate, batch size, optimizer settings) applied consistently across all experimental comparisons?
- The proposed model shows significantly better performance on the second dataset compared to the first. It would be helpful if the authors could elaborate on the possible reasons for this discrepancy. Are there differences in data quality, label distribution, population characteristics, or imaging protocols that might explain the performance gap?
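To illustrate the distinction between eye-level and patient-level predictions raised in the comments above, here is a minimal sketch that averages eye-level probabilities per patient. The identifiers, the probabilities, and the choice of a simple mean are all hypothetical; the paper itself is not known to use this aggregation.

```python
from collections import defaultdict

def aggregate_to_patient_level(eye_predictions):
    """Average eye-level probabilities for each patient.

    eye_predictions: iterable of (patient_id, eye, probability) tuples.
    Returns a dict mapping patient_id to the mean probability over that
    patient's available eyes.
    """
    grouped = defaultdict(list)
    for patient_id, _eye, prob in eye_predictions:
        grouped[patient_id].append(prob)
    return {pid: sum(ps) / len(ps) for pid, ps in grouped.items()}

# Hypothetical eye-level outputs (OD = right eye, OS = left eye).
eye_level = [
    ("P001", "OD", 0.80),
    ("P001", "OS", 0.60),
    ("P002", "OD", 0.20),
]
patient_level = aggregate_to_patient_level(eye_level)
```

Other aggregation rules (for example, taking the maximum over both eyes) are equally plausible; the mean is used here only because it is the simplest to state.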
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While there are some areas for improvement, including clarifying certain architectural details and incorporating a limitations section, the overall contribution remains valuable and impactful. The work’s novelty, clinical relevance, and potential for future applications outweigh the minor issues raised. Given these factors, the paper should be accepted, even without a rebuttal, as it introduces both theoretical and practical advancements to the field.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors properly addressed the reviewers’ comments, and as mentioned earlier, the work’s novelty, clinical relevance, and potential for future applications support the decision to accept.

Author Feedback

We thank the reviewers for their insightful feedback. Below, we address the reviewers’ concerns. We will revise the main paper accordingly.
[R1/2] Computational cost: We acknowledge that the IEM module adds approximately 50M parameters and 2GB of GPU memory compared with the baseline. However, its main goal is to enhance the model’s ability to capture vascular features. Since it incorporates meaningful vessel morphology and improves performance, the lightweight overhead appears acceptable.
[R2/3] Experiment setting: All experiments adopt identical data processing, data augmentation, batch size, and number of training epochs. For previous models, we use the optimal hyperparameters and optimizers from the original papers. ImageNet-pretrained weights are adopted for CNN-based models.
[R1/3] GT annotation: The 10-year CVD risk is derived from a Cox regression function based on risk factors. While age and gender are critical factors, a simple model with only these inputs performs worse than OCTA-based CNNs. This demonstrates the value of OCTA images in providing additional vascular biomarkers.
[R1] Pre-requisite algorithms: These are preprocessing steps in our proposed modules. We apply off-the-shelf models that are publicly available; no additional training is required on our dataset. Bidirectional architecture: Our key focus is to encourage the model to leverage vessel directional information, which has proven effective with a uni-directional structure. Bidirectional scanning is a feasible alternative, and we will consider performing the evaluation in the final version if permitted. Disease classifier: The diseases include DR, AMD, and CNV, which exhibit vascular changes similar to those of CVD, e.g., enlarged FAZ or reduced vessel density. Thus, prompting the MLLM to describe vessel morphology with extra disease-based guidance helps improve the generation quality. While directly introducing disease predictions seems more straightforward, the main objective of the IEM module is to improve the model’s perception of vessel morphology with rich and descriptive context, which could not be inferred from disease types alone.
[R2] Vessel traversal: (1) Bifurcations: We use depth-first search (DFS) to traverse the vessel tree. Starting from a single pixel, we explore neighboring pixels in a fixed order (left-to-right, top-to-bottom). If a neighbor belongs to the same vessel branch, traversal continues; otherwise, it backtracks. Thus, each bifurcation is explored as deeply as possible. (2) Background patches: After vessel traversal, unvisited patches are treated as background patches. They are inserted between vessel patches based on their relative location in the original image. (3) Consistent order: The order is controlled by sorting the neighboring pixels before pushing them onto the stack, as mentioned above. Advantages over non-SSM models: We have already included this comparison in Table 4 of the original paper. “Baseline” uses a standard transformer block without linear ordering, while “w/MBD” applies the vessel-aware scanning strategy in a Mamba-based block. The performance increase of 6% in F1 score and 3% in AUPR demonstrates that the proposed method enables the model to capture more comprehensive vessel information, leading to better performance. Learnable prompts: These are trainable parameters optimized during training. Generated descriptions: We manually reviewed a subset of the descriptions and found that around 80% accurately reflect vascular changes, such as vessel distortion or irregular FAZ. We will release the generated descriptions with our code.
[R3] OCTA layers: OCTA images from the SVC, DVC, and CC layers are concatenated along the channel dimension to form a three-channel input. Evaluation: Eye-level results are reported. Positional encoding: As illustrated in Section 2.1, the positional encoding is combined with the patch embedding before being fed into the sequential blocks. Performance gap: The second dataset has a more balanced label distribution, which may be the reason for the better performance.
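The vessel traversal described in the rebuttal can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' code: it works on a small patch-level binary mask rather than on pixels, uses 8-connectivity, and appends background patches in raster order at the end instead of inserting them between vessel patches as the rebuttal describes. The function name `vessel_following_order` and the toy mask are hypothetical.

```python
import numpy as np

# 8-connected neighbor offsets, listed top-to-bottom and left-to-right
# so that neighbors are always explored in the same fixed order.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def vessel_following_order(mask):
    """Return patch coordinates: vessel patches in DFS order, then background."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    visited = np.zeros_like(mask, dtype=bool)
    order = []
    # Seeds are scanned in raster order; each unvisited vessel patch starts a DFS.
    for r0, c0 in zip(*np.nonzero(mask)):
        if visited[r0, c0]:
            continue
        stack = [(int(r0), int(c0))]
        while stack:
            r, c = stack.pop()
            if visited[r, c]:
                continue
            visited[r, c] = True
            order.append((r, c))
            # Push in reverse so the first neighbor in NEIGHBORS is popped first.
            for dr, dc in reversed(NEIGHBORS):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not visited[nr, nc]:
                    stack.append((nr, nc))
    # Simplification (assumption): background patches are appended in raster order.
    background = [(int(r), int(c)) for r, c in zip(*np.nonzero(~mask))]
    return order + background

# Toy 4x4 patch mask with one bifurcation at (1, 1).
mask = np.array([
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
], dtype=bool)
order = vessel_following_order(mask)
```

On this toy mask the vessel patches come out as (0, 1), (1, 1), (2, 0), (2, 2), (3, 3): the left branch at the bifurcation is visited before the traversal backtracks to the right branch, and the fixed neighbor order makes the result deterministic.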

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The paper received mixed reviews; in particular, the request for a comparison to a baseline with an equal number of parameters is a fair point of concern. Still, overall, I find the proposed contribution interesting and novel, and I recommend acceptance.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Reject
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

This paper initially received mixed reviews. While the authors did a good job addressing some of the comments, R2 still has valid concerns. I tend to lean towards R2’s recommendation. The authors concede that the proposed module added approximately 50M parameters and 2GB in GPU memory compared with the baseline. However, the performance improvement is not commensurate.

VAMPIRE: Uncovering Vessel Directional and Morphological Information from OCTA Images for Cardiovascular Disease Risk Factor Prediction

Author(s):