Abstract

Accurate diagnosis of microvascular obstruction (MVO) in acute myocardial infarction (AMI) patients typically relies on Cine Cardiac Magnetic Resonance Imaging (CMR) (video sequences) and Late Gadolinium Enhancement (LGE) CMR (images). However, LGE imaging is contraindicated in the approximately 20% of AMI patients who have chronic kidney disease, underscoring the need for Cine CMR as a standalone diagnostic alternative. Although recent advancements in deep learning have improved video data processing, current methods fail to adequately capture complementary temporal motion features. This limits their efficacy and poses significant challenges for MVO segmentation with Cine CMR, as MVO regions are defined by dynamic motion rather than clear boundaries or contrast. To address this limitation, we propose a Spatiotemporal-Sensitive Network that integrates static and motion encoders to effectively process Cine CMR. Further, through a guided decoder that utilizes the rich spatiotemporal information and an uncertainty-driven refinement that leverages uncertainty maps and low-level features, our method enhances segmentation accuracy and refines boundary delineation. Extensive experiments on 621 Cine CMR sequences demonstrate superior performance over competing methods, with a Dice score of 0.56 for Cine CMR-based MVO identification, and highlight the method's potential to advance video analysis in clinical settings. The code is available at https://github.com/MICCAI25-MVO-Segmentation/miccai25-mvo-seg.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0219_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/MICCAI25-MVO-Segmentation/miccai25-mvo-seg

Link to the Dataset(s)

N/A

BibTex

@InProceedings{YuYan_SpatiotemporalSensitive_MICCAI2025,
        author = { Yu, Yang and Kok, Christopher and Wang, Jiahao and Cheng, Jun and Leng, Shuang and Tan, Ru San and Zhong, Liang and Yang, Xulei},
        title = { { Spatiotemporal-Sensitive Network for Microvascular Obstruction Segmentation from Cine Cardiac Magnetic Resonance } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {540 -- 550}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a new deep learning method for segmenting microvascular obstruction (MVO) in cine cardiac MRI videos, without using contrast-based imaging like LGE CMR. This is important because contrast is not safe for all patients, especially those with kidney disease. The model has two branches. One processes frames to get spatial details (static encoder). The other processes frame differences to capture temporal context (motion encoder). These features are combined using a guided decoder with attention. On top of this, they use an uncertainty-driven refinement module to improve boundary quality. The method is tested on a private dataset of 621 scans and performs better than several existing image- and video-based methods.
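    To make the two-branch idea concrete, below is a minimal, hypothetical PyTorch sketch of a static encoder over frames combined with a motion encoder over frame differences; the module names, channel sizes, temporal pooling, and gating step are illustrative assumptions of ours, not the authors' actual architecture.

    ```python
    # Minimal, hypothetical sketch of a two-branch encoder over a cine sequence.
    # Shapes, channel counts, and the fusion step are illustrative assumptions,
    # not the architecture described in the paper.
    import torch
    import torch.nn as nn

    class DualBranchEncoder(nn.Module):
        def __init__(self, in_ch=1, feat_ch=32):
            super().__init__()
            # Static branch: per-frame 2D features (spatial detail).
            self.static_enc = nn.Sequential(
                nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            )
            # Motion branch: 2D features over frame differences (temporal cues).
            self.motion_enc = nn.Sequential(
                nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            )
            # Simple attention-style gating so motion features guide the fusion.
            self.gate = nn.Sequential(nn.Conv2d(feat_ch, feat_ch, 1), nn.Sigmoid())

        def forward(self, video):                          # video: (B, T, C, H, W)
            b, t, c, h, w = video.shape
            # Residual (difference) frames approximate inter-frame motion.
            residuals = video[:, 1:] - video[:, :-1]       # (B, T-1, C, H, W)
            static_feats = self.static_enc(video.reshape(b * t, c, h, w))
            motion_feats = self.motion_enc(residuals.reshape(b * (t - 1), c, h, w))
            # Average over time so the two streams can be fused per pixel.
            static_feats = static_feats.reshape(b, t, -1, h, w).mean(dim=1)
            motion_feats = motion_feats.reshape(b, t - 1, -1, h, w).mean(dim=1)
            # Motion-guided gating of the static features.
            return static_feats * self.gate(motion_feats)

    fused = DualBranchEncoder()(torch.randn(2, 25, 1, 128, 128))
    print(fused.shape)  # torch.Size([2, 32, 128, 128])
    ```

    Here the gating simply lets motion features modulate the static features; the paper's guided decoder and attention fusion are more elaborate than this sketch.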

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The clinical motivation is clear and strong. Contrast-based methods are not usable for all patients, so this paper addresses a real need. The model design makes sense for cine MRI. MVO has weak boundaries but visible motion differences, so using both structure and motion separately helps. The uncertainty refinement is a good idea, especially for improving edge quality where predictions may be unclear. The ablation is well done and shows the value of each module. The method performs better than both image-based and video-based baselines. The paper also studies how the model behaves when fewer or more video frames are available.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The dataset has 621 cine MRI videos but only 125 subjects. There is no mention of subject-wise stratification, which could lead to data leakage if scans from the same subject appear in both train and test sets. The authors should clarify this.

    2. The absolute Dice score is still modest (around 0.55), which is not very high. While it improves over baselines, it shows this is still a hard problem, and there’s a long way to go for reliable clinical deployment. However, there is no failure analysis or examples where the model struggles, which would make the paper more complete.

    3. In fact, without the uncertainty refinement, the model achieves 0.5229 Dice, which is nearly the same as Vivim. So the improvement mainly comes from the uncertainty module (+0.0327 Dice), which is not analyzed or visualized. Adding edge-based metrics such as Hausdorff distance would strengthen the evaluation.

    4. There is no ablation for the combination of spatial + motion encoders without the guided decoder. This is important to isolate how much benefit the decoder and attention fusion give.

    5. Only a single data split is used. For private clinical datasets, cross-validation is even more important to show robustness and generalization.

    6. The dataset is not public. Releasing the data, or at least some sample subset, would be helpful for the community, especially given the clinical importance of the problem.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Can you confirm if scans from the same subject appear in both training, validation and testing sets? Have you done any internal cross-validation to check generalization? Could you provide any visualization or analysis of the uncertainty maps? Could you add an ablation with spatial + motion encoders but without the guided decoder, to isolate its contribution?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a meaningful and clinically motivated method, and the model design is strong. However, the lack of clarity around the potential data-leakage issue is serious. If subject-wise splitting was not used, then the evaluation may be overly optimistic due to data leakage. The authors should clarify this clearly. The overall performance is modest in absolute terms, and a key part of the gain relies on the uncertainty refinement module, which is not fully analyzed. If the authors can clarify the data split and address these concerns, I would be open to raising my score.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have clarified the most critical concern: subject-wise data splitting (which rules out data leakage). They also provide evidence of statistical significance and have committed to including edge-based metrics, failure case analysis, and more complete baselines. These are important and should be clearly reflected in the camera-ready version. While the Dice score is still modest, the method is well-motivated and the paper makes a useful contribution. I recommend acceptance, provided the promised additions are made in the final version.



Review #2

  • Please describe the contribution of the paper

    The authors propose a deep learning approach for the diagnosis of microvascular obstruction (MVO) in acute myocardial infarction (AMI) patients that combines an image-segmentation branch based on ResNet with a “motion detection” branch encoding frame differences within the same network architecture. These two macro-blocks are followed by an uncertainty-driven refinement scheme. The approach is compared with static image-based techniques as well as video-oriented techniques from the state of the art (UNet, UNet++, TransUNet, AFB-URR, DPSTT, PNS+, Vivim) on a dataset that appears to be private, collected at and approved by the originating institution.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Combining frame deltas with static images and uncertainty in the learning architecture is a relevant addition to the state of the art.

    The authors' comparison with other video-oriented approaches appears to yield improved results.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The dataset and the algorithm are not public, preventing reproduction of the experiments by peers.

    There is no statistical analysis to reject the null hypothesis that the relatively small improvement observed over at least the best video-based prior approach is due to chance; its statistical significance is therefore not demonstrated.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Frame deltas, static images, and uncertainty are well combined in the approach to improve segmentation. On the other hand, the improvement in the results appears small and is reported without a complete statistical significance analysis.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This work presents a new method to detect microvascular obstruction in cine CMR without using contrast agents. It uses two separate networks to learn static structural features as well as dynamic motion features in the data. These features are integrated using a guided attention mechanism that helps the model focus on important motion cues. The method also includes uncertainty estimation to refine segmentation along the boundary regions. Overall, it offers a safe, contrast-free way to identify MVO in AMI patients with kidney disease.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The proposed method tackles microvascular obstruction detection in cine CMR instead of the conventional LGE CMR, which is a significant clinical step.

    2. The use of residual frames is a clever trick to capture motion while remaining computationally light. It echoes the principles of optical flow without actually computing it.

    3. The uncertainty-based refinement mechanism gives a way to flag ambiguous regions, which is very appropriate for cine CMR where the pathology is inherently fuzzy, while also mimicking how radiologists work in this scenario (a sketch of one such uncertainty estimate follows below).
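    As a hedged illustration of one way such an uncertainty map can be obtained, the sketch below computes per-pixel predictive entropy from segmentation logits; this is a generic recipe and an assumption on our part, not necessarily the estimate used in the paper.

    ```python
    # Hypothetical example: a per-pixel uncertainty map from segmentation logits,
    # using predictive entropy. This is a generic recipe, not necessarily the
    # uncertainty estimate used in the paper.
    import torch
    import torch.nn.functional as F

    def entropy_uncertainty(logits: torch.Tensor) -> torch.Tensor:
        """logits: (B, num_classes, H, W) -> uncertainty map (B, H, W) in [0, 1]."""
        probs = F.softmax(logits, dim=1)
        entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1)
        # Normalize by the maximum possible entropy so the map lies in [0, 1].
        return entropy / torch.log(torch.tensor(float(logits.shape[1])))

    logits = torch.randn(1, 2, 128, 128)           # e.g. background vs. MVO
    uncertainty = entropy_uncertainty(logits)
    ambiguous = uncertainty > 0.8                  # flag unsure, boundary-like pixels
    print(uncertainty.shape, ambiguous.float().mean().item())
    ```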
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. The evaluation appears somewhat limited, as several SOTA methods, such as nnU-Net and SwinUNetR, are not included for comparison.

    2. Although the reported metrics provide a general performance overview, incorporating boundary-based metrics (e.g., Hausdorff Distance, ASD) is especially important in clinical applications to assess the precision of segmentation.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The evaluation lacks comparison with several state-of-the-art models, and the set of evaluation metrics should include at least one boundary-based metric.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed both of my concerns in their response.




Author Feedback

We thank reviewers R1, R2, and R3 for their supportive and encouraging feedback on our work, including the novel and practical dual-branch and uncertainty refinement design (R1, R2, R3), extensive validation (R1, R2), and strong clinical motivation and impact (R2, R3). Our responses, organized by topic, are provided below.

R1 W1, R2 W6: Dataset and Algorithm Release. The scripts, along with detailed instructions, will be released upon paper acceptance. The original dataset used in this study, however, is proprietary, collected exclusively from a partner hospital for a specific purpose, and cannot be publicly shared.

R1 W2: Statistical Significance. The Wilcoxon Signed-Rank Test was used to further evaluate model performance by comparing our proposed method with the best video-based method, Vivim. The resulting p-values (0.006 for Dice, 0.009 for Jaccard) indicate a statistically significant improvement over the best baseline. We will also add a more comprehensive statistical analysis for all baseline comparisons to the revised version.

R2 W1: Data Partition. We confirm that our data partitioning follows a subject-wise stratification strategy, ensuring that scans from the same subject are never included in both the training and test sets. This clarification will be added to the revised version.

R2 W2: Failure Analysis. We acknowledge that there is still room for improvement in this preliminary study, particularly due to the limited dataset. To illustrate this, we will include an additional MVO segmentation example, highlighting a case where our proposed method encounters challenges due to barely detectable region sizes and subtle motion differences.

R2 W3, R3 W2: Adding Edge-based Metrics and Further Elaboration on the Uncertainty Scheme. A comprehensive analysis using the Hausdorff Distance (HSD) metric was conducted, in which our proposed method still shows the best performance compared with the best video-based method Vivim (7.239 versus 8.570); it will be added to the revised version to strengthen the evaluation. The refined uncertainty map (grayscale) will also be included in the revised version to demonstrate the effectiveness of the proposed uncertainty scheme, which significantly enhances segmentation quality on ambiguous boundaries.

R2 W4: Ablation Study on the Guided Decoder Alone. We did not report this ablation because the guided decoder is tightly integrated with the motion encoder. Its role is to reduce and refine feature maps by selecting the most relevant information through an attention mechanism. Without this design, the model struggles to effectively fuse the single spatiotemporal feature extracted from the video sequence with the multiple spatial features from the residual sequence. We previously validated this by directly adding spatiotemporal video embeddings to the further-processed residual embeddings after temporal convolution, followed by a conventional decoder. This approach resulted in degraded performance compared with the motion encoder + guided decoder configuration. We will add this finding to the revised version for a more thorough analysis of each module's contribution.

R2 W5: Data Splits and Robustness. Given time and computational constraints, we prioritized a fixed training and test split to ensure consistency and reproducibility, which is critical for establishing a clear baseline for future comparisons. Nonetheless, we recognize the potential benefits of cross-validation for further validating model robustness and generalization, and we plan to explore this in future studies.

R3 W1: Expanded Baseline Comparison. We selected UNet, UNet++, and TransUNet as our primary baselines, as these models are commonly used in similar studies on medical video analysis. As suggested by the reviewer, we will include additional baselines (e.g., SwinUNet) in the revised version for a more comprehensive comparison.
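
For reference, the sketch below illustrates the two evaluation safeguards mentioned in the rebuttal: a paired Wilcoxon signed-rank test on per-case Dice scores and a subject-wise data split. The scores, subject IDs, and split parameters are placeholders of our own, not the study's data or exact protocol.

```python
# Illustrative sketch of the two evaluation safeguards discussed above:
# a paired Wilcoxon signed-rank test on per-case Dice scores and a
# subject-wise train/test split. Scores and subject IDs are placeholders.
import numpy as np
from scipy.stats import wilcoxon
from sklearn.model_selection import GroupShuffleSplit

# Paired per-case Dice scores (same cases, two methods).
dice_ours = np.array([0.61, 0.55, 0.58, 0.60, 0.52, 0.57, 0.63, 0.54])
dice_baseline = np.array([0.60, 0.53, 0.55, 0.56, 0.47, 0.51, 0.56, 0.46])
stat, p_value = wilcoxon(dice_ours, dice_baseline)
print(f"Wilcoxon signed-rank p-value: {p_value:.4f}")

# Subject-wise split: all scans from one subject stay on the same side.
scan_ids = np.arange(12)
subject_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])  # scan -> subject
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(scan_ids, groups=subject_ids))
# No subject should appear on both sides of the split.
assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])
print("train scans:", train_idx, "test scans:", test_idx)
```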




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This work is well motivated by clinical requirements of MVO segmentation from non-contrast cine MRI. The present method combines the spatiotemporal sensitive network and the uncertainty refinement work, and obtains effective performance with convincing validations. Main concerns about the dataset usage, evaluation metric, and statistical test are well explained by the author.


