Abstract

The segmentation of the hepatic vasculature in surgical videos holds substantial clinical significance in the context of hepatectomy procedures. However, owing to the dearth of an appropriate dataset and the inherently complex task characteristics, few studies have been reported in this domain. To address this issue, we first introduce a high-quality, frame-by-frame annotated hepatic vasculature dataset containing 35 long hepatectomy videos and 11442 high-resolution frames. On this basis, we propose a novel high-resolution video vasculature segmentation network, dubbed HRVVS. We embed a pretrained visual autoregressive modeling (VAR) model into different layers of the hierarchical encoder as prior information to reduce the information degradation generated during the downsampling process. In addition, we design a dynamic memory decoder on a multi-view segmentation network to minimize the transmission of redundant information while preserving more details between frames. Extensive experiments on surgical video datasets demonstrate that our proposed HRVVS significantly outperforms the state-of-the-art methods. The source code and dataset will be publicly available at https://github.com/scott-yjyang/HRVVS.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0929_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/scott-yjyang/HRVVS

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{YaoXin_HRVVS_MICCAI2025,
        author = { Yao, Xincheng and Yang, Yijun and Guo, Kangwei and Xiao, Ruiqiang and Zhou, Haipeng and Tao, Haisu and Yang, Jian and Zhu, Lei},
        title = { { HRVVS: A High-resolution Video Vasculature Segmentation Network via Hierarchical Autoregressive Residual Priors } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15969},
        month = {September},
        pages = {265--275}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a hepatic vasculature dataset labelled for semantic segmentation of both images and videos. It also introduces a video segmentation network called High-resolution Video Vasculature Segmentation Network (HRVVS), which combines several recent computer vision concepts such as visual autoregressive modelling, a multi-task guided multi-view attention network, and global semantic-guided sub-image feature weight allocation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The paper introduces a hepatic vasculature segmentation dataset in clinical video scenarios.
    2. It also presents a segmentation network called High-resolution Video Vasculature Segmentation Network (HRVVS).
    3. The proposed HRVVS method is compared with several existing image-based and video-based segmentation networks on the introduced dataset.
    4. The manuscript mentions that the dataset will be released upon acceptance, which would be a valuable contribution to the research community.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. For the introduced dataset, the expertise of the labelling team is not mentioned, and it is not clear how many annotators labelled each instance.
    2. Figure 2, which shows the HRVVS architecture, is not well explained in the Method section. For example, Section 2.2 introduces many key modules, such as the VAR branch, the multi-view branch, and an encoder, without giving a specific explanation or context for each component.
    3. Section 2.3 states, ‘As shown in Fig 2, the MSIM module effectively combines ….’ It is not clear how showing the module in Figure 2 confirms the effectiveness of MSIM; this claim needs more experiments, details, and results.
    4. In column 2 of Table 1, some of the venues listed for the compared papers are incorrect. For example, SLT-Net was published in ‘Computers in Biology and Medicine’, not CVPR, and ISNet was published in CVPR, not ECCV.
    5. The motivation mentions ‘Discontinuities between frames and abrupt positional transformations’, but the results section does not discuss how the proposed method solves these issues.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method section lacks clarity and coherence, making it difficult to understand how the different modules interact. Additionally, while the motivation section outlines several important challenges, the connection between these challenges and the proposed solution is either weak or missing entirely.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The rebuttal has addressed the majority of my concerns, and I believe the introduction of the new high-resolution dataset will be a valuable resource for the research community. Additionally, the application of the newly developed autoregressive approach in the domain of medical image segmentation is both interesting and promising, based on the results presented in this paper. This work has the potential to open new research directions in the field.



Review #2

  • Please describe the contribution of the paper

    Develops a high-resolution video segmentation model for hepatic vasculature and introduces the first high-resolution video hepatic vasculature segmentation dataset for surgical scenes.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Introduces a novel algorithm for surgical video segmentation and a new dataset.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The experiments are performed only on the single proposed HRVVS dataset. The comparison methods are insufficient; please add more, such as Mamba-based methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The dataset is valuable for the development of CAI. The proposed algorithm combines VAR and dynamic memory to handle temporal information.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel high-resolution video segmentation framework for hepatic vasculature in surgical scenes, leveraging visual autoregressive priors and a dynamic memory decoder. It also presents the annotated dataset for this task.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The method uses a pretrained VAR model as residual priors in the encoder.
    2. The MSIM and DWFM modules combine features across frames using cross-attention.
    3. The authors introduce Hepa-SEG, a high-resolution dataset for hepatic vasculature segmentation in surgical videos.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Limited novelty
      • While the integration of VAR, MSIM, and DWFM is well-executed, each component is inspired by prior work (e.g., cross-attention in memory-based segmentation [Liang et al., TMM 2020]; patch-weighted fusion [Liang et al., arXiv 2025]).
      • The novelty lies more in the combination and application rather than in any individual innovation.
      • VAR is a recent method, and while its use in segmentation is new, the idea of using pretrained representations as priors has precedent.
    2. Limited comparison with related surgical segmentation tasks
      • The baseline methods are mostly general image/video segmentation models or models for polyp/ultrasound segmentation.
      • While the authors plan to release a new dataset, the surgical segmentation task itself is not new, and the work lacks sufficient comparison with existing surgical video segmentation literature.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper’s technical novelty is moderate and the comparison with prior surgical video segmentation work is limited, it presents a clear contribution through the creation of a high-resolution (1080×1920) surgical video dataset for hepatic vasculature segmentation. Most existing datasets are low-resolution and not suited for fine vessel analysis, making this dataset a valuable resource.

    If accepted, the dataset has the potential to serve as a benchmark for future video segmentation research in medical and surgical imaging.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely appreciate the reviewers’ constructive feedback, which has improved our manuscript. Below, we address all major concerns.

#R1 Dataset and Comparisons Thanks. We focus on Video Vasculature Segmentation. Our collected dataset is the only well-annotated one for surgical vasculature segmentation, and no other benchmarks are available for comparison. Following your suggestion, we additionally compare against 3 Mamba-based methods, Mamba-Sea (TMI25), SCSegamba (CVPR25), and LGRNet (MICCAI24), on our dataset. Our HRVVS surpasses these by 9.34, 7.01, and 2.65 in Dice (%), respectively. Vivim, which we have already compared against, is also a Mamba-based method. Altogether, we now compare against 11 methods, robustly validating HRVVS’s performance.
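For readers less familiar with the metric quoted in these comparisons, the Dice score between a predicted mask and a ground-truth mask can be sketched as follows. This is a minimal illustration of the standard definition, 2|A∩B| / (|A| + |B|); the function name `dice_score`, the handling of two empty masks, and the use of NumPy are assumptions made here, and the rebuttal does not state how scores are aggregated over frames or classes.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# One overlapping pixel, three foreground pixels in total.
example = dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

A difference of, say, 2.65 in Dice (%) then corresponds to a gap of 0.0265 on this 0-to-1 scale.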

#R2 1. Dataset Annotation Expertise Eight hepatobiliary-pancreatic surgeons ensured annotation quality: 4 junior surgeons (≥10 cases, 2+ years) annotated GP/HV in laparoscopic images; 2 mid-level surgeons (≥50 cases, 5+ years) conducted preliminary reviews; and 2 senior surgeons (≥200 cases) developed the annotation guidelines and performed post-annotation audits by verifying every frame.

2. Model Architecture (Fig. 2) Inspired by VAR’s encoding and autoregressive generation, HRVVS integrates four components. VAR Branch: a pretrained VQ-VAE/VAR with CNN-based adapters (input adaptation, Transformer finetuning, cross-scale alignment) attached through residual connections. Multi-View Branch: a Swin-B Transformer processes 4 local views and 1 global view for hierarchical cross-scale fusion. MSIM: before the decoder, performs a preliminary fusion of local, global, and multi-scale historical features from our memory bank using multi-head cross-attention. DWFM: an attention-based module in the last decoder layer that fuses local and global features patch-wise, using patch weights derived from previous-frame information. This fusion smooths boundaries in each local feature and mitigates discontinuity in the segmentations.
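The "4 local / 1 global views" input of the Multi-View Branch can be pictured with a small sketch. The rebuttal does not say how the views are built, so everything below is an assumption for illustration: the local views are taken as four non-overlapping quadrants, the global view is a 2x strided subsampling of the full frame, and the name `split_views` is hypothetical. The 1080×1920 frame size is the resolution quoted by Reviewer #3.

```python
import numpy as np


def split_views(frame):
    """Split one frame into 4 local quadrants and 1 downsampled global view.

    Assumes even height and width; an odd trailing row or column is dropped.
    Illustrative only: the actual HRVVS view construction may differ.
    """
    height, width = frame.shape[:2]
    half_h, half_w = height // 2, width // 2

    # Local views in row-major order: top-left, top-right, bottom-left, bottom-right.
    local_views = [
        frame[row * half_h:(row + 1) * half_h, col * half_w:(col + 1) * half_w]
        for row in range(2)
        for col in range(2)
    ]

    # Global view: keep every second pixel so it matches the local view size.
    global_view = frame[::2, ::2][:half_h, :half_w]
    return local_views, global_view


# A dummy frame at the resolution mentioned in the reviews.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
local_views, global_view = split_views(frame)
```

With this layout every view has the same spatial size (540×960 here), so one backbone could process all five; the local views keep full-resolution detail while the global view keeps scene context.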

3. MSIM Validation MSIM enables pre-decoder spatiotemporal fusion. The ablation in Table 2 shows that removing the whole module reduces Dice by 0.73%. Further, new experiments that ablate only the history features in MSIM show a Dice reduction of 0.51%. This ablation validates its effect in retrieving temporal information for consistent segmentation.
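The kind of history retrieval that MSIM relies on, multi-head cross-attention between current-frame features and features stored from earlier frames, can be sketched generically as below. This is not the authors' MSIM: learned query/key/value projections, multi-scale pyramids, and the memory-bank update rule are all omitted, and the function name `multi_head_cross_attention` and the token shapes are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(query, memory, num_heads):
    """Attend from current-frame tokens (query) to stored tokens (memory).

    query:  (num_query_tokens, dim)   features of the current frame
    memory: (num_memory_tokens, dim)  features kept from previous frames
    Projections are left out for brevity, so keys and values are the raw
    memory tokens split across heads.
    """
    num_q, dim = query.shape
    num_m, mem_dim = memory.shape
    if dim != mem_dim or dim % num_heads != 0:
        raise ValueError("dims must match and be divisible by num_heads")
    head_dim = dim // num_heads

    # Split the channel dimension into heads: (heads, tokens, head_dim).
    q = query.reshape(num_q, num_heads, head_dim).transpose(1, 0, 2)
    k = memory.reshape(num_m, num_heads, head_dim).transpose(1, 0, 2)
    v = k

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, num_q, num_m)
    attn = softmax(scores, axis=-1)
    fused = attn @ v                                        # (heads, num_q, head_dim)

    # Merge heads back into (num_q, dim).
    return fused.transpose(1, 0, 2).reshape(num_q, dim), attn


rng = np.random.default_rng(0)
current_tokens = rng.normal(size=(5, 8))
history_tokens = rng.normal(size=(7, 8))
fused_tokens, attention = multi_head_cross_attention(current_tokens, history_tokens, num_heads=2)
```

Under this reading, ablating the history features amounts to emptying `history_tokens`, so the current frame can no longer pull information from earlier frames.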

4. Table 1 Venue Thanks. The venues in Table 1 are correct, but the references attached to them were wrong. The intended works are SLT-Net: Cheng et al., Implicit Motion Handling for Video Camouflaged Object Detection (CVPR22), and ISNet: Qin et al., Highly Accurate Dichotomous Image Segmentation (ECCV22). We will fix them.

5. Discontinuities Mitigation MSIM and DWFM enhance inter-frame continuity via hierarchical spatiotemporal modeling. MSIM uses multi-view pyramids and multi-head cross-attention (MHCA) to build cross-frame global associations, which effectively mitigates segmentation discontinuities caused by anatomical and imaging variations. DWFM decomposes features into patch units, takes the current and previous global features as reference, and fuses the current and historical global features to form a recursive memory model that suppresses boundary fragmentation. Their cross-layer interaction and temporal memory improve vascular segmentation continuity in high-resolution surgical videos. Ablations that remove historical guidance in MSIM and DWFM decrease Dice by 0.51% and 2.24%, respectively, validating the effectiveness of the memory-driven design.

#R3

  1. Novelty Our model introduces three key innovations: (1) First use of VAR as a segmentation encoder via task-specific adapters, enabling cross-domain transfer from image generation to segmentation; (2) MSIM’s multi-scale memory fusion for dynamic spatiotemporal feature association, surpassing static fusion methods; (3) DWFM’s integration of historical segmentation into dynamic patch weighting for recursive temporal memory, distinct from Liang et al.’s static frameworks. Experiments validate these advancements in high-resolution video segmentation.

  2. Surgical Literature Comparison HRVVS outperforms SurgicalSAM/MATIS by 11.48%/3.49% in Dice. Unlike prior work on surgical instruments or 3D vasculature segmentation, it’s the first method for intraoperative hepatic vasculature segmentation.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    Please address the major weaknesses of the paper.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper introduces HRVVS, a high-resolution video segmentation model for hepatic vasculature, and the new Hepa-SEG surgical video dataset. Reviewers initially praised the dataset and the novel VAR-based and memory-driven modules but noted insufficient comparisons, unclear architecture descriptions, and missing annotation details. After the rebuttal, all three reviewers provided positive scores and their major concerns were addressed. Given the solid performance improvements, dataset value, and mostly resolved critiques, I recommend acceptance.


