Abstract

Digital Subtraction Angiography (DSA) sequences serve as the foremost diagnostic standard for cerebrovascular diseases (CVDs). Accurate cerebrovascular segmentation in DSA sequences assists clinicians in analyzing pathological changes and pinpointing lesions. However, existing methods commonly use a single frame extracted from a DSA sequence for cerebrovascular segmentation, disregarding the temporal information inherent in these sequences. This rich temporal information has the potential to improve segmentation coherence while reducing interference caused by artifacts. Therefore, in this paper, we propose a spatio-temporal consistency network for cerebrovascular segmentation in DSA sequences, named DSNet, which fully exploits the information in DSA sequences. Specifically, DSNet comprises a dual-branch encoder and a dual-branch decoder. The encoder consists of a temporal encoding branch (TEB) and a spatial encoding branch (SEB). The TEB is designed to capture dynamic vessel flow information, and the SEB extracts static vessel structure information. To effectively capture the correlations among sequential frames, a dynamic frame reweighting (DFW) module in the TEB skip connections adjusts the weights of the frames. In the bottleneck, a spatio-temporal feature alignment (STFA) module fuses the features from the encoder to obtain a more comprehensive vascular representation. Moreover, DSNet employs an unsupervised loss for consistency regularization between the dual outputs of the decoder during training. Experimental results demonstrate that DSNet outperforms existing methods, achieving a Dice score of 89.34% for cerebrovascular segmentation.
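The abstract describes the DFW module only at a high level. The snippet below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes an SE-style design in which every frame receives one scalar score and the scores are normalized with a softmax over the time axis. The scoring rule (global average intensity), the temperature parameter, and the helper names frame_weights and temporal_fuse are hypothetical stand-ins for learned layers; plain Python lists are used so the sketch runs without a deep learning framework.

```python
import math


def frame_weights(frames, temperature=1.0):
    """Softmax over one scalar score per frame (hypothetical scoring rule).

    frames: list of T frames, each an H x W nested list of floats.
    The score is the frame's global average intensity, a stand-in for
    whatever learned scoring a real DFW module would use.
    """
    scores = [sum(sum(row) for row in f) / (len(f) * len(f[0])) for f in frames]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_fuse(frames, weights):
    """Weighted sum over the time axis, giving a single H x W map."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(wt * f[i][j] for f, wt in zip(frames, weights)) for j in range(w)]
            for i in range(h)]


# Toy sequence: contrast (brightness) increases over three 2 x 2 frames.
seq = [[[0.0, 0.0], [0.0, 0.0]],
       [[1.0, 1.0], [1.0, 1.0]],
       [[2.0, 2.0], [2.0, 2.0]]]
weights = frame_weights(seq)
fused = temporal_fuse(seq, weights)
```

With learned scores in place of the average intensity, the per-frame weights are also the quantity one would inspect to see which frames the module favors.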

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1357_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Xie_DSNet_MICCAI2024,
        author = { Xie, Qihang and Zhang, Dan and Mou, Lei and Wang, Shanshan and Zhao, Yitian and Guo, Mengguo and Zhang, Jiong},
        title = { { DSNet: A Spatio-Temporal Consistency Network for Cerebrovascular Segmentation in Digital Subtraction Angiography Sequences } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript introduces DSNet, a novel network enhancing cerebrovascular segmentation in DSA sequences by integrating both spatial and temporal information. It features innovations like a dual-branch encoder and dynamic frame weighting, significantly improving segmentation accuracy and performance over traditional methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Integration of Spatio-Temporal Information: DSNet uses both spatial and temporal information through its dual-branch architecture, which processes dynamic flow and static structure information separately.
    2. Dynamic Frame Weighting: The paper introduces a mechanism that adjusts the significance of each frame based on its relevance. This method helps reduce redundancy and focuses processing on more informative frames, enhancing the efficiency and accuracy of the segmentation process.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Limited Dataset Size: The use of a dataset comprising only 70 DSA sequences, despite being subjected to 5-fold cross-validation, may indeed be a limitation for robustly evaluating the DSNet model. A larger dataset would typically provide a more comprehensive assessment of the model’s generalizability and robustness across a wider range of cases and variations. Small datasets can sometimes lead to overfitting, where a model might perform exceptionally well on the given data but fail to generalize to new, unseen datasets. Expanding the dataset or supplementing it with additional external validation could help address this issue.
    2. Comparative Model Selection: The comparison of DSNet, which utilizes both spatial and temporal information, against models traditionally designed for static image segmentation highlights an important oversight. For a fair assessment of DSNet’s efficacy in leveraging temporal information for improved segmentation, it would be more appropriate to compare it against other models that also consider temporal dynamics. Although temporal information is not commonly used in medical image segmentation, it is extensively utilized in other areas such as video processing and motion analysis. Incorporating comparative models that handle temporal data, even if through a simple implementation, would provide a clearer picture of how effectively DSNet uses this information compared to possible alternatives.
    3. Lack of Computational Efficiency Analysis: The absence of a detailed discussion on computational efficiency is a significant gap, especially given the potentially increased complexity of DSNet’s architecture. With more parameters due to its dual-branch structure and dynamic frame weighting, it’s crucial to assess whether the performance improvements justify the additional computational cost. A comparison of the computational resources required, such as memory usage, processing time, and power consumption, between DSNet and standard models would be informative. This analysis would help determine whether the improvements are due to the novel architecture or merely to an increase in model capacity.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    To enhance the paper, consider expanding the dataset to improve generalization, comparing DSNet with other temporal-based models for a clearer performance assessment, and discussing the computational efficiency to evaluate practical deployment potential.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My recommendation on the overall score for this paper is based on two main concerns: the small dataset of only 70 DSA sequences and inadequate comparative experiments with non-temporal models, which may not sufficiently demonstrate the proposed method’s effectiveness.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces DSNet, a spatio-temporal consistency network for cerebrovascular segmentation in DSA sequences. One of its main contributions is the ability to fully utilize the temporal information in DSA sequences, achieving accurate cerebrovascular segmentation by capturing dynamic vessel flow and static vessel structure information. The proposed network integrates innovative modules such as the Dynamic Frame reWeighting module, the Spatio-Temporal Feature Alignment module, and an unsupervised optimization strategy, effectively enhancing segmentation coherence and reducing interference caused by artifacts. Moreover, DSNet has demonstrated superior performance compared to existing methods, showcasing its potential for practical use in clinical applications.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. Innovative Use of Temporal Information: The paper introduces DSNet, a spatio-temporal consistency network for cerebrovascular segmentation in DSA sequences, which effectively captures dynamic vessel flow and static vessel structure information. This innovative use of temporal information sets DSNet apart, providing a comprehensive understanding of cerebrovascular dynamics and enhancing segmentation coherence.
    2. Novel Network Modules: DSNet incorporates the Dynamic Frame reWeighting module and the Spatio-Temporal Feature Alignment module to capture sequence frame correlations and align spatio-temporal features, demonstrating a unique approach to integrating time dynamics and spatial structure.
    3. Clinical Feasibility Demonstration: The paper showcases the clinical feasibility of DSNet, highlighting its potential for practical use in clinical applications. This is demonstrated through ethical approval, informed consent, and implementation on specialized hardware, emphasizing the real-world application of the proposed method.
    4. Superior Performance & Evaluation: DSNet outperforms existing methods in cerebrovascular segmentation, achieving a Dice score of 89.34% and excelling in maintaining vascular connectivity with a clDice score of 86.26%. The comprehensive evaluation against state-of-the-art methods demonstrates the robustness and effectiveness of DSNet in cerebrovascular segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Dependency on Proprietary Datasets: The paper does not explicitly address the potential dependency on proprietary datasets and annotations, which can be a critical weakness. This is a relevant concern given the scarcity of accurately annotated data and the urgent need for developing multi-modality label-efficient deep learning techniques.
    2. Addressing Dynamic Vessel Deformations: The paper does not comprehensively address the segmentation of vessel deformations and the associated challenges, which other works have identified as a significant aspect of segmentation for cerebrovascular diseases [1].

    [1] Meijs, M., Patel, A., van de Leemput, S.C., et al.: Robust Segmentation of the Full Cerebral Vasculature in 4D CT of Suspected Stroke Patients. Sci Rep 7, 15622 (2017). https://doi.org/10.1038/s41598-017-15617-w

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. It is recommended to use publicly available datasets to test the effectiveness of a network, or to publicly share private datasets and algorithm code to validate the reproducibility of the algorithm.
    2. How do the temporal encoding branch (TEB) and the spatial encoding branch (SEB) function differently? The authors should explain this clearly.
    3. In the spatial encoding branch, is the input single-frame image randomly selected, or is another selection method used? The authors should explain this clearly.
    4. In the comparative experiments, although performance improved, the authors did not compare against a network specifically designed for DSA segmentation.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1. Innovative Methodology: The introduction of DSNet and its unique approach to utilizing spatio-temporal information in cerebrovascular segmentation.
    2. Dependency on Proprietary Datasets: The paper does not explicitly address the potential dependency on proprietary datasets and annotations, which can be a critical weakness. This is a relevant concern given the scarcity of accurately annotated data and the urgent need for developing multi-modality label-efficient deep learning techniques.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    The authors’ rebuttal did not address some of my concerns, so I recommend rejection.



Review #3

  • Please describe the contribution of the paper

    The authors present a network architecture with two proposed layers and a loss function to specifically incorporate the temporal information present in Digital Subtraction Angiography. They use a two-branch architecture to encode the temporal image sequence and the spatial MIP image, and combine this information for the respective decoding stages. The output of each branch is used in an unsupervised way as a pseudo label for the other branch. Their novelty includes the fusion layer of both branches and a temporal weighting for the skip connections.
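Both the abstract and this review describe the consistency term only in words. As a hedged illustration, assuming a cross pseudo-label formulation in which each branch's thresholded prediction supervises the other branch through binary cross-entropy, the loss could be sketched as follows. The 0.5 threshold, the choice of binary cross-entropy, and the function names are assumptions, not details confirmed by the paper.

```python
import math


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


def hard_labels(probs, threshold=0.5):
    """Threshold a probability map into a 0/1 pseudo label."""
    return [1.0 if p >= threshold else 0.0 for p in probs]


def cross_pseudo_loss(p_temporal, p_spatial):
    """Each branch is supervised by the other branch's thresholded output.

    In an autograd framework the pseudo labels would be detached, so the
    gradient flows only through the branch being supervised.
    """
    return (bce(p_temporal, hard_labels(p_spatial))
            + bce(p_spatial, hard_labels(p_temporal)))


# Flattened per-pixel vessel probabilities from the two branches.
agree = cross_pseudo_loss([0.9, 0.1], [0.9, 0.1])
disagree = cross_pseudo_loss([0.9, 0.1], [0.1, 0.9])
```

No ground-truth mask enters this term, which is what makes it unsupervised: the loss is small when both branches confidently agree and large when they contradict each other.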

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method exploits two DSA-inherent properties (1. possibility to easily generate MIP vessel images and 2. the temporal nature of the approach) and combines them in the training process. They present a very solid evaluation including multiple SotA baselines and metrics, an ablation study for all proposed features and qualitative examples of their results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is not mentioned in the paper how many images are used for the training. Furthermore, it is unclear what the input to all the baseline models is, making a fair comparison difficult to interpret for the reader.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors use an in-house dataset with unclear specifications of the training set. The method itself is described in a clear way for possible reimplementation, however due to the extent and complexity of the network architecture, providing the architecture in an open-source fashion would be highly encouraged.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In general, the paper is well written and contains a complete evaluation. To improve the manuscript, the authors should address the following open questions: The authors mention that the batch size B and temporal dimension T are merged into N when training the TEB. Additionally, the network is trained with a batch size of two. Can the two samples of the batch influence each other or share information during the training process?

    It is mentioned in the motivation that treating sequences as video data introduces unwanted redundancy. However, your method also uses multiple subsequent frames of the DSA procedure. Are you using significantly fewer images due to the resampling, or how are you reducing the unwanted redundancy in your method? Additionally, how is the temporal resampling of your input implemented? Is it just a linear interpolation? If so, how would you assess the influence of the topological/spatial incorrectness of the resulting interpolated vessel paths?

    A major concern is the lack of information about the model’s training input. The paper does not mention the essential information of training (and validation) set size. This is crucial to evaluate the generalizability / data hungriness of the model. Furthermore, the input of the baselines is not mentioned: do they receive individual DSA images, the MIP image, or temporal sequences? This information is highly important for a fair comparison of the models and for assessing the amount of information available to each model.

    Your backbone is already superior to the nnUnet. As this is already an achievement, it would be important to describe how the backbone is defined, e.g., how are features from both branches combined here?

    An interesting sub-analysis would include the computed weights for the individual frames in your reweighting layer. Are some frames (e.g., later frames) more important than others? This information could help to further reduce the input size (in the temporal dimension).

    Considering real-world applications: can you comment on the diversity of the data, e.g., are the training data all acquired from exactly the same position? How much would different protocols and scanner positions influence the generalizability of your results?

    Last, the presented method is very modality-specific. Are there other application fields where your developed methods could bring a benefit or is your method limited to DSA?

    Minor mistakes: in Section 2.2, first paragraph, “Dimensional T” should read “dimension T”; in Section 3.3, “Table . 2.” contains an extra “.” before the number.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors present a very modality-specific architecture but provide a thorough and complete analysis of the network components with a good methodological description. Given a satisfactory clarification in the rebuttal of the amount of training data used and of the inputs to the baselines, I would recommend acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal answered some of my questions but did not fully convince me regarding the data processing questions (interpolation, data acquisition, redundant information). I do not fully agree with the other reviewers’ criticism regarding the small dataset size. Although it is, of course, a small amount of data, collecting a dataset in this area is extremely tedious, and the authors have already made extra efforts to increase the dataset size compared to other works.

    In total, my opinion remains unchanged, leaning towards accept.




Author Feedback

We appreciate your positive feedback on our technical novelties (e.g., “novel network”-R1, “innovative use of temporal information”-R3, and “exploit DSA-inherent properties”-R4), and the effectiveness (e.g., “superior performance”-R3, and “solid evaluation”-R4). We kindly request that you reconsider our work in light of the following points:

1. Dataset and dataset size (R1, R3, R4): Our dataset includes multi-device, multi-disease DSA sequences with both sagittal and coronal views at multiple resolutions, which increases the diversity of the data distribution. It contains 70 sequences: 20 for an independent test set and 50 for 5-fold cross-validation. Neurosurgeons meticulously annotated the dataset, a challenging and time-consuming task that often requires 2-3 clinical days per sequence. Existing studies rely on smaller private DSA datasets, such as [3] with 20 images and [4] with 30 images, and there are currently no publicly available DSA datasets. We therefore intend to release a larger DSA sequence dataset that includes more than the 70 sequences used in this study.
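The rebuttal states the split sizes (20 independent test sequences, 50 for 5-fold cross-validation) but not how sequences are assigned to partitions. A minimal sketch, assuming a random sequence-level assignment with a fixed seed; the seed, the round-robin fold rule, and the function name split_sequences are hypothetical.

```python
import random


def split_sequences(n_total=70, n_test=20, k=5, seed=0):
    """Sequence-level split: a held-out test set plus k folds on the rest.

    Splitting by sequence (not by frame) keeps all frames of one DSA
    sequence in the same partition, so no sequence leaks between the
    training and validation sets of a fold.
    """
    ids = list(range(n_total))
    random.Random(seed).shuffle(ids)  # hypothetical random assignment
    test_ids, dev_ids = ids[:n_test], ids[n_test:]
    folds = []
    for i in range(k):
        val = dev_ids[i::k]  # every k-th development sequence
        val_set = set(val)
        train = [s for s in dev_ids if s not in val_set]
        folds.append((train, val))
    return test_ids, folds


test_ids, folds = split_sequences()
```

With the stated numbers, each fold trains on 40 sequences and validates on 10, and the 20 test sequences are never seen during cross-validation.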

2. Comparative experiment (R1, R3, R4): DSA sequences dynamically display vessel flow, and each frame captures only part of the contrast agent. Existing DSA methods use only single-frame inputs due to their network designs, so they fail to cover the entire cerebrovasculature and its pathological features; moreover, they are not open-sourced for comparison. To ensure a fair comparison, we evaluated other SOTA methods with the same sequence input as our method. TransUnet achieves a Dice of 85.00 on DSA sequences, lower than with single-frame input. nnUnet shows only subtle changes, with a Dice of 87.64. This suggests that these methods struggle to extract complete sequential features, whereas our method fully leverages spatiotemporal information through the carefully designed DFW and STFA modules. We also benchmark against a recent sequence-based method, CAVE, which achieves a Dice of 84.39, much lower than the 89.34 achieved by our method.
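For reference, the Dice score quoted throughout (e.g., 89.34 for DSNet and 84.39 for CAVE) is the standard overlap measure between a predicted and a ground-truth binary mask, usually reported as a percentage. A minimal sketch on flattened masks; the smoothing constant eps is an assumption that only matters when both masks are empty.

```python
def dice(pred, target, eps=1e-8):
    """Dice = 2 * |P & G| / (|P| + |G|) for flattened binary masks.

    eps keeps the ratio defined when both masks are empty (it then equals 1).
    """
    inter = sum(p * g for p, g in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


# Toy 2 x 2 masks, flattened row by row: one of two predicted pixels is correct.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])
```

Here the overlap is one pixel out of 2 + 1 labeled pixels, so the score is 2/3, or about 66.67 when expressed as a percentage.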

3. Computational efficiency (R1): The parameter count depends not only on the structural design but also on the channel width. For example, UNet has a maximum of 1024 channels, while our method has a maximum of 512 channels and fewer parameters than UNet. We computed parameters/FLOPs for the models using sequence inputs: U-Net (31.0M/219.4G), TransUnet (105.9M/128.6G), nnUNet (7.4M/54.8G), CAVE (83.2M/4513.2G), and our DSNet (17.05M/236.3G). Despite a slight increase in FLOPs compared with U-Net, our method performs the best, demonstrating its effectiveness and superiority.
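The point about channel width can be checked with the standard parameter count of a 2-D convolution, k*k*C_in*C_out weights plus C_out biases, which grows quadratically with width. The sketch below compares the deepest double-convolution block of a classic U-Net with the same block capped at 512 channels; it ignores normalization layers and is illustrative only, not DSNet's actual layer configuration.

```python
def conv2d_params(c_in, c_out, k=3, bias=True):
    """Weights (k * k * c_in * c_out) plus one bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)


# Deepest double-conv block of a classic U-Net (512 -> 1024 -> 1024)
# versus the same block capped at 512 channels (256 -> 512 -> 512).
wide = conv2d_params(512, 1024) + conv2d_params(1024, 1024)
capped = conv2d_params(256, 512) + conv2d_params(512, 512)
ratio = wide / capped
```

Halving the width of this block cuts its parameters by almost a factor of four (about 14.2M versus 3.5M), which is why a capped channel width can offset the cost of a second branch.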

4. Function and input of the TEB and SEB (R3): The TEB and SEB have the same structure but differ in their inputs. The TEB extracts temporal flow information from a sequence, while the SEB extracts spatial contextual information from a MIP image.
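The SEB input is described as a MIP image of the sequence. A minimal sketch of a pixel-wise intensity projection over time follows. It assumes the frames are already registered and that opacified vessels are bright, so the maximum keeps every vessel pixel that was filled at any time point; if vessels appear dark (a common DSA display convention), the minimum projection (mode="min") would play the same role. The function name and the mode parameter are hypothetical.

```python
def intensity_projection(frames, mode="max"):
    """Pixel-wise projection of T registered frames (each H x W) onto one image."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    reduce = max if mode == "max" else min
    h, w = len(frames[0]), len(frames[0][0])
    return [[reduce(f[i][j] for f in frames) for j in range(w)] for i in range(h)]


# Contrast reaches a different vessel pixel in each frame (1 = opacified).
frames = [[[1, 0], [0, 0]],
          [[0, 1], [0, 0]],
          [[0, 0], [1, 0]]]
mip = intensity_projection(frames)
```

The projection collects vessels opacified at different times into one static image, which is why it suits the SEB, while the TEB keeps the frame order that carries the flow information.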

5. Influence of the two samples in a batch (R4): In the TEB, sequences with dimensions (B, T, C, H, W) are input, and B and T are merged into N for the convolution operations. This technique, used in many studies such as CAVE and STCN, maintains the independence of different samples.
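The reshaping described above can be illustrated without a deep learning framework. In this sketch, under simplifying assumptions (a single channel, so C is omitted; a 3 x 3 mean filter stands in for a convolution), merge_bt flattens a (B, T, H, W) batch into N = B*T frames, a per-frame operation is applied, and split_bt restores the batch layout. All helper names are hypothetical.

```python
def merge_bt(x):
    """(B, T, H, W) nested lists -> flat list of N = B * T frames, sample-major."""
    return [frame for sample in x for frame in sample]


def split_bt(frames, b):
    """Inverse of merge_bt: regroup N frames into B samples of T frames each."""
    t = len(frames) // b
    return [frames[i * t:(i + 1) * t] for i in range(b)]


def box_filter3(frame):
    """3 x 3 mean filter with edge clamping: a stand-in for one 2-D convolution."""
    h, w = len(frame), len(frame[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [frame[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            row.append(sum(vals) / 9.0)
        out.append(row)
    return out


def per_frame_op(x, op):
    """Apply a per-frame operation to a (B, T, H, W) batch via the merged N axis."""
    return split_bt([op(f) for f in merge_bt(x)], len(x))


def const(v):
    """A 3 x 3 frame filled with the value v."""
    return [[float(v)] * 3 for _ in range(3)]


batch = [[const(1), const(2)], [const(3), const(4)]]    # B = 2, T = 2
changed = [[const(1), const(2)], [const(9), const(9)]]  # only sample 1 differs
out_a = per_frame_op(batch, box_filter3)
out_b = per_frame_op(changed, box_filter3)
```

Changing the frames of sample 1 leaves the output of sample 0 unchanged, which is the independence the rebuttal refers to. This holds for per-frame operations such as convolutions; if the network uses a layer that computes statistics over the merged N axis, such as batch normalization in training mode, that layer would still couple the samples in a batch.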

6. Data resampling and vessel deformation (R3, R4): We resample sequences using the nearest-neighbor algorithm to preserve continuity. During DSA sequence imaging, slight head movements can cause vessel displacement between frames. To address this, we use non-rigid registration to match the vascular topology before resampling, as reported in Sec. 3.
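The rebuttal names nearest-neighbor resampling without giving the index rule. A minimal sketch, assuming the target frames are evenly spaced over the sequence duration (the target length and the helper names are hypothetical): each output slot copies the closest acquired frame, so, unlike linear interpolation, no blended frames with intermediate vessel positions are created. The registration step that the rebuttal says precedes resampling is not shown.

```python
def nearest_indices(t_in, t_out):
    """Index of the nearest original frame for each of t_out evenly spaced times."""
    if t_out == 1:
        return [0]
    step = (t_in - 1) / (t_out - 1)
    # int(x + 0.5) rounds half up for x >= 0 (Python's round() rounds half to even)
    return [int(i * step + 0.5) for i in range(t_out)]


def resample_nearest(frames, t_out):
    """Temporal resampling that only ever copies acquired frames."""
    return [frames[k] for k in nearest_indices(len(frames), t_out)]


down = nearest_indices(10, 4)  # shorten a 10-frame sequence to 4 frames
up = nearest_indices(4, 7)     # lengthen a 4-frame sequence to 7 frames
```

When a sequence is lengthened, frames are repeated rather than synthesized, which keeps every vessel path in the resampled input one that was actually acquired.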

7. Backbone, other applications and redundancy (R4): Our backbone is a straightforward dual-branch encoder-decoder. We believe this dual-branch structure can be applied to TOF-MRA vessel segmentation by using previous slices to guide the current slice. Moreover, ‘redundancy’ refers to repeated vessels in neighboring frames, which cause redundant segmentation and annotation, particularly in video segmentation.

8. Importance of different frames (R4): DSA sequences depict blood flow from bottom to top and from the center outward. We believe that the initial frames are more significant for the large vessels at the bottom, while later frames are more relevant for the fine vessels at the distal end.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This manuscript mainly investigates the way to enhance the segmentation of DSA with temporal information. The reviewers agree that the paper is novel because it concerns a critical application. The main concerns of the reviewers are the quality of the labeled dataset and the method’s technical novelty. The authors made several clarifications during the rebuttal. I feel that the quality of the dataset should not be the main concern if its collection is difficult. I agree with one reviewer that the paper presents an overall task-specific method with good evaluation and strong motivation. I think it is sufficient for a conference publication, and I encourage the authors to extend the method to other applications to expand the impact of this method if they want to publish it in a journal afterward.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


