Abstract

Deep Vein Thrombosis (DVT) has a high incidence rate and poses serious health risks. Therefore, accurate staging is essential for formulating effective treatment plans and improving prognosis. Recent studies have shown the effectiveness of Black-blood Magnetic Resonance Thrombus Imaging (BTI) in differentiating thrombus stages without requiring contrast agents. However, the accuracy of clinical DVT staging is still limited by the experience and subjective assessments of radiologists, underscoring the importance of Computer-aided Diagnosis (CAD) systems for objective and precise thrombus staging. Given the small size of thrombi and their high similarity in signal intensity and shape to surrounding tissues, precise staging using CAD technology poses a significant challenge. To address this, we have developed an innovative classification framework that employs a Global-Local Feature Fusion Module (GLFM) to effectively integrate global imaging and lesion-focused local imaging. Within the GLFM, a cross-attention module is designed to capture global feature information relevant to the local features. Additionally, the Feature Fusion Focus Network (FFFN) module within the GLFM facilitates the integration of features across various dimensions. The synergy between these modules ensures an effective fusion of local and global features within the GLFM framework. Experimental evidence confirms the superior feature-fusion performance of our proposed GLFM, demonstrating a significant advantage over existing methods in DVT staging. The code is available at https://github.com/xiextong/VDPF.
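The abstract describes a cross-attention module that gathers global information guided by local (lesion-focused) features. A minimal single-head sketch, assuming the standard scaled dot-product formulation with queries projected from the local branch and keys/values from the global branch; the projection matrices, token counts, and dimensions are illustrative assumptions, not taken from the VDPF code:

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(local_tokens: Matrix, global_tokens: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention.

    Queries come from the local tokens; keys and values come from the global
    tokens, so each local token aggregates the global context most similar to it.
    """
    q = matmul(local_tokens, w_q)    # (n_local, d_k)
    k = matmul(global_tokens, w_k)   # (n_global, d_k)
    v = matmul(global_tokens, w_v)   # (n_global, d_v)
    scale = math.sqrt(len(k[0]))
    scores = matmul(q, transpose(k))  # (n_local, n_global)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)         # (n_local, d_v)


# Toy usage: 2 local tokens attend over 3 global tokens with identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention([[0.5, 0.1], [0.2, 0.9]],
                        [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
                        identity, identity, identity)
```

Each output row is a convex combination of the global value vectors, so the local branch decides which parts of the global image it draws on.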

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2909_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/xiextong/VDPF

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Xie_VDPF_MICCAI2024,
        author = { Xie, Xiaotong and Ye, Yufeng and Yang, Tingting and Huang, Bin and Huang, Bingsheng and Huang, Yi},
        title = { { VDPF: Enhancing DVT Staging Performance Using a Global-Local Feature Fusion Network } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper developed a novel Computer-aided Diagnosis (CAD) system for Deep Vein Thrombosis (DVT) staging, utilizing a Global-Local Feature Fusion Module (GLFM) with a cross-attention mechanism. This approach improves the accuracy of DVT staging by effectively integrating global and local imaging features, demonstrating superior performance compared to existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The framework/architecture of the approach is well described. This paper marks the first application of Computer-aided Diagnosis technology to DVT staging, expanding the tools available for medical diagnostics in vascular health. The approach improves the specificity and reliability of diagnostics.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The evaluation lacks cross-validation. The absence of cross-validation can lead to an overestimation of the model’s accuracy and reliability; the model might overfit the training data.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The caption of Fig. 1 needs sufficient detail to explain the figure. Alternatively, it may be better to drop this figure and instead describe the challenges in the introduction section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Not relevant to the MICCAI community (0)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors should add cross-validation to their results to avoid potential bias from a single training/validation/test split. The results should include the mean and standard deviation of each metric.
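A minimal sketch of the requested evaluation, assuming folds are formed at the case (patient) level so that slices from one case never appear in both training and test data, and assuming a hypothetical `train_and_score` callback that trains a model and returns a dict of metrics for one fold:

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def patient_level_folds(case_ids: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Split the unique case IDs into k disjoint folds (all slices of a case stay together)."""
    unique = sorted(set(case_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    return [unique[i::k] for i in range(k)]


def cross_validate(case_ids: Sequence[str], k: int,
                   train_and_score: Callable[[List[str], List[str]], Dict[str, float]],
                   seed: int = 0) -> Dict[str, Tuple[float, float]]:
    """Run k-fold CV and return (mean, standard deviation) per metric across folds (needs k >= 2)."""
    folds = patient_level_folds(case_ids, k, seed)
    per_fold: Dict[str, List[float]] = {}
    for i, test_cases in enumerate(folds):
        train_cases = [c for j, f in enumerate(folds) if j != i for c in f]
        for name, value in train_and_score(train_cases, test_cases).items():
            per_fold.setdefault(name, []).append(value)
    return {name: (statistics.mean(v), statistics.stdev(v)) for name, v in per_fold.items()}


# Toy usage: 10 cases with 3 slices each; the scorer is a placeholder, not a real model.
slice_case_ids = [f"case{i}" for i in range(10) for _ in range(3)]
summary = cross_validate(slice_case_ids, k=5,
                         train_and_score=lambda train, test: {"acc": len(test) / 10})
```

Reporting `summary[name]` as mean ± standard deviation over folds gives the per-metric spread the review asks for.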

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The concerns were partially addressed. The cross validation issue was explained but not addressed.



Review #2

  • Please describe the contribution of the paper

    The paper developed an innovative classification framework for clinical DVT staging prediction that employs a Global-Local Feature Fusion Module (GLFM) for the effective integration of global imaging and lesion-focused local imaging. Experimental evidence confirms the superior performance of the proposed GLFM.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. First application of CAD technology to BTI-based DVT staging.

    2. Considers both global imaging and lesion-focused local imaging.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. How is lesion-focused local imaging obtained? Is it obtained by manual annotation or by a segmentation network?
    2. Why does ResNet-50 in Table 4 perform better than Resnet50-DPF in Table 2, considering that Resnet50-DPF integrates both global and local information?
    3. The comparison methods are relatively limited and relatively old.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. How is lesion-focused local imaging obtained? Is it obtained by manual annotation or by a segmentation network?
    2. Why does ResNet-50 in Table 4 perform better than Resnet50-DPF in Table 2, considering that Resnet50-DPF integrates both global and local information?
    3. The comparison methods are relatively limited and relatively old.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are some problems in the experimental validation. See the detailed comments above.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose an innovative classification framework that employs a Global-Local feature fusion module (GLFM) to integrate global and local (lesion-focused) imaging for DVT staging using Black-blood Magnetic Resonance Thrombus Imaging. The method involves slicing the 3D data into global and local images. Next, a ViT-based dual-branch backbone network extracts their respective features, which are fed to a fusion module composed of self-attention and cross-attention blocks. The fused features are fed to a fully connected layer for classification into one of three DVT stages (acute, sub-acute, chronic). The authors compare the performance of their method with other classic classification approaches on 75 validation and 75 test cases.
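A minimal sketch of turning a 3D volume into paired global and local 2D images, assuming (as the author rebuttal describes) that the local image keeps only the radiologist-annotated bounding-box region of a slice and that everything outside it is set to background. The box format, the zero background, and the rule of keeping only annotated slices are illustrative assumptions:

```python
from typing import Dict, List, Tuple

Image = List[List[float]]
BBox = Tuple[int, int, int, int]  # hypothetical format: (row0, col0, row1, col1), half-open


def local_image(global_img: Image, boxes: List[BBox], background: float = 0.0) -> Image:
    """Keep pixels inside any annotated box; set everything else to `background`."""
    def inside(r: int, c: int) -> bool:
        return any(r0 <= r < r1 and c0 <= c < c1 for r0, c0, r1, c1 in boxes)

    return [[v if inside(r, c) else background for c, v in enumerate(row)]
            for r, row in enumerate(global_img)]


def global_local_pairs(volume: List[Image],
                       boxes_per_slice: Dict[int, List[BBox]]) -> List[Tuple[Image, Image]]:
    """Slice a volume (depth x H x W) and pair each annotated slice with its local image."""
    pairs = []
    for idx, slice_2d in enumerate(volume):
        boxes = boxes_per_slice.get(idx, [])
        if boxes:  # assumption: only slices with a thrombus box are used
            pairs.append((slice_2d, local_image(slice_2d, boxes)))
    return pairs


# Toy usage: two 3x3 slices, one box on slice 0.
volume = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
pairs = global_local_pairs(volume, {0: [(0, 0, 2, 2)]})
```

Because the local image keeps the full slice size, a small box leaves it mostly background, which is the situation the reviewer's later comment on I(L) describes.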

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Leveraging both local and global information: The authors claim to have developed an innovative predictive framework that is the first application of CAD technology to BTI-based DVT staging. Additionally, the approach uses self- and cross-attention mechanisms to leverage both global and local information, which should help the model focus on relevant areas and improve classification performance.
    Performance: The authors claim that their method outperforms standard classification approaches such as ResNet and vision transformers in DVT staging.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of data:
    o Although splitting the 3D data into slices yields more than 100K image slices, the dataset comprises only 373 cases (Table 1). Additionally, in the validation and test sets, the number of cases for the ‘acute’ and ‘chronic’ DVT stages is as low as 13 or 14, which is an extremely small sample size.
    o The authors mention that the scores were generated by two radiologists. However, it is unclear whether the consensus between the two was used as ground truth or whether individual assessments were used to train the models. What does the inter-reader agreement look like? Is there subjectivity in staging between the two readers?
    Lack of some details regarding model implementation:
    o Although the authors report that their approach outperforms standard classification models such as ViT and ResNet, the details of how these models were trained are missing. Were the 2D slices alone used to train these classic models?
    3D data converted to 2D:
    o Clinically relevant information may be lost when converting 3D data to 2D.
    Performance not reported with respect to individual DVT stages:
    o The results section does not show a confusion matrix of DVT stage predictions, i.e., how many of the ‘acute’ cases are misclassified as ‘sub-acute’? There may be potential biases toward stages with more data.
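A minimal sketch of the per-stage analysis requested above: a 3x3 confusion matrix over the three stages plus per-class precision, recall, and F1. The stage names follow the review; the toy predictions are made up for illustration:

```python
from typing import Dict, List, Sequence

STAGES = ("acute", "sub-acute", "chronic")


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels=STAGES) -> List[List[int]]:
    """cm[i][j] counts cases whose true stage is labels[i] and predicted stage is labels[j]."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def per_class_metrics(cm: List[List[int]], labels=STAGES) -> Dict[str, Dict[str, float]]:
    metrics = {}
    for i, lab in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                 # truly `lab`, predicted as another stage
        fp = sum(row[i] for row in cm) - tp  # predicted `lab`, truly another stage
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[lab] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics: Dict[str, Dict[str, float]]) -> float:
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


# Toy usage: one acute case is predicted as sub-acute.
y_true = ["acute", "acute", "sub-acute", "chronic"]
y_pred = ["acute", "sub-acute", "sub-acute", "chronic"]
cm = confusion_matrix(y_true, y_pred)
metrics = per_class_metrics(cm)
```

The off-diagonal entry `cm[0][1]` answers the question "how many acute cases are called sub-acute?" directly, which a single aggregated F1 value does not show.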

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    o It would be helpful if the authors gave additional details on how the ground truth was obtained (consensus of the two radiologists or individual assessments) and some additional information on the agreement between the two.
    o It would be helpful if the authors provided details of how the classical models (ResNet, ViT, etc.) were trained. Were the 2D slices alone used to train these classic models?
    o It would be important to show the performance of the pipeline for each DVT stage using a confusion matrix rather than accuracy alone, e.g., how many of the ‘acute’ cases are misclassified as ‘sub-acute’?
    ● For future work, I would recommend:
    o Exploring 3D models for DVT staging directly from 3D data.
    o Exploring the algorithm’s performance with respect to thrombus size (i.e., does performance drop significantly as thrombi get smaller?). It would be helpful to report the smallest thrombus size that the method can accurately classify.
    o For the extraction of local features, the image I(L) seems to be largely background since the thrombi are small. Would it help if the local images were zoomed-in patches of the thrombi?
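A minimal sketch of the size-stratified analysis suggested above, assuming a per-case size measure such as the annotated bounding-box area in pixels (a hypothetical proxy) and analyst-chosen bin edges:

```python
import math
from typing import Dict, Sequence, Tuple


def accuracy_by_size(areas: Sequence[float], correct: Sequence[bool],
                     edges: Sequence[float]) -> Dict[str, Tuple[int, float]]:
    """Bin cases by thrombus area into [edges[i], edges[i+1]) and return (count, accuracy) per bin.

    Empty bins report NaN accuracy rather than a misleading 0.
    """
    results = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        hits = [ok for area, ok in zip(areas, correct) if lo <= area < hi]
        accuracy = sum(hits) / len(hits) if hits else math.nan
        results[f"[{lo}, {hi})"] = (len(hits), accuracy)
    return results


# Hypothetical areas (pixels) and per-case correctness of the predicted stage.
report = accuracy_by_size([10, 20, 150, 300], [False, True, True, True], [0, 100, 1000])
```

Reporting the count next to each accuracy matters here, since small-thrombus bins may hold very few cases.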

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors report a framework that is the first application of CAD technology to BTI-based DVT staging. The method leverages both global and local information to classify the images. The authors report the performance of different feature combinations (local and global), showing that this approach outperforms standard classification models. Adding the details requested in the comments would help clarify the remaining questions.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have provided additional details regarding 2D slice extraction from 3D data. They have also provided additional clarifications about the F1 score.




Author Feedback

We deeply appreciate your valuable comments. As you mentioned, VDPF is an innovative classification framework (R4) and the first application of CAD technology in BTI-based DVT staging (R5), expanding diagnostic tools for vascular health (R4). We believe VDPF can greatly contribute to the CAD community. Special thanks to R5 for the direct acceptance and to all reviewers for your positive feedback: 1. Effectiveness (R4: “improves the specificity and reliability”; R5: “outperforms”); 2. Well written (R4: “well described”; all reviewers appreciated the clarity and organization).

1. Limited description (R3 & R5). Due to space constraints, we could not include detailed descriptions in the paper. For replication and understanding, we will make our source code publicly available once anonymity is lifted.

  • (R3-“local imaging obtained”) Local imaging is obtained by radiologists manually annotating bounding boxes around thrombus areas and retaining only these regions (Fig. 3).
  • (R5-“the details of comparison models training”) We train the comparison models using 2D images cropped around the lesion center, with the following steps: 1) slice the 3D volume to obtain 2D images; 2) extend the manually annotated bounding boxes by 5 pixels; 3) crop the extended bounding boxes and resize them to 224x224. The remaining training parameters match those of VDPF (Section 3.1).
  • (R5-“ground truth obtained”) We indeed use radiologists’ consensus as the ground truth for training, aligning with clinical standards. BTI-based DVT staging correlates with thrombus signal intensity, vessel morphology, and surrounding tissue edema (Ref [8] and [13]). To address the inter-reader agreement, we calculated the Cohen’s Kappa coefficient, which showed substantial agreement between the two radiologists. For data without consensus, the radiologists discussed and decided collaboratively.
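The rebuttal reports Cohen's Kappa as the inter-reader agreement measure. A minimal sketch of the coefficient, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from each rater's label frequencies; the two label lists below are made-up examples. On the widely used Landis and Koch scale, "substantial" agreement corresponds to kappa between 0.61 and 0.80:

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of cases")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # chance agreement from the two raters' marginal label frequencies
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical label: kappa is undefined
    return (p_o - p_e) / (1.0 - p_e)


# Toy usage: the raters disagree on one of four cases.
rater_1 = ["acute", "acute", "sub-acute", "chronic"]
rater_2 = ["acute", "sub-acute", "sub-acute", "chronic"]
kappa = cohens_kappa(rater_1, rater_2)
```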

2. Concerns about the model’s robustness (R3 & R4 & R5).
  • (R3-“Comparison methods are relatively limited and old”) We compared well-known classical methods in medical imaging, including the recently popular ViT, which supports our performance claims.
  • (R4-“lacks cross-validation (CV)”) Due to the large sample size (100,000 slices) and time constraints, we did not perform CV. While CV is common, it may overestimate performance (DOI: 10.3390/s23136077). We use a randomized split into training, validation, and test sets, and the test set directly reflects the model’s performance.
  • (R5-“confusion matrix”) Instead of a confusion matrix, we provide the F1-score, which balances precision and recall and thus indicates potential biases. Accuracy trends across multiple models also highlight VDPF’s superior performance.
  • (R5-“Lack of data”) Collecting DVT data is a common challenge in this field (Ref [5] includes only 43 cases). We calculated the required sample size using a One vs. Other approach with MedCalc (V.15.6.1.0), and our sample size statistically meets the hypothesis requirements. We also employ data augmentation and Focal Loss to mitigate small-sample and class-imbalance issues.
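The rebuttal mentions Focal Loss for class imbalance. A minimal single-sample sketch of the standard multi-class form, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), computed through a numerically stable log-softmax; gamma = 2.0 is a common default, and the optional per-class `alpha` weights are an assumption, since the settings used for VDPF are not stated here:

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)  # shift by the max so exp() cannot overflow
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_sum for z in logits]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Multi-class focal loss for one sample.

    gamma = 0 with alpha = None reduces to ordinary cross-entropy; larger gamma
    down-weights easy, confidently correct samples so rarer, harder ones dominate.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * log_p_t


# Toy usage: a confident correct prediction is penalised far less than under cross-entropy.
easy_focal = focal_loss([4.0, 0.0, 0.0], target=0)
easy_ce = focal_loss([4.0, 0.0, 0.0], target=0, gamma=0.0)
```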

3. Why does ResNet-50 perform better than Resnet50-DPF (R3)? Unlike ViT, ResNet’s convolution mechanism does not focus on small targets. Additionally, the preprocessing differs between ResNet (cropped to enhance lesions) and Resnet50-DPF (without cropping). These factors prevent Resnet50-DPF from effectively combining local and global features, leading to lower performance than expected.
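The lesion-centred cropping used for the ResNet baseline follows the steps listed in point 1 of this rebuttal: extend the annotated box by 5 pixels, crop, and resize to 224x224. A minimal sketch of those steps; the box format and the nearest-neighbour interpolation are assumptions, since the interpolation method is not stated:

```python
from typing import List, Tuple

Image = List[List[float]]
BBox = Tuple[int, int, int, int]  # hypothetical format: (row0, col0, row1, col1), half-open


def expand_box(box: BBox, margin: int, height: int, width: int) -> BBox:
    """Grow the box by `margin` pixels on every side, clipped to the image bounds."""
    r0, c0, r1, c1 = box
    return (max(r0 - margin, 0), max(c0 - margin, 0),
            min(r1 + margin, height), min(c1 + margin, width))


def crop(img: Image, box: BBox) -> Image:
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in img[r0:r1]]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize (assumed interpolation)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def lesion_crop(slice_2d: Image, box: BBox, margin: int = 5, size: int = 224) -> Image:
    height, width = len(slice_2d), len(slice_2d[0])
    return resize_nearest(crop(slice_2d, expand_box(box, margin, height, width)), size, size)


# Toy usage: a 50x60 slice whose pixel value encodes its (row, col) position.
slice_2d = [[float(r * 100 + c) for c in range(60)] for r in range(50)]
patch = lesion_crop(slice_2d, (20, 20, 30, 30))
```

The crop magnifies the lesion to fill the whole 224x224 input, whereas a full-slice input leaves the thrombus as a small region, which is consistent with the preprocessing difference the rebuttal points to.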

4. Explanation of Fig. 1 (R4). Fig. 1 is explained in Section 1 (4th paragraph, 3rd line): the high similarity of DVT to other tissues and its small size hinder the network’s focus. Lesion-centered cropping may also lose crucial information such as muscle signals and edema, limiting the model’s feature learning for DVT staging.

5. Why use 2D data (R5)? We focus on the application of local imaging, regardless of whether it is 2D or 3D. Additionally, due to equipment limitations, we do not currently use 3D data for training.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper employs a Global-Local feature fusion module (GLFM) to improve the classification accuracy of DVT staging by slicing 3D images into global and local images and using self-attention and cross-attention to fuse different levels of information. This fine-grained method significantly outperforms existing methods. The concerns about the paper center on unclear method descriptions and the model’s robustness; both were properly addressed in the authors’ rebuttal.



