Abstract

Accurate detection of aortic dissection (AD) in emergency settings is critical, as misdiagnosis can significantly delay subsequent treatment and even endanger patients’ lives. Currently, non-contrast CT scans are the standard protocol in emergency departments for patients with chest pain, yet their ability to detect AD remains limited. We introduce a novel multimodal contrastive learning framework designed to learn discriminative features from both contrast-enhanced CT and the corresponding diagnostic reports. These features are then aligned with those of non-contrast CT scans through contrastive learning. Specifically, we first segment and straighten the aorta so that attention is applied effectively to the aortic area. Finally, the pre-trained encoder is fine-tuned for the tasks of AD detection and lumen segmentation using non-contrast CT scans. Our experiments, conducted on a test dataset comprising 239 subjects (127 with AD and 112 without), demonstrated that the proposed framework achieves an accuracy of 0.958, an F1-score of 0.969, and an AUC of 0.983 in AD detection. These results surpass those of six state-of-the-art classification models. In lumen segmentation experiments, the framework achieves an average DSC of 0.705, outperforming the compared methods. These findings indicate that our proposed framework not only outperforms existing AD detection methods but also holds the potential to accurately localize the false lumen using non-contrast CT scans alone.
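
As a rough illustration of the alignment step described in the abstract, the sketch below computes a symmetric InfoNCE (CLIP-style) contrastive loss between paired embeddings. This is a generic formulation and an assumption on our part, not the authors' loss; the function names, the temperature value, and the way the NC-CT, CE-CT, and report terms are summed are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss; row i of `a` is the positive pair of row i of `b`."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (N, N) similarity matrix

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))

def alignment_loss(z_nc, z_ce, z_txt, temperature=0.07):
    """Hypothetical combination: pull NC-CT embeddings toward both the
    CE-CT embeddings and the report embeddings of the same subject."""
    return (info_nce(z_nc, z_ce, temperature)
            + info_nce(z_nc, z_txt, temperature))

# Toy usage with random stand-in embeddings (8 subjects, 16 dimensions).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_matched = info_nce(z, z)           # correctly paired rows
loss_mismatched = info_nce(z, z[::-1])  # every row paired with the wrong partner
```

Correctly paired embeddings yield a lower loss than mismatched ones, which is the property a contrastive pre-training stage relies on.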

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4023_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZhaDuo_AMultimodal_MICCAI2025,
        author = { Zhang, Duoer and Xiao, Wenbo and Jiang, Chen and Qiu, Yuxuan and Feng, Zhan and Wang, Hong and Zheng, Yefeng and Zhu, Wentao},
        title = { { A Multimodal Contrastive Learning for Detecting Aortic Dissection on 3D Non-Contrast CT with Anatomy Simplification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15966},
        month = {September},
        pages = {2--12}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a novel multimodal contrastive learning framework for detecting aortic dissection using non-contrast CT scans. The authors’ approach employs contrastive learning to align features from NC-CT with those from contrast-enhanced CT and diagnostic reports. The pre-trained encoder is subsequently fine-tuned for AD detection and lumen segmentation tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The multimodal approach combining NC-CT, CE-CT, and text reports through contrastive learning represents a novel integration of complementary information. The anatomical simplification through aorta segmentation and straightening is an effective preprocessing step. The evaluation is comprehensive, including comparison with multiple state-of-the-art methods and detailed ablation studies.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper lacks details on the standardisation process using GPT-4o for text processing, making it difficult to assess the reproducibility. While using 5mm slice thickness is highlighted as clinically advantageous, there is no comparative analysis. The dataset comes from a single institution so there is limited proof of generalisability.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a novel and effective approach to a clinically significant problem with strong technical contributions. The multimodal contrastive learning framework produces impressive results. The results demonstrate clear advantages over existing methods for both detection and segmentation tasks. There are some limitations in terms of reproducibility details (i.e. the text preprocessing details).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper
    • Novel Multimodal Pipeline: The paper proposes a 3D multimodal contrastive learning framework that leverages (1) non-contrast CT (NC-CT), (2) contrast-enhanced CT (CE-CT), and (3) textual diagnostic reports. By aligning these different modalities in a shared feature space, the method learns more discriminative representations for detecting aortic dissection (AD) from NC-CT alone.
    • Aorta Segmentation & Straightening: Before training, the authors segment and “straighten” the aorta, drastically simplifying the geometry and focusing the model on the critical aortic region.
    • Sub-task Networks: The learned features from multimodal contrastive pre-training are reused (frozen encoder) in two downstream tasks: (1) AD classification (detecting whether an aorta is dissected or not) and (2) true/false lumen segmentation.
    • Outperforming SOTA: On a test set (239 subjects), the proposed approach shows improved accuracy, sensitivity, and AUC for AD detection compared to multiple state-of-the-art baselines. It also achieves the best performance in segmenting the true and false lumens.
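
For reference, the detection metrics named above (accuracy, sensitivity, F1-score, AUC) can be computed from labels and predicted scores as in the following sketch. It is a generic illustration, not the authors' evaluation code; the function name, the 0.5 threshold, and the toy data are assumptions.

```python
import numpy as np

def detection_metrics(y_true, y_score, threshold=0.5):
    """Binary detection metrics; label 1 is assumed to mean AD present."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = y_score >= threshold

    tp = int(np.sum(y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))

    accuracy = (tp + tn) / y_true.size
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else float("nan")

    # AUC as the probability that a random positive outscores a random
    # negative (ties count one half). Requires at least one case per class.
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "f1": f1, "auc": float(auc)}

# Toy data: three AD cases and two non-AD cases.
metrics = detection_metrics([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.1])
```

Accuracy, sensitivity, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.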
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Effective Use of Multiple Modalities: Incorporating CE-CT and corresponding diagnostic text (radiology reports) provides richer supervision signals than NC-CT alone. This multimodal training appears to yield more discriminative features.
    • Aorta-Focused Preprocessing: By segmenting and straightening the aorta, the paper reduces extraneous content in the 3D volume. This anatomical simplification is well-motivated and helps the model focus on the most relevant structures.
    • Strong Experimental Results: The method surpasses existing classification (ResNet variants, DenseNet, etc.) and specialized AD detection frameworks on multiple metrics, demonstrating notable improvements in both detection and lumen segmentation accuracy.
    • Ablation Studies: The paper systematically breaks down the contributions of each component (NCCT–CECT alignment, text incorporation, aorta straightening) and shows why each step matters.
    • Potential Clinical Applicability: Performing accurate AD detection on NC-CT (which is widely available, faster, and cheaper than CE-CT) addresses an important clinical gap, particularly for emergency settings.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Dataset Scale & Diversity: Although the paper’s test set is reasonably sized (239 subjects), the overall dataset (even including training) may be relatively small for large-scale deep learning and might limit generalization to more diverse clinical settings.
    • Complex Workflow: The pipeline involves multiple steps (aorta segmentation, straightening, multimodal pre-training). Each step might introduce complexity or potential failure points, especially if integrated into a real-time or fully automated clinical workflow.
    • Textual Data Availability: The approach relies on textual diagnostic reports paired with CE-CT scans. In practice, the consistency and format of these reports can vary widely. The benefit of the text modality may not always translate if standardized radiology reports are not available.
    • Limited External Validation: The paper reports strong results on an internal dataset, but performance on external or multi-center data remains untested. Real-world generalization might need further studies.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • High Novelty and Significance: Multimodal contrastive learning (combining 3D imaging and textual reports) is a strong and relatively fresh contribution for aortic dissection detection on non-contrast CT scans.
    • Strong Empirical Performance: The large gains over state-of-the-art methods underscore the paper’s technical merit and practical potential.
    • Potential Clinical Impact: By narrowing the performance gap between NC-CT and CE-CT in diagnosing AD, the method could help prevent dangerous diagnostic delays—especially relevant in emergency departments.
    • Areas for Refinement: The relatively small dataset size, multi-step pipeline complexity, and lack of external validation temper the enthusiasm somewhat, preventing a perfect score.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a novel 3D multimodal contrastive learning framework for detecting aortic dissection (AD) and segmenting the lumen using non-contrast CT (NC-CT). The key innovative aspects are as follows: 1. It integrates NC-CT, CE-CT, and textual reports via contrastive learning. This approach enhances the alignment and representation of features, leveraging the complementary information from different data sources. 2. The framework features a pre-trained encoder architecture. It can support downstream tasks such as AD classification and lumen segmentation using 5-mm-thick NC-CT. This enables cost-effective emergency diagnostics, which is highly valuable in clinical settings. 3. The proposed framework outperforms state-of-the-art models. For example, it achieves an accuracy of 0.958 in AD detection and an average Dice Similarity Coefficient (DSC) of 0.705 in segmentation.
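
For readers unfamiliar with the segmentation metric, the DSC between a predicted mask P and a reference mask T is 2|P ∩ T| / (|P| + |T|). The sketch below computes it per label and averages the results. The label encoding (1 for true lumen, 2 for false lumen), the averaging over those two labels, and the empty-mask convention are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def dice(pred, target, label):
    """Dice Similarity Coefficient for one label in two integer label maps."""
    p = (pred == label)
    t = (target == label)
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # assumed convention: both masks empty counts as perfect agreement
    return 2.0 * np.logical_and(p, t).sum() / denom

def mean_dice(pred, target, labels=(1, 2)):
    """Average DSC over the given labels (hypothetically: 1 = true lumen, 2 = false lumen)."""
    return float(np.mean([dice(pred, target, label) for label in labels]))

# Toy 2D example; real inputs would be 3D volumes of the same shape.
pred = np.array([[1, 1, 0],
                 [2, 2, 0]])
target = np.array([[1, 0, 0],
                   [2, 2, 2]])
score = mean_dice(pred, target)
```

In this toy case the per-label scores are 2/3 and 4/5, so the mean is about 0.733.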

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The approach of using CE-CT and text reports to enhance NC-CT feature learning is novel. By doing so, it overcomes the limitations of single-modality methods, providing a more comprehensive data-driven solution.
    2. Concentrating on 5-mm-thick NC-CT is in line with real-world emergency room workflows. This reduces the need for contrast agents, which has significant practical and safety implications in clinical settings.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The training dataset, consisting of only 580 subjects, has insufficient demographic and pathological variety. This could restrict the model’s ability to generalize across different patient groups.
    2. By excluding non-aortic regions like surrounding tissues, the study might be missing diagnostically important information. A broader anatomical scope could enhance diagnostic accuracy.
    3. The extensive use of pretraining (500 epochs) and reliance on high-end GPU resources (NVIDIA V100) may pose challenges for clinical adoption, as many clinical facilities may not have such resources readily available.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel multimodal framework and strong empirical evidence. Integrating text, CE-CT, and NC-CT is a notable advancement in AD diagnostics. The demonstrated use of 5-mm NC-CT, which meets emergency needs and offers a cost-effective alternative to CE-CT, is highly relevant.

    While there are major concerns like a limited dataset and lack of clinical validation, these do not overshadow the paper’s contributions. The authors should address these in future work. Overall, the paper provides a solid basis for enhancing AD diagnostics in emergency scenarios.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their insightful comments and are encouraged by their overall positive assessment of our work.

R1 and R2 raised concerns about the single-institution data source. We fully acknowledge this limitation and have conducted some preliminary experiments on an external dataset as suggested. The results demonstrate an accuracy of approximately 0.91. We are currently testing our method on larger external datasets for future research.

R2 and R3 mentioned the limited dataset size for large-scale deep learning and insufficient demographic and pathological variety. To address this, we selected ResNet-50 as our backbone due to its relatively small parameter count. Additionally, we reduced geometric complexity by straightening the aorta and cropping the images. We agree that larger datasets would enable broader architectural exploration (e.g., Vision Transformers), which we will investigate as our dataset expands.

R3 also suggested a broader anatomical scope to avoid missing diagnostically relevant information. As we discussed in our manuscript, this may increase the computational costs and training difficulty.

We appreciate R1’s request for clarification on GPT-4o-based text processing. The workflow is illustrated in Figure 2, including the LLM prompting strategy. We will add a detailed description in the main text.

R1 also requested an analysis of the 5 mm slice thickness. This clinical advantage stems from our emergency setting, and we therefore did not explore it separately.

R2 mentioned the availability of textual data. We introduced a powerful LLM tool to make maximal use of the clinical reports and to include as much of the data in our research as possible. This is indeed an unavoidable problem as our datasets grow.

R2 mentioned that multiple steps may introduce complexity or failure in an automated clinical workflow. We prioritized clinically validated tools to ensure robustness. Our implementations demonstrated stable performance, but we will further validate them in diverse clinical settings.

R3 mentioned the heavy pre-training and GPU resources we used. The entire training was conducted independently on a computational platform. Clinical deployment requires significantly fewer resources.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


