Abstract

Esophageal fistula (EF) is a critical and life-threatening complication following radiotherapy for esophageal cancer (EC). Although tabular clinical data contain additional clinically valuable information, they are inherently different from CT images, and this heterogeneity may impede the effective fusion of multi-modal data and thus degrade the performance of deep learning methods. However, current methods do not explicitly address this limitation. To fill this gap, we present an adaptive multi-information dual-layer cross-attention (MDC) model that uses both CT images and tabular clinical data for early-stage EF prediction before radiotherapy. Our MDC model comprises a clinical data encoder, an adaptive 3D Trans-CNN image encoder, and a dual-layer cross-attention (DualCrossAtt) module. The image encoder uses both a CNN and a transformer to extract multi-level local and global features, followed by a global depth-wise convolution that removes redundancy from these features for robust adaptive fusion. To mitigate the heterogeneity among multi-modal features and enhance fusion effectiveness, the first cross-attention layer of DualCrossAtt aligns the clinical and image features, and the resulting commonly attended features are passed to the second cross-attention layer, which models the global relationships among the multi-modal features for prediction. Furthermore, we introduce a contrastive learning-enhanced hybrid loss function to further boost performance. Comparative evaluations against eight state-of-the-art multi-modality predictive models demonstrate the superiority of our method in EF prediction, with potential to assist personalized stratification and precision EC treatment planning.
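The abstract states that a global depth-wise convolution removes redundancy from the multi-level image features. As a rough illustration of the operation itself (not of the paper's implementation), the sketch below gives each channel its own kernel spanning the whole 3D volume, so every channel collapses to a single learned, position-weighted summary. It assumes the volume has already been flattened to a channels × voxels list; the function name `global_depthwise_conv` and all weights are hypothetical.

```python
from typing import List


def global_depthwise_conv(features: List[List[float]],
                          kernels: List[List[float]],
                          biases: List[float]) -> List[float]:
    """Collapse each channel of a (C x N) feature map to one value.

    features[c] holds the N flattened voxels of channel c; kernels[c] is a
    per-channel weight map of the same length (kernel size == spatial size).
    Weights are not shared across channels (depth-wise), and no spatial
    positions remain after the operation (global).
    """
    if not (len(features) == len(kernels) == len(biases)):
        raise ValueError("need one kernel and one bias per channel")
    out = []
    for x, w, b in zip(features, kernels, biases):
        if len(x) != len(w):
            raise ValueError("kernel must cover the whole spatial extent")
        out.append(sum(xi * wi for xi, wi in zip(x, w)) + b)
    return out


# Two channels of a 2x2x1 volume, flattened to 4 voxels each.
feats = [[1.0, 2.0, 3.0, 4.0],
         [0.5, 0.5, 0.5, 0.5]]
# Uniform weights reduce the operation to global average pooling.
uniform = [[0.25] * 4, [0.25] * 4]
print(global_depthwise_conv(feats, uniform, [0.0, 0.0]))  # [2.5, 0.5]
```

With uniform weights the result is plain global average pooling; the learnable per-position weights are what would let a network emphasise informative regions and suppress redundant ones.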

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0148_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zha_AMultiInformation_MICCAI2024,
        author = { Zhang, Jianqiao and Xiong, Hao and Jin, Qiangguo and Feng, Tian and Ma, Jiquan and Xuan, Ping and Cheng, Peng and Ning, Zhiyuan and Ning, Zhiyu and Li, Changyang and Wang, Linlin and Cui, Hui},
        title = { { A Multi-Information Dual-Layer Cross-Attention Model for Esophageal Fistula Prognosis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    Prediction of esophageal fistula before radiotherapy is critical for patients with esophageal cancer. In this study, the authors propose an adaptive multi-information dual-layer cross-attention (MDC) model that uses both CT images and tabular clinical data for early-stage EF prediction. Specifically, the MDC model comprises a clinical data encoder and an adaptive transformer and CNN image encoder, in which a global depth-wise convolution layer is used to extract more effective multi-view features with less redundancy. Subsequently, a dual-layer cross-attention module is introduced to align the multi-modal data and capture global relationships. Besides the cross-entropy loss for classification, the authors introduce a contrastive loss that pulls features of the same class closer together and pushes features of different classes apart. Experimental evaluations on 553 EC patients demonstrate the superiority of the MDC model over several existing methods.
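To make the reviewer's description of the hybrid objective concrete, the sketch below combines binary cross-entropy with a standard margin-based pairwise contrastive term, weighted by ω and using margin m. This is an assumed textbook formulation, not the paper's equation: the function names are hypothetical, and the defaults `omega=0.5` and `m=1.0` are placeholders, since the actual values are exactly what the reviewer asks about.

```python
import math
from itertools import combinations
from typing import List, Sequence


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy; probs[i] is the predicted probability of EF."""
    eps = 1e-12  # avoids log(0)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)


def pairwise_contrastive(embeddings: List[List[float]], labels: Sequence[int],
                         m: float) -> float:
    """Mean over all pairs: squared distance for same-class pairs, squared
    hinge (m - d)^2 for different-class pairs closer than the margin m."""
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(labels)), 2):
        d = math.dist(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            total += d ** 2                  # pull same class together
        else:
            total += max(0.0, m - d) ** 2    # push different classes apart
        pairs += 1
    return total / pairs if pairs else 0.0


def hybrid_loss(probs, embeddings, labels, omega: float = 0.5, m: float = 1.0) -> float:
    """Hypothetical hybrid objective L = L_CE + omega * L_con.
    The defaults for omega and m are placeholders, not the paper's values."""
    return (binary_cross_entropy(probs, labels)
            + omega * pairwise_contrastive(embeddings, labels, m))


labels = [1, 1, 0]
separated = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]  # non-EF case 5 units away
collapsed = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # all embeddings coincide
print(pairwise_contrastive(separated, labels, m=1.0))  # 0.0
print(pairwise_contrastive(collapsed, labels, m=1.0))  # 0.6666666666666666
```

Here ω trades off classification accuracy against embedding structure, and m sets how far apart EF and non-EF embeddings must be before the push term stops contributing.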

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Predicting esophageal fistula before radiation therapy is of clinical significance.
    2. The proposed MDC model using both CT images and clinical data is reasonable and seems reproducible. The organization of the paper is good, and the paper is well-written.
    3. The overall experimental setup is comprehensive, and the ablation study demonstrates the effectiveness of the key components of the proposed model.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. No ethical statement in the paper.
    2. The description of the dataset is confusing, especially regarding the clinical data and data augmentation. Adding more details in the supplementary material would be appreciated.
    3. The parameters 𝜔 and m in the loss function are not explained. To which values did the authors set them? Please add these details.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    none.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Was the study approved by an IRB? If so, please add an ethical statement to the paper.
    2. It is strange that the CT images of esophageal cancer patients have a resolution of 512×512×27. Is this an error, or did the authors perform image preprocessing?
    3. Is data augmentation performed only on the CT images, and how was it performed? It is confusing that 3142 samples (1674 EF + 1468 non-EF) are generated by data augmentation from 553 EC patients (367 EF + 186 non-EF). Also, the process of converting 34 variables into 78 dimensions is unclear. Does feeding all of these variables into the model introduce redundant and irrelevant information? Why not select a subset of key clinical features?
    4. The parameters 𝜔 and m in the loss function are not explained. To which values did the authors set them? Please add these details.
    5. The first cross-attention layer shown in Fig. 1 does not correspond to Equation (3).
    6. In the introduction, the authors state that “Our work is critical as early prediction of this radiation therapy caused EF enable radiologists to develop more personalized treatment plans, thereby improving the quality of life for patients with EC”. As far as I know, treatment plans for esophageal cancer are not made by radiologists.
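One plausible reading of point 3, not confirmed by the paper, is that categorical clinical variables were one-hot encoded, so a variable with k levels occupies k input dimensions and 34 variables can expand to more than 34 dimensions. The sketch below illustrates only that mechanism; the variable names (`age`, `sex`, `t_stage`), their levels, and the function `encode_row` are hypothetical, and no scaling of numeric variables is applied.

```python
from typing import Dict, List, Sequence, Union

Value = Union[float, str]


def encode_row(row: Dict[str, Value],
               numeric: Sequence[str],
               categorical: Dict[str, Sequence[str]]) -> List[float]:
    """Numeric variables keep one dimension each; every categorical variable
    becomes a one-hot block with one dimension per level."""
    vec: List[float] = [float(row[name]) for name in numeric]
    for name, levels in categorical.items():
        if row[name] not in levels:
            raise ValueError(f"unknown level {row[name]!r} for {name}")
        vec.extend(1.0 if row[name] == level else 0.0 for level in levels)
    return vec


numeric = ["age"]
categorical = {"sex": ["M", "F"], "t_stage": ["T1", "T2", "T3", "T4"]}
x = encode_row({"age": 63, "sex": "M", "t_stage": "T3"}, numeric, categorical)
print(len(x), x)  # 7 [63.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```

In this toy case, 3 variables become 7 dimensions; the reviewer's redundancy concern applies because mutually exclusive one-hot columns and weakly informative variables all enter the encoder.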
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The missing ethical statement, and the unclear description of the dataset and of the parameters in the loss function.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel method for the classification of esophageal fistula, which is highly relevant to the prognosis of esophageal cancer. Specifically, it combines CNN and Transformer architectures to extract both local and global features and integrates tabular data to enhance the classification task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    As mentioned in the contribution section, the method proposed in this paper is effective and reasonable. The multimodal model that combines textual and image information has the potential to better perform prognosis tasks. Additionally, the paper includes extensive experimental analysis and comparisons with other state-of-the-art methods to demonstrate the effectiveness of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The two main technical contributions of the paper, combining CNN and Transformer structures [1] and integrating textual information in a multimodal model [2], have already been proposed for prognostic tasks, which suggests that this paper may be incremental work.
    [1] Dong, H., et al.: Improved Prognostic Prediction of Pancreatic Cancer Using Multi-phase CT by Integrating Neural Distance and Texture-Aware Transformer. MICCAI 2023.
    [2] Ding, K., et al.: Pathology-and-Genomics Multimodal Transformer for Survival Outcome Prediction. MICCAI 2023.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. See the weaknesses above, which are my major concern.
    2. The experimental results in Table 1 show that the proposed method has a significantly larger standard deviation than the other methods, which requires clarification.
    3. The formatting of the equations in the paper (Eqs. 5 and 6) needs further improvement.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, I believe that the paper is acceptable. However, it has some weaknesses regarding novelty and experimental design, so further deliberation and discussion are recommended.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper describes a novel system to predict esophageal fistula from CT and tabular clinical data in patients undergoing radiotherapy to treat esophageal cancer. Images and tabular data are passed through separate encoder pipelines tailored to each data type, and then fused in a unique dual-layer cross-attention module. The resulting fused features are used for classification, and also sent to a contrastive loss function which forces the network to map patients in the same group (having EF or not) to closer feature embeddings. The method outperforms other methods in the literature, and an ablation study demonstrates the added value of various aspects of the design of the architecture, including the dual-layer cross-attention module.
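The dual-layer fusion described above can be illustrated with plain scaled dot-product attention. The wiring below is an assumption, not the paper's Equation (3): in layer 1 each modality queries the other to produce aligned ("commonly attended") tokens, and in layer 2 those aligned tokens query the union of all original tokens to capture global relations. Learned query/key/value projections are omitted, image and clinical tokens are assumed to share one embedding width d, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query token returns a weighted
    average of the value tokens (learned Q/K/V projections omitted)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, values))
                    for c in range(len(values[0]))])
    return out


def dual_layer_cross_attention(img: Matrix, clin: Matrix) -> Matrix:
    # Layer 1 (alignment): each modality attends to the other.
    clin_aligned = cross_attention(clin, img, img)
    img_aligned = cross_attention(img, clin, clin)
    common = clin_aligned + img_aligned  # concatenate token lists
    # Layer 2 (global relations): aligned tokens attend to all original tokens.
    all_tokens = img + clin
    return cross_attention(common, all_tokens, all_tokens)


img = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 image tokens, d = 2
clin = [[0.5, 0.5]]                          # 1 clinical token, d = 2
out = dual_layer_cross_attention(img, clin)
print(len(out), len(out[0]))  # 4 2
```

A real implementation would add learned projections, multiple heads, and residual connections; the sketch only shows how the two attention stages can be chained.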

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper describes what I understand to be a unique approach to solving an important clinical problem, and compares favorably to other methods in the literature.

    It compares the method to a large number of other methods, which makes the results more convincing.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The description of the data and problem is lacking. In particular, it is not completely clear to me which problem the paper is solving. Are the data all pre-treatment, so that the system predicts which cases will develop EF after treatment? Or is the imaging acquired immediately post-treatment, to predict which cases will develop complications (EF) in the future? Or are the data acquired after the development of EF, so that the system diagnoses EF in those cases? How were the ground-truth diagnoses obtained? The lack of details makes it harder to judge the results and the clinical importance of the contribution.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The algorithm seems well-specified enough to be reproducible. If possible, I would encourage the authors to consider releasing code to facilitate easier reuse of the dual-layer cross-attention mechanism.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    First and foremost, I think the details about the data being used need to be specified more clearly, as I discussed in my answer on the main weaknesses.

    For the ROC curves in Fig. 2, it would be helpful to clarify how they were obtained. Are they for a single fold out of five, or are they computed from all folds?

    For the comparison with other methods, it is not stated whether the reported numbers come from the authors' own implementations run on the same dataset. I believe that is the case, but it would help to state it explicitly. Furthermore, it would be helpful to know whether the models were run on exactly the same five folds (i.e., whether the same partition was used for every model).
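The two questions above (identical folds for every model, and how an ROC summarises five folds) can be addressed with a seeded, stratified patient-level split plus an AUC computed on pooled out-of-fold scores. The sketch below assumes splits are made per patient, before any augmentation, so augmented copies of one patient never appear in both training and test folds; the function names are hypothetical, and only the class counts (367 EF, 186 non-EF) come from Review #1.

```python
import random
from typing import List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign patient indices to k folds, preserving the class ratio.
    Reusing the same seed gives every compared model identical partitions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for n, i in enumerate(idx):
            folds[n % k].append(i)  # deal patients round-robin per class
    return folds


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1] * 367 + [0] * 186  # 553 patients: 367 EF, 186 non-EF
folds = stratified_folds(labels, k=5, seed=42)
print([len(f) for f in folds])  # [112, 111, 110, 110, 110]
# Calling stratified_folds again with seed=42 reproduces the same partition.
assert folds == stratified_folds(labels, k=5, seed=42)
```

Concatenating each model's out-of-fold scores and calling `auc` once yields a single pooled ROC over all patients; computing `auc` per fold instead yields five values whose mean and standard deviation correspond to the "mean ± std" style of reporting.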

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper seems technically sound and of interest to the community, but as I discussed above, I feel the statement of the problem being solved is insufficiently clear. If the authors can clarify exactly what problem they are solving with what data, I think it is worthy of acceptance.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

N/A




Meta-Review

Meta-review not available, early accepted paper.


