Abstract

Esophageal cancer is one of the most common types of cancer worldwide and ranks sixth in cancer-related mortality. Accurate computer-assisted diagnosis of cancer progression can help physicians customize personalized treatment plans. Currently, CT-based cancer diagnosis methods have received much attention for their comprehensive ability to examine patients’ conditions. However, multi-modal methods may introduce information redundancy, leading to underperformance. In addition, efficient and effective interactions between multi-modal representations need further exploration, as existing work lacks insight into the prognostic correlations among multi-modality features. In this work, we introduce a multi-modal heterogeneous graph-based conditional feature-guided diffusion model for lymph node metastasis diagnosis based on CT images as well as clinical measurements and radiomics data. To explore the intricate relationships between multi-modal features, we construct a heterogeneous graph. Following this, a conditional feature-guided diffusion approach is applied to eliminate information redundancy. Moreover, we propose a masked relational representation learning strategy, aiming to uncover the latent prognostic correlations and priorities of primary tumor and lymph node image representations. Various experimental results validate the effectiveness of our proposed method.
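
For readers unfamiliar with diffusion models, the following Python sketch shows the generic forward (noising) step used by standard denoising diffusion models, which a conditional feature-guided variant would build on. It is not the authors' implementation: the linear schedule, the function names, and the default values are assumptions chosen for illustration.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances; requires num_steps >= 2 (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) up to and including step t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod

def q_sample(x0, t, betas, rng):
    """Draw x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise for a feature vector x0."""
    a_bar = alpha_bar(betas, t)
    return [
        math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
        for x in x0
    ]

# Example: noise a toy two-dimensional feature vector at step 5 of 10.
betas = linear_beta_schedule(10)
noisy = q_sample([0.0, 1.0], 5, betas, random.Random(0))
```

A trained model would learn to reverse this noising process; in a conditional setting, the reverse step would additionally receive the fused multi-modal features as guidance.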

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1370_paper.pdf

SharedIt Link: https://rdcu.be/dV18N

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72086-4_44

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1370_supp.pdf

Link to the Code Repository

https://github.com/wuchengyu123/MMFusion

Link to the Dataset(s)

N/A

BibTeX

@InProceedings{Wu_MMFusion_MICCAI2024,
        author = {Wu, Chengyu and Wang, Chengkai and Zhou, Huiyu and Zhang, Yatao and Wang, Qifeng and Wang, Yaqi and Wang, Shuai},
        title = {{MMFusion: Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {469--479}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper’s contributions include the development of the MMFusion model for lymph node metastasis diagnosis in esophageal cancer, which integrates CT images, clinical measurements, and radiomics data through a multi-modal heterogeneous graph-based conditional feature-guided diffusion model. Additionally, the research introduces the Multi-tissue Masked Relational Learning (MMRL) strategy for exploring and learning prognostic-related relationship priorities and interactive information among multi-tissues. Furthermore, the paper presents an optimization method that utilizes Binary Cross-Entropy (BCE) and Mean Square Error (MSE) loss functions for model training and alignment in the MMRL strategy.
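
    As a rough illustration of the optimization mentioned above, the following Python sketch combines a BCE term on the metastasis prediction with an MSE alignment term. The function names, the weighting factor `alpha`, and the way the two terms are summed are assumptions for illustration; the paper's actual formulation may differ.

```python
import math

def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def mse_loss(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combined_loss(probs, labels, feat_a, feat_b, alpha=1.0):
    """BCE on the prediction plus a weighted MSE alignment term.

    alpha is a hypothetical weight balancing classification and alignment.
    """
    return bce_loss(probs, labels) + alpha * mse_loss(feat_a, feat_b)
```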

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    One of the main strengths of the paper is the introduction of the Multi-tissue Masked Relational Learning (MMRL) strategy, which leverages intra-tissue self-attention and cross-tissue relational mask self-attention for representational learning to explore the priority modeling of inter-tissue relationships and the interactive learning of prognostic information. This strategy is novel in its approach to learning prognostic-related relationship priorities and interactive information among multi-tissues, addressing a gap previously overlooked in research. Additionally, the paper’s use of the Conditional Feature-guided Diffusion (CFD) process, which refines the diagnostic process by utilizing features aggregated from various data sources, including CT images, clinical, hematological, and radiomics data, is a strong aspect of the work. The CFD process simulates the data generation process, progressively reconstructing noise distributions to accurately capture the complete multi-modal conditional distribution of inputs, thereby improving predictive accuracy and reliability.
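
    The idea of relational mask self-attention mentioned above can be sketched as a softmax in which masked relations receive zero weight. This is a generic masked-attention sketch in plain Python, not the MMRL implementation; the function name and the 0/1 mask convention are assumptions.

```python
import math

def masked_attention_weights(scores, mask):
    """Row-wise softmax over `scores`, ignoring positions where mask is 0.

    scores: list of rows of raw attention logits
    mask:   same shape, 1 = relation allowed, 0 = relation masked out
    """
    weights = []
    for row, mrow in zip(scores, mask):
        kept = [s for s, m in zip(row, mrow) if m]
        if not kept:  # fully masked row: no attention is paid to anything
            weights.append([0.0] * len(row))
            continue
        peak = max(kept)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row, mrow)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Example: one query token (e.g. the tumor) may attend to the first two
# tokens but not the third, regardless of how large the third logit is.
weights = masked_attention_weights([[0.0, 0.0, 5.0]], [[1, 1, 0]])
```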

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One of the main weaknesses of the paper is the lack of comparison with existing methods in terms of clinical feasibility and real-world applicability. While the paper demonstrates the effectiveness of the proposed MMFusion model in diagnosing lymph node metastasis, it does not provide a direct comparison with other existing diagnostic methods commonly used in clinical practice. This omission limits the assessment of how the proposed model would perform in a real clinical setting and hinders the evaluation of its practical utility and feasibility. Additionally, the paper does not thoroughly discuss the potential limitations or challenges associated with implementing the proposed MMFusion model in a clinical environment, such as data acquisition, computational resources required, or integration with existing diagnostic workflows. Addressing these aspects would provide a more comprehensive understanding of the practical implications and limitations of the proposed approach.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Regarding the paper’s reproducibility, it is important to note that the detailed descriptions of the MMRL strategy, optimization methods, and experimental procedures provided in the paper contribute to enhancing the reproducibility of the study. The inclusion of specific details about the model architecture, loss functions, dataset description, and evaluation metrics allows other researchers to replicate the experiments and validate the results. Additionally, the ablation study conducted to evaluate the effectiveness of the proposed sub-modules in the MMFusion model provides insights into the individual contributions of each component, further supporting the reproducibility of the findings.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    To enhance this work, it is suggested that the authors can further explore different backbone network structures and experimentally validate them in conjunction with relational masking techniques. The performance of the model can be improved by continuing to optimize the loss function design of the model, especially in terms of the BCE and MSE loss functions in the non-diffusion part. In addition, the authors may consider expanding the dataset size or introducing other types of medical imaging data to verify the generalization ability and robustness of the model. These improvements are expected to further enhance the quality and usefulness of the study.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main factors behind this recommendation are as follows. The performance of the model can be further improved by optimizing the loss function design, especially the BCE and MSE loss functions in the non-diffusion part. In addition, expanding the dataset or introducing other types of medical imaging data to verify the generalization ability and robustness of the model is a key factor in improving the quality of the work. Different backbone network structures can also be explored and experimentally validated in combination with the relational masking technique to improve model performance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces a new framework for diagnosing lymph node metastasis in esophageal cancer using a multi-modal dataset. The method begins with the Multi-tissue Masked Relational Learning (MMRL) strategy, which extracts relevant representations from tumor and lymph node masks. It then utilizes a heterogeneous graph-based model to integrate various data types, including tumor and lymph node representations extracted through MMRL, along with blood, clinical measurements, and radiomics data, thereby modeling the correlations between different modalities. A unique feature of this method is the Conditional Feature-guided Diffusion process, which effectively explores the complex relationships between the features extracted from different modalities and reduces redundancy, aiming to output more accurate predictions of lymph node metastasis. The method’s validation on 1,354 cases confirms its potential to significantly enhance the accuracy of lymph node metastasis diagnosis.
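
    To make the heterogeneous graph idea above concrete, the following Python sketch stores one feature vector per modality node and aggregates neighbors separately for each relation type. The node names, relation names, feature values, and the mean aggregation rule are all hypothetical; the paper's graph construction and aggregation may differ.

```python
from collections import defaultdict

# Hypothetical modality nodes, each with a toy two-dimensional feature vector.
nodes = {
    "tumor_image": [0.2, 0.8],
    "lymph_node_image": [0.6, 0.4],
    "hematology": [1.0, 0.0],
    "clinical": [0.0, 1.0],
    "radiomics": [0.5, 0.5],
}

# Typed edges: (source node, relation type, target node). Names are illustrative.
edges = [
    ("radiomics", "describes", "tumor_image"),
    ("hematology", "measured_with", "tumor_image"),
    ("clinical", "measured_with", "tumor_image"),
]

def aggregate(nodes, edges, target):
    """Average neighbor features within each relation type, then across types."""
    by_relation = defaultdict(list)
    for src, rel, dst in edges:
        if dst == target:
            by_relation[rel].append(nodes[src])
    if not by_relation:  # isolated node: keep its own features
        return list(nodes[target])
    dim = len(nodes[target])
    per_relation = [
        [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
        for feats in by_relation.values()
    ]
    return [sum(r[i] for r in per_relation) / len(per_relation) for i in range(dim)]

tumor_context = aggregate(nodes, edges, "tumor_image")
```

    Grouping by relation type before averaging keeps one heavily connected modality from dominating the aggregated representation.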

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper delineates several significant strengths in its approach to diagnosing lymph node metastasis in esophageal cancer using a multi-modal dataset:

    1. Innovation in Intent and Algorithm Design: The intent and algorithmic design of this paper are highly innovative. While many existing methods attempt to leverage multi-modal data fusion for diagnosing clinically relevant issues, the actual outcomes are often barely satisfactory. This involves two critical challenges: a) effectively mining the complementary information between multi-modal data, and b) efficiently eliminating interference factors among multi-modal data. In this article, the authors propose innovative solutions to these critical challenges. The use of Multi-tissue Masked Relational Learning (MMRL) strategies and heterogeneous graphs to extract inter-modal relational representations is creative. It allows the model to capture complex prognostic correlations and supports a comprehensive understanding of disease dynamics. Moreover, the introduction of the Conditional Feature-guided Diffusion (CFD) process to minimize information redundancy across multi-modal features represents a novel methodological advancement.

    2. Robust Validation on a Large-Scale Clinical Dataset: The framework has been thoroughly validated on a large-scale clinical dataset comprising 1,354 cases, where it exhibited superior performance across several metrics including accuracy, precision, and F1 score. This robust validation not only proves the efficacy of MMFusion but also underscores its potential applicability and impact in clinical settings.
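
    For reference, the metrics named above (accuracy, precision, and F1 score) can be computed from binary predictions as in the following Python sketch. The function name and the convention of returning 0.0 for undefined ratios are assumptions; the toy labels are illustrative and are not results from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = metastasis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Undefined ratios (no positive predictions or no positive labels) map to 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example with four patients.
metrics = classification_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```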

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper on the MMFusion framework for diagnosing lymph node metastasis in esophageal cancer using a multi-modal dataset exhibits substantial innovation but also presents several weaknesses:

    1. Lack of Experimental Detail: A significant weakness is the inadequate detail regarding how key experimental components were managed, such as the acquisition and application of lymph node and tumor masks. Given that annotating lymph node masks from CT images is highly time-consuming and labor-intensive, it remains unclear how these masks were generated, and why only three lymph node masks were inputted. In addition, the gold standard for lymph node metastasis is extremely challenging, and the authors are advised to further interpret the gold standard for lymph node metastasis in such large data. It is not clear whether the gold standard refers to the level of individual lymph nodes or the patient level.

    2. Clarity on Multi-Modal Data Handling: The methods used to acquire and integrate multi-modal data, such as radiomics, blood, and clinical measurements, need clearer exposition. The paper does not fully explain how these different types of data are processed and integrated into the heterogeneous graph model. For instance, it is not specified whether radiomics data were obtained through image processing methods or extracted using another deep learning model.

    3. Ablation Study Results and Comparison with SOTA Methods: Concerns also arise from the experimental comparisons. For example, the ablation study indicates that even the most basic version of the proposed model (Base1) outperforms several state-of-the-art (SOTA) methods discussed. This raises questions about whether the experimental setups were consistent across all tested methods. Clarifying this would help validate the superiority claims made by the authors.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors provided the code via an anonymous link, but the large-scale clinical data used was not made public, and it seems that such a large-scale data set is also difficult to make public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. Clarification of Experimental Details: The paper lacks detailed explanations on the acquisition and application of lymph node and tumor masks, which are crucial components of the framework. Given the labor-intensive process of annotating lymph node masks from CT images, it is essential to specify how these masks were generated and integrated into the model. Furthermore, the process for determining the gold standard for lymph node metastasis (whether at the individual lymph node level or the patient level) needs clear definition, especially considering the large dataset of over 1,000 cases used in the study.

    2. Detailed Explanation on Data Handling: The methodology for obtaining and integrating multi-modal data, including radiomics, hematological, and clinical measurements, should be elaborated. Specifically, the process for acquiring radiomics data, whether through image processing methods or via another deep learning model, should be explicitly stated. This clarification will help in understanding the preprocessing steps and how these diverse data types are incorporated into the heterogeneous graph model.

    3. Consistency in Experimental Comparisons: There are ambiguities regarding the experimental setup, particularly in the ablation studies where even the base model appears to surpass several state-of-the-art methods. It is necessary to confirm if the experimental conditions were consistent across all methods compared. Detailed information on the specific implementations and settings used in these comparisons would substantiate the claims of superiority and provide a fair benchmarking against existing methods.

    4. Correction of Errors in Visualizations: It is noted that in Figure 1, the terms in the heterogeneous graph network appear to be mislabeled: ‘Hematology_T’ and ‘Hematology_N’ should replace ‘Radiomics_T’ and ‘Radiomics_N’. Correcting these labels will prevent confusion and improve the accuracy of the documentation supporting the graphical models used.

    In addition, it is recommended that the authors introduce an external validation dataset to more effectively evaluate the robustness and clinical applicability of the proposed method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My recommendation for this paper is positive due to its clear intention, substantial innovation, and excellent empirical performance on a large dataset, highlighting its value in the field. The paper is well-written, with a clear presentation of its goals and methods, making it accessible and informative. The introduction of the MMFusion framework represents a significant innovation in medical imaging for diagnosing lymph node metastasis in esophageal cancer. The validation on a dataset of 1,354 cases, with superior performance metrics, underscores its efficacy and potential clinical impact.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors present a Multi-modality Diffusion Model for Lymph Node Metastasis Diagnosis in Esophageal Cancer, which includes the MMRL, HGA, and CFD algorithms.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A novel diagnostic method. The best prediction results compared with other methods on the large dataset.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I think the paper is good and has no major weaknesses.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    In addition to the code, the pre-trained model could also be provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In Table 1, it would be beneficial to provide the network architectures or algorithm names of the compared methods to allow a better comparison, for example by adding an additional column.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work is very comprehensive in terms of algorithm innovation, datasets, and experimental results, and the outcomes are quite good. I believe it is an excellent piece of work on multimodal information fusion for disease diagnosis.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Response to Reviewer # 1: Thank you very much for recognizing our work. Your valuable comments will be crucial for us to improve the quality of our work in the future. Currently, this study on multimodal lymph node metastasis diagnosis for esophageal cancer is unique and rare, and such multimodal datasets are globally scarce and lack attention. Combining these CT lymph node and tumor modalities and annotating multimodal datasets require a significant amount of work. Therefore, at this preliminary stage, due to time and length constraints, we have presented our work to this extent. In future work, as you suggested, we will consider a more comprehensive comparison in terms of clinical feasibility and real-world applicability, including comparisons with other LNM diagnostic methods. Additionally, we will consider multi-center validations and include similar modality data for other lesions. This will deepen our research on the data. For the second point regarding the discussion of limitations, we have added this in the camera-ready version of the paper. Importantly, we will further test the impact of the backbone network on model performance and further explore the utility of the loss functions in future work. Thank you again for your review effort.

Response to Reviewer # 3: Thank you very much for recognizing our work. Your valuable comments will be crucial for us to improve the quality of our work in the future. The masks in the CT images were annotated by doctors from Sichuan Cancer Hospital. As for why three masks are used as input, we selected the three largest lymph nodes (90×90×22; 68×68×16; 48×48×8) because the remaining lymph nodes are too small and provide insufficient information. Furthermore, uniformly selecting three lymph node regions for feature extraction keeps the model efficient, and larger lymph nodes provide the most valuable prognostic information with minimal redundancy. The gold standard for lymph node metastasis in the large dataset is at the patient level, consistent with our paper. Apart from the radiomics data, the data were clinically collected by Sichuan Cancer Hospital; the radiomics data were extracted using the Pyradiomics package. We will supplement the details of the data acquisition process and the ablation study in the camera-ready version of the paper and correct any textual errors. Thank you again for your review effort.
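
The selection rule described in this response, keeping only the three largest lymph nodes per patient, can be sketched in Python as follows. The function name, the use of voxel counts as the size measure, and the example identifiers are assumptions for illustration; the response does not state how size was measured.

```python
def select_largest_nodes(volumes, k=3):
    """Return the IDs of the k lymph nodes with the largest volume, largest first.

    volumes: dict mapping a lymph node ID to its size (here, a voxel count).
    """
    ranked = sorted(volumes.items(), key=lambda item: item[1], reverse=True)
    return [node_id for node_id, _ in ranked[:k]]

# Hypothetical patient with four annotated lymph nodes.
selected = select_largest_nodes({"ln_a": 120, "ln_b": 900, "ln_c": 40, "ln_d": 300})
```

Patients with fewer than three annotated lymph nodes would simply yield a shorter list here; how such cases are handled in the actual pipeline is not described in the response.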

Response to Reviewer # 4: Thank you for your recognition of our work. We will further extend our work in the near future. Thank you very much!




Meta-Review

Meta-review not available, early accepted paper.


