Abstract

Accurate identification of patients who achieve pathological complete response (pCR) after neoadjuvant chemotherapy (NAC) is critical before surgery for guiding customized treatment regimens and assessing prognosis in breast cancer. However, current methods for predicting pCR primarily rely on single-modality data or single time-point images, which fail to capture tumor changes or to comprehensively represent tumor heterogeneity at both macro and micro levels. Additionally, complementary information between modalities is not fully exploited. In this paper, we present M2Fusion, pioneering the fusion of multi-time multimodal data for treatment response prediction, with two key components: the multi-time magnetic resonance imaging (MRI) contrastive learning loss, which learns representations reflecting NAC-induced tumor changes; and the orthogonal multimodal fusion module, which integrates orthogonal information from MRIs and whole slide images (WSIs). To evaluate the proposed M2Fusion, we collect pre-treatment MRI, post-treatment MRI, and WSIs of biopsy from patients with breast cancer at two different collaborating hospitals, each with the pCR assessed by the standard pathological procedure. Experimental results quantitatively reveal that the proposed M2Fusion improves treatment response prediction and outperforms other multimodal fusion methods and single-modality approaches. Validation on external test sets further demonstrates the generalization and validity of the model. Our code is available at https://github.com/SongZHS/M2Fusion.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0979_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0979_supp.pdf

Link to the Code Repository

https://github.com/SongZHS/M2Fusion

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zha_M2Fusion_MICCAI2024,
        author = { Zhang, Song and Du, Siyao and Sun, Caixia and Li, Bao and Shao, Lizhi and Zhang, Lina and Wang, Kun and Liu, Zhenyu and Tian, Jie},
        title = { { M2Fusion: Multi-time Multimodal Fusion for Prediction of Pathological Complete Response in Breast Cancer } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose the integrated use of multi-time and multi-modality learned features to predict treatment response in cases of breast cancer. A contrastive learning module compares pre and post-treatment MRI, while vision transformers extract features from whole slide images. The extracted features are fused using an attention-based orthogonal fusion module.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The rationale for the proposed method is interesting
    • The proposed framework is novel in its integration of multi-time and multi-modal features.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Some implementation details are missing from the manuscript
    • The ROC curves should be presented and the results discussion should be more elaborate, for example focusing on occurring false positives.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Although the proposed framework is interesting, the presented results are not fully convincing of its merits. Result discussion is clearly insufficient, given the nature of the tackled problem. The obtained ROC curves are not presented. The authors only compare AUC values, which are not that high to begin with. Thus, a more elaborate discussion on the obtained results would be essential to fully understand the value of the obtained results. For example, further insights on false positives obtained at the optimal operating point could be provided: Were there many clear non-pCR cases identified as pCR? If this was the case, can the authors identify any issues that may lead to that?

    The conclusion section does not provide much more insight than what is given in section 4.2.

    In section 4.1 (data collection), the authors should provide more details on data annotation (e.g., how many experts provided pCR labels for both datasets? Inter-observer variability?)

    Still in section 4.1 (data collection): the authors mention that cohort A was randomly split into training and validation subsets. Assuming the reported results were obtained in single runs (i.e., no cross-validation was performed), the authors should at least provide information on how many pCR / non-pCR cases were used in training and validation. Also, it was not clear to the reviewer if the same training/validation split was used for every instance of the ablation study and method comparison.

    The manuscript could also be improved in terms of writing. Some of the identified issues are:

    • The abstract should be self-contained. Not all acronyms are defined at their first usage in the abstract (e.g. MRI).
    • The reviewer does not understand the usage of “interact/interacted”. Is this a typo on “integrate/integrated”? See abstract and pg. 2 (2nd full paragraph).
    • Pg. 1, last two lines: “single modal” -> “single modality”
    • Pg. 2, end of section 1: “To the best of our knowledge (…)”. This sentence needs to be rewritten. For example, something like: “To the best of our knowledge, this work is the first to use multi-time multi-modal data simultaneously for treatment response prediction (…)”.
    • “radiomics features” -> “radiomic features”
    • Section 2.2: VQA acronym not defined
    • Fig.1: the use of L_contra to denote multi-time contrastive learning is not consistent with the rest of the manuscript (L_cont)
    • Pg. 4, last two lines: from my understanding, “f_post, f_post” should be “f_pre, f_post”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The obtained ROC curves are not presented. The authors only compare AUC values, which are not that high to begin with. Thus, a more elaborate discussion on the obtained results would be essential to fully understand the value of the obtained results. For example, further insights on false positives obtained at the optimal operating point could be provided.
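
The analysis requested above (false positives at the optimal operating point) can be sketched as follows. This is a minimal illustration, assuming the "optimal operating point" means the threshold that maximises Youden's J = TPR - FPR; the review does not say which criterion is intended. The function names and the toy labels and scores are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for every distinct score used as a threshold.

    A case is predicted pCR (positive) when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thr and y == 0)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points


def youden_operating_point(labels: List[int], scores: List[float]) -> Tuple[float, float, float]:
    """Pick the ROC point that maximises Youden's J = TPR - FPR (an assumed criterion)."""
    return max(roc_points(labels, scores), key=lambda p: p[2] - p[1])


def false_positive_indices(labels: List[int], scores: List[float], thr: float) -> List[int]:
    """Indices of non-pCR cases (label 0) that are predicted pCR at this threshold."""
    return [i for i, (y, s) in enumerate(zip(labels, scores)) if s >= thr and y == 0]


# Made-up toy data: 1 = pCR, 0 = non-pCR.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

thr, fpr, tpr = youden_operating_point(labels, scores)
fps = false_positive_indices(labels, scores, thr)
print(f"threshold={thr}, FPR={fpr}, TPR={tpr}, false positives at indices {fps}")
```

The returned indices are the cases one would inspect by hand to answer the question of whether clear non-pCR cases were identified as pCR.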

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors addressed all my concerns in a clear way. If the proposed additions are included in a final version of the paper, its quality will be greatly improved.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a method to identify patients who achieve a pathological complete response (pCR), defined as the absence of all signs of cancer in tissue samples removed during surgery or biopsy after treatment with neoadjuvant chemotherapy. This is crucial in guiding treatment for breast cancer patients. Unlike typical studies that use a single modality image from one timepoint, this approach combines data from different modalities (multimodal) and different time points (multitime). A novel module for incorporating multimodal data such as MRI and WSI is proposed. The method is evaluated using datasets from two hospitals, involving 375 and 204 patients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-written, presenting a clear and well-structured communication of the problem, related works, and methods. The novelty lies in the multi-modal fusion and multi-time contrastive loss, adding a fresh perspective to the field. The work is well-presented, including comprehensive figures and tables. Notably, the authors conducted a dedicated ablation study to investigate the individual components of the proposed method and verified the method on an external validation set. It was commendable to see that external validation data from a different medical centre was used, reinforcing the robustness of the findings.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • Equations make it challenging to follow the content. Clear and concise formatting of these elements on separate lines might be beneficial.
    • The results presented in Table 2 primarily compare the proposed model with single modal models, concatenation methods, and other basic fusion-based methods. However, it would be interesting to see comparisons with more advanced attention-based fusion methods, similar to the proposed attention-based fusion module.
    • The experiment section should specify the number of runs conducted. Are the numbers in the tables average results or one-time results? Including the standard deviation might offer a more comprehensive view of the results.
    • Furthermore, some parts of the methodology, such as bilinear pooling or attention scores for WSI embeddings, are not discussed in detail.
    • Finally, a comparison with other works in PCR prediction would enrich the analysis.
    • No statistical evaluation of results.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Authors mention, “Our code is available at XXX.”. I assume the link would be made available if the paper is considered for publication. 

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Although the paper intriguingly focuses on multi-time, multi-fusion methods, its stated primary objective is predicting pathological complete response in breast cancer. However, the results are mostly compared with other multi-fusion methods, which makes it difficult to evaluate the main motivation. Additionally, considering other contrastive learning methods (e.g., MoCo, SimCLR), and applying them to Concat and other fusion methods could provide further insight into the model performance. When forcing non-PCR features to be the same, the focus on the tumor rather than other parts in the MRI needs to be assured. Feature visualization might assist in this.
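
The point above about forcing non-pCR features to be the same can be made concrete with a small sketch. This is not the authors' multi-time contrastive loss, whose exact form is not given on this page. It is a generic margin-based pairwise contrastive loss, written under the assumption that pre- and post-treatment MRI features are pulled together for non-pCR cases and pushed apart for pCR cases. The function names, the margin value, and the toy feature vectors are all hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_contrastive_loss(
    f_pre: Sequence[float],
    f_post: Sequence[float],
    is_pcr: bool,
    margin: float = 1.0,
) -> float:
    """Generic margin-based contrastive loss on one (pre, post) feature pair.

    Assumption (not from the paper): non-pCR pairs are treated as 'similar'
    and penalised by their squared distance; pCR pairs are treated as
    'dissimilar' and penalised only while closer than `margin`.
    """
    d = euclidean(f_pre, f_post)
    if is_pcr:
        return max(0.0, margin - d) ** 2
    return d ** 2


# Toy features: distance between them is exactly 5.
f_pre, f_post = [0.0, 0.0], [3.0, 4.0]
print(pairwise_contrastive_loss(f_pre, f_post, is_pcr=False))  # squared distance
print(pairwise_contrastive_loss(f_pre, f_post, is_pcr=True))   # already beyond the margin
print(pairwise_contrastive_loss(f_pre, f_pre, is_pcr=True))    # identical features, full penalty
```

Under this formulation nothing ties the similarity to the tumor region itself, which is why a feature visualisation check of the kind suggested above would be informative.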

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-written. However, while the motivation focuses on PCR prediction, the results are not compared with other PCR prediction methods. Additionally, claims of superior results lack statistical performance support. The mathematical content should be better structured for a clearer understanding of the workflow. On the positive side, the paper includes an additional test set from a different hospital and claims independent testing to verify the generalizability of the method. Therefore, the paper may be considered for publication, depending on the rebuttal from the authors.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The revisions and responses provided by the authors are satisfactory. They have enhanced the clarity of the language, figures, and equations in the paper. Additionally, they have included more baseline comparisons, which effectively underscore the advantages of their proposed method. Moreover, the authors have offered more comprehensive details about the implementation and the experimental setups. Given these improvements, the paper is now in a position to be considered for acceptance.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a model to combine pre- and post-treatment MRI information with WSI images to predict the pathological complete response after neoadjuvant chemotherapy in breast cancer.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strong points of this paper are:

    • it is very well written, the information is clear and well articulated in general.
    • On a technical level, it has the added value of bringing together two types of data that have been used in other recent studies for the same purpose but separately: multimodal data (in this case MRI and WSI) and time-varying data (MRI before and after NAC).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The work’s weak points are:

    • The description of a clear objective for the work is lacking. The main objective is to determine the pCR directly from a pre-treatment MRI image?
    • There is a lack of information about the data (which is very important and decisive in interpreting the results).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • In the abstract the authors mentioned: “Our code is available at XXX”.
    • The submission does not mention open access to data

    Although there are several unknown layers in the model described in the paper, with the availability of the code it is my understanding that the model will be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I really enjoyed reading the paper, it’s a hot topic, it brings the novelty of combining several variables, it’s clear and well structured. However, there are a few points that need to be improved/answered:

    • What is the main objective of this work? Why is this work important in the midst of so many others like it? The information on a pCR goes beyond “an independent predictor for improved outcomes and longer survival”. It can influence the therapy itself.
    • What is the distribution of breast cancer types in the dataset? This information is important because depending on the type, the response is different. Therefore, caution should be exercised when interpreting results “blinded” to the type of cancer. There may even be a bias.
    • In page 4: “where fpost, fpost share similar”. Perhaps an extra fpost?
    • In section 3.2: how are the regions “automatically segmented”?
    • In 4.2, comparison to other methods: Were these models trained on the exact same data?
    • There were 579 patients, i.e. 579 data sets (pre and post MRI, WSI and label)?
    • Which programming language was used?
    • The manufacturers of the MRI scanners were the same? Another important information.
    • Has there been any pre-processing of the data beyond the dimension?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the weaknesses I’ve mentioned, in my opinion it’s a work that tackles a hot topic, brings the novelty of combining several variables (multimodal and multitime), is clear and well-structured. If the authors answer my questions, it could be a candidate for acceptance to MICCAI 2024.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank all reviewers for the valuable feedback. We address their comments below.

1. Typos and errors (R1, R3 & R4)
Re: The manuscript has been thoroughly checked and revised to improve the clarity of the language [f_pre and f_post on page 4, acronyms in the article (MRI & VQA), L_cont in Fig. 1, improper word choices, and sentence revisions]. We have also reformatted the equations.

2. Missing data details (R1, R3 & R4)
Re: We have provided patient information [cancer molecular subtypes (HER2+, HR+/HER2-, and triple-negative), age, cTNM stage] for each dataset. The MRI scanner manufacturers are also included (Cohort A: Philips 1.5T Achieva and 3.0T Ingenia, GE 1.5T Optima MR360, Siemens 3.0T Verio; Cohort B: GE 3.0T SIGNA Pioneer). For the 579 enrolled patients, each patient has MRI before and after treatment, a biopsy WSI, and a pCR label. The numbers of pCR / non-pCR cases in training and validation are available. The details of pCR labels and interobserver variability are provided in the supplementary files.

3. Data preprocessing (R1)
Re: Beyond adjusting dimensions, we applied N4 bias field correction, resampling, histogram normalization, and z-score normalization to the MRI data. Further, tissue regions in the WSIs are automatically segmented by Otsu's method, following Faisal et al.

4. Network and training info (R1, R3 & R4)
Re: The network details regarding bilinear pooling and the attention scores for WSI features are now elaborated. All experiments were conducted on the same training/validation split, in Python. The numbers in the tables are averages over five runs, and SDs are calculated.

5. Work objective (R1)
Re: Patients achieving pCR could benefit from breast-conserving surgery, or even omit surgery, instead of undergoing mastectomy. Accurate assessment of pCR before surgery is essential for tailoring surgical plans and could identify patients with good prognosis in advance, which is an urgent need. However, the gold standard for pCR depends on the pathological results of surgical specimens. The main objective of this work is therefore to predict pCR from pre-surgery data and to help guide surgical strategy. The proposed M2Fusion, which handles multi-time multimodal data, is unique to this work, adding a fresh perspective to the field.

6. Comparisons (R3)
Re: HMCAT (Li et al.) in Table 2 is an attention-based fusion method, and it does not outperform M2Fusion. Earlier, we also compared another attention-based model (Zhou et al.), whose results are 0.7037 and 0.6949, lower than M2Fusion. For other works on pCR prediction, MLDRL (Yue et al.) and Concat (Shah et al.) were evaluated. The former did not converge (NaN loss) and the latter is shown in Table 2. MoCo and SimCLR are based on self-supervised learning, whereas we aim to predict pCR end-to-end. We thank R3 for this brilliant idea and will explore it in our future study.

7. Evaluation (R3 & R4)
Re: Due to the page limit, we did not show ROC curves in the article. We will add them. AUCs are compared by DeLong's test. We also show Grad-CAM heatmaps on pre- and post-treatment MRI to confirm the focused regions.

8. Insufficient discussion (R4)
Re: Compared to other multimodal fusion methods and single-modality methods, M2Fusion yields a statistically significant improvement (DeLong's test) for pCR prediction, with an AUC of 0.7346 in the internal validation set and 0.7992 in the external test set. Besides, the FPVs are 0.2941 and 0.2759, respectively, lower than other methods. We have carefully checked the incorrectly predicted cases. Some non-pCR cases are indeed identified as pCR. Breast cancer experts were consulted, and they acknowledged that these cases are truly tough to identify. After NAC, tumors regress, and tiny residual tumors are hard to detect on MRI. Also, the tissue surrounding the tumor may undergo significant changes after NAC, such as fibrosis or necrosis, which may mask or mimic residual tumor tissue, making detection harder. We have also revised the conclusion to provide more insight than what is given in Sec. 4.2.
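
Two of the preprocessing steps named in the rebuttal, z-score normalization and Otsu thresholding, can be sketched in a few lines. This is an illustrative sketch in plain Python, not the authors' pipeline: the function names are hypothetical, the inputs are flat lists of made-up intensities rather than MRI volumes or WSI thumbnails, and which side of the threshold counts as tissue depends on the channel used, which the rebuttal does not specify.

```python
from statistics import mean, pstdev
from typing import List


def z_score(values: List[float]) -> List[float]:
    """Rescale intensities to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant input: nothing to scale
    return [(v - mu) / sigma for v in values]


def otsu_threshold(pixels: List[int], levels: int = 256) -> int:
    """Return the gray level t that maximises the between-class variance.

    Class 0 holds pixels <= t and class 1 holds pixels > t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in class 0 yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in class 0
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2  # unnormalised between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Made-up intensities: a dark cluster and a bright cluster.
pixels = [10, 12, 11, 200, 205, 210]
t = otsu_threshold(pixels)
mask = [p <= t for p in pixels]  # True marks the darker class
normalised = z_score([1.0, 2.0, 3.0])
print(t, mask, normalised)
```

The threshold lands at the top of the dark cluster, so the mask cleanly separates the two groups in this toy case; real slides and scans would need the library routines and channel choices the authors actually used.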




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Comprehensive rebuttal and sufficient changes to be implemented, addressing questions of segmentation approach, heterogeneity in dataset, cohort splits for training/validation. Additional quant results (sens/spec), stat analysis should also be included in final manuscript.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have provided a clear rebuttal addressing the concerns raised. There is a strong consensus among the reviewers, so I recommend acceptance.



