Abstract

In many diseases, including head and neck squamous cell carcinoma (HNSCC), pathologic processes are not limited to a single region of interest, but instead encompass surrounding anatomical structures and organs outside of the tumor. To model information from organs-at-risk (OARs) as well as from the primary tumor, we present a Hierarchical Multi-Organ Graph Network (HoG-Net) for medical image modeling, which we leverage to predict locoregional tumor recurrence (LR) for HNSCC patients. HoG-Net models local features from individual OARs and then constructs a holistic global representation of interactions between features from multiple OARs in a single image. HoG-Net’s prediction of LR for HNSCC patients is evaluated on the largest dataset studied to date, comprising N=2,741 patients from six institutions, and outperforms several previously published baselines. Further, HoG-Net allows insights into which OARs are significant in predicting LR, providing specific OAR-level interpretability rather than the coarse patch-level interpretability provided by other methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3878_paper.pdf

SharedIt Link: https://rdcu.be/dV17J

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72086-4_30

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3878_supp.pdf

Link to the Code Repository

https://github.com/bmi-imaginelab/HoGNet

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Bae_HoGNet_MICCAI2024,
        author = { Bae, Joseph and Kapse, Saarthak and Zhou, Lei and Mani, Kartik and Prasanna, Prateek},
        title = { { HoG-Net: Hierarchical Multi-Organ Graph Network for Head and Neck Cancer Recurrence Prediction from CT Images } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {317 -- 327}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose the use of GNNs to predict cancer recurrence in H&N. The model has two levels: one for intra-organ modelling based on voxel intensities and positions, and a second, patient-level one, where a second graph is built from all organs. The information is then processed via graph attention blocks, global average pooling, and a final linear layer to perform the binary classification (specified at the 2-year mark). The approach was evaluated on three publicly available datasets, namely RADCURE, HNPET, and HN1. It was compared with two previous approaches and a reimplementation of a radiomics-based method. The authors also built two useful DenseNet baselines, one for OAR+Tumor and one for the full image, and they report results with and without clinical information.
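The two-level design summarised above can be sketched roughly as follows. This is a minimal pure-Python illustration under strong assumptions, not the authors' implementation: the function names (`organ_embedding`, `patient_score`), the toy features, and the fixed weights are all hypothetical, and the learned graph convolution and graph attention layers are replaced by simple neighbour averaging and dot-product attention.

```python
import math


def mean_vec(rows):
    """Element-wise mean of equally sized feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def organ_embedding(features, edges):
    """Local level: one neighbour-averaging step, then mean pooling.

    features: one feature vector per voxel node (any number of nodes).
    edges: undirected (i, j) pairs between node indices.
    """
    neighbours = {i: [i] for i in range(len(features))}  # include self-loop
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    smoothed = [mean_vec([features[k] for k in neighbours[i]])
                for i in range(len(features))]
    return mean_vec(smoothed)  # fixed-size vector regardless of organ size


def patient_score(tumor_vec, organ_vecs, weights, bias):
    """Global level: tumor-to-organ attention, pooling, linear layer, sigmoid."""
    names = list(organ_vecs)
    logits = [sum(t * o for t, o in zip(tumor_vec, organ_vecs[n])) for n in names]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # numerically stable softmax
    total = sum(exps)
    attention = {n: e / total for n, e in zip(names, exps)}
    context = [sum(attention[n] * organ_vecs[n][d] for n in names)
               for d in range(len(tumor_vec))]
    pooled = mean_vec([tumor_vec, context])  # average pooling over graph nodes
    z = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z)), attention


# Toy patient: made-up 2-D features, illustrative organ names.
tumor = organ_embedding([[1.0, 0.0], [0.8, 0.2]], [(0, 1)])
organs = {
    "larynx": organ_embedding([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]], [(0, 1), (1, 2)]),
    "parotid": organ_embedding([[0.1, 0.9]], []),
}
prob, attn = patient_score(tumor, organs, weights=[1.0, -1.0], bias=0.0)
print(round(prob, 3), attn)
```

The attention weights returned per organ are what would support an OAR-level reading of the prediction in a sketch like this; in the real model they come from trained attention layers.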

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well written. The topic is very relevant and the approach is well explained. Illustrations and presentation of results is very good as well. The approach is interesting and builds on solid subcomponents.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My main concern with the paper is the missing literature and benchmarking. Below are some papers that were not cited:

    This paper also used GCN but no hierarchical approach: Kazmierski, M. and Haibe-Kains, B., 2021. Lymph node graph neural networks for cancer metastasis prediction. arXiv preprint arXiv:2106.01711.

    This recent one seems to be very close to this paper (also featuring a large dataset and outperforming results presented here for two of the datasets): https://www.redjournal.org/article/S0360-3016(24)00198-6/fulltext Quoting: “Graph radiomics with clinical features resulted in AUCs of 0.834 and 0.806 for D1 and D2, respectively. Traditional radiomics with clinical features resulted in AUCs of 0.819 and 0.784 compared to clinical features alone achieving AUCs of 0.808 and 0.784.”

    This last paper also uses a GNN, building on supervoxels instead of the 26-voxel blocks proposed here. The results presented in that study also seem better than the ones presented in this submission, while the approach is much simpler, which raises some concerns about the usefulness and superiority of the method. The authors criticize the biological soundness of using supervoxels, but I find it difficult to believe this is ultimately an issue.
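For readers unfamiliar with voxel-level graph construction, the sketch below connects each foreground voxel of an organ mask to its neighbours, assuming that "26-voxel" refers to standard 26-connectivity (voxels touching by face, edge, or corner). The helper `build_26_neighbour_edges` and the toy mask are hypothetical and are not taken from the paper or its repository.

```python
from itertools import product

# All 26 offsets in a 3x3x3 neighbourhood, excluding the centre voxel.
OFFSETS_26 = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def build_26_neighbour_edges(voxels):
    """Return a node index and the undirected edges between foreground
    voxels that touch by face, edge, or corner (26-connectivity).

    voxels: iterable of (z, y, x) integer coordinates inside one organ mask.
    """
    index = {v: i for i, v in enumerate(sorted(set(voxels)))}
    edges = set()
    for (z, y, x), i in index.items():
        for dz, dy, dx in OFFSETS_26:
            j = index.get((z + dz, y + dy, x + dx))
            if j is not None and i < j:  # store each undirected edge once
                edges.add((i, j))
    return index, sorted(edges)


# Toy organ mask: a 2x2x2 cube, in which every voxel touches every other one.
cube = list(product((0, 1), repeat=3))
node_index, edge_list = build_26_neighbour_edges(cube)
print(len(node_index), len(edge_list))
```

A supervoxel approach would instead cluster voxels first and build the graph over clusters, which is where the tuning parameters mentioned in the rebuttal come in.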

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I liked this article very much. My main concern is novelty and performance gains with respect to some missing works. In terms of novelty, GNNs have already been used for H&N recurrence prediction. The difference seems to be the organ-specific subgraphs, which are then merged into a single graph for final inference. Unfortunately, the results do not seem to be better than those reported by a simpler approach using supervoxels and GNNs (to be noted, also on the same datasets).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Doubtful superiority of the approach over a simpler version using supervoxels and GNNs.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have clarified my questions. I changed my score accordingly.



Review #2

  • Please describe the contribution of the paper

    In this paper the authors propose a hierarchical graph based approach (HoG-Net) to holistically model Organs at Risk (OAR) and tumor for recurrence prediction of HNSCC tumors.

    Firstly, the model extracts local imaging features from multiple OARs from each patient and encodes each into a graph representation. Then a patient-level SuperGraph is created, encoding global relationships between different OARs of a single patient. Evaluation was performed on a large dataset for HNSCC locoregional recurrence prediction (N=2,741) and using CT scans.

    The model uses graph convolution and graph attention mechanisms in order to extract local imaging features from multiple OARs from each patient and encodes each into a graph representation. Results with different publicly available datasets outperform state of the art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of establishing relationships between multiple organs and local recurrence is crucial to understanding the underlying mechanisms involved in treatment failure. This may help in planning more personalized treatments that take different organs into account. Indeed, the approach offers the possibility to unveil local changes in specific OARs resulting from HNSCC pathological development, as well as the global interactions between primary tumors and their anatomical environments.

    Graph analysis seems to be a suitable methodology for understanding the pathology.

    Further strengths are the large number of patients used (N=2,741) and the proposed methodology, which outperforms several previously published state-of-the-art methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I think that if the intention was to offer the possibility to look for local changes in specific OARs, a final graph establishing relationships across all the OARs should have been proposed. The authors limited the final SuperGraph analysis to the global interactions between only the primary tumors and the respective OARs.

    A final graph showing all the interactions could have been proposed and would have enriched the conclusions about the regions that appear to be related to the primary tumours. For example, if contiguous structures such as the retropharynx, larynx, and esophagus appear important for the model attention, what is the relation between them across all patients?

    The improvement compared with the literature is slight, but what is surprising is the contribution of clinical data to the prediction. Does this suggest that the information conveyed by clinical assessment is crucial to the prediction, while the importance of the images stays roughly the same?

    The first part of the methodology (extracting features from images) does not need a graph. A node represents a voxel, and the GCN extracts features from the images. What is the real difference between using a classical CNN and using the GCN to obtain the final features that are fed into the SuperGraph?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Provided that the code and test data are released, the paper seems to be easily reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    A final graph establishing relationships across all the OARs could increase the importance of this paper. The simultaneous activation of different OARs may bring some improvement to the prediction. This extension should be easy to implement, as the SuperGraph already represents each of the OARs.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the contribution is important and may help to unveil the complex mechanisms in head and neck cancer, a final push to explain the results and to compare with different methods in the first part could substantially improve the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    not convinced by the rebuttal



Review #3

  • Please describe the contribution of the paper

    Authors propose HoG-Net which is a novel graph-based approach for holistically modeling local and global imaging features from OARs and the primary tumor. It addresses limitations of previous methods by flexibly handling varying patient anatomies and providing OAR-level interpretability. HoG-Net showed promising performance on a large multi-institutional dataset for HNSCC recurrence prediction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novelty: The paper proposes a novel hierarchical graph neural network called HoG-Net to model both local and global features from OARs and the primary tumor for predicting locoregional tumor recurrence in head and neck cancer patients. This approach is unique and addresses the limitations of previous methods that primarily focused on the primary tumor region.

    Flexibility: The OARenc module in HoG-Net can handle varying sizes and numbers of OARs across patients, which is a challenging issue in medical image analysis.

    Interpretability: HoG-Net provides interpretability by allowing the identification of significant OARs that contribute to the prediction of tumor recurrence, which could be valuable for clinical applications.

    Large-scale Evaluation: The approach is evaluated on a large dataset of 2,741 patients from multiple institutions, which is the largest study to date for this task.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    OAR Segmentation Dependency: The performance of HoG-Net relies on accurate OAR segmentations, which were obtained using a pre-trained nnUNet model in this study. The availability and quality of OAR segmentations could be a limitation in practical applications. The performance of HoG-Net also relies on the representation of the OARs, which is derived from the coordinates of the OAR segmentation. However, these coordinates are affected by many factors, such as image resolution, MRI field of view (FOV), and inter-patient differences. This may limit the stability of model performance.

    Computational Complexity: The hierarchical graph structure and the use of multiple graph convolution and attention layers may lead to increased computational complexity, which could be a concern for real-time applications or resource-constrained settings. The size of the OARs may affect model performance and computational resources.

    Lack of Comparison with Other Graph-based Approaches: The paper compares HoG-Net with CNN-based and radiomic approaches but does not include comparisons with other graph-based methods for medical image analysis, which could provide additional context for the performance evaluation.

    No Statistical Evaluation of the Results: The paper lacks a statistical comparison of performance between the proposed method and previous studies.

    Lack of Clarity: No detailed information is given on the graph construction for the primary tumor.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.


  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Extend the binary classification task to time-to-event analysis or multi-class classification to capture the full progression of the disease, as mentioned in the paper.

    Explore incorporating additional OARs and modeling the interactions between OARs, and add more connections between OARs and the primary tumor like the related location or progress.

    Investigate the impact of OAR segmentation quality on the model’s performance and explore ways to make the approach more robust to segmentation errors or incorporate uncertainty estimation for OAR segmentations.

    Explore the applicability of the hierarchical graph network approach to other types of cancers or medical imaging tasks where modeling both local and global features from different anatomical structures could be beneficial.

    Investigate the computational efficiency of the approach and explore techniques for model compression or acceleration, as the hierarchical graph structure and multiple graph convolution and attention layers could be computationally expensive.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The study presents a novel, interpretable, and efficient architecture. The study is also fully validated.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their comments. We appreciate that each agreed HoG-Net addresses a significant clinical problem with a “novel, interpretable, and efficient architecture.” Below we address the key critiques.

  1. Comparison with graph-based approaches Kazmierski [1] and Bae [2] (R1, R3)

We thank R1 for mentioning these - they will be included in the updated introduction. Please note that [1] is a 2021 arXiv preprint, does not provide code to reproduce, and studies a different outcome task. [2] is a conference abstract published after the MICCAI deadline. Neither was peer reviewed as a full paper. We have communicated with the authors of both studies and assert that HoG-Net possesses the following advantages:

a. Holistic modeling of a large set of commonly available OARs (see 5) rather than less common lymph nodes [1] or the supervoxeling method proposed in [2], which requires the number of supervoxels, clustering algorithm, sparsification extent, and radiomic feature selection parameters to be carefully tuned.
b. Improved interpretability of the specific OARs implicated in model predictions, and extensibility to any task with multiple pre-defined ROIs. [2] provides a coarse level of interpretability to non-specific anatomical regions and found unusually high activations in the brain, which should not be implicated in HNSCC.
c. Validation on a held-out cohort of 395 patients from datasets different from the training set, compared to 0 [1] and 121 [2] patients, reducing overfitting concerns. On patients from the independent test set, HoG-Net slightly outperforms [2].

To the best of our knowledge, no other graph-based approaches have been proposed for radiotherapy imaging analysis.

  2. Question regarding inter-OAR graph relationships (R2, R3)

The reviewer is correct that SuperGraph can be easily created to model inter-OAR attention. Doing so, we found that model performance was slightly diminished, and attention trends mirrored those presented in the original submission. Due to MICCAI guidelines, we cannot show these additional results but can provide them in the supplementary if the AC/reviewer requests. Because these OAR-OAR interactions do not have a well-defined pathological motivation, we leave further exploration of this interesting direction to future work.
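The difference between the two SuperGraph variants under discussion can be sketched as two edge lists. This assumes, following R2's reading, that the submitted SuperGraph links only the primary tumor to each OAR, while the variant R2 requested adds OAR-OAR edges; the helper names and the organ subset are illustrative only.

```python
from itertools import combinations


def star_edges(center, organs):
    """Edges linking only the primary tumor to each OAR."""
    return [(center, organ) for organ in organs]


def complete_edges(center, organs):
    """Edges linking every pair of nodes, including OAR-OAR pairs."""
    return list(combinations([center, *organs], 2))


oars = ["larynx", "esophagus", "retropharynx"]  # illustrative subset
print(star_edges("tumor", oars))
print(len(complete_edges("tumor", oars)))
```

With n OARs the star variant has n edges and the complete variant has n(n+1)/2, so the number of attention weights to interpret grows quadratically in the second case.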

  3. Importance of clinical data to results (R2)

As observed by R2 and corroborated by previous work [3,4], clinical features are significant for this task. Table 1 demonstrates that HoG-Net outperforms all baselines in the clinical+imaging setting, particularly in F1 scores (0.651 vs 0.584, 0.582 vs 0.518, 0.658 vs 0.595) for ours vs. the top baseline across HN1, HNPET, and RADCURE.

  [3] Vallières et al. “Radiomics strategies…” Scientific Reports 2017
  [4] Mateus et al. “Image based…” Scientific Reports 2023

  4. CNN vs GCN feature extraction (R2)

To clarify, we propose OARenc to accommodate arbitrarily large differences in OAR dimensionality from patient to patient within a single batch. This would not be possible with a conventional CNN without cropping or resizing, which risks learning superfluous background features or losing OAR information.
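The batching argument can be illustrated with a small sketch. It assumes the common graph-learning convention of concatenating the nodes of all graphs in a batch and tracking membership with an index vector; `segment_mean` is a hypothetical helper, not the OARenc module itself, and the features are made up.

```python
def segment_mean(node_features, graph_ids, num_graphs):
    """Mean-pool node features per graph.

    node_features: feature vectors of all nodes in the batch, concatenated.
    graph_ids: for each node, the index of the graph (OAR) it belongs to.
    Assumes every graph has at least one node.
    """
    dim = len(node_features[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feats, g in zip(node_features, graph_ids):
        counts[g] += 1
        for d, value in enumerate(feats):
            sums[g][d] += value
    return [[s / counts[g] for s in sums[g]] for g in range(num_graphs)]


# Two OARs of very different size share one batch without cropping or resizing.
small_oar = [[1.0, 2.0]]                           # 1 voxel node
large_oar = [[0.0, 4.0], [2.0, 0.0], [4.0, 2.0]]   # 3 voxel nodes
batch = small_oar + large_oar
ids = [0] * len(small_oar) + [1] * len(large_oar)
pooled = segment_mean(batch, ids, num_graphs=2)
print(pooled)
```

Each OAR ends up with an embedding of the same length regardless of how many voxels it contains, whereas a CNN on a dense grid would need every input cropped or resampled to a common shape.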

  5. OAR segmentation dependency (R3)

OAR segmentations are acquired during routine radiation treatment planning allowing for easy inference in clinical settings. These structures are not commonly used in analysis of radiation-treated cancers; HoG-Net’s ability to leverage them is a strength and novelty of our work. HoG-Net generalized across data from 6 different institutions, suggesting robustness to OAR segmentation variability.

  6. Computational complexity (R3)

HoG-Net has <100k trainable parameters, comparable to graph methods in the hyperspectral imaging domain and ~80x fewer than CNN baselines including DenseNet-121 (~8M parameters). Training times on the same GPU are ~6 hours for HoG-Net and ~2 hours for DenseNet-121 for 100 epochs. Inference for HoG-Net is <2 seconds per case, in line with DenseNet-121.
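The arithmetic behind these figures can be checked with a few lines. The inputs below are the rounded values quoted in the rebuttal (an upper bound of 100k parameters and roughly 8M parameters), so the outputs are approximate as well.

```python
hognet_params = 100_000        # upper bound stated in the rebuttal (<100k)
densenet_params = 8_000_000    # approximate figure stated in the rebuttal

ratio = densenet_params / hognet_params
print(f"parameter ratio: about {ratio:.0f}x")

epochs = 100
hognet_min_per_epoch = 6 * 60 / epochs    # ~6 hours of training in total
densenet_min_per_epoch = 2 * 60 / epochs  # ~2 hours of training in total
print(hognet_min_per_epoch, densenet_min_per_epoch)
```

On these numbers, the smaller model still trains about three times more slowly per epoch, so parameter count alone does not settle the reviewer's runtime concern.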




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Two of the three reviewers recommended acceptance both before and after the rebuttal (one changed from Reject to Accept, one from Accept to Reject), indicating that the paper generally has merit. The main merit seems to lie in the moderately innovative use of graph modeling for OARs in cancer recurrence prediction. However, in the medical field, especially in cancer prognosis, accuracy is paramount. Reviewer #2 was not convinced after the rebuttal, not only because the authors found the suggested experiments ineffective but possibly also because of concerns about the importance of clinical data to the results. Indeed, the experimental results (Table 1) have issues, and strictly speaking, the innovation proposed in this paper might lack added clinical value. For example, in the Image+Clinical setting, the AUC improvement over Mateus et al. is very limited and might not be statistically significant. Note that in clinical settings, AUC is generally a better indicator of model performance than F1, since F1 depends on the chosen decision threshold. Moreover, in the Image Only setting, the AUC of this paper’s results is not obviously higher than DenseNet’s, and on some datasets it is even lower.

    However, achieving significant clinical progress through technical innovation in medical imaging prediction/prognosis tasks is very difficult. Despite its limitations, this paper explores new approaches and can be recommended for acceptance. However, the interpretation of the results should be approached with caution.
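The meta-reviewer's remark that F1 depends on the decision threshold while AUC does not can be illustrated with a small sketch. The labels and scores below are made up for illustration and have no connection to the paper's results; `auc` and `f1_at` are hypothetical helper names.

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at(labels, scores, threshold):
    """F1 score after binarising the scores at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Made-up labels and scores, for illustration only.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]

print("AUC:", round(auc(y_true, y_score), 3))
for t in (0.3, 0.45, 0.8):
    print("threshold", t, "F1", round(f1_at(y_true, y_score, t), 3))
```

The same set of scores yields a single AUC but a different F1 at each threshold, which is why F1 comparisons across methods are only meaningful when the threshold selection procedure is stated.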




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Based on the rebuttal, most of the concerns were carefully answered. However, R2 was not convinced by the rebuttal and changed his/her score from 4 to 3, because the authors responded to the suggestion of additional experiments with fully connected OAR graphs by stating that they do not improve the results. Due to MICCAI guidelines, the authors cannot provide additional results, and the paper can be accepted without them. Moreover, the authors are willing to provide these additional results in the supplementary material. Strength: The authors propose a hierarchical graph-based approach (HoG-Net) to holistically model Organs at Risk (OAR) and tumor for recurrence prediction of HNSCC tumors. The idea of establishing relationships between multiple organs and local recurrence is crucial to understanding the underlying mechanisms involved in treatment failure. Weakness: An ablation study on the SuperGraph was missing, which raised concerns.



