
Many multi-modal tumor segmentation methods have been proposed to localize diseased areas from the brain images, facilitating the intelligence of diagnosis. However, existing studies commonly ignore the relationship between multiple categories in brain tumor segmentation, leading to irrational tumor area distribution in the predictive results. To address this issue, this work proposes a Multi-category Region-guided Graph Reasoning Network, which models the dependency between multiple categories using a Multi-category Interaction Module (TMIM), thus enabling more accurate subregion localization of brain tumors. To improve the recognition of tumors’ blurred boundaries, a Region-guided Reasoning Module is also incorporated into the network, which captures semantic relationships between regions and contours via graph reasoning. In addition, we introduce a shared cross-attention encoder in the feature extraction stage to facilitate the comprehensive utilization of multi-modal information. Experimental results on the BraTS2019 and BraTS2020 datasets demonstrate that our method outperforms the current state-of-the-art methods.

    The authors propose modelling the dependency between multiple tumour categories to improve the segmentation of tumour subregions. To further refine tumour contours, their model also captures semantic relationships between regions and contours via a graph learning branch. The proposed approach was validated on the BRATS 2019 and 2020 datasets, demonstrating superiority over other state-of-the-art methods.

    • The authors present a clear motivation for projecting features into the graph domain to learn relations and preserve structure.
    • Technical novelty: The progressive integration of contour and region information across several layers of the methodology to exploit their relationships.
    • Strong evaluation: The comparison with other state-of-the-art approaches and the ablation study make it possible to assess the strengths of the proposed approach in the tumour segmentation task.
    • Lack of details in the reprojection of graph features: the authors did not mention how they projected the graph features back into image space (final segmentation).
    The submission does not provide sufficient information for reproducibility.

    The authors should make the code for their proposed method available to facilitate easier replication of their results.

    • It would be pertinent to explain why fusing contour and tumour region information decreases the scores for WT, and to highlight more generally the advantages of using graph neural networks to learn relationships over other representations such as covariances.
    • Furthermore, the manuscript would benefit from comparisons with other models that explicitly exploit relations, such as GCNs.
    • Page 8, Table 3 contains a typographical error: “Prposed”
    • The pipeline contains a typographical error: “Cross Attetention”
    Weak Accept — could be accepted, dependent on rebuttal (4)

    The progressive fusion of contour and tumour information seems like a promising way to refine the tumour boundaries. The results show that the proposed approach is superior to other approaches, and the authors assessed the contribution of each of the proposed components.

    Confident but not absolutely certain (3)

    The paper proposes a multi-modal and multi-category guided Graph Reasoning Network as a step towards robust contour-aware brain tumor segmentation. The authors conduct experiments on the BRATS adult tumor dataset and compare their method against five different SOTA methods.

    - The paper emphasizes a significant clinical challenge: contour delineation of tumor regions.
    - Ablation studies as well as solid baseline comparisons add to the technical rigor of the paper.
    - The paper is well written, with a brief but concise description of each component proposed by the authors.

    Major comments:
    - The BRATS 2020 dataset likely contains some new cases and updates relative to the BRATS 2019 release, but the two releases cover overlapping, though not identical, patient cases and imaging data. Therefore, evaluating the methods on both datasets might yield similar results.
    - Reproducibility concern: the paper might be technically challenging to reproduce due to insufficient details.

    Very Good

    The submission does not provide sufficient information for reproducibility.

    It would be helpful if the authors provided more information about training (how long were the models trained, and for how many epochs?). Is the proposed method more computationally expensive than a standard UNet? It would be beneficial if the authors considered sharing a GitHub repository, although that is not a requirement.

    Detailed comments:
    - Please improve the clarity of the methodology section; in particular, please define what BCF and CRF denote in formulas (5) and (6).
    - Please clarify the abbreviations used in Figure 1 in the figure legend. I suggest removing “A” since the figure does not contain subfigures.
    - “It is worth noting that the average score of our proposed method is the highest among all methods.” -> please correct to “average Dice score”.
    -Please conduct statistical significance testing to back up the following claim in section 3.2 “Although the Dice score in the WT region slightly trails behind Nestedformer and Eoformer, it notably surpasses other methods by a significant margin.”
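
As an illustration of the kind of test requested above, the following is a minimal sketch of a paired sign-flip permutation test on per-case Dice scores. It is not taken from the paper: the function name `paired_permutation_test` and the score lists are hypothetical, and the authors may well prefer another paired test (for example a Wilcoxon signed-rank test). The sketch assumes both methods were evaluated on the same cases in the same order.

```python
import random
from statistics import mean


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-case scores.

    Returns the observed mean difference (a - b) and an estimated p-value.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired case by case")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        # Small tolerance guards against floating-point noise in the comparison.
        if abs(mean(flipped)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical per-case WT Dice scores for two methods (illustrative values only).
dice_method_a = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.94]
dice_method_b = [0.89, 0.87, 0.90, 0.88, 0.86, 0.89, 0.88, 0.91]
mean_diff, p = paired_permutation_test(dice_method_a, dice_method_b)
print(f"mean Dice difference = {mean_diff:.4f}, p = {p:.4f}")
```

Reporting such a p-value alongside the per-region Dice and HD95 tables would let readers judge whether a difference between methods is larger than case-to-case variation.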

    Future directions (NOTE: Authors do not need to address this for the paper revision, use this as suggestions for journal extension):

    • It would be interesting to see how the model behaves on small tumor regions.
    • Are there any correlations with, or degradation/improvement of, Dice/HD95 that the authors observe based on MRI image quality?
    • What about a comparison with nnUnet? It's an important SOTA method (https://www.nature.com/articles/s41592-020-01008-z).
    Weak Accept — could be accepted, dependent on rebuttal (4)

    Overall, the paper presents a promising approach for robust contour-aware brain tumor segmentation. The technical rigor, including the ablation studies and baseline comparisons, is commendable. With the suggested improvements to address the reproducibility concerns and enhance the clarity of the methodology, I believe the paper can be accepted for publication.

    Somewhat confident (2)

    This work proposes a brain tumor segmentation method based on a region-guided graph reasoning network. The method uses two segmentation branches, one focusing on tumor regions and another on their contours, and propagates knowledge from region to contour to further improve the overall segmentation. The contour and region representations are then both fed into a graph convolution network equipped with Transformer attention.

    • The paper is well written.
    • Technical novelty, including a region-based graph reasoning network.
    • Comparison against recent works.
    • Some parts of the methodology need more explanation.
    • Incomplete discussion on limitations.
    Very Good

    The submission does not provide sufficient information for reproducibility.

    1. Authors are encouraged to include their code implementation and add a link in the manuscript.
    2. In Figure 1, in the Multi-category Region-guided Graph Reasoning block, there are two graph branches. Is the top one exclusively predicting the contours, while the bottom one predicts the tumor regions? If so, how do contour predictions assist in improving overall brain tumor segmentation? More explanations are needed. Additionally, what about propagating knowledge from contour to region? Could this improve the overall segmentation performance?
    3. It is mentioned that “This helps reduce the computational cost.” How does the cross-attention encoder reduce computational cost?
    4. More details about G are needed. Try to expand Eq. (1) and describe it mathematically if possible. The part of the Graph Convolution Network (GCN) lacks details. Also, add the definition of the abbreviation "GCN" on page 5.
    5. In Eq. (4), where is G in Figure 1? It would be helpful if symbols used in this equation were added to the figure.
    6. Is the proposed framework trained end-to-end? Please state that clearly.
    7. What is the difference between L and L' in Eq. (7)? The same clarification is needed for L.
    8. How is the contour ground truth prepared?
    9. In Figure 2, add prediction results of the highest competitors Nestedformer and Eoformer and include the Dice scores for all predictions superimposed over images.
    10. Sigma2 in Eq. (2) is not defined; presumably it denotes the sigmoid function.
    11. In the Abstract, include “Transformer” in the expanded name of the Multi-category Interaction Module (TMIM) so that it matches the acronym.
    12. In Figure 1, “Flair” should be “FLAIR” in the input data. Also, in the caption of Figure 1, remove “A.”
    13. Table 3: There is a typo. “Prposed” should be “Proposed.”
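
Regarding comment 8, the manuscript does not say how contour labels are obtained, so the following is only a sketch of one common convention, not the authors' procedure: a foreground pixel is labelled as contour if it touches the background (or the image border) under 4-connectivity. The function name `extract_contour` and the toy mask are hypothetical, and a real pipeline would apply this per class to the 3D BraTS label volumes.

```python
def extract_contour(mask):
    """Return a binary contour map from a 2D binary mask (list of lists of 0/1).

    A foreground pixel is marked as contour if it lies on the image border or
    has at least one background pixel among its 4-connected neighbours.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    contour = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if not mask[r][c]:
                continue  # background pixels are never contour pixels
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < height and 0 <= nc < width)
                if outside or not mask[nr][nc]:
                    contour[r][c] = 1
                    break
    return contour


# Hypothetical 5x5 slice with a 3x3 tumour region in the centre.
region = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in extract_contour(region):
    print(row)
```

Stating the chosen connectivity and contour thickness in the paper would make the contour branch reproducible.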
    Accept — should be accepted, independent of rebuttal (5)

    • Technical novelty
    • Writing skills and paper presentation.
    • Application.
    Very confident (4)

