Abstract

Magnetic Resonance (MR) imaging plays a vital role in clinical diagnostics and treatment planning, and accurate segmentation of MR images is of paramount importance. Vision transformers have demonstrated remarkable success in medical image segmentation; however, they fall short in capturing local context. While larger images provide broad contextual information, such as shape and texture, training deep learning models on such large images demands additional computational resources. To overcome these challenges, we introduce a shallow attention feature aggregation (SAFA) module that progressively enhances the local context of features and filters out redundant features. Moreover, we use feature interactions in a resolution expansion guidance (REG) module to leverage the wide contextual information from images at higher resolution, ensuring adequate exploitation of the features of small classes and leading to more accurate segmentation without a significant increase in FLOPs. The model is evaluated on two dynamic MR datasets, one of speech and one of cardiac imaging. The proposed model outperforms other state-of-the-art methods. The code is available at https://github.com/Yhe9718/SANGRE.
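
The point that larger input images demand more computation can be made concrete with the standard multiply-accumulate (MAC) count of a single 2D convolution. This is an illustrative sketch only: the helper name `conv2d_macs` and the layer sizes are hypothetical and are not taken from SANGRE. It shows that doubling both spatial dimensions of the input multiplies the cost of that layer by four.

```python
def conv2d_macs(height, width, c_in, c_out, kernel, stride=1, padding=None):
    """Multiply-accumulate count of one 2D convolution (bias ignored)."""
    if padding is None:
        padding = kernel // 2  # "same" padding for odd kernels at stride 1
    h_out = (height + 2 * padding - kernel) // stride + 1
    w_out = (width + 2 * padding - kernel) // stride + 1
    # every output pixel of every output channel sums c_in * kernel * kernel products
    return h_out * w_out * c_out * c_in * kernel * kernel


if __name__ == "__main__":
    base = conv2d_macs(224, 224, c_in=64, c_out=64, kernel=3)
    doubled = conv2d_macs(448, 448, c_in=64, c_out=64, kernel=3)
    print(doubled / base)  # 4.0
```

Any stage that runs on an expanded image pays this quadratic cost, so a design that exploits higher-resolution inputs cheaply presumably has to keep the operations at that resolution light.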



Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3565_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/Yhe9718/SANGRE

Link to the Dataset(s)

The ACDC dataset is available at https://www.creatis.insa-lyon.fr/Challenge/acdc/

The speech MRI dataset is available at https://zenodo.org/records/7595164

BibTex

@InProceedings{He_SANGRE_MICCAI2024,
        author = { He, Ying and Miquel, Marc E. and Zhang, Qianni},
        title = { { SANGRE: a Shallow Attention Network Guided by Resolution Expansion for MR Image Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a shallow attention feature aggregation module to enhance local context and filter out redundant features. It also introduces a resolution expansion guidance module to enhance the feature map.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This article proposes a hybrid structure combining Transformer and CNN for more accurate medical image segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The manuscript is suspected of leaking the author’s institution, which should absolutely not be allowed. 2) The proposed shallow attention mechanism combined with transformer does not demonstrate superior innovation. 3) Why does the absence of SAFA in Table 2 have no impact on the parameter size?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    NA

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I thank the authors for their contribution to improving medical image segmentation. However, the apparent failure to meet anonymity requirements and the lack of novelty make it difficult for me to accept the manuscript.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose SANGRE, a new semantic segmentation architecture. SANGRE uses both the up-sampled and the original images to extract local and global information via two newly designed modules (SAFA and REG). SAFA uses element-wise multiplication and concatenation to capture additional local features, and REG uses upsampled images and low-resolution segmentations for better segmentation performance. The authors validate the method on two segmentation tasks, outperforming baseline methods from the literature and simpler architectures.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Accuracy: SANGRE’s main strength is that leveraging high-resolution images leads to more accurate segmentation without a significant increase in FLOPs.
    • Extendability: The newly designed modules can easily be plugged into a transformer structure with minor changes to the model architecture.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limited comparison to state-of-the-art: The authors compare to PVT-GCASCADE but not to MERIT-GCASCADE, which is reported in the same paper and performs better on the ACDC dataset. nnUNet is another benchmark that deserves to be compared.
    • Limited novelty: the idea of SAFA is similar to the g^n Conv proposed in HorNet, with some modifications.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Clarity: the paper is clear and easy to read
    • Need more validation on small objects: the segmentation improvement on the RV of the ACDC dataset may not be sufficient evidence of better small-object segmentation; extra validation might be needed to be convincing.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The results of the new architecture outperform the SOTA, but the architecture is not so novel and the

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision
    1. The authors have clarified the criteria by which they selected the SOTA methods, and I think their explanation is fair.



Review #3

  • Please describe the contribution of the paper
    1. This paper proposes a shallow attention feature aggregation (SAFA) module to enhance features’ local context and filter out redundant features.
    2. Feature interactions in a resolution expansion guidance (REG) module are introduced to leverage the wide contextual information from images at higher resolution, ensuring adequate exploitation of small-class features and leading to more accurate segmentation.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes a Shallow Attention Network Guided by Resolution Expansion (SAN-GRE). Its main strength is that it designs a method specifically targeting medical image segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Dataset selection: There are several open-source medical image segmentation datasets, such as the Synapse multi-organ segmentation dataset and the brain tumor segmentation dataset from the Decathlon challenge. It is recommended that the authors add experiments on these datasets.
    2. Some state-of-the-art methods, such as “Multi scale hierarchical vision transformer with cascaded attention coding for medical image segmentation”, were omitted from the experimental comparison.
    3. The code is not open.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    no

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. The performance comparison lacks state-of-the-art models, such as “Multi scale hierarchical vision transformer with cascaded attention coding for medical image segmentation”, among others.

    2. The paper lacks an analysis of suboptimal performance.
    3. Source code is not open and is not provided in the supporting materials.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. This paper proposes a Shallow Attention Network Guided by Resolution Expansion (SAN-GRE).
    2. This paper neglects the key state-of-the-art method.
    3. The experiments neglect important datasets.
    4. Source code is not open.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    After reading the rebuttal and the comments from the other reviewers, I decided to keep my score unchanged (Weak Accept).




Author Feedback

Reply to R1 - Anonymity requirement: We would like to clarify that we used the Lecture Notes in Computer Science template. To adhere to the space restriction, we did not make any changes to the default template’s title. The affiliations that appear on the manuscript are not ours; they are the placeholder affiliations included in the template.

Reply to R1 – Size of the model without SAFA: For the ablation study without the SAFA module, we upsampled the encoded features and concatenated them with the series of features from the higher-resolution image after the transposed convolutional layers. The concatenated features were then passed to a convolutional layer before interacting with the high-resolution image. The proposed network without SAFA has 25.868M parameters, while the full model has 25.874M parameters.
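
The ablation path described in this reply (upsample the encoded features, concatenate them with the high-resolution feature series, then apply a convolution) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: nearest-neighbour upsampling, a 1×1 convolution, the helper names, and all channel and spatial sizes are hypothetical choices. It also puts the reported size difference in perspective: the two totals differ by about 0.006M (roughly 6K) parameters, which is on the order of a single small convolution layer.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def conv1x1(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def conv_param_count(c_in, c_out, kernel):
    """Weights plus one bias per output channel."""
    return c_in * c_out * kernel * kernel + c_out


def fuse_without_safa(encoded, highres, weight, bias, factor):
    """Hypothetical no-SAFA path: upsample, concatenate on channels, convolve."""
    up = upsample_nearest(encoded, factor)
    fused = np.concatenate([up, highres], axis=0)
    return conv1x1(fused, weight, bias)


rng = np.random.default_rng(0)
encoded = rng.standard_normal((64, 14, 14))    # hypothetical deep encoder output
highres = rng.standard_normal((32, 56, 56))    # hypothetical high-resolution features
weight = rng.standard_normal((32, 96)) * 0.01  # 64 + 32 = 96 input channels
bias = np.zeros(32)

out = fuse_without_safa(encoded, highres, weight, bias, factor=4)
print(out.shape)                       # (32, 56, 56)
print(conv_param_count(96, 32, 1))     # 3104
print(round((25.874 - 25.868) * 1e6))  # 6000, the reported difference in parameters
```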

Reply to R1 and R4 – Novelty of SAFA: Thank you for raising this concern. We would like to clarify that the inspiration for SAFA stems from our observation that the first layer of the transformer encoder contains the most information about boundaries and texture. To effectively utilize this local information, we introduced the SAFA module, which multiplies features from shallow to deep layers, progressively refining the local context by leveraging features from the shallower layers of the encoder. The g^n Conv module proposed in HorNet has a recursive design with large-kernel gated convolutions and is used to achieve higher-order interactions, primarily aimed at capturing long-range interactions. In contrast, SAFA focuses on multi-scale feature fusion and on enhancing the local information of features. These differences in both methodology and purpose distinguish our approach from the g^n Conv module.
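
The shallow-to-deep multiplication described in this reply can be illustrated with a toy aggregation in NumPy. This is a hypothetical sketch of the general idea (features multiplied progressively from shallow to deep stages, with the intermediate results combined by concatenation, as Review #2 also summarizes), not the published SAFA module: it assumes the encoder features have already been resized to a common shape, and it omits any learned layers, normalization, or gating the real module may use. The function name `progressive_product_aggregate` is illustrative.

```python
import numpy as np


def progressive_product_aggregate(features):
    """Toy shallow-to-deep aggregation (hypothetical, not the published SAFA).

    features: list of (C, H, W) arrays ordered from the shallowest to the
    deepest encoder stage, assumed to be resized to a common shape already.
    """
    running = features[0]  # shallowest stage: most boundary/texture detail
    stages = [running]
    for deeper in features[1:]:
        # element-wise product: a position stays strong only if it is strong
        # both in the product accumulated so far and in the deeper stage
        running = running * deeper
        stages.append(running)
    # stack every refinement step along the channel axis
    return np.concatenate(stages, axis=0)


feats = [np.full((2, 3, 3), 2.0), np.full((2, 3, 3), 3.0), np.full((2, 3, 3), 0.5)]
out = progressive_product_aggregate(feats)
print(out.shape)     # (6, 3, 3)
print(out[:, 0, 0])  # [2. 2. 6. 6. 3. 3.]
```

Here the multiplicative interaction is only between encoder stages of different depth; there is no recursion over large-kernel gated convolutions, which is the mechanism g^n Conv relies on.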

Reply to R4 and R5 - Lack of validation and analysis: Thank you for your feedback regarding the need for additional validation (R4) and a thorough analysis of suboptimal performance (R5). We recognize the importance of these aspects in strengthening our research and will incorporate comprehensive validation and detailed performance analysis in our future work.

Reply to R4 and R5 - Comparison to other SOTA methods: In Table 2, the proposed model is compared with state-of-the-art (SOTA) models that have a similar number of parameters and FLOPs. We greatly appreciate the reviewers’ suggestions regarding the inclusion of MERIT-CASCADE, Parallel-CASCADE, and MERIT-GCASCADE as other SOTA methods on the ACDC dataset. However, these models have significantly more parameters and FLOPs than our proposed network. For example, Parallel-MERIT has 148M parameters and 33G FLOPs, achieving a Dice score of 92.32 on the ACDC dataset. Similarly, MERIT-GCASCADE has 132M parameters and 25G FLOPs, with a Dice score of 92.23 on the ACDC dataset. Because of these substantial differences in parameter count and FLOPs, we did not include their performance in the model comparison. Moreover, our proposed network achieves a Dice coefficient of 92.29, which is comparable to these SOTA methods, while using 82.52% and 80.40% fewer parameters than Parallel-MERIT and MERIT-GCASCADE, respectively. Our model also requires 83.39% and 78.08% fewer FLOPs than Parallel-MERIT and MERIT-GCASCADE, respectively.
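
The parameter reductions quoted in this reply can be checked directly. The short script below uses the full-model count of 25.874M parameters reported in the reply on the SAFA ablation, together with the Parallel-MERIT and MERIT-GCASCADE figures quoted here. SANGRE's own FLOPs are not stated in the rebuttal; the value back-calculated below is only what the quoted FLOP reductions imply, not a reported measurement. The helper names are illustrative.

```python
def percent_fewer(ours, other):
    """Relative reduction of `ours` with respect to `other`, in percent."""
    return 100.0 * (other - ours) / other


def implied_value(other, reduction_pct):
    """Value that is `reduction_pct` percent smaller than `other`."""
    return other * (1.0 - reduction_pct / 100.0)


ours_params_m = 25.874  # full SANGRE model, millions of parameters
for name, params_m in [("Parallel-MERIT", 148.0), ("MERIT-GCASCADE", 132.0)]:
    print(name, f"{percent_fewer(ours_params_m, params_m):.2f}% fewer parameters")

# SANGRE FLOPs implied by the quoted reductions (GFLOPs)
print(f"{implied_value(33.0, 83.39):.2f}", f"{implied_value(25.0, 78.08):.2f}")
```

The first two lines reproduce the 82.52% and 80.40% given in the reply, and both FLOP reductions point to the same implied figure of roughly 5.48G, so the quoted percentages are mutually consistent.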

Reply to R4 and R5 - Data access and data selection: The data used in the paper are publicly available. The cardiac dataset is the well-known ACDC dataset [20], which was used for a previous MICCAI challenge, while the speech dataset was recently published and is the only one to include both speech MRI and ground-truth segmentations [18,19]. These datasets were selected because: 1) our extended group has an interest in speech imaging; and 2) we wanted to validate on different types of dynamic MRI, and cardiac MRI segmentation is also clinically relevant. We appreciate R5’s recommendation of additional datasets and will explore them in our future work.

Reproducibility: We will make the code available in the final version.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    accepts

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    accepts


