Abstract

In this paper, we propose a new architecture, called Deform-Mamba, for MR image super-resolution. Unlike conventional CNN- or Transformer-based super-resolution approaches, which are limited by the local receptive field or a heavy computational cost, our approach aims to effectively explore both the local and global information of images. Specifically, we develop a Deform-Mamba encoder composed of two branches, a modulated deform block and a vision Mamba block. We also design a multi-view context module in the bottleneck layer to explore multi-view contextual content. Thanks to the features extracted by the encoder, which capture content-adaptive local information and efficient global information, the vision Mamba decoder finally generates high-quality MR images. Moreover, we introduce a contrastive edge loss to promote the reconstruction of edge- and contrast-related content. Quantitative and qualitative experimental results indicate that our approach achieves competitive performance on the IXI and fastMRI datasets.
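
For readers who want a concrete picture of the modulated deform block mentioned above, the following is a minimal PyTorch sketch of a DCNv2-style modulated deformable convolution branch. It is an illustrative approximation under assumed channel sizes and layer choices, not the authors' implementation (no code is released).

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class ModulatedDeformBlock(nn.Module):
        """Sketch of a DCNv2-style block: a plain convolution predicts
        per-pixel sampling offsets and modulation masks, which then steer
        a deformable convolution. Sizes here are illustrative only."""

        def __init__(self, channels: int, kernel_size: int = 3):
            super().__init__()
            self.k = kernel_size
            # 2 offsets (x, y) plus 1 modulation scalar per kernel location
            self.offset_mask = nn.Conv2d(channels, 3 * self.k * self.k,
                                         self.k, padding=self.k // 2)
            self.deform_conv = DeformConv2d(channels, channels, self.k,
                                            padding=self.k // 2)
            self.act = nn.GELU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            om = self.offset_mask(x)
            split = 2 * self.k * self.k
            offset, mask = om[:, :split], torch.sigmoid(om[:, split:])
            return self.act(self.deform_conv(x, offset, mask))

    # Example: a 32-channel feature map of size 64x64 keeps its spatial size
    feats = ModulatedDeformBlock(channels=32)(torch.randn(1, 32, 64, 64))
    print(feats.shape)  # torch.Size([1, 32, 64, 64])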

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3762_paper.pdf

SharedIt Link: https://rdcu.be/dV5C4

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72104-5_24

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3762_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Ji_DeformMamba_MICCAI2024,
        author = { Ji, Zexin and Zou, Beiji and Kui, Xiaoyan and Vera, Pierre and Ruan, Su},
        title = { { Deform-Mamba Network for MRI Super-Resolution } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        pages = {242--252}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper develops a Deform-Mamba network for MRI super-resolution. The network is composed of a modulated deform block and a vision Mamba block. Besides, a multi-view context module and the contrastive edge loss are also proposed to improve the reconstruction results. Experimental results indicate that the proposed method achieves competitive performance on IXI and fastMRI datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The vision Mamba block is introduced to MRI SR for the first time. The vision Mamba model has the advantage of modeling long-range dependencies effectively with linear complexity, and this work is an early attempt to apply it to MRI SR.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The overall performance of the proposed model is not competitive with existing SOTA methods. The performance on the fastMRI dataset seems good, but it is worse on the IXI dataset; for the IXI x2 result, the PSNR is 1.01 dB lower than HAT. The authors explain that “Our method using Mamba with linear computational complexity takes less time to train as fewer parameters used in our architecture.” However, they do not report the running time or computational complexity of their model compared with HAT, so the current results do not prove that this is a good approach for MRI SR. (2) As shown in Table 1, the proposed MVC module and CELoss provide little performance improvement.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) I would like to know why the model performs well on fastMRI but worse on IXI; this should be discussed. (2) A comparison of the running time or computational complexity of the proposed model with existing SOTA methods is necessary.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is technically plausible, but the experiments are insufficient. The overall performance is not competitive with existing SOTA methods. The main advantage of Mamba is its high efficiency, but the authors do not demonstrate this in their experiments.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors’ response has addressed some of my concerns. However, the performance of the model on the IXI dataset is still not competitive. My final decision is weak accept.



Review #2

  • Please describe the contribution of the paper

    The paper introduces the Deform-Mamba Network, an innovative architecture for enhancing the resolution of MRI images. The novelty lies in combining deformable convolution capabilities with the Mamba model to capture both local and global features effectively. The introduction of a multi-view context module and a contrastive edge loss further contributes to its distinctiveness by focusing on edge and contrast enhancements in the reconstructed images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The integration of modulated deform blocks and vision Mamba blocks is a unique approach that harnesses both local adaptiveness and global information extraction, setting it apart from traditional CNN and Transformer-based methods.
    2. The paper successfully leverages multi-view contextual information at the bottleneck layer, enhancing the model’s capability to interpret complex MRI data.
    3. The experimental setup, including tests on IXI and fastMRI datasets, is robust, demonstrating the model’s effectiveness through competitive performance metrics compared to existing methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The discussion on how the proposed method distinctly outperforms existing Transformer-based models needs more depth. Specific scenarios where Deform-Mamba excels could be highlighted more explicitly.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Consider expanding on the clinical implications of your findings, possibly including feedback from radiologists or incorporating a pilot clinical study.
    2. Enhance the comparative analysis by discussing specific cases or data types where Deform-Mamba provides clear advantages over other high-performance models.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The decision stems from the paper’s innovative methodological contributions and strong experimental validation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    A Deform-Mamba encoder has been developed, consisting of two distinct branches: the modulated deform block and the vision Mamba block. In addition, a multi-view context module is integrated into the bottleneck layer to effectively analyze multi-view contextual content. Leveraging the capabilities of this encoder, which adaptively processes local content and efficiently manages global information, the vision Mamba decoder is able to generate high-quality MR images. Furthermore, a contrastive edge loss has been introduced to enhance the reconstruction of edge-related and contrast-specific content.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The encoder incorporates a modulated deform block and a vision Mamba block, enhancing its ability to handle complex imaging data effectively.
    2) A multi-view context module, positioned within the bottleneck layer, facilitates the exploration and integration of multi-view contextual content, enriching the encoder’s perceptual field.
    3) Contrastive edge loss: this loss function specifically aims to improve the reconstruction of edges and contrast, which is crucial for achieving higher fidelity in MR images.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This study introduces the Mamba model to MR image reconstruction, which represents an interesting advancement. However, there are areas of concern that require further elaboration to solidify the contributions of this work. Firstly, regarding the contrastive edge loss, although the authors assert that this loss enhances edge texture and contrast in MR images, the analysis provided (particularly of E_i) is insufficient to fully demonstrate its effectiveness; more detailed explanations and supportive data are necessary to validate this claim. Secondly, the experimental validation appears limited, with a scant presentation of quantitative results, especially at the x4 upsampling factor. Expanding the range of experiments to include various upsampling factors would provide a more comprehensive assessment of the model’s performance and better demonstrate the effectiveness of the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The details are as mentioned above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work introduces the Mamba model to the field of MR imaging, marking a significant step in evaluating the model’s validity within the community. Additionally, the integration of multi-branching and global perception modules in the proposed model enhances its ability to capture global information, potentially improving the overall effectiveness of MR image reconstruction. These features contribute to the model’s robustness and its capacity to handle complex imaging tasks.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the reviewers’ feedback and have addressed their comments to improve our paper. As the reviewers pointed out, our Deform-Mamba network is an innovative architecture that introduces the vision Mamba model to SR for the first time. We hope our revisions address their concerns and kindly ask for a re-evaluation of our score.

R#1: Thank you for your positive comments. We will take your suggestion on explaining how the proposed method outperforms Transformer-based models into account in the final version. Compared to Transformer-based methods, our approach is suitable for both small and large datasets. Transformers usually require large datasets and risk overfitting on smaller ones; our method shares parameters between time steps, so it can learn from less data while preventing overfitting. For large datasets, Transformers need substantial computational power and struggle with ultra-large datasets, whereas our method is more efficient and requires fewer computational resources. Regarding your comment about extending the clinical implications of our results, we plan to test our method on more clinical datasets with other imaging modalities, such as PET and CT. We have close collaboration with radiologists and plan to conduct studies with our hospital partners, some of whom are authors of this article. We add this point to the conclusion.

R#3: 1) Thank you for your positive comments. As you point out, the vision Mamba model has the advantage of modeling long-range dependencies effectively with linear complexity; we apologize for not highlighting this advantage in the paper. Mamba’s hardware-aware algorithm processes data linearly with sequence length, significantly boosting computational speed. The self-attention in Transformers has a computational complexity of O(n²d), while Mamba’s complexity is O(nd), where n is the sequence length and d is the token embedding dimensionality. It has been shown in [9] that Mamba achieves superior performance compared to Transformers. In our experiments, we compared our approach to the Transformer-based HAT model in terms of computational complexity and running time: HAT’s Multi-Adds cost for a 64x64 input is about 5x higher than ours, and our method is 2x faster in training, showing superior efficiency. We will add this information to the final version. 2) Concerning the proposed MVC module and CELoss, the performance of the pure vision Mamba network already surpasses most methods. The MVC module at the bottleneck layer enhances the model’s capability to interpret complex MRI data, while CELoss enhances the reconstruction of edge-related and contrast-specific content (Fig. 2). Although each module contributes a little, the combination of the different modules results in competitive performance. 3) Concerning the two MRI datasets, the IXI dataset is used for the brain and fastMRI for the knee. Knee images feature simple textures and fairly clear contours, with evenly distributed data. Brain images are more complex, with indistinct white- and gray-matter contours both physically and physiologically, occupying about 2/3 of the image center; the rest is a noise-filled black background. The large difference in the average values of these two zones slightly disrupts Mamba’s selective scan mechanism. We will add this discussion to the final version.
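
As a rough, purely illustrative reading of the O(n²d) versus O(nd) claim above, the short Python sketch below compares the dominant cost terms for a hypothetical 64x64 token grid and an assumed embedding width; the numbers are not measurements from the paper.

    # Back-of-envelope comparison of the dominant cost terms quoted above:
    # self-attention ~ O(n^2 * d) vs. a Mamba-style linear scan ~ O(n * d).
    # n and d below are assumed values chosen only for illustration.
    n = 64 * 64   # tokens if a 64x64 feature map is flattened into a sequence
    d = 64        # hypothetical token embedding dimensionality

    attention_ops = n ** 2 * d   # quadratic in the sequence length
    scan_ops = n * d             # linear in the sequence length

    print(f"self-attention ~ {attention_ops:,} ops")           # 1,073,741,824
    print(f"linear scan    ~ {scan_ops:,} ops")                # 262,144
    print(f"ratio          ~ {attention_ops // scan_ops:,}x")  # 4,096 (= n)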

R#4: Thank you for your positive comments. We clarify your two concerns point by point. First, regarding the CELoss, we designed three convolution kernels to extract features: E1 enhances horizontal and vertical edges via neighborhood differences, E2 targets diagonal edges through diagonal differential calculations, and E3 boosts local contrast by comparing the central pixel with all of its neighbors. In this way, the CELoss enhances edge texture and contrast in MR images; more details are added in the final version. Second, regarding the experimental validation, as we cannot perform further experiments, we invite you to view the supplementary material of the first submission, in which quantitative results for 2x and 4x upsampling are provided.
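
For readers who want a concrete picture of the kernel-based loss described above, here is a minimal PyTorch sketch under stated assumptions: the 3x3 kernel weights, the L1 form of the loss, and single-channel inputs are illustrative guesses consistent with the description, not the authors’ exact E1–E3 definitions.

    import torch
    import torch.nn.functional as F

    # Illustrative fixed 3x3 kernels in the spirit of the description above;
    # the actual E1-E3 weights used by the authors may differ.
    E1 = torch.tensor([[ 0., -1.,  0.],   # horizontal/vertical neighborhood differences
                       [-1.,  4., -1.],
                       [ 0., -1.,  0.]])
    E2 = torch.tensor([[-1.,  0., -1.],   # diagonal differential calculations
                       [ 0.,  4.,  0.],
                       [-1.,  0., -1.]])
    E3 = torch.tensor([[-1., -1., -1.],   # center pixel vs. all eight neighbors (local contrast)
                       [-1.,  8., -1.],
                       [-1., -1., -1.]])
    KERNELS = torch.stack([E1, E2, E3]).unsqueeze(1)  # shape (3, 1, 3, 3)

    def contrastive_edge_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
        """L1 distance between edge/contrast responses of the super-resolved
        image and the ground truth (single-channel inputs assumed)."""
        sr_edges = F.conv2d(sr, KERNELS.to(sr), padding=1)
        hr_edges = F.conv2d(hr, KERNELS.to(hr), padding=1)
        return F.l1_loss(sr_edges, hr_edges)

    # Example with random single-channel 64x64 "images"
    loss = contrastive_edge_loss(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64))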




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper develops Deform-Mamba for MRI super-resolution. The three reviewers agree to (weakly) accept this paper, and I agree to accept it too. However, I am not sure whether the proposed Mamba-based method indeed outperforms Transformer-based or CNN-based SOTA methods; it may take time to draw a conclusion.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


