Abstract

Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly resorted to coarse-grained spatial features and separate temporal dependency learning, and struggle with fine-grained spatio-temporal learning. Mining spatio-temporal information in fine-grained scales is extremely challenging due to learning difficulties in long-range dependencies. In this context, we propose a novel method to exploit the long-range dependency management capabilities of the state space model (SSM) to address the above challenge. Our contribution is three-fold. First, we propose ReMamba, which mines multi-scale spatio-temporal information by devising a multi-directional SSM. Second, we propose an adaptive fusion strategy that introduces multiple inertial measurement units as auxiliary temporal information to enhance spatio-temporal perception. Last, we design an online alignment strategy that encodes the temporal information as pseudo labels for multi-modal alignment to further improve reconstruction performance. Extensive experimental validations on two large-scale datasets show remarkable improvement from our method over competitors.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2611_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yan_Finegrained_MICCAI2024,
        author = { Yan, Zhongnuo and Yang, Xin and Luo, Mingyuan and Chen, Jiongquan and Chen, Rusi and Liu, Lian and Ni, Dong},
        title = { { Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a novel method for freehand 3D US reconstruction that uses an SSM to capture long-term dependencies and a fusion strategy to make the best use of the information from multiple IMUs. Finally, online learning is used to improve generalisation performance on unseen data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper uses an SSM module to extract long-term dependencies, which has not been done in this field before. The fusion strategy is also a novel idea for incorporating information from multiple IMUs with varying noise levels. The experimental section is extensive, including an ablation study and a comparison with a number of SOTA methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Although the paper gives a detailed description, it would be better to introduce notation when formulating the method, so that the workflow of the pipeline is shown clearly. Some techniques lack description, which makes the paper hard to follow.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The definition of FiMA should be given when it is first introduced.
    2. What are coarse-grained and fine-grained? Can these two be quantified? Why are the results of this paper referred to as fine-grained while other works are referred to as coarse-grained?
    3. When acquiring data, is the position of the IMUs the same as that in Fig. 1?
    4. To my knowledge, some public datasets are available in this US reconstruction field and could provide a benchmark for evaluation. Why not use them?
    5. In Sec. 2.1, formulating the method using notation, together with equations, could help the reader better understand the workflow of the method.
    6. In Fig. 3, what do the “forward direction” and “backward direction” mean?
    7. In Sec. 2.2, the authors claim that the fusion model can help find the correct temporal information. Given the lack of “underlying true” acceleration/angle information, how can the effectiveness of the proposed fusion model be demonstrated? I am also interested in what the acceleration/angle looks like after correction by the fusion model.
    8. Sometimes the loss based on translation and rotation is somewhat unstable. I suggest adding/using the transformed points distance as one of the loss functions, as recommended in many pose regression problems.
    9. Statistical testing is missing.
    10. The authors propose several novel network structures, such as ReMamba and the fusion model. However, the motivation behind them is missing. Why build the network this way rather than another?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper could be clearer if notation were used well. Some techniques and terms are not well defined, such as forward and backward direction, and coarse-grained and fine-grained. The motivation behind the proposed method is missing, and the network structure is not well motivated either.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Although the authors address most of the concerns, this paper is not structured well enough to be published, and substantial changes to the paper are not permitted.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a deep learning-based method called FiMA for freehand 3D ultrasound reconstruction. It combines the temporal information of the ultrasound sequence with fine-grained spatio-temporal information mined with the help of multiple IMUs to improve the alignment and the reconstructed US volume. To this end, a State Space Model approach is proposed to mine multi-scale spatio-temporal information, integrated with an adaptive fusion strategy to combine the IMUs’ information, and finally a new online alignment strategy that uses information from the IMUs as pseudo-labels is presented. The results demonstrate remarkable improvements over existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    One of the paper’s significant strengths lies in the introduction of FiMA as a pioneering approach for freehand 3D ultrasound reconstruction. FiMA’s innovation stems from its integration of multiple IMUs and its utilization of the SSM approach. By leveraging these advanced techniques, FiMA effectively addresses the longstanding challenge of accurately reconstructing ultrasound volumes. Notably, the results presented in the paper convincingly demonstrate the tangible improvement FiMA brings to freehand ultrasound reconstruction.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses of the paper are:

    1. Unclear clinical relevance: The paper lacks clear motivation regarding the clinical relevance of the developed method, specifically in terms of how freehand ultrasound reconstruction can be utilized in clinical environments. Providing concrete examples or discussing potential clinical applications would strengthen the paper’s impact and relevance to practitioners.
    2. Incomplete methodology description and discussion: The methodology section lacks important information, such as details on the training process, dataset creation, acquisition parameters (e.g., number and positioning of IMUs, ultrasound framerate), and potential sources of error in temporal alignment between ultrasound and IMU data. Additionally, there is a lack of discussion and analysis of these methodological aspects and their impact on results.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I would like to extend my congratulations to the authors for their impressive work and significant contribution to the field of freehand ultrasound reconstruction. The paper presents an innovative and promising approach in freehand ultrasound reconstruction, introducing a novel method with promising results. However, there are areas where methodological rigor could be improved. Providing more detailed information on the training process, dataset creation, and acquisition parameters would enhance the reproducibility and transparency of your study. Additionally, clarifying the clinical relevance of your method and discussing potential real-world applications would strengthen the conceptual framework and make it more impactful for clinicians and researchers.

    Regarding the document, I have some suggestions:

    The paper mentions the use of multiple IMUs, but it is important to know how many were used and whether their positioning was studied. Furthermore, clarification is needed on how temporal alignment between ultrasound frames and IMU data was achieved, as well as the error associated with this alignment. When referring to constructing datasets from another source, more information should be provided about the datasets not described in your paper, including the number of data points and any differences from those used in your study. In Figure 5, it would be beneficial to include the ground-truth segmentation and provide information about how the segmentation was performed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an innovative method for freehand 3D ultrasound reconstruction, highlighting its potential to advance the field, but it would benefit from enhanced methodological clarity, stronger emphasis on clinical applicability, and discussion of the results to maximize its impact.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes FiMA, which utilizes a state-space model to learn long-range dependencies in the 3D reconstruction of US images. The model combines spatio-temporal multi-modal information through adaptive fusion modules, which are then processed by state space models. The method was quantitatively and qualitatively validated on arm and carotid datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-written, and the ideas are clearly presented. The topic of freehand 3D US reconstruction is important and timely. The idea of applying a state-space model to capture long-range dependencies for the challenging task of 3D reconstruction from US images is novel and appears effective. Experiments are extensively conducted, and the results show significant improvement over prior methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors should conduct a more extensive survey of the existing literature. The number of cited papers is only 11 (with recurring first authors). If the authors feel the list is sufficient, e.g., there are indeed only a few related works because this field is under-explored, please provide an explanation.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Please refer to weaknesses for concerns
    • What does FiMA stand for?
    • In Eq. (2), two different losses (L1 and Pearson) are combined. Shouldn’t there be some hyperparameter coefficient \lambda on one of the functions to balance the losses?
    • I believe each citation should list all authors, not just an abbreviated list using et al.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is novel and clearly presented. The paper is well-written with strong experimental results.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all the reviewers (R) for reviewing and recognizing our work. Motivation and details of our proposed method are clarified thoroughly, and the writing will be improved.

Q: FiMA (R1, R3) A: FiMA stands for Fine-grained Context and Multi-modal Alignment.

Q: Fine-grained and coarse-grained (R3) A: Fine-grained features represent shallow, high-resolution features. Coarse-grained features represent deep, low-resolution features.

FiMA simultaneously captures spatio-temporal information from multi-scale features, rather than capturing fine-grained spatial information and coarse-grained temporal information separately.
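To make the scale distinction concrete, the following is a small arithmetic sketch of how many spatio-temporal tokens a sequence produces at different feature resolutions. The feature-map sizes and the one-token-per-location assumption are hypothetical and are not taken from the paper; only the 30 fps frame rate appears in this feedback.

```python
# Hypothetical sizes: none of the feature-map resolutions below come from the paper.
def token_count(frames, height, width):
    """Tokens produced when every feature-map location in every frame is one token."""
    return frames * height * width


frames = 30  # one second of scanning at the 30 fps stated in this feedback

# Assumed (height, width) of the feature map at three encoder stages,
# ordered from shallow/high-resolution (fine) to deep/low-resolution (coarse).
scales = {"fine": (64, 64), "middle": (16, 16), "coarse": (4, 4)}

counts = {name: token_count(frames, h, w) for name, (h, w) in scales.items()}
for name, n in counts.items():
    print(f"{name:>6}: {n} tokens")
```

Under these assumed sizes the fine scale yields 256 times more tokens than the coarse scale, which is the kind of sequence-length growth that makes long-range dependency modelling at fine scales harder.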

Q: Motivation (R3) A: FiMA’s motivation is to improve the reconstruction performance through multi-scale spatio-temporal learning and temporal information mining of multiple IMUs. We propose ReMamba, Adaptive Fusion Strategy, and Online Alignment Strategy to address the challenges involved.

1) ReMamba addresses the challenge of long-range dependency from the large number of patches during spatio-temporal information mining. It addresses this challenge with the long-range dependency management capability of the state space model (SSM).

2) Adaptive Fusion Strategy addresses the problem of IMU noise impact in mining temporal information from multiple IMUs. It fuses multi-IMUs with adaptive weights to extract appropriate information through mining the correlations between images and IMUs.

3) Online Alignment Strategy further mines the information of IMUs for unseen data. It uses IMU information as pseudo-labels to capture appropriate temporal features and improves the reconstruction performance on unseen data.
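The adaptive-weight idea in item 2) can be illustrated with a minimal sketch. This is not the authors' fusion module: it assumes each IMU has already been projected to a feature vector of the same size as the image feature, and it swaps in a plain scaled dot-product score followed by a softmax as the "correlation" measure. The function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_fuse(image_feat, imu_feats):
    """Weight each IMU feature by its similarity to the image feature.

    Assumption: similarity is a scaled dot product, which stands in for
    whatever correlation measure the real module learns.
    """
    scale = math.sqrt(len(image_feat))
    scores = [dot(image_feat, f) / scale for f in imu_feats]
    weights = softmax(scores)
    fused = [
        sum(w * f[i] for w, f in zip(weights, imu_feats))
        for i in range(len(image_feat))
    ]
    return fused, weights


# Toy data: four IMU features (the feedback states 4 IMUs); the third is made noisy.
image_feat = [1.0, 0.0, 1.0, 0.0]
imu_feats = [
    [0.9, 0.1, 1.1, 0.0],
    [1.0, 0.0, 0.9, 0.1],
    [-1.0, 0.5, -0.8, 0.3],  # disagrees with the image feature
    [0.8, 0.0, 1.0, 0.0],
]
fused, weights = adaptive_fuse(image_feat, imu_feats)
print([round(w, 3) for w in weights])
```

In this toy setting the IMU whose feature disagrees with the image feature receives the smallest weight, so it contributes least to the fused vector. A learned module would replace the fixed dot product with trainable projections.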

Q: Details of forward / backward direction (Section 2.1) (R1, R3) A: Forward direction represents the direction along the length of the input features with shape (batch, length, channel). Backward direction represents the reverse direction along the sequence length.
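The two scan directions described above can be sketched with a toy recurrence. This is an illustration only: it uses a fixed-parameter linear state update rather than Mamba's input-dependent selective scan, and the parameters `a`, `b`, `c` and the sum-based merge of the two directions are assumptions, not details from the paper.

```python
def scan(seq, a=0.5, b=1.0, c=1.0):
    """Run h_t = a*h_{t-1} + b*x_t, y_t = c*h_t along the length axis.

    seq has shape (length, channel) as a list of per-step channel lists.
    """
    h = [0.0] * len(seq[0])
    out = []
    for x in seq:
        h = [a * hi + b * xi for hi, xi in zip(h, x)]
        out.append([c * hi for hi in h])
    return out


def bidirectional_scan(seq):
    """Combine a forward scan with a scan over the reversed sequence."""
    fwd = scan(seq)
    # Backward direction: reverse along length, scan, then restore the order.
    bwd = scan(seq[::-1])[::-1]
    return [[f + g for f, g in zip(fr, br)] for fr, br in zip(fwd, bwd)]


# Input with shape (batch=1, length=3, channel=1): an impulse at the middle step.
batch = [[[0.0], [1.0], [0.0]]]
forward_only = [scan(seq) for seq in batch]
outputs = [bidirectional_scan(seq) for seq in batch]
print(forward_only[0])  # the impulse only reaches later positions
print(outputs[0])       # the impulse reaches positions on both sides
```

With the forward scan alone, the first position never sees the impulse. Adding the backward scan lets every position receive context from both earlier and later steps.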

Q: Adaptive Fusion Strategy (Section 2.2) (R3) Q1: Effectiveness A1: Adaptive Fusion Strategy significantly improves on all metrics compared to direct fusion (Table 1).

Q2: Details A2: The fusion module does not directly correct the acceleration/angle. Instead, it projects the acceleration/angle of multi-IMUs into high-dimensional features and explores their correlations with the image features, thereby mining additional temporal information to improve reconstruction performance (Section 2.2 and Figure 4).

Q: Loss Function (R1, R3) A: We refer to previous work [Ref1], which did not use a balancing coefficient. The loss based on translation and rotation reported better results than the Transformed Points Distance loss [Ref1].
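For readers following the loss discussion, here is a minimal sketch of combining an L1 term with a Pearson-correlation term. The exact form of Eq. (2) is not reproduced on this page, so the `1 - r` correlation term, the mean reduction, and the zero-variance guard are assumptions. The `lam` argument is the hypothetical balancing coefficient raised in the review; `lam=1.0` corresponds to the unweighted sum the authors describe.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def pearson(pred, target):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(pred)
    mp, mt = sum(pred) / n, sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    if sp == 0.0 or st == 0.0:
        return 0.0  # assumed convention for the undefined case
    return cov / (sp * st)


def combined_loss(pred, target, lam=1.0):
    """L1 plus lam * (1 - Pearson); lam is hypothetical, 1.0 means no re-weighting."""
    return l1_loss(pred, target) + lam * (1.0 - pearson(pred, target))


# Toy example: predictions follow the trend exactly but carry a constant offset.
target = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 3.5, 4.5]
loss = combined_loss(pred, target)
print(round(loss, 6))
```

In this toy case the correlation term vanishes because the trend is matched, and only the L1 term penalises the constant offset, which shows why the two terms carry complementary information.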

Q: Dataset (R3, R4) Q1: Details of the dataset A1: We use 4 IMUs as shown in Figure 1. The acquisition frequency is 30 fps, and for each frame we simultaneously acquire the ultrasound image and the measurements of the 4 IMUs.

Q2: Public data A2: The reason we did not use public data is that, to the best of our knowledge, no public data currently available contains information of additional IMUs.

Q: Statistical analysis (R3) A: FiMA significantly outperforms the compared methods in almost all metrics for both arm and carotid datasets (t-test, p<0.05). The standard deviations have been provided in Table 1.
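As an illustration of the kind of significance check mentioned above, the sketch below computes a paired t statistic with the standard library. The feedback does not say which t-test variant was used, so the paired form is an assumption, and the error values are synthetic numbers invented for this example, not results from the paper.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Synthetic per-scan errors for two methods on the same five scans.
baseline_error = [10.2, 11.5, 9.8, 12.1, 10.9]
proposed_error = [8.1, 9.0, 8.3, 9.6, 8.8]

t_stat, dof = paired_t_statistic(baseline_error, proposed_error)

# Two-sided critical value of Student's t for alpha = 0.05 and 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
print(f"t = {t_stat:.2f}, dof = {dof}, significant at 0.05: {significant}")
```

Comparing against a tabulated critical value avoids needing a t-distribution CDF. With a statistics package available, the same test would normally report the p-value directly.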

Q: Clinical significance (R4) A: Freehand 3D ultrasound allows acquiring 3D volumes of vessels, plaques and other anatomical structures where the use of a 3D probe is restricted. Our method requires only ultrasound images and affordable IMUs for high-precision reconstruction, and in the future, the IMUs can even be integrated inside the probe to minimize the impact on the sonographer.

Q: Segmentation mask of vessels (R4) A: We manually annotated the segmentation masks of vessels for the qualitative results.

[Ref1] Luo, M., Yang, X., et al. “Recon: Online learning for sensorless freehand 3d ultrasound reconstruction.” Medical Image Analysis (2023)




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Rebuttal addressed most of the concerns. Paper can be accepted.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Rebuttal addressed most of the concerns. Paper can be accepted.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


