Abstract

This study challenges the validity of retrospective undersampling in MRI data science through analysis with an MRI physics simulation. We demonstrate that retrospective undersampling, a method often used to create training data for reconstruction models, can yield MRI signals that inherently differ from their prospectively acquired counterparts. This arises from the sequential nature of MRI acquisition: undersampling effectively alters the MR sequence and, in a non-linear fashion, the magnetization dynamics, and removing k-space lines after acquisition does not reproduce this change. We show that, even for common sequences, this effect can make learning-based reconstructions unreliable. Our simulation provides (i) a tool for generating accurate prospectively undersampled datasets, both for analyzing such effects and for augmenting MRI training data, and (ii) a differentiable reconstruction operator that models undersampling correctly. These insights are crucial for the development and evaluation of AI-driven acceleration of diagnostic MRI.
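The mechanism described above can be illustrated with a deliberately simplified model: an idealized spoiled inversion-recovery FLASH train in which each RF pulse reads out the current longitudinal magnetization. This is a sketch under stated assumptions, not the authors' extended-phase-graph simulation; the helper name `ir_flash_train` and the values T1 = 1 s, TR = 5 ms, TI = 100 ms, 64 lines and 4x acceleration are all hypothetical choices for illustration.

```python
import numpy as np


def ir_flash_train(n_pulses, flip_deg, tr, t1, ti, m0=1.0):
    """Signal of each excitation in an idealized inversion-recovery FLASH train.

    Simplifying assumptions (not the authors' EPG-based simulation):
    perfect inversion and spoiling, no T2* decay, and a fully relaxed
    magnetization m0 before the inversion pulse. Times in seconds.
    """
    alpha = np.deg2rad(flip_deg)
    e1 = np.exp(-tr / t1)
    # Inversion (-m0) followed by free T1 recovery during TI.
    mz = m0 - 2.0 * m0 * np.exp(-ti / t1)
    signal = np.empty(n_pulses)
    for k in range(n_pulses):
        signal[k] = mz * np.sin(alpha)   # transverse signal of the k-th excitation
        mz = mz * np.cos(alpha)          # longitudinal magnetization left after the RF pulse
        mz = m0 - (m0 - mz) * e1         # T1 recovery during one TR
    return signal


# Hypothetical parameters: 64 lines, 4x acceleration, FA 10 deg, TR 5 ms, T1 1 s, TI 100 ms.
n_lines, accel = 64, 4
full = ir_flash_train(n_lines, 10.0, 0.005, 1.0, 0.1)
sampled = np.arange(0, n_lines, accel)   # acquired line indices 0, 4, 8, ...

# Retrospective: keep every 4th signal of the fully sampled train.
retro = full[sampled]
# Prospective: only len(sampled) excitations are actually played out.
prosp = ir_flash_train(len(sampled), 10.0, 0.005, 1.0, 0.1)

print(np.abs(retro - prosp).max())
```

Under these assumptions only the first acquired line agrees. Every later prospective line has experienced fewer excitations and less recovery time since the inversion, so the retrospectively selected data carry a different contrast weighting than a real accelerated scan.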

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3200_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://git5.cs.fau.de/rajput/death-by-retrospective-undersampling

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Raj_Death_MICCAI2024,
        author = { Rajput, Junaid R. and Weinmueller, Simon and Endres, Jonathan and Dawood, Peter and Knoll, Florian and Maier, Andreas and Zaiss, Moritz},
        title = { { Death by Retrospective Undersampling - Caveats and Solutions for Learning-Based MRI Reconstructions } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors suggest that using retrospectively undersampled data for model training could be biased, as magnetization dynamics can differ between fully sampled and undersampled acquisitions. They demonstrated its impact on the efficacy of learning-based image reconstruction methods and proposed using a Bloch model to solve this problem.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    No comments.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The manuscript is NOT well written; many details are missing. The results cannot support the claims in the manuscript. For more specific critiques, please see the “detailed and constructive comments”.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. Are there any publications addressing similar problems (mismatch of magnetization dynamics between training and real data)? In my understanding, this is not a real problem in reconstructing accelerated MR data. It can be circumvented by properly designing the sequence for acquiring the fully-/under- sampled data.
    2. What is T_rec in Fig. 1? Is it T1 or TR (repetition time)?
    3. Details of the simulation are missing. E.g., what are the sampling trajectories for data acquisition? What are the parameters for the simulation (T1/T2/T2*/PD/…, TR/TI/FA/…)?
    4. The same slice was used for all the images; what about other slices?
    5. Can the proposed method be used for acquired data rather than simulated data?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript is NOT well written, and the results cannot support the major claims.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper examines the limitations of retrospective undersampling in MRI data preparation for learning-based reconstruction models. It introduces a differentiable MRI simulation within the PyTorch framework to accurately simulate prospective undersampling scenarios. This approach allows for the identification and correction of mismatches in training data. This offers a viable solution to enhance the reliability and accuracy of accelerated MRI methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The paper introduces a differentiable MRI simulation, which enables accurate modeling of MRI physics for training data preparation in learning-based reconstruction methods.
    • By embedding the simulation in a machine learning environment, the study facilitates the application of complex MRI simulations in learning-based reconstructions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The study assumes homogeneity in B0 and B1 fields during the simulation of MRI data, which may not accurately reflect real-world scenarios where field inhomogeneities are common and can significantly affect image quality. This limitation could reduce the generalizability of the findings to clinical settings.
    • There is no detailed comparison with other existing methods or technologies.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The submission provided all the source codes.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    No comments.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1. The paper introduces a novel and technically sound method to address a significant issue in MRI reconstruction.
    2. The methods are well-explained. Given these considerations, the paper warrants acceptance due to its contributions but also suggests areas for improvement.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have effectively addressed all of the major and minor concerns raised during the initial review, which supports the paper’s acceptance.



Review #3

  • Please describe the contribution of the paper

    This study evaluates the efficacy of deep learning models designed for accelerated MRI reconstruction when trained on retrospectively undersampled MRI data versus their performance on prospectively collected data. The authors utilize a simulation framework to generate prospective acquisitions for their analysis. By comparing the magnetization measurements of simulated acquisitions with those assumed by an invariant IR-FLASH sequence under undersampling, they investigate the generalizability of a Variational Network model trained on both prospective and retrospective undersamplings, validating the model’s performance across both data types. The findings highlight that models trained on retrospectively undersampled data exhibit poor generalization to prospective data, whereas models trained on prospectively undersampled data demonstrate more robust generalizability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper’s most notable contribution is its novel comparative analysis between prospective and retrospective undersampling in MRI reconstruction, a topic scarcely addressed in existing literature. Given the prevalent reliance on retrospectively undersampled data for training deep learning models in MRI reconstruction, this research offers valuable insights into their generalization capabilities on prospectively acquired data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The absence of references for the simulation software (MR-XXXX) and a lack of detailed methodology hinder the transparency and reproducibility of the simulations, despite the provision of code with the submission.
    • The paper fails to clearly define the experimental setup, contributing to ambiguities in the methodology.
    • Conclusions are predominantly based on the reconstruction of two to three single slices (Figures 2 to 4), without incorporating quantitative results to bolster the findings. A table could have been included providing these results along with significance testing.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The provided code within the submission’s zip file lacks explicit documentation or mention within the paper, obscuring its purpose and the reproducibility of the experiments. Furthermore, the minimal detail on the implementation of simulation experiments and the absence of software references significantly challenge the assessment of the study’s comparative analysis validity.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Minor weaknesses:
    • The omission of the Bloch equation’s definition (equation 1) is a missed opportunity for clarity, especially since space constraints are not an issue.
    • Typos:
      o Page 3: “… is also known as an…”
      o Page 5: space needed before [7, 8]: “…undersampling of existing data [7, 8]…”
    • The figures throughout the paper are of suboptimal quality.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite concerns regarding the lack of detailed implementation and experimental clarity, the novel insights into the differences between prospective and retrospective data training in MRI reconstruction lead me to a tentative weak acceptance. It is hoped that this paper will spur further research in this critical area.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Taking into account all reviews and the authors’ rebuttal, as well as the authors’ commitment to add quantitative results to the main paper if accepted, I am inclined to weakly accept the paper, although there is a lack of clarity in the experimental framework and procedure.

    This reviewer suggests that a single paper combining the improved implementation as well as the results of this submission would be more appropriate, in case of rejection.




Author Feedback

We would like to thank all reviewers (R1, R4, R5) for taking the time to review our contribution and for their constructive feedback. The reviewers acknowledged the novelty of our work. In the following, we take the opportunity to clarify the points they raised:

Q1: Existing literature, comparison with other work, and necessity of the study. (R1, R4)
A1: R5 and R4 emphasized that the problem we outline, the mismatch in magnetization dynamics between retrospectively and prospectively undersampled data (cf. Fig. 1), is a crucial point that has received little attention in the literature (R5), although a study by Shimron et al. shows bias in learning-based reconstruction from public off-label data [Ref1]. We agree that suitable training data can be obtained by tailored acquisition [4]. However, this means that data for every specific undersampling factor or pattern must be newly acquired. This would be best practice, but such training data can never compete in sample size with existing measured data, which also limits vendor-independent and multi-site studies. Our tool describes the correct dynamics and offers a highly efficient alternative for data generation. Moreover, the provided reconstruction operator can be used as a zero-shot model. We therefore consider it highly relevant to highlight this issue within the MICCAI community.

Q2: Quantitative results and conclusions based on two to three slices. (R5, R1)
A2: In the training and evaluation section, we mentioned that 5 subject volumes, each containing 70 slices, were utilized for validation/testing. Of these, two subjects were used exclusively as test data, comprising 140 slices. In the submitted manuscript, we showed an exemplary test slice with RMSE and SSIM, which is representative of the full dataset.
However, we acknowledge the lack of quantitative results for the entire test set and, as suggested by R5, we will publish a table of SSIM, MAE, and RMSE on prospective and retrospective test data for the VN trained on retrospective data, with p-values for all three metrics, preferably in the main article and otherwise in the supplementary material.
Q3: Explanation of Eq. 1, references for the MR-XXX framework, B1 and B0 inhomogeneities, and generalization to actual data. (R4, R5, R1)
A3: The simulation framework MR-XXXX and the function Bloch() in Eq. 1 are our recent improved implementation of the extended phase graph algorithm [1, Ref2]. The main message of our study, that retrospectively undersampled data differ from prospectively undersampled data, is not affected by B0 and B1 inhomogeneities, and, as mentioned in the first paragraph of the methods, our operator can describe all cases, including real data scenarios [1, Ref3]. We acknowledge the suggestions on applying our method to real data; this work is currently in progress and will be published in the future. We will include Ref2 and Ref3 in our reference list.

Q4: Code and detailed explanation of methodology. (R5)
A4: The performance of the VN trained on retrospective data under prospective undersampling can be checked by using ‘retro_model.pth’ with the argument kspace_name = kspace. The sequence used in the study was IR-FLASH with an FA of 10°; the recovery time after the FLASH sequence was T_REC = 2 s, and 12 inversion times TI = [10.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.2]. We will include this description in the final article. We will also provide the Pulseq file (an open sequence standard) detailing all sequence parameters, as well as further documentation of the code, so that all results can be reproduced. If accepted, we will include Colab notebooks for creating training data and for fitting to simulated and real MRI data, so that every MICCAI researcher can use this tool to investigate their own undersampling approach for potential data crimes and, especially, learn how to avoid them.

Ref1. https://doi.org/10.1073/pnas.2117203119
Ref2. https://doi.org/10.1002/jmri.24619
Ref3. https://doi.org/10.1002/mrm.27040
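The recovery time T_REC asked about by R1 matters because it is finite: the magnetization entering each inversion depends on how many excitations the previous FLASH train played out. The following is a minimal sketch under idealized assumptions (perfect inversion and spoiling, no T2*), using the flip angle, T_REC, and inversion times quoted in the rebuttal; T1 = 1 s, TR = 5 ms, 64 versus 16 lines per train, and seconds as the TI unit are assumptions, and `ir_flash_series` is a hypothetical helper, not part of the authors' code or their EPG simulation.

```python
import numpy as np


def ir_flash_series(tis, n_lines, flip_deg=10.0, tr=0.005, t1=1.0, trec=2.0, m0=1.0):
    """Repeated IR-FLASH blocks: inversion, TI, FLASH train, then recovery time trec.

    Returns the longitudinal magnetization just before each inversion and the
    signal of the first excitation in each train. Idealized: perfect inversion
    and spoiling, no T2*; all times in seconds.
    """
    alpha = np.deg2rad(flip_deg)
    e1 = np.exp(-tr / t1)
    mz = m0                                       # assumed fully relaxed before the first block
    mz_before_inv, first_signal = [], []
    for ti in tis:
        mz_before_inv.append(mz)
        mz = -mz                                  # inversion pulse
        mz = m0 - (m0 - mz) * np.exp(-ti / t1)    # recovery during TI
        first_signal.append(mz * np.sin(alpha))
        for _ in range(n_lines):                  # FLASH readout train
            mz = mz * np.cos(alpha)               # RF excitation
            mz = m0 - (m0 - mz) * e1              # recovery over one TR
        mz = m0 - (m0 - mz) * np.exp(-trec / t1)  # recovery time T_REC
    return np.array(mz_before_inv), np.array(first_signal)


# Inversion times as listed in the rebuttal (units assumed to be seconds).
tis = [10.0, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.2]
full_mz, full_sig = ir_flash_series(tis, n_lines=64)    # fully sampled trains
prosp_mz, prosp_sig = ir_flash_series(tis, n_lines=16)  # 4x fewer excitations per train
```

With these assumptions, the magnetization before the second inversion is lower for the fully sampled series than for the prospectively undersampled one, so the discrepancy is carried into every subsequent TI block rather than staying confined to a single readout train.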




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Accept. However, I found R1’s review to lack depth and detail. The other two reviewers both found merit in the paper and seemed satisfied with the authors’ rebuttal.



