Abstract

Dynamic MRI reconstruction, an inverse problem, has advanced rapidly through the use of deep learning techniques. In particular, the practical difficulty of obtaining ground truth data has led to the emergence of unsupervised learning approaches. A recent promising method among them is implicit neural representation (INR), which defines the data as a continuous function that maps coordinate values to the corresponding signal values. This allows missing information to be filled in from incomplete measurements alone, so that the inverse problem can be solved effectively. Nevertheless, previous works incorporating this method have faced drawbacks such as long optimization time and the need for extensive hyperparameter tuning. To address these issues, we propose Dynamic-Aware INR (DA-INR), an INR-based model for dynamic MRI reconstruction that captures the spatial and temporal continuity of dynamic MRI data in the image domain and explicitly incorporates the temporal redundancy of the data into the model structure. As a result, DA-INR outperforms other models in reconstruction quality even at extreme undersampling ratios while significantly reducing optimization time and requiring minimal hyperparameter tuning. Our code is available here.
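
The idea of representing data as a continuous function of coordinates, fitted only to incomplete measurements, can be illustrated with a deliberately simplified sketch. It is not the DA-INR model: a linear layer over fixed Fourier features stands in for the neural network, the 1D signal is hypothetical, and all function names are assumptions made for illustration.

```python
import math

def features(x):
    """Fixed Fourier features of a scalar coordinate x in [0, 1)."""
    return [1.0,
            math.sin(2 * math.pi * x), math.cos(2 * math.pi * x),
            math.sin(4 * math.pi * x), math.cos(4 * math.pi * x)]

def signal(x):
    """Hypothetical ground-truth signal, used only to simulate measurements."""
    return math.sin(2 * math.pi * x) + 0.5 * math.cos(4 * math.pi * x)

def predict(weights, x):
    """The representation itself: a continuous function of the coordinate."""
    return sum(w * f for w, f in zip(weights, features(x)))

def fit(coords, values, steps=1000, lr=0.1):
    """Gradient descent on the mean squared error over observed samples only."""
    weights = [0.0] * 5
    n = len(coords)
    for _ in range(steps):
        grad = [0.0] * 5
        for x, y in zip(coords, values):
            residual = predict(weights, x) - y
            for j, f in enumerate(features(x)):
                grad[j] += 2.0 * residual * f / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

grid = [i / 32 for i in range(32)]
observed = [x for i, x in enumerate(grid) if i % 3 == 0]  # 11 of 32 samples
weights = fit(observed, [signal(x) for x in observed])

# Query the fitted function everywhere, including coordinates never observed.
max_error = max(abs(predict(weights, x) - signal(x)) for x in grid)
print(f"max error on the full grid: {max_error:.2e}")
```

The recovery is accurate here only because the toy signal lies exactly in the span of the chosen features; real dynamic MRI data offers no such guarantee, which is why the structure of the representation matters.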

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/5140_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{BaiDay_DynamicAware_MICCAI2025,
        author = { Baik, Dayoung and Yoo, Jaejun},
        title = { { Dynamic-Aware Spatio-temporal Representation Learning for Dynamic MRI Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15963},
        month = {September},

}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a dynamic-aware implicit neural representation (DA-INR) for accelerated dynamic MRI reconstruction. Unlike existing advanced INR-based methods for dynamic MRI, the proposed DA-INR defines a canonical space and learns a deformation field for each frame using an INR network. Moreover, by incorporating image features extracted from a pre-trained encoder, DA-INR effectively leverages image priors. Experimental results on two MRI datasets show that the proposed DA-INR outperforms state-of-the-art INR-based methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The introduction of the canonical space is both reasonable and effective for the dynamic MRI problem. This approach is well-established in previous works (e.g., D-NeRF [1]) on dynamic scene representations.
    2. The proposed DA-INR further incorporates image features to improve model optimization and reconstruction performance. This approach seems promising.
    3. This paper conducts extensive experiments on two MRI datasets. The results demonstrate the superiority of the proposed method over existing approaches.
    4. This paper is well-written and well-organized, making it easy for readers to understand the work.

    [1] Pumarola, Albert, et al. “D-nerf: Neural radiance fields for dynamic scenes.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. In the experiments, this work omits comparisons with several INR-based methods for dynamic MRI [1], [2], [3]. Additionally, the comparison with Feng’s method [4], referred to as HashINR in this paper, is not entirely fair. In Sec. 3.1, the paper states, “However, due to the sensitivity of the results to minor variations in regularization weights and their dependence on the data types, we decided to reproduce this method without these regularization terms.” However, the use of explicit priors is a key contribution of Feng’s method [4]. Therefore, I do not believe this comparison is entirely justified.
    2. As stated in the Abstract, one of the motivations for the proposed DA-INR is to reduce the tuning of model hyperparameters. However, the design of DA-INR does not seem to contribute to this goal. Specifically, model overfitting can be alleviated by effectively utilizing temporal redundancy through the introduction of the canonical space, which improves reconstruction quality. However, this approach does not seem to contribute to reducing hyperparameter tuning.
    3. The proposed DA-INR introduces a pre-trained feature encoder. However, it is unclear which dataset was used to train this encoder, raising concerns about potential data leakage. Additionally, it is unclear whether the proposed method can still be considered fully unsupervised.
    4. From the visualization shown in Fig. 3, the proposed DA-INR seems to produce worse results compared to GRASP. Please explain these results.
    5. An ablation study for the pre-trained encoder is missing.

    [1] Kunz J F, Ruschke S, Heckel R. Implicit neural networks with Fourier-feature inputs for free-breathing cardiac MRI reconstruction. IEEE Transactions on Computational Imaging, 2024.
    [2] Huang W, Spieker V, Xu S, et al. Subspace Implicit Neural Representations for Real-Time Cardiac Cine MR Imaging. arXiv preprint arXiv:2412.12742, 2024.
    [3] Catalán T, Courdurier M, Osses A, et al. Unsupervised reconstruction of accelerated cardiac cine MRI using neural fields. Computers in Biology and Medicine, 2025, 185: 109467.
    [4] Feng J, et al. Spatiotemporal implicit neural representation for unsupervised dynamic MRI reconstruction. IEEE Transactions on Medical Imaging, 2025.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a novel method for the dynamic MRI problem. However, it lacks comparisons with many relevant baselines. Moreover, some of the motivations and results need further clarification.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Thank you for the authors’ rebuttal. This rebuttal addresses most of my concerns. Based on the authors’ promise to release the source code and include a discussion of related work [1–3], I recommend accepting this paper.



Review #2

  • Please describe the contribution of the paper

    The article proposes an optimization method for the INR reconstruction algorithm. This method can capture the spatio-temporal continuity of dynamic MRI data within the image domain and explicitly incorporate the temporal redundancy of the data into the model structure.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes a new dynamic hash encoding scheme to accelerate the optimization process in dynamic MRI reconstruction. Its main advantages include:

    1. Combines the efficiency of hash encoding for rapid optimization with an explicit design inspired by D-NeRF to effectively capture continuous temporal redundancy.
    2. By leveraging the normative network, the method proposed in this paper incorporates temporal consistency into its structure, reducing the reliance on explicit regularization terms.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The writing is unclear. The method of the manuscript is not explained clearly enough.
    2. The method uses a pretrained image feature extractor to obtain additional information for dynamic MRI reconstruction. According to reference [4], this extractor was studied in the field of natural images, and there is no research indicating that it generalizes well enough to be applied directly to medical images.
    3. In Figure 3, the GRASP result has fewer artifacts than the proposed method. The GRASP result clearly shows better detail and contrast than the other methods, including the proposed method. In addition, Table 2 shows that GRASP is much faster, so I cannot help doubting whether the proposed method really works.
    4. The paper does not perform an ablation analysis of core components such as the morphing networks, canonical spaces, and pre-trained feature extractors, so it is not possible to quantify the contribution of each part to performance.
    5. The manuscript mentions performing time and space interpolation on the features. While time and space interpolation can compensate for low-resolution details, these operations are applied to the features of undersampled MRI. First, undersampled data contains many artifacts, and interpolation can introduce non-existent details that affect reconstruction. Second, does interpolating the deformation features impact the regularization of deformation in the spatial domain?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The article proposes a novel framework for spatio-temporal representation learning, specifically tailored for dynamic MRI reconstruction. However, the method lacks sufficient innovation, and the comparative experiments are not comprehensive enough. Furthermore, for dynamic MRI data with complex motion patterns, it has not been shown that the deformation network and the model's reference space can accurately capture subtle deformations.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    1. Medical images require a higher level of detail and feature accuracy than natural images, and the article does not adequately demonstrate that using a pre-trained image feature extractor to obtain the additional information required for dynamic MRI reconstruction generalises well enough to be applied directly to medical images.
    2. The authors’ rebuttal does not dispel doubts about whether the interpolation operation causes artefacts that affect detail and the final result.
    3. The authors’ rebuttal does not provide any validation of the core components’ contribution to the overall performance improvement.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel framework for spatio-temporal representation learning tailored to dynamic MRI reconstruction without requiring ground truth data.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method is an unsupervised learning framework that eliminates the need for fully sampled data.
    2. The method demonstrates robust performance even at extreme undersampling ratios, with reduced optimization time and fewer hyperparameters, thereby improving its flexibility and practicality.
    3. The experimental results on retrospective cardiac cine data demonstrate the effectiveness of the proposed method.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    This paper introduces a deformation field in the canonical space to account for temporal differences. The overall optimization process is accelerated by reducing the need for hyperparameter tuning. I still have some questions that need to be clarified by the authors:

    1. The concept of “dynamic-aware” mentioned in the title is not clearly reflected in the method. The authors are encouraged to clarify how this aspect is incorporated into their approach.
    2. What is the architecture of the Canonical network mentioned in the paper? It is not clearly described in the manuscript.
    3. The optimization objective in this paper relies solely on data consistency loss. How does the method enforce constraints on high-frequency components outside the undersampled regions?
    4. The authors removed the regularization term when reproducing the comparison method from reference [2], which seems unreasonable.
    5. It appears that both the dynamic cine data and the dynamic contrast-enhanced liver data used in the study consist of only a single subject. This raises concerns about the limited dataset size.
    6. Please include the variance in the quantitative results shown in Table 1.
    7. Please add subfigure labels to Fig. 3.
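
The concern in point 3 can be made concrete with a toy example: a loss computed only on sampled k-space locations cannot distinguish two signals that differ solely at unsampled frequencies. The sketch below assumes a 1D signal and a Cartesian sampling mask for simplicity, whereas the discussion on this page refers to radial spokes; all names are hypothetical and nothing here is taken from the paper's implementation.

```python
import cmath

def dft(signal):
    """Naive 1D discrete Fourier transform (O(n^2), fine for a tiny example)."""
    n = len(signal)
    return [sum(signal[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def data_consistency_loss(estimate, measured, mask):
    """Sum of squared k-space errors over sampled locations only."""
    spectrum = dft(estimate)
    return sum(abs(s - m) ** 2 for s, m, keep in zip(spectrum, measured, mask) if keep)

truth = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]  # hypothetical 1D "image"
mask = [1, 1, 0, 0, 0, 0, 0, 1]                      # sample only k = 0, 1, 7
measured = dft(truth)

# A different signal: the alternating term adds energy at k = 4 only,
# which the mask never samples.
alias = [v + (-1) ** j for j, v in enumerate(truth)]

loss_truth = data_consistency_loss(truth, measured, mask)
loss_alias = data_consistency_loss(alias, measured, mask)
print(loss_truth, loss_alias)  # both are zero up to floating-point error
```

Whatever resolves this ambiguity must therefore come from outside the loss, for example from the structure of the representation or from measurements shared across frames.
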
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a dynamic-aware spatiotemporal representation learning framework for dynamic MRI reconstruction. As an unsupervised learning approach, it alleviates the reliance on fully sampled data, particularly in dynamic imaging scenarios. However, there are still some detailed issues that need to be addressed.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The author has addressed all of my concerns.




Author Feedback

We appreciate all of your comments.

  • Pretrained encoder (R1-2,R1-4,R2-3,R2-5) In medical imaging or inverse problems with limited data, it is common to use models pre-trained on large-scale natural images for feature extraction, especially in low-level vision tasks like reconstruction, where fundamental image properties tend to generalize well across domains [5–8]. We use MDSR, pretrained on super-resolution, as a frozen encoder in DA-INR; there is no risk of data leakage or fine-tuning on in-distribution data, ensuring our method remains unsupervised. We compared EDSR [9], RDN [10], and SwinIR [11], but omitted results due to space limits, with the suppl. limited to multimedia results only; MDSR [12] showed the best PSNR & runtime (30.13dB, 1445s) on cardiac cine reconstruction at AF 9.8, compared to RDN (29.28dB, 6024s) and SwinIR (29.34dB, 5889s).

  • HashINR w/o regularization (R2-1,R3-4) We compared with HashINR [4], the most recent and closest work in terms of architecture, data, and task. Using the official code, we found the results highly sensitive to regularization weights: small changes (e.g., ±0.05) all led to noisy or black images. Despite extensive tuning, stable reproduction was infeasible. To ensure reproducibility and isolate the method’s inherent behavior, we ran it without regularization. We will release our code to allow others to verify this under identical setups, including the instability we saw. We also acknowledge the relevance of [1–3] and will cite and discuss them in the final paper.

  • Implicit Dynamics Regularization of Deformation+Canonical nets (R1-4,R1-5, R2-2,R3-2,R3-3) The reviewers’ questions converge on how our architecture (deformation (Ψt) & canonical (Ψx) nets) acts as an implicit regularizer. DA-INR enforces temporal consistency via a shared canonical space jointly optimized with Ψt across the sequence (Sec 2.2). Though described separately, Ψt and Ψx act as a unified module: Ψx models the canonical signal across time, while Ψt learns to deform it per frame. Removing both essentially recovers HashINR, making the performance gap in Tab 1 (25.07dB vs 29.59dB) an empirical measure of their contribution. The canonical space is updated using spokes from all frames, aggregating structural details and forming a complete high-frequency representation. Ψt adjusts this to each frame, accounting for dynamic deviations. This acts like multi-view regularization that emerges from the framework itself rather than handcrafted priors, removing manual hyperparameter tuning. Ψt and Ψx are implemented as 5-layer MLPs with 64 hidden units and ReLU activations; Ψx outputs complex values, and Ψt outputs deformation vectors. We will clarify these in the final paper.

  • GRASP, Fig 3 (R1-3,R2-4) While GRASP appears clean in the x–y domain, the temporal signal curves in Fig 3 (right, orange line) show it fails to capture dynamic contrast changes, instead overfitting to a few frames. This highlights its limitation in modeling temporal variation, which is essential in dynamic MRI. Our method better preserves frame-wise dynamics, offering improved spatio-temporal fidelity.

  • Single subject data (R3-5) Optimization-based approaches such as compressed sensing and INR methods [1,4,13–14], including ours, optimize per sample, so their generalizability is less affected by dataset size compared to learning-based methods. Our experimental setup follows this widely accepted protocol.

  • Etc. (R1-1,R3-6,R3-7) We will provide a clearer explanation of our method, report variance in Tab 1, and add subfigure labels to Fig 3.

    [1] Kunz et al., IEEE TCI 2024
    [2] Huang et al., arXiv:2412.12742
    [3] Catalán et al., Comput. Biol. Med. 2025
    [4] Feng et al., IEEE TMI 2025
    [5] Dar et al., MRM 2020
    [6] Zhang et al., MICCAI 2023
    [7] Li et al., ICCV 2023
    [8] Fang et al., CVPR 2024
    [9] Lim et al., CVPRW 2017
    [10] Zhang et al., CVPR 2018
    [11] Liang et al., ICCV 2021
    [12] Gao & Zhuang, CVPRW 2019
    [13] Yoo et al., IEEE TMI 2021
    [14] Lustig et al., ISMRM 2006
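
The deformation and canonical networks described in the rebuttal can be sketched as a forward pass, under several assumptions that the rebuttal does not confirm: "5-layer MLP" is read as five linear layers, Ψt is assumed to take (x, y, t) and return a 2D offset, and Ψx is assumed to take the deformed 2D coordinate and return a real/imaginary pair. Hash encoding and the pre-trained encoder features are omitted, the weights are random and untrained, and every name below is hypothetical.

```python
import random

def init_mlp(sizes, rng):
    """Random weights for a fully connected net with the given layer sizes."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / n_in) ** 0.5
        weight = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        params.append((weight, [0.0] * n_out))
    return params

def mlp(params, inputs):
    """Forward pass with ReLU after every layer except the last."""
    h = list(inputs)
    for i, (weight, bias) in enumerate(params):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weight, bias)]
        if i < len(params) - 1:
            h = [max(0.0, v) for v in h]
    return h

rng = random.Random(0)
hidden = [64] * 4  # four hidden layers of 64 units, five linear layers in total
deform_net = init_mlp([3] + hidden + [2], rng)     # assumed: (x, y, t) -> (dx, dy)
canonical_net = init_mlp([2] + hidden + [2], rng)  # assumed: (x', y') -> (real, imag)

def deformation(x, y, t):
    """Per-frame offset into the shared canonical space (stand-in for Ψt)."""
    dx, dy = mlp(deform_net, [x, y, t])
    return dx, dy

def canonical(x, y):
    """Complex-valued signal at a canonical coordinate (stand-in for Ψx)."""
    real, imag = mlp(canonical_net, [x, y])
    return complex(real, imag)

def intensity(x, y, t):
    """Signal at pixel (x, y) of frame t: deform first, then query the canonical net."""
    dx, dy = deformation(x, y, t)
    return canonical(x + dx, y + dy)

value = intensity(0.25, 0.5, 0.1)
print(value)
```

Because every frame queries the same canonical network, measurements from all frames update one shared set of weights, which is the mechanism the rebuttal describes as an implicit regularizer.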




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While the paper proposes a spatio-temporal implicit neural representation (INR) framework tailored for dynamic MRI reconstruction, after careful consideration of the reviewer comments and rebuttal, this AC does not find the contributions sufficient for acceptance. The authors claim novelty by incorporating a canonical space with a deformation network and a pretrained feature encoder; however, each of these components has clear antecedents in prior literature. The use of pretrained feature encoders from natural image domains, while argued to generalise in some low-level tasks, is not convincingly demonstrated to be suitable for clinical MRI, and the paper lacks any validation of this assumption—either experimentally or through ablation. The omission of key INR-based baselines in the main experiments further weakens the empirical grounding of the method.

    More critically, while the rebuttal attempts to quantify the contribution of various components (e.g., canonical and deformation networks), these results were not in the original manuscript and rely on an indirect performance gap argument rather than a proper ablation. There is also no experimental validation that directly examines the potential artifact introduction from feature-space interpolation of undersampled data, a valid and unresolved concern. The response to these issues remains speculative and lacking in rigorous support.

    Furthermore, the paper suffers from overclaims regarding reduced hyperparameter sensitivity and unsupervised learning. For example, the removal of explicit regularisation in baseline comparisons (e.g., with HashINR) is methodologically questionable, as it downplays the role of priors that are integral to the baseline’s design. The justification that regularisation tuning was too sensitive is unconvincing at this level, especially without reporting quantitative attempts to replicate prior results faithfully.

    Finally, while the method may show promise in limited examples, the experiments are confined to datasets with only a single subject per task. The absence of multi-subject validation and the lack of reported variance raise concerns about generalisability and reproducibility, which are especially critical in clinical imaging contexts.

    Overall, despite a well-written rebuttal and an interesting framework, the lack of experimental rigor, limited novelty, weak comparative baselines, and unresolved methodological concerns prevent this AC from recommending this paper for acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces a dynamic-aware implicit neural representation framework for unsupervised dynamic MRI reconstruction.

    The method incorporates a canonical space and deformation network to model temporal continuity and enriches the representation with image features from a pre-trained encoder.

    While initial concerns were raised regarding clarity, lack of ablation for key components, and comparison fairness (especially with HashINR), most of the technical concerns were addressed in the rebuttal.

    With strong empirical performance across multiple datasets and a novel formulation that avoids handcrafted regularization, the paper makes a good contribution to the MRI reconstruction literature and is recommended for acceptance.


