Abstract
We present NIMOSEF, a novel unified framework that leverages neural implicit functions for joint segmentation, reconstruction, and displacement field estimation in cardiac magnetic resonance imaging (CMRI). By leveraging a shared implicit representation for joint segmentation and motion estimation, our approach improves spatio-temporal consistency with respect to conventional grid-based convolutional neural networks and implicit segmentation functions. NIMOSEF employs an auto-decoder architecture to learn subject-specific latent representations from unstructured point clouds derived from image intensities and reference segmentations. These latent codes, when combined with 4D space–time coordinates, enable the generation of high-resolution segmentation outputs and smooth, temporally coherent motion estimates. Experimental evaluation on a random subset of $700$ patients from the UK Biobank demonstrates that our method achieves competitive segmentation accuracy, attaining Dice scores of up to $0.93$ for the LV, $0.90$ for the RV and $0.83$ for the LV myocardium, with improved spatio-temporal consistency, as reflected by a smaller number of predicted disconnected components. Simultaneously, it achieves an average registration error of the whole heart boundary of $3.08 \pm 1.23$ mm measured by the Chamfer distance, and $8.57 \pm 4.74$ mm according to the $95$th percentile Hausdorff distance. Additionally, feature importance analysis reveals that the learnt implicit representation encodes physiologically relevant information. These results suggest that NIMOSEF offers a promising alternative for high-resolution, temporally consistent cardiac segmentation and motion estimation, with potential for advancing the clinical assessment of cardiac function.
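The abstract reports Dice overlap, the Chamfer distance and the 95th-percentile Hausdorff distance (HD95). The sketch below shows one common way to compute these metrics with NumPy. It is not taken from the NIMOSEF repository, and definitions vary between papers (for example, some works sum rather than average the two directed Chamfer terms, or pool both directions before taking the 95th percentile). Boundary points are assumed to be given as (N, 3) arrays already expressed in millimetres.

```python
import numpy as np


def dice(pred: np.ndarray, ref: np.ndarray, label: int) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for one label in two label maps."""
    a, b = pred == label, ref == label
    denom = a.sum() + b.sum()
    # Assumed convention: a label absent from both maps counts as perfect agreement.
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)


def nn_distances(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Distance from each point of src (N, 3) to its nearest neighbour in dst (M, 3)."""
    diff = src[:, None, :] - dst[None, :, :]  # (N, M, 3), brute force
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def chamfer(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance: average of the two directed mean NN distances."""
    return float(0.5 * (nn_distances(p, q).mean() + nn_distances(q, p).mean()))


def hd95(p: np.ndarray, q: np.ndarray) -> float:
    """HD95: larger of the two directed 95th-percentile NN distances."""
    return float(max(np.percentile(nn_distances(p, q), 95),
                     np.percentile(nn_distances(q, p), 95)))


# Toy example: a surface shifted by exactly 1 mm along x.
p = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
q = p + np.array([1.0, 0.0, 0.0])
print(chamfer(p, q), hd95(p, q))  # 1.0 1.0
print(dice(np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0]), label=1))  # 2/3
```

For the toy clouds every nearest-neighbour distance is exactly 1 mm, so both surface distances equal 1.0. The brute-force distance matrix costs O(NM) memory; for full-resolution surfaces a KD-tree (e.g. `scipy.spatial.cKDTree`) is the usual choice.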
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4418_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/jbanusco/nimosef-v1
Link to the Dataset(s)
UK Biobank: https://www.ukbiobank.ac.uk/
BibTex
@InProceedings{BanJau_NIMOSEF_MICCAI2025,
author = { Banus, Jaume and Delaloye, Antoine and M. Gordaliza, Pedro and Georgantas, Costa and B. van Heeswijk, Ruud and Richiardi, Jonas},
title = { { NIMOSEF: Neural implicit motion and segmentation functions } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15970},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
The main contribution of this paper is NIMOSEF, a novel unified framework based on implicit neural representations that addresses the limitations of traditional methods in terms of resolution, spatio-temporal consistency, and clinical relevance, and provides a new technical approach for cardiac image analysis.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
NIMOSEF presents a novel unified framework based on implicit neural representations for joint segmentation, reconstruction, and displacement field estimation. Its innovation lies in the joint implicit representation of segmentation and motion estimation, which improves spatio-temporal consistency compared to conventional grid-based methods like CNNs that handle these tasks separately. NIMOSEF leverages the continuity of implicit representations to solve the inconsistency issue in traditional approaches.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The paper primarily compares NIMOSEF with baseline methods, such as traditional CNNs, but does not directly compare it with the latest SOTA methods. The experimental results also lack an intuitive visual evaluation. Such comparisons are necessary to demonstrate the superiority of NIMOSEF.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
(1) Limited methodological innovation: While the authors enhance implicit neural representations (INRs), the application of INRs is not entirely novel. For instance, Stolt-Ansó et al.’s NISF (Neural Implicit Segmentation Functions), presented at MICCAI 2023, explored INRs for segmentation [27]. Also, Lowes et al.’s 2024 STACOM work used INRs to register the left ventricular myocardium during the cardiac cycle [18]. Thus, NIMOSEF’s methodological novelty might be limited.
(2) Missing validation of ROI extraction: The paper uses a circular Hough transform to define the cardiac region of interest (ROI) and extract 128×128 image patches. While effective, this method may be sensitive to data quality and distribution: noise or an unexpected heart position could lead to inaccurate ROI extraction and thus affect model performance.
(3) Limitations of the experimental evaluation: The paper mainly compares NIMOSEF with baselines such as traditional CNNs, not with the latest INR-based methods (e.g., NISF [27] or CINA [5]) or other SOTA approaches. Intuitive visual comparisons are also missing.
[18] Lowes, M.M., Pedersen, J.J., Hansen, B.S., et al.: Implicit neural representations for registration of left ventricle myocardium during a cardiac cycle. In: STACOM. Springer Nature Switzerland (2024)
[27] Stolt-Ansó, N., McGinnis, J., Pan, J., et al.: NISF: Neural Implicit Segmentation Functions. In: MICCAI 2023. pp. 734–744. Springer Nature Switzerland (2023)
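Reviewer 1 questions the robustness of the circular-Hough ROI step. As an illustration only, the sketch below votes for circle centres from a binary edge map over a few candidate radii, then crops a fixed 128×128 patch around the winning centre, zero-padding at the image border so that an off-centre heart still yields a full-size patch. The edge map, the radii and the function names are assumptions. The actual NIMOSEF preprocessing (reportedly adopted from prior work) may differ, and in practice a library routine such as scikit-image's `skimage.transform.hough_circle` would normally replace this hand-written accumulator.

```python
import numpy as np


def hough_circle(edges: np.ndarray, radii, n_angles: int = 180):
    """Return (row, col, radius) of the strongest circle in a binary edge map."""
    h, w = edges.shape
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    best, best_votes = (0, 0, 0), -1
    for r in radii:
        acc = np.zeros((h, w), dtype=np.int64)
        # Every edge pixel votes for all candidate centres at distance r from it.
        cy = np.rint(ys[:, None] - r * np.sin(thetas)).astype(int).ravel()
        cx = np.rint(xs[:, None] - r * np.cos(thetas)).astype(int).ravel()
        inside = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[inside], cx[inside]), 1)
        row, col = np.unravel_index(np.argmax(acc), acc.shape)
        if acc[row, col] > best_votes:
            best, best_votes = (int(row), int(col), int(r)), int(acc[row, col])
    return best


def crop_roi(image: np.ndarray, center, size: int = 128) -> np.ndarray:
    """size x size patch centred on `center`; zero padding keeps the shape fixed near borders."""
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    # Pixel (r, c) of `image` sits at (r + half, c + half) in `padded`, so this slice
    # covers rows r - half .. r + half - 1 (and likewise columns) of the original image.
    r, c = center[0], center[1]
    return padded[r : r + size, c : c + size]


# Synthetic edge map: a ring of radius 10 centred at (40, 50) in a 100 x 100 frame.
edges = np.zeros((100, 100), dtype=bool)
phi = np.linspace(0.0, 2.0 * np.pi, 400)
edges[np.rint(40 + 10 * np.sin(phi)).astype(int), np.rint(50 + 10 * np.cos(phi)).astype(int)] = True
row, col, r = hough_circle(edges, radii=[8, 10, 12])
patch = crop_roi(np.ones((100, 100)), (row, col))
```

Padding rather than clamping the window keeps the detected centre in the middle of the patch, which is one simple way to cope with hearts located near the image border.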
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have adequately addressed the major concerns raised during the review and rebuttal phase. Their responses are thoughtful, and the paper presents a novel and technically sound approach to joint segmentation and motion estimation. I recommend acceptance of this paper.
Review #2
- Please describe the contribution of the paper
In NIMOSEF, the authors explore the utility of implicit neural representations for cardiac motion estimation and segmentation. Compared to concurrent research in this domain, the paper uses motion estimation as an auxiliary task to improve spatio-temporal consistency in the segmentation task. Its experimental design also ablates some relevant assumptions about implicit neural representations versus CNNs.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Major Strengths of the paper:
- NIMOSEF is built upon the framework introduced in [1], and expands this work by looking into an auxiliary task, namely motion estimation, and its benefits to segmentation for cardiac imaging.
- NIMOSEF experimentally validates some common assumptions of implicit segmentation functions (i.e. smooth boundaries, consistent volumes) that were previously just assumed to be true but not experimentally validated.
- The paper is nicely written, well organized, and engaging to read; it flows quite well. I would especially like to highlight the clear methods section and the experimental design. These are easy to follow and well motivated.
- Moreover, the presented methods are sound and appropriately motivated (e.g. the choice of different loss functions), and the experiments are fair and meaningful (hyperparameters are tuned, and the evaluation criteria make sense).
As such, it is a relevant and interesting contribution to the field of cardiac imaging and implicit segmentation functions.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While I enjoyed reading the paper, I would like to highlight some room for improvement that the authors may use to improve it for the camera-ready version:
Intro:
- In the contribution section, I feel it should be noted more clearly that you build upon the same architecture and problem formulation as [6] and expand it with the motion estimation task; this would set your paper more clearly apart from that work.
Related Work:
- I feel the related work section is the only weak part of the paper. It should discuss conditional architectures [2,3] and works that use them (e.g. [3,4]), and go into more detail (as done in Section 2.3), since Sections 2.1 and 2.2 read more like an introduction than related work. By this point the reader should already know what INRs are (SIREN, etc.); if not, these ideas belong in the Introduction, and the related work section should instead expand on what related papers do, as Section 2.3 does.
Related Work / Discussion:
- While the conditional auto-decoder architecture (an MLP taking the concatenation of (x,y,z,t) and a latent vector) is common in both computer vision and medical imaging, better alternatives have been proposed [2,3] that have not only shown merit in computer vision applications but have also been used in medical imaging, e.g. in works that the authors already cite [4]. It would have been interesting and relevant to (i) include these in the related work, indicating that they exist and are relevant to the paper / future work, and (ii) compare against them. While this doesn’t invalidate the presented results or the relevant experiments regarding the assumptions of implicit functions and the value of adding an auxiliary task, this limitation should be acknowledged in the paper.
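Reviewer 2 describes the conditional auto-decoder as an MLP that takes the concatenation of (x, y, z, t) and a latent vector. The following NumPy sketch shows only that forward pass, with random weights: one latent row per training subject is looked up, concatenated to each query coordinate, and decoded into per-class logits. Layer sizes, the ReLU activation and the initialisation are illustrative assumptions rather than NIMOSEF's configuration, and training (joint gradient updates of the MLP weights and the latent table) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SUBJECTS, LATENT_DIM, HIDDEN, N_CLASSES = 4, 16, 64, 4  # hypothetical sizes


class ConcatAutoDecoder:
    """Shared MLP f([x, y, z, t] ++ z_s) with one free latent code z_s per training subject."""

    def __init__(self) -> None:
        # Auto-decoder: there is no image encoder; each subject's code is a trainable table row.
        self.latents = 0.01 * rng.standard_normal((N_SUBJECTS, LATENT_DIM))
        d_in = 4 + LATENT_DIM
        self.w1 = rng.standard_normal((d_in, HIDDEN)) / np.sqrt(d_in)
        self.b1 = np.zeros(HIDDEN)
        self.w2 = rng.standard_normal((HIDDEN, N_CLASSES)) / np.sqrt(HIDDEN)
        self.b2 = np.zeros(N_CLASSES)

    def __call__(self, coords: np.ndarray, subject_ids: np.ndarray) -> np.ndarray:
        """coords: (N, 4) space-time points; subject_ids: (N,) ints -> (N, N_CLASSES) logits."""
        codes = self.latents[subject_ids]              # (N, LATENT_DIM)
        h = np.concatenate([coords, codes], axis=-1)   # conditioning by concatenation
        h = np.maximum(h @ self.w1 + self.b1, 0.0)     # ReLU hidden layer (assumed)
        return h @ self.w2 + self.b2


model = ConcatAutoDecoder()
# The same continuous (x, y, z, t) point, queried for two different subjects.
query = np.array([[0.1, -0.2, 0.3, 0.5]] * 2)
logits = model(query, np.array([0, 1]))
labels = logits.argmax(axis=-1)
```

Because each query point is decoded independently, the network can be sampled on any grid, which is what allows segmentations at a resolution other than that of the input images.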
Discussion:
- I feel that using “better” conditional architectures would likely lead to (even) better results than conditioning on sub-population details, because these details may not be present in the images; any benefit may instead stem from the correlation between latent vectors and population factors (e.g. weight). Perhaps focus on this in the discussion instead?
- While the authors use the same strategy as [1] to obtain ground-truth segmentation labels, the fact that the ground-truth segmentations also come from a pre-trained CNN is a somewhat limiting factor, especially since a CNN is used as the reference method. I know that obtaining manual labels for 700 subjects is simply not feasible, but this could be discussed or at least stated :) I believe this is mainly a limiting factor for Dice, less so for other experiments (e.g. Fig. 4).
Minor:
Typos / Formatting:
- Best results should be in bold in the tables
- Abstract: suggested rewording -> “By leveraging a joint segmentation …”
- “still demand a lot of memory” -> perhaps better: “are computationally expensive with respect to memory”?
- “Unstructured nature of the approach”: I don’t understand this. Do you mean because of random sampling?
Misc:
- It would have been interesting to see the source code (e.g. via an anonymous GitHub repository) during the review process.
Questions:
(1) Why did you refrain from using both SAX and LAX images as originally proposed in [6]? Or did I get that wrong?
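Reviewer 2 asks what the “unstructured nature” of the approach means; the rebuttal answers that coordinates in the point cloud are treated independently. A minimal sketch of that idea, assuming an (X, Y, Z, T) short-axis volume and a matching label map: every voxel becomes an independent row of physical coordinates plus its intensity and label, and training batches are drawn by uniform random sampling, so no convolutional grid neighbourhood is preserved. The voxel spacing values and function names are illustrative assumptions, not NIMOSEF's actual data loader.

```python
import numpy as np


def volume_to_points(intensity, labels, spacing):
    """Flatten (X, Y, Z, T) arrays into independent rows of physical (x, y, z, t) coordinates."""
    idx = np.indices(intensity.shape).reshape(intensity.ndim, -1).T  # (X*Y*Z*T, 4) voxel indices
    coords = idx * np.asarray(spacing, dtype=float)                  # scale to physical units
    return coords, intensity.ravel(), labels.ravel()


def sample_points(coords, intensity, labels, n, rng):
    """Uniformly sample n rows without replacement; no neighbourhood structure is retained."""
    pick = rng.choice(len(coords), size=n, replace=False)
    return coords[pick], intensity[pick], labels[pick]


rng = np.random.default_rng(0)
vol = rng.standard_normal((8, 8, 4, 5))  # toy stack: 8x8 pixels, 4 slices, 5 frames
seg = (vol > 0).astype(np.int8)          # toy label map
# Illustrative spacing: 1.8 mm in-plane, 10 mm between slices, 1 unit per frame.
coords, inten, lab = volume_to_points(vol, seg, spacing=(1.8, 1.8, 10.0, 1.0))
batch = sample_points(coords, inten, lab, n=32, rng=rng)
```

Using physical rather than index coordinates matters for short-axis stacks, where slice spacing is much coarser than in-plane resolution.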
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Based on the experimental design that validates some common conceptions of INRs previously not shown in literature, and the novelty of modeling motion estimation as a relevant auxiliary task, I believe this paper is an interesting contribution for the MICCAI community.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Many of the concerns I raised during the first review phase stemmed from unclear writing, missing methodological details, and an insufficient discussion of relevant prior work. In their rebuttal, the authors have effectively addressed these issues, providing clear justifications, particularly regarding why certain suggested baselines are not applicable in this context. I also acknowledge my own oversight regarding the evaluation metrics / unpaired evaluation setting, which explains why my proposed alternatives were not appropriate.
The authors have demonstrated a strong commitment to clarifying key aspects of their work, especially with respect to the pre-training strategy and the 3D reconstruction pipeline. As the majority of my concerns were related to clarity rather than fundamental flaws in the methodology, and since these have now been satisfactorily resolved without requiring any changes to the proposed method itself, I believe this paper is well-positioned for acceptance.
Given the novelty and promise of the presented approach, an assessment shared by the other reviewers, I recommend accepting this paper for presentation at MICCAI, where it could influence ongoing discussions in the field.
Review #3
- Please describe the contribution of the paper
The paper presents an implicit neural function approach to modeling cardiac motion and segmentation, leading to smoother and more consistent modeling than standard convolutional architectures.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper offers thorough motivation and overview of the relevant fields.
The methods section is very clear, and bases most of its choices on previous works, leaving little to criticize. The loss functions employed are well justified in the text. The inclusion of domain-specific knowledge into the loss functions and architecture clearly shows improvements over previous works, both in terms of metrics and qualitatively.
The evaluation section is thorough. While the benefits of INRs over CNNs have been argued for in previous works, this paper’s extensive results empirically demonstrate these claims both qualitatively and quantitatively. The paper further offers an interesting investigation based on feature importance.
The paper also offers a fair amount of discussion on the limitations of INR architectures.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Unlike the previous MICCAI 2023 paper on Neural Implicit Segmentation Functions (N. Stolt-Ansó et al.), this paper proposes no inference-time procedure to generate segmentations for new images. Does that mean the results are simply smoothed versions of the ground-truth segmentations? Are the metrics provided in Table 1 computed on the training dataset? (I don’t see a mention of a test set anywhere.) While I do not think this takes away from the paper’s contributions, I find it somewhat misleading not to emphasize this explicitly, especially since the abstract suggests NIMOSEF is a promising ‘alternative’ for cardiac segmentation. I fear this may lead unfamiliar readers to believe this work is able to segment ‘unseen’ images at the reported accuracies. I would suggest the following changes:
- emphasize that NIMOSEF is a method to smooth/interpolate existing (partial) ground-truth segmentations.
- remove (or relax) any mentions that NIMOSEF is an ‘alternative’ for segmentation.
- mention NIMOSEF could potentially be extended in future work to segment unseen volumes via inference-time optimization such as in N. Stolt-Ansó. 2023 / M. Muffoletto 2023 or via some form of single-shot latent prediction such as in various works in general computer vision.
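Reviewer 3 points out that, unlike NISF, the paper describes no inference-time procedure for unseen images, and the rebuttal reports preliminary post-submission tests of inference-time optimisation. In auto-decoder models this usually means freezing the trained decoder and optimising only a new subject's latent code against that subject's observations. The toy sketch below illustrates the principle with a frozen linear decoder and plain gradient descent; the decoder, sizes and step size are hypothetical, and a real setup would use a nonlinear MLP, automatic differentiation and the model's own losses (e.g. on intensities first, as the rebuttal describes).

```python
import numpy as np

rng = np.random.default_rng(0)
COORD_DIM, LATENT_DIM, OUT_DIM = 4, 3, 5  # hypothetical sizes

# Toy stand-in for a decoder trained on the training cohort; its weights stay frozen.
A = rng.standard_normal((OUT_DIM, COORD_DIM))
B, _ = np.linalg.qr(rng.standard_normal((OUT_DIM, LATENT_DIM)))  # orthonormal columns


def decode(coords, z):
    """Frozen linear 'decoder': output = A @ coord + B @ z for every query point."""
    return coords @ A.T + z @ B.T


def fit_latent(coords, targets, steps=200, lr=0.25):
    """Inference-time auto-decoding: gradient descent on z only, decoder weights untouched."""
    z = np.zeros(LATENT_DIM)
    for _ in range(steps):
        residual = decode(coords, z) - targets    # (N, OUT_DIM)
        grad = 2.0 * (residual @ B).mean(axis=0)  # gradient of mean_n ||residual_n||^2 w.r.t. z
        z -= lr * grad
    return z


# "Unseen" subject: observations generated from a latent code the fit never sees.
z_true = np.array([0.5, -1.0, 2.0])
coords = rng.uniform(-1.0, 1.0, size=(50, COORD_DIM))
z_hat = fit_latent(coords, decode(coords, z_true))
```

Because the toy latent block `B` has orthonormal columns, each step halves the error in `z`, so `z_hat` converges to `z_true`. With a real network the problem is non-convex, and the result depends on the code's initialisation and regularisation.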
While the paper is honest about its ‘ground truth’ originating from W. Bai’s pretrained CNN (and it is common for the majority of UKBB segmentation works to use these predicted segmentations as GT), some further discussion on the role of INRs as smoothing interpolation functions could be warranted. After all, the paper uses arguably flawed GT segmentations (especially around the apical and basal regions) to obtain refined implicit representations.
The text does not offer a clear motivation for why the displacement module requires the sigmoid-weighted linear unit (SiLU). While the cited work could likely provide context, unfamiliar readers might lack the background knowledge. Furthermore, the text mentions concatenation (“…to generate a ‘motion code,’ which is then concatenated with the encoder’s representation…”) while Figure 2 displays an addition symbol (+).
The inclusion of an image registration section appears somewhat unnecessary to me. The term image registration suggests some form of intensity matching and image warping. I would argue that what this paper does is closer to motion estimation, as the network only learns to approximate ‘ground-truth’ motion vectors extracted from the heart’s surface. The text also doesn’t clarify exactly how these surface points are selected during training. Is the network taught to predict zero vectors for non-boundary points? Also, if the authors’ intention was indeed to connect their work to INR-based image registration, I would suggest including in Section 2.3 a reference to the original paper on INR-based registration by J. Wolterink: Jelmer M. Wolterink, Jesse C. Zwienenberg, Christoph Brune. Implicit Neural Representations for Deformable Image Registration. Proceedings of The 5th International Conference on Medical Imaging with Deep Learning, PMLR 172:1349-1359, 2022
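Reviewer 3 asks why the displacement module uses SiLU and notes that the text (concatenation) and Figure 2 (addition) disagree; the rebuttal confirms concatenation. The sketch below, with hypothetical helper names and feature sizes, shows the SiLU definition and its derivative, which is continuous at zero (ReLU's is not), in line with the rebuttal's point about smooth gradients. It also contrasts the two fusion options: concatenation keeps both feature sets separate and increases the width, while addition requires equal widths and mixes them.

```python
import numpy as np


def silu(x):
    """SiLU (swish): x * sigmoid(x), with sigmoid written via tanh for numerical stability."""
    return x * 0.5 * (1.0 + np.tanh(0.5 * x))


def silu_grad(x):
    """d/dx [x * s(x)] = s(x) * (1 + x * (1 - s(x))); continuous everywhere, unlike ReLU's."""
    s = 0.5 * (1.0 + np.tanh(0.5 * x))
    return s * (1.0 + x * (1.0 - s))


def fuse(encoder_feat, motion_code, mode="concat"):
    """Combine encoder features with a motion code (hypothetical helper)."""
    if mode == "concat":
        return np.concatenate([encoder_feat, motion_code], axis=-1)  # widths add up
    if mode == "add":
        if encoder_feat.shape != motion_code.shape:
            raise ValueError("addition needs matching feature widths")
        return encoder_feat + motion_code  # same width; the two signals are mixed
    raise ValueError(f"unknown mode: {mode!r}")


rng = np.random.default_rng(0)
enc, motion = rng.standard_normal((8, 32)), rng.standard_normal((8, 32))
print(fuse(enc, motion, "concat").shape, fuse(enc, motion, "add").shape)  # (8, 64) (8, 32)
```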
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
While not relevant to the quality of the paper, the provided video attachment makes it hard to properly see the accuracy of the algorithm. While I doubt that including qualitative segmentation results in the paper would help gauge the quality of the modeled shapes and motion, it would have been nice to see rendered meshes and slice-wise 2D+time segmentation videos. The overall extent of the qualitative results is already exceptional, and I hope that you display further animations in your poster presentation (on a tablet/phone). I look forward to seeing how this work will help define future organ modeling works, especially in the cardiac domain.
Minor spelling and formatting issues: Page 4, last sentence: “…to generate a “motion code,” which is then concatenated with the encoder’s representation …”, the quote should be placed before the comma.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This work improves upon the state of the art while having a comprehensive and intuitive paper structure. The results section offers strong empirical evidence of the effectiveness of INR-based interpolation. While other works would have justified their contribution based solely on metrics, this paper’s qualitative evaluation is thorough and convincing. The paper’s only flaw is the lack of test-time inference on unseen images.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors addressed many of my criticisms. On the topic of using CNN-generated ground truth, I was left hungry for a deeper discussion than the superficial answer provided. Being familiar with the cardiac segmentation field, I am aware of the infeasibility of obtaining dense annotations for 3D+time volumes. CNN-based ground truth suffers from spurious segmentations due to the lack of a holistic 3D+time understanding of the data. This paper’s contribution is an amazing post-processing tool to clean up such CNN segmentations, and I would have hoped the rebuttal would emphasize this challenge. Nonetheless, I feel the paper and rebuttal are well above the acceptance standard.
I have to point out that I consider Reviewer 2’s comments on SOTA architectures to be irrelevant to this review. Every paper out there could benefit from ‘extra design choices to improve performance’. However, the authors made the conscious choice to use architectures similar to existing established work (such as NISF), which makes their work easily comparable to the literature. Had they used the proposed SOTA improvements from the computer vision literature, Reviewer 2 could just as easily have criticized their decision to deviate from the existing cardiac INR literature.
Author Feedback
We thank the reviewers for their constructive feedback and suggestions. We are encouraged by the positive remarks on our contribution and its relevance to the community. Below we address their concerns.
- Comparison with INR methods (R1): Our baseline, given by the motion head ablation experiment, shares architecture and objectives with NISF [1], making the comparison effectively equivalent. We will clarify this in the camera-ready version. CINA [2] uses external feature conditioning, which we avoided to isolate the effects of our joint formulation. Lowes [3] focuses on CT registration using HU units and SDFs. We aim to model segmentation and motion jointly, without requiring precomputed SDFs. While a comparison could be informative, our emphasis is on the usefulness of a shared representation across tasks.
- Visual evaluation (R1, R3): Figure 4 shows smooth volume trajectories and reduced inconsistencies; Figure 1 illustrates qualitative outputs. We agree that more visuals (e.g. zoom-ins, meshes) would better showcase the results. If the paper is accepted, these will be included in the presentation and in the GitHub repository.
- ROI extraction (R1): The circular Hough transform has been adopted in prior work [4] and performed robustly across all subjects. Since it is not central to our contribution, fallback mechanisms or a deep-learning approach could be included.
- Conditional architectures (R2): We appreciate the suggestion to expand this discussion. While the cited references were not visible, we assume R2 refers to works such as [2,5]. Future work will explore whether conditioning improves performance without biasing predictions or limiting the interpretability of latent codes. Conditioning may boost accuracy but could also constrain the structure of the learned representations. We will revise the related work and discussion to reflect these trade-offs.
- SiLU and Figure 2 (R3): SiLU provides smooth gradients and is commonly used in INRs [6] to model continuous transitions. We will correct Figure 2 to reflect the use of concatenation.
- Boundary detection (R3): Boundary points are determined via k-NN label differences and guide displacement learning; non-boundary points are regularized through the L2 penalty. We will clarify this in Section 3.3.
- Segmentation reference, Dice (R2, R3): We use CNN-based segmentations [7] due to the dataset scale, which may affect Dice scores, particularly in challenging regions (apex/base). We will emphasize this limitation and clarify that our reference is not a gold standard.
- Inference and metrics (R3): All metrics are from the training set. NIMOSEF interpolates and refines annotations, correcting temporal inconsistencies. Post-submission, we tested inference-time optimization: first, a warm-up phase using only intensity, akin to NISF, followed by motion refinement using the predicted segmentations to compute boundaries. The results are omitted but support our view of NIMOSEF as a segmentation approach for unseen cases. In any case, we can frame NIMOSEF as a complement or post-processing module for segmentation that simultaneously obtains motion estimates.
- Motion estimation vs. registration (R3): We agree that motion estimation is the more accurate term, since we learn displacements from boundaries, not intensities. We were inspired by INR registration methods, so we will include a citation to Wolterink [8] in the related work, adjust the terminology in the future, and explore related work in motion estimation.
- ‘Unstructured nature’ and use of LAX images (R2): We used SAX only for comparability with NISF and applicability to datasets lacking LAX. Including LAX is a promising direction for validation or additional shape constraints. “Unstructured nature” refers to point clouds where coordinates are treated independently.
- Minor issues (All): We will correct all noted typos, formatting, and phrasing issues.
References: [1] Stolt-Ansó, MICCAI 2023; [2] Dannecker, MICCAI 2024; [3] Lowes, STACOM 2024; [4] Banus, ICASSP 2023; [5] Sørensen, MICCAI 2024; [6] Kazerouni, WACV 2024; [7] Bai, JCMR 2018; [8] Wolterink, MIDL 2022
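The rebuttal states that boundary points are found via k-NN label differences and guide displacement learning, while non-boundary points are regularised through an L2 penalty. A brute-force sketch of that rule, under assumptions: a point is flagged as boundary when any of its k nearest neighbours carries a different label, and the L2 penalty is read as shrinking predicted displacements towards zero away from the boundary. The value of k, the weight `lam` and the loss form are hypothetical; the exact formulation is described in the paper (Section 3.3). For realistic point counts, a KD-tree such as `scipy.spatial.cKDTree` would replace the O(N²) distance matrix.

```python
import numpy as np


def knn_boundary_mask(points, labels, k=8):
    """True where any of a point's k nearest neighbours has a different label."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)  # (N, N) squared distances
    np.fill_diagonal(d2, np.inf)        # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]  # (N, k) neighbour indices
    return (labels[nn] != labels[:, None]).any(axis=1)


def displacement_loss(pred, target, boundary, lam=0.1):
    """One possible reading of the rebuttal: fit targets on boundary points, L2-penalise the rest."""
    fit = ((pred[boundary] - target[boundary]) ** 2).sum(axis=-1).mean()
    reg = (pred[~boundary] ** 2).sum(axis=-1).mean()
    return float(fit + lam * reg)


# Ten points on a line; labels switch from 0 to 1 between x = 4 and x = 5.
pts = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)
lab = (pts[:, 0] >= 5).astype(int)
mask = knn_boundary_mask(pts, lab, k=2)
print(np.flatnonzero(mask))  # [4 5]
```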
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper proposes NIMOSEF, a novel neural implicit representation framework for joint motion estimation and segmentation in 3D+t cardiac imaging. All reviewers recommend acceptance, citing the originality and technical soundness of the approach. The rebuttal effectively clarified concerns regarding methodological details, evaluation protocols, and baseline selection, particularly in the context of unpaired settings and 3D reconstruction.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper is well written and easy to follow. It proposes a unified framework that simultaneously performs segmentation, image reconstruction, and image registration, with the tasks serving as effective regularizers for one another. Including more qualitative or visual comparisons would further highlight the paper’s contributions. I have a few minor suggestions regarding the paper’s organization. In particular, rearranging Figures 3 and 4 could improve clarity. Additionally, to address the issue of pseudo-labels for segmentation, the authors might consider using the ACDC dataset, which provides ground-truth labels for cardiac MRI.