Abstract

Regression on medical image sequences can capture temporal image pattern changes and predict images at missing or future time points. However, existing geodesic regression methods limit their regression performance by a strong underlying assumption of linear dynamics, while diffusion-based methods have high computational costs and lack constraints to preserve image topology. In this paper, we propose a new optimization-based framework called NODER, which leverages neural ordinary differential equations to capture complex underlying dynamics and reduces the high computational cost of handling high-dimensional image volumes by working in a latent space. We compare NODER with two recent regression methods, and the experimental results on the ADNI and ACDC datasets demonstrate that our method achieves state-of-the-art (SOTA) performance in 3D image regression. Our model needs only a couple of images in a sequence for prediction, which is practical, especially in clinical situations where only extremely limited image time series are available for analysis.
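
The latent-space idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed-step RK4 integrator, the tiny random-weight vector field `latent_field`, and the 4-dimensional latent state are all hypothetical stand-ins chosen for illustration. The point is only that the ODE is integrated over a small latent vector rather than over a full image volume, and that states at arbitrary time points fall out of the same integration.

```python
import numpy as np

def rk4_integrate(f, z0, t0, t1, n_steps=100):
    """Integrate dz/dt = f(z, t) from t0 to t1 with fixed-step RK4."""
    z = np.asarray(z0, dtype=float)
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = f(z, t)
        k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(z + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(z + h * k3, t + h)
        z = z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return z

# Hypothetical latent dynamics: a two-layer network with fixed random
# weights. A trained network would take its place in a real system.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(16, 4))
W2 = rng.normal(scale=0.1, size=(4, 16))

def latent_field(z, t):
    """Time derivative of the latent state (illustrative only)."""
    return W2 @ np.tanh(W1 @ z)

z0 = np.ones(4)                                       # latent code at time 0
z_mid = rk4_integrate(latent_field, z0, 0.0, 0.5)     # state at a missing time point
z_future = rk4_integrate(latent_field, z0, 0.0, 1.0)  # state at a future time point
```

In a pipeline of the kind the abstract describes, an encoder would supply the initial latent state and a decoder would map each integrated state back to image space; both are omitted here.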

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1899_paper.pdf

SharedIt Link: https://rdcu.be/dV1T1

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72069-7_63

Supplementary Material: N/A

Link to the Code Repository

https://github.com/ZedKing12138/NODER-pytorch

Link to the Dataset(s)

https://adni.loni.usc.edu/

https://humanheart-project.creatis.insa-lyon.fr/database/api/v1/folder/637218e573e9f0047faa00fc/download

BibTex

@InProceedings{Bai_NODER_MICCAI2024,
        author = { Bai, Hao and Hong, Yi},
        title = { { NODER: Image Sequence Regression Based on Neural Ordinary Differential Equations } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        pages = {673--682}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper addresses an important research problem, predictive longitudinal image generation, using Neural ODEs. The idea and method design are novel to some extent; however, the paper still lacks clarity and has several major flaws.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This work showcases a level of innovation for several reasons:

i) The utilization of a neural ordinary differential equation (ODE) entirely parameterized in a low-dimensional latent space is a notable strategy. This approach enhances the efficiency of ODE computation while potentially improving accuracy by focusing on critical temporal image features governed by the latent space.

ii) The methodology design is characterized by a clear definition of loss functions and network flow, contributing to the method’s solidity and facilitating its reproducibility.

iii) The experimental setup demonstrates robustness, with comprehensive validation across different datasets. By presenting comparable results for accuracy and computational time, the study effectively substantiates the method’s efficacy and efficiency.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

However, there are several concerns about the methodology design and experimental setup.

i) Instead of employing a complicated neural ODE for longitudinal image generation, could spatial-temporal transformer approaches with diffeomorphic constraints be considered? This alternative approach might offer advantages in terms of simplicity and interpretability.

ii) While GAN-based methods may be considered somewhat outdated, combining GAN approaches with time embedding could still present a solid alternative for comparison.

iii) There appears to be a challenge with the proposed method in generating long-range images as the number of time points increases. The observed increase in image differences over time, as depicted in both Figures 3 and 4, suggests a degradation in image quality, particularly noticeable in the change in brain contrast. Incorporating a discriminator to ensure data realism could be crucial. Despite generating images with reduced intensity differences compared to ground truth, there remains skepticism regarding whether the proposed method maintains accurate anatomical information with acceptable image quality.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

None.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

None.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Reject — could be rejected, dependent on rebuttal (3)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Methodology design and experimental setup.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The paper presents a method of performing regression on medical images using neural ordinary differential equations. The method is demonstrated on two public datasets, one of brain MRIs and the other of cardiac MRIs, producing promising numerical results compared with the SOTA methods used for comparison.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper proposes an interesting and, as far as I am aware, novel solution for performing regression on medical images. In general, the paper is well written, easy to follow, and the proposed method seems sensible. The proposed method is compared to two other SOTA methods and produces the best numerical results on two open datasets.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The novelties compared to previous works, especially NODEO, are not clear. Some key parts of the method/theory are unclear or confusing to me (see detailed comments in 10). The interpretation and discussion of the evaluation results are quite limited.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Do you have any additional comments regarding the paper’s reproducibility?

no
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

If this paper is expanded to a full journal paper in the future it would be good to further discuss how the proposed method compares to other previous methods to help make the novelties of this work clearer. This is especially true for the NODEO method – from a brief look at this paper it appears to use a similar approach, and while it focusses on pairwise registrations, it does include some preliminary work and discussion on applying the method to multiple images. And while the novelties compared to the other methods mentioned are clearer, it would still be useful and informative to further explain these methods and discuss the key differences between them and the proposed method. This would help make the advantages (and potential disadvantages) of your proposed method compared to these alternative approaches clearer. E.g. it is not currently clear to me why the FPSGR method assumes linear temporal changes, and exactly how this limits the results it can generate (I know you give the ref, but I do not have the time to read all the refs in all the miccai papers I am reviewing!).

I may have misunderstood the notation or some of the underlying theory, but it seems to me that Eqs 1 and 2 define the voxel cloud q0 in the space of If or Ik, e.g. you could add (q0) after If or Ik. The transform psi then transforms the voxel cloud from If or Ik into Im or I0, where it is used to resample Im or I0 to generate the warped/resampled image that is compared to If or Ik. However, the text states that q0 represents the initial voxel cloud without any deformation, which implies it represents the voxels at time 0, i.e. the voxel locations in I0. The text also seems to describe modelling the motion of the voxel cloud over time.

Presuming that ‘pull interpolation’ is used as is standard when resampling images, then the spatial transformation maps voxels from the fixed image into the moving image (essentially telling us where they came from in the moving image) – which is what the Eqs seem to be saying. But then you are not modelling the motion of discrete points in the anatomy over time, but rather you are modelling how different points in the anatomy are moving through a fixed point in space over time (i.e. you are using an Eulerian rather than Lagrangian frame of reference). Maybe you are not using pull-interpolation but are using push-interpolation with some form of scattered data interpolation, or you are inverting the transformations. Or you are actually modelling in the Eulerian frame of reference (which I think should work fine but is harder to understand and justify the use of the ODEs). And/or the images should actually be warped the other way, i.e. warp each Ik to I0. Or I have misunderstood something. Whatever the answer, this issue should be clarified in a revised or expanded paper.

It would be useful to provide more details on the training/fitting and inference of your method and the comparison methods to clarify how this works and highlight the similarities and differences between the approaches. If I have followed correctly, your proposed method and FPSGR both require the model to be fitted separately to the data from each individual patient, whereas SADM tries to learn from a large population of data during the training and uses the learned model to predict the results for an individual at inference time. It is not clear to me if the inference times for your method and FPSGR refer to the time to fit the model, or the time to generate a new result (image(s)) after it has been fitted. For these models, I would say the fitting time is more important, since this has to be repeated for each individual patient, whereas for SADM it is the inference time that is more important, since it only needs training once and can then be applied to new unseen individuals. But it is best to report both training/fitting and inference times for all methods for completeness (and to make it clear which time is which).

Related to the above comment it would be useful to clarify how the pre-training of the autoencoder is performed.

The evaluation performed is based on the assumption that the main goal of image regression is to generate unseen images, but I think this is usually not the clinical goal, but rather an easy way to evaluate and compare different methods. While I think it is useful to perform such an evaluation, precisely because it is relatively easy and allows comparison between different methods, I think more application focussed evaluations would really help to demonstrate the clinical benefit of the proposed method over others, although this would likely mean different evaluations for different datasets/applications.

Related to this, your proposed method and FPSGR produce spatial transformations that can be used to define correspondence between the images and measure changes between them. However, SADM just produces a sequence of images and does not directly provide correspondence or measures of changes between the images. Whether this is a limitation or not for any of the methods will depend on the application they are being used for.

Finally, while some visual results are provided, they are very small, which is common for miccai papers, but still makes evaluating them very challenging. There is also only one example result from each dataset, and no visual results from the comparison methods. I suggest using supplementary material to provide more and larger results, and as you are showing temporal sequences, including movies of the results as well as / instead of images would be useful.

But more importantly, there is no interpretation or discussion of the visual results at all. To me, the difference images show that the proposed method produces fewer errors than assuming nothing changes (i.e. comparing the images to I0) – which is good! – but there are still noticeable errors in some parts of the images. There also appears to be a strange intensity drift in the brain images over time (the background gets brighter). And when you zoom in on the cardiac images you can see they contain clear and obvious artefacts, especially later in the sequence. I strongly suggest discussing the visual results in an expanded paper, especially any apparent artefacts or other problems with them, so that it does not seem like you are trying to ignore/‘hide’ any such issues.
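
The pull-interpolation convention discussed in the comments above can be made concrete with a small 1-D sketch. It is only an illustration of the general resampling convention, not of the paper's equations or code; the function name `warp_pull`, the toy signal, and the constant displacement are assumptions made for this example. Each output grid point looks up where its value came from in the moving signal, so a displacement of -1 at every point moves the visible content one sample to the right.

```python
import numpy as np

def warp_pull(moving, displacement):
    """Backward (pull) warp of a 1-D signal.

    For each grid point x of the output (fixed) grid, sample the moving
    signal at x + displacement[x] with linear interpolation. Samples
    requested outside the grid are clamped to the edge values by np.interp.
    """
    x = np.arange(moving.size, dtype=float)
    return np.interp(x + displacement, x, moving)

moving = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])  # bump at index 2
u = np.full(moving.size, -1.0)  # every output point reads one sample to its left
warped = warp_pull(moving, u)
# warped -> [0, 0, 0, 1, 0, 0]: the bump now sits at index 3
```

The displacement here is attached to fixed grid locations and describes what passes through them, which is the Eulerian reading raised in the review. Tracking the bump itself as it moves from index 2 to index 3 would be the Lagrangian reading and corresponds to a push (forward) warp.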
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall I think this is a good paper presenting some interesting work. I do have some criticisms of it in its current form (as explained above), but most of these could be easily addressed for a revised and/or expanded paper.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The paper proposes to use Neural ODE to address the challenging problem of temporal image regression. To enable the applicability of Neural ODE, the Neural ODE works in a lower dimensional latent space using a pre-trained Autoencoder.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This application of Neural ODE is a natural framework for the regression problem. Using an Autoencoder for dimensionality reduction is an excellent idea.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

This is an excellent paper with no major weakness.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

The authors should consider using a Variational Autoencoder instead of a standard autoencoder.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Strong Accept — must be accepted due to excellence (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is an excellent application of Neural ODE for regression.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

N/A

Meta-Review

Meta-review not available, early accepted paper.


NODER: Image Sequence Regression Based on Neural Ordinary Differential Equations

Author(s):