Abstract

This paper presents a novel approach, termed Temporal Latent Residual Network (TLRN), to predict a sequence of deformation fields in time-series image registration. The challenge of registering time-series images often lies in the occurrence of large motions, especially when images differ significantly from a reference (e.g., the start of a cardiac cycle compared to the peak stretching phase). To achieve accurate and robust registration results, we leverage the nature of motion continuity and exploit the temporal smoothness in consecutive image frames. Our proposed TLRN features a temporal residual network with residual blocks carefully designed in latent deformation spaces, which are parameterized by time-sequential initial velocity fields. We treat a sequence of residual blocks over time as a dynamic training system, where each block is designed to learn the residual function between desired deformation features and current input accumulated from previous time frames. We validate the effectiveness of TLRN on both synthetic data and real-world cine cardiac magnetic resonance (CMR) image videos. Our experimental results show that TLRN is able to achieve substantially improved registration accuracy compared to the state-of-the-art. Our code is publicly available at https://github.com/nellie689/TLRN.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3610_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/nellie689/TLRN

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wu_TLRN_MICCAI2024,
        author = { Wu, Nian and Xing, Jiarui and Zhang, Miaomiao},
        title = { { TLRN: Temporal Latent Residual Networks For Large Deformation Image Registration } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces the Temporal Latent Residual Network (TLRN), a method for time-series image registration. TLRN handles large motions over time by leveraging motion continuity and temporal smoothness in consecutive frames. Using a temporal residual network in a latent deformation space, TLRN builds on cumulative deformation features from previous frames. Validated on synthetic and real-world cine cardiac MRI sequences, the authors demonstrated that TLRN outperforms current methods in registration accuracy and robustness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Deep learning-based image registration methods often produce non-smooth deformations, e.g., VoxelMorph, due to the lack of regularization in their models. The TLRN method leverages motion continuity and can produce much smoother transformations (see results in Fig. 2).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The evaluation of the proposed registration method is not comprehensive, with evaluation on only 2D time-series images. Additionally, the 2D short-axis MRI slices acquired at different phases do not necessarily have spatial correspondences as the heart was moving in 3D. I would suggest the authors evaluate their method on a more well-defined time-series image registration problem, e.g., 4D CT image registration.

    2. Some notations in the paper were not well-defined, making the paper a bit hard to follow. For example, in Eq. (2), \phi_0 should be the identity map, or \phi_0(x) = x. Another example, in Eq. (4), what does \phi_i^{\tau} mean? The authors need to carefully proofread the notations used in this paper.

    3. What is the number of time steps used to compute \phi_1 for each pair of images? The deformation between phase 0 and phase 1 is less complicated than the deformation between phase 0 and phase 5; using the stationary velocity field formulation might not be the best approach. The authors could instead learn a time-varying velocity field from phase 0 to phase N, i.e., simultaneously registering all phases by learning only one time-varying velocity field.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Evaluate their method on 3D time-series images.
    2. Report mean landmark error.
    3. Compute the percentage of negative Jacobian values.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My decision is mainly based on the following two concerns/questions: (1) The evaluation is not comprehensive, (2) How does learning many different stationary velocity fields compare to learning only one time-varying velocity field?

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a method to register a sequence of images to a first image simultaneously, and demonstrates that the registration performance particularly on late images in the sequence benefits from this whole-sequence-at-a-time approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method performs well on both synthetic and real data, and is explained with sufficient clarity that a reader could implement the overall approach in their own research. This is of course the goal of a research paper.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The approach is very similar to VoxelMorph, and the improvement in performance over VoxelMorph, while dramatic on synthetic data, is marginal on cardiac data. Since this paper uses the same scaling and squaring approach as diffeomorphic VoxelMorph, it would not have hurt to compare to this as well. Any measurement of transform smoothness, such as the percentage of voxels with a negative Jacobian determinant, is missing.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The details of the encoder and decoder are missing from the paper. The language “Similar to [2]” (top of page 4) suggests but does not confirm that the encoder and decoder share the exact architecture and parameter counts with VoxelMorph, allowing direct comparison. If this is the case, it is a good selling point of the paper, as it would indicate that the change in performance comes from the TLRN layer. Could the authors confirm the feature counts and architectures of the encoder and decoder? My assessment of the reproducibility is contingent on this.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I would like to take a moment to commend the writing style of the authors; this paper is a joy to read.

    A performance comparison with an existing time series registration method (perhaps one of [4, 16, 19, 23, 24]) would have been appreciated.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the TLRN layer is a useful contribution to future work on sequence registration, and it is presented clearly and stylishly.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors proposed a novel registration network TLRN for cardiac image registration. It uses residual blocks carefully designed in the latent deformation space, parameterized by velocity fields, to effectively utilize cumulative deformation (motion) features learned from previous frames.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    It pioneers the explicit integration of motion continuity and exploits the temporal smoothness inherent in consecutive image frames in the latent deformation space. They evaluated their method on two datasets and demonstrated that TLRN can enhance accuracy and robustness, particularly for tasks with large deformations.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The experiments focus only on the left ventricle, which is a simple target and cannot sufficiently demonstrate the performance of registration networks. For network training, 20000 epochs seems quite large for a registration network, but no further description of this parameter is provided. Do all the registration networks in this manuscript use the same settings? The authors claim that “TLRN enhances accuracy, smoothness”. However, only MSE, Dice, and HD are used as evaluation metrics, and these measure similarity. Metrics that evaluate the smoothness of the obtained deformation fields, such as the ratio of negative Jacobian determinants, are missing. How are the baseline methods trained: with group registration or pairwise registration? Correspondingly, an ablation study demonstrating the contribution of the proposed blocks in the network is missing.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Insufficient experiments: it would be helpful to add metrics that evaluate the smoothness of the obtained deformation fields, since smoothness is highlighted in the manuscript. It would also be helpful to add an ablation study that compares the proposed method with general pairwise registration methods.

    Lack of clarity: it would be helpful to provide details about how the number of training epochs was chosen and how the baseline methods were built.

    For future work, I would recommend:
    1. Extend to the complete heart, e.g., biventricular cardiac MR images.
    2. Extend to other anatomical structures, such as abdominal and lung images.
    3. Explore potential applications of the proposed method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The storyline is complete and the results look reasonable, but the registration experiments on the left ventricle look a little simple and lack the ablation study and some details.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all reviewers for their valuable comments and suggestions. Our responses to the main questions are summarized below.

[R1 & R3 & R4]

  1. Quantitative evaluation of smoothness. Thanks for the comments. As suggested, we will include quantitative results (i.e., the mean and standard deviation) of the percentage of negative Jacobian determinants, together with a detailed discussion, in a revised manuscript. For example, for synthetic data, we have TLRN: 0.175% (0.373%), LM: 0.474% (0.969%), SVF-R2Net: 1.607% (1.917%), VM: 0.398% (0.591%), and TM: 0.176% (0.471%). This indicates that TLRN maintains the smoothness of the deformation fields better than the other baselines, while achieving the best registration accuracy.
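
As an illustration of the smoothness metric quoted above, the following is a minimal NumPy sketch of how the percentage of non-positive Jacobian determinants could be computed for a 2D deformation. It is not the authors' evaluation code. The function name `negative_jacobian_percentage`, the displacement layout `(2, H, W)` in (row, column) order, and the use of finite differences are all assumptions made for this example.

```python
import numpy as np


def negative_jacobian_percentage(disp):
    """Percentage of pixels where the deformation folds.

    disp: array of shape (2, H, W) holding a displacement field u, so that
          phi(x) = x + u(x). disp[0] is the displacement along rows (y),
          disp[1] the displacement along columns (x). (Assumed layout.)
    """
    # np.gradient on a 2D array returns derivatives along axis 0, then axis 1.
    duy_dy, duy_dx = np.gradient(disp[0])
    dux_dy, dux_dx = np.gradient(disp[1])

    # Jacobian of phi = identity + Du; its determinant in 2D:
    det = (1.0 + duy_dy) * (1.0 + dux_dx) - duy_dx * dux_dy

    # Non-positive determinants are counted as folding in this sketch.
    return 100.0 * float(np.mean(det <= 0))


if __name__ == "__main__":
    identity = np.zeros((2, 8, 8))  # zero displacement, no folding
    print(negative_jacobian_percentage(identity))
```

A value near zero indicates that the transformation is locally invertible almost everywhere; the mean and standard deviation reported in the rebuttal would then be taken over the test sequences.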

[R1 & R4]

  1. Extensions in future work. We appreciate this valuable suggestion. While we demonstrated the proposed model on cardiac left ventricular MRIs as initial results, we are actively working on extending TLRN to more complex scenarios, such as the suggested biventricular cardiac MRIs, abdominal and lung images, and 4D CT image registration. We will add this to the discussion in a revised manuscript.

[R1]

  1. Comparison to time-varying velocity fields. Thanks for the question. Please allow us to clarify that one of our baseline comparisons (LM) is based on time-varying velocity fields without considering the motion continuity between time frames. Our experimental results show that the proposed TLRN outperforms the time-varying velocity fields. Additionally, our TLRN layer can be easily incorporated with time-varying velocity fields. This could be an interesting direction to explore in our future work.
  2. Minor notation clarification. In Eq. (4), \phi_i^{\tau} means the stationary velocity that encodes the deformation from the reference image (the first time frame) to the \tau-th frame. We use 7 time steps to compute the transformation field for each pair, which is the default in many network architectures using SVF.
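
For readers unfamiliar with the SVF setting mentioned above, the sketch below shows scaling and squaring, the integration scheme that Review #2 notes is shared with diffeomorphic VoxelMorph. It is a hedged NumPy illustration, not the TLRN implementation. It assumes that the "7 time steps" refer to the number of squaring steps, that fields are stored as `(2, H, W)` displacements in (row, column) order, and that samples outside the image are clamped to the border. The helper names `bilinear_sample` and `integrate_svf` are hypothetical.

```python
import numpy as np


def bilinear_sample(field, coords):
    """Sample field (C, H, W) at real-valued coords (2, H, W) = (row, col).

    Coordinates are clamped to the image border (an assumption of this sketch).
    """
    _, H, W = field.shape
    r = np.clip(coords[0], 0, H - 1)
    c = np.clip(coords[1], 0, W - 1)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, H - 1)
    c1 = np.minimum(c0 + 1, W - 1)
    wr, wc = r - r0, c - c0
    top = field[:, r0, c0] * (1 - wc) + field[:, r0, c1] * wc
    bottom = field[:, r1, c0] * (1 - wc) + field[:, r1, c1] * wc
    return top * (1 - wr) + bottom * wr


def integrate_svf(velocity, num_steps=7):
    """Approximate the displacement of phi = exp(v) by scaling and squaring."""
    _, H, W = velocity.shape
    grid = np.stack(
        np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    ).astype(float)

    # Scaling: start from a small displacement v / 2^N.
    disp = velocity / (2 ** num_steps)

    # Squaring: compose the map with itself N times,
    # u <- u + u(x + u(x)), which doubles the integration time each step.
    for _ in range(num_steps):
        disp = disp + bilinear_sample(disp, grid + disp)
    return disp


if __name__ == "__main__":
    v = np.zeros((2, 16, 16))
    v[0] = 0.5  # constant velocity along rows
    print(integrate_svf(v)[0, 8, 8])
```

For a spatially constant velocity the result is a pure translation by that velocity, which makes a convenient sanity check. With `num_steps=7` the initial displacement is the velocity divided by 128.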

[R3]

  1. TLRN network architecture vs. VM. Thanks for the questions. Please let us clarify that in all our experiments, both TLRN and VM use U-Net as the architecture backbone of the encoder and decoder, with the same number of convolutional layers. Related network parameters, including the regularity weights on velocity fields and the number of integration time steps, are also set to be the same as in VM. The major difference between TLRN and VM is the proposed TLRN layer. We commit to releasing all code upon acceptance of this paper.
  2. Improvement on cardiac data. Thanks for the comments. We agree with R3 that the improvement on the 2D synthetic dataset is more significant than on the cardiac data, as the deformations are larger in the synthetic sequence. The benefits of our TLRN are more evident when large deformations exist. We are happy to explore the performance on more real-world datasets with larger deformations in future work.

[R4]

  1. Training strategy. Thanks for the questions. First, please let us clarify that our baselines use pairwise image registration. Second, we apply the same number of training epochs for all comparison methods. We carefully track the training loss and the parameters to make sure each model achieves its best performance. Our model TLRN converges earlier, at around 7000 epochs.
  2. Ablation study. Our ablation study comparing with VM (i.e., a registration model without the proposed TLRN layer) was performed and is shown on both the synthetic dataset and real cardiac MRIs. Please kindly refer to our response to R3 under “TLRN network architecture vs. VM”.

All minor comments and suggestions will be carefully addressed in our revised manuscript, and the code will be published if the paper is accepted.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal has addressed most of the issues raised by the reviewers. Please update the paper accordingly. I recommend accepting it.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This is a borderline paper for me, but it has enough insight to be presented at MICCAI with a nice discussion. Unfortunately, there was little engagement from the reviewers after the rebuttal, making it challenging to appreciate the value of the revision. I do encourage the authors to take the reviews into consideration for the C.R., as I think they will be important for the actual impact of the paper.



