Abstract

This study introduces On-the-Fly Guidance (OFG), a novel training framework for enhancing existing learning-based image registration models that addresses the limitations of weakly-supervised and unsupervised methods: weakly-supervised methods struggle with the scarcity of labeled data, while unsupervised methods depend directly on image similarity metrics for accuracy. Our method trains registration models in a supervised fashion without requiring any labeled data. OFG generates pseudo-ground truth during training by refining the model's deformation predictions with a differentiable optimizer, enabling direct supervised learning. It optimizes deformation predictions efficiently, improving the performance of registration models without sacrificing inference speed. Tested across several benchmark datasets and leading models, OFG significantly enhances performance, providing a plug-and-play solution for training learning-based registration models. Code available at: https://github.com/cilix-ai/on-the-fly-guidance
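To make the training procedure concrete, below is a minimal PyTorch-style sketch of one OFG training step as the abstract describes it: predict a deformation, refine a detached copy of it with a differentiable optimizer into a pseudo-ground truth, then supervise the prediction directly. Every name here (`reg_model`, `warp`, the stand-in losses, step count, learning rate) is an illustrative assumption, not the authors' implementation; see the repository above for the actual code.

```python
import torch
import torch.nn.functional as F

def similarity_loss(fixed, warped):
    # Stand-in for an image similarity metric (the paper likely uses NCC or
    # local NCC; plain MSE is used here only to keep the sketch short).
    return F.mse_loss(fixed, warped)

def smoothness_loss(flow):
    # First-order smoothness regularizer on a (B, 3, D, H, W) displacement field.
    dx = (flow[:, :, 1:, :, :] - flow[:, :, :-1, :, :]).pow(2).mean()
    dy = (flow[:, :, :, 1:, :] - flow[:, :, :, :-1, :]).pow(2).mean()
    dz = (flow[:, :, :, :, 1:] - flow[:, :, :, :, :-1]).pow(2).mean()
    return dx + dy + dz

def ofg_training_step(reg_model, warp, fixed, moving, opt_steps=5, lam=1.0):
    """One OFG step: predict a flow, refine a detached copy into a pseudo-label,
    then supervise the prediction with a plain L2 loss."""
    flow_pred = reg_model(fixed, moving)

    # On-the-fly pseudo-ground truth: a few Adam steps on the flow itself.
    flow_opt = flow_pred.detach().clone().requires_grad_(True)
    inner = torch.optim.Adam([flow_opt], lr=0.1)
    for _ in range(opt_steps):
        inner.zero_grad()
        loss = similarity_loss(fixed, warp(moving, flow_opt)) \
               + lam * smoothness_loss(flow_opt)
        loss.backward()
        inner.step()

    # Supervised loss against the refined field; gradients reach only the model.
    return F.mse_loss(flow_pred, flow_opt.detach())
```

The outer training loop would backpropagate the returned loss through `reg_model` as usual; the inner optimizer never updates the model, which is why inference speed is unaffected.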

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0519_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0519_supp.pdf

Link to the Code Repository

https://github.com/cilix-ai/on-the-fly-guidance

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Xin_OntheFly_MICCAI2024,
        author = { Xin, Yuelin and Chen, Yicheng and Ji, Shengxiang and Han, Kun and Xie, Xiaohui},
        title = { { On-the-Fly Guidance Training for Medical Image Registration } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes “on-the-fly” guidance to improve medical image registration by fine-tuning the deep registration network with instance optimization. The authors claim that this approach improves registration quality and evaluate it using three brain datasets and one abdomen dataset. The method can be considered a self-supervised way to slightly improve registration quality without increasing registration time during inference.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) An interesting concept that may be useful for slightly improving the quality of learning-based registration methods. 2) Easy to follow and understandable.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Limited evaluation that should be extended to answer questions related to: influence on generalizability, comparison to direct instance optimization, and influence of the regularization coefficient.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The concept presented in the paper is relatively easy and should be reproducible. Nevertheless, a link to the source code would be beneficial.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In general, the paper is easy to follow and understand. However, I miss several very important experiments/evaluations:

    1) Comparison of the proposed OFG method to direct instance optimization applied after the learning-based registration, both in terms of registration quality and time.
    2) An ablation presenting the influence of the regularization coefficient on OFG performance. It is crucial to verify the real impact of OFG on the smoothness of the deformation field. The paper presents significant improvement in terms of the negative Jacobian; however, it is unclear whether the improvement comes from the OFG training or from an increased regularization coefficient.
    3) Evaluation using datasets other than IXI, OASIS, and LPBA40. These datasets are relatively “easy” and homogeneous; comparison on more heterogeneous datasets, as well as inter-dataset evaluation (training on one dataset, evaluating on the other), would be strongly beneficial.
    4) Verification of the statistical significance (for DSC) between the baseline methods and the methods enhanced by OFG.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a nice concept that may be interesting to a wider audience. Nevertheless, I miss important comparisons and evaluations that would prove the usefulness of the proposed method. The current evaluation does not answer the most important questions related to the proposed contribution.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The rebuttal shows new results (why?); however, it does not address the main concerns and the motivations behind particular design choices and performed experiments. Therefore, it does not change my final score.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a new training framework for image registration that improves on existing deep learning registration models without relying on labeled data. More specifically, the approach adds an optimization-derived deformation field as pseudo-ground truth for supervision. Improved registration results are shown for brain MR registration using a number of different datasets, such as OASIS, IXI, and LPBA40.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed framework seems novel and interesting. It consists of a prediction stage, followed by an optimization stage. The key novelty is in the optimization stage, which uses a differentiable optimizer to iteratively refine the deformation field. Subsequently, the optimized deformation field is used as the pseudo-label to provide supervision for the current predicted deformation during training, forming a feedback loop between the prediction model and the optimizer module. The fact that different, existing learning-based registration models can be used as the prediction model is a strength.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Overall, the evaluation is not very convincing:

    • The improvements through the proposed framework are rather small.
    • Also, the baseline performance reported on the CT images is very low (DSC of 0.312).
    • The evaluation also includes results for the number of voxels with negative Jacobians. The results show an improvement here, but it is not clear where this improvement comes from.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is reasonably clear and the proposed method is novel. However, the rationale for adding the optimization stage during training is not very clear. Since this optimization is also intensity-driven, it is not clear why the same effect could not also have been achieved by purely intensity-based, unsupervised training. As mentioned above, the evaluation is also rather weak and could be improved.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has some strengths (interesting methodology, plug-and-play approach that can be combined with existing registration tools) but also several weaknesses (in particular the evaluation).

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Thank you for the clarifications. My assessment of the paper has not changed, and I am leaning towards acceptance of the paper.



Review #3

  • Please describe the contribution of the paper

    This paper proposes an On-the-Fly Guidance (OFG) training mechanism for learning-based deformable image registration. During training, starting from the prediction of the registration network, they move a step further towards the optimal solution with instance-specific optimization (with only the dense deformation field as optimizable parameters), and use the refined deformation field as pseudo-ground truth to lead the network to predict better results. The OFG mechanism improves the registration results of three baseline networks in inter-patient brain MRI and inter-patient abdominal CT registration.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method 1) improves existing registration models with self-supervised learning (automatic pseudo-label generation without extra annotation), 2) is a novel training workflow that can be easily adapted to existing registration networks, and 3) is evaluated on inter-patient brain MRI (3 datasets) and abdominal CT registration.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The statement of the difference between OFG and Cyclical Self-training (CST) is somewhat vague. In essence, both methods use a deformation field that is a step further towards the optimum as pseudo-ground truth to improve the registration model, except that the pseudo-ground truth of OFG is updated in each epoch (on-the-fly) while CST uses the prediction of the model from the previous stage as pseudo-ground truth.
    2) Is there a warm start of the trained model, i.e., training the model with image similarity and regularization for a few epochs? It is well known that a registration model at the early training stage is not able to produce a reasonable deformation field. Therefore, during the OFG optimization step, it has a worse initial flow field compared to one produced by a model at a stable training stage. This means it will need more optimization steps (as shown in Fig. 2 in Supp) to reach an optimal flow field. Besides, the training loss depends only on the pseudo-ground truth from the OFG optimization, which means the quality of the pseudo-ground truth is important for network training. Taking these into account, I think this will influence the convergence speed of the network training (not the OFG optimization).
    2.1) A side question: at which training stage/epochs is Fig. 1 in the supplementary material generated? After OFG training, will it need fewer optimization steps for convergence, i.e., faster convergence?
    3) Is the OFG loss computed only in the foreground (e.g., within the brain) or over the full image? From my experience, TransMorph will sometimes generate a non-zero deformation field in the background area. Minor: what is 1/n in Eq. 3?
    4) The differentiable deformation optimizer in Sec. 3.3 is just simple Adam/SGD on the dense deformation field; I would not count it as a “proposed” optimizer.
    5) What exactly is the “Network-based optimizer”? Is it just a pretrained network, or is the deformation field optimized with the complete network (similar to neural instance optimization [1])? In Tab. 5 the Opt. Time of VoxelMorph-5 is 5.8 (seconds? missing unit) but in Tab. 4 the Network-based Opt. Time is 13.6. Why is there such a difference?
    5.1) What is the “Optimized Self-training” in Fig. 3? Are the extra optimization steps the same as OFG, or a neural instance optimization?
    5.2) What is the DSC shown in Tab. 4 and 5? Is it on the complete dataset, and from the trained model or from the extra optimization? Why does Tab. 4 focus on the initial 30 epochs?
    6) The “Opt weight” and “Threshold” in the second and third plots of Fig. 4 lack definitions.
    7) For Abdomen CT, VoxelMorph and TransMorph use different optimizers; does this mean different network architectures need different settings of the image similarity loss and the number of optimization steps? This might contradict the claim that OFG is easy to plug-and-play.

    Continued in the comments section below.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The baseline methods are open-sourced. The evaluation datasets are open-sourced. The OFG optimization is easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Main weaknesses, continued:

    8) The extra time required due to the optimization in each training epoch is missing. What is the increased training time of OFG?
    9) I would be interested to see one extra result in Tab. 1: training with OFG plus extra optimization. I see this as an upper bound of the possible registration results, and I want to see if training with OFG actually closes the gap between the model performance and the upper bound.
    10) There is a lack of discussion on why OFG contributes so much to decreasing the non-positive Jacobian determinant. From Fig. 1 in the supplementary material, the Jacobian increases after optimization, which means the pseudo-ground truth deformation field is less smooth.

    [1] Mok TC, Li Z, Xia Y, Yao J, Zhang L, Zhou J, Lu L. Deformable medical image registration under distribution shifts with neural instance optimization.

    The following comments correspond to the previous weakness section:

    2) Elaborate more on the details of the start of training.
    3) Elaborate on whether the OFG loss is computed within a region of interest.
    5) Explain the concepts of “Network-based optimizer” and “Optimized Self-training”. Explain the Opt. Time in Tab. 4 and 5. Explain more details regarding the experiments of Tab. 4 and 5.
    6) Explain the concepts of “Opt weight” and “Threshold”.
    7) Include suggestions on how to pick the correct optimizer with the proper number of optimization steps when other researchers want to apply OFG to their architectures or other tasks (multi-modal).
    8) Report the increased amount of training time of OFG.
    9) Include the “upper bound” of the refined results with extra optimization after training w/o OFG (to see if the gap is closed).
    10) Provide quantitative results to show that extra optimization always refines the prediction. From Fig. 1, the accuracy and smoothness decrease after a certain number of optimization steps. It would be nice to show that the refined deformation fields are actually “better” than the model’s initial prediction.
    11) The pseudo-ground truth could become “real ground truth” if the inputs are the fixed and moved images. Experiments regarding this would also be interesting, but this might be out of the scope of this paper.
    12) Include relevant literature on self-supervised training:

    [2] Pan J, Rueckert D, Küstner T, Hammernik K. Efficient image registration network for non-rigid cardiac motion estimation. In Machine Learning for Medical Image Reconstruction: 4th International Workshop, MLMIR 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, October 1, 2021, Proceedings 4 2021 (pp. 14-24). Springer International Publishing.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper brings insight into the combination of self-supervised training of registration networks and instance optimization. It’s an interesting topic. However, the authors’ focus seems to be diverging. I would recommend the authors focus on the more interesting part of the proposed method (e.g., the deformation field with far fewer non-positive Jacobian determinants) instead of distinguishing themselves from previous work. Besides, investigating whether OFG can squeeze out the capacity of a learning-based model by lifting its upper bound would also be interesting. I think this is a good paper with merits outweighing weaknesses; therefore, I give my decision as “weak accept”. I look forward to the authors’ feedback regarding my concerns.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Thanks for the reply. Some of my confusion has been addressed, and the additional numbers are informative. I would still recommend the authors investigate the reason why OFG keeps a similar Dice while having a much smoother field. Besides, I think it makes more sense to have the OFG loss in the foreground area. Also, I would suggest the authors clarify the missing details mentioned in the feedback in the revised manuscript. Overall, the proposed OFG is a plug-and-play method for any registration network, and with extra optimization during training, it improves smoothness. I would recommend “Accept”.




Author Feedback

1.[R123] Brief Summary of Contribution (1) We proposed a simple but effective OFG framework to boost the performance of existing unsupervised learning-based registration methods. (2) During training, OFG takes the predicted deformation field from the current training iteration and optimizes it into a pseudo-ground truth, to explore the benefit of supervised learning in the image registration task. The optimizer works as a “teacher” that provides direct, evolving guidance to baseline registration models. (3) The overhead is limited: ~10% training-time overhead and ~0.7 GB memory overhead. Since OFG is applied in training only, inference speed is preserved.

2.[R123] Deformation Field Smoothness Previous methods directly apply smoothness regularization to the predicted deformation field to optimize model parameters. In contrast, OFG optimizes the model using only the L2 loss shown in Eq. (3) of the main submission. The regularization is applied in the deformation optimization, as in Eq. (4), to provide smoother pseudo-labels for direct supervision. The smoother labels provided by OFG make the model actively fit smoother deformation fields during training.
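The two equations are not reproduced on this page; based on the rebuttal's description, plausible forms would be as follows (an assumed reconstruction, not the paper's exact notation):

```latex
% Eq. (3), assumed form: supervised L2 loss against the pseudo-label,
% with 1/n averaging over the n voxels of the field (cf. R3's question)
\mathcal{L}_{\mathrm{sup}} \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \bigl\| \phi_{\mathrm{pred}}(x_i) - \phi_{\mathrm{opt}}(x_i) \bigr\|_2^2

% Eq. (4), assumed form: the optimizer refines the field under similarity
% plus smoothness regularization, whose weight is the "Reg" swept below
\phi_{\mathrm{opt}} \;=\; \arg\min_{\phi} \;
  \mathcal{L}_{\mathrm{sim}}\bigl(I_f,\, I_m \circ \phi\bigr)
  \;+\; \lambda\, \mathcal{L}_{\mathrm{reg}}(\phi)
```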

Different Reg weights for TransMorph on LPBA:

Reg            | DSC   | Jacob.
0.02 (default) | 0.678 | 0.438
1.0            | 0.678 | 0.418

Different Reg weights in Eq. (4) for OFG:

Reg | DSC   | Jacob.
0.1 | 0.654 | 0.896
0.5 | 0.672 | 0.385
1.0 | 0.684 | 0.150
2.0 | 0.685 | 0.028

As shown, the smoothness regularization delivered through smoother direct supervision yields a better Jacob., along with a DSC improvement.

3.[R13] OFG vs Direct Instance Opt We compare OFG with “direct instance optimization applied after learning-based registration” (R1), using 10 and 20 extra optimization steps. As shown in the table, OFG achieves a similar DSC improvement but a better Jacob. Moreover, our method does not require instance optimization at inference, so it is faster (R1) and close to the upper bound (R3).

TransMorph on LPBA:

Config      | DSC   | Jacob.
base        | 0.678 | 0.438
base+10 opt | 0.684 | 0.518
base+20 opt | 0.686 | 0.631
OFG         | 0.684 | 0.150
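For context, the “base+N opt” rows correspond to direct instance optimization applied after the trained model's prediction. A hedged sketch of that baseline is below; it reuses the hypothetical `warp`, `similarity_loss`, and `smoothness_loss` helpers from the sketch after the abstract, and the step count and learning rate are assumptions.

```python
import torch

def instance_optimize(flow_init, warp, fixed, moving, steps=20, lam=1.0, lr=0.1):
    """Test-time refinement of one predicted field; the model is not updated,
    so this adds per-pair optimization cost at inference (unlike OFG)."""
    flow = flow_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([flow], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = similarity_loss(fixed, warp(moving, flow)) \
               + lam * smoothness_loss(flow)
        loss.backward()
        opt.step()
    return flow.detach()
```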

4.[R12] Extra Evaluation Besides brain MRI, we also provided evaluation results on Abdomen CT, as shown in Tab. 2 of the main submission. The DSC of affine registration on the abdomen is 0.236, and OFG improves TransMorph by 6.5% on DSC and 93.9% on Jacob.

4.1.[R1] Verification of Improvement The proposed OFG significantly improves the DSC on IXI for VM and ViT-V, and on Abdomen CT for TransMorph. Meanwhile, OFG significantly outperforms baseline methods regarding Jacob. More detail will be provided in the revised version.

5.[R3] Optimization Stability (1) As in response 3, extra optimization can refine the prediction from trained unsupervised methods. (2) We provide the optimization process for deformations from (a) a randomly initialized model and (b) a randomly generated field, as in Fig. 2 of the Supp., which demonstrates that the optimization is able to refine the registration even from random initialization. Therefore, it performs well at the starting stage of training. Warm starting is an interesting topic, and we will explore this direction further.

6.[R3] OFG Loss: The OFG loss is applied on the whole image, as in the baseline methods (e.g., TransMorph).

7.[R3] Network-based Optimizer: Apply a cascaded VoxelMorph to improve registration results iteratively. The model prediction of the n-th step is the input for the (n+1)-th step.
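A hedged sketch of this cascaded scheme follows; re-feeding the warped image (as here) rather than explicitly composing flows is one plausible reading, not a confirmed detail of the authors' setup.

```python
def network_based_optimize(reg_model, warp, fixed, moving, n_steps=5):
    """Cascaded refinement: the n-th warped result is the (n+1)-th model input."""
    moved = moving
    flows = []
    for _ in range(n_steps):
        flow = reg_model(fixed, moved)  # predict a residual deformation
        moved = warp(moved, flow)       # apply it and feed the result back
        flows.append(flow)
    return moved, flows
```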

8.[R3] Optimized Self-training: Optimize the predicted deformation field from a pre-trained registration model only once. The optimized field is used as the final pseudo-ground truth for supervision, with no optimization involved during training.

9.[R3] Opt weight: Considering the training loss (a * NCC + b * OFG), the opt weight is the ratio b/a, used to explore the loss combination.
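In equation form (notation assumed, matching the rebuttal's description):

```latex
\mathcal{L}_{\mathrm{train}} \;=\; a\,\mathcal{L}_{\mathrm{NCC}} \;+\; b\,\mathcal{L}_{\mathrm{OFG}},
\qquad \text{opt weight} \;=\; b/a
```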

10.[R3] Threshold: The horizontal axis shows the percentage of training images using the OFG loss, while the remaining images use the unsupervised loss.

11.[R3] Optimization Steps: As shown in Fig. 1 of the Supp., we recommend 5–10 optimization steps.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Although there is reasonable disagreement over the overall merit of the paper, I agree there is sufficient interest and innovation that it should be presented at MICCAI. It will lead to interesting discussion. The overall scores also support this. Congratulations.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal did not change the direction of the reviewers (WA->A, WR->R). They found the approach novel, flexible, and easy to implement. The authors have added new results in their rebuttal, which cannot be taken into consideration. I do agree with R4 that the result is surprising, as it does not seem like any additional information is fed to the network. In theory, the cost function is not changed by the proposed approach:

    \arg\min_{\phi} \; S(\mathrm{fix}, \mathrm{mov}(\phi)) + \lambda R(\phi) \;=\; \arg\min_{\phi} \; S(\mathrm{fix}, \mathrm{mov}(\phi)) + \lambda R(\phi) + \beta \bigl\| \phi - \arg\min_{\psi} \{ S(\mathrm{fix}, \mathrm{mov}(\psi)) + \lambda R(\psi) \} \bigr\|^2

    But DL is full of theoretically surprising but practically useful results. To me, this is borderline between accept and reject.


