Abstract
Unsupervised registration strategies bypass the need for ground truth transforms or segmentations by optimising similarity metrics between fixed and moved volumes. Among these methods, a recent subclass of approaches based on unsupervised keypoint detection stands out as very promising for interpretability. Specifically, these methods train a network to predict feature maps for fixed and moving images, from which explainable centres of mass are computed to obtain point clouds that are then aligned in closed form. However, the features returned by the network often yield spatially diffuse patterns that are hard to interpret, thus undermining the purpose of keypoint-based registration. Here, we propose a three-fold loss to regularise the spatial distribution of the features. First, we use the KL divergence to model features as point spread functions that we interpret as probabilistic keypoints. Then, we sharpen the spatial distributions of these features to increase the precision of the detected landmarks. Finally, we introduce a new repulsive loss across keypoints to encourage spatial diversity. Overall, our loss considerably improves the interpretability of the features, which now correspond to precise and anatomically meaningful landmarks. We demonstrate our three-fold loss in foetal rigid motion tracking and brain MRI affine registration tasks, where it not only outperforms state-of-the-art unsupervised strategies, but also bridges the gap with state-of-the-art supervised methods. Our code is available at https://github.com/BenBillot/spatial_regularisation.
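As a reading aid, the two mechanics the abstract relies on (centres of mass of feature maps as keypoints, and closed-form alignment of the resulting point clouds) can be sketched in a few lines of NumPy. The function names and array shapes below are illustrative assumptions, not the authors' actual code; see the linked repository for the real implementation.

```python
import numpy as np

def soft_keypoints(feature_maps):
    """Centres of mass of K feature maps (illustrative sketch).

    feature_maps: (K, D, H, W) non-negative maps, e.g. after a spatial softmax,
    read here as probability distributions over voxel locations.
    Returns a (K, 3) array of keypoint coordinates in voxel space.
    """
    K = feature_maps.shape[0]
    grid = np.stack(np.meshgrid(*(np.arange(s) for s in feature_maps.shape[1:]),
                                indexing="ij"), axis=-1).reshape(-1, 3)  # (V, 3)
    probs = feature_maps.reshape(K, -1)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs @ grid  # expected voxel location per map

def closed_form_rigid(p_moving, p_fixed):
    """Least-squares rigid transform (Kabsch) mapping moving keypoints to fixed ones."""
    mu_m, mu_f = p_moving.mean(axis=0), p_fixed.mean(axis=0)
    H = (p_moving - mu_m).T @ (p_fixed - mu_f)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_f - R @ mu_m
    return R, t  # x_fixed approx. R @ x_moving + t
```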
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0283_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/BenBillot/spatial_regularisation
Link to the Dataset(s)
N/A
BibTex
@InProceedings{BilBen_Spatial_MICCAI2025,
author = { Billot, Benjamin and Muthukrishnan, Ramya and Abaci Turk, Esra and Grant, P. Ellen and Ayache, Nicholas and Delingette, Hervé and Golland, Polina},
title = { { Spatial regularisation for improved accuracy and interpretability in keypoint-based registration } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15973},
month = {September},
pages = {595--605}
}
Reviews
Review #1
- Please describe the contribution of the paper
The authors introduce three novel loss terms for deep learning-based keypoint registration, addressing both interpretability and spatial regularization of keypoints.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well written, precise in its contributions and shows good performance over baselines.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The novelty is limited, as well-known objective functions are merely adapted for keypoint-based registration. The registration is restricted to rigid and affine transformations. Evaluation is conducted solely on brain datasets, and comparisons are limited to baseline methods and older, relatively weak approaches such as ANTs and DLIR. Since the loss terms represent the main contribution of the paper, I would be interested in a comprehensive ablation study covering all possible subsets, including significance testing.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
While the anonymized code is appreciated, a number of files appear to be missing.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While I generally find the paper well-written and its contributions convincing with respect to the baseline, I am missing a more comprehensive evaluation to fully assess its performance—such as testing on more diverse datasets, comparing against more recent methods, or evaluating on deformable registration tasks.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
I appreciate the authors’ feedback and acknowledge that they have addressed most of the points appropriately. However, I remain unconvinced that the incremental contribution—namely, the combination of the three loss terms—offers sufficient novelty. Moreover, the results, as currently presented, provide limited value. If the primary aim of the paper is to evaluate the three loss terms, it remains unclear which term contributes to which performance improvement. While statistical significance is reported for the best result relative to the baseline, it would be informative to see a more detailed comparison. Additionally, I still believe that, for a paper that emphasizes results and evaluation rather than novelty, evaluation on a single dataset is insufficient.
Review #2
- Please describe the contribution of the paper
KeyMorph is an unsupervised registration network that computes the transformation from keypoints predicted by the network. In this paper, the authors build on this framework by adding three losses to the training to improve the spread of the keypoints in the image and make the feature map associated with each keypoint more focal.
The method has been evaluated for affine registration of foetal MRI time series and brain MRI, and compared to the original KeyMorph, EquiTrack, and the two non-learning-based methods ANTs and DLIR. The results show an improvement in both the accuracy and the interpretability of the feature maps.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors clearly identify a problem in the KeyMorph method, propose a solution to improve interpretability and, as a consequence, also improve registration performance without requiring new annotations.
- The experiments are convincing.
- The paper is well written and easy to read.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
no major weakness
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- The L_kl loss makes the feature maps have a Gaussian distribution. The L_var loss makes them have a small variance. The L_kl loss seems less useful than the two other losses, and it would be interesting to see if it could be removed, i.e. to present the results when using L_var and L_rep (without L_kl).
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
no major weakness
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The paper proposes to combine three losses to improve the interpretability and the performance of keypoint-based rigid or affine registration. This type of model relies on a CNN, typically a UNet, to predict a series of K heatmaps. The barycenter of each of these heatmaps yields a keypoint. The network is run on both the fixed and the moving images to obtain K corresponding keypoints, from which a rigid or an affine transformation can be obtained in closed form. The following three losses are proposed to improve this pipeline (see the sketch after this list):
- Each heatmap is regularized by computing the KL divergence between the heatmap and the Gaussian distribution centered on the corresponding keypoint with the empirical variance.
- The Frobenius norm of the empirical variance of each heatmap is minimized.
- The keypoints are encouraged to be spatially varied by maximizing the log of the sigmoid of all pairwise distances. These regularization losses are then weighted and combined with a standard similarity objective to obtain the complete registration loss. This approach is validated by incorporating these losses into the training of a KeyMorph model and an EquiTrack model, first on an in-house dataset, and then on ADNI. The efficiency and relevance of the method are clearly shown in these experiments.
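To make the three terms concrete, here is a minimal NumPy sketch following the description above. The function name, the discretisation of the Gaussian over the voxel grid, and the sign conventions are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def regularisation_losses(feature_maps):
    """Sketch of the three regularisers on (K, D, H, W) softmax-normalised maps.

    Returns (l_kl, l_var, l_rep); weighting and sign conventions are assumed.
    """
    K = feature_maps.shape[0]
    grid = np.stack(np.meshgrid(*(np.arange(s) for s in feature_maps.shape[1:]),
                                indexing="ij"), axis=-1).reshape(-1, 3)   # (V, 3) voxel coords
    probs = feature_maps.reshape(K, -1)
    probs = probs / probs.sum(axis=1, keepdims=True)                      # (K, V)
    mu = probs @ grid                                                     # (K, 3) keypoints

    l_kl, l_var = 0.0, 0.0
    for k in range(K):
        d = grid - mu[k]                                                  # deviations from keypoint
        cov = (probs[k][:, None] * d).T @ d                               # 3x3 empirical covariance
        l_var += np.linalg.norm(cov, "fro")                               # sharpen the heatmap
        # Gaussian with the same mean/covariance, normalised over the voxel grid
        maha = np.einsum("vi,ij,vj->v", d, np.linalg.inv(cov + 1e-6 * np.eye(3)), d)
        log_q = -0.5 * maha
        log_q -= np.log(np.exp(log_q).sum())
        l_kl += np.sum(probs[k] * (np.log(probs[k] + 1e-12) - log_q))     # KL(heatmap || Gaussian)

    # repulsion: maximise the log-sigmoid of all pairwise keypoint distances
    dists = np.linalg.norm(mu[:, None] - mu[None, :], axis=-1)
    i, j = np.triu_indices(K, k=1)
    l_rep = -np.mean(np.log(1.0 / (1.0 + np.exp(-dists[i, j]))))
    return l_kl, l_var, l_rep
```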
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well written and easy to follow. All methodological choices are well justified. The authors build on top of relevant works in the field. The experiment section is convincing and well detailed. The method is simple and shown to be effective, which is a serious quality. The ablation study is very convincing. Connections to related literature are insightful.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
In my opinion, the main weakness of the paper is that the novelty is very incremental. Combining these three losses is relatively close to other settings such as [14]. Yet, I believe the setting is different enough and the proposed changes are meaningful.
I add in the following a couple of points that I think should be addressed either in the revised version of the paper or in the rebuttal.
For the in-house dataset, the ground truth transforms are known. One metric that is missing for this experiment, in my opinion, is the TRE between predicted keypoints after registration by the ground truth transform. Does adding these losses lead to better-aligned predicted keypoints?
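For reference, the suggested metric is straightforward to compute. A minimal sketch, assuming a ground-truth rigid transform (R_gt, t_gt) and matched keypoint arrays; the function name is hypothetical:

```python
import numpy as np

def keypoint_tre(p_moving, p_fixed, R_gt, t_gt):
    """Mean distance between fixed keypoints and moving keypoints mapped
    by the ground-truth transform (the reviewer's suggested metric)."""
    return np.linalg.norm(p_moving @ R_gt.T + t_gt - p_fixed, axis=1).mean()
```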
Some of the issues reported seem particularly relevant for EquiTrack which, because of its use of E-CNNs, has relatively different dynamics from KeyMorph. In particular, from Table 1, the lack of spatial variance in predicted keypoints seems less relevant to KeyMorph. Hence, it would have been nice to see an illustration similar to Figure 2 but with KeyMorph points. In particular, Figure 11 in [38] seems to indicate that the points are already well spread. Similarly, a loss-by-loss breakdown of the improvements for the KeyMorph model would be beneficial.
The Dice numbers on the ADNI dataset are quite low, which is due to the deformation model being too simple for inter-patient registration. Hence, are these numbers still fully relevant? If the anatomies are very different, does an increase in mean Dice (I assume these numbers are averaged over all structures) always mean a better alignment? I believe not. Maybe this could be discussed somewhere.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I enjoyed reading this paper and I believe it is a relevant work for the MICCAI community. I put Accept instead of Strong Accept because of the limited novelty. Yet, as mentioned above, these findings are interesting and well presented.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I believe the work presented is valuable to the MICCAI community and the medical image registration community in general. It is simple, novel, and shown to be effective.
Author Feedback
We thank the reviewers for their insightful feedback. We are glad they found our paper well written [R1,R2,R3] with clear [R1,R2] and well motivated [R3] contributions that fix a problem with current state-of-the-art (SOTA) methods [R1,R3]. The reviewers also highlighted our convincing experiments [R1,R3] that show the good results of our model [R1,R2,R3]. We now address reviewers’ comments (citations are from the paper).
R2,R3: Limited novelty We agree that our method builds on losses (Lkl, Lvar) from separate previous works in supervised keypoint-based registration [15,30]. However, we (a) combine these losses in a principled way, (b) adapt them to the unsupervised case, and (c) introduce a new loss Lrep to improve the keypoints' spatial diversity. We note that R1 and R3 found these methodological contributions to be meaningful for publication. Moreover, we highlight that MICCAI also encourages other forms of novelty, especially in the results. This is the case here, as our regularisation (a) strongly improves the interpretability of keypoint-based registration methods, and (b) bridges the performance gap with supervised methods while requiring no annotations.
R1,R2,R3: Missing ablations We agree that our ablations do not cover all possible combinations of the 3 proposed losses; while we ran all ablations, we reported a subset due to space constraints. Unfortunately, due to a new MICCAI policy (no new results in the rebuttal), adding the other ablations would lead to desk rejection. However, we believe that our ablation strategy (starting from no regularisation and progressively adding each term) already explains the impact of each loss: Lkl (increased interpretability compared to no regularisation), Lkl+Lvar (Lvar makes keypoints more precise compared to Lkl alone), and Lkl+Lvar+Lrep (Lrep makes keypoints more spread out compared to Lkl+Lvar). We also ran an ablation with Lvar alone, showing that using Lvar without Lkl [R1] can lead to less interpretable bimodal distributions (Fig2, Tab1).
R2: Old/weak baselines We believe that the tested methods are representative of the SOTA in rigid/affine registration. First, the optimisation-based ANTs is probably the most widely used approach for medical image registration and is still very competitive (Fig3). Regarding learning-based methods, we include DLIR (SOTA in regression-based registration, 2019), EasyReg (recent segmentation-based method, 2023), and we build on KeyMorph (2023) and EquiTrack (2024), which are SOTA in keypoint-based registration.
R2: Work restricted to affine transforms and brains only We fully agree with R2 that extending our work to non-linear registration and to other organs (e.g., prostate, abdomen) would be very interesting; this will be added to our future work. Nevertheless, we respectfully note that affine registration is an entire field of research in itself [7], and that testing only on brains is quite common in registration [3,4,9,38]. Moreover, we also enrich our study with a non-standard analysis by testing on foetal brains, which are very different from adult brains (morphology, sequence, resolution, SNR).
R2: Missing significance testing All experiments already include significance testing (Tab1, Fig3).
R3: Dice metric We agree that Dice may not always be ideal for assessing registration performance. Yet, consistent with the literature [2,10,13,25,38], we use Dice as a proxy when ground truth transforms are not available (as in Exp2). This point will be discussed.
R3: Using TRE for training/testing We thank R3 for the suggestion as we agree that (a) optimising TRE could be useful during late training (once the network learns to detect meaningful keypoints) and (b) computing TRE during testing could be used as a proxy to measure consistency in the keypoints. This point will be added.
Again, we thank the reviewers for their feedback and hope that our response has addressed their comments.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
In general, I am not in favor of papers that have a strong focus on an aggregate of loss functions. They tend to be hard to train and balance, and even if that balance is achieved, it is hard to justify beyond manual fine-tuning. I also agree with reviewers #2 and #3 that the novelty is incremental and limited.
Having said that, I also agree with reviewers #1 and #3 that this is a well-written paper, which is not often the case for highly technical papers that rely on math. It is easy to follow the rationale for each choice, and the results (even if they only focus on the brain) are presented with a statistical analysis and clearly show why the losses work (even if I also think that further ablation results are needed). Furthermore, I think that the authors did a good job addressing the concerns of the reviewers within the rebuttal constraints (no new experiments). Therefore, all these positives, in my opinion, outweigh the negatives.
Regardless of the final decision, I would advise the authors to continue working on this project and extend it to more datasets and more comprehensive sets of experiments to truly gauge the effect of each loss.