Abstract

Medical images are often acquired in different settings, requiring harmonization to match the operating point of downstream algorithms. In particular, to standardize the physical spacing of imaging voxels in heterogeneous inference settings, images are typically resampled before being processed by deep learning models. However, downsampling discards information, whereas upsampling introduces redundant information, leading to inefficient resource utilization. To overcome these issues, we propose to condition segmentation models on the voxel spacing using hypernetworks. Our approach allows processing images at their native resolution, or at resolutions adjusted to the hardware and time constraints at inference time. Our experiments across multiple datasets demonstrate that our approach achieves competitive performance compared to resolution-specific models, while offering greater flexibility for the end user. This also simplifies model development, deployment and maintenance. Our code will be made available at https://github.com/anonymous.
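To illustrate the core idea from the abstract, here is a minimal, hypothetical NumPy sketch: a small MLP hypernetwork maps the per-axis voxel spacing to the parameters of a tiny convolutional layer. All names, sizes, and the 1D simplification are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class SpacingHyperNet:
    """Toy hypernetwork: voxel spacing (3,) -> 3-tap conv kernel + bias."""
    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random MLP weights stand in for trained hypernetwork weights.
        self.W1 = rng.standard_normal((3, 16)) * 0.1
        self.b1 = np.zeros(16)
        self.W2 = rng.standard_normal((16, 4)) * 0.1  # 3 kernel taps + 1 bias
        self.b2 = np.zeros(4)

    def __call__(self, signal, spacing):
        # The MLP generates the primary network's parameters from the spacing.
        params = relu(spacing @ self.W1 + self.b1) @ self.W2 + self.b2
        kernel, bias = params[:3], params[3]
        # Apply the generated convolution to the input (1D stand-in for a volume).
        return np.convolve(signal, kernel, mode="same") + bias

net = SpacingHyperNet()
sig = np.sin(np.linspace(0, 3, 32))        # stand-in for one image row
out = net(sig, np.array([1.0, 1.0, 2.5]))  # per-axis spacing in mm
print(out.shape)                           # (32,)
```

The key property is that the same hypernetwork serves any spacing: changing the spacing vector changes the generated kernel, not the model that has to be shipped.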

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2109_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2109_supp.pdf

Link to the Code Repository

https://github.com/ImFusionGmbH/HyperSpace

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Jou_HyperSpace_MICCAI2024,
        author = { Joutard, Samuel and Pietsch, Maximilian and Prevost, Raphael},
        title = { { HyperSpace: Hypernetworks for spacing-adaptive image segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose the use of hypernetworks to generate the weights of a UNet, conditioning this process on the voxel spacing. This approach allows processing images at their native resolution. The experiments show that the proposed method achieves competitive performance across multiple datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of conditioning a hypernetwork on the voxel spacing to generate the weights of a UNet is novel.
    • The experiments overall show benefits in cases of extreme variations in scale.
    • I do think it is a neat idea to generate a UNet based on the computational requirements.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The novelty of the paper is very limited as each component of the paper has already been introduced. The main novelty is the conditioning on scale, which is an interesting finding.
    • The proposed method still shows significant drops in accuracy, even for known resolutions, when the spacing becomes small (see Figure 2). On the other hand, generalisation to unseen scales is not necessarily more robust than with the AugmentSpacing method, and depends on the dataset.
    • While the idea of generating the UNet to match computational requirements is appealing, it should also have been properly analysed in terms of the computational demands of generating said UNet. This could have been an important contribution of this paper.
    • Table 2 (BRATS) is missing one result row.
    • The benefit over using the standard fixed-resolution setup is not really clear, as in most cases performance is similar or worse. In the last case, MM-WHS, there is an improvement with the proposed method, but it is not clear how the fixed resolution was chosen; it might just be a badly chosen scale.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    I do not see any difficulties in reproducing the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Every section should start with an introductory phrase and not directly with a subheading
    • Highlight the best results in tables. Arrows indicating the direction of improvement would also help readability.
    • Bold and emph would improve readability
    • The contribution list belongs into the introduction, not the related work section
    • Figure 1 text is too small
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While it is an interesting approach to condition a hypernetwork on the voxel spacing of an image, the authors' experiments do not show a clear benefit over simply using a properly chosen fixed-scale model. This paper requires improvements in terms of method and evaluation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    My initial criticisms still stand. The paper makes a lot of strong claims without experiments to support them. While the authors address an important issue, adaptive models that can run on minimal hardware, this hasn't been evaluated in enough detail. What does this actually mean in terms of actual hardware requirements? This issue is further underlined by not stating the actual hardware the experiments were run on.

    A reduction in GPU requirements is important, but when it comes with a noticeable drop in performance compared to a standard UNet that is far from state of the art, and with no further baselines, it is not sufficient. Especially with such limited novelty, the results and goal should be clear.



Review #2

  • Please describe the contribution of the paper

    This paper tackles the lack of robustness of segmentation networks against images of variable resolutions. The authors propose to use a HyperNetwork, where an auxiliary MLP predicts the weights of a segmentation UNet conditionally on the resolution of the input image. This strategy is evaluated on three datasets, where it shows results similar to a network trained and tested at a fixed resolution, but with lower inference computation cost.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper tackles an interesting problem, as CNNs are known to be fragile against resolution variability.
    • Using a HyperNetwork to tackle this problem is a simple idea, but it is well justified here.
    • The proposed method is evaluated on 3 different datasets, with appropriate baselines.
    • I like the last experiment, which shows the similarity of learned representations across the resolution range.
    • The paper is clearly written.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • many claims are over-statements, especially since the proposed method does not outperform the competing baselines in most experiments.
    • clear lack of information about the methods, architectural design, and training procedure.
    • some points need to be discussed (dependence on correct image headers, advantage/drawbacks of segmenting at different resolutions).
    • light literature review.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    All training and architectural details will be given in the code, but I don’t think this is good practice. The code and training procedure are two separate things, and the former should not replace the latter.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Major:

    • The proposed method only obtains the best results in 1 experiment out of 3. This is not a problem (especially given the properties shown in 4.3), but I strongly encourage the authors to soften their claims, most of which are over-statements, like “the hypernetwork yields a more robust performance across the expected resolution interval”, whereas most curves are overlapping.
    • The authors insist on computational gains during inference, but conveniently leave out training times, which are much longer for HyperNetworks [1]. This should be made clear in a comparison of training/testing time/memory usage.
    • The resampling procedure is not explained, despite being central in training (augmentation) and testing (generation of ground truths at different resolutions, input/output resampling). As previously shown in [2], this aspect is not trivial and should be carefully designed.
    • Similarly, there is a lack of architecture and training information. These details are especially important for HyperNetworks, which are known to be long and unstable to train [1].
    • I think the literature review is a bit light. The paper fails to mention Bayesian methods that tackled resolution variability [3]. Then, the FS method is directly inspired from SynthSeg [2], which is the state-of-the-art in robustness to resolution and should be cited. Finally, there is a rapidly growing body of work on HyperNets for segmentation [3,4, and many others], none of which is cited.
    • The authors should discuss the dependence of their method on correct information about input resolution, since image headers are often incorrect, especially in the clinic.
    • Sometimes, it can be interesting to obtain all segmentations at the same resolution (e.g. research studies), regardless of the resolution of the inputs. In this scenario, it is better to resample the images before prediction, rather than resampling the segmentations with nearest neighbours. This could explain why FS is often better than the HyperNetwork.
    • “performances collapse on the highest resolutions due to the primary network’s architecture being too shallow”. I think the problem is not the number of layers/kernels, but the field of view of fixed-size kernels, which are not large enough to include global context at increasing resolutions. In the future, it could be interesting to incorporate kernels of adaptive size [4,5].
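For context on the resampling point above: spacing-aware resampling is commonly implemented along the following lines, where the zoom factor per axis is the ratio of current to target spacing. This is a hedged sketch using `scipy.ndimage.zoom`; the paper's actual procedure is not specified here.

```python
import numpy as np
from scipy.ndimage import zoom

def resample_to_spacing(volume, spacing, new_spacing, order=1):
    """Resample a 3D volume from `spacing` to `new_spacing` (mm per voxel).

    order=1 (linear) is typical for intensity images; order=0 (nearest
    neighbour) avoids inventing labels when resampling segmentation maps.
    """
    factors = np.asarray(spacing, float) / np.asarray(new_spacing, float)
    return zoom(volume, factors, order=order)

vol = np.random.rand(40, 40, 20)                       # anisotropic volume
iso = resample_to_spacing(vol, (1.0, 1.0, 2.5), (1.0, 1.0, 1.0))
print(iso.shape)                                        # (40, 40, 50)
```

The choice of interpolation order and whether images or label maps are resampled is exactly the non-trivial design point the reviewer raises.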

    Minor:

    • “Neural network processing has become a standard in clinical settings.” This is very debatable, since almost no deep learning method is currently used in the clinic due to their fragility to clinical data variability. The authors should rephrase to better highlight that their paper is actually tackling this problem.
    • “scale-equivariance is not always desirable, as the structure size can be an essential feature.” The authors are mixing equivariance and invariance in this sentence. A scale-equivariant network would preserve the size information.
    • The figure fonts are very small.
    • The caption of Figure 3 is hard to parse. There’s an accent typo in the caption of Figure 3.

    [1] Ortiz, Guttag, Dalca. Magnitude Invariant Parametrizations Improve Hypernetwork Learning. ICLR, 2024.
    [2] Billot, Greve, Puonti, Thielscher, Van Leemput, Fischl, Dalca, Iglesias. SynthSeg: Segmentation of brain MRI scans of any contrast and resolution without retraining. Medical Image Analysis, 2023.
    [3] Van Leemput, Maes, Vandermeulen, Suetens. A unifying framework for partial volume segmentation of brain MR images. IEEE Transactions on Medical Imaging, 2003.
    [4] Ma, Dalca, Sabuncu. Hyper-Convolution Networks for Biomedical Image Segmentation. Winter Conference on Applications of Computer Vision, 2022.
    [5] Romero, Bruintjes, Tomczak, Bekkers, Hoogendoorn, van Gemert. FlexConv: Continuous Kernel Convolutions With Differentiable Kernel Sizes. ICLR, 2022.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the proposed method has merits and is well evaluated, but the authors need to soften their claims and to provide more details.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I’m happy with the authors’ responses; they have answered most of my comments, especially about adding important discussions and softening their superiority claims. I would still like them to include reference [2] (SynthSeg), which is the state of the art in resolution-agnostic segmentation and very relevant to this work: a segmentation network is trained by augmenting the resolution of the input images, and test scans are resampled to a fixed resolution r (a sort of mix between FS and AS). Finally, I emphasise that I respectfully disagree with reviewer 4 about novelty. I think this work builds on HyperNetworks to propose a novel way of tackling variability in resolution. Overall, I think this paper will foster interesting discussions at MICCAI.



Review #3

  • Please describe the contribution of the paper

    This paper introduces a novel method for segmenting images at their native resolution, which eliminates the need for resampling, optimizes information usage, and reduces computational demands. The approach utilizes a hypernetwork that adapts the weights of a segmentation U-Net based on the spatial spacing of the image, allowing for resolution-specific processing. The performance of this method is evaluated across different datasets and segmentation tasks, demonstrating robustness and comparability to traditional models trained at fixed resolutions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Robust Performance Results: The newly proposed method delivers exceptionally strong results, showcasing its effectiveness in handling image segmentation at native resolutions. This highlights its practical applicability and potential to advance current methodologies.

    2. Comprehensive Analysis and Novelty: The paper provides an in-depth analysis of the proposed framework, offering substantial insights into both computational and architectural aspects of medical image segmentation. The conclusions drawn are well-founded, emphasizing the innovative approach of the method.

    3. Insightful Computational Analysis: There is a significant focus on the computational implications of the new framework, which is seldom addressed in medical imaging literature. The analysis of computational needs and the framework’s efficiency provides crucial insights that are valuable for the field, suggesting ways to optimize resource usage without compromising performance.

    4. Clarity and Thoroughness in Background Review: The context and related work are described with exceptional clarity, enhancing the readability of the manuscript. This thorough review not only situates the work within the existing body of research but also underscores its contribution to advancing medical imaging technologies.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This manuscript is free of major weaknesses.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors mention in the manuscript that the codebase will be provided in the future.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall, the manuscript is strong and free of major weaknesses. I have a few minor suggestions that could further enhance its clarity and impact:

    1. Explanation of CKA Metric: I recommend including a brief, less technical description of the CKA (Centered Kernel Alignment) metric. This would provide readers with a clearer understanding of how this metric is used to assess the internal structure of networks, thereby increasing the paper’s readability for a broader audience.

    2. Figure Font Size: To improve readability, consider increasing the font size in the figures. This adjustment would make the text more legible, especially for those viewing the paper in digital formats.

    3. Discussion on Expansion to Other Network Types: In the results and conclusions section, you discuss potential future expansions to explore the framework’s applicability to nnUNet. Given the popularity of Vision Transformers and VAE-based segmentations in recent years, it would be beneficial to include your insights on the feasibility of applying this framework to these and other modern network types. This addition would not only broaden the scope of the discussion but also align your findings with current trends in the field.
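Regarding the CKA explanation requested in point 1 above, the metric can be sketched in a few lines. This is a minimal linear-CKA implementation in NumPy, shown for illustration only; whether the paper uses the linear or kernel variant is an assumption here.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X, Y: (n_samples, n_features) activations of two layers/networks on the
    same inputs. Returns a similarity in [0, 1] that is invariant to
    orthogonal transformations and isotropic scaling of the features.
    """
    X = X - X.mean(axis=0)                       # centre each feature
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2  # alignment of the two spaces
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 32))
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))  # random rotation
print(round(linear_cka(A, A @ Q), 6))               # 1.0: rotation-invariant
B = rng.standard_normal((100, 32))                  # unrelated activations
print(linear_cka(A, B) < 0.5)                       # True: low similarity
```

Intuitively, a CKA near 1 between layers of networks generated for different spacings indicates that the hypernetwork produces internally similar representations across the resolution range.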

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper stands out for its introduction of a novel method for image segmentation at native resolutions, which shows robust performance and does not require resampling. The detailed analyses of the computational and architectural aspects are both comprehensive and insightful, addressing often overlooked elements in medical imaging research. Additionally, the manuscript is clear in presenting related work, enhancing its readability and impact. These strong points make this paper a significant contribution to the field, fully deserving of a strong accept recommendation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    Authors have exhaustively addressed reviewer concerns further solidifying this work. Thus I uphold my initial Strong Accept rating.




Author Feedback

We sincerely appreciate the reviews of all 3 reviewers and firmly believe that the core scientific contributions of our work are of interest to the MICCAI community and that they can be effectively communicated through minor edits.

RESULTS:

Regarding R4’s concern that “experiments do not show a clear benefit over simply using a properly chosen fixed scale model”: We focus on controlling and reducing resource requirements dynamically at inference time without quality degradation. Using HS allows adapting the inference resolution to the hardware, time, or image requirements without deterioration in segmentation quality for a large resolution range and without having to a priori choose a fixed resolution for training.

As requested by R3, our claims regarding relative segmentation quality are now softened to state that all 3 methods (FS, AS, and HS) yield similar performance on a large central section of the considered intervals. Yet, we note that HS is more robust than AS at each end of the resolution interval, particularly on the BRATS and MM-WHS datasets.

Regarding the performance drop at higher resolutions mentioned (R3, R4), we clarified that this (already stated) limitation can also be mitigated by resampling the data to a lower resolution, as it is done by FS. We thank R3 for their input, note that receptive fields can be extended via increased depth and/or kernel size, and added references [4,5] as future research avenues.

All reported inference time and GPU memory numbers cover the full pipeline; for the HS evaluation, this includes the hypernetwork forward pass and UNet instantiation. We believe this is the analysis mentioned by R4 that “could have been an important contribution of this paper”. We further clarified this at the end of section 4.2.

Training HS vs. UNet: We thank R3 for the reference on HNs’ slower convergence. Yet, in our experiments, all networks converged within the same order of magnitude of steps. We hypothesize that the structure shown by the CKA analysis allows convergence at a similar rate. Also, the HN adds negligible memory or wall-time overhead to the UNet training. We added this discussion at the end of section 4.3.

NOVELTY:

Regarding limited “…novelty of the paper…” (R4.6), we do not introduce a fundamentally new concept but design a novel, simple, and effective solution to a rather overlooked yet relevant topic to the MICCAI community. We believe that our paper furthermore contributes:

  • A method for distributing a non-discrete model generator that can be rapidly and smoothly adapted to the compute requirements and constraints.

  • The experiments performed on 3 diverse datasets were conducted without extensive hyper-parameter tuning and using standard training practices. This shows that the simplicity of the training of UNets translates to hypernetworks (that are sometimes very hard to train as mentioned by R3).

  • Adapting the network to the data at inference comes at a negligible cost, dwarfed by potential savings at coarser resolutions.

  • As pointed out as a strength of the paper by R4/5, the CKA analysis of HS’s output space provides interpretable insights allowing a deeper understanding of internal representations in UNets.

REFERENCES:

We thank R3 for providing several relevant additional citations. We added [3,4] in the related works section and [5] when mentioning future works.

REPRODUCIBILITY:

R3 expressed some concerns regarding the reproducibility of our results, which we take very seriously. We note that the training parameters (including optimizer, scheduler, and number of steps) and UNet architectures are identical across experiments and are very standard. We added further details on the training procedure and architectures in section 4.1. We will provide full implementation details in the public repository, sufficiently documented for reproducing our findings.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    R4 maintains some critiques post-rebuttal, including that of limited novelty and recommends reject. However, two reviewers disagree and are enthusiastic about this paper post-rebuttal. This paper presents an interesting idea that could generate lively discussion at the conference.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The hypernetworks proposed in this paper have the potential to stimulate meaningful discussions within the MICCAI community, particularly regarding the prevalent issue of spacing in practice. Reviewer 4 has raised concerns about the balance between performance and GPU cost, especially given the paper’s strong claims about reducing GPU requirements. It is recommended that these claims be tempered to reflect a more balanced view. Despite this, the paper demonstrates significant merit and has received strong support from other reviewers. Therefore, I recommend accepting this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



