Abstract

Despite their remarkable success in medical image segmentation, the life cycle of deep neural networks remains a challenge in clinical applications. These models must be regularly updated to integrate new medical data and customized to meet evolving diagnostic standards, regulatory requirements, commercial needs, and privacy constraints. Model merging offers a promising solution, as it allows working with multiple specialized networks that can be created and combined dynamically instead of relying on monolithic models. While extensively studied in standard 2D classification, the potential of model merging for 3D segmentation remains unexplored. This paper presents an efficient framework that allows effective model merging in the domain of 3D image segmentation. Our approach builds upon theoretical analysis and encourages wide minima during pre-training, which we demonstrate to facilitate subsequent model merging. Using U-Net 3D, we evaluate the method on distinct anatomical structures with the ToothFairy2 and BTCV Abdomen datasets. To support further research, we release the source code and all the model weights in a dedicated repository: https://github.com/LucaLumetti/UNetTransplant

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0752_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/LucaLumetti/UNetTransplant

Link to the Dataset(s)

BTCV Abdomen: https://www.synapse.org/Synapse:syn3193805/wiki/217753

ToothFairy2: https://ditto.ing.unimore.it/toothfairy2/

AMOS: https://zenodo.org/records/7262581

ZhimingCui (Available upon request from the authors): https://www.nature.com/articles/s41467-022-29637-2

BibTeX

@InProceedings{LumLuc_UNet_MICCAI2025,
        author = { Lumetti, Luca and Capitani, Giacomo and Ficarra, Elisa and Calderara, Simone and Grana, Costantino and Porrello, Angelo and Bolelli, Federico},
        title = { { U-Net Transplant: The Role of Pre-training for Model Merging in 3D Medical Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {626--636}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper provides 1) an extensive analysis of model merging for 3D segmentation based on well-known medical datasets, revealing that combining task vectors is a flexible method for customizing models without re-training; 2) theoretical and empirical validation showing how a base model with a flat loss landscape enhances model merging; and 3) publicly released source code and model weights to facilitate research.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This study demonstrates how specialized models for segmenting different anatomical structures can be effectively consolidated into a single unified model capable of performing all original tasks. The proposed approach utilizes task vectors and promotes wide minima during pre-training to enhance the efficacy of model merging in 3D medical image segmentation. The ability to combine model capabilities without re-training would enable dynamic, client-specific software customization, thereby accelerating deployment and offering greater flexibility.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Limited novelty: The innovation of this paper appears to consist merely of applying existing methods for 2D scenes to 3D scenes, with experiments and analysis on 4 medical datasets. This paper does not clearly articulate how the proposed merging framework for 3D medical segmentation differs from existing methods for the 2D domain. In addition, it lacks analysis of, and improvements for, the challenges specific to 3D segmentation.
    2. Limited methodology and evaluation: The study is exclusively conducted on U-Net 3D (2016) and evaluated on only two datasets (ToothFairy2 and BTCV) featuring distinct anatomical structures. The lack of extensive validation across architectures and datasets raises concerns about the validity of the proposed method.
    3. Overlooked advancements in pre-training: The paper states (p.3, last paragraph): “While 2D image classification tasks benefit from various pre-trained models (e.g., CLIP and DINO), 3D medical segmentation lacks similar pre-trained models.” However, recent advances—such as SAM-Med3D (2023) and SegVol (2025)—leverage 3D ViT-based pre-trained models for 3D segmentation, which should be acknowledged and discussed.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Lack of clarity in claimed contributions: The abstract states, “This paper presents an efficient framework that allows effective model merging in the domain of 3D image segmentation.” However, the paper fails to describe the specific innovations of the proposed framework for 3D medical segmentation tasks.
    2. Narrow experimental validation: The experiments are solely based on the U-Net 3D (2016) architecture, examining the impact of two distinct pre-training regimes (stable/wide minima vs. plastic/sharp minima) on model merging. The experiments conspicuously overlook recent advancements in the field (e.g., SAM3D, SAM-Med3D). Such limitations raise significant concerns about both the generalizability and the state-of-the-art relevance of the proposed method.
    3. If the innovation of this paper is merely applying existing 2D scene methods to 3D scenes and conducting experimental analysis on 3D U-Net, then it lacks novelty.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I agree with the research contributions clarified by the authors, but the Abstract and Introduction require careful revision. These sections include too much research background (the first two sentences of the abstract and the first three paragraphs of the introduction) while lacking a description of the innovativeness of this work.

    The Introduction requires modifications to better align with and support the Abstract.

    1. A presentation of existing methods that avoid ‘full retraining’, such as incremental learning (the abstract refers to this as “regularly updated” while paragraph 4 calls it the “full retraining” issue - these terms should be consistent);
    2. An explanation of the remaining challenges in current approaches and a justification for why model merging represents a promising solution, including its specific advantages;
    3. Clear emphasis on this study’s unique contributions, rather than what has been demonstrated in previous studies, “specialized models for segmenting different anatomical structures can be successfully merged into a single model capable of performing all original tasks” (from paragraph 6);
    4. An explanation for why model merging has been extensively studied in standard 2D classification but remains unexplored in 3D segmentation. While the authors attribute this to “the lack of similar pre-trained models” in the “Research Question” section, this perspective overlooks recent advancements in the field (e.g., SAM3D, SAM-Med3D).

    While the paper has significant issues in highlighting its contributions, the “Framework” section provides a comprehensive methodological discussion. Although the experimental section shows some limitations, the work’s contributions to 3D segmentation merit serious consideration.



Review #2

  • Please describe the contribution of the paper

    The paper investigates the role of the pre-training regime for deep learning models to facilitate effective model merging in the context of 3D medical image segmentation. The main contribution is demonstrating that pre-training strategies encouraging “wide minima” (“stable” regime) in the loss landscape lead to base models whose fine-tuned, task-specific versions can be more effectively merged using task vector arithmetic, compared to pre-training strategies leading to sharper minima (“plastic” regime). It presents this as the first analysis of model merging specifically for 3D segmentation tasks.
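The task vector arithmetic mentioned above can be illustrated with a minimal sketch. It assumes that a task vector is the element-wise difference between fine-tuned and base weights and that merging adds the scaled mean of the task vectors back onto the base model (Review #3 describes the approach as averaging task vectors). The function names, the `alpha` scaling factor, and the toy weights are hypothetical and are not taken from the authors' repository.

```python
import numpy as np

def task_vector(base, finetuned):
    """Task vector: element-wise difference between fine-tuned and base weights."""
    return {name: finetuned[name] - base[name] for name in base}

def merge(base, task_vectors, alpha=1.0):
    """Add the scaled mean of the task vectors back onto the base weights.

    `alpha` is a hypothetical scaling factor; alpha=1.0 is plain averaging.
    """
    merged = {}
    for name, weights in base.items():
        mean_delta = np.mean([tv[name] for tv in task_vectors], axis=0)
        merged[name] = weights + alpha * mean_delta
    return merged

# Toy weights standing in for a shared pre-trained model and two
# task-specific fine-tuned copies (the values are illustrative only).
base = {"conv.weight": np.zeros((2, 2)), "conv.bias": np.zeros(2)}
kidney = {"conv.weight": np.full((2, 2), 0.2), "conv.bias": np.array([0.1, 0.0])}
stomach = {"conv.weight": np.full((2, 2), -0.4), "conv.bias": np.array([0.0, 0.3])}

task_vectors = [task_vector(base, kidney), task_vector(base, stomach)]
merged = merge(base, task_vectors)
```

Because every task vector is computed against the same base weights, tasks can be added or removed by changing the list passed to `merge`, without re-training.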

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Novelty of Problem Formulation for 3D Segmentation: While model merging and task vectors have been explored in 2D classification, this paper proposes a novel approach for their application and analysis specifically for 3D medical image segmentation.

    Focus on Pre-training’s Role: The core contribution lies in identifying and investigating how pre-training influences subsequent model merging. Linking the geometry of the loss landscape (wide vs. sharp minima) of the pre-trained model to merging effectiveness is a novel and insightful contribution.



    Strong Empirical Validation: The hypothesis is well-supported by experiments on two distinct clinical domains: abdominal CT (AMOS/BTCV) and maxillofacial CBCT (Cui/ToothFairy2). The results consistently show superior merging performance when using the “stable” pre-training regime across different merging strategies and task combinations.

    Practical Relevance & Simplicity: The proposed method to achieve wider minima relies on simple modifications to standard training hyperparameters (e.g., batch size, dropout, learning rate), making it potentially easy to adopt. The findings have direct implications for building more flexible and maintainable medical AI systems.


    Reproducibility: The authors commit to releasing the source code and model weights which will be beneficial for the community.


  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Performance Gap: While the stable pre-training significantly improves merging, Table 3 shows there can still be a considerable performance gap compared to fine-tuning the pre-trained model jointly on all tasks (e.g., Kidney+Stomach in BTCV, several pairs in ToothFairy2). This suggests limitations in the current merging approach, especially when tasks exhibit higher variability or interference.

    Limited Scope of Merging: Experiments are conducted by merging only two or four tasks. It’s unclear how performance degrades as more task vectors are combined.


    Prior work cited in this paper uses SGD / Stochastic Weight Averaging to obtain wider minima. Was this explored during the experiments, and why was AdamW eventually used?

    How does the pipeline handle potential issues with dataset imbalance / class imbalance?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Page 3, just below equation (1): correct “mantains” to “maintains”.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The core contribution of the paper, investigating the role of pre-training minima width for model merging in 3D segmentation, is novel and clinically relevant. The theoretical motivation combined with strong empirical results across the selected datasets demonstrating the benefit of the “stable” pre-training regime are significant strengths. The authors also provide the source code and the model weights. However, the acceptance recommendation is “Weak Accept” primarily due to:

    • The persistent, sometimes large, performance gap compared to joint training, raising questions about the practical limits of the merging approach as presented.
    • The limited scale of the merging experiments (up to 4 tasks).

    These points suggest that while the core idea is promising and well-supported, its current demonstrated effectiveness needs further improvement. The novelty and the clear demonstration of the pre-training effect make it a valuable contribution worth considering for the conference. Based on the rebuttal by the authors, I may consider changing my rating.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I thank the authors for their comments. While the authors have addressed my comments in their rebuttal, the paper in its original form has a few missing points indicated earlier and I hope the final version is more polished and includes the clarifications. My final decision is for this paper to be accepted.



Review #3

  • Please describe the contribution of the paper

    This paper introduces a novel framework for model merging in 3D medical image segmentation. The authors analyze how the curvature of the loss landscape during pre-training affects the success of merging multiple task-specific models. They demonstrate that training a base U-Net model to converge to wide minima improves merging performance.
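The link between curvature and mergeability can be illustrated with a simple sharpness probe. Under a second-order approximation around a minimum, a perturbation d changes the loss by roughly one half of d^T H d, where H is the Hessian, so lower curvature means a smaller loss increase when a weight offset such as a task vector is added. The sketch below is an illustration under that assumption and is not the analysis used in the paper; the `sharpness` function, the toy quadratic losses, and the radius are hypothetical.

```python
import numpy as np

def sharpness(loss_fn, theta, radius=0.1, n_samples=200, seed=0):
    """Mean loss increase over random perturbations of norm `radius` around theta."""
    rng = np.random.default_rng(seed)
    base_loss = loss_fn(theta)
    increases = []
    for _ in range(n_samples):
        direction = rng.standard_normal(theta.shape)
        direction *= radius / np.linalg.norm(direction)  # rescale to the probe radius
        increases.append(loss_fn(theta + direction) - base_loss)
    return float(np.mean(increases))

def quadratic(curvature):
    """Toy loss 0.5 * c * ||theta||^2, whose Hessian is c times the identity."""
    return lambda theta: 0.5 * curvature * float(theta @ theta)

theta_star = np.zeros(10)  # the minimum of both toy losses
wide = sharpness(quadratic(1.0), theta_star)    # low curvature: wide minimum
sharp = sharpness(quadratic(25.0), theta_star)  # high curvature: sharp minimum
```

For these toy losses the probe grows linearly with the curvature, so the same weight offset costs 25 times more loss at the sharp minimum than at the wide one.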

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Addresses a real-world challenge of model lifecycle and update in clinical environments.
    2. Strong second-order loss landscape analysis that links wide minima to mergeability.
    3. Modifying batch size, dropout, and learning rate during pre-training to induce flat minima is a practical, low-cost approach.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The merging approach relies on averaging task vectors, which may oversimplify the problem for complex tasks with significant interference.
    2. The concept of using flat minima is not novel and has been explored in other contexts; its adaptation to 3D segmentation is the novel part.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    This work highlights a practical and underexplored area in medical imaging. While the focus on training regime is appreciated, it would be valuable to extend the framework beyond simple averaging (e.g., adaptive weighting).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a novel application of wide minima training to facilitate model merging in 3D medical segmentation. The idea is simple but effective, and the theoretical analysis is strong. Despite relying on task vector averaging, the results show clear benefits, and the code release supports impact and reproducibility.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    After reading the rebuttal, I maintain my positive opinion of the paper. The authors clearly articulated the novelty of their contribution, specifically the theoretical and empirical analysis of pre-training regimes and their effect on model merging in 3D medical image segmentation. They appropriately clarified that their work is not a simple extension of 2D methods but rather the first to explore how pre-training impacts mergeability in the 3D domain. The discussion on the choice of backbone, merging strategies, and optimization was also reasonable and aligned with the scope of the paper.




Author Feedback

An asterisk (*) identifies local citations, i.e. references listed at the end of this feedback.

R3 - Novelty: We respectfully believe that the core contribution of our work may have been misunderstood by R3, and we welcome the opportunity to clarify it.

Our contribution is not merely an application of existing 2D methods to the 3D scenario. As recognized by R1 and R2, we identify and address a clear gap in the literature: the impact of pre-training regimes on post-finetuning model merging, a topic that has not previously been explored in any context, either in 2D or 3D image segmentation. Moreover, we go beyond empirical results by grounding our work in theoretical analysis. Given the absence of existing works on 3D model merging, verifying our claims required us to rely on established methods and techniques from the well-developed literature on model merging in the 2D domain. However, we explicitly clarify that the application of these existing approaches itself is not claimed as part of our main contribution. We will revise the paper to make it even more explicit.

That being said, we fully acknowledge that exploring model merging strategies tailored to 3D segmentation is indeed valuable, given the challenges posed by the computational complexity of 3D architectures, the anisotropy inherent in medical imaging volumes, and the use of diverse loss functions. This line of research, however, lies beyond the intended scope of our current study. Considering its potential impact, our paper provides a robust foundation for future studies that want to focus on the peculiarities of 3D model merging. In case of acceptance, researchers could leverage our publicly available pre-trained models and codebase, along with our established experimental protocols and findings from baseline 2D approaches.

R3 - U-Net 3D: Although U-Net 3D appeared in 2016, it continues to set the benchmark for medical image segmentation and is the main protagonist of most of the latest MICCAI/CVPR medical segmentation challenges, which makes it a relevant state-of-the-art case study [1-3]. Moreover, our implementation does not use the original U-Net exactly but rather the residual variant (see the source code), which has become one of the standard community references. While additional models could add value, the theory we formulated is architecture-agnostic, a point we consider a strength. We chose to focus on a widely adopted backbone primarily for generality.

R1, R2 - Performance Gap and Advanced Merging: As the number of task vectors increases, we observe that performance generally degrades, following a trend consistent with previous papers [10, 25]. However, our recent analyses (omitted here to comply with the MICCAI rebuttal policy) reveal that leveraging more advanced merging approaches [4*], in combination with our flatness-based pre-training strategy, further reduces task interference and narrows the gap with the upper bound (joint training), even in settings involving an increasing number of tasks. In this respect, our focus in this paper is on the pre-training stage and on laying strong foundations for the first step of the whole pipeline. We plan to submit an extension where we incorporate advanced merging strategies.

R1 - Optimization and Class Imbalance: In early experiments, both SGD and AdamW yielded similar training regimes/performance, with AdamW converging faster. While our pipeline may handle class/dataset imbalance by training each task vector independently, we did not evaluate its robustness under extreme imbalance. Investigating this scenario remains a promising direction for future work.

References

[1] nnU-Net Revisited: A Call for Rigorous Validation in 3D Medical Image Segmentation. MICCAI 2024.
[2] Scaling nnU-Net for CBCT Segmentation. Winner of the ToothFairy2 Challenge 2024.
[3] BraTS-PEDs: Results of the BraTS Challenge 2023. CoRR 2024.
[4] Task singular vectors: Reducing task interference in model merging. CVPR 2025.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    Given the mixed assessments, the paper would benefit from an author rebuttal to address the concerns about novelty and experimental validation.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The three reviewers recommend acceptance and praise the merits of the paper.


