Abstract

The adoption of visual foundation models has become a common practice in computer-aided diagnosis (CAD). While these foundation models provide a viable solution for creating generalist medical AI, privacy concerns make it difficult to pre-train or continuously update such models across multiple domains and datasets, leading many studies to focus on specialist models. To address this challenge, we propose Med-LEGO, a training-free framework that enables the seamless integration or updating of a generalist CAD model by combining multiple specialist models, similar to assembling LEGO bricks. Med-LEGO enhances LoRA (low-rank adaptation) by incorporating singular value decomposition (SVD) to efficiently capture the domain expertise of each specialist model with minimal additional parameters. By combining these adapted weights through simple operations, Med-LEGO allows for the easy integration or modification of specific diagnostic capabilities without the need for original data or retraining. Finally, the combined model can be further adapted to new diagnostic tasks, making it a versatile generalist model. Our extensive experiments demonstrate that Med-LEGO outperforms existing methods in both cross-domain and in-domain medical tasks while using only 0.18% of full model parameters. These merged models show better convergence and generalization to new tasks, providing an effective path toward generalist medical AI.
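
The abstract does not spell out the "simple operations" used to combine adapters, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each specialist stores its update as a triple (B, E, A) with the update equal to B diag(E) A, that merging is a weighted sum of those updates, and that the result is re-factorized with SVD and truncated to a fixed rank. The names `svd_lora_delta` and `merge_svd_loras` are hypothetical.

```python
import numpy as np

def svd_lora_delta(B, E, A):
    """Dense update B @ diag(E) @ A (assumed SVD-style factorization).

    B: (d_out, r), E: (r,), A: (r, d_in).
    """
    return (B * E) @ A  # broadcasting scales column j of B by E[j]

def merge_svd_loras(adapters, weights=None, rank=None):
    """Merge (B, E, A) triples by a weighted sum, then re-factorize with SVD.

    This is an illustrative assumption about the merge rule, not the
    paper's confirmed procedure. A weight of 0 drops an adapter.
    """
    if weights is None:
        weights = [1.0] * len(adapters)
    # Stacking the factors reproduces the weighted sum of updates exactly:
    # sum_n w_n B_n diag(E_n) A_n = B_cat diag(E_cat) A_cat
    B_cat = np.concatenate([B for B, _, _ in adapters], axis=1)
    E_cat = np.concatenate([w * E for w, (_, E, _) in zip(weights, adapters)])
    A_cat = np.concatenate([A for _, _, A in adapters], axis=0)
    delta = svd_lora_delta(B_cat, E_cat, A_cat)
    # Re-factorize so the merged adapter has the same (B, E, A) form
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    k = len(S) if rank is None else min(rank, len(S))
    return U[:, :k], S[:k], Vt[:k, :]

# Two toy rank-4 "specialists" acting on a 16 x 12 weight matrix
rng = np.random.default_rng(0)
adapters = [
    (rng.normal(size=(16, 4)), rng.uniform(0.5, 2.0, size=4), rng.normal(size=(4, 12)))
    for _ in range(2)
]
B_m, E_m, A_m = merge_svd_loras(adapters, rank=8)
```

Because two rank-4 updates sum to a matrix of rank at most 8, keeping 8 singular values in this toy case loses nothing; a smaller `rank` trades reconstruction accuracy for fewer parameters.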

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2729_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZhuYit_MedLEGO_MICCAI2025,
        author = { Zhu, Yitao and Yin, Yuan and Li, Jiaming and Xu, Mengjie and Zhao, Zihao and Xiong, Honglin and Wang, Sheng and Wang, Qian},
        title = { { Med-LEGO: Editing and Adapting toward Generalist Medical Image Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {446 -- 455}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    A framework that integrates different specialized models into a general AI system using Singular Value Decomposition and Low-Rank Adaptation. By composing a model zoo of SVD-LoRAs for different medical image datasets, the Med-LEGO framework performs both cross-domain and in-domain information sharing to yield a classification for general and specialized doctors, respectively.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a way to merge specialist medical models using SVD-LoRA, which is a nice tweak on LoRA that helps keep things efficient. The idea of using SVD to help with merging is interesting and makes the process more stable. It’s a practical take on model fusion, especially in cases where datasets can’t be shared. The experiments cover a good range of medical tasks, and the method uses very few parameters, which is a plus.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The term “training-free” is misleading, as the method still requires training SVD-LoRA modules for each task.

    2. The added computational complexity of model merging offers limited practical benefit compared to simply routing to expert models via a smart API approach.

    3. The approach still relies on access to strong expert models, which are often the actual bottleneck in clinical applications.

    4. There is no evaluation on external or out-of-distribution datasets, limiting claims about generalizability or real-world applicability.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I gave this paper a lower score because, while the idea is novel and interesting, it doesn’t offer clear practical advantages. The method adds complexity and higher computational costs without showing strong enough gains over simply using expert models directly (in the within-dataset setting). Also, the claim of being “training-free” can be misleading, since training is still required on a per-task basis. Additionally, the proposed approach might not scale when adding more modalities or datasets, as more and more SVD-LoRA input models will be needed when performing cross-domain information sharing. There’s no testing on external or out-of-distribution data, so it’s hard to tell if the approach would actually work well in real-world settings. Showcasing the performance of the proposed method on completely unseen data, and investigating how an independent expert model performs compared with the unified framework, would help in assessing the framework’s contributions. Given the combination of these issues, I recommend “Weak Reject” for the paper.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    Inspired by AdaLoRA, the main contribution of this paper is the introduction of Med-LEGO, a training-free framework designed for seamlessly integrating multiple specialised medical image diagnosis models into a versatile generalist model (i.e., the well-known model merging). Specifically, the paper enhances LoRA through SVD, enabling efficient merging of model capabilities without requiring access to original training data. This reduces parameter size and allows easy updating of diagnostic capabilities. Overall, this paper explores a dedicated model merging method for MIC.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper introduces SVD-LoRA, an enhancement of LoRA that incorporates SVD, inspired by AdaLoRA, which seems a novel attempt in model merging tailored for medical imaging.

    • The evaluation is thorough, including 7 cross-domain datasets from MedMNIST, 3 in-domain chest X-ray datasets, and 3 new medical image diagnosis tasks.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • It appears that there is insufficient discussion or analysis of limitations in scenarios with severely imbalanced datasets, which is common in medical imaging.

    • Though LoRA-based techniques are well-explored, the paper would benefit from a discussion of why recent PEFT methods are not suitable for this case and why LoRA is the best fit for medical model merging.

    • Related works on model LEGO in other domains such as “Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks” are not discussed.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    NA.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It appears to me the formulation of Med-LEGO and SVD-LoRA advances model merging for medical image diagnosis, though there are certain limitations as elaborated in the weakness section.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces Med-LEGO, a training-free framework designed to integrate multiple task- or domain-specific models into a more general model. Med-LEGO leverages SVD and low-rank adaptation to eliminate the need for resource-intensive full fine-tuning while delivering excellent performance. Extensive experiments validate the effectiveness and superiority of the proposed Med-LEGO framework.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The use of SVD and LoRA is logically sound and effectively addresses many of the limitations associated with the fusion of existing models.
    2. The paper includes tests on the effects of model merging across and within domain tasks (e.g., across 7 MedMNIST datasets and 3 chest radiograph datasets), providing detailed quantitative results and comprehensive visual analyses.
    3. The paper is well-structured and easy to follow.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The paper lacks an in-depth analysis of SVD-LoRA. Both SVD and LoRA are inherently approximate techniques, where SVD decomposition may result in the loss of critical information for specific tasks, similar to the limitations observed with LoRA.

    2. There is insufficient theoretical analysis regarding the selection of the SVD eigenvalue threshold v. While the paper mentions retaining the first k eigenvalues that accumulate and exceed the threshold, it omits sensitivity analyses on how the choice of this threshold impacts model performance.

    3. Although the authors claim that Med-LEGO enhances task adaptation efficiency, providing a comparison of specific inference times would further substantiate this claim.

    4. The evaluation of model merging methods is limited. While the paper references some classical model merging techniques, it does not include comparisons with more recent and potentially relevant approaches, such as Adapter-based Tuning or Mixture of Experts (MoE). Including these would enhance the comprehensiveness of the evaluation.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Please see weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduces Med-LEGO, a training-free framework designed to integrate multiple task- or domain-specific models into a more general model. Although the experimental results demonstrate that Med-LEGO is effective, the paper lacks an in-depth analysis of its underlying mechanism. Furthermore, the baseline evaluation for model merging is relatively limited, which weakens the robustness of the comparisons.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #4

  • Please describe the contribution of the paper

    The authors propose SVD-LoRA, a type of novel LoRA architecture that can be used for 1) training-free merge of LoRAs trained for different tasks, 2) more stable training for new tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a novel method, SVD-LoRA, a new LoRA architecture designed to: (1) enable training-free merging of LoRAs trained for different tasks, and (2) provide more stable training when adapting to new tasks. The effectiveness of the proposed method is demonstrated through comprehensive experiments.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper would benefit from a more thorough and precise description of the procedure for calculating A, B, and E given An and Bn for different tasks.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel and valid idea, which is well supported by comprehensive experiments. Overall, I hold a positive attitude toward the acceptance of this paper. I suggest that the authors include a discussion comparing the proposed method with the MIXTURE OF LORA EXPERTS [1].

    [1] Wu, Xun, Shaohan Huang, and Furu Wei. “Mixture of lora experts.” arXiv preprint arXiv:2404.13628 (2024).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Thanks for the valuable comments. We appreciate the recognition of our work and have addressed each of the reviewers’ concerns as follows:

  1. Imbalanced Datasets (R1-Q1): Thank you for highlighting this issue. As described in Section 3.4, the datasets used in our study exhibit substantial imbalance. For example, NIH-CXR14 contains 112,120 images, while Tuberculosis and Pneumonia contain only 662 and 5,856 images, respectively. This poses a significant challenge for fair evaluation. From Table 2, we observe that many baseline methods are skewed towards the dominant dataset, while Med-LEGO maintains consistent performance across all three, demonstrating superior robustness to dataset imbalance.
  2. Why use LoRA and More Related Works (R1-Q2, R1-Q3, R2, R3-Q4): We acknowledge the need to provide a clearer justification for our choice of LoRA and further elaborate on our proposed framework. LoRA has emerged as a widely adopted PEFT method in both LLMs and vision applications, thanks to its strong open-source support and compatibility with quantization and deployment tools. These advantages make LoRA particularly well-suited for the design of Med-LEGO and more accessible to the broader community. We will also include additional discussion of related methods in the final version of the paper. Med-LEGO is fundamentally different from MoE-based approaches, which perform merging during training. Our setting assumes access only to pretrained model weights, aligning with the assumptions of other baselines in our paper. We will highlight this distinction more clearly.
  3. Details of the Merging Process and SVD-Related Concerns (R2, R3-Q1, R3-Q2): We appreciate the request for greater technical clarity and will improve the description of the merging pipeline to enhance readability. Regarding the concern about potential information loss from SVD, our experiments (Tables 1 & 2) demonstrate that SVD-LoRA achieves performance comparable to both standard ViT fine-tuning and conventional LoRA. The impact of truncation is primarily governed by the eigenvalue threshold v, which represents the cumulative percentage of the top-k eigenvalues. We found that setting v to 99.5% (resulting in a low rank r of approximately 8) strikes a favorable balance between accuracy and parameter efficiency. While v is indeed a tunable hyperparameter, due to space limitations, a more comprehensive exploration and ablation study are left for future work.
  4. Task Adaptation Efficiency and Clarification on “Training-Free” (R3-Q3, R4-Q1): Thank you for the helpful suggestions. We will include inference time in the final version. Our framework focuses on merging pretrained experts without requiring access to the original training data or additional retraining, making it extremely lightweight and suitable for rapid deployment. The term “training-free” specifically refers to this post-hoc merging process, and we will clarify this more explicitly.
  5. Deployment and Expert Sharing (R4-Q2, R4-Q3): While using APIs to query individual experts is a practical solution, in clinical settings where data privacy and reliability are critical, a unified offline model is more acceptable. Our approach allows expert models to be represented using SVD-LoRA weights rather than raw datasets, making it easier to share models without compromising patient privacy. This facilitates open collaboration and enables users to assemble custom expert systems suited to their specific needs.
  6. Lack of Evaluation on External Datasets (R4-Q4): Our goal with Med-LEGO is to enable the unification of specialized expert models into generalized models without requiring large-scale pretraining or access to raw data, all in a training-free manner. We evaluated generalization on three additional datasets, achieving very strong performance, as shown in Figure 2. In future work, we plan to include a broader range of datasets for further evaluation.
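
The threshold rule described in item 3 (keep the first k values whose cumulative share reaches v) can be illustrated with a short sketch. It assumes the share is computed on the raw values rather than their squares, which the feedback does not specify; the function name `select_rank` and the toy spectrum are hypothetical and are not taken from the paper.

```python
import numpy as np

def select_rank(values, v=0.995):
    """Smallest k such that the top-k values hold at least a fraction v of the total.

    Assumption: the cumulative share uses the raw values, not their squares.
    """
    s = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    total = s.sum()
    if total <= 0:
        return 0  # nothing to retain for an all-zero spectrum
    cum = np.cumsum(s) / total
    # Index of the first entry with cum >= v, converted to a count
    k = int(np.searchsorted(cum, v)) + 1
    return min(k, len(s))  # guards against round-off when v is close to 1

# Toy spectrum: cumulative shares are 0.5, 0.8, 0.9, 0.95, 1.0
spectrum = [5.0, 3.0, 1.0, 0.5, 0.5]
k_loose = select_rank(spectrum, v=0.6)    # 2
k_mid = select_rank(spectrum, v=0.85)     # 3
k_strict = select_rank(spectrum, v=0.99)  # 5
```

Sweeping v over a small grid with a function like this, and recording the resulting rank next to task accuracy, is one way the sensitivity analysis requested by the reviewers could be organized.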




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The paper proposes SVD-LoRA, an enhancement to LoRA using singular value decomposition for efficient low-rank adaptation in medical vision-language tasks, and introduces Med-LEGO as a training-free evaluation benchmark. Reviewers highlight the technical soundness, novelty, and practical relevance of the proposed method, especially its modularity and improved parameter efficiency. While the paper would benefit from more detailed analysis and better reproducibility documentation, these issues are minor. Given its originality and strong contribution to the field, I recommend early acceptance.


