Abstract
Accurate lymph node (LN) segmentation is critical for radiotherapy treatment and prognosis analysis, but is limited by the need for large annotated datasets. While deep learning-based segmentation foundation models show potential for developing high-performing models from fewer samples, their medical adaptation is hindered by a lack of LN-specific priors and by inefficient few-shot fine-tuning for complex clinical practices, highlighting the necessity of an LN segmentation foundation model. In this work, we annotated 36,106 visible LNs from 3,346 publicly available head-and-neck CT scans to establish a robust LN segmentation model (nnUNetv2). Building on this, we propose Dynamic Gradient Sparsification Training (DGST), a few-shot fine-tuning approach that preserves foundational knowledge while dynamically updating the most critical parameters of the LN segmentation model with few annotations. We validate it on two publicly available LN segmentation datasets: SegRap2023 and LNQ2023. The results show that DGST outperforms existing few-shot fine-tuning methods, achieving satisfactory performance with limited labeled data. We release the dataset, models, and all implementations to facilitate relevant research: https://github.com/HiLab-git/LN-Seg-FM.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0605_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/HiLab-git/LN-Seg-FM
Link to the Dataset(s)
N/A
BibTeX
@InProceedings{LuoZih_Dynamic_MICCAI2025,
author = { Luo, Zihao and Gao, Zijun and Liao, Wenjun and Zhang, Shichuan and Wang, Guotai and Luo, Xiangde},
title = { { Dynamic Gradient Sparsification Training for Few-Shot Fine-tuning of CT Lymph Node Segmentation Foundation Model } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15964},
month = {September},
pages = {164--174}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes a method called Dynamic Gradient Sparsification Training (DGST) for few-shot fine-tuning of a foundation model for CT lymph node segmentation. The method dynamically selects a small subset of model parameters to update during training based on their gradient magnitude, aiming to preserve the stability of the foundation model while adapting to new data with limited annotations.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper studies few-shot lymph node segmentation in CT images, which is a clinically meaningful and practically important problem.
- The authors provide a valuable dataset for the community.
- The experiments include comparisons with several strong parameter-efficient fine-tuning baselines and show good performance of the proposed method.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The methodology section doesn’t clearly state this work’s own contribution. Sparse training of DNNs is not an original idea, and there are existing works (such as SIFT, published at ICML’24). This paper should clearly state the differences from existing methods and what motivates the new design.
- The paper studies few-shot settings, but the experiments use an 8:2 training-test split. This is odd, since few-shot learning should not require so many training examples; more test data should be used to increase the stability of the evaluation.
- Furthermore, the results in Table 1 are highly unstable: the deviation across random runs is significantly larger than the average improvements, so the results are not convincing enough to verify the method’s advantages.
- In the ablation study (Figure 3), does “Full” represent a maximum value of gamma? If so, I’m wondering what the exact value for “Full” is. The figure shows performance dropping when increasing gamma, and the method is outperformed by “Full” when gamma reaches 10. I’m wondering if the method performs even worse with a further increased gamma.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While I appreciate the contribution of releasing a large annotated lymph node dataset, the paper lacks methodological novelty and does not clearly explain how it differs from existing sparse fine-tuning methods like SIFT. The experimental setup is not ideal for few-shot learning, and the reported results are too unstable to support the claimed advantages. These concerns limit the strength of the paper and need to be addressed in rebuttal.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Authors’ rebuttal addressed most of my concerns.
Review #2
- Please describe the contribution of the paper
The paper proposes Dynamic Gradient Sparsification Training (DGST), a novel few-shot fine-tuning method for lymph node (LN) segmentation. By dynamically updating only key parameters based on gradients, DGST effectively balances model stability and adaptability. Experiments on two downstream tasks demonstrate that DGST outperforms existing methods. The authors also release a large, annotated dataset of lymph nodes.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The writing is clear. This study introduces DGST, a novel few-shot fine-tuning strategy that selectively updates the most critical parameters based on gradient information at each training iteration. Furthermore, it presents the first large-scale annotation of 36,106 visible lymph nodes on publicly available head-and-neck CT scans, thereby establishing a representative foundation model for lymph node segmentation.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Although the efficiency advantages of DGST are emphasized, no detailed comparison of time cost or resource usage (e.g., GPU memory and computational complexity) is provided, leaving its “resource-friendly” nature merely theoretical. To strengthen the contribution, the authors are encouraged to include more details about the dataset’s annotation process, such as statistical summaries and visualization examples, to better demonstrate its originality.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Although the method demonstrates certain improvements, its contributions remain marginal and lack substantial novelty. Furthermore, some important details are missing.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
I have carefully reviewed the authors’ responses as well as the comments from all reviewers. As Reviewer #1 pointed out, the downstream datasets used in the paper, SegRap2023 and LNQ2023, have relatively small sample sizes. Furthermore, the authors adopt an 80:20 split between training and testing data. The authors claim that the large deviations mainly stem from inter-patient heterogeneity. However, it remains unclear how they can demonstrate the stability of the evaluation results if this factor is controlled. I believe the authors should revise the experimental setup accordingly and report statistical measures of uncertainty, such as p-values. Moreover, the contribution of the paper appears limited. The authors acknowledge that the differences in GPU memory usage and computational complexity are minimal, and attribute this to the use of the nnUNet framework rather than to the proposed method, an explanation I do not agree with. In addition, the training time is not particularly efficient either. These points collectively fail to support the claimed efficiency advantages of the proposed DGST method. In conclusion, I maintain my recommendation to reject.
Review #3
- Please describe the contribution of the paper
The paper proposes Dynamic Gradient Sparsification Training (DGST), a few-shot fine-tuning method that dynamically (per iteration) updates high-gradient parameters to balance stability and adaptability in segmentation foundation models. Additionally, the authors annotated 36,106 lymph nodes across 3,346 CT scans and commit to releasing this dataset upon acceptance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- DGST: Introduces per-iteration dynamic parameter selection strategy, where only the highest-gradient parameters within each convolutional kernel are updated.
- Creation of a Large, Task-Specific Annotated Dataset: The authors annotated 36,106 visible lymph nodes across 3,346 CT scans from the RADCURE dataset.
- Demonstrated Clinical Relevance and Transferability: The foundation model and DGST are validated on two public datasets—SegRap2023 (in-domain) and LNQ2023 (cross-domain)—showing strong transferability and potential for clinical deployment with minimal annotation effort.
- Strong Experimental Evaluation in Realistic Few-Shot Scenarios: Comprehensive comparisons with full fine-tuning and multiple PEFT methods across multiple shot counts, including ablation studies and sensitivity analysis, convincingly demonstrate the effectiveness and efficiency of DGST.
- The authors promise to release the full dataset and implementation.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
This work does not exhibit any major weaknesses. However, the dataset annotation process is insufficiently described, with no details on annotation protocol, expertise involved, or inter-rater validation.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper proposes a “per-iteration” gradient-based sparsification training strategy (building on prior work by Zhang et al.). It is well-validated on two open-access datasets and strengthened by the planned release of a large, clinically valuable annotated lymph node dataset, though details of the annotation process are not provided.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all reviewers for their valuable and insightful reviews. They described our work as “clinically meaningful” (R1&3) with “comprehensive experiments” (R1&3) and improvements over baselines (R1&2&3), and highlighted our contributions of introducing a novel few-shot fine-tuning method (R2) and releasing a large, valuable annotated LN dataset (R1&2&3). Here we address their main concerns:
- Clarification of our contributions: Our contribution is two-fold:
- We have annotated, and prepared for open-source release, 36,106 visible LN annotations from 3,346 publicly available HN CT scans.
- We have proposed DGST, a method specifically designed for UNet-like architectures, to address the challenges of few-shot learning in medical image segmentation.
- Novelty of DGST (R1): While sparse training of DNNs is not new, task-specific medical models must adapt to highly heterogeneous clinical scenarios, where the critical parameters vary across cases. Static methods, such as SIFT and GPS by Zhang et al. [25], are inadequate in this context: they select parameters statically from the first batch or from accumulated gradients, assessing the criticality of model parameters globally, which may fail to capture all patient- or disease-specific update requirements. While the SGST in Table 2 is similar to these approaches and improves upon random sparsity, it may not consistently achieve optimal performance under significant task variations. DGST addresses this with dynamic per-kernel sparse training, adaptively updating the parameters with the highest gradient magnitudes in every iteration, effectively tackling the unique challenges of few-shot, task-specific settings.
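For concreteness, a minimal PyTorch-style sketch of this per-kernel dynamic selection (the function name, the per-output-kernel granularity, and the handling of biases below are our assumptions for illustration, not the paper's verified implementation):

import torch

@torch.no_grad()
def sparsify_gradients(model: torch.nn.Module, gamma: int) -> None:
    # Keep only the gamma largest-magnitude gradient entries per output
    # kernel and zero the rest; called every iteration, so the retained
    # parameter set changes dynamically with each batch.
    for p in model.parameters():
        if p.grad is None or p.dim() < 2:
            continue  # assumption: biases and norm parameters stay dense
        g = p.grad.view(p.shape[0], -1)                 # one row per kernel
        k = min(gamma, g.shape[1])
        cutoff = g.abs().topk(k, dim=1).values[:, -1:]  # k-th largest |grad|
        g.mul_((g.abs() >= cutoff).to(g.dtype))         # zero the rest

# Usage inside a standard fine-tuning loop (sketch):
#   loss.backward()
#   sparsify_gradients(model, gamma=10)
#   optimizer.step(); optimizer.zero_grad()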
- Few-shot Setup (R1): We agree that using a larger test set would improve evaluation robustness. However, we used the 8:2 train-test split to establish an upper-bound baseline, “All-shot”, with a sufficiently large training set, enabling a thorough assessment of the foundation model and the DGST method in reducing annotation requirements.
- Clarification of the Variance Source (R1): We appreciate the reviewer’s feedback. The large deviations reported in Table 1 are due to inter-patient heterogeneity, which is an inherent characteristic of LN segmentation in clinical practice (e.g., the LNQ challenge [3] reported a Dice IQR of 15.2–34%), rather than experimental instability. The observed average improvements provide strong evidence of the DGST method’s effectiveness.
- Explanation of the Gamma Analysis (R1): We appreciate the reviewer’s question. “Full” can be interpreted as the maximum gamma value, but its actual value varies across the different stages of the UNet-like network due to differences in the number of trainable parameters at each layer. Our findings show that as gamma increases, the DGST update strategy converges to that of “Full”, making “Full” effectively the lower bound of DGST’s performance.
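As a concrete illustration of this large-gamma limit (the kernel size below is a hypothetical example, not a value from the paper):

# Hypothetical 3D convolution kernel: 64 input channels, 3x3x3 extent,
# i.e., 1,728 weights per output kernel.
kernel_numel = 64 * 3 * 3 * 3
for gamma in (1, 10, 100, kernel_numel):
    frac = min(gamma, kernel_numel) / kernel_numel
    print(f"gamma={gamma:>5}: {frac:.2%} of each kernel updated")
# Once gamma >= kernel_numel, the top-k cutoff keeps every weight, so the
# update rule coincides with full fine-tuning ("Full").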
- Efficiency (R2): We appreciate the reviewer’s comment. Since our foundation model uses nnUNet, the differences in GPU memory usage and computational complexity across fine-tuning methods are minimal. While resource usage is important, these minimal differences (6.3–7.4 GB of VRAM) made them less relevant for comparison in our study. Therefore, we focused on training time as the primary efficiency metric in Table 2.
- Annotation Details (R2&3): The data annotation process involved two oncologists, one with 20 years of experience and one with 10 years of experience. The oncologists selected about 3k CT volumes with LN annotations from the landing system for training nnUNetv2, which generated predictions for the RADCURE dataset. The oncologists then confirmed and corrected these predictions using MIM software, and the corrected data were used to re-train the model, improving its performance. This cycle of annotation, prediction, correction, and re-training refined both the model and the dataset. Due to the space limit, more details will be provided upon the dataset’s open-source release.
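A schematic of this iterative model-in-the-loop annotation cycle (all names below are placeholders for illustration; the actual pipeline used nnUNetv2 for prediction and MIM software for correction):

from typing import Callable, List, Sequence

def model_in_the_loop(
    initial_labeled: List,   # ~3k clinician-annotated CT volumes
    unlabeled: Sequence,     # remaining RADCURE scans
    train: Callable,         # fits a segmentation model (e.g., nnUNetv2)
    review: Callable,        # oncologists confirm/correct predictions (MIM)
    n_rounds: int = 3,       # number of refinement cycles (our assumption)
) -> List:
    # Annotate -> predict -> correct -> re-train, refining model and dataset.
    labeled = list(initial_labeled)
    for _ in range(n_rounds):
        model = train(labeled)                        # re-train on labels
        preds = [model(scan) for scan in unlabeled]   # pseudo-labels
        labeled += review(preds)                      # verified corrections
    return labeled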
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
After the rebuttal process, this paper has still received mixed scores. After reading the paper, the reviews, and the authors’ rebuttal, I feel that the paper is borderline, but it has enough merit to be accepted. In particular, providing a public dataset for training foundation (or other) models for this specific task is a valuable contribution. Furthermore, the differences with respect to prior work seem sufficient to me to grant it technical novelty. Thus, based on these reasons, I recommend its acceptance. Nevertheless, I strongly encourage the authors to consider some of the constructive feedback provided by the reviewers in the camera-ready version (e.g., the few-shot setting is unclear, even after the rebuttal process).
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper proposed a Dynamic Gradient Sparsification method for few-shot fine-tuning of a segmentation model trained on 36K head-and-neck LN annotations. After reading the paper, the reviews, and the rebuttal, here are the concerns that support my recommendation.
- The experimental setting is unclear. For example, how are the few-shot examples selected? Selecting different examples will impact the results of the models.
- The whole idea of pre-training on a large amount of labeled data and then few-shot fine-tuning is simple. A comparison with more unsupervised/self-supervised learning methods would help strengthen the paper.