Abstract

Pixel-level dense labeling is both resource-intensive and time-consuming, whereas weak labels such as scribbles present a more feasible alternative to full annotations. However, training segmentation networks with weak supervision from scribbles remains challenging. Inspired by the fact that different segmentation tasks can be correlated with each other, we introduce a new approach to few-scribble supervised segmentation based on model parameter interpolation, termed ModelMix. Leveraging the prior knowledge that linearly interpolating convolution kernels and bias terms should result in linear interpolations of the corresponding feature vectors, ModelMix constructs virtual models using convex combinations of convolutional parameters from separate encoders. We then regularize the model set to minimize vicinal risk across tasks in both unsupervised and scribble-supervised manners. Validated on three open datasets, i.e., ACDC, MSCMRseg, and MyoPS, our few-scribble guided ModelMix significantly surpasses the performance of state-of-the-art scribble-supervised methods. Our code is available at https://github.com/BWGZK/ModelMix.
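To make the parameter-mixing idea concrete, here is a minimal PyTorch sketch of constructing a virtual encoder by convexly combining the kernel and bias of one randomly chosen convolutional layer from two same-architecture encoders, as described in the abstract and the author feedback below. The function name `mix_conv_layer` and the Beta-sampled coefficient are illustrative assumptions, not the authors' released implementation:

```python
import copy
import random

import torch
import torch.nn as nn


def mix_conv_layer(encoder_a: nn.Module, encoder_b: nn.Module, lam: float) -> nn.Module:
    """Build a virtual encoder whose single, randomly chosen conv layer is a
    convex combination of the corresponding layers of two same-architecture
    encoders (an illustrative sketch of the ModelMix idea)."""
    virtual = copy.deepcopy(encoder_a)
    # Pair up corresponding conv layers across the three networks.
    triples = [
        (a, b, v)
        for a, b, v in zip(encoder_a.modules(), encoder_b.modules(), virtual.modules())
        if isinstance(a, nn.Conv2d)
    ]
    a, b, v = random.choice(triples)  # mix one randomly selected layer
    with torch.no_grad():
        v.weight.copy_(lam * a.weight + (1.0 - lam) * b.weight)
        if v.bias is not None:
            v.bias.copy_(lam * a.bias + (1.0 - lam) * b.bias)
    return virtual


# Usage sketch: lam could be sampled from a Beta distribution, as in mixup.
# lam = torch.distributions.Beta(1.0, 1.0).sample().item()
# virtual_encoder = mix_conv_layer(encoder_task1, encoder_task2, lam)
```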

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2265_paper.pdf

SharedIt Link: https://rdcu.be/dV51A

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72114-4_44

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2265_supp.pdf

Link to the Code Repository

https://github.com/BWGZK/ModelMix/

Link to the Dataset(s)

https://github.com/BWGZK/ModelMix/tree/main/ACDC_dataset

https://github.com/BWGZK/ModelMix/tree/main/MyoPS_scribbles

https://github.com/BWGZK/CycleMix/tree/main/MSCMR_dataset



BibTex

@InProceedings{Zha_ModelMix_MICCAI2024,
        author = { Zhang, Ke and Patel, Vishal M.},
        title = { { ModelMix: A New Model-Mixup Strategy to Minimize Vicinal Risk across Tasks for Few-scribble based Cardiac Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {456 -- 466}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a mixup model that shares encoding parameters between different segmentation tasks using the same network architecture. For each task, a virtual encoder is created by averaging convolutional parameters between encoders. At the training stage, the model is supervised according to the outputs of both the original model and the virtual model, complemented by the ground truth mask and scribble annotation. A regularization is added to minimize the difference between these outputs.
    The idea aims at sharing, across different targets, knowledge learned from similar data. It could improve the segmentation performance of a single task by combining the training of different tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The experimental design is well-structured. The authors conducted thorough ablation studies to validate the importance of each added mixup component, and the experiments were based on four widely used open-source databases.
    • The mixup strategy is novel to me for the cardiac MRI segmentation task, and the design is rather intuitive: the convolutional parameters of different models are shared to gather the knowledge learned from different tasks.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The explanation of the loss functions can sometimes be difficult to follow, e.g., lacking explanations for some variables in the formulas (e.g., “v” in Eq. 3) or for some terms (e.g., the expressions “vicinal distribution” and “vicinal risk”).
    • Many design details are based on empirical assumptions, e.g., the linear interpolation of the convolution parameters between encoders.
    • The performance comparison between the proposal and the baseline methods seems unfair: the proposed method, which was trained on multiple datasets, was compared to models trained on a single dataset. I would like the authors’ feedback in the rebuttal in case I have misunderstood the design of the baseline comparisons.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The employed databases are well-known and open-source, with partially complementary in-house annotations. The code is not planned to be released, and the implementation lacks some detailed description.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Fig. 1: The framework should be briefly illustrated in the caption, e.g., the target object of each task. Also, the font in the figure should be slightly enlarged; “Conv layer” should be “Conv layers” according to the description in Method.
    • Page 2: For “The proposed ModelMix is composed of three components”, I would suggest “stages” instead of “components”.
    • Eq. 1: “c1”, “c2”, and “Lcos” should be clarified. I suppose they are equivalent to “Ic ⊙ x” and the cosine similarity.
    • Section 2.3: The supervision of the scribble is not clear to me. Is the ground truth of the scribble denoted as “y” in Eqs. 7 and 8?
    • Tab. 1: I find nowhere the term “task combination” but two columns “MyoPS”. Therefore I am confused when reading the section “Combination of Tasks” and looking up Tab. 1.
    • Typos: page 3 “the obtained results is represented”; page 5 formatting error “For all datasets, we train the models using five randomly selected scribble-annotated volumes…”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is novel and its results seem interesting compared to baseline methods. However, my concern about the baseline comparison stands and awaits answers during the rebuttal, and the method description can be improved.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper mainly focuses on medical image segmentation supervised by few scribbles. Excellent segmentation results are obtained with the proposed ModelMix architecture, which linearly interpolates randomly selected convolution kernels and bias terms in the encoders of different models and minimizes the vicinal risk across tasks, so that the segmentation results can be effectively improved. Experiments on different datasets demonstrate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The author explored the few-scribble supervised segmentation task and innovatively proposed a method of mixing model parameters between tasks, which is very interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The fairness of experimental comparisons needs more explanation.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The method is clearly described, so the paper should be reproducible in terms of implementation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. The proposed method builds the network architecture by considering the connections between different tasks. However, do the other scribble-supervised methods compared in the quantitative experiments also introduce other tasks? If not, introducing other tasks by itself can improve the generalization of the network. Wouldn’t this be a bit unfair?
    2. The authors could give a clearer explanation of c1 and c2 in Eq. (1).
    3. “MyoPS and MyoPS” on page 6 should be “MyoPS and MSCMR” and should be corrected.
    4. The Dice index in Table 3 is better when only the ACDC segmentation task is introduced than when the ACDC+MSCMR segmentation tasks are introduced. Can the authors explain and analyze the reasons for this phenomenon?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the strengths (5), weaknesses (6), and comments (10) for justification.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors introduce a new approach for scribble-supervised segmentation named ModelMix, which constructs virtual models using convex combinations of convolutional parameters from separate encoders. In addition, the authors regularize the model set to minimize vicinal risk across tasks in both unsupervised and scribble-supervised manners.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well organized and easy to follow.
    2. The concept of ModelMix presents a novel perspective to utilize the feature information in scribble segmentation.
    3. The evaluation shows promising effects of this pipeline in scribble supervised segmentation.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors didn’t fully prove their assumption in the experiments. Only one convolutional layer is “mixed” with the corresponding layer in the other encoder, which is insufficient to demonstrate the theory. What would happen if more, or even all, of the layers were mixed? Does the effect of mixing a layer surpass that of blocking it (Dropout)?
    2. The authors provide the results of 5-scribble supervised segmentation on MSCMRseg, where CycleMix only achieves an average Dice of 0.315, whereas it achieves an average Dice of 0.80 on the 35-scribble supervised segmentation task (https://arxiv.org/abs/2203.01475). A consistent comparison or a fair explanation is needed to demonstrate the pros and cons of ModelMix.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors are encouraged to release the source code.
    2. Mixing only one convolutional layer seems odd in this setting. More explanation and experiments are required.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The idea of mixing models is elegant and novel.
    2. The experiments demonstrate excellent performance of this framework.
    3. More experiments are encouraged to comprehensively evaluate the value of ModelMix.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the constructive and thoughtful comments from our reviewers. Below are our detailed responses.

  1. Method concerns

R1, Explanations for variables (e.g., “v” in formula 3) and for some terms (vicinal distribution, vicinal risk). We are sorry for the unclear statement. v represents a vicinity distribution, measuring the probability of finding the virtual model f_ij in proximity to the existing models f_i and f_j. The empirical risk denotes the average loss across model samples; similarly, the empirical vicinal risk denotes the average loss across the constructed vicinal model samples. We will clarify this in the revision.
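For readers unfamiliar with the term, the empirical vicinal risk can be written as follows, by analogy with mixup-style vicinal risk minimization. This is a hedged reconstruction from the definitions above; Eq. 3 in the paper may differ in notation:

```latex
% Empirical vicinal risk over m samples, with the virtual model f_{ij}
% drawn from the vicinity distribution v around existing models f_i, f_j.
\hat{R}_{\mathrm{vic}}(f) \;=\; \frac{1}{m}\sum_{k=1}^{m}
    \ell\bigl(f_{ij}(x_k),\, y_k\bigr),
\qquad f_{ij} \sim v\!\left(f \mid f_i, f_j\right),
% e.g. realized by a convex combination of parameters:
% \theta_{ij} = \lambda\,\theta_i + (1-\lambda)\,\theta_j, \quad \lambda \in [0,1].
```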

R1, Many design details are based on empirical assumptions, e.g., the linear interpolation of the convolution parameters between encoders. Thanks. Our method is designed based on the prior knowledge that linearly interpolating convolution parameters results in linear interpolation of the corresponding features. The effectiveness of this approach is further validated through our ablation study.

R3, Only mixing one convolution layer is weird in the setting. More explanations and experiments are required. Thanks. In this study, we randomly mix one convolutional layer, leading to linear interpolation among the generated features while ensuring stable training. We acknowledge that mixing more convolutional layers could be a viable approach and leave it as future work.

  2. Experiment concerns:

R1, R2, The performance comparison between the proposal and the baseline methods seems unfair: the proposed method, which was trained on multiple datasets, was compared to models trained on a single dataset.

Thanks. For a fair comparison, we tried training the compared methods on multiple datasets, using different decoders for each task while sharing a common encoder. However, we discovered that even when trained on multiple datasets, the performance is inferior to training on each dataset separately (Table 1, model #5 vs model #2). This performance gap occurs because limited training supervision makes it challenging for a single encoder to generalize across diverse tasks. Therefore, we opted to compare our proposed method with weakly supervised methods trained separately on different datasets.
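For concreteness, the multi-dataset baseline variant described above might look like the following sketch. The class name `SharedEncoderMultiTask` and its layout are our illustration under the stated assumptions (one shared encoder, one decoder per task), not the authors' code:

```python
import torch
import torch.nn as nn


class SharedEncoderMultiTask(nn.Module):
    """Baseline sketch: one encoder shared across datasets, with a separate
    decoder head per segmentation task (hypothetical layout)."""

    def __init__(self, encoder: nn.Module, decoders: dict):
        super().__init__()
        self.encoder = encoder
        self.decoders = nn.ModuleDict(decoders)

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        # Route the shared features through the task-specific decoder.
        return self.decoders[task](self.encoder(x))


# Usage sketch, one forward pass per task-specific batch:
# model = SharedEncoderMultiTask(encoder, {"ACDC": dec_acdc, "MSCMR": dec_mscmr})
# logits = model(images, task="ACDC")
```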

R2, Why is the Dice in Table 3 better when only the MSCMR segmentation task is introduced than when ACDC+MSCMR segmentation is introduced?

Thanks. This discrepancy arises because images from MSCMR and MyoPS exhibit enhanced pathological information, whereas ACDC primarily contains structural information. Consequently, the pathology complementarity between MyoPS and MSCMR is more significant than that between MyoPS and ACDC. With the introduction of ACDC, ModelMix can assimilate robust shape priors, leading to improved HD performance, albeit with a slight decrease in the Dice metric.

R1, R2, R3, Source code. Thanks. The code will be released upon acceptance.

  3. Clarity:

R1, The framework should be briefly illustrated in the caption. Also, the font in the figure should be slightly enlarged. Suggest “stages” instead of “components”.

Thanks. We will revise it in the manuscript.

R1,R2, The “c1”, “c2”, “Lcos” should be clarified. I suppose that it is equivalent to “Ic ⊙ x” and the cosine similarity.

We are sorry for the unclear statement. c_1 and c_2 refer to I_c1 ⊙ x and I_c2 ⊙ x, respectively. Lcos denotes the cosine similarity loss. We will clarify this in the manuscript.
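A common form of such a cosine-similarity loss between two feature vectors is shown below. This is our assumption of the standard definition; the paper's exact form may differ, e.g., by a sign or offset:

```latex
% Standard cosine-similarity loss between feature vectors u and w.
\mathcal{L}_{\cos}(u, w) \;=\; 1 - \frac{\langle u, w \rangle}{\lVert u \rVert_2 \, \lVert w \rVert_2}
```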

R1, Is the ground truth of the scribble denoted as “y” in Eqs. 7 and 8? Yes, we will clarify this in the manuscript.

R1, For Table 1, the task combination is confusing. We are sorry for the confusing statement. Models with the vicinal loss or a shared encoder utilize task combination. For instance, models #1 and #2 apply PCE and the invariant loss to separate tasks, while models #3, #4, and #5 in the upper and lower parts of the table report the results of MyoPS+MSCMR and MyoPS+ACDC, respectively.

R1, R2, Typos. Thanks, we will revise them.




Meta-Review

Meta-review not available; early accepted paper.


