Abstract

Medical image segmentation is essential for identifying lesion regions and diagnosing disease. Convolutional neural networks (CNNs) and transformer-based models often struggle to effectively capture both local details and global contextual features in medical images, leading to a decline in segmentation performance. To address this problem, a novel medical image segmentation model, KMUNet, is proposed by integrating Kolmogorov-Arnold networks (KAN) and Mamba based on the traditional U-shape architecture. This model employs a CNN-based encoder to extract local features and integrates a State Space Model-based Mamba module in the decoder to capture long-range dependencies. Initially, a global downsampling module, called KAN-PatchEmbed, is presented. Unlike traditional convolutional operations, this module utilizes an interval sampling strategy to alleviate the loss of feature information and employs KAN to reduce computational complexity. Furthermore, the Kolmogorov-Arnold Spatial-Channel Attention module is designed for skip connections, where KAN is employed to allocate the weight of the current channel by aggregating features across all stages. Finally, the proposed model was evaluated on three publicly available datasets. Experimental results reveal that KMUNet outperforms other models in segmentation tasks and produces more visually appealing segmentation results. Our code is available at https://github.com/zhang-hongsheng/KMUNet.
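The interval-sampling idea behind KAN-PatchEmbed can be illustrated with a toy example. The sketch below (plain NumPy; the image size and the exact sampling pattern are assumptions for illustration, not the authors' implementation) contrasts contiguous-block patching, as in ViT-style PatchEmbed, with interval (strided) sampling, where each patch gathers pixels spread across the whole image:

```python
import numpy as np

# Toy 8x8 "image" with distinct pixel values.
x = np.arange(64, dtype=np.float32).reshape(8, 8)

# ViT-style patching: each patch is a contiguous 4x4 block of pixels,
# so spatial relationships across patch borders are discarded.
block_patches = x.reshape(2, 4, 2, 4).transpose(0, 2, 1, 3).reshape(4, 16)

# Interval sampling: each sub-grid x[i::2, j::2] keeps pixels spread
# over the entire image, so every "patch" still reflects global layout.
interval_patches = np.stack(
    [x[i::2, j::2].ravel() for i in range(2) for j in range(2)]
)

print(block_patches.shape, interval_patches.shape)  # (4, 16) (4, 16)
```

Both variants produce the same number of values; they differ only in which pixels end up grouped together, which is the property the abstract appeals to.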

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1883_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/zhang-hongsheng/KMUNet

Link to the Dataset(s)

CVC-ClinicDB dataset: https://www.kaggle.com/datasets/balraj98/cvcclinicdb/data

Glas dataset: https://www.kaggle.com/datasets/sani84/glasmiccai2015-gland-segmentation

BUSI dataset: https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset

BibTex

@InProceedings{ZhaHon_KMUNet_MICCAI2025,
        author = { Zhang, Hongsheng and Duan, Yuting and Liu, Ting and Zhang, Weifeng and Tang, Hongzhong},
        title = { { KMUNet: A novel medical image segmentation model based on KAN and Mamba } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15969},
        month = {September},
        pages = {297 -- 306}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work presents a novel medical image segmentation model, KMUNet, by integrating Kolmogorov-Arnold networks (KAN) and Mamba based on the traditional U-shape architecture. The proposed model was evaluated on three publicly available datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method has some novelty.
    2. The comparison experiments are extensive.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Table 1 is very crowded. Please underscore the second best results for a clear comparison.
    2. As shown in Table 1, the improvement upon the state-of-the-art is marginal. Taking the Dice for example, the improvement is very small, e.g., 0.9384 vs. 0.9371 on CVC-ClinicDB, 0.9485 vs. 0.9465 on Glas, and 0.7731 vs. 0.7677 on BUSI.
    3. Please also compare with other methods w.r.t. speed and the number of parameters.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Marginal improvements upon the state-of-the-art.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    In the paper, a method for medical image segmentation is presented. The method is based on the combination of a Kolmogorov–Arnold Network and a Spatial-Channel Attention module. Many comparisons with well-known segmentation networks have been conducted, and the results are interesting.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A very interesting KAN-PatchEmbed structure has been created, used for downsampling in the encoder part of the UNet structure. KAN-PatchEmbed aims to reduce the loss of feature information and the computational cost of the model. The Spatial-Channel Attention module (SCA) is an attention module that combines channel and spatial information.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The main weakness of the paper is the lack of reproducibility, due to:

    • malfunction of the given link for the code (https://anonymous.4open.science/r/KMUNet-5D2E) at the time of writing this review
    • poor description of the datasets, in particular of the BUSI dataset
    • no specification of the network training: how were the datasets split for the experiments?
    • no critical discussion of the obtained results.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the method uses a very interesting approach and the results are promising, the paper unfortunately lacks strength in the experimental part, above all regarding the reproducibility of the model.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    All my comments have been discussed and addressed. I do not find other issues, so the paper can be accepted in its current version.



Review #3

  • Please describe the contribution of the paper

    Authors propose a novel neural network architecture for medical image segmentation. The architecture, which authors call KMUNet, appears to be derived from MALUNet (Ruan et al., 2022), which itself is a descendant from the U-Net family (Ronneberger et al., 2015). Major contributions by the authors are the integration of both Kolmogorov-Arnold network (KAN) (Liu et al., 2024/2025) and Mamba components into their architecture. The former (KANs) provide a recent alternative to a multi-layer perceptron setup by replacing the conventional combination of learnable edge weights and fixed activations on the nodes of a network by learnable spline activations on the network edges, followed by plain summation on the nodes.

    Authors integrate Mamba components as the blocks/layers in the upsampling/decoding part of their network. Authors achieve KAN integration (a) in their initial patch embedding step (which otherwise is similar to vision transformer’s patch embedding) and (b) in what they call a KAN spatial-channel attention block (KAN-SCA) (which replaces the blocks called SAB/CAB on the skip connections of MALUNet).

    To demonstrate the performance of a model with the proposed architecture, they train and evaluate it on three public medical image segmentation datasets and compare their results to those of 15 other models.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    With the presented architecture, which in itself shows considerable innovation, authors manage to demonstrate superior performance in what I would consider a clinically relevant and method-wise highly competitive field (namely, that of medical image segmentation). While their presentation and experiments, in my opinion, in many ways could have been improved (see “weaknesses” below), authors enable others to form their own opinion by (a) choosing publicly available datasets for evaluation and (b) making their source code publicly available.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    In my opinion, authors could have done a better job in trying to convince their readers that the choice of KANs and Mamba within their setup is (a) reasonable, (b) indeed responsible for the overall superior performance of their model, compared to competitors.

    As to (a):

    • On page 2, authors write “we utilized KAN to enhance the interpretability of the model” – but where is this enhanced interpretability shown (by explanation, by experiment, or both)? Likewise, where do authors show what they claim in the last bullet point on page 2 (“enhances multi-scale local features”, “integrates multi-stage global contextual information”, “capture global contextual relationships, thereby allocating weights to feature maps”)?
    • On page 3, authors write that the patch embedding procedure of vision transformer (ViT) “may result in irreversible loss of spatial information” and then propose what they call KAN-PatchEmbed as an alternative. However: what is the explanation of the authors that their embedding procedure does not suffer from such loss of information? I could not find any such explanation and it does not appear as immediately intuitive to me. Moreover: from an information-theoretical perspective, what should even be the theoretical grounding of such an argument? Assuming that both patch embedding approaches (the one of ViT and KAN-PatchEmbed, that is) embed patches into the same number of parameters and furthermore assuming that ViT’s patch embedding does not “waste” parameters in some sense (which I don’t see is the case), how should Kan-PatchEmbed manage to retain/encode “more” or “better” information in some sense? This, in my opinion, would have needed a very solid theoretical argument at best, or a corresponding experimental comparison at least.
    • On page 5, authors write “The novelty of KAN-SCA is its initial integration of KAN … to improve the interpretability”. Again, where and how is this asserted improved interpretability theoretically explained and/or experimentally shown?

    As to (b): In my opinion, authors convincingly show by experiment (Sect. 3.3) that the proposed architecture indeed performs well in the chosen experimental setting. Likewise, I think authors chose the experimental setting very appropriately, given the large variety in the clinical settings of the chosen datasets and given the fact that the datasets are publicly available. However, I do not see why the reason for the demonstrated superiority of their results should lie in the particular choice of architecture. What I mean to say is:

    • First, could it be that the network would have performed equally well if MLPs had not been replaced by KANs and/or convolutional blocks had not been replaced by Mamba blocks? For future works, if authors consider proposing architectural innovations, I would highly suggest the integration of an ablation study in the corresponding work to clearly demonstrate the benefit of each individual proposed component.
    • Second, could it be that the superior performance, as compared to competitors, lies in a different number of network parameters or in a choice of hyperparameters (learning rate etc.) that favors the authors’ method? For example, I find the consistently bad overall performance of Att-UNet particularly striking. So how did authors ensure a fair comparison? Unfortunately, such detail is not given in the experiment description. I would suggest authors extend their description accordingly (also see below). I would also suggest authors add training scripts, including all necessary parameterizations for all analyzed nets, to their shared codebase (where I commend the authors for making it already available during the review process), so that others can reproduce their results and form an opinion on them in this regard.

    Regarding the description of the comparison experiment (Sect. 3.2), essential detail is missing, in my opinion:

    • Did an initial hyperparameter tuning phase take place? If so, it should be described. If not, how were parameters chosen?
    • Were all models trained with the same parameters (learning rate etc.)? If so, how did authors ensure that their choice did not provide an advantage for their model? If not, how were parameters chosen for competitors?
    • Were data augmentations used during training? If so, which?
    • As to “initial learning rate”: was the learning rate decayed? If so, by what schedule?
    • Why were different parameters chosen for the BUSI dataset? (This also relates back to: how were parameters chosen in the first place, see before?)
    • What loss terms were used?
    • What parameter counts did the used models have / were all model sizes in a comparable range?
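On the learning-rate question above: the author feedback later states that a cosine annealing schedule (T_max = max epochs) was used for the BUSI dataset. For reference, the standard schedule, as implemented by PyTorch's CosineAnnealingLR, can be sketched in plain Python (the initial and minimum rates here are assumed example values, not those reported by the authors):

```python
import math

def cosine_annealing_lr(t, t_max, lr_max=1e-3, lr_min=0.0):
    """Cosine annealing from lr_max down to lr_min over t_max epochs,
    matching the formula used by PyTorch's CosineAnnealingLR."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / t_max))

# The rate starts at lr_max, reaches the midpoint halfway through,
# and decays to lr_min at epoch t_max.
print(cosine_annealing_lr(0, 100))    # 0.001
print(cosine_annealing_lr(50, 100))   # 0.0005
print(cosine_annealing_lr(100, 100))  # 0.0
```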
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Formal issues, phrasing, typos, etc:

    • Abstract: (1) “was proposed” → “is proposed”, “was presented” → “is presented” (a past-tense formulation like that appears to me as appropriate for the conclusion at the end of the main text, but not for the abstract, which precedes the main text); (2) “by integrating Kolmogorov-Arnold” – I guess this should be “by integrating Kolmogorov-Arnold networks / network components”; (3) I would put the URL of the codebase completely on one line: at present, copy-pasting it includes the hyphen (“sci-ence”) and thus makes the link invalid.
    • Introduction: (1) Ascribing “promising potential” to U-Net is, in my opinion, a gross understatement, given its currently 100k+ citations on Google Scholar and its continued use and adaptation for a complete decade now. Maybe writing something like “U-Net … established a U-shaped encoder-decoder, which proved to be highly suitable for medical image segmentation tasks” would be more appropriate here. (2) “… to extract contextual features in medical image” – should be “images”, I guess. (3) “l1 Loss” → “l1 loss”.
    • Sect. 2.1: What is meant by “significantly” in “significantly modeling the semantic correlation”? This could be rephrased for clarity, maybe. Or did authors mean to write “significantly better modeling …”?
    • Sect. 2.2: (1) Regarding ViT’s patch embedding approach, it is not immediately clear to me why this should be a 4-d convolution. As I know it, on 2-d images, ViT uses a 2-d convolution (where kernel size matches stride), followed by a linear projection. I guess this could also be rewritten as a 4-d convolution somehow? If so, authors should consider giving more details; or otherwise, authors should consider rephrasing. (2) Variable n below Eq. 3 is not actually used in the equation that authors refer to, but only in the figure that visually explains the equation. I would thus refer to the corresponding figure here (something like: “… and n (see Fig. 1a) is determined by …”).
    • Fig. 1b: I would suggest adding (in the main text) what operations are used to achieve downsampling and upsampling (blue and red arrows), as this is currently missing in the description of the architecture.
    • Sect. 2.3: “For instance, in the fourth-stage KAN-SCA module, we first incorporate …” – I do not understand what is meant by “for instance” here. What is this an instance/example for? Also, does this somehow deviate from what is done for the remaining KAN-SCA modules? A reformulation for clarity should be considered.
    • Sect. 3.3 and Table 1: With the given numbers, the proposed KMUNet does not have the best sensitivity with CVC-ClinicDB – MHorUNet has the same performance; Roll-Unet, ACC-UNet, and SCR-Net all perform better. The presented numbers should be double-checked, and, if correct, the highlighting in Table 1 as well as the summary in the main text on the CVC-ClinicDB dataset should be adjusted.
    • Figures 2–4: Authors should consider adding how they selected the examples for visualizing the segmentation results. For example, are random samples shown, or are samples selected to highlight specific aspects of the segmentation performance? In either case, a description of the selection process would help me as a reader to put the shown results into perspective.
    • Figure 3: In Image_3 (red arrow), authors rightly point out the oversegmentation (false positives) exhibited by some methods as compared to ground truth. What they fail to point out, however, is the undersegmentation (false negatives) of their own method in the same region. For a fair experimental comparison, this should also be explicitly mentioned in my opinion. Moreover, I wonder whether the latter (undersegmentation) is not more critical than the former (oversegmentation), especially when it comes to tracing pathological regions? I think this would be worth discussing as well in this context.
    • End of Sect. 3.3: “Fortunately, our KMUNet outperforms other models …” – I would suggest leaving it to the reader to judge whether the described finding is fortunate or not. I would thus suggest dropping the judgmental term “fortunately” (and, if necessary, replace it by a more neutral term like “on the other hand”, “in this context”, etc.).
    • References: (1) In my opinion, references should be ordered either temporally (that is to say, by order of appearance in the text) or alphabetically (by name of the first author); at present, I cannot find any obvious order. (2) I do not think the current template allows abbreviating the authors’ lists of the references, as is currently done (“… &”); authors should consider giving the full authors’ lists for all references. (3) Ref. 19: “uan, J.” should be “Ruan, J.”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I find the authors’ assessment of the U-net architecture refreshing: authors show that integrating more recent developments can still improve the performance of a model that can otherwise be considered as “having stood the test of time”. What I mainly criticize (and why I do not give a stronger rating) is the way by which authors show (or rather, not show) the impact of their U-Net adjustments. Then again, on the plus side, both their source code and the experimental data are available publicly, enabling others to judge said impact by themselves. In summary, this makes me lean towards acceptance.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I would like to congratulate the authors on a remarkable achievement: within the given 4000-letter limit for the rebuttal, they have managed to address all major concerns of all reviewers, in my opinion. I would especially like to thank them for providing additional insights into the comparability of methods (Q3), which I believe helps emphasize the value of their proposed architectural adjustments. I really encourage them to include this information in the revised manuscript, as promised in Q3.

    I am still not particularly happy with two aspects: Model interpretability (Q7): I skimmed Liu et al.’s work [11] during my review, and I understand Liu et al.’s arguments about interpretability (especially Section 4 in their paper [11]). I just do not (yet) see how these arguments should transfer to the domain of image analysis/computer vision. I would be happy if the authors of the submission toned down or elaborated on their claims in this regard, as promised in Q7. Spatial information loss (Q9): I still do not see how employing KANs should “naturally reduce information loss”, but perhaps this is due to my limited knowledge of KANs.

    That said, I believe the submission’s contributions far outweigh its remaining shortcomings, which is why I recommend its acceptance.




Author Feedback

We thank all reviewers for their constructive comments. We will improve the presentation according to the suggestions. Below, we address the major concerns. (R1) Q1. Please underscore the second-best results for a clear comparison: We have modified Table 1 and underlined the second-best results. (R1) Q2. The improvement is very small: In fact, the comparative methods are already SOTA, which limits the margin for further improvement on all datasets. Moreover, the second-best model changes from dataset to dataset. In contrast, our proposed KMUNet consistently demonstrates superior generalization ability on all datasets. For example, while Roll-Unet ranks second on the Glas dataset, it falls behind KMUNet by 5.5% in Dice score on the BUSI dataset (0.7327 vs. 0.7731). More importantly, the visualization results in Figures 2-4 demonstrate that our KMUNet outperforms other models. This superior performance is more important than Dice scores in clinical settings. (R1,R3) Q3. Compare with other methods w.r.t. speed and the number of parameters: Our KMUNet, with approximately 10M parameters and 3 GFLOPs, achieves an average reduction of 67% in parameter count and 94% in computational cost compared to UNet. Furthermore, KMUNet demonstrates superior computational efficiency over other Mamba- or Transformer-based models. We will update these numbers in the revised paper. (R1,R2,R3) Q4. Malfunction of the given link for the code: The anonymous website is now accessible. If the link still does not work, please search "KMUNet" on GitHub to access the full source code. The repository contains a train.py file, which can be run directly. (R2,R3) Q5. Description of the BUSI dataset and more implementation details: 1. BUSI contains 210 malignant breast ultrasound images, and we split all datasets into 70% training and 30% test. Images were resized to 256×256. 2. All models are trained with the same parameters. 
(1) Data augmentation includes random rotation (0–360°) and horizontal/vertical flipping (each with 50% probability). (2) AdamW can adaptively adjust the learning rate. We adopt the Adam optimizer with a cosine annealing strategy (T_max = max epochs) for the BUSI dataset. The reason is that BUSI is more challenging for segmentation, so we decided to introduce more randomness. All other hyperparameters follow PyTorch defaults. 3. The loss function integrates cross-entropy and Dice losses with a 1:1 weight ratio. We will update these experimental details in the revised paper. (R2) Q6. No critical discussion about obtained results: We have included a dedicated discussion on the trade-off between computational efficiency and segmentation performance. This will be updated in the revised paper or on GitHub. (R3) Q7. Basis for improving interpretability using KAN: Sorry for the mistake in our expression. Our KAN is based on the work from [11]. In [11], the authors have extensively used mathematical formulas and experimental results to demonstrate the interpretability of KAN. We believe that integrating KAN into KMUNet can provide the model with interpretability. We will revise this expression in the revised paper. (R3) Q8. (a1): KAN-SCA learns multi-scale feature information through the dilated convolution of SAB. Subsequently, the KAN in CAB learns information from all stages to allocate channel weights for the current stage. (R3) Q9. Evidence of KAN-PatchEmbed's spatial information loss mitigation: The traditional PatchEmbed uses a 4×4 convolution with a stride of 4, causing loss of spatial relationships between adjacent patches due to non-overlapping sampling. In contrast, KAN-PatchEmbed employs KAN to learn global information, which naturally reduces information loss. (R3) Q10. All ablation studies about Mamba and KAN: All conclusions in this article are based on experimental results. Due to space constraints, we will update all ablation study results on GitHub. (R3) Q11. Writing: We will check and revise the full text for writing problems.
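The loss recipe stated in Q5 (cross-entropy plus Dice, weighted 1:1) can be sketched in plain NumPy for the binary case. This is an illustration of the recipe as described in the feedback, not the authors' actual implementation:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    # Binary cross-entropy; predictions clipped to avoid log(0).
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def dice_loss(pred, target, eps=1e-7):
    # Soft Dice loss: 1 minus the Dice overlap coefficient.
    inter = np.sum(pred * target)
    return float(1 - (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def combined_loss(pred, target):
    # 1:1 weighting of the two terms, as stated in the author feedback.
    return bce_loss(pred, target) + dice_loss(pred, target)

target = np.array([[0., 1.], [1., 0.]])
perfect = target.copy()
print(combined_loss(perfect, target))  # near 0 for a perfect prediction
```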




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have clearly clarified the issues raised by the reviewers. The explanations in the rebuttal look reasonable and correct to me.


