Abstract

Lightweight models for automatic medical image segmentation have the potential to advance health equity, particularly in limited-resource settings. Nevertheless, their reduced parameters and computational complexity compared to state-of-the-art methods often result in inadequate feature representation, leading to suboptimal segmentation performance. To this end, we propose a Cascade Multi-Receptive Fields (CMRF) module and develop a lighter yet better U-Net based on CMRF, named TinyU-Net, comprising only 0.48M parameters. Specifically, the CMRF module leverages redundant information across multiple channels in the feature map to explore diverse receptive fields by a cost-friendly cascading strategy, improving feature representation while keeping the model lightweight, thus enhancing performance. Testing CMRF-based TinyU-Net on cost-effective medical image segmentation datasets demonstrates superior performance with significantly fewer parameters and lower computational complexity compared to state-of-the-art methods. For instance, in the lesion segmentation of the ISIC2018 dataset, TinyU-Net has 52x, 3x, and 194x fewer parameters while achieving an IoU score +3.90%, +3.65%, and +1.05% higher than baseline U-Net, lightweight UNeXt, and high-performance TransUNet, respectively. Notably, the CMRF module exhibits adaptability, easily integrating into other networks. Experimental results suggest that TinyU-Net, with its outstanding performance, holds the potential to be implemented in limited-resource settings, thereby contributing to health equity. The code is available at https://github.com/ChenJunren-Lab/TinyU-Net.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2191_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/ChenJunren-Lab/TinyU-Net

Link to the Dataset(s)

https://challenge.isic-archive.com/data/

http://ncov-ai.big.ac.cn/download?lang=en

BibTex

@InProceedings{Che_TinyUNet_MICCAI2024,
        author = { Chen, Junren and Chen, Rui and Wang, Wei and Cheng, Junlong and Zhang, Lei and Chen, Liangyin},
        title = { { TinyU-Net: Lighter yet Better U-Net with Cascaded Multi-Receptive Fields } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    TinyU-Net is a lightweight U-Net architecture employing a novel Cascade Multi-Receptive Fields (CMRF) module for efficient medical image segmentation, demonstrating superior performance while having fewer learnable parameters and lower computational costs.
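As a rough illustration of why a cascade yields several receptive fields at once, the sketch below applies the standard rule that each stacked stride-1 convolution with kernel size k enlarges the receptive field by k - 1. The kernel size of 3 and the cascade depth of 4 are assumptions chosen for the example, not values taken from the paper, and `cascaded_receptive_fields` is a hypothetical helper name.

```python
def cascaded_receptive_fields(kernel_size, depth):
    """Receptive field seen after each of `depth` stacked stride-1 convolutions."""
    fields = []
    receptive_field = 1  # a single input pixel before any convolution
    for _ in range(depth):
        # Each stride-1 convolution widens the field by (kernel_size - 1).
        receptive_field += kernel_size - 1
        fields.append(receptive_field)
    return fields


# Assumed values for illustration only: 3x3 kernels, four cascaded stages.
print(cascaded_receptive_fields(kernel_size=3, depth=4))  # [3, 5, 7, 9]
```

Tapping the output of every stage, rather than only the last one, is what exposes all of these field sizes to later layers at the cost of a single chain of small convolutions.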

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Innovative Design: Their CMRF module is novel, with potentially profound implications. It is the key component of their TinyU-Net. CMRF’s ability to leverage information across various receptive fields without increasing computational complexity is critical for TinyU-Net to maintain state-of-the-art accuracy while deploying in resource-limited settings.
    (2) Performance Efficiency: Demonstrates better segmentation performance with significantly fewer parameters and lower computational complexity compared to both traditional heavyweight models and other contemporary lightweight models.
    (3) Good Validation: Empirical results on two datasets (ISIC2018, NCP), reporting both IoU and Dice. TinyU-Net almost always outperformed the other models they tested, except that it ranked second for ground-glass opacities in CT lung scans.
    (4) Flexibility and Adaptability: The CMRF module is adaptable to other architectures besides U-Net. They demonstrated this by integrating CMRF into SegNet and into CMUNeXt, in both cases improving accuracy while reducing computational load.
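For readers unfamiliar with the two metrics named in this review, here is a minimal sketch of how IoU and Dice are commonly computed for binary masks. It uses the standard definitions, not the authors' evaluation code; the function name `iou_and_dice`, the flat-list mask format, and the convention of scoring two empty masks as 1.0 are assumptions made for the example.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two equal-length sequences of 0/1 labels."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_sum = sum(pred)
    target_sum = sum(target)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_sum + target_sum)
    return iou, dice


# Toy six-pixel masks: two pixels overlap, four pixels are in the union.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
iou, dice = iou_and_dice(pred, target)
print(f"IoU={iou:.3f}  Dice={dice:.3f}")
```

For a single pair of masks the two scores are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU.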

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The only major weakness is an insufficient explanation of the design decisions inside CMRF. CMRF is the main point of the paper, so its internal architecture should be well justified. Their chief design element seems to be the cascaded depthwise convolutions (hence their cascaded receptive fields). Their CMRF architecture, however, also includes several other unique features, which have minimal or insufficient justification. The paper could benefit from a deeper theoretical rationale explaining why their other specific design choices yield better feature representations compared to other approaches, and it could likewise use ablation studies to justify those architectural decisions. Why did they use GELU activation functions? Why are half the channels cascaded whereas the other half are added in pairs? Why do they split by odd/even channels instead of simply splitting first half / second half of the channels? Why does CMRF need to double the channel count at the end (going from X’’’ to X_out), and why do they use PWConv-BN-Act for this purpose? (The last sentence of section 2.1 is a somewhat rambling explanation that fails to convince.)

    (2) Likewise, the paper’s secondary weaknesses are a couple of unexplained curiosities about their TinyU-Net architecture, specifically at the bottleneck/bottom of the “U” as shown in Fig. 1. It seems redundant for the bottleneck to do downsampling followed immediately by upsampling without any CMRF or CNN blocks in between. If they really don’t need any CMRF or CNN block in that location, then it seems they could also have made other simplifications. Did the authors try connecting the last encoding CMRF module directly to the first decoding CMRF module? Perhaps they could have further reduced parameters without penalizing performance if they completely skipped the first decoding CMRF module and instead connected the last encoding CMRF module directly to the Upsampling block that goes from 1/8 to 1/4 resolution?
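The odd/even question in point (1) contrasts two ways of partitioning channel indices. The sketch below shows only that indexing difference on a toy list of eight channel indices. It is not the authors' implementation; the helper names are hypothetical, and which parity feeds which branch of CMRF is not stated in this review.

```python
def split_interleaved(channels):
    """Odd/even-style split: alternate channels go to different groups."""
    return channels[0::2], channels[1::2]


def split_halves(channels):
    """Contiguous split: first half and second half of the channels."""
    mid = len(channels) // 2
    return channels[:mid], channels[mid:]


channels = list(range(8))  # toy channel indices 0..7
print(split_interleaved(channels))  # ([0, 2, 4, 6], [1, 3, 5, 7])
print(split_halves(channels))       # ([0, 1, 2, 3], [4, 5, 6, 7])
```

With the interleaved split, every channel in one group has its immediate neighbours in the other group, whereas the contiguous split keeps neighbours together.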

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    • In section 2.1, there may be a typo in their inline equation for X’’_linear: if I understood correctly, I think C_out/N should instead be C_out/(2N).
    • Limited Comparative Analysis: The discussion primarily focuses on the superiority of TinyU-Net in terms of parameters and performance but lacks a detailed comparative analysis of other potential trade-offs where its reduced parameter count might also be beneficial, such as model generalizability, training time, and amount of training data required. These would be good to include if they can fit, or otherwise suggested in future work or included in a subsequent paper/supplement.
    • Figures and Tables: Some figures and tables are densely packed with information and would benefit from more descriptive captions and/or better legends. Currently, the reader must consult the main text to find the abbreviations and terms used in Fig. 1 and the tables. It would help if the authors applied small edits/formatting to help guide the reader through this dense information in the visual aids.
    • Technical Jargon: The paper extensively uses technical jargon and abbreviations that might be obscure for readers not familiar with the field. Including slightly more detailed explanations (or a glossary) could make the paper more accessible.
    • As TinyU-Net is a new model, the present paper does not yet include any clinical feedback. In the future, when they present clinicians with the results of TinyU-Net, please consider reporting on whether those clinicians observe that TinyU-Net is more prone to certain types of errors than to others. Are there any real-world benefits or caveats beyond the raw performance metrics in this paper?

    *** Note: It is so important to address the first major weakness that I am okay with the authors removing less important content if necessary to make room for a more detailed justification of their architectural design choices.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Their results demonstrate a surprising increase in performance while achieving a profound reduction in number of learnable parameters and computational requirements. This would have been a strong accept, if only they had better explained their architectural design decisions.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a new lightweight yet performant TinyU-Net. It’s based on the proposed Cascade Multi-Receptive Fields module featuring depthwise and pointwise convolutions.
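To make concrete why depthwise and pointwise convolutions keep a model small, the following sketch compares weight counts for a standard convolution and a depthwise-plus-pointwise pair using the usual formulas. The 64-channel, 3x3 configuration is an assumed example rather than a layer from TinyU-Net, biases and normalization parameters are ignored, and the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Assumed example layer: 64 input channels, 64 output channels, 3x3 kernel.
standard = standard_conv_params(64, 64, 3)
separable = depthwise_separable_params(64, 64, 3)
print(standard, separable, f"{standard / separable:.1f}x fewer weights")
```

The saving grows with both the kernel size and the number of output channels, which is why this factorization is a common building block in lightweight networks.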

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • extensive evaluation, high number of relevant baseline methods, relevant ablation experiments and metrics.
    • the proposed method seems to cut the parameters count and computational cost by a large margin while maintaining (or improving) the accuracy.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I don’t see any major problems with the paper, rather minor comments.

    Section 3, implementation details. Adam with a learning rate of 1e-4 is rather specific; does it provide the best performance? How were the hyperparameters of the optimizer selected? Was there a potential data leak through tuning on the test data? Could the selected optimizer parameters favor the proposed method while being suboptimal for the baselines? “momentum of 0.9” -> first order momentum decay rate of 0.9

    • Are the results in Table 1 significant?
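The terminology point about "momentum of 0.9" is easier to see in the Adam update itself, where 0.9 plays the role of the exponential decay rate of the first-moment estimate (usually written beta1). Below is a minimal scalar sketch of one textbook Adam step. The learning rate 1e-4 and beta1 = 0.9 come from the review; beta2 = 0.999 and eps = 1e-8 are commonly used defaults assumed here, and `adam_step` is a hypothetical helper, not the authors' training code.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step index."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy values: a single parameter at 1.0 with gradient 0.5 on the first step.
param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(param, m, v)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate regardless of the gradient's scale.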
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In the journal version, I would suggest also including a larger number of datasets such as BraTS, STARE, and DRIVE, and also comparing against nnU-Net, as it is self-configuring.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents solid results.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    They propose a lightweight U-Net architecture which uses far fewer parameters by utilizing redundant information across multiple channels in a feature map. They evaluate the model on the ISIC2018 and NCP datasets against established baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well written. It uses a novel approach that exploits redundant information across multiple channels in the feature maps. The evaluation seems thorough on a generic dataset, along with comparison to the established state of the art. They have defined the problem very well and have also given thorough motivation for solving it. The depiction of the architecture and approach is also thorough.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is an overall well-written paper. One thing that can be improved is Figure 2: labeling which dataset each image is taken from would help differentiate between the ISIC2018 and the NCP datasets.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It is a very well written paper, and I commend the authors on this. Beyond that, improving Figure 2 to make it more understandable to the reader would be helpful.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Strong contributions including novel approach and robust evaluation. Minor improvements in figure 2 labeling about the dataset the image is from could enhance clarity. Overall, merits a positive recommendation due to its strengths and potential impact.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank the reviewers for their valuable suggestions and their recognition of our approach’s innovation, effectiveness, adaptability, and verifiability, as well as the clarity and organization of our paper.

Response to Reviewer #1: Thank you for your recognition and suggestions on this work, which are very insightful! We acknowledge the reviewer’s typo correction and will make the necessary amendments. We have enhanced the descriptions of tables to improve readability. Additionally, we will elaborate on the design principles of the CMRF module as follows: GELU is an activation function based on the Gaussian error function. Compared to ReLU and other activation functions, GELU is smoother and enhances both the convergence speed and performance during training. GELU has been widely used in medical image segmentation models [1,2], and its application here is not unique. Halving for cascading operations utilizes redundant channel information, a cost-effective and feasible approach inspired by PConv and Ghost (see section 2.1 of this paper), which employ similar strategies. The odd/even channel split is derived from the channel shuffling operation and is informed by observations in Ghost, where adjacent channels may exhibit similarity. This partition efficiently leverages such similar redundant information. Doubling the number of channels at the end of the CMRF module aligns with the bottleneck structure design. PWConv-BN-Act is used to fuse information from multiple receptive fields and to adjust the number of output channels. When we first designed the bottom of TinyU-Net, we added a CNN block and observed a slight decrease in segmentation performance, while adding a CMRF block resulted in no significant improvement. Therefore, we opted not to add a functional module at the bottom, leaving room for future exploration of TinyU-Net’s bottom structure. Lastly, in future work, we plan to extend TinyU-Net to clinical settings for feedback from medical professionals.

Response to Reviewer #3: Thank you for your recognition and suggestions on this work! We adopted the reviewer’s recommendation to describe momentum decay more clearly. Regarding implementation details (such as learning rate and optimizer), we employed a standard approach consistent with many published papers [3,4]. We ensured that there was no data leakage between the training and test sets. For the lesion segmentation of the ISIC2018 dataset in Table 1, TinyU-Net has 52×, 3×, and 194× fewer parameters while achieving an IoU score +3.90%, +3.65%, and +1.05% higher than baseline U-Net, lightweight UNeXt, and high-performance TransUNet, respectively. The results show a surprising increase in performance while achieving a profound reduction in the number of learnable parameters and computational requirements.

Response to Reviewer #4: We thank the reviewer for the recognition and suggestions! We provided a more descriptive caption (i.e., Comparative qualitative results on ISIC2018 (first two lines) and NCP (last two lines) datasets.) in Figure 2 to improve clarity and make it more understandable to the reader.

[1] Ruan, J., Xie, M., Gao, J., Liu, T., & Fu, Y. Ege-unet: an efficient group enhanced unet for skin lesion segmentation. In MICCAI (pp. 481-490). Cham: Springer Nature Switzerland. (2023)
[2] Han, Z., Jian, M., & Wang, G. G. ConvUNeXt: An efficient convolution neural network for medical image segmentation. Knowledge-Based Systems, 253, 109512. (2022)
[3] Cheng, J., Gao, C., Wang, F., & Zhu, M. Segnetr: Rethinking the local-global interactions and skip connections in u-shaped networks. In MICCAI (pp. 64-74). Cham: Springer Nature Switzerland. (2023)
[4] Lin, X., Yan, Z., Deng, X., Zheng, C., & Yu, L. ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation. In MICCAI (pp. 642-651). Cham: Springer Nature Switzerland. (2023)
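The author feedback describes GELU as based on the Gaussian error function and smoother than ReLU. A small sketch of the exact form, GELU(x) = x · Φ(x) with Φ the standard normal CDF, makes the comparison concrete. It uses only the Python standard library; the sample inputs are arbitrary, and this is not the activation code from the TinyU-Net repository.

```python
import math


def gelu(x):
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x):
    """ReLU for comparison: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


# Arbitrary sample points on both sides of zero.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

Unlike ReLU, GELU passes a small negative value for moderately negative inputs and has no kink at zero, which is the smoothness the authors refer to.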




Meta-Review

Meta-review not available, early accepted paper.


