Abstract

Automated and accurate segmentation of individual vertebrae in 3D CT and MRI images is essential for various clinical applications. Due to the limitations of current imaging techniques and the complexity of spinal structures, existing methods still struggle with reducing the impact of image blurring and distinguishing similar vertebrae. To alleviate these issues, we introduce a Frequency-enhanced Multi-granularity Context Network (FMC-Net) to improve the accuracy of vertebrae segmentation. Specifically, we first apply wavelet transform for lossless downsampling to reduce the feature distortion in blurred images. The decomposed high and low-frequency components are then processed separately. For the high-frequency components, we apply a High-frequency Feature Refinement (HFR) to amplify the prominence of key features and filter out noise, restoring fine-grained details in blurred images. For the low-frequency components, we use a Multi-granularity State Space Model (MG-SSM) to aggregate feature representations with different receptive fields, extracting spatially-varying contexts while capturing long-range dependencies with linear complexity. The utilization of multi-granularity contexts is essential for distinguishing similar vertebrae and improving segmentation accuracy. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches on both CT and MRI vertebrae segmentation datasets. The source code is publicly available at https://github.com/anaanaa/FMCNet.
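
The lossless wavelet downsampling mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single-level orthonormal Haar wavelet applied to 2x2x2 blocks of a nested-list volume; the abstract does not name the wavelet family, and the function names (`haar_dwt3d`, `haar_idwt3d`) are hypothetical, not taken from the FMCNet repository. The sketch shows that a 3D volume splits into eight half-resolution subbands (one low-frequency, seven high-frequency) from which the input can be rebuilt.

```python
import itertools
import math
import random

# Band names: one letter per axis, L = low-pass, H = high-pass (naming is an assumption).
BANDS = ["".join(b) for b in itertools.product("LH", repeat=3)]
OFFSETS = list(itertools.product((0, 1), repeat=3))
SCALE = 1.0 / math.sqrt(8.0)  # orthonormal scaling for a 2x2x2 Haar block


def _sign(band, offset):
    """Haar weight (+1 or -1) of the voxel at `offset` within its 2x2x2 block."""
    sign = 1
    for filt, off in zip(band, offset):
        if filt == "H" and off == 1:
            sign = -sign
    return sign


def _zeros(d, h, w):
    return [[[0.0] * w for _ in range(h)] for _ in range(d)]


def haar_dwt3d(vol):
    """Single-level 3D Haar DWT; returns {band: half-resolution volume}."""
    d, h, w = len(vol) // 2, len(vol[0]) // 2, len(vol[0][0]) // 2
    sub = {band: _zeros(d, h, w) for band in BANDS}
    for z, y, x in itertools.product(range(d), range(h), range(w)):
        block = [vol[2 * z + dz][2 * y + dy][2 * x + dx] for dz, dy, dx in OFFSETS]
        for band in BANDS:
            sub[band][z][y][x] = SCALE * sum(
                _sign(band, off) * v for off, v in zip(OFFSETS, block))
    return sub


def haar_idwt3d(sub):
    """Inverse of haar_dwt3d: rebuilds the full-resolution volume."""
    low = sub["LLL"]
    d, h, w = len(low), len(low[0]), len(low[0][0])
    vol = _zeros(2 * d, 2 * h, 2 * w)
    for z, y, x in itertools.product(range(d), range(h), range(w)):
        for dz, dy, dx in OFFSETS:
            vol[2 * z + dz][2 * y + dy][2 * x + dx] = SCALE * sum(
                _sign(band, (dz, dy, dx)) * sub[band][z][y][x] for band in BANDS)
    return vol


# Round trip on a random 4x4x6 volume (even sizes are required by this sketch).
random.seed(0)
volume = [[[random.random() for _ in range(6)] for _ in range(4)] for _ in range(4)]
subbands = haar_dwt3d(volume)
restored = haar_idwt3d(subbands)
max_error = max(
    abs(a - b)
    for pa, pb in zip(volume, restored)
    for ra, rb in zip(pa, pb)
    for a, b in zip(ra, rb))
high_bands = [b for b in BANDS if b != "LLL"]
print(len(BANDS), len(high_bands), max_error < 1e-9)
```

Each subband has half the resolution along every axis, so the eight subbands together hold exactly as many values as the input, which is why no information is discarded, unlike max or average pooling.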

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3860_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/anaanaa/FMCNet

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ShiJia_Frequencyenhanced_MICCAI2025,
        author = { Shi, Jian and You, Tianqi and Zhang, Pingping and Zhang, Hongli and Xu, Rui and Li, Haojie},
        title = { { Frequency-enhanced Multi-granularity Context Network for Efficient Vertebrae Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {209 -- 218}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript proposes FMC-Net, a frequency-enhanced network integrating wavelet transforms and multi-granularity state space models (MG-SSM) for 3D vertebrae segmentation. It addresses image blurring via high-frequency feature refinement (HFR) and leverages MG-SSM to capture spatially varying contexts for distinguishing similar vertebrae. Evaluated on CT (VERSE2019) and MRI (LUMAR) datasets, the method improves over CNN-, transformer-, and Mamba-based models in Dice scores and Hausdorff distances.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This manuscript has the following advantages:

    1. Wavelet transform mitigates information loss during downsampling, enabling robust feature decomposition.
    2. HFR separates high-frequency noise from critical edge/texture details.
    3. MG-SSM captures long-range dependencies with linear complexity, overcoming CNNs’ locality and transformers’ computational overhead.
    4. Validated on both CT and MRI datasets, demonstrating cross-modal adaptability.
    5. Visualizations confirm improved segmentation of blurred regions and anatomically similar vertebrae.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    A few questions / comments:

    1. In Section 3 (Experiments), this sentence needs to be improved for grammatical accuracy and clarity: “We evaluate the effectiveness of our approach on two publicly vertebrae segmentation datasets.”
    2. Figure 3: the caption does not state which dataset is shown in which rows of the figure. Providing this information would make the figure easier to read.
    3. The authors do not state the cost function or loss used to train the models for the vertebrae segmentation task.
    4. Also, what loss or cost function was used for the HFR and MG-SSM modules to improve the resolution of the high-frequency and low-frequency components of the image after the WTD stage?
    5. The authors used separate models for the CT and MRI images of the two datasets, VERSE2019 (CT) and LUMAR (MRI). Did the authors use other datasets to validate these models? As the results are evaluated on a limited set of datasets, it would be good to know the generalizability of the models.
    6. Data statistics for the input datasets are missing from the manuscript. Can the authors provide them, specifically the different fields of view, the scanner models and vendors, and the image contrasts?
    7. The authors do not report results on multiple MRI sequences. It would be good if the authors could provide separate results for different sequences (T1-weighted, contrast-enhanced T1-weighted, and T2-weighted). This would help readers understand the model’s performance across sequences and its generalizability.
    8. The manuscript mentions nnU-Net handled preprocessing. Were any dataset-specific pre-processing, normalization or augmentation steps applied to address MRI intensity heterogeneity or CT noise?
    9. Why do the authors employ three dilation rates (d1–d3) in MG-SSM? How were these rates selected? How does performance change with more granularities?
    10. The class-wise results (cervical/thoracic/lumbar) show performance disparities. Does FMC-Net struggle with specific vertebral subtypes due to size/shape variations? Why is there such a large difference between the segmentation of lumbar, thoracic, and cervical vertebrae? It would be good to add this point to the manuscript.
    11. The dataset distribution information is missing from the manuscript: the subsets of VERSE2019 and LUMAR used for training, validation, and testing are not given. It would be good if the authors could provide this information, as it would help readers understand the training and inference process.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    NA

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The manuscript’s approach of using the wavelet transform mitigates information loss during downsampling, enabling robust feature decomposition. HFR separates high-frequency noise from critical edge/texture details.
    2. As mentioned above, there are a few points / questions that need to be addressed by the authors for a strong acceptance of the manuscript.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Thank you to the authors for the detailed response and for the plan to revise the manuscript.

    1. The plan to carefully revise the manuscript, including figures, is noted.

    2. The explanation regarding the combined loss function (weighted cross-entropy and Dice at each decoder stage, with no loss applied for HFR or MG-SSM) is clear.

    3. I acknowledge the MICCAI constraints on new experiments and appreciate the initiative to test the method on an additional prostate segmentation task, showing its outperformance over U-Mamba.

    4. The detailed information provided for the VerSe2019 and LUMAR datasets (manufacturers, views, T1-weighting, splits, and nnU-Net preprocessing) will significantly improve the manuscript’s clarity. The transparency regarding T2-weighted image performance and the rationale for dilation rates in MG-SSM are also helpful.
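
The combined loss described in point 2 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the class weights, the equal weighting of the Dice term and of the decoder stages, and the normalization of the weighted cross-entropy by the sum of weights are all hypothetical choices, since the rebuttal only states that weighted cross-entropy and Dice losses are applied at each decoder stage.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights):
    """Class-weighted cross-entropy over voxels, normalized by the weight sum.

    probs: list of per-voxel class-probability lists; labels: list of class ids.
    """
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        norm += w
    return total / norm


def soft_dice_loss(probs, labels, num_classes, eps=1e-6):
    """Mean over classes of (1 - soft Dice) between probabilities and one-hot labels."""
    loss = 0.0
    for c in range(num_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        denom = sum(p[c] for p in probs) + sum(1 for y in labels if y == c)
        loss += 1.0 - (2.0 * inter + eps) / (denom + eps)
    return loss / num_classes


def combined_loss(stage_outputs, class_weights, num_classes, lambda_dice=1.0):
    """Sum of (weighted CE + lambda_dice * Dice) over decoder stages.

    stage_outputs: list of (probs, labels) pairs, one per decoder stage.
    Equal stage weights and lambda_dice=1.0 are assumptions of this sketch.
    """
    return sum(
        weighted_cross_entropy(p, y, class_weights)
        + lambda_dice * soft_dice_loss(p, y, num_classes)
        for p, y in stage_outputs)


# Toy example: 4 voxels, 3 classes, hypothetical class weights.
labels = [0, 1, 1, 2]
weights = [0.5, 1.0, 2.0]
perfect = [[1.0 if c == y else 0.0 for c in range(3)] for y in labels]
uniform = [[1.0 / 3.0] * 3 for _ in labels]

perfect_loss = combined_loss([(perfect, labels)], weights, 3)
uniform_loss = combined_loss([(uniform, labels)], weights, 3)
two_stage_loss = combined_loss([(uniform, labels), (uniform, labels)], weights, 3)
print(perfect_loss, uniform_loss, two_stage_loss)
```

Consistent with the rebuttal, nothing in this sketch attaches a separate loss to HFR or MG-SSM; those modules would be trained only through the segmentation loss on the decoder outputs.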



Review #2

  • Please describe the contribution of the paper

    The paper’s central contribution is FMC‑Net, a novel 3D segmentation framework that leverages wavelet‑based downsampling to perform lossless multiresolution analysis and then separately enhances the resulting high‑ and low‑frequency components to tackle image blurring and anatomical similarity. For high frequencies, a High‑frequency Feature Refinement (HFR) module amplifies edge and texture details while suppressing noise; for low frequencies, a Multi‑granularity State‑Space Model (MG‑SSM) aggregates multi‑scale contextual cues with linear complexity to distinguish adjacent vertebrae. By fusing these enhanced components within an encoder–decoder architecture (including wavelet‑based upsampling), FMC‑Net achieves state‑of‑the‑art accuracy on both CT (VERSE2019) and MRI (LUMAR) vertebrae segmentation benchmarks, demonstrating its effectiveness and efficiency for clinical spine analysis.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Lossless multiresolution via wavelet-based sampling. By replacing standard pooling and interpolation with DWT/IDWT, FMC‑Net preserves fine structural details in blurred CT/MRI scans, ensuring no information loss during down‑ and up‑sampling.

    • Specialized frequency‑domain enhancement modules. The High‑frequency Feature Refinement (HFR) and Multi‑granularity State‑Space Model (MG‑SSM) work in tandem to restore edge/textural details and capture long‑range contextual cues, respectively—addressing both noise suppression and vertebra discrimination in a unified framework.

    • Multi‑granularity context modeling for adjacent vertebrae differentiation. MG‑SSM’s use of parallel dilated convolutions followed by linear‑complexity state‑space layers at varied receptive fields enables the network to distinguish highly similar neighboring vertebrae—a capability lacking in prior CNN‑ or transformer‑only methods.

    • Cross‑modality performance. Extensive evaluation on both VERSE2019 (CT) and LUMAR (MRI) datasets demonstrates state‑of‑the‑art Dice and Hausdorff metrics, confirming FMC‑Net’s generalizability and suitability for clinical spine analysis.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Incremental methodological novelty. The use of discrete wavelet transforms for lossless downsampling and frequency‑aware feature decomposition builds directly on classical multiresolution analysis (Mallat 1989) and on prior work integrating DWT into CNNs for noise robustness (Li et al., CVPR 2020). Likewise, the core idea of using different dilation rates and receptive-field scales to capture “multi-scale” context is not novel. Therefore, this work proposes a thoughtful architectural adaptation rather than a fundamentally new algorithmic paradigm.

    • Lack of discussion on the results and comparisons. I suggest shortening the Intro and Method sections (removing repeated explanations) to provide more space for discussion. Refer to the comments below.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Major comments:
    C1. Since SSM is claimed to be more efficient than Transformer, it would be good to include the inference speed of the proposed model.
    C2. Please elaborate on how 3D data has 7 high-freq components. From my understanding, a 2D image has 3 high-freq components, namely horizontal, vertical, and diagonal.
    C3. How many stages does the WTU have? 5? Besides that, does Fig 1’s Overall architecture section correctly depict the fusing of F^i and F^{i+1}_D? Why did the authors fuse the low-freq of F^i only?
    C4. In HFR, will the max-pooling and average pooling cause information loss/irreversible distortion?
    C5. For the LUMAR dataset, why did the authors use T1-weighted data only? Also, why does LUMAR have a lower number of epochs than VERSE2019?
    C6. Why does the model perform much better on the LUMAR dataset than on VERSE2019, as compared to existing works?
    C7. In Table 2, why isn’t there a Baseline + DWT-Sample + MG-SSM?
    C8. For some results, how can the HD95 increase while the DSC decrease?

    Minor comments:
    C9. Indicate the blurred parts in Fig 1 (a) and (b). Besides that, it is unclear how (d) highlights the importance of multi-granularity to handle (c). The authors may need to rephrase the caption for (c) and (d).
    C10. In Wavelet Transform Downsampling, simply write “concatenate F^i_low and F^i_high”, instead of using a vague term like “fusing”.
    C11. Typos in MG-SSM section, Liner and SiLU.
    C12. Please briefly describe the metrics DSC(%) and HD95.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors propose a novel application of DWT-Sample on CT and MRI for vertebrae segmentation, and introduce separate refinement methods for the high-freq and low-freq components. However, the proposed refinements and DWT-Sample are not novel. Performance gains on the VERSE2019 dataset seem to be minimal.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed most of my comments; the manuscript may need some minor revisions.



Review #3

  • Please describe the contribution of the paper

    This paper introduces a frequency-enhanced multi-granularity context network for efficient vertebrae segmentation. The contributions of the paper are the following:

    • The paper introduces a wavelet-based sampling strategy that uses Discrete Wavelet Transform (DWT) for downsampling and inverse DWT for upsampling, helping to preserve fine details and minimize information loss compared to traditional methods.

    • It presents a High-frequency Feature Refinement (HFR) approach that uses a dual-path spatial attention mechanism to enhance key high-frequency features like edges and textures while reducing noise, improving the clarity of blurred images.

    • The authors propose a multi-granularity state space model (MG-SSM) that captures long-range dependencies and spatially varying context at multiple scales, enabling better distinction between vertebrae with similar anatomical appearances.

    • They design an integrated encoder-decoder architecture (FMC-Net) that brings together the wavelet, HFR, and MG-SSM modules into a single framework, effectively leveraging frequency and multi-scale context for accurate vertebrae segmentation.

    • The model shows improved performance in Dice Score on both CT (VERSE2019) and MRI (LUMAR) datasets, outperforming CNN, transformer, and Mamba-based models in segmentation accuracy.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The FMC-Net architecture introduces a novel wavelet-based encoder-decoder design that separates features into high and low-frequency components, offering a lossless alternative to traditional downsampling. It proves especially beneficial for handling blurred or noisy medical images.

    • The model employs a high-frequency feature refinement mechanism using a dual-path spatial attention design to enhance critical textures and reduce noise, enabling precise reconstruction of fine anatomical structures essential for vertebrae segmentation.

    • The multi-granularity state space model integrates dilated convolutions with visual state space modules to capture long-range dependencies and spatial context at multiple scales, improving the model’s ability to differentiate between anatomically similar vertebrae.

    • The method is comprehensively evaluated on both CT (VERSE2019) and MRI (LUMAR) datasets, where it surpasses leading CNN, transformer, and Mamba-based models, showing strong cross-modality generalization.

    • The ablation study confirms that each component (DWT, HFR, and MG-SSM) contributes significantly to the model’s performance, supporting the value of their combined design.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The application of wavelet-based downsampling and upsampling, while effective, builds on well-established prior work [1,2] and limits the methodological novelty of the contribution.

    • The method does not incorporate anatomical knowledge (e.g., vertebral order, shape, or spatial constraints), unlike prior works such as SpineParseNet [3] or B-Spine [4], which limits its interpretability and robustness in pathological cases.

    • The proposed Multi-Granularity State Space Model (MG-SSM) lacks a clear theoretical or architectural novelty over recent Mamba-based and SSM-based models, such as SegMamba [5] and U-Mamba [6].

    • The paper lacks an analysis of where the model fails, such as confusion between adjacent vertebrae, poor performance on small structures like C1/C2, or discussion about vertebrae anomalies (e.g., LSTV: lumbosacral transitional vertebrae).

    • Both HFR and MG-SSM are said to improve discrimination of similar vertebrae, but the paper does not clearly isolate or quantify the contribution of each module toward this specific task.

    • The evaluation omits comparisons with hybrid or anatomically-guided models (e.g., SpineTransformers [7]), which could offer stronger interpretability or structural consistency.

    • The term “Mamba-based” is only used in the last part of the paper. Please clarify how your SSM implementation relates to and extends Mamba.

    • The paper lacks clarity: some terminology (e.g., VSSM) is insufficiently defined, and visual aids like Figure 2 lack detailed explanation, which hinders readability.

    [1] Xu, G., Liao, W., Zhang, X., Li, C., He, X., Wu, X.: Haar wavelet downsampling: A simple but effective downsampling module for semantic segmentation. Pattern Recognition, 143, 109819 (2023).

    [2] Li, Q., Shen, L., Guo, S., Lai, Z.: Wavelet integrated CNNs for noise-robust image classification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). pp. 7245-7254 (2020).

    [3] Pang, S., Pang, C., Zhao, L., Chen, Y., Su, Z., Zhou, Y., Feng, Q.: SpineParseNet: spine parsing for volumetric MR image by a two-stage segmentation framework with semantic image representation. IEEE Transactions on Medical Imaging, 40(1), 262-273 (2020).

    [4] Wang, H., Song, Q., Yin, R., Ma, R.: B-spine: Learning B-spline curve representation for robust and interpretable spinal curvature estimation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). Vol. 38, No. 6, pp. 5381-5389 (2024).

    [5] Xing, Z., Ye, T., Yang, Y., Liu, G., Zhu, L.: Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 578-588 (2024).

    [6] Ma, J., Li, F., Wang, B.: U-mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722 (2024).

    [7] Tao, R., Liu, W., Zheng, G.: Spine-transformers: Vertebra labeling and segmentation in arbitrary field-of-view spine CTs via 3D transformers. Medical Image Analysis, 75, 102258 (2022).

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • The manuscript occasionally uses ambiguous terms (e.g., “visual state space module” is not well defined).
    • The figures are overloaded with architectural detail but underexplained (e.g., Figure 2).
    • Please include qualitative examples where FMC-Net underperforms or fails to segment.
    • Please clarify the novelty of the State space component (especially its differentiation from existing SSM-based or Mamba-based architectures)
    • I’m reporting some syntax and grammar issues I’ve noted:
    • “sensing spatially-varying contexts” -> capturing spatially-varying contextual information
    • “Spine is vital to the body” -> The spine is vital to the body
    • “The proposed method is implemented based on PyTorch” -> The proposed method is implemented in PyTorch
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a well-motivated approach to 3D vertebrae segmentation, addressing critical challenges such as image blurring and inter-vertebral similarity through the integration of wavelet-based frequency decomposition and multi-scale context modeling via state space modules. The proposed FMC-Net achieves strong empirical performance across two modalities and benchmark public datasets (VERSE2019 and LUMAR), outperforming multiple CNN, transformer, and Mamba-based baselines. However, the overall novelty is moderate, and several important aspects require further clarification. The use of wavelet transforms and state space modeling, while well-integrated, builds heavily on prior work and lacks a clearly articulated innovation beyond existing frequency-aware and SSM-based designs. The MG-SSM module, in particular, is not sufficiently differentiated from recent Mamba-style approaches. Despite these concerns, I believe the architecture is well designed, each module contributes meaningfully (as shown by ablation), and the focus on combining frequency and contextual cues is highly relevant to the problem domain. If the authors can clarify the novelty and clearly delineate their methodological contribution in the rebuttal I would support accepting this paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed my comments during the rebuttal and clarified the novelty of their work.




Author Feedback

We thank all reviewers and address their concerns as follows.

To R1&2&3
Q1: Writing and figures. A1: We’ll carefully revise the manuscript.

To R1
Q1: Loss function. A1: We use a combined loss of weighted cross-entropy and Dice losses at each decoder stage. No loss is applied for HFR or MG-SSM.
Q2: Generalization. A2: MICCAI 2025 does not allow adding new experiments. However, we test our method on additional prostate segmentation tasks. The results clearly show that our method outperforms the second-best method, U-Mamba.
Q3: Dataset details. A3: VerSe2019 contains 160 scans covering cervical, thoracic, and lumbar views from four manufacturers (GE, Siemens, Philips, Toshiba). LUMAR includes 156 lumbar spine scans from Philips, Siemens, and GE. We use T1-weighted images and split them into training, validation, and test sets (8:1:1). The preprocessing of both datasets is the same as in nnU-Net, without additional operations. We’ll add these details.
Q4: Other sequences. A4: We conduct experiments on T1-weighted images due to their higher quality. Results on T2-weighted images are lower, with 68.28 DSC and 27.94 HD95.
Q5: Dilation rates of MG-SSM. A5: The dilation rates follow common practices in other segmentation works. More granularity enables the model to capture more context, enhancing its ability to distinguish similar vertebrae.
Q6: Performance differences. A6: The performance gap mainly arises from data characteristics. Similar trends in other methods indicate that this is a data-driven rather than model-specific issue. We’ll clarify this.

To R2
Q1: Novelty of sampling. A1: We agree that DWT is not new for sampling in 2D images or models [1,2]. However, our work extends it to 3D data with multi-scale adaptations. More importantly, our work enhances the decomposed high- and low-freq features, yielding better representations than previous methods.
Q2: Anatomical priors. A2: We agree that anatomical priors are important for model interpretability and robustness. However, such priors may limit model capacity. Without them, our method achieves stronger performance and generalization.
Q3: Novelty of MG-SSM. A3: In fact, methods [5,6] directly use Mamba in a U-Net. They can’t extract multi-scale features in a single SSM. In contrast, our MG-SSM considers different scales of SSM and extracts multi-context features across all scales. Compared with other multi-scale methods, e.g., OCTA-Mamba and MSVM-UNet, our MG-SSM jointly processes all scales, capturing distinct contexts and enhancing the discrimination of adjacent vertebrae.
Q4: Failure analysis and comparison with [7]. A4: We’ll add more analysis and comparison results in the main text.
Q5: Effects on similar vertebrae. A5: We note that HFR and MG-SSM have different effects on similar vertebrae. HFR improves the boundary accuracy, while MG-SSM captures better contextual regions. We’ll verify each module through ablation studies.

To R3
Q1: Novelty of methods. A1: See A1 for R2.
Q2: Major comments. A2:
1) We’ll include the inference speed in the main text.
2) In fact, each axis has a low-pass (L) and a high-pass (H) filter, resulting in 8 components: one low-freq (LLL) and seven high-freq components (LLH, LHL, LHH, HLL, HLH, HHL, HHH).
3) WTU is used for 5 stages. Theoretically, low-freq features contain most content information. Thus, they are integrated first. The fused features and the high-freq features with complementary details are then upsampled via IDWT.
4) HFR uses grouped pooling to minimize information loss.
5)-6) LUMAR contains only lumbar vertebrae, while VerSe2019 includes all spinal regions, making segmentation more challenging and performance gains relatively smaller. For a fair comparison, we use only T1 and fewer epochs for LUMAR.
7) Results of Baseline + DWT-Sample + MG-SSM are: 76.56 DSC and 18.67 HD95.
8) HD95 is sensitive to boundary outliers. With HFR, enhancing boundary information may increase noise, leading to performance fluctuations. DSC stably reflects segmentation quality, clearly showing each module’s contribution.
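
The remark that HD95 is sensitive to outliers while DSC is more stable can be illustrated with a toy 2D example. The sketch below makes several simplifying assumptions that are not from the paper: masks are sets of pixel coordinates, distances are measured between all foreground pixels rather than extracted surfaces, the percentile uses the nearest-rank rule, and HD95 is taken as the maximum of the two directed 95th-percentile distances (other conventions exist). All function names are hypothetical.

```python
import math


def dice(pred, gt):
    """Dice similarity coefficient between two sets of pixel coordinates."""
    if not pred and not gt:
        return 1.0
    return 2 * len(pred & gt) / (len(pred) + len(gt))


def _directed_distances(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def _percentile(values, q):
    """Nearest-rank percentile (q is an integer in 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def hd95(pred, gt):
    """95th-percentile Hausdorff distance (max of the two directed values)."""
    return max(_percentile(_directed_distances(pred, gt), 95),
               _percentile(_directed_distances(gt, pred), 95))


# Ground truth: a 10x10 square.
gt = {(x, y) for x in range(10) for y in range(10)}
# Prediction A: misses one boundary row (a small, local boundary error).
pred_a = {(x, y) for (x, y) in gt if y < 9}
# Prediction B: covers the square fully but adds 8 stray pixels far away.
pred_b = gt | {(50, 50 + i) for i in range(8)}

dice_a, dice_b = dice(pred_a, gt), dice(pred_b, gt)
hd_a, hd_b = hd95(pred_a, gt), hd95(pred_b, gt)
print(f"A: DSC={dice_a:.3f} HD95={hd_a:.2f}")
print(f"B: DSC={dice_b:.3f} HD95={hd_b:.2f}")
```

In this toy case prediction B has the higher DSC but a far larger HD95, because a small group of distant false positives barely changes the overlap yet dominates the upper tail of the distance distribution. This is one way the two metrics can move in opposite directions in terms of quality.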




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Authors have addressed most of the comments and there is unanimous acceptance among reviewers.


