Abstract

Despite the significant breakthroughs that Medical Hyperspectral Imaging (MHSI) has brought to computational pathology, the asymmetric information in the spectral and spatial dimensions poses a primary challenge. In this study, we propose a multi-stage multi-granularity Focus-tuned Learning paradigm for Medical HSI Segmentation. To learn subtle spectral differences while equalizing spatiospectral feature learning, we design quadruplet learning pre-training and focus-tuned fine-tuning stages that capture both disease-level and image-level subtle spectral differences while integrating spatially and spectrally dominant features. We propose an intensifying and weakening strategy throughout all stages. Our method significantly outperforms all competitors in MHSI segmentation, with over 3.5% improvement in DSC. An ablation study further shows that our method learns compact spatiospectral features while capturing various levels of spectral differences. Code will be released at https://github.com/DHC233/FL.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0621_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0621_supp.pdf

Link to the Code Repository

https://github.com/DHC233/FL

Link to the Dataset(s)

https://www.kaggle.com/datasets/hfutybx/mhsi-choledoch-dataset-preprocessed-dataset

BibTex

@InProceedings{Don_Multistage_MICCAI2024,
        author = { Dong, Haichuan and Zhou, Runjie and Yun, Boxiang and Zhou, Huihui and Zhang, Benyan and Li, Qingli and Wang, Yan},
        title = { { Multi-stage Multi-granularity Focus-tuned Learning Paradigm for Medical HSI Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors introduce a transformer-based deep learning architecture, designed for segmenting hyperspectral pathology slides. The method employs a custom Indicative Spatio-spectral transformer with a deformable attention mechanism that adapts to feature relevance, reducing redundancy and focusing on pertinent data. Validated across 2 datasets, this approach significantly improves the Dice Similarity Coefficient (DSC) by over 3.5%, surpassing the performance of existing segmentation methods.
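For readers unfamiliar with the metric behind the reported improvement, the Dice Similarity Coefficient for a binary mask is 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch of that definition only; the function name and the flat-list mask representation are assumptions made for illustration and are not taken from the paper or its code.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient for two flat binary masks (lists of 0/1).

    Illustrative helper, not the authors' evaluation code.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty: treat as perfect agreement by convention.
        return 1.0
    return 2.0 * intersection / total
```

With this definition, a prediction that marks two pixels where the ground truth marks one of them scores 2·1 / (2 + 1) = 2/3.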

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors achieve a quite significant increase in all the relevant metrics for the datasets presented in this paper, thereby showing the improvement of the proposed network.

    The methodology presented is well-conceived, addressing critical challenges in hyperspectral imaging, such as the high co-linearity between spectral bands. This approach enhances the model’s ability to discern subtle spectral differences, thereby improving the accuracy of segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Comparison to the gold standard: As HSI has significant drawbacks in acquisition time, equipment cost, etc., it should also be evaluated against its standard RGB counterpart. The same network could even be used solely on the RGB images, although an improvement in performance is to be expected. However, this is not dealt with in this paper.

    Model Complexity: The paper neither compares nor mentions the complexity of the model relative to the other models. The reader therefore has no indication of whether the comparison between models is fair, as a model with more parameters is expected to perform significantly better.

    It is not immediately clear whether the model is trained on both datasets. The authors should specify this so that it is not left to the reader's interpretation.

    The authors present a generalist approach to hyperspectral image segmentation, but only test on a very specific medical pathology dataset. The credibility of this paper can be significantly improved if the algorithm is also tested on other datasets, be it in the medical hyperspectral range or not.

    Since the image size is reduced to 256x256, a considerable amount of detailed information is lost. Other possible methods perform patch-wise segmentation; however, this is not discussed in the paper.
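To make the patch-wise alternative mentioned by the reviewer concrete, here is a small sketch that computes the top-left corners of fixed-size patches covering a full-resolution image instead of resizing it. The function names, the 256-pixel patch size reused from the review, and the image sizes in the usage note are hypothetical; the paper does not describe such a procedure.

```python
def patch_origins(length, patch, stride):
    """Start offsets of patches of size `patch` that cover [0, length)."""
    if patch > length:
        raise ValueError("patch is larger than the image side")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        # Add a final patch flush with the border so no pixels are skipped.
        starts.append(length - patch)
    return starts


def tile(height, width, patch=256, stride=256):
    """(row, col) top-left corners of all patches for a height x width image."""
    return [
        (y, x)
        for y in patch_origins(height, patch, stride)
        for x in patch_origins(width, patch, stride)
    ]
```

For a hypothetical 1024 x 1280 image, `tile(1024, 1280)` yields 20 non-overlapping patches; choosing a stride smaller than the patch size gives overlapping patches whose predictions could be averaged.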

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Since the proposed architecture is also trained and evaluated on a previously unreleased dataset, the reproducibility is not guaranteed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refrain from using abbreviations in the abstract.

    2.1: Please specify what you mean by image level and disease level.

    3.1: Please elaborate on the choice of reducing the image size to 256x256 rather than working with patches from the high-resolution images.

    As I understand it, FL is in essence a (learned) dimensionality reduction technique. Why would this perform better than standard dimensionality reduction techniques? A discussion of this would be helpful.

    Please include the network inference time and number of parameters, as these are important aspects of any algorithmic solution.
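The two quantities requested here can be reported with very little tooling. The sketch below counts the learnable parameters of standard convolutional and fully connected layers from their shapes and measures the median wall-clock latency of an arbitrary callable. All names are illustrative assumptions; the layer shapes of the reviewed model are not given on this page, so no figures for it are implied.

```python
import statistics
import time


def conv2d_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of a square-kernel 2D convolution."""
    return c_out * (c_in * kernel * kernel + (1 if bias else 0))


def linear_params(n_in, n_out, bias=True):
    """Learnable parameters of a fully connected layer."""
    return n_out * (n_in + (1 if bias else 0))


def median_latency(fn, runs=20, warmup=3):
    """Median wall-clock time in seconds of calling `fn`, after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

In practice `fn` would wrap one forward pass of each compared model on the same input size and hardware, so that the latency and parameter counts sit alongside the accuracy numbers.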

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The architecture is well constructed; however, its applicability in a medical setting is not well discussed, and the use of the additional training dataset makes the comparison between models invalid or somewhat skewed.

    For reproducibility, this dataset or the trained models should also be made publicly available, but the paper does not state that this will happen.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a framework for Medical HSI Segmentation. A two-stage training strategy is proposed to effectively capture subtle spectral differences at both the disease and image levels while integrating spatial and spectral features. A well-designed intensifying and weakening strategy is applied throughout all stages to enhance performance. Extensive experiments demonstrate that the method outperforms existing approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This manuscript is well written, and the figures are easy to read and highlight the novelty of the proposed method.
    2. The experiments are sufficient and well-designed, effectively highlight the performance of the suggested pipeline, and analysis of the experimental results is thorough.
    3. The proposed Spectral Focus Forge Module (SFF) and Indicative Spatiospectral Transformer are interesting.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The formatting of the references section is inconsistent.
    2. The introduction section lacks a discussion of other medical HSI segmentation methods; the authors are advised to emphasize the differences from other frameworks.
    3. As shown in Fig. 3, the proposed method seems memory-intensive; it is necessary to report the GPU memory usage during training and other metrics such as parameter counts and FLOPs in comparison with other frameworks.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Revise the grammar and formatting issues in the manuscript, and standardize the reference formatting to improve the professionalism of this article.
    2. Provide a more comprehensive overview of Medical HSI segmentation methods, emphasizing the differences between the proposed method and others.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method proposed in this paper is quite innovative, with improvements specifically for medical HSI images. The paper is well-written, the experiments are thorough, and the proposed method holds certain clinical value, making a significant contribution to the medical imaging community.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a network with self-supervised learning for MHSI segmentation. It breaks away from the traditional structures for spectral and spatial feature extraction by using a hybrid network. The proposed SFF module with a Gaussian filter is shown to reduce the redundancy between different bands. By using the proposed self-supervised model QSQL to initialize the weights, the model can increase the inter-class distance and reduce the intra-class distance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - The paper is well structured, with clear illustrations that describe both the overview and the details of the model architecture, main strategy, and results comparison.
    - The experiments are rich and cover a wide range of segmentation models, and the ablation study is well explained.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - Some steps are unclear and a little confusing: the model is meant to capture both disease-level and image-level features. As described in the paper, the model first captures the disease-level features in the SFF module, and these features are then fed into the next modules to extract image-level features. In this way, the image-level features depend on the disease-level features, which is similar to the sequential structure in Fig. 1(e). How are the disease-level features preserved for the final mask prediction?
    - This step is inconsistent with the description in Fig. 2: according to Fig. 2, S2 passes through DSA and DCA2 while S1 passes through DCA1.
    - Some steps are not convincing: in the Bi-Scale Extractor, separating the intensified and weakened streams by different scales in two adaptive average pooling layers is not well justified. The separation of Hspa and Hspe based on two different scales in adaptive average pooling is also not persuasive.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The experiments were conducted on a public and a private dataset, but there is no further information about the private dataset, for example, whether it was approved by a local ethics committee or which type of device was used to collect it.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This work proposes a new method to segment medical hyperspectral images and improves the accuracy significantly. The self-supervised learning method highlights the novelty and the model’s performance. However, the model structure itself is somewhat complex, with many different modules connected sequentially, and some parts of it do not look reasonable or convincing.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well constructed and proposes a new method for medical hyperspectral image segmentation, especially by introducing a QSAL method. It significantly improves the segmentation accuracy. However, some parts and steps of the model are not very reasonable, such as using different scales in two adaptive average pooling layers to intensify and weaken the features and then further dividing them into spatial and spectral features on this basis. The motivation for adopting the DSA and DCA modules has not been satisfactorily explained. The necessity and effects of pretraining on ImageNet, which is used for natural image classification, have not been mentioned.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all the reviewers for their valuable comments. We will revise the paper to address grammar and formatting issues in the final version (@R3). In the following, we address the major concerns one by one.

1.1) @R1: “Image level” refers to the analysis of features from individual images, focusing on their spatial and spectral characteristics. This involves understanding the detailed texture, patterns, and spectral information of a single image. “Disease level”, on the other hand, involves analyzing the spectral characteristics commonly found in datasets of specific diseases to identify subtle spectral differences unique to those diseases.

1.2) @R1: We resized the images to 256x256 for computational efficiency and feasibility, ensuring a balanced width-to-length ratio for large datasets. This uniform size allows for faster processing and reduced memory usage, which is crucial for training deep learning models. While using high-resolution patches can add complexity and potential loss of context, resizing offers a good balance between efficiency and detail for accurate segmentation.

1.3) @R1: Focus-tuned Learning (FL) incorporates dimensionality reduction but goes further by actively focusing on subtle spectral differences critical for medical hyperspectral imaging segmentation. Unlike standard techniques, FL dynamically adjusts its focus based on the data’s spectral and spatial features. This adaptive approach captures more relevant features, leading to improved performance with a simple and efficient operation.

4.1) @R4: The proposed SFF module assigns weights to the spectral features of the images, guiding subsequent modules in their spectral focus. Spectral bands with higher weights have a greater impact on the subsequent segmentation process, thereby indirectly influencing the final mask prediction.

4.2) @R4: In the Bi-Scale Extractor, the intensified and weakened streams are separated by different scales in two adaptive average pooling layers. The feature size from the adaptive average pooling layers represents different levels of spatial granularity. Larger feature sizes (more dimensions) indicate finer spatial details, while smaller feature sizes (fewer dimensions) represent coarser spatial patterns. This approach is feasible as it aligns with the hierarchical nature of spatial feature extraction in convolutional neural networks.
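The granularity argument in 4.2 can be illustrated with a plain-Python sketch of adaptive average pooling, which averages an input grid into a requested output size using the usual floor/ceil binning rule. This shows only the generic pooling operation; the function name and the toy 4x4 input are assumptions, and the sketch is not the Bi-Scale Extractor itself.

```python
def adaptive_avg_pool2d(grid, out_h, out_w):
    """Average a 2D list `grid` down to out_h x out_w bins."""
    in_h, in_w = len(grid), len(grid[0])
    pooled = []
    for i in range(out_h):
        # Row range of bin i: floor(i*in/out) to ceil((i+1)*in/out).
        r0 = (i * in_h) // out_h
        r1 = ((i + 1) * in_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map with values 0..15.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
fine = adaptive_avg_pool2d(feature_map, 2, 2)    # keeps a 2x2 spatial layout
coarse = adaptive_avg_pool2d(feature_map, 1, 1)  # collapses to one global mean
```

The larger output retains where values are high or low, while the 1x1 output keeps only a global summary, which is the fine-versus-coarse distinction the authors appeal to.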




Meta-Review

Meta-review not available, early accepted paper.


