Abstract

Diabetic macular edema (DME) is a leading cause of vision loss worldwide. Optical Coherence Tomography (OCT) serves as a widely accepted imaging tool for diagnosing DME due to its non-invasiveness and high-resolution cross-sectional view. Clinical evaluation of Hyperreflective Foci (HRF) in OCT contributes to understanding the origins of DME and predicting disease progression or treatment efficacy. However, limited information and a significant imbalance between foreground and background in HRF present challenges for its precise segmentation in OCT images. In this study, we propose an attention mechanism-based MUlti-dimensional Semantic Enhancement Network (MUSE-Net) for HRF segmentation to address these challenges. Specifically, our MUSE-Net comprises attention-based multi-dimensional semantic information enhancement modules and a class-imbalance-insensitive joint loss. The adaptive region guidance module softly allocates regional importance within a slice, enriching the single-slice semantic information. The adjacent slice guidance module exploits the remote information across consecutive slices, enriching the multi-dimensional semantic information. The class-imbalance-insensitive joint loss combines pixel-level perception optimization with image-level considerations, alleviating the gradient dominance of the background during model training. Our experimental results demonstrate that MUSE-Net outperforms existing methods on two datasets. To further promote reproducible research, we have made the code and both datasets available online.

Keywords: Hyperreflective foci · OCT · Attention · Segmentation

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/4063_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/iMED-Lab/MUSEnet-Pytorch

Link to the Dataset(s)

https://github.com/iMED-Lab/MUSEnet-Pytorch

BibTex

@InProceedings{Wan_AHyperreflective_MICCAI2024,
        author = { Wang, Xingguo and Ma, Yuhui and Guo, Xinyu and Zheng, Yalin and Zhang, Jiong and Liu, Yonghuai and Zhao, Yitian},
        title = { { A Hyperreflective Foci Segmentation Network for OCT Images with Multi-dimensional Semantic Enhancement } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a 2D semantic segmentation model of small pathologic structures (hyperreflective foci (HRF)) in optical coherence tomography (OCT) images. To tackle the strong class imbalance between background and tiny foreground objects, they introduce a joint loss combining Dice and a pixel-level perceptual optimization (PO) loss, where the latter is related to the focal loss. Furthermore, they propose an attention-based semantic information enhancement using information from a single OCT slice (single-slice enhancement module (SEM)) and between adjacent slices (Multidimensional Enhancement Module (MEM)), which may be seen as a 2.5D channel and spatial attention module.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Combining attention modules within slices and between slices, (SEM, MEM) and introducing PO + Dice loss to improve segmentation of small objects.
    • Evaluation with state-of-the-art HRF segmentation models.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Amount of novelty. SEM, MEM, PO have been proposed elsewhere. The novelty of this paper is the combination of these three components.
    • The benefit of using PO as a loss is not clear. The performance gain may result from using Dice only. Furthermore, there are many strategies available for tackling class imbalance (weighted CE, focal loss, BCE + soft Dice, etc.), which are not considered here.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Dataset is claimed to be released. Method is described with sufficient details to reproduce the network.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Major:

    • I wonder how SW-3DUNet [14] was trained, as it is a 3D method, but training data are 2D slices.
    • Ablation study: It is not clear what model the backbone represents, and in particular what loss was used.
    • Focal loss in combination with (Soft) dice is quite common in this kind of segmentation. Also other loss strategies are available to tackle class imbalance. Elaborating on these loss strategies is out of scope for this MICCAI submission but may be considered in subsequent journal publication.

    Minor: Table 2 caption: add details on what M1, M2, and M3 are. Fig. 5: it is not clear how hyperreflective clumps (HC), noise (NS) and hyperreflective dots (HD) are obtained from the model. The dataset and model seem to contain HRF only. However, Fig. 5 is not that relevant for a method paper and may be skipped as well.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lack of novelty. All components have been described elsewhere, and the benefit of using them is not clear, in particular the loss function. There are many strategies (in particular loss functions) available that tackle small object segmentation and class imbalance, but were not considered here.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper addresses the challenging task of segmenting hyperreflective foci (HRF) in retinal OCT images. HRF is an important biomarker in diabetic macular edema and is also linked to AMD (age-related macular degeneration).

    The task poses challenges due to the appearance of HRF as small dots, resulting in significant class imbalance between foreground and background pixels. Additionally, metrics and loss functions like Dice, reliant on overlap percentage, often struggle because of the very small sizes of HRF. Even a small difference of 1 pixel, possibly stemming from errors in expert grading, can cause a substantial decrease in Dice scores.
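
    The sensitivity of Dice to one-pixel errors on tiny objects can be illustrated with a small numerical sketch. The mask sizes and the one-pixel shift below are illustrative assumptions, not values taken from the paper or its datasets.

```python
import numpy as np

def dice(pred, target):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return 2.0 * inter / total if total > 0 else 1.0

def square_mask(size, top, left, side):
    """Binary image of shape (size, size) containing one filled square."""
    mask = np.zeros((size, size), dtype=bool)
    mask[top:top + side, left:left + side] = True
    return mask

# Hypothetical HRF-sized object (2x2 pixels) versus a larger lesion (20x20 pixels).
# In both cases the prediction is the ground truth shifted right by one pixel.
small_gt, small_pred = square_mask(64, 10, 10, 2), square_mask(64, 10, 11, 2)
large_gt, large_pred = square_mask(64, 10, 10, 20), square_mask(64, 10, 11, 20)

small_dice = dice(small_pred, small_gt)  # 2 * 2 / (4 + 4) = 0.5
large_dice = dice(large_pred, large_gt)  # 2 * 380 / (400 + 400) = 0.95
```

    Under these assumptions the same one-pixel shift halves the Dice score of the small object while costing the large one only five points.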

    The paper extends the nnUnet architecture with a novel multi-scale patch level attention within each slice (SEM module) and an inter-slice attention mechanism to improve segmentation performance (MEM module)

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Good performance improvement over many popular segmentation network architectures has been shown in Table 1.
    2. The key contribution of the paper is architectural innovation: it extends the nnUnet architecture with the SEM module for multi-scale patch-level attention and the MEM module to incorporate attention across 3 adjacent slices. The MEM module is inspired by [11] but significantly adapted for the segmentation task.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Contrary to the paper’s claim (contribution point 2 on page 2), I did not find the loss to be novel. Dice and pixel-level Binary Cross Entropy losses are commonly employed in segmentation tasks. Additionally, in this paper the BCE loss was weighted using the existing method proposed in FarSeg and FarSeg++ [12]. I am also curious as to why an additional weighting of the foreground pixels was not used to handle the class imbalance, in addition to the FarSeg-based weights, which weight the difficult-to-segment pixels. Moreover, the loss seems to yield only a minor improvement over Backbone in Table 2 (row 1 vs 2). It would have been interesting to see whether the loss really leads to any improvement when compared to Backbone+M2+M3, i.e., when only the joint loss is removed from the proposed method.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?
    1. In the absence of public datasets for this task, the proposed method has been evaluated on two in-house datasets with a promise to make them publicly available (see contribution point 3, at the end of page 2) in the un-anonymized version.

    2. Since this paper introduces novelty in network architecture, making the code available (at least for the introduced SEM and MEM blocks) would significantly improve the value of the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. What value of gamma was used in the joint loss?

    2. The SEM module applies the Squeeze and Excite mechanism but at a patch level, considering 4 different multi-resolution patch sizes in parallel followed by 2 convolution layers for calibration, resulting in P. A 4×1 (Softmax) attention weight is computed through Global Average Pooling and 1D conv to obtain W. It is not clear to me how a matrix multiplication of P (with dimensions C×4×H×W) could be performed with W (of size 4×1). My guess is that a weighted average of the 4 sets of features is performed, i.e., W[k,1] is multiplied with P[:,k,:,:] and a sum across the 2nd dimension of size 4 is performed. The intermediate tensor shapes and sizes could be added to Fig. 2 for more clarity on the exact operations performed in the proposed attention modules.

    3. Similarly, the matrix multiplication operation in eq. 3 was not clear to me. The explanation can be enhanced by adding tensor shapes and sizes in Fig. 3 for all the intermediate operations.

    4. In the ablation experiments, it is mentioned that the blocks are “sequentially” added (last three lines on pg. 7). What does this mean? E.g., in row 3 of Table 2, does “Backbone+M2” actually mean that both M1 (joint loss) and M2 (SEM) are added (i.e., Backbone+M1+M2), or does it only mean “Backbone+M2”, i.e., the joint loss is removed and only SEM is added? Also, for the Backbone model in row 1, which loss was used for training?

    5. Cross-testing performance. Evaluating the average performance of the 3 already trained models (1 per fold) trained on HRF-1 on the entire HRF-2 dataset, and vice versa, would give an idea of the inter-scanner generalization performance. How much would the performance drop across scanners?

    6. The Multi-dimensional Enhancement Module (MEM) employed 3 slices in X_{rem}. What is the input to the proposed network? Is it 3 consecutive slices?

    7. Dataset: It is stated that “this study randomly selects 8 consecutive B-Scans from each OCT volume for manual annotation”. After annotation, did all 8 slices have HRF? If not, were the slices that did not contain HRF still used for training and testing, or were only the slices that contained HRF used? Also, was the Dice score computed at a slice level, at a volume level, or for 3 consecutive slices?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good performance on a challenging segmentation task was achieved outperforming several popular segmentation architectures.

    Novel attention based architectural changes have been introduced to attend to the image features at multiple scale (SEM) and to propagate information across slices.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors addressed some of the queries in the rebuttal successfully.

    1. The details of the matrix multiplication in the SEM block and eq. 3 have been partially addressed. The authors have committed to providing code links in the final version, which should clarify this issue. I also suggest adding the tensor shapes and sizes at each intermediate step in Fig. 2 and Fig. 3 to improve clarity in the camera-ready version, if accepted.

    2. Point 4 in my previous review comment has not been addressed: The last line on pg. 7: “Sequentially, we introduce M1: joint loss, M2: single-slice semantic enhancement, and M3: multidimensional semantic enhancement into the backbone.” In rows 2-4 of Table 2, what does “Backbone+M1”, “Backbone+M2” and “Backbone+M3” mean, are the losses added sequentially, ie., is “Backbone+M3”= “Backbone+M1+M2+M3” or M1 and M2 are not used and only M3 used in row 4? This has not been clarified.

    3. Was the Dice score computed at a slice level, at a volume level, or for 3 consecutive slices (comment 7 in my previous review)? This has not been clarified.



Review #3

  • Please describe the contribution of the paper

    The paper proposes an attention-based network approach to Hyper-reflective foci (HRF) segmentation in OCT images. The approach aims to overcome the challenges faced in HRF segmentation and enhance its accuracy and efficiency.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors proposed a novel attention-based multi-dimensional semantic information enhancement network for precise HRF segmentation using OCT images. The original contribution resides in the use of attention modules in the SEM to enhance semantic information.
    2. The other important module in this paper is the multi-dimensional enhancement module, which exploits contextual information. Channel attention and spatial attention refine the feature map, which is interesting.
    3. The paper introduces a perceptual loss that estimates weights for the hard samples and assigns more weight to difficult samples, which helps to address the imbalanced dataset issue.
    4. Two new manually annotated OCT datasets are developed and made available online for new studies. The proposed model significantly improved the performance compared to existing state-of-the-art techniques such as UNet, SA-NET and nnUNet.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The main weakness of the paper is the dataset size which is small.
    2. Though the paper is well written, adding more references would give a better idea of previous works.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    I believe that the paper can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Although the paper is well written, more references would improve it.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Upon reviewing the paper, I find it to be quite intriguing in nature. The results obtained from the proposed SEM and MEM modules are particularly promising. What sets these modules apart is their ability to learn complex multi-scale features, which in turn contributes to refining the segmentation result.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I did not change my decision




Author Feedback

We appreciate your positive comments on our work (e.g., ‘Channel attention and spatial attention refined the feature map which is interesting’ by R#1, ‘Method is described with sufficient details to reproduce the network’ by R#3, ‘Novel attention based architectural changes’ by R#4). We address the main concerns below.

Q1: Dataset Construction and Size. (R#1, R#4) A: We randomly selected 8 consecutive B-scans from each OCT volume for manual annotation of HRF, carefully ensuring that they are in either the training or testing sets. Although we didn’t constrain HRF presence in training B-scans, our network is well-trained. While time-consuming manual annotation limits the dataset size, we plan to expand and make it publicly available in the future.

Q2: Novelty of SEM and MEM Modules. (R#3) A: Inspired by clinical practice, the SEM and MEM simulate ophthalmologists’ use of local and global perspectives to detect lesions in OCT images. Variations in HRF characteristics such as shape, size and brightness within a slice, and their proximity to tissues of similar reflectivity, require switching between detailed and broader views for accurate detection. HRF lesions often span 2-4 B-scans, making adjacent slice examination essential.

The SEM uses 4 branches to analyze regional importance at varying scales, simulating a mix of views within each slice. The MEM combines the SEM’s multi-scale attention with channel and spatial attention across multiple slices, enhancing multi-dimensional information extraction. These innovative modules, presented here for the first time, may offer new insights for future medical image analysis research.

Q3: Joint Loss Function. (R#3, R#4) A: Given HRF’s small size and the major background-foreground imbalance, minor imperfections greatly affect segmentation results. Alternatives such as weighted CE, focal loss and soft Dice loss are partial solutions: Dice loss relies on overlap and struggles with small objects, CE lacks weighting for difficult samples, and focal loss relies on unreliable early predictions. We opted for Dice plus Perceptual Optimization (PO) losses. They dynamically adjust the weights for hard examples in early training using a decreasing cosine annealing function and focus on challenging samples as confidence grows. Extra foreground pixel weighting might address class imbalance but risks neglecting boundary samples. Experiments show that this joint loss function, with gamma set to 1, improves HRF segmentation and performs best among the tested settings (0.5, 1, 2, 5).
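
The idea of combining Dice with an annealed hard-example weighting can be sketched as follows. The rebuttal does not give the PO formula, so this sketch assumes a FarSeg-style weighting (the method Reviewer #2 refers to): a cosine factor `zeta` decays from 1 to 0, so early training uses plain cross entropy and later training uses focal-style weights. The function names, the omission of any weight normalization, and the equal weighting of the two terms are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def soft_dice_loss(prob, target, eps=1e-6):
    """Image-level term: 1 - soft Dice between probabilities and a binary mask."""
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

def annealed_focal_bce(prob, target, step, total_steps, gamma=1.0, eps=1e-6):
    """Pixel-level term: BCE whose hard-example weighting is phased in over training."""
    prob = np.clip(prob, eps, 1.0 - eps)
    p_t = np.where(target == 1, prob, 1.0 - prob)   # probability of the true class
    bce = -np.log(p_t)
    # Assumed schedule: cosine annealing from 1 (uniform weights) to 0 (focal weights).
    zeta = 0.5 * (1.0 + np.cos(np.pi * step / total_steps))
    weight = zeta + (1.0 - zeta) * (1.0 - p_t) ** gamma
    return (weight * bce).mean()

def joint_loss(prob, target, step, total_steps, gamma=1.0):
    """Assumed combination: unweighted sum of the two terms."""
    return soft_dice_loss(prob, target) + annealed_focal_bce(
        prob, target, step, total_steps, gamma)

# Toy 2x2 prediction and mask, purely illustrative.
prob = np.array([[0.9, 0.2], [0.6, 0.1]])
target = np.array([[1, 0], [1, 0]])
early = annealed_focal_bce(prob, target, step=0, total_steps=100)
late = annealed_focal_bce(prob, target, step=100, total_steps=100)
```

In this sketch `early` equals plain BCE, while `late` down-weights the confidently correct pixels, so the remaining gradient is concentrated on the harder ones.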

Q4: Model Structure and Training. (R#3, R#4) A: We used the 2D nnUNet with the standard Dice+CE loss function as the backbone, whereas MUSE-Net processes three consecutive slices through the SEM module to generate three feature maps per layer. These maps, as inputs to the MEM module (X_rem), are integrated and then fed into the semantic decoder to produce the prediction map. This configuration makes our model a 2.5D segmentation network with a 2.5D attention module. We converted three slices into 3D NIfTI format for the comparison with 3D methods.
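
A minimal sketch of how 2.5D inputs could be assembled from a short stack of annotated B-scans is given below. The sliding window with stride 1 and the absence of edge padding are assumptions; the rebuttal only states that three consecutive slices are processed together.

```python
import numpy as np

def consecutive_triplets(volume):
    """Group neighbouring B-scans into triplets: (N, H, W) -> (N - 2, 3, H, W)."""
    n = volume.shape[0]
    if n < 3:
        raise ValueError("need at least three slices")
    # Assumed stride-1 window; the first and last slices are never the centre slice.
    return np.stack([volume[i:i + 3] for i in range(n - 2)])

# Eight consecutive B-scans with a toy 4x5 resolution (hypothetical sizes).
bscans = np.arange(8 * 4 * 5, dtype=np.float32).reshape(8, 4, 5)
triplets = consecutive_triplets(bscans)
```

With eight slices this yields six triplets, each of which would pass through the per-slice encoder before cross-slice fusion.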

Concerning SEM and MEM module implementation, as R#4 highlighted, the matrix multiplication of W and P in the SEM module calculates region-sensitive weights [C, H, W] by averaging P[C,K,H,W] on the second dimension using W[K,1]. Further details on the matrix multiplication (Equation 3) in MEM will be provided upon public release of the code.
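
The branch fusion described above, where P of shape [C, K, H, W] is reduced over its second dimension using W of shape [K, 1], can be sketched with NumPy broadcasting. The variable names, tensor sizes and the softmax helper are illustrative assumptions; only the shapes and the weighted reduction come from the rebuttal.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def fuse_branches(branch_feats, branch_weights):
    """Weighted sum over the K branches: (C, K, H, W) and (K, 1) -> (C, H, W)."""
    w = branch_weights.reshape(1, -1, 1, 1)   # broadcast over C, H and W
    return (branch_feats * w).sum(axis=1)

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 16, 16))           # C=8, K=4 branches, H=W=16
weights = softmax(rng.standard_normal((4, 1)), axis=0)  # one weight per branch
fused = fuse_branches(feats, weights)
```

Because the softmax weights sum to 1, the result is a convex combination of the four branch outputs, which reduces to a plain average when all weights are equal.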

Q5: Necessity of Figure 5. (R#3) A: Figure 5 aims to emphasize the clinical significance of our work. Our research on the segmentation and visualization of HRF in OCT images has substantial implications for the management of disease processes and the evaluation of treatment efficacy. Figure 5 showcases HRF subtypes obtained through subsequent quantization processing, illustrating the distribution of HRF in the retina. This highlights the clinical applicability of our research in understanding and managing retinal diseases, beyond just methodological contributions.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents a specific method for foci segmentation in OCT images. The reviewers raise concerns about its novelty; indeed, none of the components is completely novel, and each has been discussed elsewhere. However, the authors combine these components to provide a solution that outperforms the default nn-Unet. I think this paper should be valuable and worth discussing for the MICCAI audience.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



