Abstract

Imaging modalities such as optical coherence tomography (OCT) are core components of medical image diagnosis. Deep learning-based object detection and segmentation models have proven efficient and reliable in this field. OCT images have been used extensively in deep learning-based applications such as retinal layer segmentation and retinal disease detection, for conditions including age-related macular degeneration (AMD) and diabetic macular edema (DME). However, sickle cell retinopathy (SCR) has yet to receive significant research attention in the deep learning community, despite its detrimental effects. To address this gap, we present a new detection network, the Cross Scan Attention Transformer (CSAT), specifically designed to identify minute irregularities such as SCR in cross-sectional images such as OCTs. Our method employs a contrastive learning framework to pre-train on OCT images and a transformer-based detection network that takes advantage of the volumetric nature of OCT scans. Our experiments demonstrate the effectiveness of the proposed network in detecting SCR from OCT images, with superior results compared to popular object detection networks such as Faster R-CNN and the Detection Transformer (DETR). Our code can be found at https://github.com/VimsLab/CSAT.
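To make the two-phase design concrete, the following is a minimal, hypothetical PyTorch sketch of the pipeline the abstract describes (adjacency pre-training followed by multi-slice detection). The class names, encoder, and shapes are placeholders, not the authors' implementation, which is in the linked repository.

```python
# Minimal, hypothetical sketch of the two-phase pipeline described above.
import torch
import torch.nn as nn


class BScanEncoder(nn.Module):
    """Placeholder encoder that embeds a single B-scan into a feature vector."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, dim, 7, stride=4), nn.AdaptiveAvgPool2d(1))

    def forward(self, x):                      # x: (B, 1, H, W)
        return self.net(x).flatten(1)          # (B, dim)


# Phase 1 (pre-training): score whether two augmented B-scans are adjacent
# slices from the same OCT volume (positive) or come from different patients.
encoder = BScanEncoder()
pair_head = nn.Linear(2 * 256, 1)
b1, b2 = torch.randn(4, 1, 224, 224), torch.randn(4, 1, 224, 224)
adjacency_logit = pair_head(torch.cat([encoder(b1), encoder(b2)], dim=1))

# Phase 2 (detection): feed n adjacent B-scans, encoded with the pre-trained
# weights, to a DETR-style decoder that attends across the neighboring slices.
n_slices = 3
volume_chunk = torch.randn(1, n_slices, 1, 224, 224)
slice_tokens = torch.stack([encoder(volume_chunk[:, i]) for i in range(n_slices)], dim=1)
print(adjacency_logit.shape, slice_tokens.shape)   # (4, 1) and (1, 3, 256)
```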

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3665_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3665_supp.pdf

Link to the Code Repository

https://github.com/VimsLab/CSAT

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Bha_Analyzing_MICCAI2024,
        author = { Bhattarai, Ashuta and Jin, Jing and Kambhamettu, Chandra},
        title = { { Analyzing Adjacent B-Scans to Localize Sickle Cell Retinopathy In OCTs } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper explores deep learning analysis in OCT images of patients with sickle cell retinopathy. The disease causes focal thinning of the retina, which can be detected in OCT.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This is a novel application of deep learning to support diagnosis in sickle cell retinopathy. The proposed Cross Scan Attention Transformer (CSAT), an extension of an existing network, performs well on the data. CSAT leverages information from adjacent slices of the OCT scan.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is not clear how other people could reproduce the reported work, as there is no mention of where to find the training and testing data or whether the trained network is available somewhere.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    See the section about main weakness.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is well written. I have no complaints other than reproducibility.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is an excellent paper, but the lack of support for reproducibility reduces my enthusiasm.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a detection network for SCR in OCT scans, named CSAT, which consists of a transformer-based pre-training network and an object detector. Experimental results and an ablation study on the authors' internal dataset demonstrate its effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes CSAT, a detection network for SCR that operates on adjacent B-scans instead of individual images, using pre-trained embeddings and attention mechanisms; it is a unique extension of DETR.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper uses the authors' own internal dataset and does not mention whether the data or code will be made public.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) In 3.1 Phase 1: CSAT Pre-trainer, “This network is trained to discern whether two augmented B-scans originate from the same OCT volume of the same patient or from different OCTs of distinct patients.” What is the reason for this design? In the Phase 2 detector network, I understand that the adjacent images sent to the network come from the same OCT of the same patient, so why does the encoder need to make this judgment? This point relates to the main novelty of the article, and the authors are encouraged to provide an in-depth analysis and discussion.

    (2) A number of studies are mentioned in the related work section, such as Jing et al. [14], Deformable-DETR [28], and CF-DETR [4]. Necessary comparisons should be made between these studies and the results of this paper, especially the improvements over the DETR algorithm that “further improvise the results”.

    (3) If the SCR B-scan dataset cannot be made public due to certain factors, it is recommended that the authors make the code public to increase the reproducibility of the method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Proposed novel detection network for SCR diagnosis.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    Extends the use of transformers to a particular medical use case, viz. detection of sickle cell retinopathy in OCT.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Develops an interesting architecture that uses adjacency of OCT slices to focus the trained network on inner layers (presumably this is a good thing). Addressing sickle cell retinopathy is a good thing, though see below.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (not really a weakness, just a gap) It was unclear to me whether the method is particularly suited to SCR vs any other retina pathology diagnosis. That is, how central is SCR to this paper, beyond providing the dataset? Some clarification of this would be useful.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Or maybe I missed something.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Below are some points where I, an average reader, stumbled:

    pg 1: “remais limited”: references?
    pg 1: “300,000…”: primarily of African descent in the US, and a population generally underserved. Also, the origin is an anti-malaria mechanism. Both points are of interest.
    pg 2: Is Jing the only ML work on SCR in OCT?
    pg 2: Is it the case that there are no 3-dimensional DETRs, and are there other 3-D transformers? That is, is entering data as 2-D slices required, or was it a choice?
    pg 3: Siamese network: reference?
    pg 3: “whether two augmented…distinct patients”: this contradicts Figure 1, which seeks to detect adjacency. Later (in 4.2) it is mentioned: close-by slices from the same patient, or slices from different patients. Why not far-away slices from one patient? This detail could be aligned across the three spots in the paper, and the logic of the pairing choices explained a bit more.
    pg 4: “ultimately cousing …structure”: Is this an observation of what happened, or a necessary effect that was planned/built in as a goal of training? Also, how do you know this happened (what forensic method reveals this effect)?
    pg 4: “where n is an odd…”: how big is n relative to the total number of scans? How important is the choice of n?
    pg 5: “only receive position encodings”: is this CS or an integer?
    pg 5: “N classes”: what are the classes?
    pg 5: 3 advantages, #2: “a b-scan is input as a neighber”: is this always true by construction?
    pg 5: 3 advantages, #3: I do not understand this item. The opposite seems true: the transformer has no access to the spatial information as to where each of the n slices is.
    pg 5: “alpha = 0.15”: neat idea - is there a reference for this, or is it newly minted here?
    Eqn 1: Does cosine similarity measure linear distance, i.e., does it give a sense of near vs. far, or just here vs. there?
    pg 5: How are the choices of loss function motivated?
    pg 5: What motivates the weights assigned to the losses for each class? I did not understand the 2, 3, 5 and the 0.05, 0.6, 0.35.
    pg 6: first text: font error. Also, an unfortunate placement, since it looks like a caption or footnote for the table rather than body text.
    Table 1: caption “class probability”: But the pre-trainer looks at adjacency - where does class come in (or does this refer to same vs. different patient)?
    pg 6 last line: perhaps make the definitions of a, b, c more explicit so that they impress on the reader’s mind - I missed the definition.
    Table 2: you could also repeat the a, b, c definitions here for ease.
    Table 2: Can you include +/- std devs from your 5-fold CV? This would give a sense of whether the differences in the table are meaningful.
    pg 7: Is the code publicly available?
    Fig 3: good, informative plots.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    An interesting, incremental extension of OCT analysis, likely of interest to a subset of the MICCAI community.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the valuable feedback provided by the reviewers and will consider their suggestions when preparing the final version of the paper.

(R1, R2, R3) Reproducibility: All the reviewers expressed concerns about the reproducibility of the paper. The camera-ready version of the paper will include a link to the working code and trained models.

(R1, R3) Pre-training method design: The pre-trainer extracts fundamental features from the B-scans to help the detector converge more quickly with less computation. The CSAT pre-trainer is trained to classify a pair of B-scans into two classes. Positive: the pair consists of adjacent B-scans from the same patient. Negative: the B-scans belong to different patients (hence, they are not adjacent). This process ensures that, while learning to identify adjacent B-scans, the model focuses on the inherent features shared by adjacent B-scans: their shape, contours, and the presence of the same artifacts in both B-scans. Hence, by pre-learning these features, the encoder provides the information the detector needs, i.e., the artifacts common to the two B-scans. To verify this reasoning, we extensively visualized attention maps for positive and negative pairs, similar to Figure 2. We observed that the common features in adjacent B-scans were highlighted for positive pairs, whereas there were few to no highlights for negative pairs.
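As a concrete illustration of this pairing scheme, below is a minimal, hypothetical sketch of how positive (adjacent, same patient) and negative (different patients) pairs could be sampled; variable names, array shapes, and the sampling ratio are assumptions, not the authors' actual code.

```python
# Hypothetical sketch of the positive/negative pair sampling described above.
# "volumes" maps a patient id to an OCT volume of shape (num_slices, H, W).
import random
import numpy as np

def sample_pair(volumes, positive: bool):
    patients = list(volumes)
    if positive:
        # Positive: two adjacent B-scans from the same patient's volume.
        pid = random.choice(patients)
        vol = volumes[pid]
        i = random.randrange(len(vol) - 1)
        return vol[i], vol[i + 1], 1
    # Negative: B-scans drawn from two different patients (hence not adjacent).
    pid_a, pid_b = random.sample(patients, 2)
    return random.choice(volumes[pid_a]), random.choice(volumes[pid_b]), 0

volumes = {f"patient_{k}": np.random.rand(64, 496, 512).astype(np.float32) for k in range(3)}
b1, b2, label = sample_pair(volumes, positive=random.random() < 0.5)
```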

(R1) We are grateful for R1’s comments and will address all the proposed edits in our camera-ready version.

(R1) How central is SCR to the paper?: SCR plays a central role in this paper as the method was specifically designed to detect SCRs. However, it can also be used to identify other pathologies with similar characteristics.

(R1) Choice of ‘n’: Section 4.3 and Figure 3 (a) explain the importance of ‘n’. Our experiments with ‘n’ = 1, 3, and 5 show that increasing ‘n’ improves detection accuracy but also increases the model’s computational complexity. However, the rate of accuracy improvement diminishes at higher values of ‘n’, which makes further increases counterproductive.
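For clarity, here is an illustrative sketch of how a window of n adjacent B-scans (n odd) might be assembled around a target slice; the boundary handling (repeating edge slices) and shapes are assumptions, not taken from the paper.

```python
# Illustrative sketch: gather n adjacent B-scans centred on a target slice.
import numpy as np

def neighbourhood(volume: np.ndarray, idx: int, n: int = 3) -> np.ndarray:
    assert n % 2 == 1, "n is expected to be odd so the target slice is centred"
    half = n // 2
    # Clamp indices at the volume boundaries (edge slices get repeated).
    picks = [min(max(idx + o, 0), len(volume) - 1) for o in range(-half, half + 1)]
    return volume[picks]                      # (n, H, W)

vol = np.zeros((64, 496, 512), dtype=np.float32)
window = neighbourhood(vol, idx=0, n=5)
print(window.shape)                           # (5, 496, 512)
```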

(R1) ‘N classes’: ‘N classes’ on page 5 refers to two classes, SCR and Fovea.

(R1) Page 5, Advantage 2: Yes, by construction, the weights are updated every time a B-scan is input as a neighbor of other B-scans.

(R1) Page 5, Advantage 3: Considering a B-scan ‘P’, we mean that during detection, the features from B-scans closer to ‘P’ receive more attention than those from B-scans farther away from ‘P’.
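This intuition can be sketched as a distance-based bias on cross-attention logits so that nearer slices contribute more; the decay form and scale below are illustrative assumptions, not the paper's exact mechanism.

```python
# Sketch: attention over neighbouring slices, penalised by distance to the target.
import torch

def distance_biased_attention(q, k, v, slice_offsets, decay=1.0):
    # q: (B, Tq, D) queries for the target slice; k, v: (B, n, D) per-slice features
    logits = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5          # (B, Tq, n)
    bias = -decay * slice_offsets.abs().float()                     # closer slice -> smaller penalty
    weights = torch.softmax(logits + bias, dim=-1)
    return weights @ v

q, k, v = torch.randn(2, 10, 256), torch.randn(2, 5, 256), torch.randn(2, 5, 256)
offsets = torch.tensor([-2, -1, 0, 1, 2])                           # positions relative to the target slice
out = distance_biased_attention(q, k, v, offsets)                   # (2, 10, 256)
```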

(R1) Loss functions: The loss functions are similar to those used in DETR, as mentioned in Section 3.3. The loss weights were chosen after a series of experiments with varying weight values.
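For reference, a rough sketch of a DETR-style weighted loss combination is shown below; it omits the Hungarian matching step, and the weight values are placeholders rather than the ones used in the paper.

```python
# Sketch of a DETR-style weighted detection loss (matching step omitted).
import torch
import torch.nn.functional as F
from torchvision.ops import generalized_box_iou

def detection_loss(cls_logits, cls_targets, pred_boxes, gt_boxes,
                   w_cls=1.0, w_l1=5.0, w_giou=2.0):
    # cls_logits: (N, num_classes); boxes in (x1, y1, x2, y2) format
    l_cls = F.cross_entropy(cls_logits, cls_targets)
    l_l1 = F.l1_loss(pred_boxes, gt_boxes)
    l_giou = 1.0 - torch.diag(generalized_box_iou(pred_boxes, gt_boxes)).mean()
    return w_cls * l_cls + w_l1 * l_l1 + w_giou * l_giou

logits = torch.randn(4, 3)                  # e.g. SCR, fovea, background (placeholder classes)
targets = torch.randint(0, 3, (4,))
pred = torch.tensor([[0., 0., 10., 10.]] * 4, requires_grad=True)
gt = torch.tensor([[1., 1., 9., 9.]] * 4)
loss = detection_loss(logits, targets, pred, gt)
```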

(R1) ‘Class probability’ in Table 1: The ‘class probability’ refers to the probability that the input pair of B-scans is adjacent.

(R3) Comparison with existing methods: Jing et al. trained a YOLO network for SCR detection. In Table 2, we compare our method with YOLO; the results show that CSAT is superior to YOLO on our dataset. Our method can be used in any DETR-like network, such as CF-DETR and Deformable DETR, by simply replacing the decoder.




Meta-Review

Meta-review not available; early accepted paper.


