Abstract

Selecting the appropriate power for intraocular lenses (IOLs) is crucial for the success of cataract surgery. Traditionally, ophthalmologists rely on manually designed formulas like “Barrett” and “Hoffer Q” to calculate IOL power. However, these methods exhibit limited accuracy since they primarily focus on biometric data such as axial length and corneal curvature, overlooking the rich details in preoperative images that reveal the eye’s internal anatomy. In this study, we propose a novel deep learning model that leverages multi-modal information for accurate IOL power calculation. In particular, to address the low information density in optical coherence tomography (OCT) images (i.e., most regions contain zero-valued pixels), we introduce a cross-layer attention module to take full advantage of hierarchical contextual information to extract comprehensive anatomical features. Additionally, the IOL powers given by traditional formulas are taken as prior knowledge to benefit model training. The proposed method is evaluated on a self-collected dataset consisting of 174 samples and compared with other approaches. The experimental results demonstrate that our approach significantly surpasses competing methods, achieving a mean absolute error of just 0.367 diopters (D). Impressively, the percentage of eyes with a prediction error within ±0.5 D reaches 84.1%. Furthermore, extensive ablation studies are conducted to validate each component’s contribution and identify the biometric parameters most relevant to accurate IOL power calculation. Code will be available at https://github.com/liyiersan/IOL.
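
As a rough illustration of the cross-layer attention idea described in the abstract and the authors’ rebuttal below, here is a minimal PyTorch sketch. The module structure, names, shapes, and the CBAM-style Avg/Max spatial attention are assumptions inferred from this page, not the paper’s actual implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CrossLayerAttention(nn.Module):
        """Sketch: spatial attention over the current layer's features,
        guided by features from another backbone layer."""
        def __init__(self, prev_channels, cur_channels, kernel_size=7):
            super().__init__()
            self.align = nn.Conv2d(prev_channels, cur_channels, 1)  # match channels
            # 2-channel input: channel-wise Avg and Max maps of the fused features
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, f_prev, f_cur):
            # resize the other layer's features to the current spatial size
            f_prev = F.interpolate(f_prev, size=f_cur.shape[-2:],
                                   mode='bilinear', align_corners=False)
            fused = f_cur + self.align(f_prev)
            avg = fused.mean(dim=1, keepdim=True)   # channel-wise average
            mx = fused.amax(dim=1, keepdim=True)    # channel-wise max
            attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
            return f_cur * attn  # down-weight uninformative (mostly zero) regions

Since most OCT regions are zero-valued, an attention map of this kind could suppress those regions while emphasizing the informative anatomy.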

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3136_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3136_supp.pdf

Link to the Code Repository

https://github.com/liyiersan/IOL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zho_Refining_MICCAI2024,
        author = { Zhou, Qian and Zou, Hua and Wang, Zhongyuan and Jiang, Haifeng and Wang, Yong},
        title = { { Refining Intraocular Lens Power Calculation: A Multi-modal Framework Using Cross-layer Attention and Effective Channel Attention } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a novel method for refining intraocular lens power calculations previously obtained with well-established formulas, based on introducing imaging information and biometric data through a dual encoder network. The model features a RepLKNet-31B backbone that extracts features from an input 3D AS-OCT scan, and combines them all with the embedding of the biometric features, computed using a multilayer perceptron. The architecture is novel, and experiments show a reduction in the MAE and median absolute error for estimating the lens power.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper aims to leverage imaging information to improve the efficacy of existing formulas for lens power estimation, combining their outputs with features obtained by a CNN model and a joint embedding representation that simultaneously captures the semantics of both biometric data and lens power estimates obtained with other models. Overall I think this idea is technically novel, introducing a multimodal tool that makes it possible to improve lens power estimates with complementary imaging data.

    • The proposed architecture is novel, featuring several components designed to address specific challenges identified by the authors, trained in an end-to-end manner using multimodal information.

    • The experimental results show improvements with respect to the standard formulas used in clinical practice, although no statistical analysis of the significance of the improvement is provided, and there is no reference value to determine the scale of those improvements.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The overall presentation of the contributions is sometimes not very well articulated in the article, making it difficult to identify the actual relevance of the proposed approach. For instance, some components are introduced to solve issues that are not experimentally demonstrated, e.g. the one that is presented as a solution to the “low information density” issue of AS-OCT (see comments below). Furthermore, the wording used for referring to specific elements in the network seems erroneous to me, like speaking about a “text encoder” when it actually works on a feature vector as input (this seems to be inherited from the multimodal learning literature around image and text encoders, but it is still incorrect). Finally, in many places in the text (see comments below) the authors vaguely refer to flaws in the state of the art that are not properly validated (e.g. implying that any lens power estimator should take imaging data into account, when the focus should be on the clinical application of the tool, e.g. based on the accuracy of the estimation).

    • It is unclear to me what the expected range of errors is in which the lens power estimation would be considered acceptable. Therefore, it is difficult to determine if results obtained from this model are indeed superior to those obtained with the existing formulas. The numbers are very close (changing only in the second decimal in some cases), so without knowing the scale I cannot tell if the obtained improvements are clinically relevant. Finally, no statistical analysis is provided to assess the significance of the improvements.

    • I believe that accuracy is not a proper metric for evaluating this particular algorithm. Lens power estimation is clearly a regression problem, so metrics like MAE or RMSE (root mean square error), R-squared, and the Pearson correlation coefficient are in my opinion the must-have metrics. Stratifying MAEs into ranges of size 0.5 diopters is likely to overestimate the actual performance of the model, especially considering that some models report MAEs in this range (e.g. MLP with no prior, Holladay and SRK/T formulas, etc.).

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No details about the architecture of the so-called “text encoder” (MLP) are provided in the text, so it won’t be feasible to implement that component in particular.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • I think that the overall contribution of the article needs to be better articulated. In its current form, I am afraid that the manuscript struggles to justify the necessity of integrating imaging information for better lens estimation. For instance, page 2 reads (about formulas): “Firstly, they lack in utilizing comprehensive data, mainly focusing on biometric measurements (…), neglecting the imaging information”. While this is true in the sense that these formulas do not make use of imaging data, it is not demonstrated that this is actually a problem: in fact, a model that produces accurate estimates with a single modality is desirable, as it would enable better outcomes at a lower cost, without having to acquire specific equipment or rely on extensive calculations. Instead, I would suggest the authors motivate their model by mentioning that imaging information might eventually contribute to improved results, and that they want to explore that avenue with the proposed solution.

    • In line with my previous comment, I would suggest the authors review the numbered list of contributions in Page 3 so that it only includes references to specific points that are novel and might help the field. For instance, I would not mention the experiments as a contribution as they are part of any scientific paper, but instead would focus on the clinical relevance of the results, if any.

    • Similarly to my previous point, at the end of the same paragraph it reads (about recent models) “(…) they still focus on single-modal data and are suboptimal because of simplistic model designs”. Why would a simplistic model design be suboptimal? Wouldn’t it be desirable to have a model that is simple but accurate? The focus should be on the results and the clinical application, not on the complexity of the architecture or the methodological intricacies.

    • In the same paragraph, authors mention that retinal thickness is, among other parameters, vital for accurate calculations of lens power. Is that correct? Could you please provide a citation for that?

    • I am doubtful about the so-called low information density being an actual problem for the model. The authors highlight this as a significant issue that they solve by relying on a RepLKNet backbone, but there are no experiments supporting this limitation. Only Table 2 provides a comparison with a ResNet-50 backbone, but no information regarding differences in capacity between the two models is provided, so we don’t know if the changes in performance are due to a different number of parameters or to this particular concern. Furthermore, the changes don’t seem to be particularly strong, as the numbers are quite similar between models. Moreover, when describing the cross-layer attention module, the authors indicate that the change in the backbone does not solve the issue of low information density, so there seems to be a contradiction there. Could you please clarify that in the rebuttal?

    • Authors say that “These dumb windows may have negative impacts on backward gradients as they do not provide meaningful direction for parameters updates”, but they don’t show any empirical evidence about this nor cite relevant sources. On the contrary, class activation maps provided in Fig. 3 in the supplementary materials show that several areas with apparent low density information are activated (e.g. in the third row), meaning that the network is still taking these “dumb windows” (as referred by the authors) into account for the prediction. Please, clarify this in the rebuttal too.

    • Adding to that, I’d like to know how the authors evaluated that replacing the 3x3 kernel with the 31x31 one reduces the dumb windows from 77% to 52%.

    • The current version of the manuscript reads that “5-fold cross validation was used to produce more solid results”. But right before that it says that 20% of the data was used for testing. Could you please clarify how cross-validation was used in this context? Also, experiments were conducted using 174 eyes from 117 patients. Were partitions made at a patient level? Because if eyes of the same patient are included in the training and test sets there might be a data leakage issue.

    • It would be nice to include the range of errors in which a lens power estimation would be considered safe, with relevant clinical citations. That way the reader would understand if there is a clinically relevant difference between the proposed model and the already existing approaches.

    • Also, it is necessary to include a statistical analysis, e.g. through a t-test or a Wilcoxon signed-rank test, comparing the MAE and MedAE values in Table 1.

    There are also some minor formatting issues, spelling and grammatical errors to correct, namely:

    • The title includes the wording “cutting-edge”. Adjectivizing a scientific paper is not recommended, as the technical contribution of an article needs to speak for itself.

    • Page 2. “Over the past decades, many manual-designed (…)” - Should be “manually designed”.

    • Page 2. “multiple layer perceptrons” - Should be “multi-layer perceptrons”.

    • Page 2. “retina’s thickness” - Should be “retinal thickness”.

    • Page 2. “deep-learning” - Should be “deep learning”, without hyphenation.

    • Fig. 1 and Fig. 3 should be on top of the page, as mandated by the LNCS formatting rules.

    • Page 3. “as a priori” - Should be “as prior knowledge”.

    • Eq. 1. When using words within equations (like Sigmoid, Avg, Max, etc.), make use of \text{Sigmoid} in LaTeX to avoid italics (see the brief example after this list).

    • Section 3.1. “like age, gender, and preoperative visual acuity”. Unless other features are included, I wouldn’t use the word “like”. And if other features are included, all of them should be listed in the text.

    • Table 1. “MLP (no priori)” - Should be “no prior”.
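
    As an illustration of the \text{} suggestion above, a minimal LaTeX sketch (the operands are hypothetical placeholders, not the paper’s actual Eq. 1):

        % upright function names inside an equation
        \begin{equation}
          M = \text{Sigmoid}\left(\text{Avg}(F) + \text{Max}(F)\right)
        \end{equation}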

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I have concerns about some of the contributions, the evaluation, and the overall organization and presentation of the article. I would be happy to change my rating though if the authors can improve the manuscript and address my comments on the rebuttal.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a deep learning model that uses multimodal information to accurately calculate the power of intraocular lenses. The authors use a cross-layer attention module that takes full advantage of hierarchical contextual information to extract comprehensive anatomical features and reduce redundancy. They also use traditional formulas as prior knowledge to aid model training. The model has been evaluated on a self-collected dataset of 174 samples, with promising results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors developed a deep learning model that exploits multimodal data to estimate the IOL power, which can be considered pioneering work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The novelty of the given work is limited: the authors use RepLKNet as the backbone, and the introduced CLA is not new. The insight behind CLA has been used in several domains where the input comprises multiple data sources.

    The size of the dataset used is too small, which may lead to an overfitting problem.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The novelty of the given work is limited: the authors use RepLKNet as the backbone, and the introduced CLA is not new. The insight behind CLA has been used in several domains where the input comprises multiple data sources.

    The authors consider the green rectangle region (lens nuclear area and posterior subcapsular area) as the main source of information. Why don’t you use segmentation techniques to detect this region and feed it to the network?

    It would be great to see a heat map or some other visualisation technique to see what exactly the model is focusing on, if the attention is on the green rectangle, it would convince readers.

    The size of the dataset used is too small, which may lead to an overfitting problem.

    It is not clear what the biometric data are; please provide more information.

    Please provide more details on the scanning mode of AS-OCT. Is it a volumetric or a single line scan?

    What is the imaging device? It is not clear whether the authors are using the full 3D AS-OCT volume or a single scan to train the model. If the latter, how is the scan selected for training?

    If the output of the given network is the lens power, how are the MAE, MedAE, and accuracy metrics obtained? More details are needed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors developed a deep learning model that exploits multimodal data to estimate the IOL power, which can be considered pioneering work.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a new multi-modal method to estimate the most appropriate power in intraocular lens surgery. The method utilizes OCT images and tabular data for better prediction results. The authors also collect a corresponding dataset to train and validate the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The performance of the proposed method is evaluated on a self-collected dataset and compared with traditional calculation formulas and other AI-driven models, demonstrating significant performance improvements and potential for clinical application.
    2. Ablation studies are conducted to prove the effectiveness of the network design and the significance of using both modalities.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The dataset is not described well. The number of samples in the training and testing sets is unknown.
    2. The network architecture is not described well. Where do the input features of the CLA module come from? I suppose they are extracted from different layers of the backbone network, but no details are given.
    3. The novelty is inadequate. The proposed CLA module has similar designs in other works. The fusion of information from the two modalities also lacks novelty.
    4. The usage of some terms is confusing. It seems that the biometric data in this work are a kind of tabular data. However, the authors call them textual data.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The topic of the manuscript is interesting, with potential clinical value. To the best of my knowledge, there is no deep-learning-based study focusing on IOL power estimation. However, some obvious weaknesses hinder the publication of the manuscript. The authors should improve its clarity. The figure describing the network architecture should also be refined.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript has an interesting topic but with unclear description of the dataset and method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I would like to keep my opinion unchanged. My question about the dataset was appropriately answered, but the other questions were not. The key point is that all components of the network architecture, such as RepLKNet, CLA, and ECA, come from other works, leading to inadequate methodological novelty. On the other hand, the work presents novelty in clinical application, although the scale and scope of the dataset are limited. So I think the manuscript is on the borderline.




Author Feedback

To R1, R3, R5:

Q1: Reproducibility. A1: Code will be released upon acceptance.

Q2: Metrics. A2: For IOL prediction, we adopt MAE and MedAE following [1, 2]. We also evaluate accuracy for two reasons: 1) predictions with AE < 0.5 D are deemed clinically acceptable [1, 2], so accuracy reflects the proportion of clinically useful predictions; 2) the ground truth is accurate to increments of 0.5 D. (A computation sketch follows this rebuttal.)

Q3: Dataset. A3: It includes 16 views of 2D OCT scans acquired with the CASIA2 device, plus the biometric data detailed in Sec. 3.1 and Fig. 3. With 174 eyes from 117 patients, we split 139 eyes for training and 35 for testing. The test sets of the 5-fold cross-validation are: Fold 1 (Eyes 1-35); Fold 2 (Eyes 36-70); …; Fold 5 (Eyes 140-174).

To R1, R3:

Q4: Novelty. A4: We address the challenge of representation learning from OCT images by employing RepLKNet and CLA to emphasize shape information and informative regions. The feature fusion using ECA facilitates efficient integration of multi-modal data. Our contribution lies in identifying key challenges and proposing a viable solution.

To R1:

Q5: Segmentations as input. A5: The pixel-level annotations required for segmentation are challenging to obtain. Heat maps in the supplementary file show that our model mainly focuses on informative regions.

Q6: Dataset size concerns. A6: 5-fold cross-validation ensures reliable results. Our method outperforms transformer-based models prone to overfitting, showing its effectiveness on limited data.

To R3:

Q7: CLA details. A7: A detailed diagram of CLA is presented in the supplementary file. It utilizes features from the last and current layers for spatial attention.

To R5:

Q8: Low information density and dumb windows. A8: Low information density refers to the extensive meaningless zero-value pixels. These may lead to dumb windows for convolution, where the ReLU activation often yields zero responses and impairs feature aggregation during pooling. While RepLKNet reduces dumb windows, the extent of this reduction is limited, as indicated by our statistical analysis; we therefore introduce CLA to emphasize informative regions through attention mechanisms. ResNet-50 employs smaller kernels than RepLKNet. While RepLKNet has more parameters and FLOPs than ResNet-50, it shows better results on the collected small-scale dataset, showcasing its ability to avoid both underfitting and overfitting. Additionally, the highlighted regions in the CAMs may be informative in views other than view 0 and view 15.

Q9: The motivation for utilizing multi-modal data and the model design. A9: AS-OCT allows detailed analysis of ocular structures, and introducing imaging data benefits IOL prediction [3]. Simplistic model designs may struggle to extract effective features, potentially leading to underfitting. Results in Table 1 support this concern: traditional formulas achieve only 60% accuracy, while simple AutoML models attain less than 70%. Thus, effective model design is crucial, especially with limited samples, to extract pertinent features while avoiding underfitting and overfitting.

Q10: How the percentage of dumb windows is counted. A10: It is determined by sliding a window across the image and checking whether the sum of pixel values in each window is less than the total number of pixels in it (np.sum(window) < w_size * w_size); see the sketch after the references below.

Q11: Statistical analysis. A11: According to the rebuttal guidelines, adding analyses or experiments is not allowed.
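
For Q2 above, a minimal sketch of how the three reported metrics could be computed (a hedged illustration; the function and variable names are ours, and the 0.5 D tolerance follows [1, 2]):

    import numpy as np

    def iol_metrics(pred, gt, tol=0.5):
        """MAE, MedAE, and the proportion of eyes with AE < tol diopters."""
        err = np.abs(np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float))
        mae = err.mean()          # mean absolute error (D)
        medae = np.median(err)    # median absolute error (D)
        acc = (err < tol).mean()  # fraction deemed clinically acceptable [1, 2]
        return mae, medae, acc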
[1] Carmona González, D., Palomino Bautista, C.: Accuracy of a new intraocular lens power calculation method based on artificial intelligence. Eye 35(2), 517–522 (2021)
[2] Stopyra, W., Langenbucher, A., Grzybowski, A.: Intraocular lens power calculation formulas—a systematic review. Ophthalmology and Therapy 12(6), 2881–2902 (2023)
[3] An, Y., Kang, E.K., Kim, H., Kang, M.J., Byun, Y.S., Joo, C.K.: Accuracy of swept-source optical coherence tomography based biometry for intraocular lens power calculation: a retrospective cross-sectional study. BMC Ophthalmology 19, 1–7 (2019)
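
For Q10 above, a minimal sketch of the described dumb-window count (a hedged illustration assuming a grayscale image as a 2D NumPy array and a non-overlapping stride, which the rebuttal does not specify):

    import numpy as np

    def dumb_window_ratio(img, w_size):
        """Fraction of w_size x w_size windows whose pixel sum is below
        the number of pixels in the window (np.sum(window) < w_size**2)."""
        h, w = img.shape
        total = dumb = 0
        for i in range(0, h - w_size + 1, w_size):      # stride is an assumption
            for j in range(0, w - w_size + 1, w_size):
                window = img[i:i + w_size, j:j + w_size]
                total += 1
                dumb += int(np.sum(window) < w_size * w_size)
        return dumb / total if total else 0.0

    # e.g. compare the 3x3 and 31x31 receptive fields discussed in the reviews:
    # dumb_window_ratio(oct_img, 3), dumb_window_ratio(oct_img, 31)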




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


