Friday, October 24, 2025

Using Machine Learning to Predict Myopia in Children from Routine Eye Exams


Understanding Myopia Prediction: Insights from a Pediatric Ophthalmology Study

Ethical Considerations

This study was conducted in accordance with the Declaration of Helsinki and was approved by the ethics committee at Rambam Health Care Campus (approval number RMB-D-0653-21). The requirement for informed consent was waived for this analysis.

Data Overview

The dataset for this research comes from patient records at a pediatric ophthalmology clinic and covers the period from 2010 to 2022. Each record represents a single patient visit and includes:

  • Unique patient identifier
  • Visit date
  • Age at the time of the visit
  • Obstetric history
  • Family history of ophthalmological conditions
  • Medical history
  • Comprehensive details gathered during the visit

Free-text information on family history, medical history, and obstetric history was not used directly. Instead, each of these fields was converted into a binary variable indicating whether a problem was recorded, with absence of a problem as the default value. Table 1 lists these variables along with their descriptions and descriptive statistics.
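The conversion from free text to a presence/absence flag can be sketched as follows. The source does not say how a "problem" was recognized in the text, so the rule used here (an empty field or a short negative entry such as "none" counts as no problem), the `NEGATIVE_MARKERS` set, and the column names are all hypothetical.

```python
import pandas as pd

# Hypothetical entries treated as "no problem recorded" (assumption, not from the study).
NEGATIVE_MARKERS = {"", "none", "no", "nil", "normal", "-", "n/a"}


def to_binary_flag(text) -> int:
    """Return 1 if a free-text history entry records a problem, else 0 (the default)."""
    if pd.isna(text):
        return 0
    return int(str(text).strip().lower() not in NEGATIVE_MARKERS)


def binarize_history(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Replace each free-text history column with a 0/1 presence indicator."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(to_binary_flag).astype(int)
    return out


visits = pd.DataFrame({
    "family_history": ["Father wears glasses", None, "none"],
    "medical_history": ["", "Asthma", "NIL"],
})
print(binarize_history(visits, ["family_history", "medical_history"]))
```

A keyword list like this is deliberately conservative: anything that is not an explicit "nothing to report" entry is treated as a recorded problem.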

Diagnosing Myopia

Myopia, or nearsightedness, is identified when the spherical equivalent in either eye is less than or equal to -0.50 D. The spherical equivalent is calculated using the formula:

\[
\text{spherical equivalent} = \text{objective refraction sphere} + 0.5 \times \text{objective refraction astigmatism}
\]

Conventionally, myopia onset is defined by a spherical equivalent of -0.50 D or less in both eyes. Because our study aimed to detect the earliest signs of myopia, a myopic finding in just one eye was sufficient for a visit to be classified as myopic.
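The labeling rule fits in a few lines. This is a sketch assuming sphere and cylinder (astigmatism) values in diopters for each eye; the function and parameter names are illustrative. It contrasts the study's either-eye rule with the conventional both-eye definition.

```python
MYOPIA_THRESHOLD_D = -0.50  # spherical equivalent cut-off in diopters


def spherical_equivalent(sphere: float, cylinder: float) -> float:
    """Spherical equivalent = sphere + 0.5 * cylinder (astigmatism), in diopters."""
    return sphere + 0.5 * cylinder


def is_myopic_either_eye(right_sph, right_cyl, left_sph, left_cyl) -> bool:
    """Study rule: myopic if the SE of at least one eye is <= -0.50 D."""
    se_right = spherical_equivalent(right_sph, right_cyl)
    se_left = spherical_equivalent(left_sph, left_cyl)
    return min(se_right, se_left) <= MYOPIA_THRESHOLD_D


def is_myopic_both_eyes(right_sph, right_cyl, left_sph, left_cyl) -> bool:
    """Conventional rule: myopic only if the SE of both eyes is <= -0.50 D."""
    se_right = spherical_equivalent(right_sph, right_cyl)
    se_left = spherical_equivalent(left_sph, left_cyl)
    return max(se_right, se_left) <= MYOPIA_THRESHOLD_D


# Right eye: -0.25 sphere, -0.75 cylinder -> SE = -0.625 D; left eye: SE = +0.25 D.
print(is_myopic_either_eye(-0.25, -0.75, 0.50, -0.50))  # True
print(is_myopic_both_eyes(-0.25, -0.75, 0.50, -0.50))   # False
```

The example shows why the either-eye rule flags cases earlier: one eye has crossed the threshold while the other has not.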

The full dataset contains 15,162 visits from 7,937 patients. Only patients with at least two visits who were not myopic at their first visit were included, which reduced the dataset to 7,814 visits from 2,437 patients. Of these patients, 279 (11.4%) eventually developed myopia. The average follow-up time was 3.08 years.
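The inclusion criteria can be expressed as a filter over a visit table. This pandas sketch assumes hypothetical columns `patient_id`, `visit_date`, and a per-visit boolean `myopic` flag (derived with the either-eye rule); it is an illustration, not the study's code.

```python
import pandas as pd


def select_cohort(visits: pd.DataFrame) -> pd.DataFrame:
    """Keep patients with at least two visits who were not myopic at their first visit.

    Assumes hypothetical columns: patient_id, visit_date, myopic (bool).
    """
    v = visits.sort_values(["patient_id", "visit_date"])
    by_patient = v.groupby("patient_id")
    n_visits = by_patient["visit_date"].transform("count")
    myopic_at_first = by_patient["myopic"].transform("first").astype(bool)
    return v[(n_visits >= 2) & ~myopic_at_first]


def cohort_summary(cohort: pd.DataFrame) -> dict:
    """Count visits, patients, and patients who became myopic at any visit."""
    converted = cohort.groupby("patient_id")["myopic"].any()
    return {
        "visits": len(cohort),
        "patients": int(converted.size),
        "developed_myopia": int(converted.sum()),
    }


visits = pd.DataFrame({
    "patient_id": ["A", "A", "A", "B", "C", "C"],
    "visit_date": pd.to_datetime(["2015-01-01", "2016-01-01", "2017-01-01",
                                  "2015-05-01", "2014-01-01", "2015-01-01"]),
    "myopic": [False, False, True, False, True, True],
})
# Patient B has one visit and patient C is myopic at the first visit, so only A remains.
print(cohort_summary(select_cohort(visits)))
```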

Data Preparation Techniques

During data preparation, missing values in key fields (e.g., objective refraction sphere, cylinder, and angle) were imputed with the value from the patient's preceding visit, provided that visit had occurred within three months. Restricting imputation to recent visits kept records that would otherwise have been discarded while keeping the imputed measurements clinically plausible.
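A carry-forward imputation restricted to recent visits could look like the sketch below. It assumes hypothetical column names and interprets "within three months" as 90 days. Each gap is filled only from the immediately preceding visit's recorded value, so imputed values are not chained forward across several visits.

```python
import numpy as np
import pandas as pd


def impute_from_recent_visit(visits: pd.DataFrame, columns, max_gap_days: int = 90) -> pd.DataFrame:
    """Fill missing values in `columns` from the same patient's preceding visit,
    but only when that visit took place at most `max_gap_days` earlier.

    Assumes hypothetical columns patient_id and visit_date (datetime64).
    """
    v = visits.sort_values(["patient_id", "visit_date"]).copy()
    by_patient = v.groupby("patient_id")
    prev_date = by_patient["visit_date"].shift(1)
    prev_vals = by_patient[columns].shift(1)  # computed once, before any filling
    # NaT (no preceding visit) compares as False, so first visits are never filled.
    recent = (v["visit_date"] - prev_date) <= pd.Timedelta(days=max_gap_days)
    for col in columns:
        fill = v[col].isna() & recent & prev_vals[col].notna()
        v.loc[fill, col] = prev_vals.loc[fill, col]
    return v


visits = pd.DataFrame({
    "patient_id": ["P", "P", "P", "Q"],
    "visit_date": pd.to_datetime(["2020-01-01", "2020-02-15", "2020-09-01", "2020-01-01"]),
    "sphere": [-0.25, np.nan, np.nan, np.nan],
    "cylinder": [-0.50, np.nan, -0.75, np.nan],
})
# The 45-day gap is filled; the 199-day gap and Q's first visit stay missing.
print(impute_from_recent_visit(visits, ["sphere", "cylinder"]))
```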

Three binary target variables were derived for predictive modeling:

  1. Is myopia last: whether the patient showed a myopic refractive change at any later visit.
  2. Is myopia next: whether a new myopic refractive error was detected at the following visit.
  3. Is myopia within year: whether a myopic refractive change occurred within one year after the visit.

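These variables capture each patient's trajectory over time. For example, if a patient is not myopic at Visits 1 to 3 and is first diagnosed with myopia at Visit 4, then Is myopia last is 1 for Visits 1, 2, and 3; Is myopia next is 1 only for Visit 3; and Is myopia within year is 1 for whichever of Visits 1 to 3 took place within a year before Visit 4.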

Because the goal was to predict new cases, visits at which the patient was already myopic were excluded from the analysis.
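The three targets and the exclusion of already-myopic visits can be derived per patient as sketched here. The sketch assumes hypothetical columns `patient_id`, `visit_date`, and `myopic`, uses snake_case versions of the variable names, and takes "within a year" as 365 days. When a visit has no following visit, it sets the next-visit label to 0; a real pipeline might instead drop such visits from Model 2's data.

```python
import pandas as pd

TARGETS = ["is_myopia_last", "is_myopia_next", "is_myopia_within_year"]


def add_outcome_labels(visits: pd.DataFrame, within_days: int = 365) -> pd.DataFrame:
    """Add the three binary targets, then drop visits where the patient is already myopic.

    Assumes hypothetical columns: patient_id, visit_date (datetime64), myopic (bool).
    """
    v = visits.sort_values(["patient_id", "visit_date"]).reset_index(drop=True)
    labels = {}
    for _, g in v.groupby("patient_id", sort=False):
        idx = g.index.tolist()
        dates = g["visit_date"].tolist()
        myopic = g["myopic"].astype(bool).tolist()
        n = len(g)
        for i in range(n):
            later = range(i + 1, n)
            labels[idx[i]] = (
                # myopic at any later visit
                int(any(myopic[j] for j in later)),
                # myopic at the very next visit (0 if there is no next visit)
                int(i + 1 < n and myopic[i + 1]),
                # myopic at a later visit no more than `within_days` after this one
                int(any(myopic[j] and (dates[j] - dates[i]).days <= within_days
                        for j in later)),
            )
    label_df = pd.DataFrame.from_dict(labels, orient="index", columns=TARGETS)
    v = v.join(label_df)
    # Targets only make sense for visits at which myopia is not yet present.
    return v[~v["myopic"].astype(bool)].reset_index(drop=True)


# Patient A matches the Visit 1 to Visit 4 example; patient B never becomes myopic.
visits = pd.DataFrame({
    "patient_id": ["A"] * 4 + ["B"] * 2,
    "visit_date": pd.to_datetime(["2020-01-01", "2020-06-01", "2020-11-01",
                                  "2021-03-01", "2020-01-01", "2020-07-01"]),
    "myopic": [False, False, False, True, False, False],
})
print(add_outcome_labels(visits))
```

For patient A, Visit 1 is 425 days before Visit 4, so only Visits 2 and 3 receive a positive within-year label.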

Model Development Strategy

The research formulated three distinct prediction models for myopia:

  1. Model 1: Predicts, from a patient's first visit, whether the patient will develop myopia at a later visit. This model used 2,437 first visits, of which 279 (11.4%) belonged to patients who later developed myopia.

  2. Model 2: Predicts whether a patient without myopia will be diagnosed with myopia at the next visit. This model used 5,320 visits, of which 317 (6%) were followed by a myopia diagnosis at the next visit.

  3. Model 3: Predicts whether a patient without myopia will be diagnosed with myopia within one year. This model used 2,297 visits, of which 172 (5.7%) were followed by a diagnosis within that time frame.

Each model used predictors including age, gender, obstetric history, medication sensitivity, medical history, ophthalmological history, slit-lamp examination results, visual acuity, cycloplegic refraction, and the presence of amblyopia. Refractive data from both eyes were included deliberately: in empirical tests, using data from only one eye consistently lowered model performance.

Machine Learning Algorithms

To train the models, we used two tree-ensemble classification algorithms, Random Forest (RF) and Gradient Boosting Tree (GBT), both available in the scikit-learn Python library. Both are widely used for tabular classification tasks such as this one.

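The RF model builds a "forest" of decision trees, each trained on a random (bootstrap) sample of the data and considering a random subset of features at each split. The trees' predictions are then combined by voting; scikit-learn's implementation averages the class probabilities predicted by the individual trees, a soft form of majority voting. Averaging many decorrelated trees reduces variance and typically improves accuracy over any single tree.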
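The sketch below fits scikit-learn's `RandomForestClassifier` on synthetic, imbalanced data (a stand-in for the visit-level features; the hyperparameters are illustrative, not the study's) and checks that the forest's predicted probabilities equal the average of its individual trees' probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced stand-in for a visit-level feature matrix (~10% positives).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    max_features="sqrt",  # random subset of features considered at each split
    bootstrap=True,       # each tree sees a bootstrap sample of the rows
    random_state=0,
)
rf.fit(X, y)

# The forest's probability for each sample is the mean of its trees' probabilities.
per_tree = np.stack([tree.predict_proba(X) for tree in rf.estimators_])
forest_proba = rf.predict_proba(X)
print(np.allclose(per_tree.mean(axis=0), forest_proba))  # True
# The hard prediction is the class with the highest averaged probability.
print(np.array_equal(rf.classes_[forest_proba.argmax(axis=1)], rf.predict(X)))  # True
```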

The GBT model, by contrast, adds decision trees one at a time. Each new tree is fit to the negative gradient of the loss function with respect to the current ensemble's predictions, so each step performs a gradient-descent update that reduces the remaining training error. This sequential error correction lets the method capture complex, nonlinear relationships in the data.
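To make the gradient idea concrete, here is a minimal from-scratch gradient boosting loop for log-loss on synthetic data. It is a simplified illustration, not scikit-learn's algorithm: each regression tree is fit to the negative gradient `y - p` and its output is added with a fixed learning rate, whereas `GradientBoostingClassifier` additionally computes optimized leaf values. The library model is fitted at the end for comparison; all settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_gradient_boosting(X, y, n_stages=100, learning_rate=0.1, max_depth=3):
    """Minimal gradient boosting for binary log-loss (illustrative only)."""
    prior = y.mean()
    f0 = np.log(prior / (1.0 - prior))  # start from the log-odds of the base rate
    raw = np.full(len(y), f0)            # current ensemble output (log-odds)
    trees = []
    for _ in range(n_stages):
        # Negative gradient of log-loss w.r.t. the log-odds: y - predicted probability.
        residual = y - sigmoid(raw)
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        raw += learning_rate * tree.predict(X)  # small step along the fitted gradient
        trees.append(tree)
    return {"f0": f0, "trees": trees, "learning_rate": learning_rate}


def predict_proba_boosted(model, X):
    raw = model["f0"] + sum(model["learning_rate"] * t.predict(X) for t in model["trees"])
    return sigmoid(raw)


X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
model = fit_gradient_boosting(X, y)
baseline_loss = log_loss(y, np.full(len(y), y.mean()))
boosted_loss = log_loss(y, predict_proba_boosted(model, X))
print(f"training log-loss: base rate {baseline_loss:.3f} -> boosted {boosted_loss:.3f}")

# scikit-learn's implementation of the same idea, with optimized leaf values.
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=0).fit(X, y)
print(gbt.train_score_[0], gbt.train_score_[-1])  # training loss, first vs last stage
```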

Both models also provide feature-importance scores, indicating which variables contribute most to the predicted risk of developing myopia. These scores help interpret the factors driving the predictions.
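As an example of reading feature importances, the sketch below trains a random forest on synthetic data in which only one hypothetical feature, `baseline_se`, determines the outcome, and ranks features by scikit-learn's impurity-based `feature_importances_`. Which importance measure the study used is not stated; permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative that is not biased toward high-cardinality features the way impurity-based scores can be.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features; by construction only baseline_se drives the outcome.
X = pd.DataFrame({
    "age": rng.uniform(4, 16, n),
    "baseline_se": rng.normal(0.5, 1.0, n),
    "noise": rng.normal(0.0, 1.0, n),
})
y = ((X["baseline_se"] + rng.normal(0.0, 0.3, n)) < 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances: mean decrease in Gini impurity, normalized to sum to 1.
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```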

Addressing Class Imbalance

Because the datasets are imbalanced (myopia occurs far less often than non-myopia), we applied up-sampling techniques such as the Synthetic Minority Over-sampling Technique (SMOTE). Rather than duplicating existing records, SMOTE creates new synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours, giving the models more minority examples to learn from.
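The interpolation step at the heart of SMOTE fits in a few lines of NumPy. This is a minimal sketch of the technique for illustration; in practice a library implementation such as imbalanced-learn's `SMOTE` class (via `fit_resample`) is typically used, and which implementation the study relied on is not stated. The `balance_with_smote` helper is hypothetical. When combined with cross-validation, oversampling is normally applied to the training folds only, so synthetic points are never derived from test data.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Euclidean distances between all pairs of minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)               # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest minority neighbours
    base = rng.integers(0, n, size=n_new)        # random minority "seed" samples
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def balance_with_smote(X, y, minority_label=1, k=5, seed=None):
    """Return (X, y) with synthetic minority rows appended until the classes are equal."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    X_syn = smote(X_min, n_new, k=k, seed=seed)
    y_syn = np.full(n_new, minority_label, dtype=y.dtype)
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = balance_with_smote(X, y, seed=0)
print(np.bincount(y_bal))  # [90 90]
```

Each synthetic point lies on the line segment between two real minority samples, which is what distinguishes SMOTE from simple duplication.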

Evaluation Metrics

To evaluate model performance, we used standard classification metrics: accuracy, sensitivity (recall), specificity, precision, F1-score, and area under the ROC curve (AUC). Performance was estimated with 10-fold cross-validation in which all visits from a given patient were kept entirely within either the training set or the test set, preventing data leakage between them.

Together, these methods lay a foundation for predicting the onset of myopia in children from routine eye-exam data, with the goal of supporting earlier intervention.
