Machine learning model for predicting out-of-hospital cardiac arrests using meteorological and chronological data

Objectives To evaluate a predictive model for robust estimation of daily out-of-hospital cardiac arrest (OHCA) incidence using a suite of machine learning (ML) approaches and high-resolution meteorological and chronological data. Methods In this population-based study, we combined an OHCA nationwide registry and high-resolution meteorological and chronological datasets from Japan. We developed a model to predict daily OHCA incidence with a training dataset for 2005–2013 using the eXtreme Gradient Boosting algorithm. A dataset for 2014–2015 was used to test the predictive model. The main outcome was the accuracy of the predictive model for the number of daily OHCA events, based on mean absolute error (MAE) and mean absolute percentage error (MAPE). In general, a model with MAPE less than 10% is considered highly accurate. Results Among the 1 299 784 OHCA cases, 661 052 OHCA cases of cardiac origin (525 374 cases in the training dataset on which fourfold cross-validation was performed and 135 678 cases in the testing dataset) were included in the analysis. Compared with the ML models using meteorological or chronological variables alone, the ML model with combined meteorological and chronological variables had the highest predictive accuracy in the training (MAE 1.314 and MAPE 7.007%) and testing datasets (MAE 1.547 and MAPE 7.788%). Sunday, Monday, holiday, winter, low ambient temperature and large interday or intraday temperature difference were more strongly associated with OHCA incidence than the other meteorological and chronological variables. Conclusions An ML predictive model using comprehensive daily meteorological and chronological data allows for highly precise estimates of OHCA incidence.


Out-of-hospital cardiac arrest (OHCA) is becoming a substantial worldwide health burden. 1 2 A systematic review of the international epidemiology of OHCA from 1991 to 2007 reported that the estimated incidence of emergency medical services (EMS)-attended OHCA per 100 000 person-years was 86.4 in Europe, 98.1 in North America, 52.5 in Asia and 112.9 in Australia. The percentage of patients with survival to discharge is extremely low: 9.4% in Europe, 6.3% in North America, 2.2% in Asia and 10.7% in Australia. 2 Accurately predicting the daily incidence of OHCA may provide a significant public benefit. Since the incidence of OHCA is affected by meteorological conditions, 3-10 the application of high-resolution meteorological data to medicine might provide ways to improve predictions of the daily incidence of OHCA.
Machine learning (ML) has recently emerged as a novel approach to integrate multiple quantitative variables to improve diagnosis and accuracy of incidence predictions in cardiovascular medicine. 11-13 Since meteorological data are very extensive and complex, ML can help identify associations not identified by conventional one-dimensional statistical approaches. By combining OHCA data with high-resolution meteorological data, such as daily forecasts, ML could use advanced analytics to build a warning system for individuals potentially at risk for OHCA of cardiac origin through internet of things (IoT) devices.
This study presents a predictive model for robust estimation of daily OHCA incidence of cardiac origin using a suite of ML approaches. This model was evaluated using a nationwide database of OHCA, as well as comprehensive meteorological data and chronological data.

Study design and setting
We matched two datasets between 1 January 2005 and 31 December 2015 at the hour level based on the time of the emergency call: the All-Japan Utstein Registry of the Fire and Disaster Management Agency (FDMA) dataset on patients with OHCA of cardiac origin and a meteorological dataset from the Weather Company, an IBM Business (Atlanta, Georgia, USA). We classified data from 1 January 2005 to 31 December 2013 in this merged dataset as the training dataset for developing the predictive models and data from 1 January 2014 to 31 December 2015 as the testing dataset for assessing whether the predictive models can work in other years. Japan has an area of approximately 378 000 km² and its population was approximately 127 million in 2005. 14 Online supplemental figure 1 shows three representative cities located at different latitudes in Japan: Sapporo at N43°, Kobe at N34° and Naha at N26°.
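The hour-level matching and the calendar-based split described above can be illustrated with a minimal sketch. The period boundaries are taken from the text; the function names (`hour_key`, `assign_split`) are hypothetical and are not part of the registry or of the Weather Company data packages.

```python
from datetime import date, datetime

# Period boundaries as stated in the text.
TRAIN_PERIOD = (date(2005, 1, 1), date(2013, 12, 31))
TEST_PERIOD = (date(2014, 1, 1), date(2015, 12, 31))


def hour_key(call_time: datetime) -> datetime:
    """Truncate an emergency-call timestamp to the hour (hypothetical join key)."""
    return call_time.replace(minute=0, second=0, microsecond=0)


def assign_split(day: date) -> str:
    """Label a calendar day as 'training', 'testing' or 'excluded'."""
    if TRAIN_PERIOD[0] <= day <= TRAIN_PERIOD[1]:
        return "training"
    if TEST_PERIOD[0] <= day <= TEST_PERIOD[1]:
        return "testing"
    return "excluded"


# Example: a call at 23:59 on 31 December 2013 still belongs to the training period.
call = datetime(2013, 12, 31, 23, 59, 30)
key = hour_key(call)
split = assign_split(key.date())
```

Splitting by calendar period rather than at random keeps the testing years entirely unseen during model development, which is what allows the testing dataset to indicate whether the models work in other years.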
We performed a four-step analysis. First, we elucidated the association between the incidence of OHCA and daily meteorological and chronological variables. Second, we developed ML predictive models for OHCA incidence based on daily meteorological data, chronological data and combined meteorological and chronological data from the training dataset. 15 16 Third, we examined the concordance between the predicted incidence of OHCA based on the ML model and the observed incidence of OHCA in a testing dataset. To further examine concordance at the district level after the time period covered by the original dataset, we performed heatmap analysis using another dataset on the location of OHCA in Kobe city between 1 January 2016 and 31 December 2018. The Kobe Municipal Fire Department has detailed information about where OHCA events occurred in certain districts. The population of Kobe city is more than 1.5 million. Its age distribution (population pyramid) is similar to that of Japan overall. Finally, we investigated the relative strength of the associations between meteorological variables and the incidence of OHCA in each predictive model. The main outcome was the daily incidence of OHCA.
A subcommittee for resuscitation science in the Japanese Circulation Society was provided with registry data following the prescribed governmental procedures.

OHCA dataset
Patients with OHCA of cardiac origin in the FDMA's All-Japan Utstein Registry were included. The All-Japan Utstein Registry is a prospective, population-based, nationwide registry of patients who have had an OHCA event. Data were prospectively recorded using the internationally standardised Utstein template. 17 The following patient information was collected and analysed: age, sex, aetiology of arrest (ie, cardiac or non-cardiac) and time of the emergency call. All event times were synchronised with the dispatch centre clock. In Japan, all patients with OHCA who received prehospital resuscitation efforts by EMS personnel are transported to a hospital because they are not permitted to terminate resuscitation in the field. Data were stored on the FDMA registry database server and checked for missing values. If a data form was incomplete, the FDMA returned it to the respective fire station for completion. The registry has yielded some findings about patients with OHCA. 14 18-21 Details of the registry are described in the online supplemental materials.

Meteorological and chronological dataset
We analysed meteorological data from the Weather Company (https://www.ibm.com/weather), which operates a weather forecasting service platform (online supplemental materials). Between 1 January 2005 and 31 December 2015, the resolution of the meteorological data was 30 km gridded points (Weather Company Data Packages). In 2016, the resolution improved to 4 km gridded points. Meteorological variables included ambient temperature (°C), relative humidity (%), precipitation during the previous hour (mm), snowfall (mm), cloud coverage (%), wind speed (kph) and atmospheric pressure (hPa). Chronological variables included year (2005 was considered year 0), season (spring: March-May; summer: June-August; autumn: September-November; winter: December-February, with winter coded as the reference value), day of the week (with Sunday coded as the reference value), holidays and Japanese holiday season from 29 December to 6 January (categorical variable with a value of 0 or 1).
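As an illustration of how the chronological variables listed above could be derived from a calendar date, the following sketch encodes year, season, day of the week, holiday and holiday season. The function name, the dictionary keys and the `public_holidays` argument are hypothetical; the list of Japanese public holidays is not given in the text and must be supplied by the caller.

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def season_of(month: int) -> str:
    """Map a month to the season definitions given in the text."""
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "autumn"
    return "winter"  # December-February


def chronological_features(day: date, public_holidays=frozenset()) -> dict:
    """Build the chronological variables for one day (illustrative encoding)."""
    in_holiday_season = (day.month == 12 and day.day >= 29) or \
                        (day.month == 1 and day.day <= 6)
    return {
        "year": day.year - 2005,              # 2005 is year 0
        "season": season_of(day.month),
        "day_of_week": DAY_NAMES[day.weekday()],
        "holiday": int(day in public_holidays),
        "holiday_season": int(in_holiday_season),
    }


features = chronological_features(date(2018, 1, 28))
```

Reference levels (winter for season, Sunday for day of the week) matter for the regression models; a tree-based learner would typically receive these categories as indicator columns instead.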

Development of predictive models
To develop predictive models for the daily incidence of OHCA, we used the eXtreme Gradient Boosting (XGBoost) algorithm, which is an optimised distributed gradient boosting library widely used by data scientists for many ML challenges. 11-13 Hyperparameters of the XGBoost algorithm were chosen to maximise the predictive ability of the model using fourfold cross-validation. In fourfold cross-validation, we classified our dataset into four groups, and the XGBoost algorithm fitted decision trees to three groups and used the remaining group for validation. This procedure was performed four times with a different validation group each time. Population size for each prefecture was included in the XGBoost algorithm as an offset term. We did not use ambient air pressure to avoid possible multicollinearity.
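The fourfold cross-validation procedure can be sketched independently of the learner. The example below is a minimal illustration that assumes the four groups are formed by random assignment (the text does not state how they were formed) and uses a trivial mean predictor as a stand-in for XGBoost; `k_fold_indices`, `cross_validated_mae`, `fit_mean` and `predict_mean` are hypothetical names.

```python
import random


def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # assumption: random grouping
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, folds[held_out]


def cross_validated_mae(X, y, fit, predict, k=4, seed=0):
    """Average validation MAE over k folds for any fit/predict pair."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in val_idx])
        abs_err = [abs(p - y[i]) for p, i in zip(preds, val_idx)]
        fold_errors.append(sum(abs_err) / len(abs_err))
    return sum(fold_errors) / k


# Stand-in learner: predicts the training mean regardless of the features.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, X_val):
    return [model for _ in X_val]


X_demo = [[t] for t in range(100)]
y_demo = [5.0] * 100
score = cross_validated_mae(X_demo, y_demo, fit_mean, predict_mean)
```

In hyperparameter selection, a score of this kind would be computed for each candidate configuration and the configuration with the lowest validation error retained.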

Statistical analysis
The characteristics of the present dataset were summarised with medians and IQRs for continuous variables and numbers and percentages for categorical variables by prefecture and day in the training and testing datasets. We used generalised linear models (GLMs) based on the Poisson distribution to investigate the associations between meteorological variables and daily OHCA incidence by prefecture using all data in univariable models and a multivariable model. We exponentiated regression coefficients and 95% CIs to present incidence rate ratios (IRRs) for estimated OHCA incidence with each 1-unit increase in a meteorological variable. A p value of less than 0.05 was considered to indicate a significant difference.
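The conversion from a Poisson regression coefficient to an IRR with a 95% CI can be sketched as follows. The sketch assumes a Wald-type interval computed on the log scale; the coefficient and standard error in the example are hypothetical values chosen for illustration, not results from this study.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def incidence_rate_ratio(beta, standard_error, z=Z_95):
    """Return (IRR, lower bound, upper bound) from a log-scale coefficient."""
    irr = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return irr, lower, upper


# Hypothetical coefficient for a 1-unit increase in a meteorological variable.
irr, lower, upper = incidence_rate_ratio(beta=-0.02, standard_error=0.005)
```

An IRR below 1 indicates a lower expected daily incidence per 1-unit increase in the variable, and an interval that excludes 1 corresponds to p<0.05 under this approximation.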
We evaluated the accuracy of the predictive models based on mean absolute error (MAE) and mean absolute percentage error (MAPE) between the values predicted by the models and the observed daily OHCA incidence by prefecture. MAE reflects the average magnitude of differences between predicted values and observed values. MAE ranges from zero to infinity. Lower MAE values indicate higher predictive performance. MAPE is generally used as a measure of the predictive accuracy of a forecasting method. It is the average of the absolute values of errors divided by observed values, expressed as a percentage. MAPE has a lower bound of 0% and, because a prediction can exceed twice the observed value, no fixed upper bound. Lower MAPE values indicate higher model predictive performance. In general, a MAPE of less than 10% is considered to indicate highly accurate prediction. 22 We also calculated correlation coefficients between observed and predicted values, which can range from −1.00 to 1.00. Higher absolute values indicate higher model predictive performance.
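The three accuracy measures can be written out directly. The sketch below is illustrative only: the function names are hypothetical, the correlation is assumed to be Pearson's coefficient, and the toy numbers are not study data. MAPE is undefined when an observed value is zero, so the function raises an error in that case instead of guessing at a convention.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average magnitude of the differences between predictions and observations."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_percentage_error(observed, predicted):
    """Average of |error| / observed, expressed as a percentage."""
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    ratios = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Toy daily counts (illustrative only).
obs = [10, 20, 40]
pred = [12, 18, 40]
mae = mean_absolute_error(obs, pred)
mape = mean_absolute_percentage_error(obs, pred)
r = pearson_r(obs, pred)
```

For these toy values, the absolute errors are 2, 2 and 0, so MAE is 4/3, and the percentage errors are 20%, 10% and 0%, so MAPE is 10%.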
We investigated the relative strength of the associations between each meteorological variable and OHCA incidence in the ML predictive model using a Shapley Additive Explanations (SHAP) algorithm. 23 For a given set of feature values, a SHAP value reflects how much a single variable, in the context of its interaction with other variables, contributes to the difference between the actual prediction and the mean prediction.
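To make the idea of a SHAP value concrete, the following sketch computes exact Shapley values for a small toy function by enumerating all feature subsets. It is a simplification under stated assumptions: features absent from a subset are replaced by a single baseline vector instead of being averaged over a background dataset, and the toy model is hypothetical. It is not the algorithm used by the SHAP library for tree ensembles, which computes these quantities far more efficiently.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline vector."""
    n = len(x)

    def value(subset):
        # Features in the subset take their actual value, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions


# Hypothetical toy model with one additive term and one interaction term.
def toy_model(v):
    return 2.0 * v[0] + v[1] * v[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is shared equally between the two interacting features, which is how a variable's contribution reflects its interaction with other variables.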

Association between meteorological and chronological variables and OHCA incidence
Online supplemental figure 2 shows the incidence of OHCA by each meteorological variable. The association between OHCA incidence of cardiac origin and mean ambient temperature was U-shaped, meaning that the incidence of OHCA was lowest at approximately 25°C and higher at temperatures above and below.
Exponentiated regression coefficients (ie, IRRs) of the GLM are shown in table 2. In univariable models, conventional ambient temperature, relative humidity, precipitation during the previous hour, snowfall, cloud coverage and wind speed were each significantly associated with OHCA incidence (p<0.05 for each). In the multivariable model, similar statistical associations were observed, except for mean precipitation during the previous hour, mean snowfall and mean wind speed for each day.

Predictive accuracy of the models
Predicted and observed OHCA incidence of cardiac origin are plotted for each model in figure 1. Initially, we developed separate predictive models based on comprehensive meteorological variables and on chronological variables using ML. The predicted values fitted the observed values well. ML predictive models with comprehensive meteorological variables were able to predict the daily change in OHCA incidence but were not able to predict a large increase in OHCA incidence accurately during the winter (figure 1A). ML predictive models with chronological variables were able to predict a large increase in OHCA incidence during the winter (figure 1B). By combining meteorological and chronological variables in a single ML predictive model, the concordance of the predicted values and the observed values improved (figure 1C).
Predictive accuracy of the predictive models is shown in table 3. Among all predictive models, the predictive model with combined meteorological and chronological variables had the highest predictive accuracy in the training (MAE 1.314 and MAPE 7.007%) and testing datasets (MAE 1.547 and MAPE 7.788%). The predictive model with combined meteorological and chronological variables also had the highest correlations between observed and predicted values in the training (r=0.880, 95% CI 0.880 to 0.880) and testing datasets (r=0.870, 95% CI 0.860 to 0.870) (online supplemental figure 3).
Moreover, using the predictive model with combined meteorological and chronological variables, we predicted the incidence of OHCA at a district level in Kobe city during a 1-week period after 2016. Figure 2 shows the heatmap of observed versus predicted numbers of OHCA events between 28 January and 3 February 2018. During this week, 24 OHCA events were predicted for Kobe city, while 27 OHCA events were observed. The heatmap showed that zero to four OHCA events occurred in each district during this week. Among nine districts, the predicted OHCA incidence matched the observed OHCA incidence in four districts (districts A, B, E and G). One fewer OHCA event was predicted than observed in three districts (districts C, F and I).

Healthcare delivery, economics and global health

Predictive importance
The predictive importance of meteorological and chronological variables in the ML predictive model is shown in figure 3. With regard to meteorological variables, lower mean ambient temperature within a day was the most strongly associated with the incidence of OHCA. In addition, larger difference in mean ambient temperature from the previous day and larger difference between maximum and minimum ambient temperatures within a day were also more strongly associated with the incidence of OHCA than other variables. Among chronological variables, more recent year, winter, Sunday, Monday and holiday were more strongly associated with the incidence of OHCA.
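The temperature-derived variables discussed above (mean ambient temperature within a day, difference in mean temperature from the previous day, and difference between the maximum and minimum temperatures within a day) can be computed from hourly readings as in the sketch below. The function and key names are hypothetical, and the use of a signed interday difference is an assumption, since the text does not state whether the signed or absolute difference was used.

```python
def daily_temperature_features(hourly_by_day):
    """Derive per-day temperature variables from hourly readings.

    hourly_by_day: list of lists of hourly temperatures (°C), one inner
    list per day, in chronological order.
    """
    features = []
    previous_mean = None
    for temps in hourly_by_day:
        mean_temp = sum(temps) / len(temps)
        features.append({
            "mean": mean_temp,
            # Intraday difference: maximum minus minimum within the day.
            "intraday_range": max(temps) - min(temps),
            # Interday difference: undefined for the first day in the series.
            "interday_diff": None if previous_mean is None
            else mean_temp - previous_mean,
        })
        previous_mean = mean_temp
    return features


# Two toy days with three readings each (illustrative values only).
demo = daily_temperature_features([[0.0, 2.0, 4.0], [5.0, 5.0, 8.0]])
```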

DISCUSSION
In this study, using the predictive model developed with combined meteorological and chronological variables, we succeeded in predicting OHCA incidence of cardiac origin with high precision. Our study is the first to predict daily OHCA incidence based on both meteorological and chronological variables using ML. Previous studies that investigated meteorological variables associated with the incidence of cardiovascular events used ambient temperature alone or seasons 8-10 24 and were limited to one city or region. 8 9 Thus, these studies did not take diversity in geography and climate into account. Indeed, Japan is located in a temperate zone with four distinct seasons and its climate varies from cool temperate in the north to subtropical in the south. Latitude ranges from N45° to N20°. In this study, we used ML to process complex data that included a nationwide registry of OHCA events and a comprehensive meteorological dataset. If climate change becomes more intense, the relationship between OHCA incidence and comprehensive meteorological data may become all the more crucial.

Meaning of study
Our predictive model for the daily incidence of OHCA had high predictive accuracy. In particular, a larger difference in mean ambient temperature from the previous day and a larger range in ambient temperature within a day, in addition to mean ambient temperature lower or higher than 25°C within a day (a U-shaped distribution), were associated with OHCA incidence of cardiac origin. We speculated that a sudden change in ambient temperature on days with extreme cold or heat plays a key role in increasing the risk of OHCA of cardiac origin; this might be related to increased sympathetic tone and blood viscosity. 25 26 However, in this study, we could not investigate how the location of OHCA affects the association between meteorological condition and the incidence of OHCA because detailed information about whether OHCA events occurred indoors or outdoors was not available. If this information is available in future research, one prevention method might be to advise individuals to stay home using an IoT device warning system on high-risk days. We found important chronological variables that affect the incidence of OHCA such as season, day of the week and holiday. Combining meteorological and chronological variables further improved the predictive accuracy of the ML predictive model. Importantly, at the local level, a heatmap showed that predicted OHCA incidence based on the ML predictive model fitted observed OHCA incidence well. Although our model was developed based on meteorological data with a resolution of 30 km gridded points, it could be applied even at a district level within one city. The model could be more practically useful if it could be further improved to predict OHCA incidence within a medical catchment area.

Implications
One advantage of using meteorological data to make predictions of OHCA incidence is that weather forecasts can predict meteorological conditions 2 weeks ahead. Our predictive model for daily incidence of OHCA is widely generalisable for the general population in developed countries because this study had a large sample size and used comprehensive meteorological data. Many developed countries are located in a similar latitude range as Japan. The methods developed in this study serve as an example of a new model for predictive analytics that could be applied to other clinical outcomes of interest related to life-threatening acute cardiovascular disease. It could also provide more opportunities to support self-management in high-risk individuals through IoT devices. 27 28 Moreover, we expect to use our predictive model to provide warnings to EMS personnel, in addition to citizens, on high-risk days. As a result, it may lead to shorter transport time from onset to hospital arrival and rapid start of advanced resuscitation care after hospital arrival. Future research should prospectively evaluate the effectiveness of this approach and whether it translates into improved clinical outcomes.

Strengths and limitations
Our study has several strengths. First, the All-Japan Utstein Registry included all patients with OHCA who received prehospital resuscitation efforts by EMS personnel because they are not permitted to terminate resuscitation in the field. Moreover, uniform data collection, a large sample size and a population-based design covering all known OHCA events in Japan minimise potential sources of bias. These features contribute to the representativeness of the present predictive models.
This study has several inherent limitations. First, we did not have detailed information about where OHCA events occurred in various districts except in Kobe city; information was generally only available on the prefecture level. Second, our data did not address the potential variability in patients' preexisting medical conditions. Third, the predictability of future OHCA events will depend on the accuracy of meteorological data. Finally, external testing in other developed countries was not performed.

CONCLUSION
An ML predictive model using combined multiple meteorological and chronological variables could predict OHCA incidence of cardiac origin with high precision. Furthermore, this predictive model may be useful for preventing OHCA and improving the prognosis of patients with OHCA via a warning system for citizens and EMS on high-risk days in the future.