
CC BY-NC Open access
Research

Use of electronic medical records in development and validation of risk prediction models of hospital readmission: systematic review

BMJ 2020; 369 doi: https://doi.org/10.1136/bmj.m958 (Published 08 April 2020) Cite this as: BMJ 2020;369:m958
  1. Elham Mahmoudi, assistant professor1 2,
  2. Neil Kamdar, statistician expert2 3 4 5,
  3. Noa Kim, research informatics project manager1,
  4. Gabriella Gonzales, undergraduate student, research assistant6 1,
  5. Karandeep Singh, assistant professor7 10,
  6. Akbar K Waljee, associate professor, co-director, staff physician and researcher8 9 11
  1. 1Department of Family Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
  2. 2Institute for Healthcare Policy and Innovation, University of Michigan Medical School, Ann Arbor, MI, USA
  3. 3Department of Obstetrics and Gynecology, University of Michigan Medical School, Ann Arbor, MI, USA
  4. 4Department of Surgery, University of Michigan Medical School, Ann Arbor, MI, USA
  5. 5Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
  6. 6Undergraduate Research Opportunity Program, University of Michigan, Ann Arbor, MI, USA
  7. 7Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA
  8. 8Division of Gastroenterology and Hepatology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  9. 9Michigan Integrated Center for Health Analytics and Medical Prediction (MiCHAMP), University of Michigan, Ann Arbor, MI, USA
  10. 10Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
  11. 11Veterans Affairs Center for Clinical Management Research, Ann Arbor, MI, USA
  1. Correspondence to: E Mahmoudi Mahmoudi@med.umich.edu (or @Mahmoudi_E on Twitter)
  • Accepted 27 February 2020

Abstract

Objective To provide a focused evaluation of predictive modeling of electronic medical record (EMR) data to predict 30 day hospital readmission.

Design Systematic review.

Data source Ovid Medline, Ovid Embase, CINAHL, Web of Science, and Scopus from January 2015 to January 2019.

Eligibility criteria for selecting studies All studies of predictive models for 28 day or 30 day hospital readmission that used EMR data.

Outcome measures Characteristics of included studies, methods of prediction, predictive features, and performance of predictive models.

Results Of 4442 citations reviewed, 41 studies met the inclusion criteria. Seventeen models predicted risk of readmission for all patients and 24 developed predictions for patient specific populations, with 13 of those being developed for patients with heart conditions. Except for two studies from the UK and Israel, all were from the US. The total sample size for each model ranged between 349 and 1 195 640. Twenty five models used a split sample validation technique. Seventeen of 41 studies reported C statistics of 0.75 or greater. Fifteen models used calibration techniques to further refine the model. Using EMR data enabled final predictive models to use a wide variety of clinical measures such as laboratory results and vital signs; however, use of socioeconomic features or functional status was rare. Using natural language processing, three models were able to extract relevant psychosocial features, which substantially improved their predictions. Twenty six studies used logistic or Cox regression models, and the rest used machine learning methods. No statistically significant difference (difference 0.03, 95% confidence interval −0.0 to 0.07) was found between average C statistics of models developed using regression methods (0.71, 0.68 to 0.73) and machine learning (0.74, 0.71 to 0.77).

Conclusions On average, prediction models using EMR data have better predictive performance than those using administrative data. However, this improvement remains modest. Most of the studies examined lacked inclusion of socioeconomic features, failed to calibrate the models, neglected to conduct rigorous diagnostic testing, and did not discuss clinical impact.

Introduction

Hospitals across the US continue to be under scrutiny to reduce their 30 day readmission rates (hereafter, readmission), as a measure of both hospital quality and cost reduction. The Hospital Readmissions Reduction Program is a Medicare value based program that, since October 2012, has reduced payments to hospitals with excess readmissions.1 Between 2007 and 2015, readmission rates for specific conditions dropped from 21.5% to 17.5%.2 This has been largely attributed to investments by hospitals to enhance their discharge processes,2 which include providing better medication reconciliation, educating patients and their care givers about continuity of care, and implementing follow-up processes for discharged patients. However, implementing an effective discharge process in hospitals is time consuming and expensive. The development of readmission risk tools has increased sharply in recent years to enable precise identification of patients at high risk and to inform more efficient use of post-discharge care coordination. However, because of the complexity of inpatient care and discharge processes, achieving high sensitivity and specificity in predicting who is at risk of readmission, and why, is still a work in progress.

The accuracy and reliability of risk models largely depend on the predictors and the methods of development, validation, calibration, and clinical utility.3 In the context of choosing an appropriate set of predictors, administrative data are inherently limited, primarily owing to the lack of clinical specificity for conditions and laboratory results. With recent multibillion dollar investments in electronic medical records (EMRs) and their increasing use and application in healthcare systems,4 the use of machine learning methods in medicine has also expanded. Thus, the past few years have seen a surge in the development of highly sophisticated predictive models using EMRs. Two previously published systematic reviews of predictive models of readmission—regardless of the data source used or whether the model was validated—assessed predictive models up to 2015.5 6 Gaps exist in the knowledge about predictive models of readmission that leverage the use of EMRs and new methods of prediction.

This study focuses on validated predictive models of readmission that specifically use EMR data. We adopted the systematic review guide for evaluation of prediction model performance.7 The objectives of this study were to evaluate the variation in predicting readmission for all patients versus patient specific populations, to examine the properties of the EMR based candidate features, to assess differences in performance between traditional regression and machine learning models, and to assess the quality of the studies.

Methods

Information sources and search

We searched Ovid Medline, Ovid Embase, CINAHL, Web of Science, and Scopus by using an inclusive combination of exploded MeSH subject headings, keywords, and title, abstract, and full text keywords, with and without adjacencies when available, with a publication date range of 1 January 2015 to 1 January 2019. The last electronic database search took place in April 2019. We imported all citations into electronic citation management software (EndNote X9). Supplementary tables A-C provide detailed information on inclusion and exclusion criteria and on our search strategy.

Eligibility criteria

Studies eligible for inclusion were peer reviewed and published between 1 January 2015 and 1 January 2019. We included only studies that developed and validated a predictive model of hospital readmission within 28 or 30 days after initial discharge. We excluded studies that did not use EMR data in the development or validation of the model, studies published before 2015 owing to overlap with previous reviews,5 6 studies not published in English, and conference abstract only references (supplementary table A). We did not do an extensive hand search for this systematic review.

Study selection

After de-duplication, two authors (EM and GG) screened our initial 3506 citations for title and abstract relevance. We excluded 3206 records and assessed the 300 resulting citations in their full text form. Two authors evaluated each article independently by using the inclusion and exclusion criteria shown in figure 1. Discrepancies between reviewers were resolved through additional review during group discussions.

Fig 1

Schematic flow diagram of selected studies

Data extraction

Two authors (EM and GG) extracted data from the final included studies to profile each model’s population (table 1 and table 2), candidate features (supplementary table D), model description (supplementary table E), and quality assessment (supplementary table F). To ease the cross linkage between tables 1 and 2 and the supplements, we organized all supplementary tables similarly. Firstly, we separated the included studies into two general categories: all patient populations and specific patient populations. We then listed studies in each group alphabetically according to lead author’s last name.

Table 1

Characteristics of patients and hospitals in studies that included all patient populations

Table 2

Characteristics of patients and hospitals in studies that included specific patient populations

Data synthesis

The wide heterogeneity of the included models did not permit a quantitative meta-analysis of their performance; however, we provide a qualitative review and synthesis of the populations studied and model characteristics. To analyze the differences between studies that used machine learning methods and those that used traditional regression, or between models developed for all patient populations and those developed for specific populations, we weighted every study equally regardless of the number of patients or methods used. If more than one model was used in a study, we chose the maximum C statistic for that study. We report validation C statistics in this review; when a study was ambiguous about whether the C statistic came from the development or the validation dataset, we assumed it was reported from the validation cohort. Finally, we calculated 95% confidence intervals for the C statistics of the different study groups, as well as for the difference in mean C statistics between groups, to ascertain potentially significant differences in concordance.
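
To make this synthesis step concrete, the sketch below computes a group's mean C statistic with a normal approximation 95% confidence interval, and the interval for the difference in group means. The input lists are placeholders rather than the extracted study values, and because the exact variance formula used is not reported, a Welch style standard error is assumed.

```python
import numpy as np
from scipy import stats

def mean_ci(c_stats, alpha=0.05):
    """Mean C statistic with a normal-approximation confidence interval."""
    c = np.asarray(c_stats, dtype=float)
    m = c.mean()
    se = c.std(ddof=1) / np.sqrt(len(c))
    z = stats.norm.ppf(1 - alpha / 2)
    return m, m - z * se, m + z * se

def diff_ci(group_a, group_b, alpha=0.05):
    """CI for the difference in mean C statistics (Welch standard error)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    d = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    z = stats.norm.ppf(1 - alpha / 2)
    return d, d - z * se, d + z * se

machine_learning = [0.78, 0.66, 0.74, 0.71, 0.82]  # placeholder C statistics
regression = [0.70, 0.73, 0.65, 0.72, 0.74]        # placeholder C statistics

print(mean_ci(machine_learning))             # mean, lower, upper
print(diff_ci(machine_learning, regression)) # difference, lower, upper
```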

Results

From 3506 titles and abstracts (after removing 937 duplicates), we selected 300 articles for complete text review. Our final set included 41 studies that met our inclusion criteria (fig 1). We divided these studies on the basis of their population cohort into all patient populations (n=17, including one intensive care unit and one emergency department readmission) and patient specific populations (n=24). Most patient specific models were for heart conditions (n=13).24 25 26 27 30 32 34 37 38 43 45 46 47 The remainder were based on readmission among patients with diabetes (n=4),28 33 40 41 kidney transplantation (n=1),44 hemodialysis (n=1),29 low back surgery (n=1),36 pneumonia (n=2),31 35 lupus (n=1),39 and psychiatric conditions (n=1).42 Thirty nine studies were based on data from US hospitals, and two were from other developed countries (the UK11 and Israel20).

The total sample size in each model ranged from 349 to 1 195 640.21 29 All validations were done internally; most were conducted through retrospective validation (n=37) and used split sample (n=24) or cross validation (n=11) methods. The C statistics ranged between 0.52 and 0.90,23 24 with 17 studies reporting a C statistic of 0.75 or greater.11 12 14 15 16 17 19 23 29 33 34 36 37 42 43 46 47

Characteristics of patients and hospitals

Table 1 and table 2 show characteristics of patient populations and hospitals. Seventeen studies developed predictive models of readmission for all patient populations (table 1). Most studies included adults 18 years or older (n=9). Twelve studies used data from multiple hospitals. Included centers were non-academic (n=9),8 9 11 12 13 14 15 22 24 academic (n=4),10 16 19 23 or a combination of both (n=3).18 20 21 Observed readmission rates for these sets of models were between 6% and 23%.9 24

Twenty four studies developed predictive models of readmission among specific patient populations (table 2). Most (13/24 studies) of these models were developed for patients admitted with heart conditions.24 25 26 27 30 32 34 37 38 43 45 46 47 All patient specific studies included adults 18 years or older. Ten studies used data from multiple hospitals.25 26 28 30 32 34 35 37 39 47 Data came from academic centers (n=9),24 29 31 33 36 40 41 42 44 non-academic centers (n=3),27 38 43 or a combination of both (n=10).25 26 28 30 32 35 37 39 47 Observed readmission rates for patient specific readmissions ranged between 5.9% and 54%.36 38

Candidate features and predictors

Supplementary table D summarizes the features used in the predictive models. We categorized the features into five groups: clinical data, demographics, healthcare encounter history, functional status, and socioeconomic status. Using EMR data, detailed clinical and healthcare encounter data such as admission type and discharge location, primary and additional diagnoses, morbidities, laboratory results, vital signs, type and number of drugs, and basic demographics such as age, sex, race and ethnicity, and insurance type were readily available and thus examined in most of the predictive models. Additionally, Escobar et al and Morris et al used length of operating room stay in hours as a proxy for complexity of the surgical procedure if the inpatient hospital stay included any surgical procedure.12 17 Being admitted to an intensive care unit and number of procedures during the index hospital stay have also been used as proxies for the complexity of a patient's condition.15 18 19 22 25 27 28 30 38 43

A few studies used composite clinical scores that are not readily available to account for severity of conditions for patients admitted to hospital. For example, Tong et al used the Braden Score to indicate risk of pressure ulcers,22 and Escobar et al used the Comorbidity Point Score, or COPS2,12 which uses 45 of the 70 possible Hierarchical Condition Categories originally developed by the Centers for Medicare and Medicaid Services to measure the severity of a patient's comorbidity. Other noteworthy composite scores included severity of illness on the day of admission and discharge based on the Laboratory Acute Physiology Score,12 the Acute Laboratory Risk of Mortality Score,21 polypharmacy (more than six medicines),16 22 surgical complications,41 number of laboratory results marked as "high," "low," or "abnormal,"9 use of specific drugs among patients admitted for heart failure,38 or use of 10 or more drugs at the time of admission among hemodialysis patients.29

Functional status is usually not recorded in structured EMR data. Only seven studies included measures of disability or limitations on activities of daily living in their models.13 16 17 20 34 36 38 Shadmi et al used a disability indicator, which is routinely collected in Clalit Health Services data in Israel,20 and this proved to be a top predictor of readmission. Morris et al used functional status available via Veterans Affairs data and nurses' notes,17 and McGirt et al used questionnaire data in addition to EMRs to collect this information.36

Sixteen studies considered various proxies for socioeconomic status as candidate predictors of readmission.9 13 14 15 16 17 18 20 21 28 30 33 35 38 40 41 44 These studies used nurses' notes, self-reported patient questionnaires, or 2010 census block level or zip code level aggregate data to include features such as income and education. Despite the cited importance of care giver availability, only Greenwood et al used availability of a support person after discharge.13 Two studies showed that proxies for socioeconomic status (not having a high school degree, being enrolled in Medicaid, and living in a poor neighborhood) were strong predictors of readmission.35 44
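
A hedged sketch of the linkage these studies describe, under hypothetical column names and toy data: patient level EMR extracts are joined to zip code level census aggregates, and unmatched zip codes surface as missing values to be handled like any other missing feature.

```python
import pandas as pd

# Hypothetical patient-level extract; zip codes kept as strings to
# preserve leading zeros.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "zip_code": ["48104", "48105", "02139"],
})

# Hypothetical zip code level census aggregates.
census = pd.DataFrame({
    "zip_code": ["48104", "48105"],
    "median_income": [52000, 67000],
    "pct_no_high_school": [0.11, 0.07],
})

# A left join keeps every patient; patient 3's zip code is unmatched,
# so its socioeconomic features come back as missing values.
features = patients.merge(census, on="zip_code", how="left")
print(features)
```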

The top predictors among all models mostly included healthcare encounter history (previous emergency or inpatient visits within three to six months before the index hospital admission)11 14 15 and a variety of clinical data indicating the severity of the patient's condition during the index admission (a low albumin level or a variety of constructed severity scores).12 21 22 As stated above, a few studies also showed that disability/functional status measures and socioeconomic status were strong predictors of readmission.8 17 20 28 33 44

Predictive models

Supplementary table E summarizes characteristics of the predictive models used in the included studies. The timing of the prediction is of paramount importance for institutions seeking to operationalize these readmission risk assessment tools. Most studies (n=23) predicted readmission right before or at discharge. Ten studies did not report the timing of their predictions; the rest reported it as within 24 hours after admission (n=3),10 23 32 before admission (n=1),20 or after discharge (n=3).30 33 34 Most of the studies (n=24) examined more than one predictive model and chose the model with the highest C statistic and fewest predictors. Although all models are presented, for ease of presentation we chose the model with the highest C statistic for each study.

Out of 41 included studies, 26 used multivariable Cox or logistic regression models. Different feature selection techniques such as stepwise variable selection (forward, backward, or backward-forward methods),22 univariate binary regression, and LASSO (least absolute shrinkage and selection operator) were used.23
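
As a concrete illustration of LASSO based feature selection, in the spirit of the cited approaches, here is a minimal sketch; the feature matrix, outcome, and penalty strength C are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))    # placeholder candidate features
y = rng.integers(0, 2, size=1000)  # placeholder 30 day readmission flag

# L1 (LASSO) penalty shrinks uninformative coefficients exactly to zero;
# smaller C means stronger shrinkage.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X, y)

# Features with non-zero coefficients survive the shrinkage.
coefs = model.named_steps["logisticregression"].coef_.ravel()
selected = np.flatnonzero(coefs)
print(f"{len(selected)} of {X.shape[1]} candidate features retained")
```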

Fifteen studies used various machine learning methods, including Bayesian conditional probability,10 43 random forest,14 15 38 46 47 neural networks,15 19 27 39 deep learning,41 AdaBoost,22 gradient boosting,30 47 natural language processing,30 36 42 and others.24 45 46 The most popular machine learning methods were random forest and neural networks. Shrinkage methods (for both traditional and machine learning models) such as LASSO, or machine learning algorithms such as AdaBoost, were used to limit the number of features.
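
A minimal, self contained sketch of the commonest design among the reviewed studies: split sample validation of a random forest, scored by the C statistic, which for a binary outcome equals the area under the ROC curve. All data here are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 30))              # placeholder features
y = (rng.random(5000) < 0.15).astype(int)    # ~15% readmission rate

# Split sample validation: hold out 30% of encounters, stratified on
# the outcome so both sets have a similar readmission rate.
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

forest = RandomForestClassifier(n_estimators=500, random_state=42)
forest.fit(X_dev, y_dev)

# Report the validation (not development) C statistic.
c_statistic = roc_auc_score(y_val, forest.predict_proba(X_val)[:, 1])
print(f"Validation C statistic: {c_statistic:.2f}")
```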

On average, the C statistics for machine learning and traditional regression models were 0.74 (standard deviation 0.06; 95% confidence interval 0.71 to 0.77) and 0.71 (0.07; 0.68 to 0.73), respectively. Although the mean C statistic was higher for machine learning models, the difference was not statistically significant (difference 0.03, 95% confidence interval −0.0 to 0.07). Furthermore, we did not find a significant difference between the C statistics for all patient (0.76, 0.72 to 0.79) and patient specific (0.72, 0.70 to 0.75) models (difference 0.03, –0.01 to 0.07).

A few studies used other comprehensive methods of model evaluation, such as the integrated discrimination index and net reclassification index. For example, by calculating the clinical utility of a predictive model for a given threshold, Walsh et al also used their findings to develop a model of clinical usefulness to evaluate the potential cost of mis-calibration and to measure the value of interventions aimed at reducing readmission.23 Of all studies, eight (20%) reported the sensitivity and specificity of the developed models,15 20 24 25 30 39 42 46 and seven (17%) reported positive and negative predictive values.15 20 24 25 30 32 44 Finally, five (12%) reported that their models had been implemented in the EMR system.8 13 14 24 32
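
These additional diagnostics are inexpensive to compute once a risk threshold is fixed; a toy example follows, with a hypothetical 0.5 threshold and placeholder validation data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
risk = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.4, 0.15, 0.55, 0.7, 0.05])

y_pred = (risk >= 0.5).astype(int)  # the threshold is a clinical choice
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # proportion of readmissions flagged
specificity = tn / (tn + fp)  # proportion of non-readmissions cleared
ppv = tp / (tp + fp)          # positive predictive value
npv = tn / (tn + fn)          # negative predictive value
print(sensitivity, specificity, ppv, npv)
```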

Three studies used natural language processing to extract additional psychosocial information such as suicidality or excessive alcohol consumption that was otherwise not available via structured EMR data.30 36 42 For example, Rumshisky et al used 1000 informative words to extract additional data from related clinical notes, which improved the C statistic from 0.75 in the base model to 0.78.42 In addition to structured EMRs, Golas et al used natural language processing on two types of unstructured data—physicians' notes and discharge summaries—to analyze data related to patients' social history and treatment during admission (allergic reactions, history of illness, intolerances and sensitivities).30
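
A hedged sketch of the general bag of words idea, not the cited studies' exact pipelines: free text notes become sparse word count features that can be concatenated with the structured predictors. A frequency capped vocabulary stands in here for the informative word selection that Rumshisky et al describe; the notes are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer

notes = [
    "patient reports heavy alcohol use, lives alone",
    "stable on discharge, family support at home",
    "expresses hopelessness, prior missed appointments",
]

# Cap the vocabulary (the cited study used 1000 informative words) and
# drop common English stop words.
vectorizer = CountVectorizer(max_features=1000, stop_words="english")
text_features = vectorizer.fit_transform(notes)  # sparse counts matrix

print(vectorizer.get_feature_names_out())
print(text_features.shape)  # rows: notes, columns: retained words
```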

Quality assessment

We assessed the quality and risk of bias of the studies by using six variables, including accounting for missing values, validation method, type of validation (internal versus external and prospective versus retrospective), calibration (yes/no), and scope of readmission assessment (only at studied hospitals or in a larger geographic area) (supplementary table F). The included studies used a few techniques to deal with missing values in EMRs: removing records with missing values from the analytic sample,10 11 creating a separate category for them,8 imputing their values,14 and considering missing laboratory results to be normal.21
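
The four strategies can be illustrated in a few lines; the data frame, column names, and the choice of a "normal" albumin value are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "albumin": [3.9, np.nan, 2.8, 4.1],
    "discharge_unit": ["med", None, "icu", "med"],
})

# 1: remove records with missing values from the analytic sample
complete_cases = df.dropna()

# 2: create a separate category for missing categorical values
df["discharge_unit"] = df["discharge_unit"].fillna("missing")

# 3: impute missing laboratory values (median imputation here)
imputer = SimpleImputer(strategy="median")
df["albumin_imputed"] = imputer.fit_transform(df[["albumin"]]).ravel()

# 4: treat missing laboratory results as normal (hypothetical value)
df["albumin_assumed_normal"] = df["albumin"].fillna(4.0)
```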

As we included only validated models, most were of high quality with a relatively low risk of bias. However, only a few expanded the assessment of their models beyond basic C statistics to evaluate the clinical usefulness of the models.24 46 Fifteen (37%) studies calibrated their models.8 12 18 19 21 23 26 31 34 35 36 37 41 Calibration techniques such as the Hosmer-Lemeshow test, plot scaling, and prevalence adjustment were used to make the model probabilities more similar to the probabilities of the population studied. However, most of these studies failed to report the number of patients in each risk group, so we were unable to estimate the average predicted readmission rate and observed-to-expected ratios for each model. Furthermore, most models measured readmission only among included hospitals instead of using a broader (regional) scope for readmission. Finally, all validations were done internally. Thus, we could not assess the generalizability and practical utility of the developed readmission risk assessment tools.
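
For illustration, a minimal sketch of the calibration check most studies omitted: predicted risks are binned and each bin's mean prediction is compared with its observed event rate (a reliability table or plot). The data are synthetic and well calibrated by construction.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
risk = rng.random(2000)                       # predicted probabilities
y = (rng.random(2000) < risk).astype(int)     # outcomes drawn from risk

# Quantile bins give equal-sized risk groups (tenths of predicted risk).
observed, predicted = calibration_curve(y, risk, n_bins=10, strategy="quantile")

for p, o in zip(predicted, observed):
    print(f"mean predicted {p:.2f} vs observed {o:.2f}")
```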

Discussion

In this systematic review, we reviewed 41 studies of the development and validation of predictive models of 30 day hospital readmission using electronic medical records. These models were developed to identify patients at high risk of readmission, for whom coordinated discharge care might reduce the chance of early readmission. On average, the predictive ability of readmission risk models based on EMR data has improved compared with that of previously published models using other available datasets (administrative or survey data), from 0.67 to 0.74.5

Comparisons with other studies

Over the past few years, despite increasing use of "big data" and rich clinical information available via EMRs,4 48 and the application of sophisticated machine learning methods, predicting the risk of early hospital readmission with reliable accuracy has remained elusive. Hospital readmission is a complex and multidimensional problem, one that needs to be better understood. Although inclusion of essential clinical data available in EMRs (such as vital signs, laboratory results, or complexity of the surgical procedure) increased the predictive ability of the models, some important clinical data were still not readily available in EMRs. For example, composite measures of severity such as the Braden Score (a risk indicator for developing a pressure ulcer),22 Comorbidity Point Score (a risk indicator of multimorbid severity),12 and Laboratory Acute Physiology Score (a risk indicator of illness severity)12 21 have rarely been examined. Furthermore, functional status or frailty at the time of discharge, known to be an important risk factor for readmission,49 is not routinely collected and used in EMR based predictive models.50 51

Most notably, despite a large body of literature showing significant links between social and environmental factors and risk of readmission or other adverse health events,52 53 health systems are still not systematically collecting these data. Including selected social and environmental factors,54 55 such as care giver availability or housing instability,56 57 58 59 could likely substantially improve the predictive accuracy of readmission risk models.55 To fill this void, alternative approaches have been examined. For instance, Census Bureau zip code or block level socioeconomic data have been merged with EMRs.30 33 35 Perhaps because of the imprecise nature of these aggregate data, however, many of them did not show a significant difference in discriminatory power when examined in predictive models. Several models have started using natural language processing to extract key social and environmental data from unstructured data such as physicians' notes.60 Although physicians' or nurses' notes are unsystematically recorded, meaning that what is recorded by one physician may not be recorded by another, natural language processing has shown promising results for improving the accuracy of predictive models. Natural language processing can also be used to collect other salient information that is usually missing from structured EMRs, such as psychosocial or sensory statuses.30 36 42

Additionally, the quality and integrity of EMR data are of concern and have implications for leveraging these data to develop accurate and precise risk assessments.61 Notably, none of the studies in this review reported appropriate validation of the EMR data they used. Furthermore, standard approaches to identify the missingness mechanism for missing data elements, and to deal with it appropriately without compromising the data elements used for modeling, are absent. Also, as previously described, EMRs lack certain salient data elements informed by the literature for risk assessment. We hypothesize that overcoming these data quality and integrity problems would not only improve the C statistics but would also introduce significant and impactful features that are missing from the models reviewed here.

Furthermore, with the emergence of big data and sophisticated machine learning methods in healthcare, the number of predictive models of hospital readmission has increased over the past few years.62 The clinical utility of machine learning methods, however, needs further attention. For example, sophisticated machine learning methods such as neural networks work like a black box, lacking transparency in the selection of features. Thus, their relative contribution, usefulness, and interpretability in medicine need to be investigated.63 In particular, class imbalance—when the two outcome classes have substantially different probabilities—might cause biased predictions when machine learning is used, unless certain adjustments are made to correct it.64 Otherwise, models developed using imbalanced training data will intrinsically provide more accurate predictions for the class with the higher number of occurrences. Despite extreme class imbalance in hospital readmission, only a small number of the reviewed predictive models adjusted for it.
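
One standard correction, sketched below with synthetic data, is inverse frequency class weighting during training; resampling approaches (such as SMOTE from the imbalanced-learn package) are a common alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(3000, 20))
y = (rng.random(3000) < 0.15).astype(int)  # imbalanced: ~15% readmitted

# class_weight="balanced" scales each class inversely to its frequency,
# so the rare readmission class is not swamped by the majority class.
clf = RandomForestClassifier(
    n_estimators=300, class_weight="balanced", random_state=7
)
clf.fit(X, y)

# The model still produces risk scores, now less biased toward the majority.
print(clf.predict_proba(X[:3])[:, 1])
```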

EMRs encompass a large repository of multidimensional data. Traditional regression is easy to use and implement, but it may not take advantage of the volume of data elements available in the EMR, whereas machine learning methods can consider the exhaustive set of available data elements.

Despite the growing literature supporting machine learning methods as an alternative, coupled with their potential benefits for predicting readmissions, three important criteria remain, which our systematic review attempts to tackle. Firstly, feature selection remains an important criterion that is predicated on having an exhaustive and diverse set of data elements available, such as socioeconomic and functional status. Subsequent studies should consider implementing sufficiently granular data elements via text mining, merging with smaller geographic units of analysis (census tract or neighborhood level), or encouraging health systems to collect these salient attributes. Secondly, machine learning methods struggle to achieve parsimony owing to the selection of several hundred to thousands of features to predict an outcome. The use of machine learning methods, although fashionable and a potential academic exercise, fails to answer important clinical questions about the implementation and interpretability of the results. Thirdly, machine learning methods vary substantially in their interpretability, creating barriers to clinical buy-in and to their implementation across health systems. Although interpretable machine learning methods were absent from the studies in this systematic review, the evolution of the field requires the development and implementation of interpretable machine learning methods to establish clinical usefulness and inspire potential changes in practice patterns.

The paucity of studies (5%) that provide information on the implementation and clinical utility of these models in the hospital setting leaves a substantial void in understanding how these models can improve care coordination and discharge planning across readmission risk strata. Interpretable models that enhance clinician "buy-in" would, once implemented, help identify the patients to whom limited care coordination resources should be allocated. Furthermore, these models would help hospitals tailor appropriate discharge protocols to patients across readmission risk groups.

Regardless of the method used for prediction, careful diagnostic tests such as C statistics, sensitivity and specificity, positive and negative predictive values, the integrated discrimination index, and the net reclassification index should be calculated and discussed to ensure not only the accuracy of a model but also its clinical usefulness.65 The C statistic is a measure of "discrimination" because it measures whether a model can discriminate between patients at higher risk and those at lower risk. Besides C statistics, most of the reviewed models failed to calculate and interpret a reasonable array of other diagnostic tests, or even the clinical usefulness of the models developed. Finally, to ensure that predicted probabilities closely match the observed probabilities in the target population, a prediction model, regardless of how it was developed, needs to be well calibrated.23 Most of the models we reviewed either did not discuss calibration or simply used goodness-of-fit tests such as Hosmer-Lemeshow in place of full calibration. Categorization into tenths of predicted and observed counts can be difficult if discrimination is poor.66
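
As an illustration of the comparison by tenths described above, the sketch below tabulates observed readmission rates against mean predicted risk by decile of predicted risk, together with their ratio; the data are synthetic and well calibrated by construction.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
risk = rng.beta(2, 10, size=5000)            # skewed predicted risks
y = (rng.random(5000) < risk).astype(int)    # outcomes drawn from risk

df = pd.DataFrame({"risk": risk, "readmitted": y})
df["decile"] = pd.qcut(df["risk"], 10, labels=False)  # tenths of risk

summary = df.groupby("decile").agg(
    n=("readmitted", "size"),
    expected=("risk", "mean"),       # mean predicted risk per decile
    observed=("readmitted", "mean"), # observed readmission rate per decile
)
summary["o_to_e"] = summary["observed"] / summary["expected"]
print(summary)  # observed-to-expected ratios near 1 indicate calibration
```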

Limitations of study

Our study had a few limitations. Most probably, the definition and classification of variables (both predictors and outcomes) varied among models. Although improving, EMR data are not yet standardized like administrative claims or survey data. As these studies used center specific EMRs, the predictive models were developed for particular hospital settings and are not generalizable at a national level; therefore, the broader clinical and practice benefit may be localized and most likely applicable to institutional quality improvement. However, if disparate EMR data can be recoded in a manner that harmonizes across different EMRs, then comparisons can be made. Data coordinating centers could plausibly guide hospitals to do this, and this should be a future direction. We attempted to synthesize the findings to the best of our ability, so future studies may benefit from testing the variables and methods that were found to be most promising. Furthermore, few rigorous studies of interventions targeting readmission reduction have actually shown a decrease in readmissions.67 68 Finally, as discussed above, most of the reviewed models neither included other recommended diagnostic tests besides C statistics nor discussed the clinical usefulness of their findings. To minimize bias, we chose only the highest quality models by including the ones that explicitly validated their findings.

Conclusions and policy implications

In short, despite notable progress in the development and accuracy of these models, predicting and reducing readmission remain a complex process.67 68 Most of the models developed to date have moderate predictive ability. No well accepted threshold of what constitutes an accurate C statistic exists, because model discrimination is also a measure of how predictable a given outcome is. For an outcome such as 30 day readmission that is considered difficult to predict, a C statistic of 0.75 may be adequate for the model to be useful. In contrast, for an outcome that is readily predictable by clinical experts, even a model with a C statistic of 0.90 may not be useful. The use of EMR data and machine learning methods has created an enormous opportunity for further refinement of readmission risk prediction tools, making them more pragmatic for hospitals seeking to better identify patients at higher risk of readmission. Continued development and tuning of these models to optimize their performance may drive institutional quality improvement and readmission reduction.

What is already known on this topic

  • The development of tools to predict the risk of 30 day hospital readmission and thus enable identification of patients at high risk has increased sharply in recent years

  • However, achieving a high sensitivity and specificity in predicting who is at risk of readmission and why is still a work in progress

  • The accuracy and reliability of risk prediction models largely depend on predictors and methods of development, validation, calibration, and clinical utility

What this study adds

  • On average, risk prediction models using electronic medical records have better predictive performance than those using administrative data, but this improvement remains modest

  • The quality and integrity of electronic medical records are concerning and pose significant barriers to effectively leveraging these data to develop accurate and precise risk assessment tools

  • Most studies did not account for salient socioeconomic features, failed to calibrate their models, and lacked careful assessment of the clinical utilities and implementation of the developed tools

Acknowledgments

We acknowledge the tremendous help from Whitney Ann Townsend, Liaison Services Librarian to the Department of Family Medicine, in developing the initial search terms, and Murphy Vandervest and Shivani Shant from the Undergraduate Research Opportunity Program at the University of Michigan in screening of the initial search results. We thank Lois Phizacklea for her help in formatting the manuscript.

Footnotes

  • Funding: This study was supported by grants from the National Institutes of Health, P30 AG015281, and the Michigan Center for Urban African American Aging Research and from the University of Michigan Claude D Pepper Older Americans Independence Center, AG024824.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support for the submitted work as described above; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

  • Contributors: EM designed the data collection, developed the search strategy, monitored the data collection, screened and reviewed the selected articles, and drafted and revised the paper. NK drafted sections of the paper and revised the paper. NK developed and conducted the search strategy and drafted sections of the paper. GG screened the initial search results, reviewed selected studies, and drafted sections of the paper. AKW drafted and revised the paper. KS revised the paper. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. EM is the guarantor.

  • Ethical approval: Not applicable.

  • Data sharing: No additional data available.

  • Transparency: The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

  • Dissemination to participants and related patient and public communities: Not applicable.


This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

References