
Machine learning: a long way from implementation in cardiovascular disease
Suliang Chen, Amitava Banerjee
Institute of Health Informatics, University College London, London, UK
Correspondence to Dr Amitava Banerjee, Institute of Health Informatics, University College London, London, UK; ami.banerjee{at}


The term ‘machine learning’ (ML) dates back to the 1950s, when it was used to describe how algorithms and neural network models can help computer systems progressively improve their performance. In the last decade, advanced ML algorithms have increasingly been used to identify phenotypes in different cardiovascular diseases (CVDs), driven by two major factors. First, a gap persists between disease definitions from research or consensus guidelines and routine clinical practice. Second, as electronic health records (EHRs) are increasingly adopted within and across countries, there are unprecedented opportunities to investigate disease definitions in a more reproducible and generalisable manner. In such analyses, the objective of ML algorithms is to develop replicable EHR-based phenotypic definitions for a given CVD1 and to predict which group new patients belong to.

EHRs include diagnostic codes from primary care, secondary care and administrative data. Clustering methods are unsupervised ML algorithms that group objects with similar attributes using similarity and distance measures; applied to EHR data, they can produce homogeneous subgroups of patients. In their Heart paper, Hedman and colleagues2 identify six phenotypes of heart failure with preserved ejection fraction (HFpEF), derived using data from 320 outpatients with HFpEF in the Karolinska-Rennes cohort study. HFpEF is the example par excellence of a disease in which definitions, and consequently knowledge of pathophysiology and management, remain elusive. The team provides a prediction tool for assigning patients to the identified subtypes, although this tool has not been externally validated.
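To make the clustering idea concrete, here is a minimal sketch. It is not the method used by Hedman and colleagues, and this editorial does not say which algorithm they used. The sketch swaps in k-means with Euclidean distance as a generic example of distance-based unsupervised clustering. It then uses nearest-centroid assignment as one simple way a prediction tool could place a new patient into an existing subgroup. The helper names `fit_phenogroups` and `assign_new_patient`, the choice of two variables, and the toy values are hypothetical assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance measure: smaller values mean more similar patients."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardise(rows: Matrix) -> Tuple[Matrix, List[float], List[float]]:
    """Z-score each variable so that features on different scales weigh equally."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
           for c, m in zip(cols, means)]
    z = [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]
    return z, means, sds


def nearest(point: Sequence[float], centroids: Matrix) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: euclidean(point, centroids[j]))


def kmeans(points: Matrix, k: int, max_iter: int = 100) -> Tuple[List[int], Matrix]:
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all chosen ones.
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def fit_phenogroups(rows: Matrix, k: int):
    """Hypothetical helper: cluster patients and keep what is needed to assign new ones."""
    z, means, sds = standardise(rows)
    labels, centroids = kmeans(z, k)
    return labels, centroids, means, sds


def assign_new_patient(row: Sequence[float], centroids: Matrix,
                       means: List[float], sds: List[float]) -> int:
    """Hypothetical helper: rescale with the training means/SDs, then pick the nearest centroid."""
    z = [(x - m) / s for x, m, s in zip(row, means, sds)]
    return nearest(z, centroids)


if __name__ == "__main__":
    # Made-up toy data, not from any cohort: [age in years, eGFR in mL/min/1.73 m^2].
    cohort = [[70, 40], [72, 42], [71, 38], [50, 90], [52, 88], [49, 92]]
    labels, centroids, means, sds = fit_phenogroups(cohort, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(assign_new_patient([69, 45], centroids, means, sds))  # 0
```

The variables are standardised first. Without that step, whichever variable has the larger numeric range would dominate the Euclidean distance. The training means and standard deviations are then reused when assigning a new patient, so new and existing patients are compared on the same scale. A tool built this way would still need external validation before clinical use.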

The authors provide new insights into a disease that lacks a single evidence-based therapy to alter prognosis. They found that two pheno-groups had the worst prognosis: those with hypertension and the highest prevalence of coronary artery disease, renal disease, anaemia and diabetes (pheno-group 1), and those with atrial fibrillation, older age and a high prevalence of chronic obstructive pulmonary disease, kidney dysfunction and anaemia (pheno-group 2). Pheno-group 1 is consistent …



  • Twitter @amibanerjee1

  • Contributors The manuscript was conceived by AB. The first draft was jointly prepared by AB and SC. Both authors contributed to revision of the manuscript and have accepted the final version.

  • Funding AB and SC have received funding from the BigData@Heart Consortium, under the Innovative Medicines Initiative-2 (116074), supported by the European Union’s Horizon 2020 programme and EFPIA (Chairs: D E Grobbee, S D Anker).

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Commissioned; internally peer reviewed.
