Metabolic phenotyping and cardiovascular disease: an overview of evidence from epidemiological settings
  1. Aikaterini Iliou1,
  2. Emmanuel Mikros1,
  3. Ibrahim Karaman2,
  4. Freya Elliott3,
  5. Julian L Griffin4,
  6. Ioanna Tzoulaki2,5,6,
  7. Paul Elliott2,6,7,8
  1. Pharmacy, National and Kapodistrian University of Athens School of Health Sciences, Athens, Attica, Greece
  2. Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
  3. School of Medicine and Dentistry, Queen Mary University, London, UK
  4. Department of Metabolism, Digestion and Reproduction, Imperial College London, London, UK
  5. Department of Hygiene and Epidemiology, University of Ioannina, Ioannina, Greece
  6. BHF Research Centre for Excellence, Faculty of Medicine, Imperial College London, London, UK
  7. MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
  8. Imperial College Biomedical Research Centre, Imperial College London, London, UK

Correspondence to Professor Paul Elliott, MRC-HPA Centre for Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Imperial College of Science Technology and Medicine, London W2 1PG, UK; p.elliott{at}


Metabolomics, the comprehensive measurement of low-molecular-weight molecules in biological fluids used for metabolic phenotyping, has emerged as a promising tool to better understand pathways underlying cardiovascular disease (CVD) and to improve cardiovascular risk stratification. Here, we present the main methodologies for metabolic phenotyping, the methodological steps to analyse these data in epidemiological settings and the associated challenges. We discuss evidence from epidemiological studies linking metabolites to coronary heart disease and stroke. These studies indicate the systemic nature of CVD and identify associated metabolic pathways such as gut microbial cometabolism, branched-chain amino acids, glycerophospholipid and cholesterol metabolism, as well as activation of inflammatory processes. Integration of metabolomic with genomic data can provide new evidence for involved biochemical pathways and potential for causality using Mendelian randomisation. The clinical utility of metabolic biomarkers for cardiovascular risk stratification in healthy individuals has not yet been established. As sample sizes with high-dimensional molecular data increase in epidemiological settings, integration of metabolomic data across studies and platforms with other molecular data will lead to new understanding of the metabolic processes underlying CVD and contribute to identification of potentially novel preventive and pharmacological targets. Metabolic phenotyping offers a powerful tool in the characterisation of the molecular signatures of CVD, paving the way to new mechanistic understanding and therapies, as well as improving risk prediction of CVD patients. However, there are still challenges to face in order to contribute to clinically important improvements in CVD.

  • epidemiology
  • coronary artery disease
  • research design
  • biomarkers
  • stroke

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:


Cardiovascular disease (CVD) remains the major cause of death globally, with an estimated 85 million deaths annually.1 CVD comprises a diverse set of diseases, with coronary heart disease (CHD) and stroke being the most common manifestations. A large body of research over the past decades has identified a range of well-established risk factors, including male sex, high blood pressure, high total and low-density lipoprotein (LDL) cholesterol levels, smoking and type 2 diabetes, as well as many genetic factors that affect an individual’s risk of developing CVD.2 Nonetheless, the molecular mechanisms linking these and other risk factors to CVD are only partly understood, particularly how molecular mechanisms interact with environmental exposures such as diet and xenobiotics. Better understanding of the underlying pathways which are implicated in CVD pathophysiology is important to design novel or improved strategies for prevention, risk stratification and treatment.

Metabolomics, the comprehensive measurement of low-molecular-weight molecules in biological fluids, provides an assessment of the metabolic signatures (metabolic phenotype) of intrinsic and extrinsic exposures (the internal and external exposomes) from a variety of sources, including genetic, dietary, lifestyle, gut microbial and psychosocial factors. This involves the biochemical analyses of multiple metabolites in biological fluids, tissue homogenates as well as intact tissues. Substantial progress has been made in metabolomics through improved instrument performance and use of advanced chemoinformatic and bioinformatic tools for data acquisition, processing and analysis. These have facilitated the investigation of metabolites in relation to CVD in large population studies. Applications of metabolic phenotyping in CVD have so far been multifaceted, including the use of metabolic biomarkers to study the effects of lifestyle and environmental exposures on CVD risk,3 4 investigation of mechanisms and pathophysiological processes underlying CVD development and progression,5 6 and evaluation of prognostic biomarkers.7–9 Here, we provide a brief account of metabolomics technologies and their application in CVD research, focusing on epidemiological studies of CVD and its main components, CHD and stroke.


There is no single technology to measure the entire metabolome. To address this, a wide variety of approaches, biological fluids, analytical platforms and statistical methods have been proposed to systematically increase the coverage of the metabolome. Analytical advances have allowed the simultaneous detection and quantitation of a variety of small molecules with different chemical properties and structures, including carbohydrates, lipids, steroids, organic acids, amino acids, peptides, energy-related metabolites and gut microbial cometabolites, ranging from aqueously soluble metabolites to non-polar lipids and lipophilic components.

The application of metabolomics can be hypothesis driven, through targeted (closed) methods quantifying a small number of structurally related metabolites, or an untargeted (open, hypothesis generating) approach where the aim is to cover as much of the metabolome as possible. With the targeted method, only a preselected group of metabolites that usually are chemically related, belonging to a metabolic class or pathway, is measured. With the use of standards labelled with stable isotopes, these assays can be made quantitative. In untargeted approaches, the chemical identity of the metabolites may not be known a priori, and chemical/spectral annotation may be needed post hoc to identify the molecular species. Furthermore, the untargeted methods are not optimised for a particular class of compounds and, thus, are often less sensitive than targeted methods and may be semiquantitative rather than fully quantitative.
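
The role of stable isotope-labelled standards in targeted assays can be illustrated with a minimal sketch. It assumes a single-point calibration in which the analyte and its labelled standard respond identically by default (a response factor of 1). The function name, parameters and numbers are hypothetical and are not taken from any of the studies discussed here.

```python
def quantify_with_internal_standard(analyte_area, standard_area,
                                    standard_conc, response_factor=1.0):
    """Estimate analyte concentration from peak areas.

    Assumption: the labelled standard was spiked at a known concentration
    and the analyte/standard response ratio is `response_factor`.
    """
    if standard_area <= 0 or response_factor <= 0:
        raise ValueError("standard_area and response_factor must be positive")
    return (analyte_area / standard_area) * standard_conc / response_factor


# Hypothetical example: the analyte peak is twice the standard peak,
# and the standard was spiked at 5.0 (arbitrary concentration units).
estimated_conc = quantify_with_internal_standard(2000.0, 1000.0, 5.0)
```

In practice, multipoint calibration curves are typically used; the single ratio here only conveys why a co-measured labelled standard turns a relative signal into a concentration.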


Many known metabolites are present in blood and urine, which are the most frequently used sample types in human metabolic phenotyping. They are relatively easy to collect and store in epidemiological settings and carry extensive information on the metabolic phenotype. Blood is sampled as either blood plasma or serum, and contains a wide range of aqueously soluble metabolites and lipids, with the latter largely contained within lipoprotein fractions. Urine is a biological waste material containing products of endogenous metabolism, gut microbial cometabolism, renal function and exogenous exposures, and therefore also captures a complex and wide set of metabolic processes.10 It also has the advantage that it can significantly concentrate waste products of metabolic pathways. Other sample types that have been used include saliva and cerebrospinal fluid, as well as tissue extracts from atherosclerotic plaques and cell extracts. The latter may reveal tissue-specific biological processes that cannot be captured in blood and urine. Variations in sample collection procedures, handling, transport and storage both within and between studies may affect the metabolic profile and introduce bias, highlighting the need for standard operating procedures and extensive quality control (QC).11 12

Analytical platforms and analysis workflow

The majority of metabolic phenotyping data are acquired using nuclear magnetic resonance (NMR) spectroscopy, and mass spectrometry (MS)-hyphenated instrumental platforms, mainly liquid chromatography–MS and gas chromatography–MS. The relative advantages and disadvantages of each technology and their main applications are described in table 1.

Table 1

Strengths and weaknesses of the core instrumentation used in metabolomics

Metabolomics analysis can be divided into four main steps: preprocessing, statistical analysis, identification of the unknown features (for untargeted analyses) and pathway analysis (figure 1). Processing of the data after acquisition (so-called preprocessing) is complex in metabolomics and requires relatively sophisticated methodology, as datasets can be very large as a result of the high-throughput nature of the data.13 QC samples, often pooled across samples included in the study, are used to assess reproducibility and to account for instrument drift, analytical signal variability, and baseline and retention time shifts within the same batch (intrabatch) or across different batches (interbatch).14
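
One simple way in which repeated pooled QC injections can be used to assess reproducibility is to compute, for each feature, the coefficient of variation (CV) across the QC injections and to flag features that vary too much. The sketch below is illustrative only: the feature names, intensities and the 30% cut-off are assumptions, not values taken from this article.

```python
from statistics import mean, stdev


def qc_cv_percent(intensities):
    """Coefficient of variation (%) of one feature across QC injections."""
    m = mean(intensities)
    if m == 0:
        return float("inf")  # a zero mean makes the CV undefined; treat as unstable
    return 100.0 * stdev(intensities) / m


def flag_unstable_features(qc_table, max_cv=30.0):
    """Return the names of features whose QC CV exceeds `max_cv` (assumed cut-off)."""
    return sorted(name for name, values in qc_table.items()
                  if qc_cv_percent(values) > max_cv)


# Hypothetical intensities of two features in four pooled QC injections.
qc_table = {
    "feature_A": [100.0, 102.0, 98.0, 101.0],  # stable signal
    "feature_B": [50.0, 90.0, 20.0, 70.0],     # highly variable signal
}
unstable = flag_unstable_features(qc_table)
```

Correcting for drift or batch effects requires more than this filter (for example, modelling signal against injection order), which is outside the scope of this sketch.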

Figure 1

Schematic of the metabolic phenotyping analysis workflow. LC-MS, liquid chromatography–mass spectrometry; NMR, nuclear magnetic resonance; QC, quality control.

Statistical analysis needs to take into account characteristics of metabolomics data, including a high degree of collinearity, high-dimensionality, non-linearity, missingness and non-normality.15 Both univariate and multivariate methods are used to address different types of metabolomics data and study questions. In univariate approaches, standard generalised linear models are commonly used. To account for multiple hypothesis testing, Bonferroni, false discovery rate correction and a less stringent metabolome-wide significance-level approach16 are often reported. Multivariate approaches are also used to tease out latent information from spectral data and to select relevant metabolic features. The most common multivariate approaches fall within two groups: unsupervised, such as principal component analysis, and supervised, such as partial least squares discriminant analysis. Unsupervised methods involve data reduction to differentiate clusters of samples that share common variation, while supervised methods focus on group separation in the data (figure 2).
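
To make the difference between two of the corrections named above concrete, the following sketch applies Bonferroni and Benjamini-Hochberg false discovery rate control to a small set of p values. The p values are hypothetical, and the metabolome-wide significance-level approach is not implemented here.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a hypothesis when its p value is at most alpha / number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cut-off.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p values for five metabolite-outcome associations.
pvals = [0.30, 0.001, 0.02, 0.045, 0.012]
bonf_hits = bonferroni(pvals)
fdr_hits = benjamini_hochberg(pvals)
```

With these inputs, Bonferroni retains only the smallest p value, whereas the false discovery rate procedure retains three, which illustrates why the choice of correction matters when many correlated metabolites are tested.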

Figure 2

Schematic representation of score plots from (A) principal component analysis (PCA) and (B) partial least squares discriminant analysis models on hypothetical two-group data where between-group variation is smaller than sample-to-sample variation. Within-group variation dominates; therefore, the group pattern is better revealed by separation using supervised methods. PC, principal component; LV, latent variable.

Unknown statistically significant metabolic features need to be identified using analytical chemistry and bioinformatics analyses to relate features in the spectra to individual metabolites, often making use of online databases (eg, LipidMaps, Metlin and the Human Metabolome Database). The Metabolomics Society has proposed a scoring system that describes the confidence of the metabolite identity assignment.17 The list of identified metabolites is often followed up by pathway analyses to aid interpretation of the results by placing the identified metabolites into relevant biochemical pathways (figure 3).18 19

Figure 3

Ingenuity pathway analysis network showing molecular relationships between metabolites which have been associated with atherosclerosis (blue: inverse association, orange: direct association) (adapted from Tzoulaki et al 5). AARS, alanyl-tRNA synthetase; ALT, alanine transaminase; CPO, carboxypeptidase O; HTT, Huntington disease protein; LDL, low-density lipoprotein; LPO, lactoperoxidase; SFTPD, surfactant protein D.

The nature of metabolomics data and the use of different analytical techniques and platforms for metabolite measurement and quantification pose special challenges to standardisation and harmonisation across studies and laboratories. Different preprocessing strategies and lack of information on metabolite identification and quantification procedures may introduce heterogeneity between studies. Several initiatives are attempting to synthesise metabolomics data from different studies such as the Consortium of Metabolomics Studies, which has focused mainly to date on pooling data from single platforms.20 Extension of these efforts to combine data across different assay platforms is much more challenging but is expected to enhance the generalisability, reproducibility and molecular scope of metabolomic findings and to facilitate their implementation in clinical practice.

Table 2

Summary of prospective epidemiological studies examining associations between metabolites and CVD

Applications of metabolic phenotyping in CVD

Epidemiological studies employing metabolic phenotyping in CVD have focused mainly on the identification of novel biomarkers of CVD, exploring biochemical pathways that underlie cardiovascular phenotypes, Mendelian randomisation (MR) to explore causality and assessment of metabolomics in CVD risk prediction.

Early epidemiological studies on metabolomics and CVD outcomes were predominantly cross-sectional investigations reporting metabolites that were mainly consequences of clinical disease. With improved high-throughput technologies, data from prospective investigations have since been generated (table 2 summarises their study design and main findings). We discuss some of the major results from these studies further.

Lipid metabolic pathways

The evidence linking lipid metabolism and dyslipidaemia with CVD is well established. Metabolic profiling using MS technologies has further provided an in-depth characterisation of lipid species and has highlighted the potential role of specific lipid metabolic pathways, such as those of sphingolipids, glycerophospholipids and glycerolipids, as biomarkers of disease.

In a large-scale MS-based metabolomics study across three cohort studies, Ganna et al 9 showed that after adjustment for traditional CVD risk factors, four lipid-related metabolites were associated with CHD: lysophosphatidylcholine (LPC) 18:1, LPC 18:2 and sphingomyelin 28:1 (inverse associations) and monoglyceride (MG) 18:2 (direct association). LPCs were also inversely associated with body mass index and presence of subclinical CVD cross-sectionally, while a reverse pattern was observed for MG 18:2. MG 18:2 is central in the synthesis and breakdown of triglycerides, while MGs are precursors for diglycerides, which are known to inhibit insulin signalling acting through protein kinase Cε, mechanistically linking excess lipid storage with insulin resistance.21 An inverse association of LPC 18:2 and LPC 17:0 with incident myocardial infarction was also reported by another prospective multicohort investigation.8 LPCs are mainly contained in high-density lipoproteins (HDLs) and are produced on hydrolysis of phosphatidylcholines (PCs) in acute and chronic inflammation. Their association with CVD outcomes in these studies adds further support to the importance of systemic inflammatory processes in the pathogenesis of CVD.

Several studies have highlighted the importance of sphingolipids, mainly sphingomyelins, in CVD. Among them, sphingomyelin 32:1 has been found to be directly associated with incident ischaemic stroke, forearm vasodilation resistance (a marker of subclinical CVD) and inversely with leucocyte count (a marker of inflammation); however, these associations were attenuated when adjusting for traditional risk factors.22 Sphingomyelins are found in cell membranes and plasma lipoproteins, are involved in apoptosis and oxidative stress, and have been associated with atherosclerotic plaque instability and lipid plaque burden.23 Sphingomyelins can be broken down to ceramides, another important class of signalling molecules which may be involved in the pathogenesis of CVD through multiple pathways including lipotoxicity, inflammation and apoptosis.24 Studies in animal models have suggested that pharmacological inhibition or genetic ablation of enzymes driving ceramide synthesis ameliorates atherosclerosis and related pathways.25 However, results from epidemiological studies are inconsistent, with some but not all studies showing a direct association between different ceramides and future CVD.26

Neutral lipids, in particular, cholesterol esters and triacylglycerols (TAGs) with a low carbon number and double-bond content have been associated with CVD in a prospective cohort study.27 TAGs are associated with hepatic steatosis and raised de novo lipogenesis, in part as a consequence of diets high in carbohydrates, in human feeding studies.28

Epidemiological studies have also investigated the role of different lipid particles and lipoproteins through 1H NMR measurements. For example, very low-density lipoprotein (VLDL), intermediate-density lipoprotein (IDL) and LDL, as well as triglycerides in all lipoproteins (mostly HDL particles), were directly associated with myocardial infarction and ischaemic stroke, while cholesterol in large HDL (but not in small HDL) was inversely associated with these outcomes.29 In another prospective investigation of 105 different NMR-measured lipoproteins,5 lower-density lipoproteins showed strong direct associations with atherosclerosis and CVD, with total apolipoprotein B and apolipoprotein B within total plasma LDL being associated with CVD after adjustment for non-lipid CVD risk factors.

Other metabolic pathways

Other prospective studies have identified an NMR-based metabolic fingerprint associated with CVD. In Würtz et al,30 four metabolites were associated with incident CVD: phenylalanine and monounsaturated fatty acid levels were directly associated with incident CVD, while omega-6 fatty acids and docosahexaenoic acid levels were associated with lower risk of CVD. Vaarhorst et al 31 used the least absolute shrinkage and selection operator (LASSO) algorithm to identify a metabolite score consisting of 13 1H NMR signals associated with CHD, comprising a lipid fraction, glucose, valine, ornithine, glutamine, creatine, glycoproteins, citrate and 1,5-anhydrosorbitol. Meanwhile, Tzoulaki et al 5 evaluated the association of 1H NMR features with coronary artery calcium and intima media thickness. They found a range of metabolites associated with atherosclerosis, including alanine, glycine, methionine, glucose, acetaminophen-glucuronide, glycerol, acetyl glycoproteins, myo-inositol, mannose, 1,5-anhydrosorbitol, glutamate, glutamine, N,N-dimethylglycine, lysine, phenylalanine, 5-oxoproline, 3-hydroxybutyrate, citrate and albumin. Several of these metabolites were also associated with incident CVD, highlighting the importance of these pathways in progression to clinical CVD.
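
The way in which the LASSO selects a sparse set of features can be sketched with a small coordinate-descent implementation for linear regression. This is a simplified illustration under stated assumptions: the model fitted in the cited study is not reproduced, there is no intercept or standardisation step, and the data, penalty value and function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            zj = 0.0   # sum of squares of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                zj += X[i][j] ** 2
            beta[j] = 0.0 if zj == 0 else soft_threshold(rho / n, lam) / (zj / n)
    return beta


# Hypothetical data: feature 0 is strongly related to y, feature 1 only weakly.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.1, -0.1]
coefficients = lasso_coordinate_descent(X, y, lam=0.1)
```

The weakly related feature receives a coefficient of exactly zero, so it drops out of the resulting score; this is the property that makes the penalty useful for building scores from many collinear spectral signals.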

The aforementioned analyses highlight the interactions of amino acids with CVD, in addition to the well-established links of carbohydrates and lipid metabolism with CVD. While further work is needed to understand the role of specific amino acids in CVD, there may be shared mechanisms with insulin resistance and type 2 diabetes, where studies have shown that branched chain amino acids, glycine and phenylalanine contribute to insulin resistance.32 33

Overall, metabolomics studies to date have revealed systemic disturbances and interconnected pathways underlying CVD. These pathways include lipid, fatty acid and carbohydrate metabolism, branched chain amino acid and aromatic amino acid metabolism, the tricarboxylic acid and urea cycles, and muscle metabolism (figure 3). The identified pathways point to the activation of inflammatory processes and oxidative stress as key determinants of CVD. Many of these pathways, lipid metabolism in particular, have shown marked differences between men and women, which may in part explain the sex differences observed in CVD risk. For example, VLDL triglyceride content is higher in men relative to women, while inflammatory glycoprotein acetyl is higher in women across different ages.34

Gut microbiome

Metabolomics can capture metabolic markers of gut microbiota, which have been increasingly recognised to have an important role in cardiometabolic health. In particular, trimethylamine-N-oxide (TMAO), a gut microbial-host cometabolite of dietary choline and carnitine, was observed to be strongly associated with CHD and diabetes.3 4 35 Supplementation with TMAO or its precursors promoted atherosclerosis in mice, but this effect was not observed in germ-free mice, indicating that a gut-microbial step was involved. Specifically, TMAO formation involves the gut microbial transformation of dietary PC and carnitine,36 typical components of a meat-based diet which, in turn, is associated with poor cardiometabolic health (figure 4). Animal studies have shown that increasing TMAO promotes thrombosis, atherosclerosis and metabolic dysfunction, whereas inhibiting TMAO reduces the formation of atherosclerotic lesions and thrombosis potential and improves glucose tolerance.37–39

Figure 4

TMAO pathway and its association with cardiovascular disease. TMA, trimethylamine; TMAO, trimethylamine-N-oxide.

Multiomics analyses and MR

Several studies have integrated genetic and metabolomics data using the MR approach to explore potential causality. In MR, genetic variants associated with metabolite levels are used as instrumental variables to investigate their effects on CVD (figure 5). Since genetic variants are randomly assigned during meiosis, using genetic variants as instruments for metabolite levels mimics the design of a randomised controlled trial.40 For example, MR analyses have supported a possible causal role of MG 18:2 and apolipoprotein B on CHD risk.41 However, MR analysis relies on specific assumptions such as the absence of horizontal pleiotropy, that is, the effects of the genetic variants on the outcome are assumed to act exclusively through the risk factor (metabolite) of interest. In relation to metabolomics, many lipid-related genetic variants are highly pleiotropic and therefore the standard MR paradigm may not be suitable.42 Systems biology approaches may also be used to highlight links between biological pathways. For example, integration of gene and metabolite networks revealed an interaction network between genes mainly involved in inflammatory, insulin and lipid pathways, and NMR-identified metabolite markers of atherosclerosis (figure 3).5

Figure 5

Example of Mendelian randomisation concept applied to metabolites.
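
The instrumental variable logic of MR can be sketched with summary statistics. For a single variant, the Wald ratio divides the variant-outcome association by the variant-metabolite association; for several variants, a fixed-effect inverse-variance weighted (IVW) estimate combines them. The sketch below assumes independent variants, no horizontal pleiotropy and negligible uncertainty in the variant-metabolite estimates. All numbers and names are hypothetical.

```python
from math import sqrt


def wald_ratio(beta_gx, beta_gy, se_gy):
    """Causal estimate from one variant.

    beta_gx: variant-metabolite association; beta_gy: variant-outcome association.
    The standard error uses a first-order approximation that ignores
    uncertainty in beta_gx (an assumption of this sketch).
    """
    return beta_gy / beta_gx, se_gy / abs(beta_gx)


def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted estimate across variants.

    `variants` is a list of (beta_gx, beta_gy, se_gy) tuples.
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in variants)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in variants)
    return numerator / denominator, sqrt(1.0 / denominator)


# Hypothetical summary statistics for two independent variants.
variants = [(0.2, 0.1, 0.02), (0.4, 0.2, 0.04)]
single_estimate, single_se = wald_ratio(*variants[0])
pooled_estimate, pooled_se = ivw_estimate(variants)
```

Because both variants imply the same ratio, the pooled estimate equals the single-variant estimate but has a smaller standard error. When variants are pleiotropic, as noted above for many lipid-related variants, this simple estimator can be biased and methods that relax the pleiotropy assumption are needed.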

Clinical utility and risk stratification

As well as indicating pathways underpinning CVD, metabolites may serve as novel disease biomarkers to aid CVD risk stratification for disease prevention, although often these studies have been small and underpowered, and their incremental predictive ability over and above traditional risk factors has been limited.5 8 27 30 31 Given the importance of lipids in CVD risk stratification, research effort has concentrated on the role of lipid species and, in particular, on lipoprotein particles. Among these, only apolipoprotein B and lipoprotein(a) have shown some evidence for clinical utility in specific patient populations, whereas the clinical utility of lipoprotein subclasses and apolipoprotein profiles remains to be established.43 44
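
Incremental predictive ability is often summarised as the change in discrimination, for example the C-statistic, when a biomarker is added to a model of traditional risk factors. The sketch below computes the C-statistic for a binary outcome as the proportion of case-control pairs in which the case has the higher predicted risk. It ignores follow-up time and censoring, which survival analyses would need to handle, and the risk values are hypothetical.

```python
def c_statistic(predicted_risks, events):
    """Proportion of case/non-case pairs ranked correctly (ties count as 0.5)."""
    cases = [r for r, e in zip(predicted_risks, events) if e == 1]
    controls = [r for r, e in zip(predicted_risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks for five people (1 = developed CVD).
events = [1, 1, 0, 0, 0]
baseline_risks = [0.30, 0.10, 0.20, 0.05, 0.15]  # traditional risk factors only
extended_risks = [0.35, 0.22, 0.20, 0.05, 0.15]  # plus a hypothetical metabolite score

baseline_c = c_statistic(baseline_risks, events)
extended_c = c_statistic(extended_risks, events)
delta_c = extended_c - baseline_c
```

An improvement in the C-statistic in a toy dataset such as this one says nothing about clinical utility; in real studies the change is usually small and needs to be assessed in independent data alongside calibration and reclassification measures.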

Conclusions and future directions

Metabolomics presents a powerful and promising approach for an in-depth molecular characterisation of cardiovascular health and disease. Advances in analytical technologies and informatics have made possible the generation and analysis of high-dimensional complex metabolic phenotyping data in large population studies. Such studies have highlighted the systemic nature of CVD, including the role of gut microbial cometabolism, branched chain amino acids, glycerophospholipid and cholesterol metabolism, as well as activation of inflammatory processes. Although lipid and inflammatory pathways are already well known with respect to CVD, metabolic phenotyping produces a higher-resolution and comprehensive biological signature, which may lead to new understanding of mechanisms and novel drug targets. The clinical utility of metabolites in risk stratification to predict future cardiovascular risk among healthy individuals, over and above current CVD risk algorithms, has not yet been established. Efforts to harmonise and synthesise the differently generated metabolomic data across studies and platforms are needed to enhance the generalisability, reproducibility, molecular scope and potential for clinical utility of this approach. In addition, integration of metabolomics data with other orthogonal technologies such as proteomics, genomics and epigenetics will provide an even deeper understanding of the underlying biological pathways and mechanisms. In this regard, validation and follow-up of epidemiological findings with mechanistic investigations such as in vitro and animal studies would strengthen the evidence for cause-and-effect relationships.

Ethics statements



  • Contributors IT and AI drafted the paper; PE supervised the project; and all authors made critical revisions.

  • Funding PE is director of the MRC Centre for Environment and Health funded by the UK Medical Research Council (MR/L01341X/1, MR/S019669/1). PE and IT are supported by the British Heart Foundation Centre for Research Excellence at Imperial College London and the National Institute for Health Research Imperial Biomedical Research Centre, Imperial College London. AI is supported by a Stavros Niarchos Foundation grant.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.

  • Provenance and peer review Commissioned; externally peer reviewed.