Abstract
Background Integration of large proteomics and genetic data in population-based studies can provide insights into discovery of novel biomarkers and potential therapeutic targets for cardiometabolic diseases (CMD). We aimed to synthesise existing evidence on the observational and genetic associations between circulating proteins and CMD.
Methods PubMed, Embase and Web of Science were searched until July 2023 for potentially relevant prospective observational and Mendelian randomisation (MR) studies investigating associations between circulating proteins and CMD, including coronary heart disease, stroke, type 2 diabetes, heart failure, atrial fibrillation and atherosclerosis. Two investigators independently extracted study characteristics using a standard form and pooled data using random effects models.
Results 50 observational, 25 MR and 10 studies performing both analyses were included, involving 26 414 160 non-overlapping participants. Meta-analysis of observational studies revealed 560 proteins associated with CMD, of which 133 proteins were associated with ≥2 CMDs (ie, pleiotropic). There were 245 potentially causal protein biomarkers identified in MR pooled results, involving 23 pleiotropic proteins. IL6RA and MMP12 were each causally associated with seven diseases. 22 protein-disease pairs showed directionally concordant associations in observational and MR pooled estimates. Addition of protein biomarkers to traditional clinical models modestly improved the accuracy of predicting incident CMD, with the highest improvement for heart failure (ΔC-index ~0.2). Of the 245 potentially causal proteins (291 protein-disease pairs), 3 pairs were validated by evidence of drug development from existing drug databases, 288 pairs lacked evidence of drug development and 66 proteins were drug targets approved for other indications.
Conclusions Combined analyses of observational and genetic studies revealed the potential causal role of several proteins in the aetiology of CMD. Novel protein biomarkers are promising targets for drug development and risk stratification.
PROSPERO registration number CRD42022350327.
Data availability statement
All data relevant to the study are included in the article or uploaded as supplementary information.
WHAT IS ALREADY KNOWN ON THIS TOPIC
We searched PubMed, Web of Science and Embase from database inception to July 2023 using search terms pertaining to cardiometabolic diseases (CMD) and proteomics without language restrictions. In the past decade, hundreds of large population-based observational and genetic studies have investigated the associations between circulating proteins and CMD. Because of the variation in study designs, sample sizes and proteomic methodologies, the associations between circulating proteins and CMD have been inconclusive.
WHAT THIS STUDY ADDS
The present study systematically assessed the direction and magnitude of the associations between circulating proteins and CMD. Meta-analyses of observational and Mendelian randomisation studies identified 560 and 245 CMD-associated proteins, respectively. Out of 291 tier 1 or 2 protein-disease pairs, 288 showed no evidence in drug development databases, and 66 proteins were recognised as drug targets approved for other indications. Furthermore, integration of protein biomarkers into traditional clinical models modestly enhanced the prediction of incident CMD.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
This systematic review and meta-analysis provides a thorough evaluation of the current evidence on the role of circulating proteins in CMD. By integrating proteomics and genomics, the approach we adopted can inform the selection of protein biomarkers to improve risk stratification of CMD. Additionally, this method can be used in the early stages of drug discovery to identify promising targets, and it can be integrated with traditional approaches to improve the assessment of drug repurposing opportunities.
Introduction
Cardiometabolic diseases (CMDs) are the leading cause of death and disability globally.1 Proteins play critical roles in the biological processes involved in CMD and constitute effective drug targets.2 Proteomics informs the holistic and comprehensive understanding of molecular and cellular mechanisms underlying the pathogenesis of diseases.3 In recent years, high-throughput proteomic assays have flourished and proteomics has been widely used in large-scale, population-based studies.
Mendelian randomisation (MR) exploits the random allocation of alleles during meiosis, using genetic variants specifically related to a particular exposure to examine the causal effect of that exposure on disease.4 Unlike observational studies, MR studies seek to establish whether specific proteins are causally related to CMD risk or represent downstream markers of CMD-related processes. By providing evidence of causation, MR has the potential to accelerate genetics-guided drug discovery.5
In the past decade, hundreds of observational and MR studies have examined the associations between circulating proteins and CMD. However, the study design, sample size and proteomic assays varied across studies. Therefore, we undertook a systematic review and meta-analysis to assess the direction and magnitude of the associations between circulating proteins and CMD, so as to provide clues and references for research on potential biological mechanisms and drug targets of CMD.
Materials and methods
Literature search, study selection and data extraction
We systematically searched PubMed, Embase and Web of Science from inception to 11 July 2023. We included prospective observational and MR studies that investigated the associations between circulating proteins and CMD, including coronary heart disease (CHD, I20–I25), stroke (I60–I69), type 2 diabetes (T2D, E11), heart failure (HF, I50), atrial fibrillation (AF, I48) and atherosclerosis. When two or more studies reported data from the same cohort or consortium, only the study with the largest number of participants was included. We excluded (1) studies that used postmortem blood, tissue or urine samples; (2) studies that recruited patients with non-CMD (eg, dementia, arthritis) at baseline; and (3) in vivo and in vitro studies. We also manually searched the reference lists of the retrieved review articles to identify other studies. No language restriction was imposed, and all included studies were in English. Studies were evaluated against the inclusion and exclusion criteria by two independent researchers, and any disagreement was resolved through discussion with a third researcher. Data from included studies were extracted into predefined tables by two researchers independently. The study protocol was registered with PROSPERO (registration number: CRD42022350327). Detailed search strategies and data extraction procedures are provided in online supplemental eMethods.
Quality assessment
Quality assessment was conducted for observational and MR studies separately. For observational studies, the quality of each study was assessed according to the Newcastle-Ottawa Scale (NOS) by two reviewers independently. NOS covered three domains: subject representativeness, comparability of subjects and ascertainment of risk. The length of follow-up was set at a minimum of 5 years to be considered as adequate. NOS scores were categorised as high quality (seven to nine stars), moderate quality (four to six stars) and low quality (zero to three stars).
Statistical analysis
Before meta-analyses, we standardised names of proteins according to Unified Protein Database (online supplemental table S1).6 Meta-analyses were performed separately for observational and MR studies and for each specific disease outcome. Protein-disease pairs with ≥2 non-overlapping reports were included. Relative risk (RR) estimates for observational studies and OR estimates for MR studies per one SD increase in protein levels were pooled using random effects models via R package ‘metafor’. Between-study heterogeneity was assessed using the I² statistic.
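As a minimal sketch of the pooling step, the Python snippet below computes a random-effects estimate and I² on the log scale. The analysis itself was run with the R package ‘metafor’; the choice of the DerSimonian-Laird estimator, the function name `pool_random_effects` and the input numbers are illustrative assumptions, not the study's code or data.

```python
import math

def pool_random_effects(estimates, ci_lows, ci_highs):
    """Pool ratio estimates (RR or OR per 1 SD) with a DerSimonian-Laird model.

    Inputs are on the ratio scale with 95% CIs; pooling is done on the log scale.
    Returns (pooled_ratio, ci_low, ci_high, i2_percent).
    """
    z = 1.959964  # 97.5th percentile of the standard normal
    y = [math.log(e) for e in estimates]
    # Standard error recovered from the width of the 95% CI on the log scale
    se = [(math.log(hi) - math.log(lo)) / (2 * z)
          for lo, hi in zip(ci_lows, ci_highs)]
    w = [1 / s**2 for s in se]  # inverse-variance (fixed-effect) weights

    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    w_re = [1 / (s**2 + tau2) for s in se]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            i2)

# Hypothetical estimates for one protein-disease pair from three cohorts
rr, lo, hi, i2 = pool_random_effects(
    [1.20, 1.35, 1.10], [1.05, 1.10, 0.95], [1.37, 1.66, 1.27]
)
```

The standard errors are back-calculated from the reported confidence limits, which is how published estimates are usually brought into a meta-analysis when raw data are unavailable.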
For MR studies, we restricted analyses to European populations because 91.4% of studies were conducted in Europeans, and we excluded proteins instrumented by trans-protein quantitative trait loci (pQTL) to reduce horizontal pleiotropy. We graded the evidence for proteins in MR studies (figure 1). Significant protein-disease pairs reported by only one MR study were also graded and presented for their insights into causality. For proteins with tier 1 or 2 MR evidence (ie, top levels of certainty), we conducted pathway enrichment analysis7 and evaluated the druggability8 (online supplemental eMethods).
To enhance the credibility and interpretability of our results, we compared findings from observational and MR studies. A protein-disease pair was considered consistent if: (1) the association was significant in observational meta-analysis and graded as tier 1 or 2 on MR evidence, and (2) the directions of effect estimates were concordant in both analyses.
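The two criteria can be expressed as a short check. The sketch below is illustrative only: the function name `is_consistent` and its parameters are hypothetical, and it assumes both estimates are on the ratio scale (RR or OR), so that direction is given by the sign of the log estimate. The example values are the AF estimates for SPON1 and FBLN3 reported in the Results.

```python
import math

def is_consistent(obs_rr, obs_significant, mr_or, mr_tier_1_or_2):
    """Return True when a protein-disease pair meets both stated criteria.

    (1) significant in the observational meta-analysis and tier 1 or 2 on MR evidence;
    (2) same direction of effect in both analyses (same sign on the log scale).
    """
    supported = bool(obs_significant and mr_tier_1_or_2)
    same_direction = math.log(obs_rr) * math.log(mr_or) > 0
    return supported and same_direction

# SPON1 and AF: RR 1.37 (observational) vs OR 1.08 (MR), both above 1
spon1 = is_consistent(1.37, True, 1.08, True)
# FBLN3 and AF: RR 1.80 (observational) vs OR 0.94 (MR), opposite directions
fbln3 = is_consistent(1.80, True, 0.94, True)
```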
A two-sided p value <0.05 was considered statistically significant. A Benjamini-Hochberg false discovery rate <5% was used to account for multiple comparisons. All statistical analyses were conducted using R V.4.2.2.
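For readers unfamiliar with the Benjamini-Hochberg procedure, a minimal Python sketch of the step-up rule follows. The function name and the example p values are hypothetical; the study's own implementation in R is not described in the text.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the test is declared significant.

    Step-up rule: find the largest rank k (1-based, ascending p values) with
    p_(k) <= k / m * fdr, then reject all hypotheses ranked 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Hypothetical p values for four protein-disease pairs
flags = benjamini_hochberg([0.02, 0.03, 0.035, 0.9])
```

In this example the two smallest p values exceed their own rank thresholds (0.0125 and 0.025), but the third meets its threshold (0.0375), so all three are declared significant; this is the behaviour that distinguishes the step-up rule from a per-test cut-off.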
Patient and public involvement in research
We report no patient or public involvement in the design or implementation of the study.
Results
An overview of the analytical approaches and key findings is presented in figure 1. The literature search generated 14 932 records, and 85 studies were included in the final analysis, comprising 50 prospective observational studies, 25 MR studies and 10 studies performing both observational and MR analyses (online supplemental figure S1). The characteristics of included studies are summarised in online supplemental table S2. For a full reference list, see online supplemental file 1.
Observational associations between proteins and CMD
A total of 60 studies examined the associations between proteins and incident CMDs, reporting results for 3788 protein-disease pairs. Of these, 2318 pairs with two or more reports were included in meta-analysis, and the associations of 748 pairs remained significant (figures 2 and 3A). The number of proteins included in each stage is summarised by disease in figure 2. Among all stroke subtypes, only incident ischaemic stroke (IS) was investigated and included in meta-analysis. Moderate heterogeneity was observed for observational pooled results, and 45.8% of pairs had I²≥80%. Detailed disease-specific effect estimates from the meta-analysis are summarised in online supplemental tables S3–S8.
In our pooled results, 133 proteins were associated with risk of two or more CMDs, referred to as ‘pleiotropic proteins’ (figure 4). These included 94 proteins associated with 2 diseases, 27 proteins with 3 diseases, 9 proteins (FABP4, IBP2, IL6, MMP12, ANFB, TNR1B, TR10B, UPAR, HGF) with 4 diseases and 3 proteins (GDF15, HAVR1, MMP7) with 5 diseases. The directions and strengths of associations between a single protein and different diseases varied. Of the 133 pleiotropic proteins, 111 showed directionally concordant associations with all disease types, including positive associations for 83 proteins and inverse associations for 28 proteins. In contrast, 22 proteins showed opposite associations with different diseases (ie, positive associations with some and inverse associations with the others).
Genetic associations between proteins and CMD
The evaluation of MR evidence included 35 studies assessing circulating proteins as possible causal biomarkers for CMDs, with 10 531 protein-disease pairs reported and 1614 pairs eligible for meta-analysis. Unlike the observational studies, the MR studies investigated genetic associations of proteins with total stroke and six stroke subtypes: IS, large artery stroke (LAS), cardioembolic stroke (CES), small vessel stroke, haemorrhagic stroke (HS) and subarachnoid haemorrhage. The certainty of evidence derived from MR studies was divided into four tiers, and 245 proteins were graded as tier 1 or tier 2 (figures 1 and 2 and online supplemental figure S2). Moderate heterogeneity was observed for MR pooled results, and 14.2% of pairs had I²≥80%. Detailed effect estimates for each disease are summarised in online supplemental tables S9–S21.
When comparing the observational and genetic associations within the same study, 39 of 246 protein-disease pairs (15.8%) showed consistent results (online supplemental table S22). Of 1731 protein-disease pairs investigated in both observational and MR pooled analyses, only 22 pairs showed directionally consistent associations (ie, significant observational associations and tier 1–2 MR evidence; figure 2).
Of the 35 proteins significant in the meta-analysis of observational studies for CHD, only MMP12 was a tier 1 or 2 target in MR studies (figure 2), but the directions of association differed between observational studies (OR 1.29; 95% CI 1.09 to 1.52; p=0.003) and MR studies (OR 0.97; 95% CI 0.94 to 1.00; p=0.022).
Within the set of 31 proteins exhibiting significance in observational results for IS, ADML and MMP12 were also identified as tier 1 or 2 targets (figure 2). ADML and MMP12 were associated with higher risk of IS in observational meta-analysis, while both of them were associated with lower risk of IS in MR studies (figure 3).
Among the 323 proteins found significant in observational studies for T2D, 15 proteins belonged to tier 1 or 2 targets (figure 2). Nine proteins showed directionally consistent associations with risk of T2D between observational and MR studies, and the remaining six proteins showed opposite associations (figure 3).
In the set of 286 HF-associated proteins identified in the meta-analysis of observational studies, the MR evidence of 27 proteins was graded as tier 1 or 2 (figure 2). The results of 12 proteins were directionally consistent in observational and MR analyses, and the results of 15 proteins were directionally opposite (figure 3).
There were 57 proteins significant in the observational results for AF, among which only three proteins were classified with tier 1 or 2 MR evidence (figure 2). SPON1 was directionally consistent (RR 1.37; 95% CI 1.11 to 1.69 in observational studies vs OR 1.08; 95% CI 1.02 to 1.15 in MR studies); the remaining two were directionally inconsistent, namely FBLN3 (RR 1.80; 95% CI 1.50 to 2.17 in observational studies vs OR 0.94; 95% CI 0.90 to 0.97 in MR studies) and LEP (RR 0.90; 95% CI 0.81 to 1.00 in observational studies vs OR 1.14; 95% CI 1.00 to 1.29 in MR studies).
Of the 16 proteins significantly associated with atherosclerosis in observational studies, only ANFB was considered as tier 1 or 2 in MR pooled results (figure 2), which was inversely associated with risk of atherosclerosis in MR studies (β, −0.006; 95% CI −0.009 to −0.003; p=4.40×10⁻⁵), but showed positive association in observational pooled results (β, 0.006; 95% CI 0.001 to 0.010; p=0.014).
Combining the associations between a single protein and various CMDs, we identified 23 tier 1 and 2 proteins associated with risk of two or more CMDs, referred to as ‘pleiotropic proteins’ (online supplemental figure S2). These included 14 proteins associated with 2 diseases, 3 proteins with 3 diseases, 3 proteins (TMPS5, TNF12, TNR5) with 4 diseases, 1 protein (LPA) with 6 diseases and 2 proteins (IL6RA and MMP12) associated with 7 diseases. The directions and strengths of associations between a single protein and different diseases varied. Of these 23 proteins, 18 showed directionally concordant associations with all disease types, including positive associations for eight proteins (LPA, BGAT, FGF5, HSPB1, I15RA, MMP3, NELL1, TMPS5) and inverse associations for 10 proteins (CATD, DHPR, ERAP1, FCG2A, IL6RA, MMP12, QSOX2, SCAR5, TFPI1, TNR5). In contrast, five proteins showed directionally opposite associations with different diseases (CFAI, IL1R2, MANBA, SPON1, TNF12).
Quality assessment
For observational studies, 54 studies (90.0%) were graded as high quality, and the remaining six studies with an NOS score of ≤6 were considered as moderate quality (online supplemental table S23). Online supplemental table S24 summarises the validation of three assumptions by each MR study. Assumption 1 was validated in eight studies (22.3%), and assumptions 2 and 3 were verified in 12 studies (34.3%). All three assumptions were validated in six studies (17.7%). To reduce bias due to pleiotropy, 29 studies (82.9%) restricted instrumental variables to cis-pQTLs, and 19 studies (54.3%) employed MR-Egger regression as sensitivity analysis, of which 14 studies (73.7%) reported no significant signs of horizontal pleiotropy (online supplemental table S24).
Risk prediction models including proteins
29 studies constructed risk prediction models for incident CMD including proteins and compared that with clinical risk models (online supplemental table S25). The number of proteins included in the model ranged from 1 to 291 (median 6, IQR 1–20). Although protein models showed better discrimination over the clinical risk model, the majority had limited improvement (figure 5). 13 out of 79 models (16.5%) improved the C-index by ≥0.10 and 32 models (40.5%) reported significant improvement. There were 28 models (35.4%) with proteins reaching a C-index ≥0.8, half of which had a base model without proteins with a C-index <0.8. The most commonly included proteins were ANFB (19 models) and IBP2 (9 models). Disease outcomes with C-index improvement ≥0.10 were atherosclerotic cardiovascular disease (ASCVD) (n=2), T2D (n=1) and HF (n=10). In the top two models that improved the C-index for predicting HF (difference of C-index≈0.19), both included PRELP, LEG9, NEMO and UPAR.
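To make the ΔC-index comparison concrete, the sketch below computes a concordance index for a clinical model and for a clinical-plus-protein model on the same participants and takes the difference. The text does not state which C-index variant the included studies used; Harrell's C for right-censored data is assumed here, and the function name, follow-up times and risk scores are hypothetical.

```python
def c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simple O(n^2) version).

    A pair (i, j) is comparable when participant i had the event strictly before
    participant j's follow-up time. It is concordant when i also has the higher
    predicted risk; ties in predicted risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical follow-up times (years), event indicators and predicted risks
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
clinical_only = [0.6, 0.2, 0.5, 0.1]
clinical_plus_proteins = [0.9, 0.7, 0.5, 0.1]

c_base = c_index(times, events, clinical_only)
c_protein = c_index(times, events, clinical_plus_proteins)
delta_c = c_protein - c_base
```

In this toy data set the clinical model ranks four of the five comparable pairs correctly and the protein model ranks all five correctly, so the difference is 0.2; the improvements reported by the included studies were mostly far smaller.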
Evaluation of druggability and clinical development activity
Of the proteins identified as tier 1 or 2, there were 102 (41.6%) established drug targets in the database (online supplemental table S26). These included three target-indication pairs (CP3A4-HF, LYAM2-HF and PLMN-HF) that had already been approved as treatments. There were no reports of drug targets or drug development for the remaining 288 protein-disease pairs. Additionally, a total of 66 tier 1 or 2 proteins were targets of licensed drugs for indications different from the diseases implicated by our MR pooled results (online supplemental figure S3).
Functional annotation and enrichment analysis
Of the proteins identified as tier 1 or tier 2, there were 4, 8, 19, 85 and 12 GO biological processes identified for CHD, IS, T2D, HF and AF (p<0.05), respectively, and 9 processes were related to ≥2 diseases (online supplemental table S27). The top 20 GO biological processes (ie, terms with the lowest p values) are shown in figure 6A. There were 5, 2, 6, 11 and 2 KEGG pathways identified for CHD, IS, T2D, HF and AF (p<0.05), respectively, and 4 pathways were related to ≥2 diseases (online supplemental table S28, figure 6B).
Discussion
In the current study, 560 proteins were observationally associated with CMD (including 133 proteins associated with ≥2 CMD subtypes), while 245 proteins showed genetic associations with CMD (including 23 proteins showing pleiotropic effects). 22 protein-disease pairs showed directionally consistent associations in observational and MR pooled estimates. 288 tier 1 or tier 2 protein-disease pairs were not reported for drug development and 66 proteins were drug targets approved for other indications, providing new possible targets for drug development and repurposing opportunities for existing drug targets. Addition of proteins to a clinical factor model modestly improved risk prediction for incident CMD.
Several proteins showed consistent associations with diseases in both observational and MR analyses (eg, TNF12), whereas some yielded inconsistent results (eg, MMP12). TNF12 (also known as TNFSF12) was inversely associated with CES and AF in MR studies, and with HF in observational studies. TNF12 has been investigated in phase I and II trials for lupus nephritis, rheumatoid arthritis and neoplasms, but not for CMD.8 MMP12 plays an important role in maintaining vein wall structure and function.9 The pooled MR results revealed that MMP12 was associated with lower risk of CHD, IS, LAS, CES, T2D and HF, but a higher risk of HS. In contrast, observational meta-analysis found MMP12 positively associated with risk of CHD, HF and atherosclerosis. As a therapeutic target, lithostat, an MMP12 inhibitor, is used to treat urea splitting bacterial infections of the urinary tract,8 and two other MMP12 inhibitors (ie, neovastat, marimastat) did not improve cancer survival in phase III trials.10 11 No CMD-related drug development for MMP12 was found. The heterogeneity between observational and MR studies might be partly explained by confounding and reverse causality in observational studies12 and the validity of MR assumptions.13
Our findings suggested that a targeted proteomics panel might improve CMD risk prediction. 32 of 79 models included in this study showed better performance of the protein model over the conventional clinical model. However, only 13 of these models improved the C-index by ≥0.10. Previous studies also indicated that protein risk scores14 and polygenic risk scores15 (also applying omics data to CMD risk prediction models) both provided statistically significant but modest improvement in discrimination. The protein risk score increased the C-index by 0.014 for ASCVD prediction,14 while the polygenic risk score improved the C-index by 0.024 for CHD prediction.15 Due to the relatively high economic cost of high-throughput proteomic tests and the heterogeneity of proteins proposed by different studies, it remained to be cautiously determined whether protein biomarkers are clinically useful to screen for future CMD.16
The current study identified 288 protein-disease pairs as putative causal biomarkers without evidence of drug development, representing potential new therapeutic targets for CMD. This study also identified putative repurposing opportunities for some existing drugs. For example, PAR1 protein is targeted by vorapaxar,17 which is used to reduce thrombotic cardiovascular events in patients with a history of myocardial infarction or peripheral arterial disease.18 Our study found both observational and genetic evidence that lower circulating levels of PAR1 are associated with reduced HF risk, implying a repurposing opportunity for vorapaxar in HF prevention.
This study also had several limitations. First, most included studies used high-throughput, targeted proteomic platforms covering 85–5000 proteins, with varying protein content. These platforms selected proteins related to cardiometabolic health and other factors based on hypotheses drawn from previous research. However, it is possible that some unmeasured proteins may still be associated with CMDs. Second, we did not perform subgroup analyses by regions or race/ethnicity as the number of studies in non-European populations was limited. Third, participants included in the meta-analysis were mostly Europeans, and the results might not be directly generalisable to populations with different ethnic/racial backgrounds. Fourth, although our results indicated potential causal roles of certain proteins in CMD, the restriction of biological samples to blood specimens indicated that the results did not specifically address in which tissue the effects may be mediated. Fifth, horizontal pleiotropy (ie, instrumental variables additionally influence the outcome through pathways that bypass the exposure19) is a major consideration and limitation to MR studies. We conducted a quality assessment of MR studies, focusing on addressing horizontal pleiotropy, and only included cis-loci in the meta-analysis to minimise potential bias. Lastly, although adherence to Hardy-Weinberg equilibrium was used to control potential genotyping errors in each MR study included, establishing causality is challenging due to the heterogeneity in study designs and proteomics coverage. Therefore, we employed several approaches to identify significant proteins and enhance results credibility, including meta-analysis, MR evidence grading and druggability evaluation.
In conclusion, this study comprehensively integrated evidence on the observational and genetic associations of proteins with CMD, revealing the important roles of circulating proteins in CMD. The identification of novel protein biomarkers offered promising targets for drug development and risk stratification. These findings enhanced our understanding of CMD aetiology and highlighted the potential of circulating proteins as biomarkers and therapeutic targets, paving the way for future research and clinical applications.
Ethics statements
Patient consent for publication
Ethics approval
Not applicable.
References
Footnotes
TW and YK contributed equally.
Contributors YP, TW and YK conceived and designed the research. TW and YK collected, analysed and interpreted the data. TW and YK drafted the manuscript. YP revised the paper. YL, ZW, JL, CY, DS, PY, CK, ZC and LL contributed to significant amendments to the final manuscript. YP is responsible for the overall content as the guarantor. All authors read and approved the final manuscript.
Funding This work was supported by the National Natural Science Foundation of China (82304223, 82192904, 82192901, 82192900).
Disclaimer The funders had no role in the study design, data collection, data analysis and interpretation, writing of the report, or the decision to submit the article for publication.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.