Microarray analysis: a novel research tool for cardiovascular scientists and physicians
- 1 Departments of Medicine and Clinical Pathology, University of Naples, Italy
- 2 Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- 3 Department of Pharmacological Sciences, University of Salerno, Italy
- Correspondence to:
Dr Claudio Napoli, Department of Medicine, UN, PO Box Naples 80131, Italy;
- Accepted 31 December 2002
The massive increase in information on the human DNA sequence and the development of new technologies will have a profound impact on the diagnosis and treatment of cardiovascular diseases. The microarray is a microhybridisation based assay. The filter, called a microchip or chip, is a special kind of membrane on which several thousand oligonucleotides or cDNA fragments coding for known genes or expressed sequence tags are spotted. The resulting hybridisation signal on the chip is read by a fluorescence scanner and processed with a software package that uses the oligonucleotide or cDNA map of the chip to generate a list of relative gene expression values. Microarray technology can be used for many different purposes, most prominently to measure differential gene expression, to detect variations in gene sequence (by analysing the genome of mutant phenotypes), or, more recently, to map the binding sites of transcription factors across the genome. Microarray measurements of gene expression have the advantage of drawing on all available sequence information, both for experimental design and for data interpretation in pursuit of biological understanding. This research tool will contribute to a radical change in our understanding of cardiovascular diseases.
- CV, coefficient of variation
- IFNγ, interferon γ
- IGF-1, insulin-like growth factor 1
- KLF-2, Kruppel-like factor 2
- LAR, log average ratio
- RT-PCR, reverse transcriptase polymerase chain reaction
The collection of genes that are expressed or transcribed from genomic DNA is sometimes referred to as the expression profile or “transcriptome”. In terms of understanding the functions of genes, knowing when, where, and to what extent a gene is expressed is central to investigating the activity and biological roles of its encoded protein. In addition, changes in multigene patterns of expression can provide clues about regulatory mechanisms and broader biological functions. This knowledge may improve basic understanding of the causes and consequences of disease, of how drugs and drug candidates work in cells and organisms, and of which gene products might have therapeutic uses or be appropriate targets for therapeutic intervention. Furthermore, profiling genomic transcripts may also assist in the identification of diagnostic indicators and prognostic markers that might direct individualised clinical management. Nevertheless, it is important to emphasise that no new approach replaces conventional methods. Indeed, standard methods such as northern blots, western blots, and reverse transcriptase polymerase chain reaction (RT-PCR) are simply used in a more targeted fashion to complement the broader measurements and to follow up on the genes, pathways, and mechanisms identified by new techniques such as microarrays. Here we illustrate the scientific rationale of the microarray technique and its implications for cardiovascular medicine on the basis of previously published work using this investigational approach.
The microarray is a new microhybridisation based assay that can represent as many as 20 000 genes on a single chip, allows thousands of different genes to be hybridised in parallel, and thus provides unprecedented resolution for relatively unbiased genetic profiling. As in conventional hybridisation assays, it involves a microfilter or “chip” made of a porous membrane or of materials such as glass, plastic, silicon, gold, or gel,1,2 on which either oligonucleotides or cDNA fragments are spotted or synthesised at high density (for example, 10 000 per cm²). Probes for the microarray can be complementary DNA (cDNA), RNA, genomic DNA, or plasmid libraries, which are appropriately labelled and hybridised to the chip.3 The resulting hybridisation signals are measured by radioactive or fluorescence detection. The result is an image, obtained by a fluorescence scanner or phosphorimager, that can be processed with computer software to generate a spreadsheet of gene expression values (fig 1). Applying multiple types of statistical analysis to microarray data allows genes to be classified and clustered according to whether they are upregulated or downregulated. In principle, expression values can then be linked to the genes they represent to define lists of genes of interest.4,5 Furthermore, microarrays can also be used to analyse genomic DNA rather than mRNA, to characterise the interactions of proteins,6,7 and potentially to characterise the interactions of other types of molecule with double stranded DNA.8
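The step from scanned image to a spreadsheet of expression values can be illustrated with a minimal Python sketch. It assumes a two-colour (red/green) fluorescence array in which each spot has already been quantified into foreground and local background intensities; the field names, the floor value, and the twofold cut-off are illustrative assumptions, not the behaviour of any particular software package.

```python
import math


def log_ratios(spots, min_signal=1.0):
    """Convert two-channel spot intensities into log2(red/green) ratios.

    Each spot is a dict with hypothetical keys 'gene', 'red', 'green',
    'red_bg' and 'green_bg' (foreground and local background intensities).
    """
    table = {}
    for s in spots:
        # Background subtraction, floored at min_signal so the log is defined
        red = max(s["red"] - s["red_bg"], min_signal)
        green = max(s["green"] - s["green_bg"], min_signal)
        table[s["gene"]] = math.log2(red / green)
    return table


def classify(table, fold=2.0):
    """Label each gene 'up', 'down' or 'unchanged' using a fold-change cut-off."""
    cut = math.log2(fold)
    labels = {}
    for gene, lr in table.items():
        if lr >= cut:
            labels[gene] = "up"
        elif lr <= -cut:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Invented intensities for three hypothetical genes
spots = [
    {"gene": "geneA", "red": 2100, "green": 600, "red_bg": 100, "green_bg": 100},
    {"gene": "geneB", "red": 350, "green": 1300, "red_bg": 50, "green_bg": 100},
    {"gene": "geneC", "red": 900, "green": 800, "red_bg": 100, "green_bg": 100},
]
print(classify(log_ratios(spots)))
# {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```

Working on a log2 scale makes up- and downregulation symmetric: a fourfold increase gives +2 and a fourfold decrease gives −2.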
Preparation of the microarray chip
One of the most important applications of arrays so far is the monitoring of gene expression (mRNA abundance). The most significant obstacle to extracting reproducible data from experiments is the quality of the chip. One difficulty is spotting oligonucleotides or cDNAs at high density and with high efficiency, and several strategies have been developed for this purpose. The two groundbreaking technologies were photolithographic masks for the light directed synthesis of oligonucleotides (pioneered by Affymetrix, Inc, Santa Clara, California, USA)9 and the robotic deposition of cDNAs introduced by the Brown laboratory.2 Rosetta Inpharmatics (Kirkland, Washington, USA) developed an approach for generating flexible oligonucleotide arrays based on inkjet technology that allows the synthesis of more than 25 000 different sequences on a single 2.5 × 7.5 cm glass slide.10 New techniques are also emerging that will give users the flexibility to create custom arrays. For example, Singh-Gasson and colleagues described the use of digitally controlled micromirrors instead of photolithographic masks for the light directed synthesis of oligonucleotide arrays, to increase the density of spotting.11
Data analysis: many options
Once data are collected, appropriate analyses have to be applied. Outputs can be displayed as lists, but are more typically visualised with some variation of the red/green colour display originally introduced by Eisen and colleagues.12 Many computational tools and packages are available for the analysis of microarray data (some examples are listed in table 1; also reviewed by Stoeckert and colleagues13). Packages typically include spot quantification, data storage and retrieval, and higher level analysis. Data processing includes background subtraction, normalisation, and detection of outliers. Statistical analysis is important to determine whether there are reliable, biologically relevant differences in expression level, and is usually done using either clustering algorithms or conventional statistical tests. A selection of published clustering techniques for expression profile data analysis7,12,14–23 is listed in table 2 (also reviewed by Bittner and colleagues24).
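As an illustration of the clustering option, the following minimal sketch groups genes by average-linkage hierarchical clustering, using 1 minus the Pearson correlation between expression profiles as the distance, in the spirit of the approach of Eisen and colleagues.12 It is a naive implementation written for clarity rather than speed, and the function names and input format (a dictionary of log ratio profiles across conditions) are hypothetical.

```python
import statistics


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile is treated as uncorrelated
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of genes with average linkage.

    profiles: {gene: [log ratios across conditions]}.
    Merges the two closest clusters until n_clusters remain and returns
    the clusters as sorted lists of gene names.
    """
    genes = list(profiles)
    dist = {(g, h): 1.0 - pearson(profiles[g], profiles[h])
            for g in genes for h in genes}
    clusters = [[g] for g in genes]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean of all pairwise gene-to-gene distances
                d = statistics.fmean(dist[(g, h)]
                                     for g in clusters[i] for h in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Invented profiles: g1/g2 rise across four conditions, g3/g4 fall
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.9, 3.1, 1.8, 1.2],
}
print(average_linkage(profiles, 2))  # [['g1', 'g2'], ['g3', 'g4']]
```

Because the distance is based on correlation, genes with the same shape of response cluster together even when their absolute log ratios differ.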
The difficulties of data analysis derive from the myriad potential sources of random and systematic measurement error in the microarray process and from the small number of samples relative to the large number of variables (probes). These issues complicate data analysis and interpretation and jeopardise the validity of many of the microarray findings reported to date.25 For example, statistically significant differences in gene expression may reflect the biasing effects of extraneous factors rather than the biology. Lack of statistical significance can indicate low experimental sensitivity rather than absence of biological effect, and low sensitivity can be caused by an inadequate number of replicates or by failure to control extraneous factors that contribute to random error. When analysing gene expression data it is also necessary to evaluate measurement error. Random error is minimised by repeating the measurements (replicates). Systematic error (bias) is controlled experimentally as far as possible, although additional statistical correction is invariably necessary with current microarray technology. Extraneous factors that contribute to random error or bias of microarray expression values include target accessibility, which is affected by variation in the absorbency of discrete nylon membranes, target fixation on the membrane, and variation in washing procedures.
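Why replicates reduce random error can be shown with a short sketch. It assumes that replicate log ratios for a gene come from independent arrays: the standard error of their mean is the sample standard deviation divided by the square root of the number of replicates, so for a similar spread, quadrupling the replicates roughly halves the standard error. The function name is hypothetical.

```python
import math
import statistics


def replicate_summary(log_ratios):
    """Mean log ratio and its standard error from independent replicates."""
    n = len(log_ratios)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate random error")
    mean = statistics.fmean(log_ratios)
    standard_error = statistics.stdev(log_ratios) / math.sqrt(n)
    return mean, standard_error


# Invented replicate log ratios for one gene: 4 arrays versus 16 arrays
four = [0.8, 1.2, 0.8, 1.2]
sixteen = four * 4
print(replicate_summary(four))
print(replicate_summary(sixteen))  # same mean, smaller standard error
```

Replication addresses random error only; a systematic bias shared by all replicates is not reduced, which is why experimental control and statistical correction of bias are still needed.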
Microarray analysis has already provided valuable insights into cancer biology and has assisted in the identification and classification of gene expression patterns in several types of cancer. Although the magnitude of changes in gene expression and the availability of tissue for testing purposes are often more limited in cardiovascular medicine, rapidly accumulating data support the considerable potential of this technique in exploring the mechanisms of disease and the effect of treatment.
Quality control for hybridisation
Multiple spotting of target DNA on a slide provides a means of assessing the quality of data for a gene on that slide. The coefficient of variation (CV) is defined as the standard deviation divided by the mean of the spot intensity for each gene.26,27 The quality of the expression data for a gene is inversely related to its CV. When the CV for a gene among those used as internal controls is greater than 10%, the data for that gene are considered unreliable. In calibration experiments the same sample is labelled with two different dyes and hybridised to the same slide. The calibration experiment provides a control for possible systematic errors, such as slide effect, dye effect, and gene label effect.26,27 The slide effect is caused by variation in imaging between slides hybridised with the same probe. Factors affecting hybridisation may include the amount of probe DNA immobilised on the slide during array fabrication, the amount of labelled cDNA added to the slide, and the local environment in each hybridisation chamber. Fluorescent dyes differ in their efficiency of incorporation during labelling and are detected by the scanner with different efficiencies.26,27 The effect of these factors is accounted for by the normalisation curve obtained in normalisation experiments, in which two differentially expressed mRNA pools are separately labelled with the two dyes and co-hybridised to the same slide to define the calibration curve.26 Comparison of expression levels across different cDNA microarray experiments is easier when a common reference is co-hybridised to every microarray. Usually this reference consists of one experimental control sample, a pool of cell lines, or a mix of all samples to be analysed. A mix of the products that are printed on the array, amplified to either sense or antisense cRNA, provides an excellent alternative common reference.27
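The CV filter and a simple dye-bias correction can be sketched as follows. The 10% threshold comes from the text. The global median shift is a simplified stand-in, assumed here, for the intensity-dependent normalisation curve described above; it rests on the assumption that most genes on the slide are not differentially expressed, as in a self-self calibration experiment. All function names are hypothetical.

```python
import statistics


def coefficient_of_variation(intensities):
    """CV of replicate spot intensities: sample standard deviation / mean."""
    return statistics.stdev(intensities) / statistics.fmean(intensities)


def unreliable_genes(replicate_spots, max_cv=0.10):
    """Genes whose replicate-spot CV exceeds max_cv (10% by default)."""
    return sorted(gene for gene, spots in replicate_spots.items()
                  if coefficient_of_variation(spots) > max_cv)


def median_centre(log_ratios):
    """Simplified global normalisation: shift log2 ratios so their median is zero.

    Assumes most genes are unchanged, so an overall offset between the two
    dye channels is treated as systematic (dye) bias rather than biology.
    """
    offset = statistics.median(log_ratios.values())
    return {gene: value - offset for gene, value in log_ratios.items()}


# Invented replicate intensities (three spots per gene)
spots = {"control_1": [1000, 1020, 980], "gene_X": [1000, 1500, 700]}
print(unreliable_genes(spots))  # ['gene_X']

# Invented log ratios from a self-self hybridisation with a dye offset
print(median_centre({"a": 0.5, "b": 0.3, "c": -0.1}))
```

A full normalisation curve would replace the single median offset with a value that varies with spot intensity, fitted from the calibration slide.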
Bootstrapping cluster analysis
A general technique for making statistical inferences from clustering tools applied to gene expression microarray data uses an analysis of variance model to achieve normalisation and to estimate differential expression of genes across multiple conditions. Statistical inferences are then based on a resampling technique called bootstrapping (see Kerr and Churchill14). The bootstrapping procedure, which assesses the reliability of a clustering by creating many simulated datasets from the fitted statistical model and clustering each of them, relies on experimental replication and good design in microarray experiments. When accuracy is high, the bootstrap estimates of relative expression will closely resemble the original estimates and the bootstrap clustering will closely resemble the original clustering. Without such an assessment of the clusters, one cannot make valid inferences about genes that show similar behaviour. Whatever algorithm is chosen, it is mandatory to assess whether the results are statistically reliable relative to the noise in the data.
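The bootstrap idea can be illustrated with a much simplified sketch. Rather than the full analysis of variance model of Kerr and Churchill,14 it assumes that the fitted value for each gene and condition is simply the mean of its replicates, pools the residuals, and builds each simulated dataset by adding resampled residuals to the fitted values. Each simulated dataset is reclustered with a basic k-means algorithm started from the original centroids (so that cluster labels stay comparable), and the output is the fraction of bootstrap datasets in which each gene keeps its original cluster. The function names, the choice of k-means, and the default of 200 resamples are assumptions for illustration.

```python
import random
import statistics


def fitted_profiles(data):
    """Assumed model: fitted value per gene and condition = mean of replicates."""
    return {g: [statistics.fmean(reps) for reps in conds] for g, conds in data.items()}


def pooled_residuals(data, fitted):
    """Observed minus fitted values, pooled across all genes and conditions."""
    return [x - fitted[g][c]
            for g, conds in data.items()
            for c, reps in enumerate(conds)
            for x in reps]


def kmeans(profiles, centroids, iterations=10):
    """Basic k-means from given start centroids; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]  # work on a copy
    labels = {}
    for _ in range(iterations):
        # Assign each gene to its nearest centroid (squared Euclidean distance)
        labels = {g: min(range(len(centroids)),
                         key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
                  for g, p in profiles.items()}
        # Move each centroid to the mean profile of its member genes
        for k in range(len(centroids)):
            members = [profiles[g] for g in profiles if labels[g] == k]
            if members:
                centroids[k] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centroids


def bootstrap_stability(data, start_centroids, n_boot=200, seed=0):
    """Fraction of bootstrap datasets in which each gene keeps its original cluster."""
    rng = random.Random(seed)
    fitted = fitted_profiles(data)
    pool = pooled_residuals(data, fitted)
    original, centres = kmeans(fitted, start_centroids)
    agree = {g: 0 for g in data}
    for _ in range(n_boot):
        # Simulated dataset: fitted values plus residuals drawn with replacement
        sim = {g: [[fitted[g][c] + rng.choice(pool) for _ in reps]
                   for c, reps in enumerate(conds)]
               for g, conds in data.items()}
        labels = kmeans(fitted_profiles(sim), centres)[0]
        for g in data:
            agree[g] += labels[g] == original[g]
    return {g: agree[g] / n_boot for g in data}


# Invented data: gene -> conditions -> replicate log ratios
data = {
    "a": [[2.1, 1.9], [2.0, 2.0], [-1.9, -2.1]],
    "b": [[1.8, 2.0], [2.2, 2.0], [-2.0, -2.0]],
    "c": [[-2.0, -2.0], [-1.9, -2.1], [2.1, 1.9]],
    "d": [[-2.2, -1.8], [-2.0, -2.0], [2.0, 2.0]],
}
print(bootstrap_stability(data, [[1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]))
```

Genes with stability well below 1 sit near a cluster boundary relative to the replicate noise, and conclusions about their co-regulation deserve caution.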
CARDIOVASCULAR DISEASES AND THE MICROARRAY TECHNIQUE
Endothelium and shear stress
The microarray technique has been important in investigating cellular mechanisms of vascular physiology and pathophysiology. The DNA microarray approach was used to investigate the gene expression profile of cultured endothelial cells exposed to shear stress for 24 hours.28,29 Similarly, mechanical stimuli were shown to induce changes in the transcription profile of smooth muscle cells.30 Endothelial cells exposed to shear stress for seven days increase the expression of several genes, in particular a unique endothelial transcription factor, Kruppel-like factor 2 (KLF-2).31 A related family member, KLF5, is a target for angiotensin II signalling and an essential regulator of cardiovascular remodelling.32 The biological response of human endothelial cells to Gram negative lipopolysaccharide has also been assessed by microarray.33,34 Through genomic investigation, these studies identified molecules downstream of transforming growth factor β and tumour necrosis factor α signalling and clarified the chemotactic pathways that participate in infection related vascular dysfunction.35,36
Vascular dysfunction, atherogenesis, and pulmonary circulation
Atherosclerosis is a multifactorial and temporally dynamic disease37,38 that can be modulated by a large number of environmental and genetic factors. The microarray technique is therefore particularly suitable for sifting through the spectrum of candidate genes in an attempt to focus on specific culprits and identify novel pathways associated with atherogenesis. For example, it was demonstrated that homocysteine can activate 3-hydroxy-3-methylglutaryl coenzyme A reductase in vascular endothelial cells, thereby promoting atherogenesis.39 In smooth muscle cells, a new gene pathway participating in vascular inflammation was identified that involved the eotaxin gene and its receptor CCR3.40 Chip experiments on high density lipoprotein deficient mice revealed several novel genes involved in oxidative processes, vascular dysfunction, and sterol metabolism.41 Recently, we described a large list of early genes involved in vascular dysfunction and atherogenesis in hypercholesterolaemic mice.42 The microarray has also been used more recently to study vascular function in a murine model of hypertension. In rats carrying quantitative trait loci (QTL) for blood pressure, it has been used to characterise the related phenotypes on every chromosome.43 New information on gene expression profiles in spontaneously hypertensive rats and Wistar-Kyoto rats is now also available.44
The study of the pulmonary circulation has generated much interest owing to unsolved therapeutic issues such as pulmonary hypertension. Recent studies which attempted to correlate gene expression with pulmonary hypertension and tissue remodelling have also used microarrays.45–47
A very recent study identified an upregulation of collagens in a model of flow induced pulmonary vascular remodelling.48 Thus microarray technology will help us to find causal pathogenic mechanisms involved in pulmonary hypertension and vascular remodelling.
Chronic rejection of allografts is a major clinical and therapeutic problem after heart transplantation, and microarrays can identify early genes involved in this phenomenon.49 Microarray studies of cardiac grafts in Norway rats identified a large number of genes involved in ischaemia and reperfusion injury.50 Although the largest numbers of downregulated genes were seen early, the effects of ischaemia and reperfusion may persist, as many transcripts remained downregulated up to seven days after transplantation.50 In a rat cardiac allograft model, interferon (IFN) γ inducible genes were specifically upregulated, suggesting that IFNγ mediated signalling may play an important role in the late phase of acute rejection in vivo.51 However, the fact that IFNγ deficient mice can rapidly reject cardiac allografts suggests an alternative pathway. DNA microarray analysis also revealed a new pattern of mRNA expression in allografts in IFNγ deficient mice, involving a group of specific genes including macrophage inflammatory protein, C-10-like chemokine, and platelet factor 4.52
Finally, in the Brown Norway to Lewis heterotopic heart transplant model, highly purified RNA was isolated from cardiac tissue on postoperative days 3, 5, and 7, hybridised onto microarrays, and analysed by log average ratio (LAR) analysis.53 Of the 8800 transcripts studied, 2864 were increased on day 3, 1418 on day 5, and 2745 on day 7. Downregulated transcripts included many novel molecules such as SC1 and decorin. The results correlated well with RT-PCR, suggesting that LAR provides a useful approach to the analysis of microarray data.
Myocardial infarction, hypertrophy, and heart failure
Cardiac hypertrophy, remodelling, and the progression to heart failure represent prime areas for genomic discovery. Little is known about the differentiation of the cardiomyocytes that are directly involved in cardiac disease. Microarray analysis has identified multiple novel genes and expressed sequence tags involved in the cardiovascular system54 and the genetic programme of cardiomyocyte differentiation.55,56
The programmed cell death of cardiomyocytes is a key event in the development of heart failure and is associated with chronic Akt activation in the heart.57,58 Microarray analysis helped to identify Nix, a protein upregulated in myocardial hypertrophy that is necessary for apoptosis of cardiomyocytes and results in decompensation in the murine model.59 Experimental myocardial infarction in animals has many features in common with human myocardial infarction. Stanton and colleagues found that more than 200 of 4000 genes were differentially expressed in microarray experiments after myocardial infarction in rats,60 and statistical analysis showed patterns of genes expressed in wound healing, cell signalling, and energetics. Sehl and associates also used microarrays to analyse rat myocardial infarction tissue and identified 14 novel mRNAs whose expression changed significantly after infarction.61 The first examination of a human mRNA population from left ventricular tissues of patients with idiopathic dilated cardiomyopathy and from control tissue identified several genes upregulated in affected individuals.62 The gene coding for ventricular myosin light chain type 2 is a potential clinical target.63 Barrans and colleagues constructed the first human cardiovascular cDNA microarray, containing 10 368 genes, and selected 38 genes from failing hearts.64 Using a human cardiovascular based cDNA microarray, a molecular profile of dilated cardiomyopathy was also obtained.65 A more detailed analysis of failing versus non-failing human hearts revealed 19 of 7000 genes that were differentially expressed, including genes involved in the cytoskeleton, protein turnover, and energetics.66 SLIM1 expression is also decreased, and gelsolin expression increased, in failing human hearts.67 These examples of myocardial expression profiling show that myocardial remodelling draws on multiple interactive pathways that are largely undefined.68
Heart failure resulting from dilated cardiomyopathy or hypertrophic cardiomyopathy appears to develop through different remodelling and molecular pathways.69 RNA samples of left ventricular tissue from patients with dilated cardiomyopathy and hypertrophic cardiomyopathy were hybridised on microarrays against normal adult heart. The results showed that more than 100 genes were highly expressed in both dilated cardiomyopathy and hypertrophic cardiomyopathy, and several genes were differentially expressed.69 Thus microarray technology provides a genomic approach to exploring the genetic markers and molecular mechanisms leading to heart failure. Moreover, there is now evidence of divergent transcriptional responses to independent genetic causes of cardiac hypertrophy.70
Similar techniques show great potential for identifying therapeutic targets. For instance, Jin and colleagues evaluated the effects of captopril on myocardial remodelling one day after infarction in the rat,71 and identified 37 differentially expressed genes. This strategy has the potential to reveal novel targets in remodelling that are not currently affected by angiotensin converting enzyme inhibitors.
Cardiovascular inflammation and cardiac metabolism
In a rat model of abdominal aortic aneurysm, microarray analysis revealed that pro-oxidant/antioxidant and inflammatory genes were increased more than twofold.72 In addition, an intervention study in a mouse model showed that an antioxidant (AEOL 10150) attenuated the induction of inflammatory genes by ischaemia and reperfusion.73 Similarly, a novel pathway of vascular inflammation involving overexpression of eotaxin and its receptor CCR3 was identified in human atherosclerosis.42 In another study, messenger RNA from rodent abdominal aortic aneurysm tissue was harvested for cDNA labelling and hybridisation to microarrays.74 Twenty nine genes, including haem oxygenase 1, were differentially expressed in aneurysm tissue (ratio of relative intensity in aneurysm tissue to relative intensity in control tissue > 1.5 or < 0.67). Interestingly, α tocopherol was as effective as flow loading in limiting aneurysm enlargement, suggesting that flow loading may attenuate aneurysm enlargement through wall shear stress or through a reduction in oxidative stress.
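The fold-change rule used in the aneurysm study can be expressed directly. A minimal sketch, assuming normalised relative intensities for each gene in aneurysm and control tissue; the function name, gene names, and values in the usage lines are invented for illustration and are not data from that study.

```python
def differentially_expressed(sample, control, upper=1.5, lower=0.67):
    """Flag genes whose sample/control intensity ratio is > upper or < lower.

    sample, control: dicts mapping gene name -> normalised relative intensity.
    Returns {gene: ("up" or "down", ratio)} for the flagged genes only.
    """
    flagged = {}
    for gene, value in sample.items():
        ratio = value / control[gene]
        if ratio > upper:
            flagged[gene] = ("up", ratio)
        elif ratio < lower:
            flagged[gene] = ("down", ratio)
    return flagged


# Invented relative intensities for three hypothetical genes
aneurysm = {"gene_A": 3.0, "gene_B": 1.1, "gene_C": 0.5}
control = {"gene_A": 1.0, "gene_B": 1.0, "gene_C": 1.0}
print(differentially_expressed(aneurysm, control))
# {'gene_A': ('up', 3.0), 'gene_C': ('down', 0.5)}
```

The lower threshold of 0.67 is approximately 1/1.5, so the two cut-offs are roughly symmetric on a logarithmic scale: a 1.5-fold decrease is treated like a 1.5-fold increase.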
Microarrays can also be useful to explore pathways involved in cardiac metabolism. As insulin and related growth factors can affect a multitude of pathways in cardiovascular disease, microarrays have been used to study the effects of insulin-like growth factor 1 (IGF-1) on the gene expression profile of cardiomyocytes.75 IGF-1 modulated the expression of several functional categories, such as cell cycle, cellular respiration, and mitochondrial function. The results also showed that the majority of IGF-1 regulated genes required the activation of both ERK and PI 3 kinase. Thus PI 3 kinase and ERK coordinately mediate the transcriptional regulatory effects of IGF-1 in cardiac muscle cells. These findings provide a novel insight into how IGF-1 signalling modulates the programming of cardiac muscle gene expression.75 Nicotine can also change the expression pattern of genes involved in energy metabolism and signal transduction. Indeed, microarray analysis demonstrated that rats fed nicotine for three months had altered expression of genes involved in energy metabolism and signal transduction, such as the mitochondrial ATP synthase β subunit, liver mitochondrial aldehyde dehydrogenase 2, metabotropic glutamate receptor 2, and calcium/calmodulin dependent protein kinase II β subunit, when compared with rats fed normally.76
LIMITATIONS OF THE MICROARRAY ANALYSIS
One of the most important problems in microarray experiments is reproducibility in the interpretation of gene expression results. To characterise the complexity of the subcellular changes that take place, and to make more rapid progress in identifying causes and cures of cardiovascular disease, we must consider the emerging high throughput gene profiling technologies. Several reports have already identified classes and clusters of genes whose expression changes during cardiac remodelling, hypertrophy, or heart failure.60,62,77 Collectively, these studies represent a major leap forward in sorting out the different pathways in the heart or in isolated cardiac myocytes in which changes in gene expression, and very probably in protein expression, have occurred. However, it is necessary to consider several important issues pertinent to data analysis and interpretation. To obtain good quality data, it is necessary at the outset to produce high quality labelled RNA or cDNA. Recently, the SMART system for synthesis of a cDNA probe was shown to produce highly reproducible results and to yield gene expression profiles that represent the majority of transcripts detected.78
For the data analysis, several issues need to be considered, the first being the volume of data: storage and manipulation of the data require gigabytes of computer storage and a fast network connection. Second, we need to decide what magnitude of change in gene expression should be considered significant. Liu and colleagues showed how increasing the number of replicates reduces false detection rates of gene expression changes during cardiac remodelling.75 Another approach is to conduct multiple statistical tests and bootstrapping on the same data (see the section on statistical analysis above). In this case, the usual error rates applied to each test are no longer valid. Although some theoretical work along these lines has been done, it has not yet been extended to the microarray problem to any great extent.79 Nevertheless, the number of original studies employing microarray expression profiling, and the availability of computational tools for the analysis of large datasets, increase every year. The power of these studies lies in producing vast quantities of data that represent the changes in the transcription profile of cells or biopsy tissues from selected patients. However, this power is also a limitation of the array technique,30,80,81 as one of the most challenging aspects of gene expression analysis is selecting, from these vast quantities of data, the genes that could have a real causal role. Indeed, the size of the change in the transcriptional level of a gene is not necessarily correlated with the causal role of that gene; sometimes small changes in a key gene can produce a large biological effect. Moreover, few microarray studies address the combinatorial nature of transcription, a well established phenomenon in eukaryotes.82 In addition, changes in gene expression are not invariably associated with changes in protein synthesis, in which case the significance of altered gene expression may be questionable.
Furthermore, apart from altered expression, subcellular translocation and post-translational modification of proteins are also important determinants of their biological effects, but are not disclosed by microarray techniques. Thus microarray techniques may be viewed as guiding tools that point the investigator in the right direction, with more conclusive functional information collected in subsequent complementary studies (fig 2).
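The multiple-testing problem raised above is commonly handled by adjusting significance thresholds across all genes tested. The sketch below, which is not a method described in this review, shows two standard corrections: Bonferroni, which controls the chance of any false positive among all genes, and the Benjamini-Hochberg procedure, which controls the expected proportion of false positives (the false discovery rate) among the genes declared significant. The function names and the p values in the usage lines are invented for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of tests that remain significant after Bonferroni correction."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared significant at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose ordered p value satisfies p_(k) <= (k / m) * q
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank
    # Every test ranked at or below the cutoff is declared significant
    return sorted(order[:cutoff])


# Invented p values for ten genes
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni(p))          # [0]
print(benjamini_hochberg(p))  # [0, 1]
```

With thousands of genes on a chip, Bonferroni becomes very conservative, which is one reason false discovery rate control is often preferred for screening experiments whose hits will be followed up by targeted methods such as RT-PCR.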
Much progress has been made in the technologies available to assess global alterations in mRNA levels in clinical research samples. Although such transcript profiling can provide a powerful research tool, the broad range of options can be bewildering for the inexperienced investigator, and more often than not the limitations and pitfalls of this approach are not fully appreciated. It is important to recognise that the major goals of transcript profiling experiments can be to identify novel therapeutic targets or to delineate complex patterns of gene expression that provide a potentially pathognomonic phenotype. Furthermore, similar technologies are being used to develop antibody based protein microarray assays,83 which can detect and quantify the expression of many proteins simultaneously and require minimal amounts of reagents and samples.84 Such assays are therefore potentially powerful tools to address the need for high throughput analysis techniques, but careful post-analysis follow up and validation of microarray experiments will be needed.85 We have presented a review of the clinical and scientific applications of DNA microarray technology with an emphasis on important limitations and implications for cardiovascular medicine, and have discussed possible standards for contextual validation. A projected decrease in cost and complexity may facilitate increased availability of this technology in the physician’s office, and future applications may then allow clinical diagnosis and treatment strategies to be tailored to individual need.