Background As IT comes to dominate cardiology, the ability of centres to link clinical databases for outcome-based research has increased significantly. Good quality research relies on accurately identifying and characterising the disease of interest in the population. Heart failure (HF) is one such disease that is often challenging to define from datasets. Uniquely, we are able to link the Tayside echocardiography dataset to other regional datasets, including dispensed prescription and hospitalisation data. Echocardiography reports commonly comprise structured, usually numerical, values and a free text component that stores overall conclusions or impressions. We therefore sought to develop a computer algorithm to determine left ventricular (LV) function from the free text and subsequently to validate the ability to define systolic HF based upon left ventricular systolic dysfunction (LVSD) and loop diuretic therapy.
Methods We iteratively developed the algorithm to process the free text component of the reports and determine the degree of LV impairment. The algorithm comprised a lexicon of words and phrases, applied with negation detection. It was repeatedly refined by reprocessing a subset of the data. The final algorithm was then applied to the full dataset and validated, first against blinded manual review of a subset of reports and second by blinded review of the stored images. The data were then linked, using a unique patient identifier, to the dispensed prescribing data to determine loop diuretic use. The specificity of the diagnosis of systolic heart failure was examined by blinded case note review.
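As an illustration of the lexicon-plus-negation approach described in the Methods, the following is a minimal Python sketch and not the study's implementation. The phrases in `LEXICON`, the cue words in `NEGATION_CUES`, the three-word `NEGATION_WINDOW`, and the function name `classify_report` are all assumptions chosen for illustration; the actual lexicon held 488 keywords or phrases that are not reproduced here.

```python
import re

# Hypothetical mini-lexicon mapping report phrases to an LV function grade.
# The study's real lexicon (488 entries) is not given in the abstract.
LEXICON = {
    "mild to moderate lv impairment": "mild to moderate",
    "moderate to severe lv impairment": "moderate to severe",
    "mild lv impairment": "mild",
    "moderate lv impairment": "moderate",
    "severe lv impairment": "severe",
    "normal lv function": "normal",
    "good lv function": "normal",
}

# Assumed negation cues and look-back window (in words); illustrative only.
NEGATION_CUES = {"no", "not", "without"}
NEGATION_WINDOW = 3


def classify_report(text):
    """Return an LV function grade for a free text report, or 'unclassified'."""
    # Normalise: lower-case, strip punctuation, collapse whitespace.
    normalised = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    normalised = re.sub(r"\s+", " ", normalised).strip()

    # Try longer phrases first so that "mild to moderate ..." wins over
    # the shorter "moderate ..." phrase it contains.
    for phrase in sorted(LEXICON, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, normalised):
            # Look at the few words immediately before the match.
            preceding = normalised[:match.start()].split()[-NEGATION_WINDOW:]
            if NEGATION_CUES.intersection(preceding):
                continue  # negated mention, ignore it
            return LEXICON[phrase]
    return "unclassified"


if __name__ == "__main__":
    print(classify_report("Mild to moderate LV impairment with dilated atria."))
    print(classify_report("No evidence of severe LV impairment."))
```

In this sketch a negated mention is simply skipped, so a report whose only matching phrase is negated falls through to `unclassified`; a fuller implementation would need to decide how such reports should be graded.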
Results The database contained 153 836 reports on 63 309 individuals. The lexicon comprised 488 keywords or phrases. When applied to the data, 145 525 reports were classified (94.4%), while 8584 remained unclassified. Of the unclassified reports, 5969 (70%) contained no information in the free text fields; the remainder contained either insufficient data on left ventricular function or spelling or typographical errors severe enough to prevent matching. In total, 19 758 were classified as having LVSD (5378 (27%) mild, 818 (4%) mild to moderate, 4646 (24%) moderate, 583 (3%) moderate to severe and 8333 (42%) severe). Validation of 1000 reports, reviewed for the presence or absence of LVSD, found concurrence with the algorithm in 980 (98%) cases. Blinded review of the stored movies and images revealed 90% concordance for the presence or absence of LVSD. Record linkage with the dispensed prescription dataset identified 9875 individuals with LVSD who also received loop diuretic therapy. Validation by case note review demonstrated 91% concordance with a clinical diagnosis of systolic HF.
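The record linkage step can be pictured as a join on the patient identifier between graded echocardiography reports and dispensed loop diuretic prescriptions. The Python sketch below is illustrative only: the function name `systolic_hf_candidates`, the `(patient_id, grade)` tuple layout, and the example identifiers are assumptions, not the structure of the Tayside datasets.

```python
# LVSD grades as reported in the Results; any of these counts as LVSD.
LVSD_GRADES = {
    "mild",
    "mild to moderate",
    "moderate",
    "moderate to severe",
    "severe",
}


def systolic_hf_candidates(echo_grades, loop_diuretic_ids):
    """Return patient identifiers with LVSD on any report AND a loop diuretic.

    echo_grades: iterable of (patient_id, grade) pairs, one per report
                 (hypothetical layout).
    loop_diuretic_ids: iterable of patient identifiers with at least one
                 dispensed loop diuretic prescription (hypothetical layout).
    """
    # Patients with at least one report graded as LVSD.
    lvsd_patients = {pid for pid, grade in echo_grades if grade in LVSD_GRADES}
    # Linkage: keep only those who also appear in the prescribing data.
    return lvsd_patients & set(loop_diuretic_ids)


if __name__ == "__main__":
    reports = [("A", "severe"), ("B", "normal"), ("C", "mild"), ("A", "mild")]
    dispensed = ["A", "B"]
    print(systolic_hf_candidates(reports, dispensed))
```

Using set intersection means each individual is counted once regardless of how many reports or prescriptions they have, which matches the per-individual count reported above.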
Conclusion A computer algorithm can quickly and accurately identify the degree of LVSD from the free text component of an echocardiogram report, and the presence of LVSD combined with loop diuretic use is specific for a diagnosis of systolic heart failure.
- Heart failure