Diagnostic accuracy of handheld electrocardiogram devices in detecting atrial fibrillation in adults in community versus hospital settings: a systematic review and meta-analysis
Kam Cheong Wong (1,2,3,4), Harry Klimis (1,2,5), Nicole Lowres (6), Amy von Huben (1), Simone Marschner (1), Clara K Chow (1,2,5,6)

  1. Westmead Applied Research Centre, The University of Sydney, Westmead, New South Wales, Australia
  2. Westmead Clinical School, The University of Sydney, Sydney, New South Wales, Australia
  3. Bathurst Rural Clinical School, School of Medicine, Western Sydney University, Bathurst, New South Wales, Australia
  4. School of Rural Health, Faculty of Medicine and Health, The University of Sydney, Orange, New South Wales, Australia
  5. Department of Cardiology, Westmead Hospital, Westmead, New South Wales, Australia
  6. Heart Research Institute, Charles Perkins Centre, The University of Sydney, Sydney, New South Wales, Australia

Correspondence to Dr Kam Cheong Wong, Westmead Applied Research Centre, The University of Sydney, Westmead, NSW 2145, Australia; kam.wong{at}


With increasing use of handheld ECG devices for atrial fibrillation (AF) screening, it is important to understand their accuracy in community and hospital settings and how it varies with setting and other factors. A systematic review of eligible studies from community or hospital settings reporting the diagnostic accuracy of handheld ECG devices (ie, devices producing a rhythm strip) in detecting AF in adults, compared with a gold standard 12-lead ECG or Holter monitor, was performed. Bivariate hierarchical random-effects meta-analysis and meta-regression were performed using R V.3.6.0. The search identified 858 articles, of which 14 were included. Six studies recruited from community (n=6064 ECGs) and eight studies from hospital (n=2116 ECGs) settings. The pooled sensitivity was 89% (95% CI 81% to 94%) in the community and 92% (95% CI 83% to 97%) in the hospital. The pooled specificity was 99% (95% CI 98% to 99%) in the community and 95% (95% CI 90% to 98%) in the hospital. Accuracy of ECG devices varied: sensitivity ranged from 54.5% to 100% and specificity ranged from 61.9% to 100%. Meta-regression showed that setting (p=0.032) and ECG device type (p=0.022) significantly contributed to variations in sensitivity and specificity. The pooled sensitivity and specificity of single-lead handheld ECG devices were high. Setting and handheld ECG device type were significant sources of variation in sensitivity and specificity. These findings suggest that the setting, including user training, and the type of handheld ECG device should be carefully reviewed.

  • atrial fibrillation
  • electrocardiography
  • eHealth/telemedicine/mobile health


Atrial fibrillation (AF) affects 33.3 million people worldwide1 and is a major cause of stroke2; however, many individuals are asymptomatic and undiagnosed.3 4 Several guidelines recommend that people 65 years and older have opportunistic screening for AF using pulse palpation followed by ECG.5–7 However, clinicians’ workload, lack of staff and time pressure are barriers to its implementation.8 Handheld ECG devices may be a pragmatic alternative to pulse palpation.9 10 Some recent guidelines advocate use of handheld ECG devices with automatic ECG interpretation as an aid to AF screening.5 These devices may be less time-consuming to use and could potentially relieve time pressure faced by clinicians.

Mass opportunistic AF screening is more likely to occur in the community than in the hospital, and it is important to understand the accuracy of different handheld ECG devices in different settings to inform the implementation of screening at scale. Desteghe et al 11 assessed the accuracy of two single-lead handheld ECG devices (AliveCor and MyDiagnostick) in the hospital and reported suboptimal sensitivity (range 54.5%–81.8%) but good specificity (range 94.2%–97.5%). On the other hand, Lown et al 12 studied three single-lead ECG devices (AliveCor, Polar-H7 and Body Guard 2) in the community and reported high sensitivity (range 87.8%–96.3%) and specificity (range 98.2%–98.8%). The factors contributing to the differences were unknown. However, variations in setting and in the characteristics of the population screened could potentially affect the diagnostic accuracy of testing devices, a phenomenon known as the spectrum effect.13–15 Whether a spectrum effect exists for handheld ECG devices used in different settings, and how much variation in diagnostic accuracy it may cause, has not been well examined. Therefore, we aimed to determine whether the accuracy of AF detection by the automated algorithms of handheld ECG devices is influenced by the type of device used, the setting from which participants were recruited or the characteristics of the participants screened.


This systematic review complies with the ‘Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy’ (PRISMA-DTA) statement16 (online supplementary file 1).

Search strategy

Databases (MEDLINE, EMBASE, Cochrane Central Register of Controlled Trials, Cumulative Index to Nursing and Allied Health Literature, Web of Science and SCOPUS) were searched from 1993 to 17 March 2019, as telehealth was first indexed in MEDLINE in 1993. The search was limited to human studies published in the English language. Four key topics (ECG, handheld/mobile phone, AF and diagnostic accuracy) were explored to find all relevant main headings. Each main heading was further explored to include all subheadings, for example, medical subject headings in MEDLINE and the Emtree thesaurus in EMBASE. In addition, text-word searches were performed. The search strategy is detailed in online supplementary file 2. The articles were imported into EndNote software and duplicates were removed.

Data extraction and risk of bias assessment

Two reviewers (KCW and HK) independently screened all titles and abstracts to identify articles for full-text review according to the inclusion and exclusion criteria (table 1). The results were compared and disagreements were resolved through discussion, involving a third reviewer if required. The following data were extracted from the studies: study design, setting, target population, number and average age of participants, handheld ECG device, reference standard (12-lead ECG or Holter monitor), clinician interpreting the ECG, blinding in interpretation, and primary data for the automated algorithm of the handheld device (sensitivity, specificity, and counts of true positives, false positives, true negatives and false negatives). In studies that involved more than one person interpreting the same rhythm strips, the consensus results were extracted.

Table 1

Inclusion and exclusion criteria

The methodological quality and applicability of the studies were assessed using the ‘Quality Assessment of Diagnostic Accuracy Studies-2’ (QUADAS-2) checklist.17 The risk of bias was assessed by KCW, HK and NL using the signalling questions under the four domains of patient selection, index test, reference standard, and flow and timing of the study, and the risk was classified as low, high or unclear. Disagreement was resolved by discussion.

The primary outcome of this study was sensitivity and specificity of the automated algorithms for the handheld ECG devices according to the setting and comparisons of different handheld ECG devices and participant characteristics.
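As an illustration of how the extracted 2×2 counts relate to the primary outcome, the following is a minimal Python sketch, not the authors' R code. The function names and the counts in the example are hypothetical, and the Wilson score interval is an assumed choice of confidence interval; the review does not state which interval method the primary studies used.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_counts(tp, fp, tn, fn):
    """Sensitivity and specificity, with 95% CIs, from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),          # AF rhythm strips correctly flagged
        "specificity": tn / (tn + fp),          # non-AF rhythm strips correctly cleared
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity_ci": wilson_ci(tn, tn + fp),
    }


# Hypothetical counts for one device in one study (not taken from the review).
result = accuracy_from_counts(tp=45, fp=10, tn=940, fn=5)
print(result)
```

Each row of extracted results (one per device and setting) can be summarised this way before pooling.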

Data synthesis and meta-analysis

The average age of participants in each study was extracted to compute the overall median age and range. Meta-analysis was performed using the bivariate hierarchical random-effects method18 to obtain the pooled sensitivity and specificity and construct forest plots with 95% CIs. The hierarchical summary receiver operating characteristic (HsROC) model was applied to construct an HsROC plot with 95% CIs. Meta-regression was performed to assess the effect of covariates (setting, type of ECG device and average age of study population) on sensitivity and specificity. Rhythm strip (ECG) was used as the unit of analysis. Sensitivity analyses were performed to assess whether the results were sensitive to exclusion of individual studies that were identified as having either unclear or high risk of bias for any of the QUADAS-2 domains. All analyses were performed using R V.3.6.0 software (R Core Team 2019).19 20
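To give a rough sense of what random-effects pooling on the logit scale involves, the sketch below uses a deliberately simpler technique than the one described above: a univariate DerSimonian-Laird model applied to a single proportion (for example, sensitivity). It is not the bivariate hierarchical model used in this review, which pools sensitivity and specificity jointly and accounts for their correlation. The function names, the 0.5 continuity correction and the study counts are all assumptions made for illustration.

```python
import math


def inv_logit(x):
    """Map a logit value back to a proportion."""
    return 1 / (1 + math.exp(-x))


def pool_logit_random_effects(events, totals, z=1.96):
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    A simplified stand-in for the bivariate model: it pools one proportion
    at a time and ignores the sensitivity-specificity correlation.
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        a = e + 0.5          # continuity correction (assumed) avoids log(0)
        b = n - e + 0.5
        ys.append(math.log(a / b))   # logit of the corrected proportion
        vs.append(1 / a + 1 / b)     # approximate within-study variance

    # Fixed-effect weights and Cochran's Q.
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))

    # DerSimonian-Laird estimate of between-study variance, floored at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit and its standard error.
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - z * se), inv_logit(y_re + z * se), tau2


# Hypothetical true-positive counts and AF totals from three studies.
pooled, lower, upper, tau2 = pool_logit_random_effects([45, 30, 88], [50, 40, 92])
print(pooled, lower, upper, tau2)
```

A bivariate fit would normally be done with dedicated meta-analysis software rather than by hand; the sketch only shows why pooled estimates carry wider intervals when between-study variance is non-zero.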

Publication bias was not assessed as there are no reliable methods for analysing publication bias in reviews of diagnostic test accuracy (DTA),16 18 21 and the recent PRISMA-DTA guidelines16 have removed reporting of publication bias. The I2 heterogeneity index was not calculated as it is not applicable to DTA studies according to PRISMA-DTA.16


Study and participants’ characteristics

Of the total 858 articles, 175 were identified for full-text review, of which 14 articles were finally included (figure 1). Five studies recruited participants from the community (general practice n=5).12 22–25 Eight studies recruited from the hospital: cardiology wards (n=5),11 26–29 AF centres (n=2)30 31 and an emergency department (n=1).32 One study33 recruited from both community (general practice) and hospital (cardiology ward and AF centre), reporting the results separately for each setting, and therefore the results for each setting were extracted separately. The resultant 14 studies had a total of 7069 participants, of whom 5323 were from the community and 1746 from the hospital. Overall, patients from the community were older (median 73.9 years, range 67.8–74.6 years) compared with those from the hospital (median 61.7 years, range 55.0–69.4 years). In the community, individuals attending general practices to receive their influenza vaccinations,24 33 patients with hypertension/diabetes mellitus, and/or individuals aged ≥65 years,22 23 and patients with and without a diagnosis of AF12 25 were recruited. In the hospital, patients with specific morbidities seeking medical attention at cardiology wards,11 26–29 emergency department32 and AF centres30 31 33 were recruited (online supplementary file 3).

Figure 1

Flow diagram of the literature selection process and the results of searches.

Seven different single-lead handheld ECG devices were used across the studies: AliveCor (Kardia) (n=8),11 12 22–24 27 29 31 MyDiagnostick (n=3),11 25 33 Omron HCG-801 (n=2),30 32 Beurer (n=1),26 ECG Check (n=1),28 Body Guard 2 (n=1)12 and Polar-H7 (n=1).12 Two studies assessed the same participants with multiple devices: Lown et al 12 (AliveCor, Polar-H7 and Body Guard 2) and Desteghe et al 11 (AliveCor and MyDiagnostick). The results of these studies were extracted separately for each device. A total of 18 rows of results were obtained (online supplement 3 – table). In total 8180 unique ECGs were obtained, with the majority performed in the community (n=6064) compared with the hospital (n=2116). In all studies, the reference standard was a conventional 12-lead ECG. General cardiologists interpreted ECGs in all community studies, whereas in the hospital the ECGs were interpreted by electrophysiologists in four studies and by general cardiologists in three, and two studies did not report the type of ECG interpreter (online supplement 3 – table).

Study quality and the assessment of risk of bias

Two studies12 25 were assessed as having a potentially high risk of bias in patient selection, as they applied case–control methodology in selecting patients in the community. Three studies30 31 33 were assigned unclear risk of bias in patient selection, as they recruited patients in AF centres and included some patients with known AF status. One study11 was assigned unclear risk of bias in the index test (ie, handheld ECG) as it used a modified diagnostic algorithm. Another study27 was assigned unclear risk of bias in the domains of index test and reference standard as the blinding process was not reported, whereas all other studies applied blinding in interpreting rhythm strips from the index test and reference standard. A low risk of bias was assigned in the domain ‘flow and timing’ in all studies as the handheld ECG and conventional 12-lead ECG were acquired within the same screening session. All studies were assessed to have low risk of bias in applicability in all domains, as the patients, handheld ECG devices and reference standard matched our inclusion criteria and review question (figure 2).

Figure 2

Risk of bias of primary studies assessed using QUADAS-2. AF, atrial fibrillation; QUADAS-2, Quality Assessment of Diagnostic Accuracy Studies-2.

Comparison of handheld ECG devices

The pooled sensitivity of all devices was 89% (95% CI 81% to 94%) in the community and 92% (95% CI 83% to 97%) in the hospital. When the analysis was restricted to the most common device, AliveCor, the pooled sensitivity was lower in both the community (4 studies,12 22–24 n=4371 ECGs) and the hospital (4 studies,11 27 29 31 n=760 ECGs): sensitivity in the community was 82% (95% CI 65% to 91%) and in the hospital was 91% (95% CI 66% to 98%) (figure 3).

Figure 3

Forest plots of sensitivity and specificity for studies that applied AliveCor single-lead handheld ECG device by setting. AF, atrial fibrillation.

The pooled specificity of all devices in the community was 99% (95% CI 98% to 99%) and in the hospital was 95% (95% CI 90% to 98%). The pooled specificity of AliveCor in the community was 99% (95% CI 99% to 100%) and in the hospital was 97% (95% CI 94% to 98%) (figure 3).

Variations in sensitivity and specificity of each device by setting are shown in figure 4. Across the devices there was a large variation in sensitivity. In contrast, specificity varied less across the devices, with one exception: the Beurer ME90,26 for which a specificity of 61.9% (95% CI 51.9% to 71.2%) was reported.

Figure 4

Comparison of sensitivity and specificity of handheld ECG devices in detecting atrial fibrillation by community and hospital settings.


Meta-regression showed that setting (community vs hospital, p=0.032), handheld ECG device (p=0.022) and ECG interpreter (p=0.005) contributed significantly to the variations in sensitivity and specificity, but the average age of the study population did not (p=0.491).

Meta-regression restricted to the most common device, AliveCor, showed that setting (p=0.002; figure 5) and ECG interpreter (p=0.013) contributed significantly to the variations in sensitivity and specificity, but the average age of the study population did not (p=0.265).

Figure 5

Hierarchical summary receiver operating characteristic plot for studies that applied AliveCor single-lead handheld ECG device by setting.

Sensitivity analyses

Excluding the two case–control studies12 25 identified with potential high risk of bias in patient selection did not change the results of meta-regression. Similarly, the results of the meta-regression remained significant after exclusion of the three studies30 31 33 with unclear risk of bias in patient selection, one study27 with unclear risk of bias in blinding process, and one study11 with unclear risk of bias in using a modified diagnostic algorithm (table 2).

Table 2

Summary results of meta-regression on significance of setting (community vs hospital) in variations in sensitivity and specificity of handheld ECG devices


There was substantial variation in sensitivity, although less so for specificity, across the studies. Setting, handheld ECG device and ECG interpreter were significant drivers of variation, but the average age of the study population was not. Strategies to screen for AF in the community have become increasingly important due to the high prevalence of undetected AF and the high morbidity associated with cardioembolic stroke and its sequelae. Awareness of the factors that influence the diagnostic accuracy of potential screening tests is useful: it may help to better estimate the yield and cost-effectiveness of screening tests, guide training for users of handheld ECG devices and address factors that affect diagnostic accuracy. The finding that setting influences sensitivity and specificity may suggest that factors in the environment in which these tests are used, or in how they are implemented in practice, affect their accuracy and hence should probably be considered during implementation programmes. It may also be that characteristics cluster among people who present at different settings, such as the type of AF (paroxysmal, persistent and permanent AF) and various types of arrhythmia, which could also account for some of the variation; however, the average age of the study population was not a significant modifier.

Spectrum effect might have contributed to the variation in results observed in this review. Compared with those in the community, participants in the hospital had different morbidities, for example, cardiac diseases and various arrhythmias, which led them to seek medical attention. In the hospital, handheld ECG devices had lower specificity in detecting AF (95%, 95% CI 90% to 98%) compared with the community (99%, 95% CI 98% to 99%). However, we found no significant difference in the sensitivity of the devices in detecting AF between community (89%, 95% CI 81% to 94%) and hospital (92%, 95% CI 83% to 97%).

Community studies aimed to screen for AF in a cohort of community patients, and they usually involved multiple health professionals using the device. On the other hand, hospital studies aimed to test the accuracy of the device in detecting AF among patients seeking medical attention in the hospital and involved fewer health professionals using the device; for example, one AF centre study31 included patients with a known history of paroxysmal and persistent AF who were admitted for antiarrhythmic drug initiation. This difference in intention and usage between settings may potentially affect the results. However, exclusion of the studies conducted in AF centres did not change our finding that setting contributed significantly to the variation in sensitivity and specificity. Hence, handheld ECG devices should be evaluated in the setting in which they are applied in order to assess the overall cost-effectiveness of AF screening.

Specificity was high and varied less. High specificity results in a lower false positive rate, which could reduce overinvestigation and is a desirable feature of a screening test. However, the sensitivity may not be high enough to warrant use in mass screening, particularly without evidence of cost-effectiveness. Single-lead handheld ECG may augment clinicians' ability to detect AF, but evaluation of the yield and cost-effectiveness of such approaches is needed. In view of the large variation in sensitivity, which could lead to more false negatives, clinicians should validate clinically suspected AF cases and take care to additionally assess high-risk individuals using 12-lead ECG.
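The link between specificity, sensitivity and the number of false positives and false negatives can be made concrete with a small worked example. The Python sketch below applies the pooled estimates reported in this review to a notional 1000 people screened. The 5% AF prevalence is purely an illustrative assumption (it is not a figure from this review, and prevalence would in practice differ between settings), and the function name is hypothetical.

```python
def screening_outcomes(prevalence, sensitivity, specificity, n=1000):
    """Expected screening outcomes per n people for a given prevalence."""
    with_af = n * prevalence
    without_af = n - with_af
    true_pos = with_af * sensitivity
    false_neg = with_af - true_pos          # AF cases missed by the device
    true_neg = without_af * specificity
    false_pos = without_af - true_neg       # people flagged who do not have AF
    return {
        "true_positives": true_pos,
        "false_negatives": false_neg,
        "false_positives": false_pos,
        "true_negatives": true_neg,
        # Share of positive device results that are genuine AF.
        "ppv": true_pos / (true_pos + false_pos),
    }


ASSUMED_PREVALENCE = 0.05  # illustrative assumption only, not from the review

# Pooled estimates reported in this review, by setting.
community = screening_outcomes(ASSUMED_PREVALENCE, sensitivity=0.89, specificity=0.99)
hospital = screening_outcomes(ASSUMED_PREVALENCE, sensitivity=0.92, specificity=0.95)

for label, outcome in (("community", community), ("hospital", hospital)):
    print(label, {key: round(value, 3) for key, value in outcome.items()})
```

Under this assumed prevalence, the four-point difference in specificity changes the expected number of false positives far more than the three-point difference in sensitivity changes the number of missed cases, because people without AF greatly outnumber those with it.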

The sensitivity of AliveCor varied substantially across the studies over the years (2013–2018). The lower sensitivity in the hospital in Desteghe et al’s study11 was likely due to the use of a subsequently recalled version of the AliveCor app with an altered diagnostic algorithm.34 This version of the algorithm was recalled by AliveCor because, in attempting to increase the algorithm’s specificity, they had inadvertently reduced the sensitivity too much.34 Users should be aware that changes made to a diagnostic device can significantly affect the end user without anyone knowing that the accuracy of the device has been altered. However, excluding Desteghe et al’s study11 in our sensitivity analysis did not change our finding (table 2). It is also noteworthy that the earlier version of AliveCor reported ‘possible AF’ or ‘normal’, while the later version reported ‘normal’, ‘unclassified’ and ‘possible AF’. However, subgroup analysis by version of the device could not be performed because few studies reported the version of the diagnostic algorithm. Researchers should report versions of the diagnostic algorithm in all future studies.

A few recent studies reported lower sensitivities for AliveCor; for example, Chan et al reported 71.4% (95% CI 51% to 87%)22 and 66.7% (95% CI 44.7% to 84.4%)23 both in the community, and Desteghe et al 11 reported 54.5% (95% CI 35% to 73%) in a cardiology ward and 78.9% (95% CI 56% to 90%) in a geriatric ward, but an earlier study conducted by Lowres et al 35 reported a higher sensitivity of 98.5% (95% CI 92% to 100%) in a pharmacy setting. Lowres et al’s study35 was primarily a feasibility study rather than a validation study, and it did not apply 12-lead ECG validation in the same screening session (the average time interval between application of AliveCor and 12-lead ECG was 16.6±14.3 days). The reasons for these substantial variations are unclear; however, they highlight the need for regular audit of devices.

In all community studies the ECGs were interpreted by cardiologists, while in most hospital studies they were interpreted by electrophysiologists. Our meta-regression results implied that the ECG interpreter might contribute to the variations in sensitivity and specificity. However, Welton et al 8 reported that ECG interpreters did not have a significant effect on the sensitivity and specificity of handheld ECG devices. In addition, the literature36 37 compared electrophysiologists and general cardiologists in interpreting critical and rarer arrhythmias but not in interpreting a common arrhythmia like AF. It is reasonable to assume that general cardiologists and electrophysiologists do not differ significantly in the accuracy of AF interpretation. Hence, ‘ECG interpreter’ was less likely to be a confounding factor in the variations observed in this review.


The current study is unique in using meta-analysis and meta-regression to systematically examine key factors that may moderate variation in the sensitivity and specificity of handheld ECG devices. Although all the included studies used either general cardiologists or electrophysiologists to over-read the ECGs (in order to provide a reference standard), we do not consider that this will affect the generalisability of the results. While the literature available for this analysis was relatively limited and complicated by variation in the type of handheld ECG device used, it is reassuring that the analysis restricted to the most common device was still consistent with the overall findings. Without individual participant age, we could only perform meta-regression on average participant age at the study level; hence, interpretation of the effect of the average age of the study population on sensitivity and specificity was limited by the wide and overlapping age ranges among the participants. We were also not able to report a more specific analysis of other patient characteristics, such as individual patient comorbidities and the type of AF (paroxysmal, persistent and permanent).


Setting and the type of single-lead handheld ECG device were significant factors influencing variation in sensitivity and specificity. These findings highlight the importance of evaluating sensitivity and specificity in the setting in which the devices are being applied, as well as monitoring their validity, to optimise AF screening. Our findings underline the importance of considering factors that may modify sensitivity and specificity when implementing screening or other types of programmes for AF detection using these devices.


We would like to thank the following personnel: Dr Min Jun and Dr Cindy Kok (clinical researchers) for their advice in drafting the initial systematic review protocol, Dr QingTao Meng for resolving discordances between the first and second reviewers, Mr Roderick Dyson (academic librarian at The University of Sydney) for his assistance in establishing the literature search strategies, and Mrs Frances Guinness (librarian at Bathurst and Orange Health Service Libraries, New South Wales) for her assistance in searching and obtaining some articles.



  • Correction notice Since the online publication of this article, Figure 4 has been replaced with a higher quality version.

  • Contributors All authors were involved in the conceptualisation and design of the review. KCW established the search strategies with assistance from an academic librarian. KCW and HK screened the titles, abstracts and full articles as per the protocol. KCW, HK and NL assessed the quality of the included studies. KCW, AvH and SM analysed the data. All authors discussed the findings. KCW drafted the manuscript, and all authors reviewed, discussed, revised and approved the manuscript for publication.

  • Funding No funding was received to perform this systematic review and meta-analysis. However, the following coauthors were supported by fellowships: HK is supported by a Royal Australasian College of Physicians (RACP) Fellows Research Entry Scholarship. NL is supported by an NSW Health Early Career Fellowship (H16/52168). CKC is supported by an NHMRC Career Development Fellowship (App1105447, cofunded by a National Heart Foundation Future Leader Fellowship).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.