Background: Two ways to evaluate the symptoms of heart failure are the New York Heart Association (NYHA) classification and asking patients how far they can walk (walk distance). The NYHA system is commonly used, although it is not clear how individual clinicians apply it.
Aim: To investigate how useful these measures are to assess heart failure and whether other questions might be more helpful.
Methods: 30 cardiologists were asked what questions they used when assessing patients with heart failure. To assess interoperator variability, two cardiologists assessed a series of 50 patients in classes II and III using the NYHA classification. 45 patients who had undergone cardiopulmonary testing were interviewed using a specially formulated questionnaire. They were also asked how far they could walk before being stopped by symptoms, and then tested on their ability to estimate distance.
Results: The survey of cardiologists showed no consistent method for assessing NYHA class and a literature survey showed that 99% of research papers do not reference or describe their methods for assigning NYHA classes. The interoperator variability study showed only 54% concordance between the two cardiologists. 70% of cardiologists asked patients for their walk distance; however, this walk distance correlated poorly with actual exercise capacity measured by cardiopulmonary testing (ρ = 0.04, p = 0.82).
Conclusion: No consistent method of assessing NYHA class is in use and the interoperator study on class II and class III patients gave a result little better than chance. Some potential questions are offered for use in assessment. Walking distance, although frequently asked, does not correlate with formally measured exercise capacity, even after correction for patient perception of distance, and has never been found to have prognostic relevance. Its value is therefore doubtful.
- NYHA, New York Heart Association
- pVO2, peak oxygen consumption
Despite improvements in pharmacological treatment and prevention, chronic heart failure remains a serious healthcare burden, and carries a poor prognosis. Chronic limitation of exercise aerobic response is a central clinical feature of this syndrome, occurring because of decreased cardiac reserve and altered peripheral responses,1 and is an important determinant of survival. Current measures of disease severity related to exercise tolerance are often heavily reliant on subjective measurements made by both the clinician and the patient. These include use of the New York Heart Association (NYHA) classification to grade the severity of functional limitation and patient estimates of how far they are able to walk before they become breathless.
The NYHA classification (table 1) is commonly used as a method for functional classification in patients with heart failure. It was proposed in 1928 and has been revised several times subsequently, most recently in 1994. Although the 1964 criteria committee of the NYHA described it as “only approximate” and representative of “an expression of [the physician’s] opinion”,2 the NYHA system has been widely used in clinical trials not only as an entrance criterion3–7 but also as an outcome measure.8–12
Which class a clinician assigns to a patient depends on the clinician’s interpretation of what constitutes “ordinary physical activity” and “slight” and “marked” limitation. This uncertainty is likely to have led to the reproducibility value of only 56% between two doctors observed by Goldman et al.13 The value of the NYHA system as a valid outcome measure in clinical trials is therefore questionable. Although some standardised questionnaires can divide patients into different functional classes (for example, the Minnesota Living With Heart Failure Questionnaire), these are unsuitable for routine use as the questions are commercially protected intellectual property (http://www.mlhfq.org/).
How far can you walk?
Patients with heart failure are commonly asked to estimate the distance they are able to walk on a flat surface before becoming breathless. However, it is not known how valid this “self-reported walking distance” is as a measure of exercise capacity and whether it has prognostic relevance. It is also not known how well patients can estimate distance, which is important because it is a potential confounder.
We aimed to investigate the use of the NYHA classification system in current research and clinical practice, its interoperator agreement and alternative questions that may be used. We also aimed to establish the correlation between patients’ self-reported walking distance and their objectively measured exercise capacity. We investigated the ability of the patient population to estimate distances and determined the effect of inaccuracy in distance estimation on the correlation between self-reported distance and formally measured exercise capacity.
This article reports a series of studies on the current use of the NYHA classification system and self-reported walking distance in patients with chronic heart failure.
Review of current use of the NYHA system in research
To assess how the NYHA classification is arrived at in research settings, we evaluated 200 randomly selected papers. To obtain this random selection, we carried out a Medline search using the keyword “NYHA” and limited search results to clinical trials only, published in English. We recorded whether trials used the NYHA system in their inclusion criteria and/or as an outcome measure and whether the paper described or referenced any questions or criteria used to establish the NYHA class of patients enrolled in the trial.
The use of the NYHA system in clinical practice
Thirty senior cardiologists and trainees in cardiology were interviewed regarding their use of the NYHA classification system, their use of specific questions in determining which class a patient belonged to and how they distinguished between patients belonging to class II and class III. The interviews were conducted individually over a period of 5 days and the cardiologists were not allowed to confer with one another.
Interoperator variability in NYHA classification
An interoperator variability study was performed by asking two cardiologists to assess the same patient on the same day. A total of four cardiologists took part in this substudy, with six possible combinations of paired assessors. The patient group selected was a series of 50 patients with chronic heart failure, whose clinical records indicated that they had recently been in class II or class III. Each cardiologist was given as much time as they wished to interview and observe the patient and was then asked to assign the patient to an NYHA class. The cardiologists were blinded to the clinical records and did not know that this substudy was limited to patients recently classified as class II or class III. Both cardiologists saw each patient on the same day and the interviews were conducted in a random order, with each cardiologist blinded to the classification made by the other.
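As a small check on the pairing arithmetic, the sketch below enumerates the unordered pairs that can be formed from four assessors. The labels A to D are placeholders chosen for illustration and do not identify the cardiologists in the study.

```python
from itertools import combinations

# Placeholder labels for the four assessors (hypothetical, not from the study).
assessors = ["A", "B", "C", "D"]

# Every unordered pair of distinct assessors.
pairs = list(combinations(assessors, 2))

print(len(pairs))  # 6
print(pairs)
```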
Potential alternatives to NYHA
To assess potential questions that may be of use in distinguishing between patients belonging to class II and class III, a questionnaire of 23 questions was constructed (table 2). The questions were formulated by interviewing trainees and senior specialists in cardiology about the questions they use for the NYHA classification and by discussing with patients the effect of the disease on their lives. The questionnaire was administered to patients in both inpatient and outpatient settings. A total of 45 patients who had undergone cardiopulmonary testing were interviewed, 20 of whom were interviewed on the same day as the cardiopulmonary test and 25 of whom were interviewed a mean of 4 months after their exercise test.
To test the short-term repeatability of the questionnaire, 20 patients were telephoned (n = 11) or interviewed in person (n = 9), using the same set of questions, a mean (SD) of 3.5 (2) weeks after the initial questionnaire. The reproducibility of each question was defined as the percentage of patients whose answers were consistent between interviews.
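The reproducibility measure defined above can be sketched as a simple percentage agreement between the two interviews. The function name and the example answers are hypothetical and serve only to illustrate the definition.

```python
def reproducibility(first, second):
    """Percentage of patients giving the same answer at both interviews."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty answer lists")
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)


# Hypothetical yes/no answers from five patients to a single question.
initial = ["yes", "no", "yes", "yes", "no"]
repeat = ["yes", "no", "no", "yes", "no"]

print(reproducibility(initial, repeat))  # 80.0
```

This raw percentage makes no allowance for agreement that would occur by chance, which is why a chance-corrected statistic is also useful for repeated questions.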
Inclusion and exclusion criteria
The questionnaire was administered to all patients who were willing to participate and who were attending the heart failure outpatients’ clinic, attending for cardiopulmonary testing or inpatients at St Mary’s Hospital, London, UK. All volunteers gave informed consent for the procedure, which was approved by the local ethics committee.
Prognostic relevance of self-reported walking distance
To determine whether any studies had looked at the prognostic significance of the self-reported walking distance, a Medline search was carried out using two keyword combinations: “self-report*” with “heart failure”, and “walk* distance” with “heart failure”.
Self-reported walking distance
Patients were asked how far they could walk on level ground before they became so breathless that they had to rest. They were also asked to estimate the length of a stretch of pavement 50 m long, immediately after having walked along it. Estimates were accepted in any recognised units, and then converted into metres (in practice, all patients used yards or metres).
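The unit conversion described above can be sketched as follows. The function name and the unit labels are illustrative assumptions; only yards and metres are handled, because these were the only units patients used in practice.

```python
YARD_IN_METRES = 0.9144  # exact, by international definition of the yard


def to_metres(value, unit):
    """Convert a distance in metres ("m") or yards ("yd") to metres."""
    factors = {"m": 1.0, "yd": YARD_IN_METRES}
    if unit not in factors:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * factors[unit]


print(round(to_metres(100, "yd")))  # 91
```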
Cardiopulmonary exercise testing
Cardiopulmonary exercise tests were carried out according to standard clinical protocols. Patients exercised on a treadmill using a smoothed version of the modified Bruce protocol.14 Patients were encouraged by the operators to continue for as long as possible, until symptoms became too great or there was ventricular tachycardia for >5 beats, ST segment depression of >3 mm, systolic blood pressure >200 mm Hg or progressive hypotension.
A t test was used to compare the peak oxygen consumption (pVO2) values, measured during cardiopulmonary testing, of patients in different NYHA classes (fig 1). The consistency of answers for the questionnaire was tested by the inclusion of two repeat questions within the question set, each asking the same question as an earlier item but phrased slightly differently. The agreement between the two sets of questions was tested using Cohen’s κ test. The daily activities questionnaire was analysed using a t test to determine whether there was a significant difference between the pVO2 of patients answering yes to a question and the pVO2 of those answering no. A t test was used to determine whether age was a confounding variable for any of the significant question pairs.
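For readers unfamiliar with Cohen’s κ, the following is a minimal sketch of how it can be computed for a question and its rephrased repeat. It is a generic implementation, not the software used in the study, and the answers shown are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of categorical answers."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two equal-length, non-empty answer lists")
    # Observed proportion of identical answers.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each list's marginal frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1.0 - expected)


# Hypothetical answers from eight patients to a question and its repeat.
original = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
rephrased = ["yes", "yes", "no", "no", "yes", "no", "no", "no"]

print(cohen_kappa(original, rephrased))  # 0.75
```

In this example the answers match for 7 of 8 patients (87.5%), chance agreement is 50%, and κ is therefore (0.875 − 0.5)/(1 − 0.5) = 0.75.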
The self-reported walking distance was tested for normal distribution using the Shapiro–Wilk test and the correlation with pVO2 was tested using the Spearman rank correlation, because the walk distances were not normally distributed. Self-reported walking distance was corrected for inaccuracy in distance estimation by multiplying by 50/x, where x metres was the patient’s estimate of the length of the 50 m stretch of pavement.
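The correction and the rank correlation described above can be sketched as below. This is a plain implementation written for illustration, not the statistical package used in the study; the function names are assumptions, and the example patient (a reported distance of 100 m and an estimate of 73 m for the 50 m stretch) is hypothetical.

```python
def corrected_distance(reported_m, estimate_of_50m):
    """Scale a self-reported distance by 50/x, where x is the patient's
    estimate (in metres) of the 50 m stretch of pavement."""
    if estimate_of_50m <= 0:
        raise ValueError("estimate must be positive")
    return reported_m * 50.0 / estimate_of_50m


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least 2 values")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical patient who overestimates the 50 m stretch as 73 m.
print(round(corrected_distance(100, 73), 1))  # 68.5
```

A patient who overestimates the reference distance has the reported walking distance scaled down, and one who underestimates it has it scaled up.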
Review of current use of the NYHA system in research
A Medline search of 200 clinical trials returned 179 papers with full text accessible to readers in our institution. Of these 179 papers, 99 used an estimate of NYHA class both as an inclusion and an outcome measure, and 80 used the NYHA system in the inclusion criteria only. Of the 99 papers using NYHA both as an inclusion and an outcome measure, only five referenced any source material relating to specific criteria used to determine the NYHA class. Three of these referenced “The Criteria Committee of the New York Heart Association—Nomenclature and Criteria for Diagnosis of the Heart and Great Vessels”, editions 6, 7 and 9, one referenced a cardiology review text and one listed three specific questions that they used to decide on the NYHA class of a patient, focussing on breathlessness during activities of daily living and breathlessness at rest.15 As the criteria committee of the NYHA did not define the abilities of class II or class III patients in any greater detail than “slight limitation” and “marked limitation” of physical activity, this means that, of the papers with full text access, the greatest possible percentage of papers referencing the specific questions/criteria used in determining the NYHA class of patients was 1.1% (table 3).
The use of the NYHA system in clinical practice
There was a 100% response rate in the survey of cardiologists. This showed considerable variety in the different questions and criteria used to determine the NYHA classification of a patient (table 4).
Other criteria used to discriminate between classes II and III were shopping, moving around the kitchen, breathlessness while washing, dressing or showering, ability to reach the toilet in the house, breathlessness at night or on lying flat, breathlessness on exercise and breathlessness after walking a few yards. Only three of the cardiologists interviewed admitted to not using the NYHA system on a regular basis and a further four criticised its subjectivity or lack of agreement between operators. Of those who used ability to walk up a flight of stairs as their discriminatory question between class II and class III, 67% would classify a patient who had to stop once up a flight of stairs as class II and 33% would classify this as class III.
Interoperator variability in NYHA classification
For a series of 50 patients, the two cardiologists agreed on the NYHA class for only 54% of patients (table 5). Although the clinical records of the patients selected had indicated them to be in NYHA class II or III, on the day of assessment, two patients were judged to be in class I by an assessor. The cardiologists never disagreed by >1 NYHA class.
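To put the 54% figure in context, the sketch below computes the agreement expected by chance and the corresponding chance-corrected agreement. It assumes, purely for illustration, that each assessor chooses between class II and class III with equal probability and independently of the other. The true marginal frequencies are those of table 5 and are not used here, so the resulting value is not the study’s κ.

```python
def chance_agreement(p_a, p_b):
    """Probability that two independent raters pick the same class.

    p_a and p_b map each class label to that rater's probability of using it.
    """
    return sum(p_a[c] * p_b.get(c, 0.0) for c in p_a)


def kappa_from_rates(observed, expected):
    """Chance-corrected agreement from observed and expected proportions."""
    return (observed - expected) / (1.0 - expected)


# Illustrative assumption: both raters use class II and III equally often.
even = {"II": 0.5, "III": 0.5}
expected = chance_agreement(even, even)

print(expected, round(kappa_from_rates(0.54, expected), 2))  # 0.5 0.08
```

Under this assumption, an observed agreement of 54% is only slightly above the 50% expected by chance.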
Daily activities questionnaire
In all, 45 patients (11 women) with a mean (SD) age of 70 (8) years were recruited from the heart failure clinic at St Mary’s, London, UK. All patients were interviewed using the daily activities questionnaire, and gave a self-reported walking distance (table 6).
The questions were ranked according to the percentage of patients answering yes (fig 2) and also by their percentage reproducibility. Application of Cohen’s κ test to the two retest questions gave κ values of 0.65 and 0.77, respectively. Ten questions significantly correlated with the pVO2 value, of which eight had a reproducibility of ⩾90% (fig 3).
Prognostic significance for self-reported walking distance
The literature search returned 347 articles (characteristics of the first 100 papers; table 7). Although self-report of functional status was used in measures of quality of life, no papers investigated the significance of self-reported walking distance in a heart failure population or showed any prognostic value for this measure.
Self-reported walking distance
A total of 34 of 45 patients gave a finite self-reported walking distance and 11 said that their walking distance was not limited by their symptoms. The data were not normally distributed (Shapiro–Wilk coefficient 0.715, p<0.001, fig 4), with the most common responses being becoming breathless after 100 yards (91 m, 22% of respondents) and ability to walk an unlimited distance without becoming breathless (24% of respondents). There was a poor correlation between self-reported exercise distance and pVO2 for patients giving a finite walking distance (ρ = 0.04, p = 0.82, fig 5). A t test performed on patients who reported an unlimited walking distance against those who reported a finite distance showed a significant difference in pVO2 between the two groups (p = 0.029).
Patient estimates of distance
The median estimation of the 50 m distance was 73 m, an overestimate of 46% with a range of estimates from 9 to 274 m and an interquartile range of 55 m. The correlation between self-reported walking distance and pVO2 did not improve significantly when the patient’s self-reported exercise distance was corrected by adjusting for the patient’s ability to estimate an actual distance (ρ = 0.03, p = 0.67).
In this study, we have shown that the NYHA classification system is subjective and poorly reproducible. There is no widespread agreement on how to assign a patient to an NYHA class in clinical practice, with much interoperator variation, and clinical trials rarely reference the criteria used. We have suggested some alternative questions for use in assessment.
We observed that most cardiologists routinely ask patients with heart failure how far they can walk before they become breathless. However, our data suggest that there may be little value in asking patients how far they can walk, apart from being a simple opening gambit for conversation. We also showed that this self-reported walking distance had no predictive value for patients’ actual exercise capacity, even when corrected for patients’ poor perception of distance.
The NYHA functional classification system
The systematic literature sampling showed that although at least 90% of studies accessed used the NYHA class as an inclusion criterion and 50% as an outcome measure, 99% of studies did not reference the methods they used to distinguish between different classes of patients. This would be understandable if the methods of classification were obvious and universally agreed on. However, we found that the criteria for assigning an NYHA class are clearly not standard across operators (if indeed any actual criteria are truly used). It therefore does not come as a surprise that the interoperator study showed only a 54% concordance between cardiologists even when assessing the same patient on the same day. A 50% concordance would be expected merely on the basis of probability, hence this suggests a poor agreement between cardiologists in differentiating between patients belonging to class II and class III. This is the only distinction that requires any formal standardisation, as identification of class I (asymptomatic) and class IV (symptomatic at rest) patients does not require any skill.
Despite this, the NYHA classification system provides a rapid assessment of functional status during physical exertion. It is very well established as a predictor of prognosis when used to divide patients dichotomously.16–20
The powerful prognostic ability of the NYHA classification may result from it being the only part of the routine assessment of heart failure that directly pertains to exercise. The other widely used assessment techniques, including examination, ECG and echocardiography, are performed at rest. NYHA class in prognostic studies therefore has the advantage of being the provider of information relating to exercise and, therefore, even though it is usually assessed in an ad hoc fashion, it has the opportunity to shine as a prognostic factor. The fact that the NYHA classification predicts prognosis despite its considerable limitations suggests that functional capacity is, fundamentally, an overwhelmingly important prognostic element.
The poor interobserver agreement and lack of consistency in classification between clinicians is an area of concern as the NYHA system is regularly used as an outcome measure in clinical trials and is even included in guidelines for management of chronic heart failure. For example, in the UK, the National Institute for Health and Clinical Excellence guidelines state that spironolactone and implantable cardiac defibrillators should only be considered for NYHA class III patients and above.21,22 Therefore, it is important to be able to distinguish class II and class III patients and make a reliable, reproducible assessment of functional capacity to standardise treatment decisions between clinicians. As we have shown, the NYHA classification system is currently unsuitable for this purpose. However, simple modifications to this scale, such as recording the specific questions or criteria used to classify a patient, would increase reproducibility while maintaining the strong prognostic relevance of this measure.
As we have shown, specific questions can be validated against more objective measures of functional capacity such as peak oxygen consumption. We suggest that large-scale clinical trials report the questions they use in the NYHA classification, so that they can be validated in a sufficiently large population. We have suggested some questions that correlate with pVO2 and may be used in assessment.
Self-reported walking distance
The survey of cardiologists showed that 70% asked patients for their self-reported walking distance. Self-reported distance is attractive as a measure of exercise capacity, because it is rapidly obtained and does not require any special equipment to measure it. By common sense, it seems to be a reasonable measure of exercise capacity. However, the literature survey failed to show any evidence that it correlated with other validated measures of exercise capacity or any prognostic relevance for this measure.
To attempt to provide criterion validity for the self-reported distance, it was compared with pVO2—an established measure of exercise capacity and a strong prognostic marker. The correlation was not significant, with a correlation coefficient of 0.04 (p = 0.82). Self-reported distance is a subjective measure and many factors influence a patient’s answer, including psychosocial factors and perceptions of distance.
Patients’ ability to estimate a 50 m distance was shown to be poor, with estimates ranging from 9 to 274 m. A similar range of values was shown in a study of patients with peripheral vascular disease.23 However, our study indicates that this poor distance estimation is not the cause of the lack of correlation between self-reported distance and actual exercise capacity, as the correlation did not change appreciably after adjustment for each patient’s perception of distance.
Most clinicians ask patients with heart failure how far they can walk. Medical textbooks commonly mention walking distance as an important question to ask in the history, yet this measure has no documented prognostic relevance. Patients are poor at estimating their exercise capacity on many levels. Not only are they poor at estimating distances in general but also, when correcting for this poor distance perception, self-reported walking distances are still completely unrelated to true exercise capacity. In fact, there does not seem to be any value in asking patients how far they can walk.
This study was limited by the relatively small sample size of patients with heart failure (n = 45) taking part in the study. There was also a limited range of NYHA classes represented in the patient sample, with only eight patients in NYHA class I and two in class IV. However, this article largely focuses on the typical patients found in the outpatient environment, and in this context such a distribution of classes is common. Additionally, for some patients, the daily activities questionnaire and self-reported distance were obtained on a day different from the cardiopulmonary test. However, the patients were asked to give their ability on a typical day, and furthermore there was no significant difference in correlation between patients who gave their estimate on the same day as cardiopulmonary testing and those who gave it on a different day.
Another limitation was that the survey of doctors was only conducted within a population of trainees and specialists in cardiology. This population was chosen because they were accessible in large numbers and would be likely to respond. In the event, we found that the response rate was 100%. All these doctors had completed their training in general (internal) medicine and the trainees were completing their cardiology accreditation. Seventeen had completed their research for a doctoral thesis in cardiology. As committed cardiologists, our population of doctors might reasonably be expected to make at least as good an assessment of heart failure as might a generalist group of doctors.
Finally, the specific questions from the daily activities questionnaire were measured against the pVO2 and not against mortality. Mortality data could not have been used in such a limited sample size and length of follow-up; however, pVO2 is known to be closely correlated with mortality.24–26
The results of this study suggest that the NYHA classification system is poorly reproducible. We suggest that research papers using the NYHA classification, whether as an inclusion criterion, an outcome measure or both, should record the criteria or questions used to ascertain a patient’s functional class. We also suggest that the use of specific questions can markedly improve the reproducibility of this classification system.
Many clinicians ask patients with heart failure how far they can walk. In this study, we found that this self-reported walking distance does not measure exercise capacity or correlate with a known measure of exercise capacity. Even the poor ability of patients to estimate distance does not explain the lack of correlation with objectively measured exercise capacity. Finally, there is no documented evidence of prognostic relevance for this measure. It is therefore doubtful whether this question should be routinely asked.
Published Online First 27 September 2006
Competing interests: None.
Ethical approval: Ethical approval was granted for this study.
Contribution: All authors contributed to the planning and analysis of the study, and interpretation of the results, as well as to the writing of the paper. CR, CB and CM designed and implemented the questionnaire, performed the walk study and the analysis of past studies using NYHA. ZIW and JD designed and implemented the exercise testing and interpretation of the physiological data. RS, JM and DPF designed and managed the overall study and planned the analysis methods. All authors have approved the final manuscript.