Table 1

Psychometric tests and criteria*

Psychometric property	Definition/test	Criteria for acceptability
*Adapted from Lamping et al.¹⁶
CABG, coronary artery bypass grafting; CCS, Canadian Cardiovascular Society; CROQ, coronary revascularisation outcome questionnaire; NYHA, New York Heart Association; PTCA, percutaneous transluminal coronary angioplasty; SAQ, Seattle angina questionnaire; SF-36, short form 36.
Acceptability	Quality of data; assessed by completeness of data and score distributions	• Missing data for scales <10%
Acceptability		• Even distribution of endorsement frequencies across response categories; low floor/ceiling effects before revascularisation (percentage scoring lowest/highest scale score)
Reliability
Internal consistency	Extent to which items in a scale measure the same construct (that is, homogeneity of the scale); assessed by Cronbach’s α¹⁸ and item-total correlations	• Cronbach’s α for scales >0.70¹⁹
Internal consistency		• Item-total correlations >0.20⁶
Test-retest reliability	Stability of an instrument; assessed by administering it to respondents on two occasions and examining the agreement between test and retest scores	• Intraclass correlation coefficients >0.70²⁰
Tests of scaling assumptions	Evidence that an item belongs in its own scale and not another scale (item convergent and discriminant validity)	• Scaling success/failure (item does/does not correlate significantly higher with own scale than other scales) and probable scaling success/failure (item does/does not correlate more highly, but not significantly, with own scale than other scales)¹⁵
Validity
Content validity	Extent to which content of a scale is representative of the conceptual domain it is intended to cover; assessed qualitatively during questionnaire development through interviews and pretesting with patients, expert opinion, and literature review	• Evidence from interviews and pretesting with patients, expert opinion, and literature review that items are representative of impact of CABG/PTCA
Construct validity (within-scale analyses)	Evidence that each scale measures a single construct and that items can be combined to form summary scores; assessed on the basis of evidence of good internal consistency, factor analysis, and correlations between scale scores	• Internal consistency (Cronbach’s α >0.70)
Construct validity (within-scale analyses)		• Principal axis factor analysis (factor loadings ⩾0.30)
Construct validity (within-scale analyses)		• Moderate intercorrelations between scale scores
Construct validity (analyses against external criteria)
Convergent and discriminant validity	Evidence that scales are correlated with other measures of the same or similar constructs and not correlated with other measures of different constructs; assessed on the basis of correlations between CROQ, SF-36, and SAQ scores	• Magnitude and direction of correlations expected to vary according to the similarity of constructs being measured by each instrument
Known group differences	Evidence that scales differentiate known groups; assessed by comparing CROQ-CABG symptoms scores for patients who differ on disease severity as measured by CCS and NYHA	• CROQ scores should decrease (poorer outcome) with increasing severity of angina (CCS scores) and dyspnoea (NYHA classification) at pre-revascularisation assessment
Responsiveness	Ability of scales to detect clinically important change over time; assessed by comparing change in CROQ scores from before to after revascularisation (t tests and effect sizes)	• CROQ scores should show significant change from before to three months after revascularisation
Responsiveness		• Effect sizes defined as small (0.20), moderate (0.50), or large (0.80 or higher)²¹
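
Several of the quantitative criteria in the table (floor/ceiling effects, Cronbach’s α, item-total correlations, and effect sizes) can be illustrated with a minimal sketch. The table does not specify computational details, so the following are assumptions made only for illustration: the function names, the use of sample variances, the corrected (item-rest) form of the item-total correlation, and the effect size taken as mean change divided by the standard deviation of the pre-revascularisation scores.

```python
from statistics import mean, stdev, variance


def floor_ceiling(scores, lowest, highest):
    """Percentage of respondents at the lowest and highest possible scale score."""
    n = len(scores)
    pct_floor = 100 * sum(s == lowest for s in scores) / n
    pct_ceiling = 100 * sum(s == highest for s in scores) / n
    return pct_floor, pct_ceiling


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_correlations(rows):
    """Assumed 'corrected' form: each item against the sum of the remaining items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson(item, rest))
    return result


def effect_size(before, after):
    """Assumed definition: mean change divided by the SD of the 'before' scores."""
    return (mean(after) - mean(before)) / stdev(before)
```

Under these assumptions, a scale would meet the table’s criteria if `cronbach_alpha` returns a value above 0.70 and every entry of `item_total_correlations` is above 0.20; the result of `effect_size` can then be read against the 0.20, 0.50, and 0.80 benchmarks.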