A practical issue is the choice of software. This is a big issue for graphs, but less so for tables. Most standard software packages perform adequately. Nevertheless, detailed knowledge is required to fine-tune and optimise the details.

Clear understanding is about telling the story in the data. This involves organising the data and optimising the use of text.

Figure 1 Names of table elements. The table, from a recent publication in Heart,12 has been redesigned to display the available elements, not to optimise data communication.

Two design principles can be distinguished: clear vision and clear understanding. Clear vision is about maximising the signal-to-noise ratio in the visualisation. The signal is the data ‘ink’, that is, all pixels in a graph and all numbers in a table that depict or represent data. The noise (‘non-data ink’) comprises all of the supporting elements. In a graph, these include the axes, titles, labels, legends, etc. In a table, these include headings, footers, grids, supporting lines (rules) and extra characters such as parentheses and ‘±’ marks (figure 1).12
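To make the signal-to-noise idea concrete for tables, the share of data ink in a plain-text cell can be approximated by counting characters. The sketch below rests on an illustrative assumption, not an established metric: digits and decimal separators count as data ink, every other visible character (parentheses, dashes, ‘±’) counts as non-data ink, and white space is ignored. The function name `data_ink_share` and the example values are hypothetical.

```python
def data_ink_share(cell: str) -> float:
    """Rough share of visible characters in a table cell that represent data.

    Assumption (illustrative only): digits and decimal separators ('.' or ',')
    count as data ink; every other visible character, such as parentheses,
    dashes, '±' or '%', counts as non-data ink. A comma used as a thousands
    separator is also counted as data here. White space is not ink and is
    ignored.
    """
    visible = [ch for ch in cell if not ch.isspace()]
    if not visible:
        return 0.0
    data = sum(1 for ch in visible if ch.isdigit() or ch in ".,")
    return data / len(visible)


# A point estimate and CI crammed into one cell (hypothetical values) ...
print(round(data_ink_share("12.3 (10.1–14.5)"), 2))  # 0.8
# ... versus one element per cell, with the separators dropped.
print(min(data_ink_share(c) for c in ["12.3", "10.1", "14.5"]))  # 1.0
```

Placing each element in its own cell, as in figure 1, removes the separators, so every cell consists of data ink only.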

At the start of the design process, key questions to ask are: Who is the audience? What are my messages (to this audience)? Which of those need visualisation in a table or graph? And what would be the most effective form for each of the messages? Tables are best used when a considerable amount of precise data needs to be reported and the relationships in the data that need to be brought out are relatively simple.

The series ‘Graphics and Statistics for Cardiology’ has previously featured articles on comparing categorical and continuous variables, survival analysis, data visualisation for meta-analysis, and clinical prediction rules.5–8 This article focuses on effective table design. Most of the recommendations are based on sound design principles and tradition, informed by the science of human visual perception. There is little empirical evidence to support specific recommendations, so there is room for experimentation and innovation to see what works best. My own experience in data visualisation has been greatly inspired by three sources: Tufte, Cleveland and, specifically for tables, Few.9–11

It is therefore not surprising that current practice is mediocre at best. A review of articles submitted to BMJ concluded that fewer than half of the tables and figures met their data presentation potential.4 Also, external peer reviewers seldom commented on tables or figures.

However, communication has traditionally received short shrift in scientific education, and it is not really encouraged by the parties responsible for disseminating research: journals, scientific societies, etc. PhD programmes often include some form of writing tuition, but most of that time is spent on the body text. Guidance from the editors of journals and conferences rarely goes beyond a description of sections, word counts and limits on the number of tables and figures.2 3 And finally, in my experience, the handling of tables and graphs has low priority in the production stage, often leading to suboptimal layout (size and placement of tables and figures in the text) and quality issues (especially loss of resolution, ie, ‘fuzzy’ graphs and letters) that need to be corrected in the proofs.

Effective communication of scientific data is arguably one of the most important skills of a scientist. If the intended audience does not get the message in the data (and act on it), the research effort is wasted. Data visualisation in tables and graphs can convey complex relationships in a way unmatched by simple text. Good communication implies proper choices between body text, tables and graphs, optimised for audience and setting. Wrong choices can lead to misinterpretation and wrong decisions.1

Clear vision

Two main types of tables can be distinguished: the lookup table and the demonstration table. The former is intended for quickly finding data associated with a label (like a telephone directory). When required in a scientific report, such tables are usually simple in design, often lengthy, and best placed in an online supplementary appendix. The demonstration table, like a graph, is used to bring out relationships in the data. The advantage of a demonstration table over a graph is the ability to report data with great precision, but the price is limited flexibility and interpretability.

Supplementary material: Supplementary Appendix 1 (SP1.pdf)

Creating clear vision in a table involves the following steps: delineating rows and columns, arranging data, formatting text and summarising values where necessary.

Delineating rows and columns

Delineation means creating visual cues that guide the eye towards the most important groupings and comparisons. The message determines the kind of grouping, and the extent to which rows, columns or both need to be emphasised. In any case, a minimum of white space around a number is necessary for easy reading. The white space between columns is usually adequate, as column width is determined by the headings; for rows, a rule of thumb is that the white space above and below a row should equal the height of the numbers in the row. Many journals have their own typesetting conventions, which may or may not be good, but it is never wrong to submit your tables in a proper format. Extra emphasis (on rows or columns) can be achieved by a very light background fill; stronger emphasis is achieved by a thin rule (horizontal below a row or vertical to the right of a column). Fills and rules are supporting elements, that is, non-data ink. Thus, they should be as light and thin as possible to avoid obscuring the data. For that reason, grids (figure 1) are best avoided: they add too much non-data ink.
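The preference for white space and thin rules over grids can be illustrated with a plain-text rendering. The sketch below assumes a simple layout: white space rather than vertical lines separates the columns, a single thin rule sits below the header, and further rules appear only below the rows listed in the hypothetical `rule_after` parameter. The function name and the example values are illustrative only, not data from any study.

```python
def render_table(header, rows, rule_after=()):
    """Render a plain-text table using white space instead of a grid.

    Hypothetical helper: one thin rule separates the header from the body,
    and extra rules are drawn only below the row indices in `rule_after`.
    No vertical lines are drawn.
    """
    # Each column is as wide as its widest entry, header included.
    widths = [max(len(str(r[i])) for r in [header, *rows]) for i in range(len(header))]
    gap = "   "  # white space, not a vertical rule, separates columns

    def fmt(cells):
        # Labels (first column) left aligned; numbers right aligned.
        first = str(cells[0]).ljust(widths[0])
        rest = [str(c).rjust(w) for c, w in zip(cells[1:], widths[1:])]
        return gap.join([first, *rest]).rstrip()

    rule = "─" * (sum(widths) + len(gap) * (len(widths) - 1))
    lines = [fmt(header), rule]
    for i, row in enumerate(rows):
        lines.append(fmt(row))
        if i in rule_after:  # optional thin rule to close a group of rows
            lines.append(rule)
    return "\n".join(lines)


# Illustrative values only.
print(render_table(("Group", "n", "Age, years"),
                   [("Placebo", 120, 64), ("Drug A", 118, 63)]))
```

A light background fill has no plain-text equivalent, so this sketch only covers the stronger form of emphasis, the thin rule.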

Arranging data

The arrangement of data should facilitate the main comparisons for the message and follow what the reader expects. Numbers are easiest to compare when they are close together, and a horizontal comparison is slightly easier than a vertical one. We read from left to right, so we expect to see source data on the left and calculated results on the right, and time series that run from left to right. In contrast, we expect ranked data to be ordered vertically, usually from highest to lowest. Where the table needs to break (eg, across a page), we expect this to be at a logical point (eg, not in the middle of a category), and we need clear header and footer labels to remain oriented. These principles can be challenging to apply when space is limited. Horizontal space on a page or slide is the main barrier: a set of categories with long headings (many letters), or a large number of categories, can be too wide for the page or slide. Creative tinkering with headings (eg, multiple header rows with progressively indented headings) and column widths can sometimes help. Splitting a table between columns is rarely a good idea; in such cases, a vertical orientation should be considered. The only exception is when the table can be printed on two adjacent, facing pages, as in an open book. The sequence of the data is also a design issue. Numbers, dates and times are always sorted in ascending or descending order. Named categories, however, need thought to achieve the sequence that is most meaningful in light of the message. In particular, the order is almost never alphabetical, unless the table is a lookup table.
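The expected arrangement can be sketched for a small time series, assuming the data are held as a mapping from named category to values per year. Years run from left to right in ascending order, and categories are ranked from highest to lowest rather than sorted alphabetically. Ranking on the most recent year is an assumption made for illustration, since the ranking criterion should follow the message; the function `arrange` and the example values are hypothetical.

```python
def arrange(table):
    """Return (row order, column order) for a table of category -> {year: value}.

    Hypothetical helper. Columns (years) run left to right in ascending order.
    Rows (named categories) are ranked from highest to lowest on the most
    recent year, an illustrative choice; they are never sorted alphabetically.
    """
    years = sorted({year for values in table.values() for year in values})
    latest = years[-1]
    # Categories without a value for the latest year sink to the bottom.
    rows = sorted(table, key=lambda c: table[c].get(latest, float("-inf")), reverse=True)
    return rows, years


# Illustrative values only.
example = {
    "Region A": {2015: 30, 2016: 25},
    "Region B": {2015: 20, 2016: 28},
    "Region C": {2016: 12, 2015: 10},
}
rows, years = arrange(example)
print(rows)   # ['Region B', 'Region A', 'Region C']
print(years)  # [2015, 2016]
```

Note that the ranked order (B, A, C) differs from the alphabetical order (A, B, C), which would only suit a lookup table.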

Formatting text

Text formatting has several components. First, precision is important, in the body text as well as in tables and figures. Precision should match the purpose (or message): in bookkeeping, cents add up to euros, and rounding can create noticeable errors. In statistical reports, there is a convention to report the mean of a variable with one decimal more than the precision of the source data, and the SD with two decimals more. These conventions are followed without question in many clinical study reports produced by industry, to the great detriment of readability (see further comments on the format of clinical study reports under ‘special cases’ below). High precision is mostly irrelevant in the interpretation of scientific data. In most cases, two or three significant digits are amply sufficient to interpret the message in the data. Moreover, the differences or changes that we need to detect (because they are clinically relevant) are usually much larger than the smallest differences that can be measured. So, for example, even though a laboratory can measure alkaline phosphatase in blood with one decimal precision, clinicians are interested in changes of 20 points or more. Likewise, percentages can almost always be reported as integers. Linked to precision is the choice of numeric format, which may differ between countries: date format, decimal point or comma, and thousands separator.

Second, vertical alignment is important. Unfortunately, existing conventions are usually ignored in scientific tables. Numbers are best right aligned, and decimally aligned when there are decimals. Text and dates are best left aligned, and a single date format should be chosen so that corresponding elements line up vertically. Centre alignment is reserved for special cases, such as a long header label above a single digit or character in the column cells (eg, header ‘response’ and cell entry ‘y’ or ‘n’).
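The advice on precision and numeric format can be sketched as a small helper that rounds a value to a chosen number of significant digits and writes it with a chosen decimal separator, independent of the locale settings of the software. The function `format_sig` and its defaults (three significant digits, decimal point) are assumptions for illustration.

```python
from math import floor, log10


def format_sig(x: float, sig: int = 3, decimal_sep: str = ".") -> str:
    """Format x with `sig` significant digits and the given decimal separator.

    Hypothetical helper. The separator is substituted explicitly rather than
    taken from the system locale, so the output does not depend on the
    localisation of the software. Python's round() resolves exact ties to the
    even digit.
    """
    if x == 0:
        return "0"
    magnitude = floor(log10(abs(x)))
    rounded = round(x, sig - 1 - magnitude)
    # Rounding can add a digit (eg, 99.96 -> 100), so recompute the magnitude.
    magnitude = floor(log10(abs(rounded)))
    decimals = max(0, sig - 1 - magnitude)
    return f"{rounded:.{decimals}f}".replace(".", decimal_sep)


print(format_sig(123.456))          # 123
print(format_sig(0.012345, 2))      # 0.012
print(format_sig(3.14159, 2, ","))  # 3,1
print(format_sig(99.96))            # 100
```

The comma variant corresponds to what a Dutch-localised word processor would produce. Percentages can be reduced to integers in the same spirit, for example `f"{37.8:.0f}"` gives `38`.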
Note that for optimum readability, supporting characters such as the minus sign or parentheses should not be the alignment character, unless this character is present in all cells being aligned. Apart from ignorance, there are several real problems that preclude effortless application of these conventions in scientific tables. The first problem is that scientific tables often contain categories with different kinds of variables, each with its own numeric precision and precision estimator. Figure 1 shows a table that reports results for variables as counts, percentages, ORs and their CIs in one row. Frequently, such variables (with different types of results) are listed below each other in one column (figure 2).13 The second problem is that table cells often contain multiple entries. In figure 1, each entry is in its own cell, but we often see a precision estimator (SE, SD, IQR, CI) placed to the right of the point estimate, each with its own set of explanatory symbols: parentheses, hyphens and the ‘±’ sign (figure 2). This creates clutter that degrades readability and obscures the message. The third problem is that ranges, which contain two numbers, are not straightforward to align. For all these problems, the solution most frequently applied is simply to left align or centre the data in each column. This creates columns that look the same without improving readability (figure 2).

Figure 2 Example of a recent table from Heart.13 The table suffers from poor readability due to multi-item cell entries, simple left alignment, suboptimal headings and overly high precision.

I have been working on the alignment problem for quite a while, and offer several possible solutions, none of them perfect:

1. Placing each element in its own column, as the authors of the example table have done (figure 1). This makes alignment easier and allows deletion of non-data ink (the parentheses as separator), but only works well if the cell elements are the same down the column. Also, the alignment of ranges remains a problem if the type of precision estimator is not constant down the column.
2. Fully disentangling all elements and aligning each one properly and separately (figure 3). This works, but can create large gaps when the precision of the numbers in ranges differs a lot.
3. Placing precision estimators below the point estimate (figure 4). This improves horizontal comparisons by removing the intervening precision numbers, but increases the vertical size of the table.
4. Decimally aligning the point estimate, inserting white space and then left aligning the precision estimate (figure 5). For long tables, this may be the best option within current practice.
5. A new design: precision estimators comprising a single number (eg, SD) remain on the right of the point estimate, but those comprising two numbers (range, CI) are placed on either side of it, in a smaller font (figure 6). This results in better horizontal comparison, places the point estimate appropriately between the range limits and increases its prominence. Table length is unaltered. Disadvantages are the longer design time and the fact that readers and editors are unaccustomed to this format.

Figure 3 Figure 2 data (excerpt): improved headers and precision; a single item per cell; proper alignment. Decimal points were changed to commas because of software localisation issues: the decimal separator character is a comma in the Dutch localisation of the word processor software.

Figure 4 Figure 2 data: precision estimators below the point estimate. Better horizontal comparison, but the table becomes longer.

Figure 5 Figure 2 data: point estimates decimally aligned; precision estimators left aligned. In this case, there is little change from figure 3, but the table is slightly less wide overall.

Figure 6 Figure 2 data: novel design. The lower limit of the range is placed on the left of the point estimate. To de-emphasise the precision estimators, they are produced in a smaller font and offset slightly below the point estimates. Point estimate columns are emphasised through extra shading.
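The approach of figure 5 (decimally aligning the point estimate, inserting white space and then left aligning the precision estimate) can be sketched for a plain-text column. The sketch assumes that each cell starts with the point estimate, written with a decimal point, optionally followed by a precision estimate such as ‘(1.1–5.6)’ or ‘± 1.2’; the function names and example cells are hypothetical. Because alignment is on the decimal point, a leading minus sign simply belongs to the integer part and never acts as the alignment character.

```python
import re
from typing import List, Tuple

# Assumed cell format: point estimate first, optionally followed by a
# precision estimate such as "(1.1–5.6)" or "± 1.2".
CELL = re.compile(r"^\s*(?P<point>-?\d+(?:\.\d+)?)\s*(?P<rest>.*)$")


def split_cell(cell: str) -> Tuple[str, str]:
    """Split a multi-item cell into point estimate and precision estimate."""
    m = CELL.match(cell)
    if m is None:
        raise ValueError(f"no leading number in {cell!r}")
    return m.group("point"), m.group("rest").strip()


def decimal_align(numbers: List[str]) -> List[str]:
    """Pad number strings so that their decimal points line up."""
    parts = [n.split(".", 1) for n in numbers]
    int_w = max(len(p[0]) for p in parts)
    frac_w = max(len(p[1]) if len(p) == 2 else 0 for p in parts)
    aligned = []
    for p in parts:
        whole = p[0].rjust(int_w)  # a minus sign stays with the integer part
        if len(p) == 2:
            frac = "." + p[1].ljust(frac_w)
        else:
            # Integers get blank space where the point and decimals would be.
            frac = " " * (frac_w + 1) if frac_w else ""
        aligned.append(whole + frac)
    return aligned


def figure5_column(cells: List[str], gap: int = 2) -> List[str]:
    """Decimally align point estimates, then left align precision estimates."""
    points, precisions = zip(*(split_cell(c) for c in cells))
    aligned = decimal_align(list(points))
    return [(a + " " * gap + p).rstrip() for a, p in zip(aligned, precisions)]


# Hypothetical cells mixing CIs and an SD, as often seen in one column.
for row in figure5_column(["2.5 (1.1–5.6)", "12.34 (8.0–19.1)", "-0.8 ± 1.2", "7"]):
    print(row)
```

In the printed column, the decimal points share one vertical position and every precision estimate starts in a common column, whatever its type. Smaller fonts and shading, as in figure 6, need a typesetting system and are outside the scope of this plain-text sketch.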