This task relates to a sequence of assessments that will be repeated across Chapters 6, 7, 8, 9 and 10 (pdf sent after request). Select any example of a visualisation or infographic, either your own work or that of others. The task is to undertake a deep, detailed, ‘forensic’-like assessment of the design choices made across each of the five layers of the chosen visualisation’s anatomy. In each case your assessment is only concerned with one design layer at a time.
For this task, take a close look at the data representation choices:
- Start by identifying all the charts and their types
- How suitable do you think the chart type choice(s) are to display the data? If they are not, what do you think they should have been?
- Are the marks and, especially, the attributes appropriately assigned and accurately portrayed?
- Go through the set of ‘Influencing factors’ from the latter section of the book’s chapter to help shape your assessment and to possibly inform how you might tackle this design layer differently
- Are there any data values/statistics presented in table/raw form that maybe could have benefited from a more visual representation?
Part C Developing Your Design Solution
The Production Cycle
Within the four stages of the design workflow there are two distinct parts.
The first three stages, as presented in Part B of this book, were described
as ‘The Hidden Thinking’ stages, as they are concerned with undertaking
the crucial behind-the-scenes preparatory work. You may have completed
them in terms of working through the book’s contents, but in visualisation
projects they will continue to command your attention, even if that is
reduced to a background concern.
You have now reached the second distinct part of the workflow which
involves developing your design solution. This stage follows a production
cycle, commencing with rationalising design ideas and moving through to
the development of a final solution.
The term cycle is appropriate to describe this stage as there are many loops
of iteration as you evolve rapidly between conceptual, practical and
technical thinking. The inevitability of this iterative cycle is, in large part,
again due to the nature of this pursuit being more about optimisation rather
than an expectation of achieving that elusive notion of perfection. Trade-
offs, compromises, and restrictions are omnipresent as you juggle ambition
and necessary pragmatism.
How you undertake this stage will differ considerably depending on the
nature of your task. The creation of a relatively simple, single chart to be
slotted into a report probably will not require the same rigour of a formal
production cycle that the development of a vast interactive visualisation to
be used by the public would demand. This is merely an outline of the most
you will need to do – you should edit, adapt and apply the steps to fit
with your context.
There are several discrete steps involved in this production cycle:
Conceiving ideas across the five layers of visualisation design.
Wireframing and storyboarding designs.
Developing prototypes or mock-up versions.
Testing.
Refining and completing.
Launching the solution.
Naturally, the specific approach for developing your design solution (from
prototyping through to launching) will vary hugely, depending particularly
on your skills and resources: it might be an Excel chart, or a Tableau
dashboard, an infographic created using Adobe Illustrator, or a web-based
interactive built with the D3.js library. As I have explained in the book’s
introduction, I’m not going to attempt to cover the myriad ways of
implementing a solution; that would be impossible to achieve as each task
and tool would require different instructions.
For the scope of this book, I am focusing on taking you through the first
two steps of this cycle – conceiving ideas and wireframing/storyboarding.
There are parallels here with the distinctions between architecture (design)
and engineering (execution) – I’m effectively chaperoning you through to
the conclusion of your design thinking.
To fulfil this, Part C presents a detailed breakdown of the many design
options you will face when conceiving your visualisation design and
provides you with an appreciation of the key factors that will influence the
actual choices you make. The next few chapters are therefore concerned
with the design thinking involved with each of these five layers of the
visualisation design anatomy, namely:
Chapter 6: Data representation
Chapter 7: Interactivity
Chapter 8: Annotation
Chapter 9: Colour
Chapter 10: Composition
The sequencing of these layers is deliberate, based on the need to prioritise
your attention: what will be included and how will it appear. Initially, you
will need to make decisions about what choices to make around data
representation (charts), interactivity and annotation. These are the layers
that result in visible design content or features being included in your
work. You will then complete your design thinking by making decisions
about the appearance of these visible components, considering their colour
and composition.
Conceiving: This will cover all your initial thinking across the various
layers of design covered in the next few chapters. The focus here is on
conceiving ideas based on the design options that seem to fit best with the
preparatory thinking that has gone before during the first three stages. As
you fine-tune your emerging design choices the benefit of sketching re-
emerges, helping you articulate your thoughts into a rough visual form. As
mentioned in Chapter 3, for some people the best approach involves
sketching with the pen, for others it is best expressed through the medium
of technical fluency. Whichever approach suits you best, it is helpful to
start to translate your conceptual thinking into visual thinking, particularly
when collaborating. This sketching might build on your instinctive
sketched concepts from stage 1, but you should now be far better informed
about the realities of your challenge to determine what is relevant and
feasible.
‘I tend to keep referring back to the original brief (even if it’s a brief I’ve
made myself) to keep checking that the concepts I’m creating tick all the
right boxes. Or sometimes I get excited about an idea but if I talk about it
to friends and it’s hard to describe effectively then I know that the concept
isn’t clear enough. Sometimes just sleeping on it is all it takes to separate
the good from the bad! Having an established workflow is important to
me, as it helps me cover all the bases of a project, and feel confident that
my concept has a sound logic.’ Stefanie Posavec, Information Designer
Wireframing and storyboarding: Wireframing involves creating a low-
fidelity illustration of the potential layout for those solutions that will
generally occupy a single page of space, such as a simple interactive
visualisation or an infographic. There is no need to be too precise just yet,
you are simply mapping out what will be on your page/screen (charts,
annotations), how they will be arranged and what things (interactive
functions) it will do. If your project is going to require a deeper
architecture, like a complex interactive, or will comprise sequenced views,
like presentations, reports or animated graphics, each individual wireframe
view will be woven together using a technique called storyboarding. This
maps out the relationships between all the views of your content to form
an overall visual structure. Sometimes you might approach things the other
way round, beginning with a high-level storyboard to provide a skeleton
structure within which you can then form your more detailed thinking
about the specific wireframe layouts within each page or view.
Prototypes/mock-ups: Whereas wireframing and storyboarding are
characterised by the creation of low-fi ‘blueprints’, the development of
mock-ups (for example, Figure C.1) or prototypes (the terms tend to be
used interchangeably) involves advancing your decisions about the content
and appearance of your proposed solution. This effectively leads to the
development of a first working version that offers a reasonably close
representation of what the finished product might look like.
Figure C.1 Mockup designs for ‘Poppy Field’
Testing: Once you have an established prototype version, you must then
seek to have it tested. Firstly, you do this ‘internally’ (i.e. by you or by
collaborators/colleagues) to help iron out any obvious immediate
problems. In software development parlance, this would be generally
consistent with alpha testing. Naturally, beta follows alpha and this is
where you will seek others to test it, evaluate it, and give feedback on it. This
happens regardless of the output format; it doesn’t need to be a digital,
interactive project to merit being tested. There will naturally be many
different aspects to your proposed solution that will need checking and
evaluating. The three principles of good visualisation design that I
presented earlier offer a sensible high-level structure to guide this testing:
Trustworthy design testing concerns assessing the reliability of the
work, in terms of the integrity of its content and performance. Are
there any inaccuracies, mistakes or even deceptions? Are there any
design choices that could lead to misunderstandings? Any aspects in
how the data has been calculated or counted that could undermine
trust? If it is a digital solution, what is the speed of loading and are
there any technical bugs or errors? Is it suitably responsive and
adaptable in its use across different platforms? Try out various user
scenarios: multiple and concurrent users, real-time data, all data vs
sample data, etc. Ask the people testing your solution to try to break it
so you can find and resolve any problems now.
Accessible design testing relates to how intuitive or sufficiently well
explained the work is. Do they understand how to read it and what all
the encodings mean? Is the viewer provided with a sufficient level of
assistance that would be required as per the characteristics of the
intended audience? Can testers find the answers to the questions you
intended them to find and quickly enough? Can they find answers to
the questions they think are most relevant?
Elegant design testing relates to questions such as: Is the solution
suitably appealing in design? Are there any features which are
redundant or superfluous design choices that are impeding the process
of using the solution?
Who you invite to test your work will vary considerably from one project
to the next, but generally there are several groups of people you might
consider inviting to participate in this task:
Stakeholders: the ultimate customers/clients/colleagues who have
commissioned the work may need to be included in this stage, if not
for full testing then at least to engage them in receiving initial concept
feedback.
Recipients: you might choose a small sample of your target audience
and invite those viewers to take part in initial beta testing.
Critical friends: peers/team/colleagues with suitable knowledge and
appreciation about the design process may offer a more sophisticated
capacity to test out your work.
You: sometimes (often) it may ultimately be down to you to undertake
the testing, through either lack of access to other people or most
typically a simple lack of time. To accomplish this effectively you
have to find a way almost to detach yourself from the mindset of the
creator and occupy that of the viewer: you need to see the wood and
the trees.
The timing of when to seek feedback through testing/evaluation will vary
across different contexts again. Sometimes the pressure from stakeholders
who request to see progress will determine this. Otherwise, you will need
to judge carefully the right moment to do so. You don’t want to get
feedback when it is too late to change or when you have invested too much effort
creating a prototype that might require widespread changes in approach.
Likewise, it can be risky showing far-too-undercooked concepts to
stakeholders or testers when they might not have the capacity to realise
this is just an early indication of the direction of travel. The least valuable
form of testing feedback is when pedantic stakeholders spend time
pointing out minutiae that of course need correcting but have no
significance at this stage. No-one comes away with anything of value from
this kind of situation.
‘We can kid ourselves that we are successful in what we “want” to
achieve, but ultimately an external and critical audience is essential.
Feedback comes in many forms; I seek it, listen to it, sniff it, touch it, taste
it and respond.’ Kate McLean, Smellscape Mapper and Senior
Lecturer Graphic Design
Refining and completing: The outcome of your testing process will likely
trigger a need to revisit some of the issues that have emerged and resolve
them satisfactorily. Editing your work involves:
correcting issues;
stripping away the superfluous content;
checking and enhancing preserved content;
adding extra degrees of sophistication to every layer of your design;
improving the consistency and cohesion of your choices;
double-checking the accuracy of every component.
As your work heads towards a state of completion your mindset will need
to shift from micro-level checking back to a macro-level assessment of
whether you have truly delivered against the contextual requirements and
purpose of your project.
In any creative process a visualiser is faced with having to declare work as
being complete. Judging this can be quite a tough call to make in many
projects. As I have discussed plenty of times, your sense of ‘finished’ often
needs to be based on when you have reached the status of good enough.
While the presence of a looming deadline (and at times increasingly
agitated stakeholders) will sharpen the focus, often it comes down to a
fingertip sense of when you feel you are entering the period of diminishing
returns, when the refinements you make no longer add sufficient value for
the amount of effort you invest in making them.
‘You know you’ve achieved perfection in design, not when you have
nothing more to add, but when you have nothing more to take away.’
Antoine de Saint-Exupéry, Writer, Poet, Aristocrat, Journalist, and
Pioneering Aviator
‘Admit that nothing you create on a deadline will be perfect. However, it
should never be wrong. I try to work by a motto my editor likes to say: No
Heroics. Your code may not be beautiful, but if it works, it’s good enough.
A visualisation may not have every feature you could possibly want, but if
it gets the message across and is useful to people, it’s good enough. Being
“good enough” is not an insult in journalism – it’s a necessity.’ Lena
Groeger, Science Journalist, Designer and Developer at ProPublica
‘It was intimidating to release to the public a self-initiated project on such
a delicate subject considering some limitation with content and data
source. But I came to appreciate that it’s OK to offer a relevant way of
looking at the subject, rather than provide a beginning-to-end conclusion.’
Valentina D’Efilippo, Information Designer, discussing her ‘Poppy
Field’ project that looked at the history of world conflicts and the
resulting loss of life
Launching: The nature of launching work will again vary significantly
based, as always, on the context of your challenge. It may simply be
emailing a chart to a colleague or you might be presenting your work to an
audience. For other cases it could be a graphic going to print for a
newspaper or involve an anxious go-live moment with the launch of a
digital project on a website, to much fanfare and public anticipation.
Whatever the context of your ‘launch’ stage, there are a few characteristic
matters to bear in mind – these will not be relevant to all situations but
over time you might need to consider their implications for your setting:
Are you ready? Regardless of the scope of your work, as soon as you
declare work completed and published you are at the mercy of your
decisions. You are no longer in control of how people will interpret
your work and in what way they will truly use it. If you have
particularly large, diverse and potentially emotive subject matter, you
will need to be ready for the questions and scrutiny that might head in
your direction.
Communicating your work is a big deal. The need to publicise and
sell its benefits is of particular relevance if you have a public-facing
project (you might promote it strongly or leave it as a slow burner
that spreads through ‘word of mouth’). For more modest and personal
audiences you might need to consider directly presenting your work
to these groups, coaching them through what it offers. This is
particularly necessary on those occasions when you may be using a
less than familiar representation approach.
What ongoing commitment exists to support the work? This clearly
refers to specific digital projects. Do you have to maintain a live data
feed? Will it need to sustain operations with variable concurrent
visitors? What happens if it goes viral – have you got the necessary
infrastructure? Have you got the luxury of ongoing access to the skill
sets required to keep this project alive and thriving?
Will you need to revise, update and rerelease the project? As I
discussed in the contextual circumstances, will you need to replicate
this work on a repeated basis? What can you do to make the
reproduction as seamless as possible?
What is the work’s likely shelf life? Does it have a point of expiry
after which it could be archived or even killed? How might you
digitally preserve it beyond its useful lifespan?
6 Data Representation
In this chapter you will explore in detail the first, and arguably the most
significant, layer of the visualisation design anatomy: data representation.
This is concerned with deciding in what visual form you wish to show
your data.
To really get under the skin of data representation, we are going to look at
it from both theoretical and pragmatic perspectives. You will start by
learning about the building blocks of visual encoding, the real essence of
this discipline and something that underpins all data representation
thinking. Whereas visual encoding is perhaps seen as the purist ‘bottom-
up’ viewpoint, the ‘top-down’ perspective possibly offers more pragmatic
value by framing your data representation thinking around the notion of
chart types. For most people facing up to this stage of data representation,
this is conceptually the more practical entry point from which to shape
their decisions.
To substantiate your understanding of this design layer you will take a tour
through a gallery of 49 different chart type options, reflecting the many
common and useful techniques being used to portray data visually in the
field today. This gallery will then be supplemented by an overview of the
key influencing factors that will inform and determine the choices you
make.
6.1 Introducing Visual Encoding
As introduced in the opening chapter, data representation is the act of
giving visual form to your data. As viewers, when we are perceiving a
visual display of data we are decoding the various shapes, sizes, positions
and colours to form an understanding of the quantitative and categorical
values represented. As visualisers, we are doing the reverse through visual
encoding, assigning visual properties to data values. Visual encoding
forms the basis of any chart or map-based data representation, along with
the components of chart apparatus that help complete the chart display.
There are many different ways of encoding data but these always comprise
combinations of two different properties, namely marks and attributes.
Marks are visible features like dots, lines and areas. An individual mark
can represent a record or instance of data (e.g. your phone bill for a given
month). A mark can also represent an aggregation of records or instances
(e.g. a summation of individual phone charges to produce the bill for a
given month). A set of marks would therefore represent a set of records or
instances (e.g. the 12 monthly phone bills for 2015).
Attributes are variations applied to the appearance of marks, such as the
size, position, or colour. They are used to represent the values held by
different quantitative or categorical variables against each record or
instance (or, indeed, each aggregation). If you had 12 marks, one for each
phone bill during 2015, you could use the size attribute of each mark to
represent the various phone bill totals.
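To make the distinction concrete, here is a minimal Python sketch of the phone bill example. The `Mark` class, the `bill_total` and `encode_bills` helpers and all of the figures are hypothetical, chosen purely for illustration, and only three months are shown for brevity. Each monthly bill is treated as an aggregation of individual charges, and the size attribute of each mark is then set in proportion to the bill total.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    """One visible mark (think of a bar) standing for one record or aggregation."""
    label: str   # which record or aggregation the mark stands for
    size: float  # the 'size' attribute, in arbitrary drawing units


def bill_total(charges):
    """Aggregate individual phone charges into a single monthly bill."""
    return sum(charges)


def encode_bills(monthly_bills, max_size=100.0):
    """Create one mark per monthly bill, sized in proportion to its total."""
    largest = max(monthly_bills.values())
    return [
        Mark(label=month, size=max_size * total / largest)
        for month, total in monthly_bills.items()
    ]


# Hypothetical charges, for illustration only.
bills = {
    "Jan": bill_total([12.0, 8.0]),
    "Feb": bill_total([30.0, 10.0]),
    "Mar": bill_total([10.0]),
}
marks = encode_bills(bills)
```

The set of marks mirrors the set of records: three bills in, three marks out, with the largest bill given the largest size.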
Figure 6.1 offers a more visual illustration. In the dataset there are six
records, one for each actor listed. ‘Gender’ is a categorical variable and
‘Years Since First Movie’ is a quantitative variable. ‘Male’ and ‘43’ are
the specific values of these variables associated with Harrison Ford. In the
associated chart, each actor from the table is represented by the mark of a
line (or bar). This represents their record or instance in the table. Harrison
Ford’s bar is proportionally sized in scale to represent the 43 years since
his first movie and is coloured purple to distinguish his gender as ‘Male’.
Each of the five other actors similarly has a bar sized according to the
years since their first movie and coloured according to their gender.
Figure 6.1 Illustration of Visual Encoding
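The encoding described for Figure 6.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of how the figure was actually produced: the function name `encode_actor`, the `units_per_year` scale and the colour assigned to ‘Female’ are all hypothetical. Only Harrison Ford’s values and the purple colour for ‘Male’ come from the description above.

```python
def encode_actor(record, units_per_year=5, colours=None):
    """Map one record to a bar: quantitative value -> length, category -> colour."""
    # 'purple' for Male follows the text; the Female colour is an assumption.
    colours = colours or {"Male": "purple", "Female": "orange"}
    return {
        "mark": "bar",
        "label": record["name"],
        # Size attribute: bar length proportional to the quantitative variable.
        "length": record["years_since_first_movie"] * units_per_year,
        # Colour attribute: one colour per value of the categorical variable.
        "colour": colours[record["gender"]],
    }


ford = {"name": "Harrison Ford", "gender": "Male", "years_since_first_movie": 43}
bar = encode_actor(ford)
```

The same function applied to each of the six records would yield six bars, which is the sense in which a set of marks represents a set of records.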
The objective of visual encoding is to find the right blend of marks and
attributes that most effectively will portray the angle of analysis you wish
to show your viewers. The factors that shape your choice and define the
notion of what is considered ‘effective’ are multiple and varied in their
influence. Before getting on to those, let’s take a closer look at the range of
different marks and attributes that are commonly found in the data
representation toolkit.
It is worth noting upfront that while the way the attributes, in particular,
are organised suggests a primary role for each, several can be deployed to
encode both categorical (nominal, ordinal) variables and quantitative variables.
Furthermore, as you see in the bar chart in Figure 6.1, combinations of
several attributes are often applied to marks (such as colour and size) to
encode multiple values.
Although beyond the scope of this book, there are techniques being
developed in the field exploring the use of non-visual senses to portray
data, using variations in properties for auditory (sound), haptic (touch),
gustatory (taste) and olfactory (smell) senses.
Figure 6.2 List of Mark Encodings
Figure 6.3 List of Attribute Encodings
Grasping the basics of visual encoding and its role in data visualisation is
one of the fundamental pillars of understanding this discipline. However,
when it comes to the reality of considering your data representation
options you do not necessarily need to always approach things from this
somewhat bottom-up perspective. For most people’s needs when creating a
data visualisation it is more pragmatic (and perhaps more comprehensible)
to think about data representation from a top-down perspective in the
shape of chart types.
If marks and attributes are the ingredients, a chart ‘type’ is the recipe
offering a predefined template for displaying data. Different chart types
offer different ways of representing data, each one comprising unique
combinations of marks and attributes onto which specific types of data can
be mapped.
Recall that I am using chart type as the all-encompassing term, though
this is merely a convenient singular label to cover any variation of map,
graph, plot and diagram based around the representation of data.
Let’s work through a few examples to illustrate the relationship between
some selected chart types demonstrating different combinations of marks
and attributes.
To begin with, Figure 6.4 visualises the recent fortunes of the world’s
billionaires. The display shows the relative ranking of each profiled
billionaire in the rich list, grouping them by the different sectors of
industry in which they have developed their wealth. This data is encoded
using the point mark and two attributes of position. The point in this
deployment is depicted using small caricature face drawings representative
of each individual – effectively unique symbols to represent the distinct
‘category’ of each different billionaire. Note that these are points, as
distinct from area marks, because their size is constant and insignificant in
terms of any quantitative implication. The position in the allocated column
signifies the industry the individuals are associated with, while the vertical
position signifies the rank (higher position = higher rank towards number
1).
For reference, this is considered a derivative of the univariate scatter
plot, which usually shows the dispersal of a range of absolute values
rather than rank.
Figure 6.4 Bloomberg Billionaires
As seen in Chapter 1, the clustered bar chart in Figure 6.5 displays a series
of line marks (normally described as bars). There are 11 pairs of bars, one
for each of the football seasons included in the aggregated analysis. The
attribute of colour is used to distinguish the bars between the two
quantitative measures displayed: blue is for ‘games’, purple is for ‘goals’.
The size dimension of ‘height’ (the widths are constant) along the y-axis
scale then represents the quantitative values associated with each season
and each measure.
Figure 6.6 is called a bubble chart and displays a series of geometric area
marks to represent the top 100 blog posts on my website based on their
popularity over the previous 100 days. Each circle represents an individual
post and is sized to show the quantitative value of ‘total visits’ and then
coloured according to the seven different post categories I use to organise
my content.
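One practical detail of sizing area marks is worth a short sketch. Because the circle is an area mark, a common convention is to make the area, not the radius, proportional to the value; the passage above does not spell out how the bubbles were scaled, so treat this as an assumption. The function name `bubble_radius` and the `max_radius` parameter are hypothetical.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Return a radius whose circle AREA is proportional to value.

    Area grows with the square of the radius, so the radius must grow
    with the square root of the value.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


# Hypothetical 'total visits' figures, for illustration only.
largest = bubble_radius(100, 100)
quarter = bubble_radius(25, 100)
area_ratio = (quarter / largest) ** 2
```

A post with a quarter of the visits gets half the radius, so its circle covers a quarter of the area. Scaling the radius directly would instead exaggerate the differences between values.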
Figure 6.5 Lionel Messi: Games and Goals for FC Barcelona
Figure 6.6 Image from the home page of visualisingdata.com
Figure 6.7 How the Insane Amount of Rain in Texas Could Turn Rhode
Island Into a Lake
Figure 6.7 demonstrates the use of the form mark, which is more rarely used. My
advice is that it should remain that way as it is hard for us to judge scales
of volume in 2D displays. However, it can be of merit when values are
extremely diverse in size as in this good example. The chart displayed
contextualises the amount of water that had flowed into Texas reservoirs in
the 30 days up to 27 May 2015. The size (volume) of a cube is used to
display the amount of rain, with 8000 small cubes, each representing 1000
acre-feet of water (43,560,000 cubic feet or 1233.5 megalitres), combining
to create the whole (8 million acre-feet), which is then compared against
the heights of
the Statue of Liberty and what was then the world’s tallest building, the
Burj Khalifa, to orient in height terms at least.
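The arithmetic behind these figures can be checked with a short Python sketch. The conversion factors (43,560 cubic feet per acre-foot and 0.3048 metres per foot) are standard definitions. The idea of stacking the 8000 small cubes 20 to an edge is an inference from 8000 = 20 × 20 × 20 rather than something stated above, and the variable names are illustrative.

```python
CUBIC_FEET_PER_ACRE_FOOT = 43_560          # one acre (43,560 sq ft), one foot deep
CUBIC_METRES_PER_CUBIC_FOOT = 0.3048 ** 3  # 1 ft is defined as 0.3048 m
CUBIC_METRES_PER_MEGALITRE = 1_000

small_cube_acre_feet = 1_000
small_cube_count = 8_000

# Volume of one small cube in the units quoted in the text.
small_cube_cubic_feet = small_cube_acre_feet * CUBIC_FEET_PER_ACRE_FOOT
small_cube_megalitres = (
    small_cube_cubic_feet * CUBIC_METRES_PER_CUBIC_FOOT / CUBIC_METRES_PER_MEGALITRE
)

# The whole: 8000 small cubes of 1000 acre-feet each.
total_acre_feet = small_cube_acre_feet * small_cube_count

# Inference: 8000 small cubes fill a large cube 20 small cubes along each edge.
cubes_per_edge = round(small_cube_count ** (1 / 3))
small_cube_edge_feet = small_cube_cubic_feet ** (1 / 3)
whole_cube_edge_feet = cubes_per_edge * small_cube_edge_feet

print(f"{small_cube_cubic_feet:,} cubic feet per small cube")
print(f"{small_cube_megalitres:.1f} megalitres per small cube")
print(f"{total_acre_feet:,} acre-feet in total")
print(f"about {whole_cube_edge_feet:,.0f} feet along each edge of the whole")
```

The edge length is the figure that matters for the height comparison, which is why the chart can only orient the reader ‘in height terms at least’: a cube’s height grows with the cube root of its volume, so very large differences in volume show up as much smaller differences in height.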
6.2 Chart Types
For many people, creating a visualisation involves using tools that offer
chart menus: you might select a chart type and then ‘map’ the records and
variables of data against the marks and attributes offered by that particular
chart type. Different tools will offer the opportunity to work with a
different range of chart types, some with more than others.
As you develop your capabilities in data visualisation and become more
‘expressive’ – trying out unique combinations of marks and attributes –
your approach might lean more towards thinking about representation
from a bottom-up perspective, considering the visual encodings you wish
to deploy and arriving at a particular chart type as the destination rather
than an origin. This will be especially likely if you develop or possess a
talent for creating visualisations through programming languages.
As the field has matured over the years, and a greater number of
practitioners have been experimenting with different recipes of marks and
attributes, there is now a broad range of established chart types. Once
again I hesitate to use the universal label of chart type (some mapping
techniques are not chart types per se) but it will suffice. While all of us are
likely to be familiar with the ‘classic three’ – namely, the bar, pie and line
chart – there are many other chart type options to consider.
To acquaint you with a broader repertoire of charting options, over the
coming pages I present you with a gallery. This offers a curated collection
of some of the common and useful chart types being used across the field
today. This gallery aims to provide you with a valuable reference that will
directly assist your judgements, helping you to pick (conceptually, at least)
from a menu of options.
I have attempted to assign each chart to one of five main families based on
their primary analytical purpose. What type of angle of analysis does each
one principally show? Using the five-letter mnemonic CHRTS this should
provide a useful taxonomy for organising your thinking about which chart
or charts to use for your data representation needs.
I know what you’re thinking: ‘well that’s a suspiciously convenient
acronym’! Honestly, if it was as intentional as that I would have tried
harder to somehow crowbar in an ‘A’ family. OK, I did spend a lot of
time, but I couldn’t find it and it’s now my life’s ambition to do so.
Only then will my time on this planet have been truly worthwhile. In the
meantime, CHRTS is close enough. Besides, vowels are hugely
overrated.
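For readers who like to organise things in code, the CHRTS taxonomy can be sketched as a small Python enumeration. The family names used here (Categorical, Hierarchical, Relational, Temporal, Spatial) are the ones the mnemonic stands for. The class name `ChartFamily` and the lookup of the ‘classic three’ are illustrative, with each chart listed under the family it is profiled in within the gallery.

```python
from enum import Enum


class ChartFamily(Enum):
    """The five CHRTS families, keyed by primary analytical purpose."""
    CATEGORICAL = "C"
    HIERARCHICAL = "H"
    RELATIONAL = "R"
    TEMPORAL = "T"
    SPATIAL = "S"


# Enum members iterate in definition order, which spells out the mnemonic.
mnemonic = "".join(family.value for family in ChartFamily)

# Illustrative lookup for the 'classic three' chart types.
classic_three = {
    "bar chart": ChartFamily.CATEGORICAL,
    "pie chart": ChartFamily.HIERARCHICAL,
    "line chart": ChartFamily.TEMPORAL,
}
```

A lookup like this records only the primary family. It would need to hold a list of families per chart to accommodate charts that sit in more than one.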
Each chart type presented is accompanied by an array of supporting details
that will help you fully acquaint yourself with the role and characteristics
of each option.
A few further comments about what this gallery provides:
The primary name used to label each chart type as well as some
further alternative names that are often used
An indication of which CHRTS family each chart belongs to, based
on their specific primary role, as well as a sub-family definition for
further classification
An indicator for each chart type to show which ones I consider to be
most useful for undertaking Exploratory Data Analysis (the black
magnifying glass symbol)
An indicator for whether I believe a chart would typically require
interactive features to offer optimum usability (the black cursor
symbol)
A description of the chart’s representation: what it shows and what
encodings (marks, attributes) it is comprised of
A working example of the chart type in use with a description of what
it specifically shows
A ‘how to read’ guide, advising on the most effective and efficient
approach to making sense of each chart type and what features to look
out for
Presentation tips offering guidance on some of the specific choices to
be considered around interactivity, annotation, colour or composition
design
‘Variations and alternatives’ offer further derivatives and chart
‘siblings’ to consider for different purposes
Exclusions: It is by no means an exhaustive list: the vast
permutations of different marks and attributes prevent any finite
limit to how one might portray data visually. I have, however,
consciously excluded some chart types from the gallery mainly
because they were not different enough from other charts that have
been profiled in detail. I have mentioned charts that represent
legitimate derivatives of other charts where necessary but simply did
not deem it worthy to assign a whole page to profile them separately.
The Voronoi treemap, for example, is really just a circular treemap
that uses different algorithms to arrange its constituent pieces. While
the construction task is different, its usage is not. The waterfall chart
is a single stacked bar chart broken down into sequenced stages.
Inclusions: I have wrestled with the rights and wrongs of including
some chart types, unquestionably. The radar chart, for example, has
many limitations and flaws but is not entirely without merit if
deployed in a very specific way and only for certain contexts. By
including profiles of partially flawed charts like these I am using the
gallery as much to signpost their shortcomings so that you know to
use them sparingly. There will be some purists gathering in angry
mobs and foaming at the mouth in reaction to the audacity of my
including the pie chart and word cloud. These have limited roles,
absolutely, but a role nonetheless. Put down your pitchforks, return to
your homes and have a good read of my caveats. Rather than being
the poacher of all bad stuff, I think a gamekeeper role is equally
important.
Although I have excluded several charts on grounds of demonstrating
only a slight variation on profiled charts, there are some types
included that do exhibit only small derivations from other charts
(such as the bar chart and the clustered bar, or the scatter plot and the
bubble plot). In these cases I felt there was sufficient difference in
their practical application, and they were in common usage, to merit
their separate inclusion, despite sharing many similarities with other
profiled siblings.
‘Interestingly, visualisations of textual data are not as developed as one
would expect. There is a great need for such visualisations given the
amount of textual information we generate daily, from social media to
news media and so on, not to mention all the materials generated in the
past and that are now digitally available. There are opportunities to
contribute to the research efforts of humanists as well as social scientists
by devising ways to represent not only frequencies of words and topics,
but also semantic content. However, this is not at all trivial.’ Isabel
Meirelles, Professor, OCAD University (Toronto), discussing one of
the many remaining unknowns in visualisation
Categorical comparisons: All chart types can feasibly facilitate
comparisons between categories, so why have a separate C family?
Well, the distinction is that those charts belonging to the H, R, T and
S families offer an additional dimension of analysis as well as
providing comparison between categories.
Dual families: Some charts do not fit just into a single family.
Showing connected relationships (e.g. routes or flows) on a map is
ticking the requirements across at least two family groups
(Relational, Spatial). In each case I have tried to best-fit the family
classifications around the primary angle of analysis portrayed by each
chart – what is the most prominent aspect that characterises each
representation technique.
Text visualisation: As I noted in the discussion about data types,
when it comes to working with textual-based data you are almost
always going to need to perform some transformation, maybe through
value extraction or by applying a statistical technique. The text itself
can otherwise largely function only as an annotated device. Chart
types used to visualise text actually visualise the properties of text.
For example, the word cloud visualises the quantitative frequency of
the use of words: text might be the subject, but categories (words) and
their quantities (counts) are the data mappings. Varieties of network
diagrams might show the relationship between word usage, such as
the sequence of words used in sentences (word trees), but these are
still only made possible through some quantitative, categorical or
semantic property being drawn from the original text.
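
The transformation described above, from raw text to categories (words) and quantities (counts), can be sketched in a few lines of Python. This is only a minimal illustration under simple assumptions: the function name `word_frequencies`, the tokenising rule and the sample sentence are all hypothetical, and a real project would usually also remove stop words and normalise word forms.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n (word, count) pairs found in a piece of text.

    Hypothetical helper: the tokenising rule (lower-case runs of letters
    and apostrophes) is an assumption made for this sketch.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Made-up sample text, used only to show the shape of the output.
sample = "the cat sat on the mat and the dog sat too"
top_words = word_frequencies(sample, top_n=2)
print(top_words)
```

The resulting pairs are what a word cloud would encode: the word is the category and the count drives the size of each word.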
Dashboard: These methods are popular in corporate settings or any
context where you wish to create instrumentation that offers both at-
a-glance and detailed views of many different analytical and
information monitoring dimensions. Dashboards are not a unique
chart type themselves but rather should be considered projects that
comprise multiple chart types from across the repertoire of options
presented in the gallery. Some of the primary demands of designing
dashboards concern editorial thinking (what angles to show and why)
and composition choices (how to get it all presented in a unified page
layout).
Small multiples: This is an invaluable technique for visualising data
but not necessarily a chart type per se and, once again, more a
concern for editorial thinking and composition design. Small
multiples involve repeated display of the same chart type but with
adjustments to the framing of the data in each panel. For example,
each panel may show the same angle of analysis but for different
categories or different points in time. Small multiples are highly
valued because they exploit the capabilities of our visual perception
system when it comes to comparing charts in a simultaneous view,
overcoming our weakness at remembering and recalling chart views
when consumed through animated sequences or across different
pages.
A note about ‘storytelling’: Storytelling is an increasingly popular
term used around data visualisation but I feel it is often misused and
misunderstood, which is quite understandable as we all have different
perspectives. I also feel it is worth clarifying my take on what I
believe storytelling means practically in data visualisation and
especially in this discussion about data representation, which is where
it perhaps most logically resides in terms of how it is used.
Stories are constructs based on the essence of movement, change or
narrative. A line chart shows how a series of values have changed
over a temporal plane. A flow map can reveal what relationships exist
across a spatial plane between two points separated by distance – they
may be evidence of a journey. However, aside from the temporal and
spatial families of charts, I would argue that no other chart family
realistically offers this type of construct in and of itself.
The only way to create a story from other types of charts is to
incorporate a temporal dimension (video/slideshow) or provide a
verbal/written narrative that itself involves a dimension of time
through the sequence of its delivery.
For example, a bar chart alone does not represent a story, but if you
show a ‘before’ and ‘after’ pair of bar charts side by side or between
slides, you have essentially created ‘change’ through sequence. If you
show a bar chart with a stack on top of it to indicate growth between
two points in time, well, you have added a time dimension. A
network diagram shows relationships, but stood alone this is not a
story – its underlying structure and arrangement are in abstract space.
Just as you do when showing friends a photograph from your holiday,
you might use this chart as a prop to explain how relationships
between some of the different entities presented are significant.
Making the chart a prop allows you to provide a narrative. In this case
it is the setting and delivery that are consistent with the notion of
storytelling, not the chart itself. I made a similar observation about
the role of exhibitory visualisations used as props within explanatory
settings.
A further distinction to make is between stories as being presented
and stories as being interpreted. The famous six-word story ‘for sale:
baby shoes, never worn’ by Ernest Hemingway is not presented as a
story; the story is triggered in our mind when we dissect this passage
and start to infer meaning, implication and context. The imagined bar
chart I mentioned earlier in the book that could show the 43 white
presidents and 1 black president is only presenting a story if it is
accompanied by an explanatory narrative (in which case the chart was
again really just a prop) or if you understand the meaning of the
significance of this statistic without this description and are able to
form the story in your own mind.
Charts Comparisons
Bar chart
ALSO KNOWN AS Column chart, histogram (wrongly)
REPRESENTATION DESCRIPTION
A bar chart displays quantitative values for different categories. The
chart comprises line marks (bars) – not rectangular areas – with the size
attribute (length or height) used to represent the quantitative value for
each category.
EXAMPLE Comparing the number of Oscar nominations for the 10
actors who have received the most nominations without actually
winning an award.
Figure 6.8 The 10 Actors with the Most Oscar Nominations but No
Wins
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know with which categorical value each bar is
associated and what the range of the quantitative values is (min to max).
Think about what high and low values mean: is it ‘good’ to be large or
small? Glance across the entire chart to locate the big, small and
medium bars and perform global comparisons to establish the high-
level ranking of biggest > smallest. Identify any noticeable exceptions
and/or outliers. Perform local comparisons between neighbouring bars,
to identify larger than and smaller than relationships and estimate the
relative proportions. Estimate (or read, if labels are present) the absolute
values of specific bars of interest. Where available, compare the
quantities against annotated references such as targets, forecast, last
year, average, etc.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like tick marks and gridlines,
in particular, can be helpful to increase the accuracy of the reading of
the quantitative values. If you have axis labels you should not need
direct labels on each bar – this will lead to label overload, so generally
decide between one or the other.
COMPOSITION: The quantitative value axis should always start from
the origin value of zero: a bar should be representative of the true, full
quantitative value, nothing more, nothing less, otherwise the perception
of bar sizes will be distorted when comparing relative sizes. There is no
significant difference in perception between vertical or horizontal bars
though horizontal layouts tend to make it easier to accommodate and
read the category labels for each bar. Unlike the histogram, there should
be a gap, even if very small, between bars to keep each category’s value
distinct. Where possible, try to make the categorical sorting meaningful.
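
The point about the zero origin can be checked with some simple arithmetic. The sketch below is illustrative only: the function name `apparent_ratio` and the values 60, 50 and 40 are made up for the example. It compares the ratio of two drawn bar lengths when the axis starts at zero with the ratio when the axis is truncated.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b.

    Hypothetical helper: `baseline` is the value at which the
    quantitative axis starts (0 for an honest bar chart).
    """
    if min(a, b) <= baseline:
        raise ValueError("both values must be greater than the baseline")
    return (a - baseline) / (b - baseline)


# Made-up values: 60 is 20% larger than 50.
true_ratio = apparent_ratio(60, 50)                     # axis starts at zero
distorted_ratio = apparent_ratio(60, 50, baseline=40)   # axis truncated at 40
print(true_ratio, distorted_ratio)
```

With the truncated axis the first bar is drawn twice as long as the second, even though the underlying value is only a fifth larger.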
VARIATIONS & ALTERNATIVES
A variation in the use of bar charts is to show changes over time. You
would use a bar chart when the focus is on individual quantitative
values over time rather than (necessarily) the trend/change between
points, for which a line-chart would be best. ‘Spark bars’ are mini bar
charts that aim to occupy only a word’s length amount of space. They
are often seen in dashboards where space is at a premium and there is a
desire to optimise the density of the display. To show further
categorical subdivisions, you might consider the ‘clustered bar chart’ or
a ‘stacked bar chart’ if there is a part-to-whole angle. ‘Dot plots’ offer a
particularly useful alternative to the bar chart for situations where you
have to show large quantitative values with a narrow range of
differences.
Charts Comparisons
Clustered bar chart
ALSO KNOWN AS Clustered column chart, paired bar chart
REPRESENTATION DESCRIPTION
A clustered bar chart displays quantitative values for different major
categories with additional categorical dimensions included for further
breakdown. The chart comprises line marks (bars) – not rectangular
areas – with the size attribute (length or height) used to represent the
quantitative value for each category and colours used to distinguish
further categorical dimensions.
EXAMPLE Comparing the number of Oscar nominations with the
number of Oscar awards for the 10 actors who have received the most
nominations.
Figure 6.9 The 10 Actors who have Received the Most Oscar
Nominations
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know with which categorical value each bar is
associated and what the range of the quantitative values is (min to max).
Learn about the colour associations to understand what sub-categories
the bars within each cluster represent. Glance across the entire chart to
locate the big, small and medium bars and perform global comparisons
to establish the high-level ranking of biggest > smallest. Identify any
noticeable exceptions and/or outliers. Perform local comparisons within
clusters to identify the size relationship (which is larger and by how
much?) and estimate (or read, if labels are present) the absolute values
of specific bars of interest.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like tick marks and gridlines,
in particular, can be helpful to increase the accuracy of the reading of
the quantitative values. If you have axis labels you should not need
direct labels on each bar – this will lead to label overload, so generally
decide between one or the other.
COMPOSITION: The quantitative value axis should always start from
the origin value of zero: a bar should be representative of the true, full
quantitative value, nothing more, nothing less, otherwise the perception
of bar sizes will be distorted when comparing relative sizes. If your
categorical clusters involve a breakdown of more than three bars, it
becomes a little too busy, so you might therefore consider giving each
cluster its own separate bar chart and using small multiples to show a
chart for each major category. Sometimes one bar might be slightly
hidden behind the other, implying a before and after relationship, often
when space is at a premium – just do not hide too much of the back bar.
There is no significant difference in perception between vertical or
horizontal bars though horizontal layouts tend to make it easier to
accommodate and read the category labels for each bar. The individual
bars should be positioned adjacent to each other, with a noticeable gap
then left between each cluster, to help direct the eye towards the
clustering patterns first and foremost. Where possible try to make the
categorical sorting meaningful.
VARIATIONS & ALTERNATIVES
Clustered bar charts are also sometimes used to show how two
associated sub-categories have changed over time (like the Lionel
Messi bar chart discussed in Chapter 1). Alternatives would include the
‘dot plot’ or, if you have just two categories forming the clusters and
these categories have a binary state (male, female or yes %, no %), the
‘back-to-back bar chart’ would be effective.
Charts Comparisons
Dot plot
ALSO KNOWN AS Dot chart
REPRESENTATION DESCRIPTION
A dot plot displays quantitative values for different categories. In
contrast to the bar chart, rather than using the size of a bar, point marks
(typically circles but any ‘symbol’ is legitimate) are used with the
position along a scale indicating the quantitative value for each
category. Sometimes an area mark is used to indicate one value through
position and another value through size. Additional categorical
dimensions can be accommodated in the same chart by including
additional marks differentiated by colour or symbol.
EXAMPLE Comparing the number and percentage of PhDs awarded
by gender across different academic subjects.
Figure 6.10 How Nations Fare in PhDs by Sex
HOW TO READ IT & WHAT TO LOOK FOR
For single-series dot plots (i.e. just one dot per row), look at the axes so
you know with which categorical value each row is associated and what
the range of the quantitative values is (min to max). Where you have
multiple series dot plots (i.e. more than one dot), establish what the
different colours/symbols represent in terms of categorical breakdown.
Glance across the entire chart to locate the big, small and medium
values and perform global comparisons to establish the high-level
ranking of biggest > smallest. Identify any noticeable exceptions and/or
outliers. Where you have multiple series look across each series of dot
values separately and then perform local comparisons within rows to
identify the relative position of each dot, observing the gaps, big and
small. Estimate the absolute values of specific dots of interest. Where
available, compare the quantities against annotated references such as
targets, forecast, last year, average, etc.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like tick marks and gridlines,
in particular, can be helpful to increase the accuracy of the reading of
the quantitative values.
COMPOSITION: Given that the quantitative value axis does not need
to commence from a zero origin it is important to label clearly the axis
values when the baseline is not commencing from a minimum of zero.
There is no significant difference in perception between vertical or
horizontal arrangement though horizontal layouts tend to make it easier
to accommodate and read the category labels for each row. Where
possible try to make the categorical sorting meaningful, maybe
organising values in ascending/descending size order.
VARIATIONS & ALTERNATIVES
Alternatives would include the ‘bar chart’, to show the size of
quantitative values for different categories. The ‘connected dot plot’
would be used to focus on the difference between two measures. The
‘univariate scatter plot’ would be used to show the range of multiple
values across categories, to display the diversity and distribution of
values.
Charts Comparisons
Connected Dot Plot
ALSO KNOWN AS Barbell chart, dumb-bell chart
REPRESENTATION DESCRIPTION
A connected dot plot displays absolute quantities and quantitative
differences between two categorical dimensions for different major
categories. The display is formed by two points (normally circles but
any ‘symbol’ is legitimate) to mark the quantitative value positions for
two comparable categorical dimensions. There is a row of connected
dots for each major category. Colour or difference in symbol is
generally used to distinguish these points. Joining the two points
together is a connecting line which effectively represents the ‘delta’
(difference) between the two values.
EXAMPLE Comparing the typical salaries for women and men across
a range of different job categories in the US.
Figure 6.11 Gender Pay Gap US
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know with which major categorical values each
row is associated and what the range of the quantitative values is (min
to max). Determine which dots represent which categorical dimension
(could be colour, symbol or a combination) and see if there is any
meaning behind the colouring of the connecting bars. Think about what
the quantitative values mean to determine whether it is a good thing to
be higher or lower. Glance across the entire chart to locate the big,
small and medium connecting bars in each direction. Perform global
comparisons to establish the high-level ranking of biggest > smallest
differences as well as the highest and lowest values. There may be
deliberate sorting of the display based on one of the quantitative
measures. Identify any noticeable exceptions and/or outliers. Estimate
(or read, if labels are present) the absolute values, direction and size of
differences for specific categories of interest.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like tick marks and gridlines,
in particular, can be helpful to increase the accuracy of the reading of
the quantitative values. Consider labelling categories adjacent to the
plotted points rather than next to the axis line (and possibly far away
from the values) to make it easier for the reader to understand the
category–row association.
COLOUR TIPS: Colour may be used to indicate and emphasise the
directional basis of the connecting line differences.
COMPOSITION: If the two plotted measures are very similar, and the
point markers effectively overlap, you will need to decide which should
be positioned on top. As the representation of the quantitative values is
through position along a scale and not size (it is the difference that is
sized, not the absolutes) the quantitative axis does not need to have a
zero origin. However, a zero origin can be helpful to establish the scale
of the differences. Where possible try to make the sorting meaningful
using any one of the three quantitative measures to optimise the layout.
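
The three quantitative measures available for sorting are the two plotted values and the difference between them. As a rough sketch, assuming each row is supplied as a (category, first value, second value) tuple, the preparation step might look like this in Python. The function name `connected_dot_rows` and the sample figures are hypothetical.

```python
def connected_dot_rows(rows, sort_by="delta"):
    """Prepare rows for a connected dot plot.

    rows    : list of (category, value_a, value_b) tuples
    sort_by : "a", "b" or "delta" (the difference value_b - value_a)
    Returns a list of dicts sorted from largest to smallest on sort_by.
    """
    prepared = [
        {"category": c, "a": a, "b": b, "delta": b - a}
        for c, a, b in rows
    ]
    return sorted(prepared, key=lambda r: r[sort_by], reverse=True)


# Made-up figures, purely to illustrate the sorting options.
data = [("Role X", 40, 55), ("Role Y", 30, 32), ("Role Z", 50, 70)]
by_gap = [r["category"] for r in connected_dot_rows(data)]
by_first_value = [r["category"] for r in connected_dot_rows(data, sort_by="a")]
print(by_gap, by_first_value)
```

Sorting on the difference puts the largest gaps at the top, whereas sorting on either plotted value emphasises the absolute quantities.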
VARIATIONS & ALTERNATIVES
Variations in the use of the ‘connected dot plot’ would show before and
after analysis between two points in time, possibly using the ‘arrow
chart’ to indicate the direction of change explicitly. Similarly, the
‘carrot chart’ uses line width tapering to indicate direction, the fatter
end the more recent values. The ‘univariate scatter plot’ would be used
to show the range of multiple values across categories, to display the
diversity and distribution of values rather than comparing differences
between values.
Charts Comparisons
Pictogram
ALSO KNOWN AS Isotype chart, pictorial bar chart, stacked shape
chart, tally chart
REPRESENTATION DESCRIPTION
A pictogram displays quantitative values for different major categories
with additional categorical dimensions included for further breakdown.
In contrast with the bar chart, rather than using the size of a bar,
quantities of point marks, in the form of symbols or pictures, are
stacked to represent the quantitative value for each category. Each point
may be representative of one or many quantitative units (e.g. a single
shape may represent 1000 people) but note that, unless you use symbol
portions, you will not be able to represent decimals. Pictograms may be
used to offer a more emotive (humanising or more light-hearted)
display than a bar can offer. Additional categorical dimensions can be
accommodated in the same chart by using marks differentiated by
variations in colour, symbol or picture. Always ensure the markers used
are as intuitively recognisable as possible and consider minimising the
variety, as too many variants make it cognitively harder for the viewer to identify
associations easily and make sense of the quantities.
EXAMPLE Comparing the number of players with different facial hair
types across the four teams in the NHL playoffs in 2015.
Figure 6.12 Who Wins the Stanley Cup of Playoff Beards?
HOW TO READ IT & WHAT TO LOOK FOR
Look at the major categorical axis to establish with which category each
row is associated. Establish the mark associations to understand what
categorical dimensions each colour/shape variation represents. Glance
across the entire chart to locate the big, small and medium stacks of
shapes and perform global comparisons to establish the high-level
ranking of biggest > smallest. Identify any noticeable exceptions and/or
outliers. Perform local comparisons between neighbouring categories,
to identify larger than and smaller than relationships and estimate the
relative proportions. Estimate (or read, if labels are present) the absolute
values of specific groups of markers of interest.
PRESENTATION TIPS
ANNOTATION: The choice of symbol/picture should be as
recognisably intuitive as possible. Locate any legends as close as
possible to the display.
COLOUR TIPS: Maximise the variation in marker by using different
combinations in both colour and shape, rather than just variation of one
attribute.
COMPOSITION: If the quantities of markers exceed a single row, try
to make the number of units per row logically ‘countable’, such as
displaying in groups of 5, 10 or 100. To aid readability, make sure there
is a sufficiently noticeable gap between rows, otherwise sometimes the
eye struggles to form the distinct clusters of shapes for each category
displayed. Where possible try to make the categorical sorting
meaningful, maybe organising values in ascending/descending size
order.
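
The arithmetic behind a pictogram (how many symbols a value needs, how they break into countable rows, and what portion of a symbol is left over) can be sketched as follows. This is an illustration under stated assumptions: the function name `pictogram_rows`, the default of 10 symbols per row and the sample value are all hypothetical.

```python
def pictogram_rows(value, units_per_symbol=1, symbols_per_row=10):
    """Split a whole-number value into rows of whole symbols.

    Returns (rows, leftover), where rows is a list of symbol counts per
    row and leftover is the fraction of one symbol still unrepresented
    (0.0 if the value divides exactly).
    """
    whole_symbols, remainder = divmod(value, units_per_symbol)
    full_rows, last_row = divmod(whole_symbols, symbols_per_row)
    rows = [symbols_per_row] * full_rows
    if last_row:
        rows.append(last_row)
    return rows, remainder / units_per_symbol


# Made-up example: 23,500 people with one symbol per 1,000 people.
rows, leftover = pictogram_rows(23500, units_per_symbol=1000)
print(rows, leftover)
```

A non-zero leftover is the case where you must either draw a symbol portion or accept that the display rounds the value.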
VARIATIONS & ALTERNATIVES
Extending the idea of using repeated quantities of representative
symbols, some applications take this further by using large quantities of
individual symbols to get across the feeling of magnitude and scale.
When showing a part-to-whole relationship, the ‘waffle chart’ can use
simple symbol devices to differentiate the constituent parts of a whole.
Charts Comparisons
Proportional shape chart
ALSO KNOWN AS Area chart (wrongly)
REPRESENTATION DESCRIPTION
A proportional shape chart displays quantitative values for different
categories. The chart is based on the use of different area marks, one for
each category, sized in proportion to the quantities they represent. By
using the quadratic dimension of area size rather than the linear
dimension of bar length or dot position, the shape chart offers scope for
displaying a diverse range of quantitative values within the same chart.
Typically the layout is quite free-form with no baseline or central
gravity binding the display together.
EXAMPLE Comparing the market capitalisation ($) of companies
involved in the legal sale of marijuana across different industry sectors.
Figure 6.13 For These 55 Marijuana Companies, Every Day is 4/20
HOW TO READ IT & WHAT TO LOOK FOR
Look at the shapes and their associated labels so you know with what
major categorical values each is associated. If there are only direct
labels, find the largest shape to establish its quantitative value as the
maximum and do likewise for the smallest – this will help calibrate the
size judgements. Otherwise, if it exists, acquaint yourself with the size
key. Glance across the entire chart to locate the big, small and medium
shapes and perform global comparisons to establish the high-level
ranking of biggest > smallest. Identify any noticeable exceptions and/or
outliers. Perform local comparisons between neighbouring shapes to
identify larger than and smaller than relationships and estimate the
relative proportions. Estimate (or read, if labels are present) the absolute
values of specific shapes of interest.
PRESENTATION TIPS
ANNOTATION: Sometimes a quantitative size key will be included
rather than direct labelling (usually when there are many shapes and
limited empty space) though direct labels will help overcome some of
the limitations of judging area size. You will have to decide how to
handle label positioning for those shapes with exceptionally small sizes.
COLOUR TIPS: Colours are not fundamentally necessary to encode
category (the position/separation of different shapes achieves that
already) but they can be useful as redundant encodings to make the
category even more immediately distinguishable.
COMPOSITION: Estimating and comparing the size of areas with
accuracy is not as easy as it is for judging bar length or dot position, so
only use this chart type if you have a diverse range of quantitative
values. The geometric accuracy of the size calculations is paramount.
Mistakes are often made, in particular, with circle size calculations: it is
the area you are modifying, not the diameter/radius. Arrangement
approaches vary: sometimes you see the shapes anchored to a common
baseline (bottom or central alignment) while on other occasions they
might just ‘float’. If you use an organic shape, like a human figure, to
represent different quantities you need to adjust the entire shape area,
not just the height. Often the approach for this type of display is to treat
the figure as a rudimentary rectangular shape. Sometimes the volume of
a shape is used rather than area to represent quantitative values
(especially if there are almost exponentially different values to show)
but this increases the perceptual difficulty in estimating and comparing
values. Where possible try to make the categorical sorting meaningful,
maybe organising values in ascending/descending size order.
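
The circle size mistake mentioned above comes down to one square root. Because the area of a circle grows with the square of its radius, the radius has to be scaled by the square root of the value for the area to stay proportional. A minimal Python sketch, with a hypothetical function name and made-up numbers:

```python
import math


def radius_for_area(value, max_value, max_radius):
    """Radius that makes the circle's AREA proportional to value.

    max_radius is the radius drawn for max_value; both are choices
    made by the designer, not properties of the data.
    """
    return max_radius * math.sqrt(value / max_value)


# Made-up numbers: the largest value (100) is drawn with radius 40.
correct_radius = radius_for_area(25, 100, 40)
naive_radius = 40 * (25 / 100)  # the common mistake: scaling the radius directly
print(correct_radius, naive_radius)
```

Scaling the radius directly would draw the value of 25 with a radius of 10, giving a circle with only one sixteenth of the largest area rather than one quarter.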
VARIATIONS & ALTERNATIVES
The ‘bubble chart’ uses clusters of sized bubbles to compare categorical
values and, sometimes, to represent part-to-whole analysis. The ‘nested
shape chart’ might include secondary, smaller area sizes nested within
each shape to display local part-to-whole relationships.
Charts Comparisons
Bubble chart
ALSO KNOWN AS Circle packing diagram
EXAMPLE Comparing the Public sector capital expenditure (£
million) on services by function of the UK Government during 2014/15.
REPRESENTATION DESCRIPTION
A bubble chart displays quantitative values for different major
categories with additional categorical dimensions included for further
breakdown. It is based on the use of circles, one for each category, sized
in proportion to the quantities they represent. Sometimes several
separate clusters may be used to display further categorical dimensions,
otherwise the colouring of each circle can achieve this. It is similar in
concept to the proportional shape chart but differs through the typical
layout being based on clustering, which therefore also enables it as a
device for showing part-to-whole relationships as well.
Figure 6.14 UK Public Sector Capital Expenditure, 2014/15
HOW TO READ IT & WHAT TO LOOK FOR
Look at the shapes and their associated labels so you know with what
major categorical values each is associated, noting any size and colour
legends to assist in forming associations. If there are multiple clusters,
learn about the significance of the grouping/separation in each case. If
there are direct labels, find the largest shape to establish its quantitative
value as the maximum and do likewise for the smallest – this will help
calibrate other size judgements. Glance across the entire chart to locate
the big, small and medium shapes and perform global comparisons to
establish the high-level ranking of biggest > smallest. Identify any
noticeable exceptions and/or outliers. Perform local comparisons
between neighbouring shapes to identify larger than and smaller than
relationships and estimate the relative proportions. Estimate (or read, if
labels are present) the absolute values of specific shapes of interest. If
there are multiple clusters, note the general relative size and number of
members in each case.
PRESENTATION TIPS
INTERACTIVITY: Bubble charts may often be accompanied by
interactive features that let users select or mouseover individual circles
to reveal annotated values for the quantity and category.
ANNOTATION: If interactivity is not achievable, either a quantitative size
key or direct labelling should be included; the latter may make the
display busy (and be hard to fit into smaller circles) but will help
overcome some of the limitations of judging area size.
COLOUR TIPS: Colours are sometimes used as redundant encodings
to make the quantitative sizes even more immediately distinguishable.
COMPOSITION: Estimating and comparing the size of areas with
accuracy is not as easy as it is for judging bar length or dot position, so
only use this chart type if you have a diverse range of quantitative
values. The use of this chart will primarily be about facilitating a gist, a
general sense of the largest and smallest values. The geometric
accuracy of the circle size calculations is paramount. Mistakes are often
made with circle size calculations: it is the area you are modifying, not
the diameter/radius. If you wish to make your bubbles appear as 3D
spheres you are essentially no longer representing quantitative values
through the size of a geometric area mark; rather the mark will be a
‘form’ and so the size calculation will be based on volume, not area.
There is no categorical or quantitative sorting applied to the layout of
the bubble chart, instead the tools that offer these charts will generally
use a layout algorithm that applies a best-fit clustering to arrange the
circles radially about a central ‘gravity’ force.
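
For the 3D sphere case noted above, the same reasoning applies with volume in place of area: the volume of a sphere grows with the cube of its radius, so the radius would be scaled by the cube root of the value. The sketch below is only an illustration, with a hypothetical function name and made-up numbers.

```python
def radius_for_volume(value, max_value, max_radius):
    """Radius that makes a sphere's VOLUME proportional to value.

    Assumes the mark is read as a 3D form, so size is judged by volume.
    """
    return max_radius * (value / max_value) ** (1 / 3)


# Made-up numbers: the largest value (64) is drawn with radius 40.
sphere_radius = radius_for_volume(8, 64, 40)
print(sphere_radius)
```

A value one eighth of the maximum is drawn at half the radius, which hints at why volume-based sizing makes differences between values harder to judge.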
VARIATIONS & ALTERNATIVES
When the collection of quantities represents a whole, this evolves into a
chart known as a ‘circle packing diagram’ and usually involves many
parts that pack neatly into a circular layout representing the whole.
Another variation of the packing diagram is when the adjacency
between circle ‘nodes’ indicates a connected relation, offering a
variation of the node–link diagram for showing networks of
relationships. The bubble plot also uses differently sized circles but the
position in each case is overlaid onto a scatter plot structure, based on
two dimensions of further quantitative variables. Removing the size
attribute (and effectively replacing area with point mark) you could
simply use the quantity of points clustered together for different
categories to create a ‘tally chart’.
Charts Comparisons
Radar chart
ALSO KNOWN AS Filled radar chart, star chart, spider diagram, web
chart
EXAMPLE Comparing the global competitive scores (out of 7) across
12 ‘pillars’ of performance for the United Kingdom.
REPRESENTATION DESCRIPTION
A radar chart shows values for three or more different quantitative
measures in the same display for, typically, a single category. It uses a
radial (circular) layout comprising several axes emerging from the
centre, like spokes on a wheel, one for each measure. The quantitative
values for each measure are plotted through position along each scale
and then joined by connecting lines to form a unique geometric shape.
Sometimes this shape is then filled with colour. A radar chart should
only be considered in situations where the cyclical ordering (and
neighbourly pairings) has some significance (such as data that might be
plotted around the face of a clock or compass) and when the
quantitative scales are the same (or similar) for each axis. Do not plot
values for multiple categories on the same radar chart, but use small
multiples formed of several radar charts instead.
Figure 6.15 Global Competitiveness Report 2014–2015
HOW TO READ IT & WHAT TO LOOK FOR
Look around the chart and acquaint yourself with the quantitative
measure represented by each axis and note the sequencing of the
measures around the display. Is there any significance in this
arrangement that can assist in interpreting the overall shape? Note the
range of values along each independent axis so you understand what
positions along the scales mean in a value sense for each measure. Scan
the shape to locate the outliers both towards the outside (larger values)
and inside (smaller values) of the scales. It is more important to pay
attention to the position of values along an axis than the nature of the
connecting lines between axes, unless the axis scales are consistent or at
least if the relative position along the scale has the same implied
meaning. If the variable sequencing has cyclical relevance, the spiking,
bulging or contracting shape formed will give you some sense of the
balance of values. Perform local comparisons between neighbouring
axes to identify larger than and smaller than relationships. Estimate (or
read, if labels are present) the absolute values of specific shapes of
interest.
PRESENTATION TIPS
ANNOTATION: The inclusion of visible annotated features like axis
lines, tick marks, gridlines and value labels can naturally aid the
readability of the radar chart. Gridlines are only relevant if there are
common scales across each quantitative variable. If so, the gridlines
must be presented as straight lines, not concentric arcs, because the
connecting lines joining up the values are themselves straight lines.
COLOUR TIPS: Often the radar shapes are filled with a colour,
sometimes with a degree of transparency to allow the background
apparatus to be partially visible.
COMPOSITION: The cyclical ordering of the quantitative variables
has to be of optimum significance as the connectors and shape change
for every different ordering permutation. This will have a major impact
on the readability and meaning of the resulting chart shape. As the axes
will be angled all around the radial display, you will need to make sure
all the associated labels are readable (i.e. not upside down or at difficult
angles).
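To see why the ordering of the axes matters so much, the following sketch places the same four hypothetical values around a radar chart in two different sequences and measures the shape that results. It assumes a common scale on every axis, with the first axis at 12 o'clock and the rest following clockwise; the function names are illustrative only.

```python
import math

def radar_vertices(values):
    """Place each value on its own axis, starting at 12 o'clock and
    moving clockwise, and return the (x, y) corners of the shape."""
    n = len(values)
    return [
        (v * math.sin(2 * math.pi * i / n), v * math.cos(2 * math.pi * i / n))
        for i, v in enumerate(values)
    ]

def polygon_area(points):
    """Shoelace formula for the area enclosed by the connecting lines."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# The same four hypothetical values in two different axis orderings.
alternating = [1, 5, 1, 5]
grouped = [1, 1, 5, 5]
print(polygon_area(radar_vertices(alternating)))
print(polygon_area(radar_vertices(grouped)))
```

The two orderings enclose roughly 10 and 18 square units respectively, even though the underlying values are identical. The filled shape is therefore a product of the sequencing as much as of the data.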
VARIATIONS & ALTERNATIVES
A ‘polar chart’ is an alternative to the radar chart that removes some of
the main shortcomings caused by connecting lines in the radar chart. If
you have consistent value scales across the different quantitative
measures, a ‘bar chart’ or ‘dot plot’ would be a better alternative. While
not strictly a variation, ‘parallel coordinates’ display a similar technique
for plotting several independent quantitative measures in the same
chart. The main difference is that parallel coordinates use a linear layout
and can accommodate many categories in one display.
Charts Comparisons
Polar chart
ALSO KNOWN AS Coxcomb plot, polar area plot
REPRESENTATION DESCRIPTION
A polar chart shows values for three or more different quantitative
measures in the same display. It uses a radial (circular) layout
comprising several equal-angled circular sectors like slices of a pizza,
one for each measure. In contrast to the radar chart (which uses position
along a scale), the polar chart uses variation in the size of the sector
areas to represent the quantitative values. It is, in essence, a radially
plotted bar chart. Colour is an optional attribute, sometimes used
visually to indicate further categorical dimensions. A polar chart should
only be considered in situations where the cyclical ordering (and
neighbourly pairings) has some significance (such as data that might be
plotted around the face of a clock or compass) and when the
quantitative scales are the same (or similar) for each axis.
EXAMPLE Comparing the quantitative match statistics across 14
different performance measures for a rugby union player.
Figure 6.16 Excerpt from a Rugby Union Player Dashboard
HOW TO READ IT & WHAT TO LOOK FOR
Look around the chart and acquaint yourself with the quantitative
measures each sector represents and note the sequencing of the
measures around the display. Is there any significance in this
arrangement that can assist in interpreting the overall shape? Note the
range of values included on the quantitative scale and acquaint yourself
with any colour associations. Glance across the entire chart to locate the
big, small and medium sectors and perform global comparisons to
establish the high-level ranking of biggest > smallest. Identify any
noticeable exceptions and/or outliers. Perform local comparisons
between neighbouring variables to identify the order of magnitude and
estimate the relative sizes. Estimate (or read, if labels are present) the
absolute values of specific sectors of interest. Where available, compare
the quantities against annotated references such as targets, forecast, last
year, average, etc. If there is significance behind the sequencing of the
variables, look out for any patterns that emerge through spiking,
bulging or contracting shapes.
PRESENTATION TIPS
ANNOTATION: The inclusion of visible annotated features like tick
marks and value labels can naturally aid the readability of the polar
chart. Gridlines are only relevant if there are common scales across
each quantitative variable. If so, the gridlines must be presented as arcs
reflecting the outer shape of each sector. Connecting lines joining up
the values are themselves straight lines. Each sector typically uses the
same quantitative scale for each quantitative measure but, on the
occasions when this is not the case, each axis will require its own, clear
value scale.
COLOUR TIPS: Often polar chart sectors are filled with a meaningful
colour, sometimes with a degree of transparency to allow the
background apparatus to be partially visible.
COMPOSITION: The cyclical ordering of the quantitative variables
has to be of some significance to legitimise the value of the polar chart
over the bar chart. As the sectors will be angled all around the radial
display, you will need to make sure all the associated labels are
readable (i.e. not upside down or at difficult angles). The quantitative
values represented by the size of the sectors need to be carefully
calculated. It is the area of the sector, not the radius length, that will be
modified to portray the values accurately. If you make the maximum
quantitative value equivalent to the largest sector area, all other sector
sizes can be calculated accordingly. Knowing how many different
quantitative variables you are showing means you can easily calculate
the angle of any given sector. The quantitative measure axes should
always start from the origin value of zero: a sector should be
representative of the true, full quantitative value, nothing more, nothing
less, otherwise the perception of size will be distorted when comparing
relative sizes.
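The sector calculations described above might be sketched as follows. This is an illustration under stated assumptions: every sector shares one scale starting at zero, the largest value is drawn at a chosen maximum radius, and the function names and sample numbers are hypothetical.

```python
import math

def sector_angle(n_measures):
    """Equal angle (in radians) given to each measure around the circle."""
    return 2 * math.pi / n_measures

def sector_radius(value, max_value, max_radius):
    """Radius that makes the sector AREA proportional to value.
    A value of zero gives a radius of zero, so the scale starts at the origin."""
    return max_radius * math.sqrt(value / max_value)

def sector_area(radius, angle):
    """Area of a circular sector with the given radius and angle."""
    return 0.5 * angle * radius ** 2

# Hypothetical chart with 4 measures; the largest value (100) has radius 10.
angle = sector_angle(4)
for value in (100, 50, 25):
    r = sector_radius(value, 100, 10)
    print(value, round(r, 2), round(sector_area(r, angle), 2))
```

Because the angle is the same for every sector, the printed areas stay in proportion to the values, whereas the radii do not.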
VARIATIONS & ALTERNATIVES
Unless the radial layout provides meaning through the notion of a
‘whole’ or through the cyclical arrangement of measures, you might be
best using a ‘bar chart’. Variations in approach tend to see
modifications in the sector shape with measure values represented by
individual bar lengths or, in the example of the Better Life Index
project, through variations in ‘petal’ sizes.
Charts Distributions
Range Chart
ALSO KNOWN AS Span chart, floating bar chart, barometer chart
REPRESENTATION DESCRIPTION
A range chart displays the minimum to maximum distribution of a
series of quantitative values for different categories. The display is
formed by a bar, one for each category, with the lower and upper
position of the bars shaped by the minimum and maximum quantitative
values in each case. The resulting bar lengths thus represent the range
of values between the two limits.
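As a rough sketch of the data preparation behind a range chart, the following reduces each category's observations to the lower and upper limits of its bar. The city names and readings are invented for illustration, and sorting by the size of the range is just one possible ordering.

```python
def range_bars(observations):
    """Return (category, minimum, maximum, range) for each category,
    sorted so the widest range comes first."""
    bars = []
    for category, values in observations.items():
        low, high = min(values), max(values)
        bars.append((category, low, high, high - low))
    return sorted(bars, key=lambda bar: bar[3], reverse=True)

# Hypothetical temperature readings for three unnamed cities.
readings = {
    "City A": [41, 58, 77, 90],
    "City B": [66, 70, 75, 81],
    "City C": [12, 35, 60, 88],
}
for category, low, high, spread in range_bars(readings):
    print(category, low, high, spread)
```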
EXAMPLE Comparing the highest and lowest temperatures (°F)
recorded across the top 10 most populated cities during 2015.
Figure 6.17 Range of Temperatures Recorded in Top 10 Most
Populated Cities (2015)
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know with what major categorical values each
range bar is associated and what the range of the quantitative values is
(min to max). Glance across the entire chart to locate the big, small and
medium bars and perform global comparisons to establish the high-
level ranking of biggest > smallest differences as well as the highest and
lowest values. Identify any noticeable exceptions and/or outliers.
Perform local comparisons between neighbouring bars, to identify
larger than and smaller than relationships and estimate the relative
proportions. There may be deliberate sorting of the display based on
one of the quantitative measures. Estimate (or read, if labels are
present) the absolute values of specific bars of interest. Where
available, compare the quantities against annotated references such as
targets, forecast, last year, average, etc.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like tick marks and gridlines,
in particular, can be helpful to increase the accuracy of the reading of
the quantitative values. If you have axis labels you may not need direct
labels on each bar – this will lead to label overload, so generally
decide between one or the other.
COMPOSITION: The quantitative value axis does not need to
commence from zero, unless it means something significant to the
interpretation, as the range of values themselves does not necessarily
start from zero and the focus is more on the range and difference
between the outer values. There is no significant difference in
perception between vertical or horizontal layouts, though the latter tend
to make it easier to accommodate and read the category labels. Where
possible, try to make the categorical sorting meaningful, maybe
organising values in ascending/descending size order.
VARIATIONS & ALTERNATIVES
‘Connected dot plots’ will also emphasise the difference between two
selected measure values (as opposed to min/max) or where the
underlying data is a change over time between two observations. ‘Band
charts’ will often be used to show how the range of data values has
changed over time, displaying the minimum and maximum bands at
each time unit. These are often used in displays like weather forecasts.
Charts Distributions
Box-and-whisker plot
ALSO KNOWN AS Box plot
REPRESENTATION DESCRIPTION
A box-and-whisker plot displays the distribution and shape of a series
of quantitative values for different categories. The display is formed by
a combination of lines and point markers to indicate (through position
and length), typically, five different statistical measures. Three of the
statistical values are common to all plots: the first quartile (25th
percentile), the second quartile (or median) and the third quartile (75th
percentile) values. These are displayed with a box (effectively a wide
bar) positioned and sized according to the first and third quartile values
with a marker indicating the median. The remaining two statistical
values vary in definition: usually either the minimum and maximum
values or the 10th and 90th percentiles. These statistical values are
represented by extending a line beyond the bottom and top of the main
box to join with a point marker indicating the appropriate position.
These are the whiskers. A plot will be produced for each major
category.
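The five statistical measures can be computed with Python's standard library, as in the sketch below. It assumes the whiskers show the minimum and maximum (rather than the 10th and 90th percentiles) and uses the 'inclusive' quartile method; other quartile definitions give slightly different box edges, so treat this as one reasonable choice rather than the only one.

```python
import statistics

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical observations for a single category.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

Running the same function once per category gives the numbers needed to draw one plot for each.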
EXAMPLE Comparing the distribution of annual earnings 10 years
after starting school for graduates across the eight Ivy League schools.
Figure 6.18 Ranking the Ivies
HOW TO READ IT & WHAT TO LOOK FOR
Begin by looking at the axes so you know with which category each
plot is associated and what the range of quantitative values is (min to
max). Establish the specific statistics being displayed, by consulting any
legends or descriptions, especially in order to identify what the
‘whiskers’ are representing. Glance across the entire chart to locate the
main patterns of spread, identifying any common or noticeably different
patterns across categories. Look across the shapes formed for each
category to learn about the dispersal of values: starting with the median,
then observing the extent and balance of the ‘box’ (the interquartile
range between the 25th and 75th percentiles) and then check the
‘whisker’ extremes. Is the shape balanced or skewed around the
median? Is the interquartile range wide or narrow? Are the whisker
extremes far away from the edges of the box? Then return to comparing
shapes across all categories to identify more precisely any interesting
differences or commonalities for each of the five statistical measures.
PRESENTATION TIPS
ANNOTATION: If you have axis labels you may not need direct labels
on each bar – this will lead to label overload, so generally decide
between one or the other.
COMPOSITION: The quantitative value axis does not need to
commence from zero, unless it means something significant to the
interpretation, as the range of values themselves does not necessarily start
from zero and the focus is on the statistical properties between the outer
values. There is no significant difference in perception between vertical
or horizontal box-and-whisker plots, though horizontal layouts tend to
make it easier to accommodate and read the category labels. Try to keep
a noticeable gap between plots to enable greater clarity in reading.
When you have several or many plots in the same chart, where possible
try to make the categorical sorting meaningful, maybe organising values
in ascending/descending order based on the median value.
VARIATIONS & ALTERNATIVES
Variations involve reducing the number of statistical measures included
in the display by removing the whiskers to just show the 25th and 75th
percentiles through the lower and upper parts of the box. The
‘candlestick chart’ (or OHLC chart) involves a similar approach and is
often used in finance to show the distribution and milestone values of
stock performances during a certain time frame (usually daily), plotting
the opening, highest, lowest and closing prices, using colour to indicate
an up or down trend.
Charts Distributions
Univariate scatter plot
ALSO KNOWN AS 1D scatter plot, jitter plot
REPRESENTATION DESCRIPTION
A univariate scatter plot displays the distribution of a series of
quantitative values for different categories. In contrast to the box-and-
whisker plot, which shows selected statistical values, a univariate
scatter plot shows all values across a series. For each category, a range
of points (typically circles but any ‘symbol’ is legitimate) are used to
mark the position along the scale of the quantitative values. From this
you can see the range, the outliers and the clusters and form an
understanding about the general shape of the data.
EXAMPLE Comparing the distribution of average critics score (%)
from the Rotten Tomatoes website for each movie released across a
range of different franchises and movie theme collections.
Figure 6.19 Comparing Critics Scores for Major Movie Franchises
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know what each scatter row/column relates to
in terms of which category it is associated with and what the range of
the quantitative values is (min to max). If colour has been used to
emphasise or separate different marks, establish what the associations
are. Also, learn about how the design depicts multiple marks on the
same value – these may appear darker or indeed larger. Glance across
the entire chart to observe the main patterns of clustering and identify
any noticeable exceptions and/or outliers across all categories. Then
look more closely at the patterns within each scatter to learn about each
category’s specific dispersal of values. Look for empty regions where
no quantitative values exist. Estimate the absolute values of specific
dots of interest. Where available, compare the quantities against
annotated references such as the average or median.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like gridlines can be helpful
to increase the accuracy of the reading of the quantitative values. Direct
labelling is normally restricted to including values for specifically
noteworthy points only.
COLOUR: Colour may be used to establish focus of certain points
and/or distinction between different sub-category groups to assist with
interpretation. When several points have the exact same value you
might need to use unfilled or semi-transparent filled circles to facilitate
a sense of value density.
COMPOSITION: The representation of the quantitative values is
based on position and not size, therefore the quantitative axis does not
need to have a zero origin. There is no significant difference in
perception between vertical or horizontal arrangement, though
horizontal layouts tend to make it easier to accommodate and read the
category labels. Where possible try to make the categorical sorting
meaningful, maybe organising values in ascending/descending size
order.
VARIATIONS & ALTERNATIVES
To overcome occlusion caused by plotting several marks at the same
value, a variation of the univariate scatter plot may see the points
replaced by geometric areas (like circles), where the position attribute is
used to represent a quantitative value along a scale and the size attribute
is used to indicate the frequency of observations of similar value.
Adding a second quantitative variable axis would lead to the use of a
‘scatter plot’.
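The frequency-sized variation described above could be prepared along the following lines. This is a sketch assuming that only exactly equal values are merged and that circle area (not radius) encodes the count; the scores and the function name are hypothetical.

```python
import math
from collections import Counter

def stacked_marks(values, max_radius=10.0):
    """Collapse repeated values into one mark each, returning
    (value, count, radius) with circle AREA proportional to count."""
    counts = Counter(values)
    biggest = max(counts.values())
    return [
        (value, count, max_radius * math.sqrt(count / biggest))
        for value, count in sorted(counts.items())
    ]

# Hypothetical scores for one category, with several repeats.
scores = [45, 45, 45, 45, 60, 72, 72, 90]
for value, count, radius in stacked_marks(scores):
    print(value, count, round(radius, 2))
```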
Charts Distributions
Histogram
ALSO KNOWN AS Bar chart (wrongly)
REPRESENTATION DESCRIPTION
A histogram displays the frequency and distribution for a range of
quantitative groups. Whereas bar charts compare quantities for different
categories, a histogram technically compares the number of
observations across a range of value ‘bins’ using the size of lines/bars
(if the bins relate to values with equal intervals) or the area of
rectangles (if the bins have unequal value ranges) to represent the
quantitative counts. With the bins arranged in meaningful order (that
effectively form ordinal groupings) the resulting shape formed reveals
the overall pattern of the distribution of observations.
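When bins cover unequal value ranges, the bar height has to be adjusted so that the area, not the height, carries the count. A minimal sketch of that adjustment, using invented bin edges and counts:

```python
def bar_heights(bin_edges, counts):
    """Height for each bin so that height x width equals the count."""
    heights = []
    for low, high, count in zip(bin_edges, bin_edges[1:], counts):
        heights.append(count / (high - low))
    return heights

# Hypothetical bins 0-10, 10-20 and 20-50 holding 20, 30 and 30 observations.
edges = [0, 10, 20, 50]
counts = [20, 30, 30]
print(bar_heights(edges, counts))
```

The last two bins hold the same number of observations, but the wider one is drawn a third as tall so that both rectangles cover the same area.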
EXAMPLE Comparing the distribution of movies released over time
starring Michael Caine across five-year periods based on the date of
release in the US.
Figure 6.20 A Career in Numbers: Movies Starring Michael Caine
HOW TO READ IT & WHAT TO LOOK FOR
Begin by looking at the axes so you know what the chart depicts in
terms of the categorical bins and the range of the quantitative values
(zero to max). Glance across the entire chart to establish the main
pattern. Is it symmetrically shaped, like a bell or pyramid (around a
median or average value)? Is it skewed to the left or right? Does it dip
in the middle and peak at the edges (known as bimodal)? Does it have
several peaks and troughs? Maybe it is entirely random in its pattern?
All these characteristics of ‘shape’ will inform you about the underlying
distribution of the data.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like tick marks and gridlines
in particular can be helpful to increase the accuracy of the reading of
the quantitative values. Axis labels more than direct value labels tend to
be used so as not to crowd the shape of the histogram.
COMPOSITION: Unlike the bar chart there should be no (or at most a
very thin) gap between bars to help the collective shape of the
frequencies emerge. The sorting of the quantitative bins must be in
ascending order so that the reading of the overall shape preserves its
meaning. The number of value bins and the range of values covered by
each have a prominent influence over the appearance of the histogram
and the usefulness of what it might reveal: too few bins may disguise
interesting nuances, patterns and outliers; too many bins and the most
interesting shapes may be obscured by noise. There is no
single best approach; the right choice simply arrives through
experimentation and iteration.
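The effect of the bin choice is easy to explore by counting the same observations into different numbers of equal-width bins. The sketch below uses invented values and a hypothetical helper; it is meant only to support the kind of experimentation described above.

```python
def bin_counts(values, low, high, n_bins):
    """Count observations falling into n_bins equal-width bins spanning
    low to high. A value equal to high goes into the last bin."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not low <= v <= high:
            continue  # ignore anything outside the plotted range
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical observations, binned three different ways.
observations = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
for n in (2, 5, 10):
    print(n, bin_counts(observations, 0, 10, n))
```

With two bins the isolated high value disappears into a broad group; with ten bins it stands apart but several empty bins appear around it.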
VARIATIONS & ALTERNATIVES
For analysis that looks at the distribution of values across two
dimensions, such as the size of populations for age across genders, a
‘back-to-back histogram’ (with male on one side, female on the other),
also commonly known as a ‘violin plot’ or ‘population pyramid’, is a
useful approach to see and compare the respective shapes. A ‘box-and-
whisker plot’ reduces the distribution of values to five key statistical
measures to describe key dimensions of the spread of values.
Charts Distributions
Word cloud
ALSO KNOWN AS Tag cloud
REPRESENTATION DESCRIPTION
A word cloud shows the frequency of individual word items used in
textual data (such as tweets, comments) or documents (passages,
articles). The display is based around an enclosed cluster of words with
the font (not the word length) sized according to the frequency of usage.
In modifying the size of font this is effectively increasing the area size
of the whole word. All words have a different shape and size so this can
make it quite difficult to avoid the prominence of long words,
irrespective of their font size. Word clouds are therefore only useful
when you are trying to get a quick and rough sense of some of the
dominant keywords used in the text. They can be an option for working
with qualitative data during the data exploration stage, more so than as a
means for reporting analysis to others.
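The counting and sizing behind a word cloud can be sketched with the standard library. Everything here is an assumption made for illustration: the tiny stop-word list, the linear mapping from frequency to font size and the 10 to 40 point range are not drawn from any specific word cloud generator.

```python
import re
from collections import Counter

# Hypothetical and deliberately tiny list of words to ignore.
STOP_WORDS = {"the", "a", "of", "and", "to", "is"}

def word_frequencies(text):
    """Count how often each word appears, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def font_sizes(frequencies, min_size=10, max_size=40):
    """Map each frequency linearly onto a font size range."""
    low, high = min(frequencies.values()), max(frequencies.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_size + (count - low) / span * (max_size - min_size)
        for word, count in frequencies.items()
    }

sample = "The data and the chart: data, chart, data design"
print(font_sizes(word_frequencies(sample)))
```

Note that only the font size is derived from the frequency; the width each word occupies still depends on how many letters it has.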
EXAMPLE Comparing the frequency of words used in Chapter 1 of
this book.
Figure 6.21 Word Cloud of the Text from Chapter 1
HOW TO READ IT & WHAT TO LOOK FOR
The challenge with reading word clouds is to avoid being drawn to the
length and/or area of a word – they are simply attributes of the word,
not a meaningful representation of frequency. It is the size of the font
that you need to focus on. Scan the display to spot the larger text
showing the more frequently used words. Consider any words of
specific interest to see if you can find them; if they are not significantly
visible, that in itself could be revealing. While most word cloud
generators will dismiss many irrelevant words, you might still need to
filter out perceptually the significance of certain dominantly sized text.
PRESENTATION TIPS
INTERACTIVITY: Interactive features that let users interrogate, filter
and scrutinise the words in more depth, perhaps presenting examples of
their usage in a passage, can be quite useful to enhance the value of a
word cloud.
ANNOTATION: While the absolutes are generally of less interest than
relative comparisons, to help viewers get as much out of the display as
possible a simple legend explaining how the font size equates to
frequency number can be useful.
COLOUR: Colours may be used as redundant encoding to accentuate
further the larger frequencies or categorically to create useful visual
separation.
COMPOSITION: The arrangement of the words within a word cloud
is typically based on a layout process. Although not random, this will
generally prioritise the placement of words to occupy optimum
collective space that preserves an overall shape (with essentially a
central gravity) over and above any arrangement that might better
enable direct comparison.
VARIATIONS & ALTERNATIVES
The alternative approach would be to use any other method in this
categorical family of charts that would more usefully display the counts
of text, such as a bar chart.
Charts Part-to-whole
Pie chart
ALSO KNOWN AS Pizza chart
REPRESENTATION DESCRIPTION
A pie chart shows how the quantities of different constituent categories
make up a whole. It uses a circular display divided into sectors for each
category, with the angle representing each of the percentage
proportions. The resulting size of the sector (in area terms) is a spatial
by-product of the angle applied to each part and so offers an additional
means for judging the respective values. The role of a pie chart is
more about being able to compare a part to a whole than being able
to compare one part to another part. They therefore work best when
there are only two or three parts included. There are a few important
rules for pie charts. Firstly, the total percentage values of all sector
values must be 100%; if the aggregate is greater than or less than 100%
the chart will be corrupted. Secondly, the whole has to be meaningful –
often people just add up independent percentages but that is not what a
pie chart is about. Finally, the category values must represent exclusive
quantities; nothing should be counted twice or overlap across different
categories. Despite all these warnings, do not be afraid of the pie chart –
just use it with discretion.
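The first rule, and the conversion from quantities to sector angles, might be checked as follows. The category names, the quantities and the half-point rounding tolerance are all assumptions chosen for illustration.

```python
def pie_angles(parts):
    """Convert exclusive part quantities into sector angles in degrees."""
    if any(value < 0 for value in parts.values()):
        raise ValueError("a part cannot be negative")
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("the whole must be greater than zero")
    return {name: 360 * value / total for name, value in parts.items()}

def percentages_form_a_whole(percentages, tolerance=0.5):
    """True when published percentages add up to 100, allowing for rounding."""
    return abs(sum(percentages) - 100) <= tolerance

# Hypothetical three-part breakdown.
print(pie_angles({"Part A": 50, "Part B": 30, "Part C": 20}))
print(percentages_form_a_whole([50, 30, 20]))
print(percentages_form_a_whole([60, 45, 30]))
```

A check like this catches totals that do not reach 100%, but it cannot tell whether the whole is meaningful or whether categories overlap; those remain editorial judgements.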
EXAMPLE Comparing the proportion of eligible voters in the 2015
UK election who voted for the Conservative Party, for other parties and
who did not vote.
Figure 6.22 Summary of Eligible Votes in the UK General Election
2015
HOW TO READ IT & WHAT TO LOOK FOR
Begin by establishing which sectors relate to what categories. This may
involve referring to a colour key legend or through labels directly
adjacent to the pie. Quickly scan the pie to identify the big, medium and
small sectors. Notice if there is any significance behind the ordering of
the parts. Unless there are value labels, you will next attempt to judge
the individual sector angles. This usually involves mentally breaking
the pie into 50% halves (180°) or 25% quarters (90°) and using those
guides to perceptually measure the category values. Comparing parts
against other parts with any degree of accuracy will only be possible
once you have formed estimates of the individual sector sizes. If you
are faced with the task of judging the size of many parts it is quite
understandable if you decide to give up quite soon.
PRESENTATION TIPS
ANNOTATION: The use of local labelling for category values can be
useful but too many labels can become cluttered, especially when
attempting to label very small angled sectors.
COLOUR: Colour is generally vital to create categorical separation
and association of the different sectors so aim to use the difference in
colour hue and not colour saturation to maximise the visible difference.
COMPOSITION: Positioning the first slice at the vertical 12 o’clock
position gives a useful baseline to help judge the first sector angle
value. The ordering of sectors using descending values or ordinal
characteristics helps with the overall readability and allocation of effort.
Do not consider using gratuitous decoration (like 3D, gradient colours,
or exploding slices).
VARIATIONS & ALTERNATIVES
Sometimes a pie chart has a hole in the centre and is known as a
‘doughnut chart’, continuing the food-related theme. The function is
exactly the same as a pie but the removal of the centre, often to
accommodate a labelling property, removes the possibility of the reader
judging the angles at the origin. One therefore has to derive the angles
from the resulting arc lengths. If you want to display multiple parts
(more than three) the bar chart will be a better option and, for many
parts, the ‘treemap’ is best. Depending on the allocated space, a
‘stacked bar chart’ may provide an alternative to the pie. Unlike most
chart types, the pie chart does not work well in the form of small
multiples (unless there is only a single part being displayed). A ‘nested
shape chart’, typically based on embedded square or circle areas,
enables comparison across a series of one-part-to-whole relationships
based on absolute numbers, rather than percentages, where the wholes
may vary in size.
Charts Part-to-whole
Waffle chart
ALSO KNOWN AS Square pie, unit chart, 100% stacked shape chart
REPRESENTATION DESCRIPTION
A waffle chart shows how the quantities of different constituent
categories make up a whole. It uses a square display usually
representing 100 point ‘cells’ through a 10 × 10 grid layout. Each
constituent category proportion is displayed through colour-coding a
proportional number of cells. Difference in symbol can also be used.
The role of the waffle chart is to simplify the counting of proportions in
contrast to the angle judgements of the pie chart, though the display is
limited to rounded integer values. This is easier when the grid layout
facilitates quick recognition of units of 10. As with the pie chart, the
waffle chart works best when you are showing how a single part
compares to the whole and perhaps offers greater visual impact when
there are especially small percentages of a whole. Rather than just
colouring in the grid cells, sometimes different symbols will be used to
associate with different categories. For example, you might see figures
or gender icons used to show the makeup of a given sample population.
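Because the display is limited to whole cells, the proportions have to be rounded in a way that still fills the grid exactly. One possible approach, offered here as an assumption rather than a standard, is the largest remainder method: give each category the whole cells it has earned, then hand any leftover cells to the categories with the largest fractional parts.

```python
import math

def waffle_cells(quantities, n_cells=100):
    """Allocate n_cells among the quantities so the counts sum to n_cells."""
    total = sum(quantities)
    exact = [n_cells * q / total for q in quantities]
    cells = [math.floor(x) for x in exact]
    leftover = n_cells - sum(cells)
    # Indices ordered by the size of the fraction that was rounded away.
    by_remainder = sorted(
        range(len(exact)), key=lambda i: exact[i] - cells[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        cells[i] += 1
    return cells

# Hypothetical shares that do not fall on whole percentages.
print(waffle_cells([45.6, 30.3, 24.1]))
```

Simply rounding each share independently can leave the grid one cell short or one cell over, which is why some form of adjustment is needed.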
EXAMPLE Comparing the proportion of total browser usage for
Internet Explorer and Chrome across key milestone moments.
Figure 6.23 The Changing Fortunes of Internet Explorer and Google
Chrome
HOW TO READ IT & WHAT TO LOOK FOR
Begin by establishing how the different shapes or colours are associated
with different categories. Assess the grid layout to understand the
dimension of the chart and the quantity of cell ‘units’ forming the
display (e.g. is it a 10 × 10 grid?). Quickly scan the chart to identify the
big, medium and small sectors. Notice if there is any significance
behind the ordering of the parts. Unless there are value labels, you will
need to count/estimate the number of units representing each category
value. Comparing parts against other parts will only be possible once
you have established the individual part sizes. If several related waffle
charts are shown, possibly for different categories or points in time,
identify the related colours/shapes in each chart and establish the
patterns of size between and across the various charts, looking for
trends, declines and general differences.
PRESENTATION TIPS
ANNOTATION: Direct labelling can become very cluttered and hard
to incorporate elegantly without the need for long arrows.
COLOUR: Borders around each square cell are useful to help establish
the individual units, but do not make the borders too thick to the point
where they dominate attention.
COMPOSITION: Always start each row of values from the same side,
for consistency and to make it easier for people to estimate the values.
When you have several parts in the same waffle chart, where possible
try to make the categorical sorting meaningful, maybe organising values
in ascending/descending size order or based on a logical categorical
order.
VARIATIONS & ALTERNATIVES
Sometimes the waffle chart approach is used to show stacks of absolute
unit values and indeed there are overlaps in concept between this
variation in the waffle chart and potential applications of the pictogram.
Aside from the pie chart, a ‘nested shape chart’ will provide an
alternative way of showing a part-to-whole relationship while also
occupying a squarified layout.
Charts Part-to-whole
Stacked bar chart
ALSO KNOWN AS
REPRESENTATION DESCRIPTION
A stacked bar chart displays a part-to-whole breakdown of quantitative
values for different major categories. The percentage proportion of each
categorical dimension or ‘part’ is represented by separate bars,
distinguished by colour, that are sized according to their proportion and
then stacked to create the whole. Sometimes the whole is standardised
to represent 100%, at other times the whole will be representative of
absolute values. Stacked bar charts work best when the parts are based
on ordinal dimensions, which enables ordering of the parts within the
stack to help establish the overall shape of the data. If the parts are
representative of nominal data, it is best to keep the number of
constituent categories quite low, as estimating the size of individual
stacked parts when there are many becomes quite hard.
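
The difference between a 100% stacked bar and an absolute stacked bar can be sketched as follows. Each part is given a start and end position along the value axis, and only the first part sits on the zero baseline. The proficiency labels and counts are hypothetical, as is the function name.

```python
def stack_segments(parts, standardise=True):
    """Return (label, start, end) for each part of one stacked bar.

    With standardise=True the whole is rescaled to 100%;
    otherwise the absolute values are stacked as they are.
    """
    total = sum(value for _, value in parts)
    scale = 100 / total if standardise else 1
    segments, start = [], 0.0
    for label, value in parts:
        length = value * scale
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical counts for one country, kept in ordinal order.
country = [("Level 0", 10), ("Level 1", 30), ("Level 2", 60),
           ("Level 3", 70), ("Level 4", 30)]

segments = stack_segments(country)                    # 100% stacked
absolute = stack_segments(country, standardise=False)  # absolute stacked
for label, start, end in segments:
    print(f"{label}: {start:.0f}% to {end:.0f}%")
```

Only "Level 0" starts at zero, which is why the remaining parts are harder to compare across bars.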
EXAMPLE Comparing the percentage of adults (16–65 year olds)
achieving different proficiency levels in literacy across different
countries.
Figure 6.24 Literacy Proficiency: Adult Levels by Country
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know with what major categorical values each
bar is associated and what the quantitative values are, determining if it
is a 100% stacked bar or an absolute stacked bar (in which case identify
the min and the max). Establish the colour association to understand
what categories the bars within each stack represent. Glance across the
entire chart. If the categorical data is ordinal, and the sorting/colour of
the stacks is intuitive, you should be able to derive meaning from the
overall balance of colour patterns, especially where any annotated
gridlines help to guide your value estimation. If the categorical data is
nominal, seek to locate the dominant colours and the least noticeable
ones. Comparing across different stacked bars is made harder by the
lack of a common baseline for anything other than the bottom stack on
the zero baseline (and for 100% stacked bars, those final ones at the
top) and so a general sense of magnitude will be your focus. Study the
constituent parts within each stack more closely to establish the
high-level ranking of biggest > smallest. Estimate (or read, if labels are present)
the absolute values of specific stacked parts of interest.
PRESENTATION TIPS
ANNOTATION: Direct value labelling can become very cluttered
when there are many parts or stacks and you are comparing several
different major categories. You might be better with a table if that is
your aim. Definitely include value axis labels with logical intervals and
it is very helpful to annotate, through gridlines, key units such as the
25%, 50% and 75% positions when based on a 100% stacked bar chart.
COLOUR: If you are representing categorical ordinal data, colour can
be astutely deployed to give a sense of the general balance of values
within the whole, but this will only work if their sorting arrangement
within the stack is logically applied. For categorical nominal data,
ensure the stacked parts have sufficiently different colours so that their
distinct bar lengths can be efficiently observed.
COMPOSITION: Across the main categories, once again consider the
optimum sorting option, maybe organising values in
ascending/descending size order or based on a logical categorical order.
Judging the size of the stacks with accuracy is harder for those that are
not on the zero baseline, so maybe consider which ones are of most
importance to be more easily read and place those on the baseline.
VARIATIONS & ALTERNATIVES
The main alternative would be to use ‘multi-panel bar charts’, where
separate bar charts each include just one ‘stack’/part and they are then
repeated for each subsequent constituent category. In the world of
finance the ‘waterfall chart’ is a common approach based on a single
stacked bar broken up into individual elements, almost like a step-by-
step narrative of how the components of income look on one side and
then how the components of expenditure look on the other, with the
remaining space representing the surplus or deficit. Like their unstacked
siblings, stacked bar charts can also be used to show how categorical
composition has changed over time.
Charts Part-to-whole
Back-to-back bar chart
ALSO KNOWN AS Paired bar chart
REPRESENTATION DESCRIPTION
A back-to-back bar chart displays a part-to-whole breakdown of
quantitative values for different major categories. As with any bar chart,
the length of a bar represents a quantitative proportion or absolute value
for each part and across all major categories. In contrast to the stacked
bar chart, where the constituent bars are simply stacked to form a
whole, in a back-to-back bar chart the constituent parts are based on
diverging categorical dimensions with a ‘directional’ essence such as
yes/no, male/female, agree/disagree. The values for each dimension are
therefore presented on opposite sides of a shared zero baseline to help
reveal the shape and contrast differences across all major categories.
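
The shared zero baseline can be illustrated with a small text rendering, offered here only as a sketch. One dimension grows leftwards from the baseline and the other rightwards. The category names, percentages and the `unit` parameter (percentage points per character) are assumptions made for illustration.

```python
def back_to_back_rows(data, unit=5):
    """Render each category as two text bars growing away from a shared baseline."""
    widest = max(max(left, right) for _, left, right in data) // unit
    rows = []
    for category, left, right in data:
        left_bar = ("#" * (left // unit)).rjust(widest)    # grows leftwards
        right_bar = ("#" * (right // unit)).ljust(widest)  # grows rightwards
        rows.append(f"{left_bar}|{right_bar}  {category}")
    return rows


# Hypothetical survey percentages: (category, disapprove, approve).
survey = [("Group A", 60, 35), ("Group B", 45, 50), ("Group C", 30, 65)]
rows = back_to_back_rows(survey)
for row in rows:
    print(row)
```

The `|` character marks the zero baseline in every row, so the bars on each side can be compared from a common origin.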
EXAMPLE Comparing the responses to a survey question asking for
opinions about ‘the government collection of telephone and Internet
data as part of anti-terrorism efforts’ across different demographic
categories.
Figure 6.25 Political Polarization in the American Public
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know with which major categorical values each
bar is associated and what the range of the quantitative values is (min to
max). Establish what categorical dimensions are represented by the
respective sides of the display and any colour associations. Glance
across the entire chart to locate the big, small and medium bars and
perform global comparisons to establish the high-level ranking of
biggest > smallest. Repeat this for each side of the display, noticing any
patterns of dominance of larger values on either side. Identify any
noticeable exceptions and/or outliers. Perform local comparisons for
each category value to estimate the relative sizes (or read, if labels are
present) of each bar.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like tick marks and gridlines
in particular can be helpful to increase the accuracy of the reading of
the quantitative values.
COLOUR: The bars either side of the axis do not need to be coloured
but often are to create further visual association.
COMPOSITION: The quantitative value axis should always start from
the origin value of zero: a bar should be representative of the true, full
quantitative value, nothing more, nothing less, otherwise the perception
of bar sizes will be distorted when comparing relative sizes. There is no
significant difference in perception between vertical and horizontal bars,
though horizontal layouts tend to make it easier to accommodate and
read the category labels. Where possible try to make the categorical
sorting meaningful, maybe organising values in ascending/descending
size order or based on a logical categorical order.
VARIATIONS & ALTERNATIVES
Back-to-back bar charts facilitate a general sense of the shape of
diverging categorical dimensions. However, if you want to facilitate
direct comparison, a ‘clustered bar chart’ showing adjacent bars helps
to compare respective heights more precisely. For analysis that looks at
the distribution of values across two dimensions, such as the size of
populations for age across genders, a ‘back-to-back histogram’ (with
male on one side, female on the other), commonly known as a
‘population pyramid’ and closely related to the ‘violin plot’, is a useful
approach to see and compare the respective shapes. Some back-to-back applications do not
show a part-to-whole relationship but simply compare quantities for
two categorical values. Further variations may appear as ‘back-to-back
area charts’ showing mutual change over time for two contrasting
states.
Charts Part-to-whole
Treemap
ALSO KNOWN AS Heat map (wrongly)
REPRESENTATION DESCRIPTION
A treemap is an enclosure diagram providing a hierarchical display to
show how the quantities of different constituent parts make up a whole.
It uses a contained rectangular layout (often termed ‘squarified’)
representing the 100% total divided into proportionally sized
rectangular tiles for each categorical part. Colour can be used to
represent an additional quantitative measure, such as an indication of
amount of change over a time period. The absolute positioning and
dimension of each rectangle is organised by an underlying tiling
algorithm to optimise the overall space usage and to cluster related
categories into larger rectangle-grouped containers. Treemaps are most
commonly used, and of most value, when there are many parts to the
whole but they are only valid if the constituent units are legitimately
part of the same ‘whole’.
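
To make the idea of a tiling algorithm more concrete, here is a minimal sketch using the simple ‘slice-and-dice’ scheme rather than a squarified algorithm. It divides the whole into group containers, then divides each container into tiles, so that every tile's area is proportional to its value. The sector and stock names, the values and the canvas size are all hypothetical.

```python
def slice_layout(items, x, y, w, h, vertical=True):
    """Split rectangle (x, y, w, h) into strips whose areas match the item values."""
    total = sum(value for _, value in items)
    rects, offset = {}, 0.0
    for name, value in items:
        share = value / total
        if vertical:   # strips placed side by side
            rects[name] = (x + offset * w, y, share * w, h)
        else:          # strips placed one above another
            rects[name] = (x, y + offset * h, w, share * h)
        offset += share
    return rects


# Hypothetical two-tier hierarchy: group -> member -> value.
groups = {"Sector A": {"Stock 1": 30, "Stock 2": 10},
          "Sector B": {"Stock 3": 20}}

# First tier: one container per group, sized by the group total.
group_totals = [(g, sum(members.values())) for g, members in groups.items()]
containers = slice_layout(group_totals, 0, 0, 100, 60, vertical=True)

# Second tier: tiles inside each container, split in the other direction.
tiles = {}
for g, members in groups.items():
    tiles.update(slice_layout(list(members.items()), *containers[g], vertical=False))

for name, (tx, ty, tw, th) in tiles.items():
    print(f"{name}: area {tw * th:.0f}")
```

Slice-and-dice tends to produce long thin tiles when there are many parts, which is the problem squarified algorithms set out to reduce.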
EXAMPLE Comparing the relative value of and the daily performance
of stocks across the S&P 500 index grouped by sectors and industries.
Figure 6.26 FinViz: Standard and Poor’s 500 Index
HOW TO READ IT & WHAT TO LOOK FOR
Look at the high-level groupings to understand the different containing
arrangements and establish what the colour association is. Glance
across the entire chart to seek out the big, small and medium individual
rectangular sizes and perform global comparisons to establish a general
ranking of biggest > smallest values. Also identify the largest through
to smallest container group of rectangles. If the colour coding is based
on quantitative variables, look out for the most eye-catching patterns at
the extreme end of the scale(s). If labels are provided (or offered
through interactivity), browse around the display looking for categories
and values of specific interest. As with any display based on the size of
the area of a shape, precise reading of values is hard to achieve and so it
is important to understand that treemaps can only aim to provide a
single-view gist of the properties of the many components of the whole.
PRESENTATION TIPS
INTERACTIVITY: Typically, a treemap will be presented with
interactive features to enable selection/mouseover events to reveal
further annotated details and/or drill-down navigation.
ANNOTATION: Group/container labels are often allocated a cell of
space but these are not to be read as proportional values. Effective
direct value labelling becomes difficult as the rectangles get smaller, so
often only the most prominent values might be annotated. Interactive
features will generally offer visibility of the relevant labels where
possible.
COLOUR: Colour can also be used to provide further categorical
grouping distinction if not already assigned to represent a quantitative
measure of change.
COMPOSITION: As the tiling algorithm is focused on optimising the
dimensions and arrangement of the rectangular shapes, treemaps may
not always be able to facilitate much internal sorting of high to low
values. However, generally you will find the larger shapes appear in the
top left of each container and work outwards towards the smaller
constituent parts.
VARIATIONS & ALTERNATIVES
A variation of the treemap sees the rectangular layout replaced by a
circular one and the rectangular tiles replaced by organic shapes. These
are known as ‘Voronoi treemaps’ as the tiling algorithm is informed by
a Voronoi tessellation. The ‘circle packing diagram’, a variation of the
‘bubble chart’, similarly shows many parts to a whole but uses a non-
tessellating circular shape/layout. The ‘mosaic plot’ or ‘Marimekko
chart’ is similar in appearance to a treemap but, in contrast to the
treemap’s hierarchical display, presents a detailed breakdown of
quantitative value distributions across several categorical dimensions,
essentially formed by varied width stacked bars.
Charts Part-to-whole
Venn diagram
ALSO KNOWN AS Set diagram, Euler diagram (wrongly)
REPRESENTATION DESCRIPTION
A Venn diagram shows collections of and relationships between
multiple sets. It typically uses round or elliptical containers to
represent every possible ‘membership’ permutation, covering all
independent and intersecting regions. The size of the contained area
is (typically) not important: what is important is in which containing
region a value resides, which may be represented through the mark of a
text label or ‘point’.
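
The membership permutations can be worked out before any drawing takes place. The sketch below lists the members of every region of a Venn diagram, plus the region outside all containers. The set names and members are placeholders, and `venn_regions` is a hypothetical helper rather than part of any library.

```python
from itertools import combinations


def venn_regions(sets, universe=()):
    """Map each combination of set names to the members found in exactly those sets."""
    names = list(sets)
    regions = {}
    for size in range(len(names), 0, -1):
        for combo in combinations(names, size):
            inside = set.intersection(*(sets[n] for n in combo))
            others = set().union(*(sets[n] for n in names if n not in combo))
            regions[combo] = inside - others
    # Members of the universe that belong to no set at all.
    regions[()] = set(universe) - set().union(*sets.values())
    return regions


# Placeholder members, not real data.
sets = {"A": {"p", "q", "r"}, "B": {"r", "s"}}
regions = venn_regions(sets, universe={"p", "q", "r", "s", "t"})
for combo, members in regions.items():
    print(combo or "(none)", sorted(members))
```

For n sets there are 2^n - 1 regions inside the containers, which is why the display becomes hard to construct and read beyond three or four sets.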
EXAMPLE Comparing sets of permutations for legalities around
marijuana usage and same-sex marriage across states of the USA.
Figure 6.27 This Venn Diagram Shows Where You Can Both Smoke
Weed and Get a Same-Sex Marriage
HOW TO READ IT & WHAT TO LOOK FOR
To read a Venn diagram firstly establish what the different containers
are representative of in terms of their membership. Assess the
membership of the intersections (firstly ‘all’, then ‘partial’ intersections
when involving more than two sets) then work outwards towards the
independent container regions where values are part of one set but not
part of others. Occasionally there will be a further grouping state
outside of the containers that represents values that have no
membership with any set at all.
PRESENTATION TIPS
ANNOTATION: Unless you are using point markers to represent
membership values, clear labels are vital to indicate how many or which
elements hold membership with each possible set combination.
COLOUR: Colour is often used to create more immediate distinction
between the intersections and independent parts or members of each
container.
COMPOSITION: As the attributes of size and shape of the containers
are of no significance there is more flexibility to manipulate the display
to fit the number of sets around the constraint of real estate you are
facing and to get across the set memberships you are attempting to
show. The complexity of creating containers that preserve every
possible combination of intersection and independence grows quickly
with each additional set.
As the number of sets increases, the symmetry of shape reduces and the
circular containers are generally replaced with ellipses. While it is
theoretically possible to go beyond four- and five-set diagrams, the ability of
readers to make sense of the displays diminishes and so they commonly
involve only two or three different sets.
VARIATIONS & ALTERNATIVES
A common variation or alternative to the Venn (but often mistakenly
called a Venn) is the ‘Euler diagram’. The difference is that an Euler
diagram does not need to present all possible intersections with and
independencies from all sets. A different approach to visualising sets
(especially larger numbers) can be achieved using the ‘UpSet’
technique.
Charts Hierarchies
Dendrogram
ALSO KNOWN AS Node–link diagram, layout tree, cluster tree, tree
hierarchy
REPRESENTATION DESCRIPTION
A dendrogram is a node–link diagram that displays the hierarchical
relationship across multiple tiers of categorical dimensions. It displays a
hierarchy based on multi-generational ‘parent-and-child’ relationships.
Starting from a singular origin root node (or ‘parent’) each subsequent
set of constituent ‘child’ nodes, a tier below and represented by points,
is connected by lines (curved or straight) to indicate the existence of a
relationship. Each constituent node may have further sub-constituencies
represented in the same way, continuing down through to the lowest tier
of detail. Each ‘generational’ tier is presented at the same relative
distance from the origin. The layout can be based on either a linear tree
structure (typically left to right) or radial tree (outwards from the
centre).
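
A simple way to picture the linear layout is to give every node a position made of its tier (its distance from the root) and a place along the other axis. The sketch below spaces the lowest-level nodes evenly and centres each parent over its children. It is a simplified illustration under those assumptions, not the Reingold–Tilford algorithm, and the node names are hypothetical.

```python
def layout(tree, root):
    """Return {node: (tier, position)} for a linear, left-to-right tree."""
    positions = {}
    next_leaf = 0

    def place(node, tier):
        nonlocal next_leaf
        children = tree.get(node, [])
        if not children:
            y = float(next_leaf)      # leaves are spaced one unit apart
            next_leaf += 1
        else:
            child_ys = [place(child, tier + 1) for child in children]
            y = sum(child_ys) / len(child_ys)   # parent centred over its children
        positions[node] = (tier, y)
        return y

    place(root, 0)
    return positions


# Hypothetical hierarchy: parent -> list of children.
tree = {"All brands": ["Continent 1", "Continent 2"],
        "Continent 1": ["Country A", "Country B"],
        "Continent 2": ["Country C"]}

positions = layout(tree, "All brands")
for node, (tier, y) in positions.items():
    print(f"tier {tier}, position {y:.2f}: {node}")
```

For a radial tree the same two numbers can be reused, with the tier becoming the radius and the position becoming the angle.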
EXAMPLE Showing a breakdown of the 200+ beer brands belonging
to SAB InBev across different countries grouped by continent.
Figure 6.28 The 200+ Beer Brands of SAB InBev
HOW TO READ IT & WHAT TO LOOK FOR
Reading a dendrogram will generally be a highly individual experience
based on your familiarity with the subject and your interest in exploring
certain hierarchical pathways. The main focus of attention will likely be
to find the main clusters from where most constituent parts branch out
and to contrast these with the thinner, lighter paths comprising fewer
parts. Work left to right (linear) or in to out (radial) through the
different routes that stoke your curiosity.
PRESENTATION TIPS
ANNOTATION: With labelling required for each node, depending on
the number of tiers and the number of nodes, the size of the text will
need to be carefully considered to ensure readability and minimise the
effect of clutter.
COLOUR: Colour would be an optional choice for accentuating certain
nodes or applying some further visual categorisation.
COMPOSITION: There are several different layout options to display
tree hierarchies like the dendrogram. The common choice is a cluster
layout based on the ‘Reingold–Tilford’ tree algorithms that offers a
tidying and optimisation treatment for the efficiency of the arrangement
of the nodes and connections. The sequencing of sub-constituencies
under each node could be logically arranged in some more meaningful
way than just alphabetical, though the cataloguing nature of A–Z may
suit your purpose. The choice of a linear or radial tree structure will be
informed largely by the space you have to work in as well as by the
cyclical or otherwise nature of the content in your data. The main issue
is likely to be one of legibility if and when you have numerous layers of
divisions and many constituent parts to show in a single view.
VARIATIONS & ALTERNATIVES
More advanced applications of dendrograms are used to present
hierarchical clustering (in fields such as computational biology) and
apply more quantitative meaning to the length of the links and the
positioning of the nodes. The ‘tree hierarchy diagram’ offers a similar
tree structure but introduces quantitative attributes to the nodes using
area marks, such as circles, sized according to a quantitative value. An
alternative approach to the dendrogram could involve a ‘linear bracket’.
This might show hierarchical structures for data about sporting
competitions with a knock-out format. The outer nodes would be the
starting point representing all the participating competitors/teams. Each
subsequent tier would represent those participants who progressed to
the next round, continuing through to the finalists and eventual victors.
Charts Hierarchies
Sunburst
ALSO KNOWN AS Adjacency diagram, icicle chart, multi-level pie
chart
EXAMPLE Showing a breakdown of the types of companies
responsible for extracting different volumes of carbon-based fuels
through various activities.
REPRESENTATION DESCRIPTION
A sunburst chart is an adjacency diagram that displays the hierarchical
and part-to-whole relationships across multiple tiers of categorical
dimensions. In contrast to the dendrogram, the sunburst uses layers of
concentric rings, one layer for each generational tier. Each ring layer is
divided into parts based on the constituent categorical dimensions at
that tier. Each part is represented by a different circular arc section that
is sized (in length; width is constant) according to the relative
proportion. Starting from the centre ‘parent’ tier, the outward adjacency
of the constituent parts of each tier represents the ‘parent-and-child’
hierarchical composition.
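
The arc sizing can be sketched as a short recursive calculation. Each part receives a share of its parent's angular sweep in proportion to its value, and its children are laid out within that same sweep on the next ring. The nested dictionary and its values are hypothetical, as are the function names.

```python
def total(node):
    """Sum of all leaf values beneath a node (or the value itself for a leaf)."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node


def arcs(tree, start=0.0, sweep=360.0, tier=1, out=None):
    """Return (tier, name, start_angle, end_angle) for every part, in degrees."""
    out = [] if out is None else out
    whole = total(tree)
    for name, child in tree.items():
        extent = sweep * total(child) / whole
        out.append((tier, name, start, start + extent))
        if isinstance(child, dict):
            # Children share exactly the angular range of their parent.
            arcs(child, start, extent, tier + 1, out)
        start += extent
    return out


# Hypothetical two-tier hierarchy with leaf values.
hierarchy = {"Group A": {"Fuel 1": 30, "Fuel 2": 10},
             "Group B": {"Fuel 1": 40, "Fuel 3": 20}}

result = arcs(hierarchy)
for tier, name, a0, a1 in result:
    print(f"ring {tier}: {name} from {a0:.0f} to {a1:.0f} degrees")
```

Because the ring width is constant, the angle is the only attribute carrying the quantity, which is why outer-ring arcs look longer than inner-ring arcs of the same value.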
Figure 6.29 Which Fossil Fuel Companies are Most Responsible for
Climate Change?
HOW TO READ IT & WHAT TO LOOK FOR
Reading a sunburst chart will be a highly individual experience based
on your familiarity with the subject and your interest in exploring
certain hierarchical pathways. The main focus of attention will likely be
to find the largest arc lengths, representing the largest single constituent
parts, and those layers or tiers with the most constituent parts. Work
from the centre outwards through the different routes that stoke your
curiosity. Depending on the deployment of colour, this may help you
identify certain additional categorical patterns.
PRESENTATION TIPS
INTERACTIVITY: Often interactive mouseover/selection events are
the only way to reveal the annotations here.
ANNOTATION: Labelling can be quite difficult to fit into the narrow
spaces afforded by small proportion ‘parts’. If interactivity is not an
option you may decide to label only those parts that can accommodate
the text space.
COLOUR: Colours are often used to achieve further categorical
distinction.
COMPOSITION: Sometimes the parent–child (and other generational)
relationships could be legitimately reversed, so decisions need to be
made about the best hierarchy sequencing to suit the curiosities of the
audience. The sequencing of sub-constituencies under each node could
also be logically arranged in a meaningful way, more so than just
alphabetical, unless the cataloguing nature of A–Z ordering suits your
purpose.
VARIATIONS & ALTERNATIVES
Where the sunburst chart uses a radial layout, the ‘icicle chart’ uses a
vertical, linear layout starting from the top and moving downwards. The
choice of a linear or radial tree structure will be informed largely by the
space you have to work in as well as by the legitimacy of the cyclical
nature of the content in your data. A variation on the sunburst chart
would be the ‘ring bracket’. This might show a reverse journey for
hierarchical data based on something like sporting competitions with
knock-out formats. The outer concentric partitions would represent the
participant competitors/teams at the start of the process. The length of
these arc line parts would be equally distributed across all constituent
parts with each subsequent tier representing ‘participants’ who progress
forward to the next ‘round’, continuing through to the finalists and
eventual victors in the centre.
Charts Correlations
Scatter plot chart
ALSO KNOWN AS Scatter graph
REPRESENTATION DESCRIPTION
A scatter plot displays the relationship between two quantitative
measures for different categories. Scatter plots are used to explore
visually the potential existence, extent or absence of a significant
relationship between the plotted variables. The display is formed by
points (usually a dot or circle), representing each category and plotted
positionally along quantitative x- and y-axes. Sometimes colour is used
to distinguish categorical dimensions across all the points. Scatter plots
do not work well if one or both of the quantitative measures has
limited variation in value, as this tends to cause problems of
‘occlusion’, whereby multiple instances of similar values are plotted
on top of each other and essentially hidden from the reader.
EXAMPLE Exploring the relationship between life expectancy and the
percentage of healthy years across all countries.
Figure 6.30 How Long Will We Live — And How Well?
HOW TO READ IT & WHAT TO LOOK FOR
Learn what each quantitative axis relates to and make a note of the
range of values in each case (min to max). Look at what category or
observation each plotted value on the chart refers to and look up any
colour associations being used for categorical distinction. Scan the chart
looking for the existence of any diagonal trends that might suggest a
linear correlation between the variables, or note the complete absence
of any pattern, to mean no correlation. Annotations will often assist in
determining the significance of any patterns like this. Identify any
clusters of points and also look at the gaps, which can be just as
revealing. Some of the most interesting observations come from
individual outliers standing out separately from others. Look out for any
patterns formed by points with similar categorical colour. One approach
to reading the ‘meaning’ of the plotted positions involves trying to
break down the chart area into a 2 × 2 grid translating what marks
positioned in those general areas might mean – which corner is ‘good’
or ‘bad’ to be located in? Remember that ruling out significant
relationships can be just as useful as ruling them in.
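
A visual impression of a diagonal trend can be checked with a simple statistic. The sketch below computes the Pearson correlation coefficient, one common measure of linear association, for two short lists of made-up values. A result near +1 or -1 suggests a strong linear relationship, while a result near 0 suggests none, although non-linear patterns may still exist.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up values for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # perfectly linear, upward diagonal
falling = [10, 8, 6, 4, 2]    # perfectly linear, downward diagonal
flat_ish = [3, 1, 4, 1, 3]    # no linear trend in these values

print(round(pearson_r(xs, rising), 3))
print(round(pearson_r(xs, falling), 3))
print(round(pearson_r(xs, flat_ish), 3))
```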
PRESENTATION TIPS
ANNOTATION: Gridlines can be useful to help make the value
estimates clearer and reference lines (such as a trend line of best fit)
might aid interpretation. It is usually hard to make direct labelling of all
values work well. Firstly, it can be tricky making it clear which value
relates to which point, especially when several points may be clustered
together. Secondly, it creates a lot of visual clutter. Labelling choices
should be based on values that are of most interest to include editorially
unless interactive features enable annotations to be revealed through
selection or mouseover events. If possible, you might consider putting a
number inside the marker to indicate a count of the number of points at
the same position if this occurs.
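
Finding which markers need a count label is a matter of tallying points that share a position. A minimal sketch, using made-up coordinates:

```python
from collections import Counter

# Made-up (x, y) positions; some observations coincide exactly.
points = [(1, 2), (3, 4), (1, 2), (5, 1), (1, 2), (3, 4)]

overlaps = Counter(points)

# Only positions holding more than one point need a number inside the marker.
labels = {position: n for position, n in overlaps.items() if n > 1}
print(labels)
```

With real measurements, values are rarely identical, so rounding the coordinates to the chart's plotting precision before counting may be a sensible assumption to add.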
COLOUR: If colours are being used to distinguish the different
categories, ensure these are as visibly different as possible. On the
occasion where multiple values may be plotted close to or on top of
each other, you might need to use semi-transparency to enable
overlapping of points to build up a recognisably darker colour
compared to other points, indicating an underlying stack of values at the
same location on the chart.
COMPOSITION: As the encoding of the plotted point values is based
on position along an axis, it is not necessary to start the axes from a
zero baseline, so just make the scale ranges as representative as possible
of the range of values being plotted. Ideally a scatter plot will have a
1:1 aspect ratio (equally as tall as it is wide), creating a squared area to
help patterns surface more evidently. If one quantitative variable (e.g.
weight) is likely to be affected by the other variable (e.g. height), it is
general practice to place the former on the y-axis and the latter on the x-
axis. If you have to use a logarithmic quantitative scale on either or both
axes, you need to make this clear to readers so they avoid making
incorrect conclusions from the resulting patterns (that might imply
correlation if the values were linear, for example).
VARIATIONS & ALTERNATIVES
A ‘ternary plot’ is a variation of the scatter plot through the inclusion of
a third quantitative variable axis. The ‘bubble plot’ also incorporates a
third quantitative variable, this time through encoding the size of a
geometric shape (replacing the point marker). A ‘scatter plot matrix’
involves a single view of multiple scatter plots presenting different
combinations of plotted quantitative variables, used to explore possible
relationships among larger multivariate datasets. A ‘connected scatter
plot’ compares the shifting state of two quantitative measures over time.
Charts Correlations
Bubble plot
ALSO KNOWN AS Bubble chart
REPRESENTATION DESCRIPTION
A bubble plot displays the relationship between three quantitative
measures for different categories. Bubble plots are used visually to
explore the potential existence, extent or absence of a significant
relationship between the plotted variables. In contrast to the scatter plot,
the bubble plot plots proportionally sized circular areas, for each
category, across two quantitative axes with the size representing a third
quantitative measure. Sometimes colour is used to distinguish
categorical dimensions across all the shapes.
EXAMPLE Exploring the relationship between rates of murders,
burglaries (per 100,000 population) and population across states of the
USA.
Figure 6.31 Crime Rates by State
HOW TO READ IT & WHAT TO LOOK FOR
Learn what each quantitative axis relates to and make a note of the
range of values in each case (min to max). Look at what category or
observation each plotted value on the chart refers to. Establish the
quantitative size associations for the bubble areas and look up any
colour associations being used for categorical distinction. Scan the chart
looking for the existence of any diagonal trends that might suggest a
linear correlation between the variables, or note the complete absence
of any pattern, to mean no correlation. Annotations will often assist in
determining the significance of any patterns like this. Identify any
clusters of points and also look at the gaps, which can be just as
revealing. Some of the most interesting observations come from
individual outliers standing out separately from others. Look out for any
patterns formed by points with similar categorical colour. What can you
learn about the distribution of small, medium or large circles: are they
clustered together in similar regions of the chart or quite randomly
scattered? One approach to reading the ‘meaning’ of the plotted
positions involves trying to break down the chart area into a 2 × 2 grid
translating what marks positioned in those general areas might mean –
which corner is ‘good’ or ‘bad’ to be located in? Remember that ruling
out significant relationships can be just as useful as ruling them in.
Estimating and comparing the size of areas is not as easy as it is for
judging bar length or dot position. This means that the use of this chart
type will primarily be about facilitating a gist – a general sense of the
hierarchy of the largest and smallest values.
PRESENTATION TIPS
ANNOTATION: Gridlines can be useful to help make the value
estimates clearer and reference lines (such as a trend line of best fit)
might aid interpretation. It is usually hard to make direct labelling of all
values work well. Firstly, it can be tricky making it clear which value
relates to which point, especially when several points may be clustered
together. Secondly, it creates a lot of visual clutter. Labelling choices
should be based on values that are of most interest to include editorially
unless interactive features enable annotations to be revealed through
selection or mouseover events.
COLOUR: If colours are being used to distinguish the different
categories, ensure these are as visibly different as possible. When a
circle has a large value its size will often overlap in spatial terms with
other values. The use of outline borders and semi-transparent colours
helps with the task of avoiding occlusion (visually hiding values behind
others).
COMPOSITION: As the encoding of the plotted area marker values is
based on position along an axis, it is not necessary to start the axes from
a zero baseline – just make the scale ranges as representative as possible
of the range of values being plotted. Make sensible decisions about how
large to make the maximum bubble size; this will usually require trial
and error experimentation to find the right balance. Ideally a bubble plot
will have a 1:1 aspect ratio (equally as tall as it is wide), creating a
squared area to help patterns surface more evidently. If one quantitative
variable (e.g. weight) is likely to be affected by the other variable (e.g.
height), it is general practice to place the former on the y-axis and the
latter on the x-axis. Geometric accuracy of the circle size calculations is
paramount and a common source of mistakes: it is the area that should be
scaled to the value, not the diameter/radius. If you wish to
make your bubbles appear as 3D spheres you are essentially no longer
representing quantitative values through the size of a geometric area
mark, rather the mark will be a ‘form’ and so the size calculation will
be based on volume, not area.
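
The size calculation can be written down directly. Since the area of a circle grows with the square of its radius, the radius must be scaled by the square root of the value. For a sphere, volume grows with the cube of the radius, so the cube root applies. The function names and the 40-unit maximum radius below are illustrative assumptions.

```python
from math import pi, sqrt


def bubble_radius(value, max_value, max_radius):
    """Radius that makes the circle's AREA proportional to the value."""
    return max_radius * sqrt(value / max_value)


def sphere_radius(value, max_value, max_radius):
    """Radius that makes a sphere's VOLUME proportional to the value."""
    return max_radius * (value / max_value) ** (1 / 3)


# A value one quarter of the maximum should cover one quarter of the area.
r_max = bubble_radius(100, 100, 40)
r_quarter = bubble_radius(25, 100, 40)
print(r_quarter)                                    # half the radius
print((pi * r_quarter ** 2) / (pi * r_max ** 2))    # one quarter of the area

# The common mistake: scaling the radius directly exaggerates differences.
wrong_radius = 40 * 25 / 100
print((wrong_radius / r_max) ** 2)                  # area shrinks to one sixteenth
```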
VARIATIONS & ALTERNATIVES
If the third quantitative variable is removed, the display would just
become a ‘scatter plot’. Variations on the bubble plot might see the use
of different geometric areas as the markers, maybe introducing extra
meaning from the underlying data through the shape, size and
dimensions used.
Charts Correlations
Parallel coordinates
ALSO KNOWN AS Parallel sets
REPRESENTATION DESCRIPTION
Parallel coordinates display multiple quantitative measures for different
categories in a single display. They are used visually to explore the
relationships and characteristics of multi-dimensional, multivariate data.
Parallel coordinates are based on a series of parallel axes representing
different quantitative measures with independent axis scales. The
quantitative values for each measure are plotted and then connected to
form a single line. Each connected line represents a different category
record. Colour may be used to differentiate further categorical
dimensions. As more data is added the collective ‘shape’ of the data
emerges and helps to inform the possibility of relationships existing
among the different measures. Parallel coordinates look quite
overwhelming but remember that they are almost always only used to
assist in exploratory work of large and varied datasets, more so than
being used for explanatory presentations of data. Generally the greater
the number of measures, the more difficult the task of making sense of
the underlying patterns will be, so be discerning in your choice of
which variables to include. This method does not work for showing
categorical (nominal) measures, nor does it really offer value when
low-range, discrete quantitative variables are included (e.g. number
of legs per human). Patterns will mean very little when intersecting with
such axes (these variables may be better deployed as a filtering parameter
or a coloured categorical separator).
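The idea of independent axis scales can be sketched as a simple min-max rescaling, in which every measure is mapped onto its own 0 to 1 range before the connecting lines are drawn. This is a minimal sketch under assumptions: the function name, the record structure and the sample figures are all hypothetical.

```python
def normalise_axes(records, measures):
    """Rescale each measure to the range 0..1 using its own min and max.

    Each record then has one position per parallel axis, and the
    positions can be joined up to form that record's line.
    """
    scaled = [dict() for _ in records]
    for measure in measures:
        column = [record[measure] for record in records]
        low, high = min(column), max(column)
        span = high - low
        for row, record in zip(scaled, records):
            # A measure with no variation is placed mid-axis.
            row[measure] = 0.5 if span == 0 else (record[measure] - low) / span
    return scaled


# Hypothetical food items, for illustration only.
foods = [
    {"calories": 50, "protein_g": 1.0, "fat_g": 0.0},
    {"calories": 250, "protein_g": 11.0, "fat_g": 5.0},
    {"calories": 450, "protein_g": 21.0, "fat_g": 20.0},
]
positions = normalise_axes(foods, ["calories", "protein_g", "fat_g"])
```

Because each axis is rescaled separately, the slope of a line between two axes says little in absolute terms unless the two measures share a common scale.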
EXAMPLE Exploring the relationship between nutrient contents for 14
different attributes across 1,153 different items of food.
Figure 6.32 Nutrient Contents — Parallel Coordinates
HOW TO READ IT & WHAT TO LOOK FOR
Look around the chart and acquaint yourself with what each
quantitative measure axis represents. Also note what kind of sequencing
of measure has been used: are neighbouring measures significantly
paired? Note the range of values along each independent axis so you
understand what positions along the scales represent and can determine
what higher and lower positions mean. If colour has been used to group
related records then identify what these represent. Scan the overall mass
of lines to identify any major patterns. Study the patterns in the space
between each pair of adjacent axes. This is where you will really see the
potential presence or absence of, and nature of, relationships between
measures. The main patterns to identify involve the presence of parallel
lines (showing consistent relationships), lines converging in similar
directions (some correlation) and then complete criss-crossing (negative
relationship). Look out for any associations in the patterns across colour
groupings. Remember that ruling out significant relationships can be
just as useful as ruling them in.
PRESENTATION TIPS
INTERACTIVITY: Parallel coordinates are particularly useful when
offered with interactive features, such as filtering techniques, enabling
the user to interrogate and manipulate the display to facilitate visual
exploration. Additionally, the option to rearrange the sequence of the
measures can be especially useful.
ANNOTATION: The inclusion of visible annotated features like axis
lines, tick marks, gridlines and value labels can naturally aid the
readability of the data but be aware of the impact of clutter.
COLOUR: When you are plotting large quantities of records,
inevitably there will be over-plotting and this might disguise the real
weight of values, so the variation in the darkness of colour can be used
to establish density of observations.
COMPOSITION: The ordering of the quantitative variables has to be
of optimum significance as the connections between adjacent axes will
offer the main way of seeing the local relationships: the patterns will
change for every different ordering permutation. Remember that the
line directions connecting records are often inconsequential in their
meaning unless neighbouring measures have a common scale and
similar meaning: the connections are more about establishing
commonality of pattern across records, rather than there being anything
too significant behind the absolute slope direction/length.
VARIATIONS & ALTERNATIVES
The ‘radar chart’ has similarities with parallel coordinates in that they
include several independent quantitative measures in the same chart but
on a radial layout and usually only showing data for one record in the
same display. A variation on the parallel coordinate would be the
‘Sankey diagram’, which displays categorical composition and
quantitative flows between different categorical dimensions or ‘stages’.
Charts Correlations
Heat map
ALSO KNOWN AS Matrix chart, mosaic plot
REPRESENTATION DESCRIPTION
A heat map displays quantitative values at the intersection between two
categorical dimensions. The chart comprises two categorical axes with
each possible value presented across the row and column headers of a
table layout. Each corresponding cell is then colour-coded to represent a
quantitative value for each combination of category pairing. It is not
easy for the eye to determine the exact quantitative values represented
by the colours, even if there is a colour scale provided, so heat maps
mainly facilitate a gist of the order of magnitude.
EXAMPLE Exploring the connections between different Avengers
characters appearing in the same Marvel comic book titles between
1963 and 2015.
Figure 6.33 How the ‘Avengers’ Line-up Has Changed Over the Years
HOW TO READ IT & WHAT TO LOOK FOR
Learn what each categorical dimension relates to and make a note of the
range of values in each case, paying attention to the significance of any
ordering. Establish the quantitative value associations for the colour
scales, usually found via a legend. Glance across the entire chart to
locate the big, small and medium shades (generally darker = larger) and
perform global comparisons to establish the high-level ranking of
biggest > smallest. Scan across each row and/or column to see if there
are specific patterns associated with either set of categories. Identify
any noticeable exceptions and/or outliers. Perform local comparisons
between neighbouring cells, to identify larger than and smaller
than relationships and estimate the relative proportions. Estimate (or
read, if labels are present) the absolute values of specific colour scales
of interest.
PRESENTATION TIPS
ANNOTATION: Direct value labelling is possible, otherwise a clear
legend to indicate colour associations will suffice.
COLOUR: Sometimes multiple different colour hues may be used to
subdivide the quantitative values into further distinct categorical
groups. Decisions about how many colour-scale levels and what
intervals each relates to in value ranges will affect the patterns that
emerge. There is no single right answer – you will arrive at it largely
through trial and error/experimentation – but it is important to consider,
especially when you have a diverse distribution of values.
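To see how the choice of intervals can change the patterns that emerge, the sketch below assigns the same values to colour levels in two ways: equal-width intervals and rank-based (quantile-style) intervals. Both functions and the sample values are illustrative assumptions rather than a recommended method.

```python
from bisect import bisect_left


def equal_interval_levels(values, levels):
    """Assign each value a colour level (0..levels-1) using equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / levels
    if width == 0:
        return [0] * len(values)
    # The maximum value is clamped into the top level.
    return [min(int((v - low) / width), levels - 1) for v in values]


def quantile_levels(values, levels):
    """Assign colour levels by rank, so each level holds a similar number of values."""
    ranked = sorted(values)
    count = len(values)
    return [min(bisect_left(ranked, v) * levels // count, levels - 1) for v in values]


# Hypothetical, heavily skewed values: one cell dwarfs the others.
cells = [1, 2, 3, 4, 100]
by_interval = equal_interval_levels(cells, 4)   # [0, 0, 0, 0, 3]
by_quantile = quantile_levels(cells, 4)         # [0, 0, 1, 2, 3]
```

With equal intervals the outlier takes the darkest shade and every other cell looks identical; with rank-based levels the differences among the small values become visible, at the cost of hiding how extreme the outlier is.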
COMPOSITION: Logical sorting (and maybe even sub-grouping) of
the categorical values along each axis will aid readability and may help
surface key relationships.
VARIATIONS & ALTERNATIVES
A ‘radial heat map’ offers a structure variation whereby the table may
be portrayed using a circular layout. As with any radial display this is
only really of value if the cyclical ordering means something for the
subject matter. A variation would see colour shading replaced by a
measure of pattern density, using a scale of ‘packedness’ to indicate
increasing quantitative values. An alternative approach would be the
‘matrix chart’, using the size of a shape to indicate the quantitative
value or a range of point markers to display categorical characteristics.
Charts Connections
Matrix chart
ALSO KNOWN AS Table chart
REPRESENTATION DESCRIPTION
A matrix chart displays quantitative values at the intersection between
two categorical dimensions. The chart comprises two categorical axes
with each possible value presented across the row and column headers
of a table layout. Each corresponding cell is then marked by a
geometric shape with its area sized to represent a quantitative value and
colour often used visually to distinguish a further categorical
dimension. While they are most commonly seen using circles, you can
use other proportionally sized shapes.
EXAMPLE Exploring the perceived difficulty of fixtures across the
season for teams in the Premier League 2013–14.
Figure 6.34 Interactive Fixture Molecules
HOW TO READ IT & WHAT TO LOOK FOR
Learn what each categorical dimension relates to and make a note of the
range of values in each case, paying attention to the significance of any
ordering. Establish the quantitative size associations for the area marks
and look up any colour associations being used, both usually found via
a legend. Glance across the entire chart to locate the big, small and
medium areas and perform global comparisons to establish the high-
level ranking of biggest > smallest. Scan across each row and/or column
to see if there are specific patterns associated with either set of
categories. Identify any noticeable exceptions and/or outliers. Perform
local comparisons between neighbouring circular areas, to identify
larger than and smaller than relationships and estimate the relative
proportions. Estimate (or read, if labels are present) the absolute values
of specific geometric areas of interest.
PRESENTATION TIPS
ANNOTATION: Direct value labelling is possible, otherwise be sure
to include a clear size legend. Normally this will be more than sufficient
as the reader may simply be looking to get a gist of the order of
magnitude.
COLOUR: If colours are being used to distinguish the different
categories, ensure these are as visibly different as possible.
COMPOSITION: If there are large outlier values there may be
occasions when the size of a few circles outgrows the cell it occupies.
You might editorially decide to allow this, as the striking shape may
create a certain impact, otherwise you will need to limit the largest
quantitative value to be represented by the maximum space available
within the table’s cell layout. Logical sorting (and maybe even sub-
grouping) of the categorical values along each axis will aid readability
and may help surface key relationships. The geometric accuracy of the
circle size calculations is paramount. Mistakes are often made with
circle size calculations: it is the area you are modifying, not the
diameter/radius.
VARIATIONS & ALTERNATIVES
A variation may be to remove the quantitative attribute of the area
marker, replacing it with a point marker to represent a categorical status
to indicate simply a yes/no observation through the presence/absence of
a point or through the quantity of points to represent a total. An
application of this might be in calendar form whereby a marker in a
date cell indicates an instance of something. It could also employ a
broader range of different categorical options; in practice any kind of
marker (symbol, colour, photograph) could be used to show a
characteristic of the relationship at each coordinate cell. An alternative
might be the ‘heat map’ which colour-codes the respective cells to
indicate a relationship based on a quantitative measure.
Charts Connections
Node–link diagram
ALSO KNOWN AS Network diagram, graph, hairballs
REPRESENTATION DESCRIPTION
Node–link diagrams display relationships through the connections
between categorical ‘entities’. The entry-level version of this type of
diagram displays entities as nodes (represented by point marks and
usually including a label) with links or edges (represented by lines)
depicting the existence of connections. The connecting lines will often
display an attribute of direction to indicate the influencer relationship.
In some versions a quantitative weighting is applied to show the
relationship strength, maybe through increased line width. Replacing
point marks with a geometric shape and using attributes of size and
colour is a further variation. Often the complexity seen in these displays
is merely a reflection of the underlying complexity of the subject and/or
system upon which the data is based, so oversimplifying can
compromise the essence of such content.
EXAMPLE Exploring the connections of voting patterns for
Democrats and Republicans across all members of the US House of
Representatives from 1949 to 2012.
Figure 6.35 The Rise of Partisanship and Super-cooperators in the U.S.
HOW TO READ IT & WHAT TO LOOK FOR
The first thing to consider is what entity each node (point or circular
area) represents and what the links mean in relationship terms. There
may be several other properties to acquaint yourself with, including
attributes like the size of the node areas, the categorical nature of
colouring, and the width and direction of the connections. Across the
graph you will mainly be seeking out the clusters that show the nodes
with the most relationships (representative of influencers or hubs) and
those without (including outliers). Small networks will generally enable
you to look closely at specific nodes and connections and easily see the
emerging relationships. When datasets are especially large, consisting
of thousands of nodes and greater numbers of mutual connections, the
displays can seem overwhelmingly cluttered and will be too dense to
make many detailed observations at node–link level. Instead, just relax
and know that your readability will be about a higher level sense-
making of the clusters/hubs and main outliers.
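For larger networks, the search for hubs and unconnected outliers can be supported by counting the connections of each node before anything is drawn. The following is a minimal sketch: the node names and edge list are hypothetical, and the links are treated as undirected.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many links touch each node (links treated as undirected)."""
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


# Hypothetical network, for illustration only.
nodes = ["A", "B", "C", "D", "E", "F", "G"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "F")]

degrees = node_degrees(edges)
hub, hub_degree = degrees.most_common(1)[0]          # most connected node
isolated = [n for n in nodes if degrees[n] == 0]     # nodes with no links
```

A count like this could also drive the size attribute of each node, so that the hubs stand out in the display itself.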
PRESENTATION TIPS
INTERACTIVITY: Node–link diagrams are particularly useful when
offered with interactive features, enabling the user to interrogate and
manipulate the display to facilitate visual exploration. The option to
apply filters to reduce the busy-ness of the visual and enable isolation of
individual node connections helps users to focus on specific parts of the
network of interest.
ANNOTATION: The extent of annotated features tends to be through
the inclusion of value labels for each entity. Accommodating the
relative word sizes on each node can be difficult to achieve with real
elegance (once again that is where interactivity adds value, through the
select/mouseover event to reveal the label).
COLOUR: Aside from the possible categorical colouring of each node,
decisions need to be made about the colour of the connecting lines,
especially on deciding how prominent these links will be in contrast to
the nodes.
COMPOSITION: Composition decisions are where most of the
presentation customisation exists. There are several common
algorithmic treatments used to compute custom arrangements to
optimise network displays, such as force-directed layouts (using the
physics of repulsion and springs to amplify relationships) and
simplifying techniques (such as edge bundling to aggregate/summarise
multiple similar links).
VARIATIONS & ALTERNATIVES
There are many derivatives of the node–link diagram, as explained,
based on variations in the use of different attributes. ‘Hive plots’ and
‘BioFabric’ offer alternative approaches based on replacing nodes with
vertices.
Charts Connections
Chord diagram
ALSO KNOWN AS Radial network diagram, arc diagram (wrongly)
REPRESENTATION DESCRIPTION
A chord diagram displays relationships through the connections
between and within categories. They are formed around a radial display
with different categories located around the edge: either as individual
nodes or proportionally sized segments (arcs) of the circumference
according to a part-to-whole breakdown. Emerging inwards from each
origin position are curved lines that join with other related categorical
locations around the edge. The connecting lines are normally
proportionally sized according to a quantitative measure and a
directional or influencing relationship is often indicated. The perceived
readability of the chord diagram will always be influenced by the
quantity and range of values being plotted. Small networks will enable a
reader to look closely at specific categories and their connections to see
the emerging relationships easily; larger systems will look busy through
the network of lines but they can still provide windows into complex
networks of influence. Often the complexity seen in these displays is
merely a reflection of the underlying complexity of the subject and/or
system upon which the data is based, so oversimplifying can
compromise the essence of such content.
EXAMPLE Exploring the connections of migration between and
within 10 world regions based on estimates across five-year intervals
between 1990 and 2010.
Figure 6.36 The Global Flow of People
HOW TO READ IT & WHAT TO LOOK FOR
First determine how categories are displayed around the circumference,
either as nodes or part-to-whole arcs, and identify each one
individually. Consider the implication of the radial sorting of these
categorical values and, if based on part-to-whole sizes, establish a sense
of the largest > smallest arc lengths. Colour-coding may be applied to
the categories so note any associations. Look inside the display to
determine what relationships the connecting lines represent and check
for any directional significance. Look closer at the tangled collection of
lines criss-crossing this space, noting the big values (usually through
line weight or width) and the small ones. Avoid being distracted by the
distance a line travels, which is just a by-product of the outer
categorical arrangement: a long connecting line is just as significant a
relationship as a short one. For this reason, pay close attention to any
connecting lines that have very short looping distances to adjacent
categories. Are there any patterns of lines heading towards or leaving
certain categories?
PRESENTATION TIPS
INTERACTIVITY: Chord diagrams are particularly useful when
offered with interactive features, enabling the user to interrogate and
manipulate the display to facilitate visual exploration. The option to
apply filters to reduce the busy-ness of the visual and enable isolation of
individual node connections helps users to focus on specific parts of the
network of interest.
ANNOTATION: Annotated features tend to be limited to value
labelling of the categories around the circumference and, occasionally,
directly onto the base or ends of the connecting lines (usually just those
that are large enough to accommodate them).
COLOUR: Aside from the categorical colouring of each node,
decisions need to be made about the colour of the connecting lines,
especially on deciding how prominent these links will be in contrast to
the nodes. Sometimes the connections will match the origin or
destination colours, or they will combine the two (with a start and end
colour to match the relationship).
COMPOSITION: The main arrangement decisions come through
sorting, firstly by generating as much logical meaning from the
categorical values around the edge of the circle and secondly by
deciding on the sorting of the connecting lines in the z-dimension – if
many lines are crossing, there is a need to think about which will be on
top and which will be below. Showing the direction of connections can
be difficult as there is so little room for manoeuvring many more visual
attributes, such as arrows or colour changes. One common, subtle
solution is to pull the destination join back a bit, leaving a small gap
between the connecting line and the destination arc. This then contrasts
with connecting lines that emerge directly from the categorical arcs,
showing it is their origin.
VARIATIONS & ALTERNATIVES
The main alternatives would be to consider variations of the ‘node–link
diagram’ or, specifically, the ‘arc diagram’, which offers a further
variation on the theme of networked displays, placing all the nodes
along a baseline and forming connections using semi-circular arcs,
rather than using a graph or radial layout.
Charts Connections
Sankey diagram
ALSO KNOWN AS Alluvial diagram
REPRESENTATION DESCRIPTION
Sankey diagrams display categorical composition and quantitative
flows between different categorical dimensions or ‘stages’. The most
common contemporary form involves a two-sided display, with each
side representing different (but related) categorical dimensions or
different states of the same dimension (such as ‘before and after’). On
each side there is effectively a stacked bar chart displaying
proportionally sized and differently coloured (or spaced apart)
constituent parts of a whole. Curved bands link each side of the display
to represent connecting categories (origin and destination) with the
proportionally sized band (its thickness) indicating the quantitative
nature of this relationship. Some variations involve multiple stages and
might present attrition through the diminishing size of subsequent stacks.
Traditionally the Sankey has been used as a flow diagram to visualise
energy or material usage across engineering processes. It is closely
related to the ‘alluvial diagram’, which tends to show changes in
composition and flow over time, but the Sankey label is often applied to
these displays also.
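The data behind a two-sided Sankey can be sketched as a tally of (origin, destination) pairs: the tallies size the connecting bands, and their sums on each side size the stacks. The record format, function name and category labels below are assumptions made for illustration.

```python
from collections import Counter


def sankey_flows(records):
    """Tally flows between a 'before' category and an 'after' category.

    records is a list of (before, after) pairs, one per item being tracked.
    Returns the band sizes plus the stack totals for each side.
    """
    bands = Counter(records)
    left_stack = Counter()
    right_stack = Counter()
    for (before, after), size in bands.items():
        left_stack[before] += size
        right_stack[after] += size
    return bands, left_stack, right_stack


# Hypothetical before/after states for eight items.
records = (
    [("Party A", "Party A")] * 3
    + [("Party A", "Party B")]
    + [("Party B", "Party B")] * 2
    + [("Party C", "Party B"), ("Party C", "Party A")]
)
bands, left_stack, right_stack = sankey_flows(records)
```

Each side of the display sums to the same total when nothing enters or leaves between states; in a multi-stage display showing attrition, later stacks would sum to less.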
EXAMPLE Exploring the seat changes among political parties
between the 2010 and 2015 UK General Elections.
Figure 6.37 UK Election Results by Political Party, 2010 vs 2015
HOW TO READ IT & WHAT TO LOOK FOR
Based on the basic two-sided version of the Sankey diagram, look down
both sides of the chart to learn what states are represented and what the
constituent categories are. Pay close attention to the categorical sorting
and pick out the large and small values on each side. Then look at the
connecting lines, making observations about the largest and narrowest
bands and noting any that seem to be mostly redistributed into a
different category compared to those that just join with the same. Notice
any small break-off bands that seem to cross the height of the whole
chart, perhaps representing a more dramatic change or diversion
between states. As with most network-type visualisations, the perceived
readability of the Sankey diagram will always be influenced by the
quantity and range of values being plotted, as well as the number of
different states presented.
PRESENTATION TIPS
INTERACTIVITY: Sankey diagrams are particularly useful when
offered with interactive features, enabling the user to interrogate and
manipulate the display to facilitate visual exploration. The option to
apply filters to reduce the busy-ness of the visual and enable isolation of
individual node connections helps users to focus on specific parts of the
network of interest.
ANNOTATION: Annotated features tend to be limited to value
labelling of the categories that make up each ‘state’ stack.
COLOUR: Colouring is often used visually to indicate the categories
of the connecting bands, though it can get a little complicated when
trying to combine a sense of change through an origin category colour
blending with a destination category colour when there has been a
switch.
COMPOSITION: The main arrangement decisions come through
sorting, firstly by generating as much logical meaning from the
categorical values within the stacks and, secondly, by deciding on the
sorting of the connecting lines in the z-dimension – if many lines are
crossing, there is a need to think about which will be on top and which
will be below. There is no significant difference between a landscape or
portrait layout, which will depend on the subject matter ‘fit’ and the
space within which you have to work. Try to ensure that the sorting of
the categorical dimensions is as logical and meaningful as possible.
VARIATIONS & ALTERNATIVES
The concept of a Sankey diagram showing composition and flow can
also be mapped onto a geographical projection as one of the variations
of the ‘flow map’. You could use a ‘chord diagram’ as an alternative to
show how larger networks are composed proportionally and in their
connections. Showing how component parts have changed over time
could just be displayed using a ‘stacked area chart’. A ‘funnel chart’ is a
much simplified display to show how a single value changes (usually
diminishing) across states, for topics like sales conversion. This often is
based on a funnel-like shape formed by a wide bar at the top (those
entering the system) and then gradually narrower bars, stage by stage
towards the end state.
Charts Trends
Line chart
ALSO KNOWN AS Fever chart, stock chart
REPRESENTATION DESCRIPTION
A line chart shows how quantitative values for different categories have
changed over time. They are typically structured around a temporal x-
axis with equal intervals from the earliest to latest point in time.
Quantitative values are plotted using joined-up lines that effectively
connect consecutive points positioned along a y-axis. The resulting
slopes formed between the two ends of each line provide an indication
of the local trends between points in time. As this sequence is extended
to plot all values across the time frame it forms an overall line
representative of the quantitative change over time story for a single
categorical value. Multiple categories can be displayed in the same
view, each represented by a unique line. Sometimes a point (circle/dot)
is also used to substantiate the visibility of individual values. The lines
used in a line chart will generally be straight. However, sometimes
curved line interpolation may be used as a method of estimating values
between known data points. This approach can be useful to help
emphasise a general trend. While this might slightly compromise the
visual accuracy of discrete values, if your values are already
approximations this will have less impact.
EXAMPLE Showing changes in percentage income growth for the Top
1% and Bottom 90% of earners in the USA between 1917 and 2012.
Figure 6.38 The Fall and Rise of U.S. Inequality, in 2 Graphs
HOW TO READ IT & WHAT TO LOOK FOR
Firstly, learn about the axes: what is the time period range presented on
the x-axis (and in what order) and what is the range of quantitative
values shown on the y-axis, paying particular attention to the origin
value (which may not be zero)? Inside the chart, determine what
categories each line represents: for single lines this will usually be clear
from the chart title, for multiple lines you might have direct labelling or
a legend to learn colour associations. Think about what high and low
values mean: is it ‘good’ to be large/small, increasing or decreasing?
Glance at the general patterns (especially if there are many) looking for
observations such as any trends (short or long term), any sudden
moments of a rise or fall (V- or W-shapes, or inverted), any sense of
seasonal or cyclical patterns, any points of interest where lines cross
each other or key thresholds that are reached/exceeded. Can you
mentally extrapolate from the values shown any sense of a forecasted
trend? Avoid jumping to spurious interpretations if you see two line
series following a similar pattern; this does not necessarily mean that
one thing has caused the other, it might just be coincidence. Then look
more closely at categories of interest and at patterns around specific
moments in time, and pick out the peak, low, earliest and latest values
for each line. Where available, compare the changing quantities against
annotated references such as targets, forecast, previous time periods,
range bands, etc.
PRESENTATION TIPS
INTERACTIVITY: Interactivity may be especially helpful if you have
many categories and wish to enable the user to isolate (in focus terms) a
certain line category of interest.
ANNOTATION: Chart apparatus devices like tick marks and gridlines
in particular can be helpful to increase the accuracy of the reading of
the quantitative values. If you have axis labels you should not need
direct labels on each value point – this will be label overload. You
might choose to annotate specific values of interest (highest, lowest,
specific milestones). Think carefully about what is the most useful and
meaningful interval for your time axis labelling. When several
categories are being shown, try, if possible, to label each line
directly, maybe at its start or end position.
COLOUR: When many categories are shown it may be that only
certain emphasised lines of interest possess a colour and a label – the
rest are left in greyscale for context.
COMPOSITION: Composition choices are mostly concerned with the
chart’s dimensions: its aspect ratio, how high and wide to make it. The
sequencing of values tends to be left to right for the sequence of the
time-based x-axis and low rising to high values on the y-axis; you will
need a good (and clearly annotated) reason to break this convention.
Line charts do not always need the y-axis to start at zero, as we are not
judging the size of a bar, rather the position along an axis. You should
expect to see a zero baseline if zero has some critical significance in the
interpretation of the trends. If your y-axis origin is not going to be zero,
you might include a small gap between the x-axis and the minimum value so
that a zero baseline is not implied. Be aware that the upward and downward trends on
a line chart can seem more significant if the chart width is narrow and
less significant if it is more stretched out. There is no single rule to
follow here but a useful notion involves ‘banking to 45°’ whereby the
average slope angle across your chart heads towards 45°. While it is
impractical to actually measure this, judging by eye tends to be more
than sufficient.
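If you do want a numerical starting point before judging by eye, one simple heuristic is to choose the height-to-width ratio at which the median absolute slope of the line segments appears at 45°. The sketch below is an assumption-laden illustration of that heuristic, not a standard library routine; the function name is hypothetical and flat segments are ignored.

```python
from statistics import median


def banked_aspect_ratio(xs, ys):
    """Height/width ratio at which the median segment slope is drawn at 45 degrees.

    Slopes are measured after rescaling both axes to 0..1, i.e. as they
    would appear in a square plot; the ratio then stretches or squashes
    the plot so the median absolute slope becomes 1.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    if x_range == 0 or y_range == 0:
        raise ValueError("data must vary along both axes")
    points = list(zip(xs, ys))
    slopes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 != x0 and y1 != y0:  # skip vertical and flat segments
            slopes.append(abs((y1 - y0) / y_range) / abs((x1 - x0) / x_range))
    if not slopes:
        raise ValueError("no sloped segments to bank")
    return 1 / median(slopes)


# Hypothetical zig-zag series: steep segments call for a wide, short chart.
ratio = banked_aspect_ratio([0, 1, 2, 3, 4], [0, 2, 0, 2, 0])
```

For this zig-zag the result is 0.25, suggesting a chart roughly four times as wide as it is tall; treat any such figure as a prompt for experimentation rather than a rule.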
VARIATIONS & ALTERNATIVES
Variations of the line chart may include the ‘cumulative line chart’ or
‘step chart’. ‘Spark lines’ are mini line charts that aim to occupy little
more than a word’s length of space; they are often seen in dashboards where
space is at a premium and there is a desire to optimise the density of the
display. ‘Bar charts’ can also be used to show how values look over
time when there is perhaps greater volatility in the quantitative values
across the time period and when the focus is on the absolute values at
each point in time, more so than trends. Sometimes a line chart can
show quantitative trends over continuous space rather than time. For
showing ranking over time, consider the ‘bump chart’, and for before
and after comparisons, the ‘slope graph’.
Charts Trends
Bump chart
ALSO KNOWN AS
REPRESENTATION DESCRIPTION
A bump chart shows how quantitative rankings for categories have
changed over time. They are typically structured around a temporal x-
axis with equal intervals from the earliest to latest point in time.
Quantitative rankings are plotted using joined-up lines that effectively
connect consecutive points positioned along a y-axis (typically top =
first). The resulting slopes formed between the two ends of each line
provide an indication of the local ranking trends between points in time.
As this sequence is extended to plot all values across the time frame it
forms an overall line representative of the ranking story for a single
categorical value. Multiple categories are often displayed in the same
view, showing how rankings have collectively changed over time.
Sometimes a point (circle/dot) mark is also used to substantiate the
connected visibility of category lines, as is colour (for the lines and/or
the points).
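The rankings plotted in a bump chart usually have to be derived from the underlying quantities at each point in time. A minimal sketch of that step follows; the data structure, function name and figures are hypothetical, and ties are broken simply by the order in which categories are listed.

```python
def rank_by_period(values_by_period):
    """Convert {period: {category: value}} into {period: {category: rank}}.

    Rank 1 is given to the largest value in each period.
    """
    ranks = {}
    for period, values in values_by_period.items():
        ordered = sorted(values, key=values.get, reverse=True)
        ranks[period] = {category: i + 1 for i, category in enumerate(ordered)}
    return ranks


# Hypothetical populations (in thousands) at two points in time.
populations = {
    1790: {"City A": 30, "City B": 20, "City C": 10},
    1800: {"City A": 35, "City B": 40, "City C": 15},
}
ranks = rank_by_period(populations)
# City B overtakes City A between the two periods; City C stays third.
```

Each category's sequence of ranks across the periods then forms one line, plotted with rank 1 at the top of the y-axis.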
EXAMPLE Showing changes in rank of the most populated US cities
at each census between 1790 and 1890.
Figure 6.39 Census Bump: Rank of the Most Populous Cities at Each
Census, 1790—1890
HOW TO READ IT & WHAT TO LOOK FOR
Firstly, you need to learn about the axes. What is the time period range
presented on the x-axis (and in what order)? What is the range of
quantitative rankings shown on the y-axis (check that the ranks start at 1
from the top downwards)? Inside the chart, determine what categories
each line represents: this might be explained through direct labelling, a
colour legend, interactivity or through differentiating point marker
attributes of colour/shape/pattern. Think about what high and low ranks
mean: is it ‘good’ to be high up the rankings and is it better to be
moving up or down? Consider the general patterns to look for
observations such as consistent trends (largely parallel lines) or
completely non-relational patterns (lines moving in all directions). Are
there any prominent stories of categories that have had a sudden rise or
fall (V- or W-shapes, or inverted)? Is there any evidence of seasonal or
cyclical patterns, any key points of interest where lines cross each other
or key thresholds that are reached/exceeded? Next, look more closely at
categories of interest and at patterns around specific moments in time,
and pick out the peak, low, earliest and latest values for each line.
PRESENTATION TIPS
INTERACTIVITY: Interactivity is usually necessary with bump
charts, especially if you have many categories and wish to enable the
user to isolate (in focus terms) a certain line category of interest.
ANNOTATION: The ranking labels can be derived from the vertical
position along the scale so direct labelling is usually unnecessary. You
might choose to annotate specific values of interest (highest, lowest,
specific milestones). Think carefully about what is the most useful and
meaningful interval for your time axis labelling.
COLOUR: Often, with many categories to show in the same chart, the
big challenge is to distinguish each line, especially as they are likely
to criss-cross often with others. Using colour association can be useful
for fewer than 10 categories, but for more than that you really need to
offer interactivity or decide that only certain emphasised lines of
interest will possess a colour and the rest are left in greyscale for
context.
COMPOSITION: The sequencing of values tends to run left to right along
the time-based x-axis, with the highest rankings (lowest numbers) at the
top of the y-axis and ranks descending downwards. You will therefore need
a good (and clearly annotated) reason to break this convention.
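The COLOUR tip above, giving colour only to emphasised lines and leaving the rest in grey for context, can be sketched as a simple lookup. This is a minimal illustration in Python. The function name `line_colours`, the hex colour values and the category names are all assumptions made for the example.

```python
def line_colours(categories, emphasised, palette, grey="#bbbbbb"):
    """Give each emphasised category a palette colour; all others get grey."""
    highlighted = [c for c in categories if c in emphasised]
    if len(highlighted) > len(palette):
        raise ValueError("more emphasised categories than palette colours")
    # Start with every line in the context grey, then colour the chosen few.
    colours = {c: grey for c in categories}
    for category, colour in zip(highlighted, palette):
        colours[category] = colour
    return colours


# Hypothetical categories and colours, for illustration only.
categories = ["A", "B", "C", "D", "E"]
colours = line_colours(categories, emphasised={"B", "D"},
                       palette=["#d95f02", "#1b9e77"])
print(colours)
```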
VARIATIONS & ALTERNATIVES
Alluvial diagrams (similar to Sankey diagrams) can show how rankings
have changed over time while also incorporating a component of
quantitative magnitude. This approach is effectively merging the ‘bump
chart’ with the ‘stacked area chart’. Consider ‘line charts’ and ‘area
charts’ if the ranking is of secondary interest to the absolute values.
Charts Trends
Slope graph
ALSO KNOWN AS Slope chart
REPRESENTATION DESCRIPTION
A slope graph shows a ‘before and after’ display of changes in
quantities for different categories. The display is based on (typically)
two parallel quantitative axes with a consistent scale range to cover all
possible quantitative values. A line is plotted for each category
connecting the two axes together with the vertical position on each axis
representing the respective quantitative values. Sometimes a dot is also
used to further substantiate the visibility of the value positions. These
connecting lines form slopes that indicate the upward, downward or
stable trend between points in time. The resulting display incorporates
absolute values, reveals rank and, of course, shows change over time.
Colour is often used to distinguish different categorical lines;
alternatively, it can be used to surface the major trend states (up,
down, no change). A slope graph works less well when all values (or the
majority) are going in the same direction; consider alternatives if
this is the case.
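The trend states (up, down, no change) and the check for most values moving in the same direction can both be derived from the before and after values. The sketch below is one way to do this in Python. The function names, the `tolerance` parameter and the sample figures are hypothetical and only illustrate the idea.

```python
from collections import Counter


def classify_slopes(before, after, tolerance=0.0):
    """Label each category 'up', 'down' or 'no change' between two points."""
    states = {}
    for category in before:
        change = after[category] - before[category]
        if change > tolerance:
            states[category] = "up"
        elif change < -tolerance:
            states[category] = "down"
        else:
            states[category] = "no change"
    return states


def dominant_direction_share(states):
    """Share of all categories that move in the most common direction."""
    moving = Counter(s for s in states.values() if s != "no change")
    if not moving:
        return 0.0
    return moving.most_common(1)[0][1] / len(states)


# Made-up before/after values for four categories.
before = {"P": 40, "Q": 30, "R": 20, "S": 10}
after = {"P": 35, "Q": 30, "R": 28, "S": 12}
states = classify_slopes(before, after)
share = dominant_direction_share(states)
print(states, share)
```

A share close to 1.0 would suggest that a slope graph may not be the best fit for the data.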
EXAMPLE Showing changes in the share of power sources across all
US states between 2004 and 2014.
Figure 6.40 Coal, Gas, Nuclear, Hydro? How Your State Generates
Power
HOW TO READ IT & WHAT TO LOOK FOR
Firstly, learn about the axes: what are the two points in time being
presented and what is the possible range of quantitative values shown
on the y-axis, checking that the ranks start from the top down? Inside
the chart, determine what category each line represents: this might be
explained through direct
labelling, a colour legend, or through interactivity. Think about what
upward, downward and stable trends mean: is it ‘good’ to be moving up
or down? Is it more interesting to show no change? Look at the general
patterns to observe such things as consistent trends (largely parallel
lines in either direction) or completely non-relational patterns (lines
moving in all directions). Colour may be used to accentuate the
distinction between upward and downward trends. Are there any
prominent stories of categories that have had a dramatic rise or fall?
Even if no values have dramatically altered, that in itself can be an
important finding, especially if change was expected. Next, look more
closely at categories of interest and pick out the highest and lowest
values on each side to learn about those stories. Look for the gaps
where there are no values, and at outlier values too, to see if some sit
outside the normal value clusters.
PRESENTATION TIPS
INTERACTIVITY: Depending on the number of category values
being presented, slope graphs can become quite busy, especially if there
are bunches of similar values and slope transitions. This also causes a
problem with accommodating multiple labels on the same value. On
these occasions, interactivity can help by letting the user
filter/exclude certain values.
ANNOTATION: Labelling of each category will get busy, especially
when there are shared values, so you might choose to annotate specific
values of interest (highest, lowest, of editorial interest).
COLOUR: Often when you have many categories to show in the same
chart the big challenge is to distinguish each line, especially as they
are likely to criss-cross often with others. Using colour association can
be useful for fewer than 10 categories, usually with direct labelling on
the left and/or right of the chart.
COMPOSITION: The aspect ratio of the slope graph (height and
width) will often be determined by the space you have to work with.
VARIATIONS & ALTERNATIVES
Rather than showing a before and after story, some slope graphs are
used to show the relationship between different quantitative measures
for linked categories. In this case the connecting line is not indicative of
a directional relationship, just the relationship itself. An alternative
option would be the ‘connected dot plot’ which can also show before
and after stories and is a better option when all values are moving in the
same direction.
Charts Trends
Connected scatter plot
ALSO KNOWN AS Trail chart
REPRESENTATION DESCRIPTION
A connected scatter plot displays the relationship between two
quantitative measures over time. The display is formed by plotting
marks like a dot or circle for each point in time at the respective
coordinates along two quantitative x- and y-axes. The collection of
individual points is then connected (think of a dot-to-dot drawing
puzzle) using lines joining each consecutive point in time to form a
sequence of change. Generally there would only be a single connected
line plotted on a chart to avoid the great visual complexity of overlaying
several in one display. However, if multiple categories are to be
included, colour is typically used to distinguish each series.
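The dot-to-dot construction described above comes down to ordering the points by time and joining each point to the next. A minimal Python sketch of that step follows. The function name `connected_segments` and the sample records are assumptions for illustration.

```python
def connected_segments(records):
    """Sort (time, x, y) records by time and join consecutive points.

    Returns a list of ((x0, y0), (x1, y1)) line segments in time order.
    """
    ordered = sorted(records, key=lambda record: record[0])
    points = [(x, y) for _, x, y in ordered]
    # Pair each point with its successor to form the connecting lines.
    return list(zip(points, points[1:]))


# Made-up records, deliberately out of time order.
records = [(3, 2.0, 5.0), (1, 1.0, 4.0), (2, 3.0, 6.0)]
segments = connected_segments(records)
print(segments)
```

Sorting by time rather than by x-value is what distinguishes this from an ordinary line chart: the line is free to double back on itself.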
EXAMPLE Showing changes in the daily price and availability of
Super Bowl tickets on the secondary market four weeks prior to the
event across five Super Bowl finals.
Figure 6.41 Holdouts Find Cheapest Super Bowl Tickets Late in the
Game
HOW TO READ IT & WHAT TO LOOK FOR
Learn what each quantitative axis relates to and make a note of the
range of values in each case (min to max). Look at what each plotted
value on the chart refers to in terms of its date label and determine the
meaning of line direction. It usually helps to parse your thinking by
considering what higher/lower values mean for each quantitative axis
individually and then combining the joint meaning thereafter. Try to
follow the chart from the start to the end, mapping out in your mind the
sequence of a narrative as the values change in all directions and noting
the extreme values in the outer edges of your line’s reach. Look at the
overall pattern of the connected line: is it consistently moving in one
direction? Does it ebb and flow in all directions? Does it create a spiral
shape? Compare consecutive points for a more focused view of change
between two points.
PRESENTATION TIPS
INTERACTIVITY: The biggest challenge is making the connections
and the sequence as visible as possible. This becomes much harder
when values change very little and/or they loop back almost in spiral
fashion, crossing back over themselves. It is especially hard to label the
sequential time values elegantly. One option to overcome this is
through interactivity and particularly through animated sequences
which build up the display, connecting one line at a time and unveiling
the date labels as time progresses. It is often the case that only one
series will be plotted. However, interactive options may allow the user
to overlay one or more for comparison, switching them on and off as
required.
ANNOTATION: Connected scatter plots are generally seen as one of
the most complex chart types for the unfamiliar reader to work out how
to read, given the number of different attributes working together in the
display. It is therefore vital that as much help is given to the reader as
possible with ‘how to read’ guides and illustrations of what the different
directions of change mean.
COLOUR: Colour is generally only used to accentuate certain sections
of a sequence that might represent a particularly noteworthy stage of
the narrative.
COMPOSITION: As the encoding of the plotted point values is based
on position along an axis, it is not necessary to start the axes from a
zero baseline – just make the scale ranges as representative as possible
of the range of values being plotted. Ideally a connected scatter plot will
have a 1:1 aspect ratio (equally as tall as it is wide), creating a squared
area to help patterns surface more evidently. If one quantitative variable
(e.g. weight) is likely to be affected by the other variable (e.g. height), it
is general practice to place the former on the y-axis and the latter on the
x-axis.
VARIATIONS & ALTERNATIVES
The ‘comet chart’ is to the connected scatter plot what the ‘slope graph’
is to the ‘line chart’ – a summarised view of the changing relationships
across two quantitative values between just two points in time.
Naturally a reduced variation of the connected scatter plot is simply the
‘scatter plot’ where there is no time dimension or elements of
connectedness.
Charts Trends
Area chart
ALSO KNOWN AS
REPRESENTATION DESCRIPTION
An area chart shows how quantitative values for different categories
have changed over time. Area charts are typically structured around a
temporal x-axis with equal intervals from the earliest to latest point in time.
Quantitative values are plotted using joined-up lines that effectively
connect consecutive points positioned along a y-axis. The resulting
slopes formed between the two ends of each line provide an indication
of the local trends between points in time. As this sequence is extended
to plot all values across the time frame it forms an overall line
representative of the quantitative change over time story for a single
categorical value. To accentuate the magnitude of the quantitative
values and the change through time the area beneath the line is filled
with colour. The height of each coloured layer at each point in time
reveals its quantity. Area charts can display values for several
categories, using stacks, to show also the changing part-to-whole
relationship.
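For stacked area charts, each layer sits on top of the running total of the layers beneath it, starting from a zero baseline. The following Python sketch shows one way to compute the lower and upper edge of each layer. The function name `stack_layers` and the sample values are hypothetical.

```python
def stack_layers(series):
    """Return {category: [(lower, upper) per time point]} for stacked areas.

    Layers are stacked in the order given, starting from a zero baseline.
    """
    n_points = len(next(iter(series.values())))
    baseline = [0.0] * n_points
    bands = {}
    for category, values in series.items():
        upper = [b + v for b, v in zip(baseline, values)]
        bands[category] = list(zip(baseline, upper))
        # The top of this layer becomes the baseline for the next one.
        baseline = upper
    return bands


# Made-up values for two categories at three points in time.
series = {"A": [1, 2, 3], "B": [2, 2, 1]}
bands = stack_layers(series)
print(bands)
```

The height of each layer (upper minus lower) is the category's value, and the top edge of the last layer is the overall total.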
EXAMPLE Showing changes in the average monthly price ($ per
barrel) of crude oil between 1985 and 2015.
Figure 6.42 Crude Oil Prices (West Texas Intermediate), 1985–2015
HOW TO READ IT & WHAT TO LOOK FOR
Firstly, learn about the axes: what is the time period range presented on
the x-axis (and in what order) and what is the range of quantitative
values shown on the y-axis, paying particular attention to whether it is a
percentage or absolute based scale? Inside the chart, determine what
categories each area layer represents: for single areas this will usually
be clear from the chart title, for multiple areas you might have direct
labelling or a nearby legend to learn colour associations. Think about
what high and low values mean: is it ‘good’ to be large/small,
increasing or decreasing? Glance at the general patterns (especially if
there are many layers), looking at the visible ‘thickness’ of the coloured
layers. At what points are the values highest or lowest? When are they
growing or shrinking as the time axis moves along? If there are multiple
categories, which ones take up the largest and smallest slices of the
overall total? Are there any trends (short or long term), any sudden
moments of a rise or fall, any sense of seasonal or cyclical patterns? If
there are multiple categories, look more closely at individual layers of
interest.
PRESENTATION TIPS
ANNOTATION: Direct labelling of quantitative values will get far too
busy so you might choose to annotate specific values of interest
(highest, lowest, specific milestones). Think about the most useful
interval for your axis labelling. As ever there is no single rule, so adopt
the Goldilocks principle of not too many, not too few. If you have a
stacked area chart, try to label the category layers directly, as closely
as possible (if the heights allow it), or at least ensure any colour
associations are easily identifiable through a nearby legend. Think
carefully about what is the most useful and meaningful interval for your
time axis labelling.
COLOUR: If you are using a stacked area chart, ensure the categorical
layers have sufficiently different colours so that their distinct reading
can be efficiently performed.
COMPOSITION: Similar to the line chart, the area chart’s dimensions
should ideally utilise an aspect ratio that optimises the readability
through 45° banking (roughly judging the average slope angle). The
sequencing of values tends to be left to right for the sequence of the
time-based x-axis and low rising to high values on the y-axis; you will
need a good (and clearly annotated) reason to break this convention.
Unlike the line chart, the quantitative axis for area charts must start at
zero as it is the height of the coloured areas under each line that helps
readers to perceive the quantitative values. Do not have overlapping
categories on the same chart because it makes it very difficult to see
(imagine hills behind hills, peeking out and then hiding behind each
other). Rather than stacking categories you might consider using small
multiples, especially as this will present the respective displays from a
common baseline (and make reading sizes a little easier).
VARIATIONS & ALTERNATIVES
Like area charts, ‘alluvial diagrams’ display proportional stacked layers
for multiple categories showing the absolute value change over time.
However, they also show the evolving ranks, switching the relative
ordering of each layer of values based on the current magnitude. Some
deployments of the area chart are not plotted over time but over
continuous dimensions of space, perhaps showing the changing nature
of a given quantitative measure along a given route. When you have
many concurrent layers to show and these layers start and stop at
different times, a ‘stream graph’ is worth considering.
Charts Trends
Horizon chart
ALSO KNOWN AS
EXAMPLE Showing percentage changes in price for selected food
items in the USA between 1990 and 2015.
REPRESENTATION DESCRIPTION
Horizon charts show how quantitative values for different categories
have changed over time. They are valuable for showing changes over
time for multiple categories within space-constrained formats (such as
dashboards). They are structured around a series of rows each showing
changes in quantitative values for a single category. The temporal x-
axis has equal intervals from the earliest to latest point in time.
Quantitative values are plotted using joined-up lines that connect
consecutive points positioned along a value y-axis. The resulting slopes
formed between the ends of each line provide an indication of the local
trends between two points in time. As this sequence is extended to plot
all values across the time frame it forms an overall line representative of
the quantitative changes. To accentuate the magnitude of the
quantitative values the area beneath the line is filled with colour.
Negative values are highlighted in one colour, positive values in
another colour. Variations in colour lightness are used to indicate
different degrees or bands of magnitudes, with the extremes getting
darker. Negative value areas are then flipped from underneath the
baseline to above it, joining the positive values but differentiated in
their polarity by colour. Finally, like slicing off layers of a mountain,
each distinct threshold band that sits above the imposed maximum y-
axis scale is chopped off and dropped down to the baseline, in front of
its foundation base. The final effect shows overlapping layers of
increasingly darker colour-shaded areas all occupying the same vertical
space with combinations of height, colour and shade representing the
values.
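The flip-and-slice procedure described above can be expressed as a small calculation per data point: take the sign as the polarity (colour), then split the magnitude into bands that are each drawn from the same baseline. This is a minimal Python sketch under stated assumptions. The function name `horizon_bands` and the parameters `band_size` and `n_bands` are hypothetical, and values beyond the top band are simply clipped.

```python
def horizon_bands(value, band_size, n_bands):
    """Split one value into a polarity and a list of per-band heights.

    heights[0] is the lightest band and heights[-1] the darkest. Each
    height lies between 0 and band_size and is drawn from the baseline.
    """
    polarity = "negative" if value < 0 else "positive"
    magnitude = abs(value)  # negative values are flipped above the baseline
    heights = []
    for band in range(n_bands):
        lower = band * band_size
        # The portion of the magnitude that falls inside this band.
        heights.append(min(max(magnitude - lower, 0), band_size))
    return polarity, heights


print(horizon_bands(70, band_size=25, n_bands=3))
print(horizon_bands(-30, band_size=25, n_bands=3))
```

Drawing the darker bands in front of the lighter ones reproduces the overlapping layered effect within a single row height.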
Figure 6.43 Percentage Change in Price for Select Food Items, Since
1990
HOW TO READ IT & WHAT TO LOOK FOR
Firstly, learn about the category rows: what do they represent and in
what order are they presented? Next, the chart axes: what is the time
period range presented on the x-axis (and in what order) and what is the
range of quantitative values shown on the y-axis, paying attention to
whether it is a percentage or absolute value scale? Next, what are the
colour associations (for positive and negative values) and the different
shaded banding thresholds? Think about what high and low values
mean: is it ‘good’ to be large/small, increasing or decreasing? Glance at
the general patterns over time, looking at the most visible dark areas of
each colour polarity: where have values reached a peak in either
direction? Maybe then separate your reading between looking at the
positive value insights and then the negative ones: which chunks of
colour are increasing in value (darker) or shrinking (getting lighter) as
the time axis moves along? Where can you see most empty space,
indicating low values? Are there any trends (short or long term), any
sudden moments of a rise or fall, any sense of seasonal or cyclical
patterns, any points of interest where lines cross each other or key
thresholds that are reached/exceeded? Then look more closely at
categories of interest, assessing their own patterns around specific
moments in time and picking out the peak, low, earliest and latest
values for each row.
PRESENTATION TIPS
ANNOTATION: The decisions around annotations are largely reduced
to labelling the category rows. Such is the busy-ness of the chart areas
that any direct labelling is going to clutter the display too much: horizon
charts are less about precise value reading and more about getting a
sense of the main patterns, so avoid the temptation to over-label. Think
carefully about what is the most useful and meaningful interval for your
time axis labelling.
COLOUR: Colour decisions mainly concern the choices of quantitative
scale bandings to show the positive and negative value ranges.
COMPOSITION: The height of the chart area in which you can
accommodate a single row of data will have an influence on the entire
construction of the horizon chart. It will often involve an iterative/trial
and error process, looking at the range of quantitative values across
each category, establishing the most sensible and meaningful thresholds
within these ranges and then fixing the y-axis scales accordingly. Try to
ensure the sorting of the main categorical rows is as logical and
meaningful as possible.
VARIATIONS & ALTERNATIVES
An alternative to the horizon chart is the entry-level single category
‘area chart’, which does not suffer the same restrictions on the vertical
scale. For space-constrained displays, ‘sparklines’ offer a suitable
option and can easily accommodate multiple category displays.
Charts Trends
Stream graph
ALSO KNOWN AS Theme river
REPRESENTATION DESCRIPTION
A stream graph shows how quantitative values for different categories
have changed over time. They are generally used when you have many
constituent categories at any given point in time and these categories
may start and stop at different points in time (rather than continue
throughout the presented time frame). As befitting the name, their
appearance is characterised by a flowing, organic display of meandering
layers. They are typically structured around a temporal x-axis with
equal intervals from the earliest to latest point in time. Quantitative
values are plotted using joined-up lines that effectively connect
consecutive points to quantify the height above a local baseline, which
is not a stable zero baseline but rather a shifting shape formed out of
other category layers. To accentuate the size of the category’s height at
any given point the area beneath the line is filled with colour. The
height of each coloured layer at each point in time reveals its quantity.
This colour is often used to further represent a quantitative value scale
or to associate with categorical colours. The stacking arrangement of
the different categorical streams goes above and below the central axis
line to optimise the layout but not with any implication of polarity.
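The shifting baseline can be illustrated with a simple layout in which the stack is centred on the central axis at every point in time. Centring the total around zero is only one possible layout choice and is assumed here, as the description above does not prescribe a specific algorithm. The function name `stream_layout` and the sample values are also hypothetical.

```python
def stream_layout(series):
    """Return {category: [(lower, upper) per time point]} for a centred stack.

    At each time point the whole stack is shifted so that it is symmetric
    about zero, so position above or below the axis carries no polarity.
    """
    n_points = len(next(iter(series.values())))
    totals = [sum(values[t] for values in series.values())
              for t in range(n_points)]
    baseline = [-total / 2 for total in totals]
    bands = {}
    for category, values in series.items():
        upper = [b + v for b, v in zip(baseline, values)]
        bands[category] = list(zip(baseline, upper))
        # Each layer rests on the layer before it, not on a fixed zero line.
        baseline = upper
    return bands


# Made-up values: category "A" stops at the third time point.
series = {"A": [2, 4, 0], "B": [2, 2, 6]}
bands = stream_layout(series)
print(bands)
```

A layer with a value of zero collapses to no height, which is how categories appear to start and stop within the time frame.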
EXAMPLE Showing changes in the total domestic gross takings ($US)
and the longevity of all movies released between 1986 and 2008.
Figure 6.44 The Ebb and Flow of Movies: Box Office Receipts 1986–2008
HOW TO READ IT & WHAT TO LOOK FOR
Firstly, determine what is the time period presented on the x-axis (and
in what order). In most stream graphs you do not see the quantitative
y-axis scale because the level of reading is more about getting the gist of
the main patterns in a relative sense rather than an absolute one. You
might find that the colouring of layers has a quantitative scale or
categorical association so look for any keys. Also, you will often find
guides to help estimate the quantitative heights of each layer. Think
about what high and low values mean: is it ‘good’ to be large/small,
increasing or decreasing? Glance at the general patterns over time.
Remember that above or below means nothing in the sense of polarity
of values, so your focus is on the entirety of the collective shape. Look
for the largest peaks and the shallowest troughs, possible seasonal
patterns or the significant moments of change. Note where these
patterns occur in relation to the timescale. Can you see any prominently
tall (big values) or wide (long-duration) layers? Notice when layers start
and end, noting times when there are many concurrent categories and
when there are few. Pick out the layers of personal interest and assess
their patterns over time. Do not spend too much effort trying to estimate
precise values of height, but keep your focus on the bigger picture level.
It is often useful to rotate the display so the streams are travelling
vertically, offering a different perspective and removing the instinct to
see positive values above and negative values below the central axis.
PRESENTATION TIPS
INTERACTIVITY: If interactivity is a possibility, this could enable
selection or mouseover events to reveal annotated values at any given
point in time or to filter the view.
ANNOTATION: Chart apparatus devices are generally of limited use
in a stream graph, where the priority is a general sense of pattern
rather than precise value reading. Direct labelling of categories is likely to
be quite busy but may be required, at least to annotate the most
interesting patterns (highest, lowest, specific milestones). Think
carefully about what is the most useful and meaningful interval for your
time axis labelling.
COLOUR: Ensure any colour associations or size guides are easily
identifiable through a nearby legend.
COMPOSITION: Composition choices are firstly concerned with the
landscape or portrait layout. This will largely be informed by the format
and space of your outputs and the meaning of the data. The stream
layers are often smoothed, giving them an aesthetically organic
appearance, both individually and collectively. This is achieved via
curved line interpolation.
VARIATIONS & ALTERNATIVES
The fewer categorical series you have in your data, the more likely a
stacked ‘area chart’ is to best fit your needs. You could consider
a stacked ‘bar chart’ over time also, but there is less chance of
maintaining the connected visibility of continuous categorical series via
a singular shape.
Charts Activities
Connected timeline
ALSO KNOWN AS Relationship timeline, storyline visualisations,
swim-lane chart
REPRESENTATION DESCRIPTION
A connected timeline displays the duration, milestones and categorical
relationships across a range of categorical ‘activities’. It represents a
particularly diverse and creative way of showing changes over time and
so involves many variations in approach. The structure is generally
formed of a time-based quantitative x-axis and categorical y-axis lanes.
Each categorical activity will commence at a point in time and from
within a vertical category ‘family’. Over time, the line will progress,
possibly switching to a different categorical lane position as the nature
of the activity alters. The lines may be of fixed width or proportionally
weighted to represent a quantitative measure. Some activity lines may
cease, restart or merge with others to build a multi-faceted narrative.
Colour can also be used to present further relevant detail. The main
issue with any connected timeline approach is simply the complexity of
the content and the number of moving parts crossing over the display.
As there are many entry points into reading such a timeline there can be
inefficiency in the reading process, but this is usually simply
proportional to the complexity of the subject at hand, and you may not
wish to see these nuances removed.
EXAMPLE Showing changes in US major college football programme
allegiance to different conferences between 1965 and 2015.
Figure 6.45 Tracing the History of N.C.A.A. Conferences
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know what the major categorical ‘lanes’
represent and what the range of date values is (min to max). Then try to
determine what each categorical activity line represents. As there are so
many derivatives there is no single reading strategy, but generally
glance across the entire chart noting the sequence of the activities; there
is usually a sequential logic attached to their sorting based on the start
date milestone in particular. Follow the narrative from left to right,
noting observations about any big, small and medium weighted lines
and spotting any moment when they connect with, overlap or detach
from other activities. Are there any major convergences or divergences
in pattern? Any hubs of dense activity and other sparse moments? Look
for the length of lines to determine the long, medium and short
durations of activity. Where available, compare the activities against
annotated references about other key milestone dates that might hold
some significance or influence.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like tick marks and gridlines
in particular can be helpful to increase the accuracy of the reading of
both the quantitative values and the activity ‘lanes’, which may be
coloured to help recognise divisions between categories. Direct
labelling is usually seen in these timelines to help maintain associations
across the display with the categories of characters or activities, perhaps
annotating the consequence or cause of lines merging, etc. Think
carefully about what is the most useful and meaningful interval for your
time axis labelling.
COLOUR: Even if colour does not have a direct association with given
activities, it can be a useful property to highlight certain features of the
narrative, sometimes acting as a container device to group activities
together, even if just for a momentary time period.
COMPOSITION: Where possible, try to make the categorical sorting
meaningful, maybe organising values in ascending/descending size
order. The vertical (y) or horizontal (x) sequencing of time will depend
on the amount of data to show and the space you have to work with.
Also, depending on the narrative, the past > present ordering may be
reversed.
VARIATIONS & ALTERNATIVES
There are similarities with the organic nature of the ‘alluvial diagram’,
which shows ranking and quantitative change over time for a number of
concurrent categories. When there are fewer inter-activity relationships
and more discrete categories are involved, then the ‘Gantt chart’ offers
an alternative way of showing this analysis.
Charts Activities
Gantt chart
ALSO KNOWN AS Range chart, floating bar chart
REPRESENTATION DESCRIPTION
A Gantt chart displays the start and finish points and durations for
different categorical ‘activities’. The display is commonly used in
project management to illustrate the breakdown of a schedule of tasks
but can be a useful device to show any data based on milestone dates
and durations. The chart is structured around a time-based quantitative
x-axis and a categorical y-axis. Each categorical activity is represented
by lines positioned according to the start moment and then stretched out
to the finish point. There may be several start/finish durations within the
same activity row. Sometimes points are used to accentuate the
start/finish positions and the line may be coloured to indicate a relevant
categorical value (e.g. separating completed vs ongoing).
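Each Gantt bar needs a start position and a length, and the rows are usually easier to read when sorted by start date. The sketch below prepares that data in Python using only the standard library. The function name `gantt_rows` and the three tasks are hypothetical and are not drawn from the presidential example.

```python
from datetime import date


def gantt_rows(activities):
    """Turn (name, start, finish) tuples into rows sorted by start date.

    Each row carries the bar's start, finish and duration in days.
    """
    rows = []
    for name, start, finish in activities:
        if finish < start:
            raise ValueError(f"{name}: finish date precedes start date")
        rows.append({
            "name": name,
            "start": start,
            "finish": finish,
            "days": (finish - start).days,  # length of the bar
        })
    return sorted(rows, key=lambda row: row["start"])


# Made-up project tasks, for illustration only.
activities = [
    ("Build", date(2024, 2, 1), date(2024, 3, 1)),
    ("Design", date(2024, 1, 8), date(2024, 2, 5)),
    ("Test", date(2024, 2, 20), date(2024, 3, 15)),
]
rows = gantt_rows(activities)
for row in rows:
    print(row["name"], row["start"], row["days"])
```

Sorting by `days` instead of `start` would give the alternative ordering by duration.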
EXAMPLE Showing the events of birth, death and period serving in
office for the first 44 US Presidents.
Figure 6.46 A Presidential Gantt Chart
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know with what major categorical values each
Gantt bar is associated and what the range of the date values is (min to
max). Follow the narrative, noting the sequence of the categories –
there is usually a sequential sorting based on the start date milestone.
Glance across the entire chart and perform global comparisons to
establish the high-level ranking of biggest > smallest durations (based
on the length of the line) as well as early and late milestones. Identify
any noticeable exceptions and/or outliers. Perform local comparisons
between neighbouring bars to identify proportional differences and any
connected dependencies. Estimate (or read, if labels are present) the
absolute values for specific categories of interest. Where available,
compare the activities against annotated references about other key
milestone dates that might hold some significance or influence.
PRESENTATION TIPS
ANNOTATION: Chart apparatus devices like tick marks and gridlines
(or row band-shading) in particular can be helpful to increase the
accuracy of the reading of the start point and duration of activities along
the timeline. If you have axis labels you may not need direct labels for
the values shown with each duration bar – this will be label overload, so
generally decide between one or the other. Think carefully about what
is the most useful and meaningful interval for your time axis labelling.
COMPOSITION: There is no significant difference in perception
between vertical and horizontal Gantt charts, though horizontal layouts
are more metaphorically consistent with the concept of reading time.
Additionally, these layouts tend to make it easier to accommodate and
read the category labels. Where possible, try to sequence the categorical
‘activities’ in a way that makes for the most logical reading, either
organised by the start/finish dates or maybe the durations (depending on
which has most relevance).
VARIATIONS & ALTERNATIVES
Variations might involve the further addition of different point markers
(represented by combinations of symbols and/or colours) along each
activity row to indicate additional milestone details, using the ‘instance
chart’. An emerging trend in technique terms involves preserving the
position of activity lines adjacent to other concurrent activities, rather
than fixing them to stay within discrete rows. Sometimes there is much
more fluidity and less ‘discreteness’ in the relationships between
activities, so approaches like the ‘connected timeline’ may be more
fitting.
Charts Activities
Instance chart
ALSO KNOWN AS Milestone map, barcode chart, strip plot
REPRESENTATION DESCRIPTION
An instance chart displays individual moments or instances of
categorical ‘activities’. There are many variations in approach for this
kind of display but generally you will find a structure based on a time-
based quantitative x-axis and a categorical y-axis. For each categorical
activity, instances of note are represented by different point markers
that indicate along the timeline when something has happened. The
point markers may have different combinations of symbols and colours
to represent different types of occurrences, but avoid having too many
different combinations so that viewers do not have to learn an entirely
new alphabet of meaning.
EXAMPLE Showing the instances of different Avengers characters
appearing in Marvel’s comic book titles between 1963 and 2015.
Figure 6.47 How the ‘Avengers’ Line-up Has Changed Over the Years
HOW TO READ IT & WHAT TO LOOK FOR
Look at the axes so you know with what major categorical values each
row of instances is associated and what the range of the date values is
(min to max). Look up any legend that will explain what (if any)
associations exist between the instance markers and their
colour/symbol. Glance down the y-axis noting the sequence of the
categories; there is usually a sequential logic attached to their sorting
based on the start date milestone in particular. Follow the narrative,
noting observations about the type and frequency of instances being
plotted. Look across the entire chart to locate the headline patterns of
clustering and identify any noticeable exceptions and/or outliers. Look
across the patterns within each row individually to learn about each
category’s dispersal of instances. Look for empty regions where no
marks appear. How do all these patterns relate to the time frame
displayed? Where available, compare the activities against annotated
references about other key milestone dates that might hold some
significance or influence.
PRESENTATION TIPS
ANNOTATION: The main annotation properties will be used to serve
the role of explaining the associations between marks and attributes
through clear legends/keys.
COMPOSITION: Where possible, try to sequence the categorical
‘activities’ in a way that makes for the most logical reading, either
organised by the start/finish dates or maybe the durations (depending on
which has most relevance).
VARIATIONS & ALTERNATIVES
Some variations use a sized geometric shape instead of a plain point,
so that a quantitative measure can be shown alongside the
instance. The marking of an instance through a ‘when’ moment could
also be based on data that talks about positional moments within a
sequence. If the basic activity is reduced to a start/finish moment then
the ‘Gantt chart’ will be the best-fit option.
Charts Overlays
Choropleth map
ALSO KNOWN AS Heat map
REPRESENTATION DESCRIPTION
A choropleth map displays quantitative values for distinct, definable
spatial regions on a map. Each geographic region is represented by a
polygonal area based on its outline shape, with each distinct shape then
collectively arranged to form the entire landscape. (Note that most tools
for mapping have a predetermined reference between a region name and
the dimensions of the regional polygon.) Each area is colour-coded to
represent a quantitative value based on a scale with colour variation
intervals that (typically) go from a light tint for smaller values to a dark
shade for larger values. Choropleth maps should only be used when the
quantitative measure is directly associated with and continuously
relevant across the spatial region on which it will be displayed.
Similarly, if your quantitative measure is about, or is a consequence of,
more people living in an area, interpretations may be distorted, so
consider transforming your data to per capita or per acre (or another
spatial denominator) to standardise the analysis accordingly.
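As a minimal sketch of that standardisation step, the Python below converts raw counts into a rate per 100,000 residents and then assigns each region to a colour interval (0 being the lightest tint). The district names, counts, populations and class boundaries are all hypothetical values chosen for illustration, and the function names are assumptions rather than anything taken from a mapping tool.

```python
from bisect import bisect_right

def per_capita(count, population, per=100_000):
    """Return the count expressed as a rate per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * per / population

def colour_class(value, breaks):
    """Return the index of the colour interval (0 = lightest tint)."""
    return bisect_right(breaks, value)

# Hypothetical districts: (raw count, population)
districts = {
    "District A": (500, 1_000_000),
    "District B": (120, 100_000),
    "District C": (30, 20_000),
}
breaks = [75, 125]  # hypothetical class boundaries, in rate per 100,000

rates = {name: per_capita(c, p) for name, (c, p) in districts.items()}
classes = {name: colour_class(r, breaks) for name, r in rates.items()}
```

In this made-up data, District A has by far the largest raw count but the lowest rate once population is accounted for, which is exactly the kind of reversal that the per capita transformation is intended to expose.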
EXAMPLE Mapping the percentage change in the populations of
Berlin’s districts across new and native Berliners since the fall of the
Berlin Wall.
Figure 6.48 Native and New Berliners — How the S-Bahn Ring
Divides the City
HOW TO READ IT & WHAT TO LOOK FOR
Acquaint yourself with the geographic region you are presented with
and carefully consider the quantitative measure that is being
represented. Establish the colour-scale value associations, usually found
via a legend. Glance across the entire chart to locate the dark, light and
medium shades (generally darker = larger) and perform global
comparisons to establish the high-level ranking of biggest values >
smallest. Identify any noticeable exceptions and/or outliers. Beware
making judgements about the significance of prominent large
geographical areas: size is an attribute of the underlying region, not the
significance of the measure displayed. Gradually zoom in your focus to
perform increasingly local comparisons between neighbouring regional
areas to identify any noticeable consistencies or inconsistencies
between their values. Estimate (or read, if labels are present) the
absolute values of specific regions of interest.
PRESENTATION TIPS
ANNOTATION: Directly labelling the regional areas with
geographical details and the value they hold is likely to lead to too
much clutter. You might include only a limited number of regional
labels to provide spatial context and orientation.
COLOUR: Legends explaining the colour scales should ideally be
placed as close to the map display as possible. The border colour and
stroke width for each spatial area should be distinguishable to define the
shape but not so prominent as to dominate attention – usually a subtle
grey- or white-coloured thin stroke will be fine. As well as variation in
colour scales, sometimes pattern or textures may add an extra layer of
detail to the value status of each region. When including a projected
mapping layer image in the background, ensure it is not overly
competing for visual prominence by making it light in colour and
possibly semi-transparent. Do not include any unnecessary geographical
details that add no value to the spatial orientation or interpretation and
clutter the display (e.g. roads, building structures).
COMPOSITION: With Earth being a sphere, there are many different
mapping projections for representing the regions of the world on a
plane surface. Be aware that the transformation adjustments made by
some map projections can distort the size of regions of the world,
inflating their size relative to other regions.
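To give a sense of how large this inflation can be, the sketch below uses the Mercator projection as one illustrative case (it is not named in the text above, so treat it as an assumed example). On a Mercator map the local area scale grows with the square of the secant of the latitude, so the function returns how many times larger a small region appears compared with the same region drawn at the equator.

```python
import math

def mercator_area_inflation(latitude_deg):
    """Area inflation factor of the Mercator projection at a latitude,
    relative to the equator: 1 / cos(latitude)^2."""
    if abs(latitude_deg) >= 90:
        raise ValueError("Mercator is undefined at the poles")
    return 1 / math.cos(math.radians(latitude_deg)) ** 2

for lat in (0, 30, 60):
    print(lat, round(mercator_area_inflation(lat), 2))
```

A region at 60 degrees of latitude is drawn roughly four times larger in area than an identical region at the equator, which is why an equal-area projection is often the safer choice for thematic maps that invite size comparisons.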
VARIATIONS & ALTERNATIVES
Some choropleth maps may be used to indicate categorical association
rather than quantitative measurements. Alternative thematic mapping
approaches to representing quantitative values might include the
‘proportional symbol map’ and the ‘dot density map’. The latter is a variation
that involves plotting a representative quantity of dots equally (but
randomly) across and within a defined spatial region. The position of
individual dots is therefore not to be read as indicative of precise
locations but used to form a measure of quantitative density. This offers
a useful alternative to the choropleth map, especially when categorical
separation of the dots through colour is of value. ‘Dasymetric mapping’
is similar in approach to choropleth mapping but breaks the constituent
regional areas into much more specific, almost custom-drawn, sub-
regions to better represent the realities of the distribution of human and
physical phenomena within a given spatial boundary.
Charts Overlays
Isarithmic map
ALSO KNOWN AS Contour map, isopleth map, isochrone map
REPRESENTATION DESCRIPTION
An isarithmic map displays distinct spatial surfaces on a map that share
the same quantitative classification. All spatial regions (transcending
geo-political boundaries) that share a certain quantitative value or
interval are formed by interpolated ‘isolines’ connecting points of
similar measurement to form distinct surface areas. Each area is then
colour-coded to represent the relevant quantitative value. The scale of
colour variation intervals differs between deployments but will typically
range from a light tint for smaller values to a dark shade for larger
values. An isarithmic map would be used in preference to a choropleth
map when the patterns of data being displayed transcend the distinct
regional polygons. They could be used to show temperature bandings or
smoothed regions of political attitudes.
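The interpolation that produces these surfaces can take many forms. As one hedged illustration, the sketch below uses inverse distance weighting, a simple technique chosen here for brevity and not necessarily the method behind any particular isarithmic map. It estimates a value at an unsampled location from nearby measurements, then assigns that estimate to a colour band. The sample coordinates, values and band boundaries are hypothetical.

```python
from bisect import bisect_right

def idw(x, y, samples, power=2):
    """Inverse-distance-weighted estimate at (x, y).
    `samples` is a list of (x, y, value) measurements."""
    numerator = denominator = 0.0
    for sx, sy, value in samples:
        dist_sq = (x - sx) ** 2 + (y - sy) ** 2
        if dist_sq == 0:
            return value  # exactly on a measured point
        weight = 1 / dist_sq ** (power / 2)
        numerator += weight * value
        denominator += weight
    return numerator / denominator

def band(value, breaks):
    """Index of the colour band a value falls into (0 = lightest)."""
    return bisect_right(breaks, value)

# Hypothetical measurements: (x, y, value)
samples = [(0, 0, 10.0), (1, 0, 20.0)]
estimate = idw(0.5, 0, samples)       # halfway between the two samples
estimate_band = band(estimate, [12, 18])
```

Evaluating `idw` over a regular grid of locations and grouping neighbouring cells that share a band is one way the coloured surfaces could then be formed.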
EXAMPLE Mapping the degree of dialect similarity across the USA.
Figure 6.49 How Y’all, Youse and You Guys Talk
HOW TO READ IT & WHAT TO LOOK FOR
Acquaint yourself with the geographic region you are presented with
and carefully consider the quantitative measure that is being
represented. Establish the colour scale value associations, usually found
via a legend. Glance across the entire chart to locate the dark, light and
medium shades (generally darker = larger) and perform global
comparisons to establish the high-level ranking of biggest values >
smallest. Identify any noticeable exceptions and/or outliers, including
regions that appear in isolation from their otherwise related values and
are notable for their position adjacent to very different shaded regions. Note
that any interpolation used to smooth the joins between data points to
form organic surfaces will inevitably reduce the precision of the
surfaces in their relationship to land position. Gradually zoom in your
focus to perform increasingly local comparisons between neighbouring
regional areas to identify any noticeable consistencies or inconsistencies
between their values. Estimate the absolute values of specific regions of
interest.
PRESENTATION TIPS
ANNOTATION: Directly labelling the surface areas to show the
quantitative value or range they represent will be too cluttered. You
might include only a limited number of regional labels to provide
spatial context and orientation.
COLOUR: Legends explaining the colour scales should ideally be
placed as close to the map display as possible. If using visible contour
or boundary lines, there is a clear implication of a location being inside
or outside the line, so make these lines only as prominent in colour as
the precision of their representation justifies. If smoothing of the
surface locations has been applied, the representation
of these areas should similarly avoid looking definitive. You therefore
might consider subtle colour gradation/overlapping between different
regions to capture appropriately the underlying ‘fuzziness’ of the data.
As well as colour scales, sometimes pattern or textures may add an
extra layer of detail to the value status of each surface region. When
including a projected mapping layer image in the background, ensure it
is not overly competing for visual prominence by making it light in
colour and possibly semi-transparent. Do not include any unnecessary
geographical details that add no value to the spatial orientation or
interpretation and clutter the display (e.g. roads, building structures).
COMPOSITION: Be aware that the transformation adjustments made
by some map projections can distort the size of regions of the world,
inflating their size relative to other regions.
VARIATIONS & ALTERNATIVES
There are specific applications of isarithmic maps used for showing
elevation (‘contour maps’), atmospheric pressure (‘isopleth maps’) or
travel–time distances (‘isochrone maps’). Sometimes you might use
isarithmic maps to show a categorical status (perhaps even a binary
state) rather than a quantitative scale.
Charts Overlays
Proportional symbol map
ALSO KNOWN AS Graduated symbol map
REPRESENTATION DESCRIPTION
A proportional symbol map displays quantitative values for locations on
a map. The values are represented via proportionally sized areas
(usually circles), which are positioned with the centre mid-point over a
given location coordinate. Colour is sometimes used to introduce
further categorical distinction.
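Because the value is carried by the area of the circle and not its radius, the radius has to scale with the square root of the value. The sketch below shows that calculation; the function name and the choice to anchor the scale on a maximum value and maximum radius are assumptions made for illustration.

```python
import math

def symbol_radius(value, max_value, max_radius):
    """Radius that makes the circle's AREA proportional to `value`.
    The largest value in the data is drawn at `max_radius`."""
    if value < 0 or max_value <= 0:
        raise ValueError("values must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)

# A value one quarter of the maximum gets half the radius,
# and therefore one quarter of the area.
r_max = symbol_radius(100, 100, 20)
r_quarter = symbol_radius(25, 100, 20)
area_ratio = (math.pi * r_quarter ** 2) / (math.pi * r_max ** 2)
```

Scaling the radius directly by the value would instead exaggerate large values, since a doubling of radius quadruples the area the reader perceives.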
EXAMPLE Mapping the origin and size of funds raised across the 22
major candidates running for US President during the first half of 2015.
Figure 6.50 Here’s Exactly Where the Candidates’ Cash Came From
HOW TO READ IT & WHAT TO LOOK FOR
Acquaint yourself with the geographic region you are presented with
and carefully consider the quantitative measure that is being
represented. Establish the area size value associations, usually found via
a legend. Glance across the entire chart to locate the large, medium and
small shapes and perform global comparisons to establish the high-level
ranking of biggest values > smallest. Identify any noticeable exceptions
and/or outliers. Gradually zoom in your focus to perform increasingly
local comparisons between neighbouring regional areas to identify any
noticeable consistencies or inconsistencies between their values.
Estimate (or read, if labels are present) the absolute values of specific
regions of interest. Also note where there are no markers. If colour is
being used to further break down the categories of the values shown,
identify any grouped patterns that emerge.
PRESENTATION TIPS
INTERACTIVITY: Interaction may be helpful to reveal location and
value labels through selection or mouseover events.
ANNOTATION: Directly labelling the shapes with geographical
details and the value they hold is likely to lead to too much clutter. You
might therefore include only a limited number of regional labels to
provide spatial context and orientation. Legends explaining the size
scales – and any colour associations – should ideally be placed as close
to the map display as possible. Avoid including unnecessary
geographical details that add no value to the spatial orientation or
interpretation and clutter the display (e.g. roads, building structures).
COLOUR: Sometimes the circular shapes are filled, at other times they
remain unfilled. If colours are being used to distinguish the different
categories, ensure these are as visibly different as possible. When a
circle has a large value its shape will transgress well beyond the origin
of its geographical location, intruding on and overlapping with other
neighbouring values. The use of outline borders and semi-transparent
colours helps with the task of avoiding occlusion (visually hiding values
behind others). When including a projected mapping layer image in the
background, ensure it is not overly competing for visual prominence by
making it light in colour and possibly semi-transparent.
VARIATIONS & ALTERNATIVES
Variations may see the typical circle replaced by squares and
geographical space replaced by anatomical regions. Alternatives to the
proportional symbol map include the ‘choropleth map’, which colour-
codes regions, or the ‘dot map’, which uses a dot to represent an
instance of something. Avoid the temptation to turn the circle symbols
into pie charts; it is not a good look. If you absolutely positively have to
show a part-to-whole relationship, only show two categories, as per the
recommended practice for pies.
Charts Overlays
Prism map
ALSO KNOWN AS Isometric map, spike map, datascape
REPRESENTATION DESCRIPTION
A prism map displays quantitative values for locations on a map. The
values are represented via proportionally sized lines, appearing as 3D
bars, that typically cover a fixed surface area of space and are just
extended in height proportionally to represent the quantitative value for
that location. Being able to judge the dimensions of 3D forms in a 2D
view is very difficult, so they are only ever really used to create a gist of
the profile of values, enabling recognition of the main peaks in
particular.
EXAMPLE Mapping the population of trees for each 180 square km of
land across the globe.
Figure 6.51 Trillions of trees
HOW TO READ IT & WHAT TO LOOK FOR
Acquaint yourself with the geographic region you are presented with
and carefully consider the quantitative measure that is being
represented. Establish the bar height value associations, usually found via
a legend. Glance across the entire chart to locate the tall, medium and
short bars and perform global comparisons to establish the high-level
ranking of biggest values > smallest. Identify any noticeable exceptions
and/or outliers. Gradually zoom in your focus to perform increasingly
local comparisons between neighbouring regional areas to identify any
noticeable consistencies or inconsistencies between their values.
Estimate (or read, if labels are present) the absolute values of specific
regions of interest. Also note where there are no bars emerging from the
surface.
PRESENTATION TIPS
INTERACTIVITY: Ideally prism maps would be provided with
interactive features that allow panning around the map region to offer
different viewing angles to overcome the perceptual difficulties of
judging the dimensions of 3D forms in a 2D view. Without this, smaller
values will be hidden behind the larger forms, just as smaller buildings
are hidden by skyscrapers in a city.
ANNOTATION: Directly labelling the prism shapes is infeasible – at
most you might include only a limited number of labels to provide
spatial context and orientation against the largest forms. Legends
explaining the size scales should ideally be placed as close to the map
display as possible.
COLOUR: Most tools that enable this type of mapping will likely have
visual property settings for a faux light effect, helping the physical
shapes to emerge more prominently through light and shadow. Ensure
colour assists in helping the shape of the forms to be as visible as
possible, maybe with reduced opacity so that smaller values are not entirely
hidden behind any larger ones. When including a mapping layer image
on the surface, ensure it is not overly competing for visual prominence
by making it light in colour and possibly semi-transparent. Do not
include any unnecessary geographical details that add no value to the
spatial orientation or interpretation and clutter the display (e.g. roads,
building structures).
COMPOSITION: Be aware that the transformation adjustments made
by some map projections can distort the size of regions of the world,
inflating their size relative to other regions.
VARIATIONS & ALTERNATIVES
Alternatives to the prism map, especially to avoid 3D form, include the
‘proportional symbol map’, which uses proportionally sized geometric
shapes, and the ‘choropleth map’, which colour-codes regional shapes.
Charts Overlays
Dot map
ALSO KNOWN AS Dot distribution map, pointillist map, location
map, dot density map
REPRESENTATION DESCRIPTION
A dot map displays the geographic density and distribution of
phenomena on a map. It uses a point marker to indicate a categorical
‘observation’ at a geographical coordinate, which might be plotting
instances of people, notable sites or incidences. The point marker is
usually a filled, small dot. Colour can be used to distinguish categorical
classifications. Sometimes a dot represents a one-to-one phenomenon
(i.e. a single record at that location) and sometimes a dot will represent
one-to-many phenomena (i.e. for an aggregated statistic whereby the
location represents a logical mid-point). As the proliferation of GPS
recording devices increases, the accuracy and prevalence of detailed
location marked incidences are leading to increased potential for this
type of approach. However, think carefully about the potential
sensitivity of directly plotting a phenomenon or data incidence at a
given location.
EXAMPLE Mapping each resident of the USA based on the location at
which they were counted during the 2010 Census across different
ethnicities.
Figure 6.52 The Racial Dot Map
HOW TO READ IT & WHAT TO LOOK FOR
Acquaint yourself with the geographic region you are presented with
and carefully consider the phenomenon that is being represented.
Establish the unit of this measure (is it a one-to-one relationship or one-
to-many?) by referring to a legend. If categorical colours have been
deployed, establish the different classifications and associations. Scan
the chart looking for the existence of noticeable clusters as well as the
widely dispersed (and maybe empty) regions. Some of the most
interesting observations come from individual outliers that stand out
separately from others. Are there any patterns between the presence of
dots and their geographical location? Are there any patterns across the
points with similar categorical colour? Gradually zoom in your focus to
perform increasingly local assessments between neighbouring regional
areas to identify any noticeable consistencies or inconsistencies
between their patterns.
PRESENTATION TIPS
INTERACTIVITY: One method for dealing with plotting high
quantities of observations is to provide interactive semantic zoom
features, whereby each time a user zooms in by one level of focus, the
unit quantity represented by each dot decreases, from a one-to-many
towards a one-to-one relationship.
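One possible way to express that semantic zoom rule is sketched below. It assumes, purely for illustration, that each dot stands for 1,024 observations at the widest view and that the unit halves with every zoom level until it reaches one-to-one; the function names and both defaults are hypothetical.

```python
import math

def units_per_dot(zoom_level, base_units=1024, factor=2):
    """Observations represented by one dot at a given zoom level.
    Never drops below 1 (a one-to-one relationship)."""
    if zoom_level < 0:
        raise ValueError("zoom_level must be non-negative")
    return max(1, base_units // factor ** zoom_level)

def dots_needed(observations, zoom_level):
    """How many dots are required to represent `observations`."""
    return math.ceil(observations / units_per_dot(zoom_level))

for level in (0, 5, 10):
    print(level, units_per_dot(level), dots_needed(50_000, level))
```

In practice the number of dots would also be limited to those falling inside the visible map region, which is what keeps the display manageable as the unit shrinks.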
ANNOTATION: Direct labelling is not necessary; just provide a
limited number of regional labels to offer spatial context and
orientation. Legends explaining the dot unit scale and any colour
associations should ideally be placed as close to the map display as
possible.
COLOUR: If colours are being used to distinguish the different
categories, ensure these are as visibly different as possible. When
including a mapping layer image in the background, ensure it is not
overly competing for visual prominence by making it light in colour
and possibly semi-transparent. Do not include any unnecessary
geographical details that add no value to the spatial orientation or
interpretation and clutter the display (e.g. roads, building structures).
COMPOSITION: Dot maps must always be displayed on a map that
demonstrates an equal-area projection as the precision of the plotted
locations is paramount. From a readability perspective, try to find a
balance between making the size of the dots small enough to preserve
their individuality but not too tiny to be indecipherable.
VARIATIONS & ALTERNATIVES
A ‘dot density map’ is a variation that involves plotting a representative
quantity of dots equally (but randomly) across and within a defined
spatial region. The position of individual dots is therefore not to be read
as indicative of precise locations but used to form a measure of
quantitative density. This offers a useful alternative to the choropleth
map, especially when categorical separation of the dots through colour
is of value. Plotting the location of an incidence of a phenomenon can
transcend geographical mapping to any spatial display, such as the seat
layout and availability at a theatre or on a flight, or showing the key
patterns of play across a sports pitch.
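The random placement used by a dot density map can be sketched as follows. This is a minimal illustration under stated assumptions: the region is a simple, non-degenerate polygon given as a list of (x, y) vertices, dots are placed by rejection sampling inside its bounding box, and the polygon, count and unit shown are hypothetical.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def dot_density_points(polygon, count, units_per_dot, seed=None):
    """Scatter round(count / units_per_dot) random dots inside the polygon."""
    rng = random.Random(seed)
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    n_dots = round(count / units_per_dot)
    dots = []
    while len(dots) < n_dots:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):  # reject points outside
            dots.append((x, y))
    return dots

# Hypothetical triangular region holding 1,000 people, one dot per 100.
region = [(0, 0), (4, 0), (0, 4)]
dots = dot_density_points(region, count=1000, units_per_dot=100, seed=1)
```

Fixing the seed keeps the arrangement stable between renders, which avoids the dots appearing to move each time the map is redrawn.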
Charts Overlays
Flow map
ALSO KNOWN AS Connection map, route map, stream map, particle
flow map
REPRESENTATION DESCRIPTION
A flow map shows the characteristics of the movement or flow of a
phenomenon across spatial regions. It is often formed using line marks
to map flow and combinations of attributes to display the characteristics
of this flow. Examples might include the patterns of traffic and travel
across or between given routes, the dynamics of the patterns of weather,
or the movement patterns of people or animals. There is no fixed
template for a flow map but it generally displays characteristics of
origin and destination (positions on a map), route (using organic or
vector paths), direction (arrow or tapered line width), categorical
classification (colour) and some quantitative measure (line weight or
motion speed).
EXAMPLE Mapping the average number of vehicles using Hong
Kong’s main network of roads during 2011.
Figure 6.53 Arteries of the City
HOW TO READ IT & WHAT TO LOOK FOR
Acquaint yourself with the geographic region you are presented with
and carefully consider the phenomenon that is being displayed.
Establish the association of all visible attributes to understand fully their
classification and representation, such as the use of quantitative scales
(colour, line size or width) or categorical associations (colour). Scan the
chart looking for the existence of patterns of movement, maybe through
clustering or common direction, and identify any main hubs and
densities within the network. Find the large and the small, the dense and
the sparse, and draw out any patterns formed by colour classifications.
Gradually zoom in your focus to perform increasingly local assessments
between neighbouring regional areas to identify any noticeable
consistencies or inconsistencies between their patterns.
PRESENTATION TIPS
INTERACTIVITY: Animated sequences will be invaluable to convey
motion if the nature of the flow being presented has the relevant physics
of movement.
ANNOTATION: Annotation needs will be unique to each approach
and the inherent complexity or otherwise of the display. Often the
general patterns may offer a sufficient level of readability without the
need for imposing amounts of value labels.
COLOUR: The colour relationship needs careful consideration to get
the right balance between the intricacies of the foreground data layer
and the background mapping layer image. Ensure the background is not
overly competing for visual prominence by making it light in colour
and possibly semi-transparent. Do not include any unnecessary
geographical details that add no value to the spatial orientation or
interpretation, but do include those features that have a direct
association with the subject matter (such as roads, routes, etc.).
COMPOSITION: Some degree of geographic distortion of routes or
connecting lines may be required practically to display flow data.
Choices like interpolation of lines to smooth an activity’s route or the
merging of relatively similar pathways may be entirely legitimate but
ensure that this is made clear to the reader.
VARIATIONS & ALTERNATIVES
There are naturally many variations in how you might show flow. The
approach generally depends on whether you are showing point A to point B
‘connection maps’, more nuanced ‘route maps’ or surface phenomena
such as ‘particle flow maps’.
Charts Distortions
Area cartogram
ALSO KNOWN AS Contiguous cartogram, density-equalizing map
EXAMPLE Mapping the measures of climate change responsibility
compared to vulnerability across all countries.
REPRESENTATION DESCRIPTION
An area cartogram displays the quantitative values associated with
distinct definable spatial regions on a map. Each geographic region is
represented by a polygonal area based on its outline shape with the
collective regional shapes forming the entire landscape. (Note that most
tools for mapping have a predetermined reference between a region
name and the dimensions of the regional polygon.) Quantitative values
are represented by proportionately distorting (inflating or deflating) the
relative size of and, to some degree, shape of the respective regional
areas. Traditionally, area cartograms strictly aim to preserve the
neighbourhood relationships between different regions. Colour is
sometimes used to further represent the same quantitative value or to
associate the region with a categorical classification. Area cartograms
require the reader to be relatively familiar with the original size and
shape of regions in order to be able to establish the degree of relative
change in their proportions. Without this it is almost impossible to
assess the degree of distortion and indeed to identify the regions
themselves.
Figure 6.54 The Carbon Map
HOW TO READ IT & WHAT TO LOOK FOR
Acquaint yourself with the geographic region you are presented with
and carefully consider the quantitative measure that is being
represented. Establish the quantitative value scales or categorical
classifications associated with the colour scale, usually found via a
legend. Glance across the entire chart to locate the big-, small- and
medium-sized shapes according to their apparent distortion. Identify
any noticeable exceptions and/or outliers. Gradually zoom in your focus
to perform increasingly local comparisons between neighbouring
regional areas to identify any noticeable consistencies or inconsistencies
between their values. Estimate (or read, if labels are present) the
absolute values of specific regions of interest.
PRESENTATION TIPS
INTERACTIVITY: Animated sequences enabled through interactive
controls can help to better identify instances and degrees of change but
usually only over a small set of regions and only if the change is
relatively smooth and sustained. Manual animation will help provide
more control over the experience.
ANNOTATION: Directly labelling the regional areas with
geographical details and the value they hold is likely to lead to too
much clutter. You might include only a limited number of regional
labels to provide spatial context and orientation.
COLOUR: Legends explaining any colour scales should ideally be
placed as close to the map display as possible. The border colour and
stroke width for each spatial area should be distinguishable to define the
shape but not so prominent as to dominate attention; usually a subtle
grey- or white-coloured thin stroke will be fine.
COMPOSITION: To aid the readability of the size of the distortions, it
can be useful to present a thumbnail view of the undistorted original
geographical layout to help the readers orient themselves with the
changes.
VARIATIONS & ALTERNATIVES
Unlike contiguous cartograms, non-contiguous cartograms tend to
preserve the shape of the individual polygons but modify the size and
the neighbouring connectivity to other adjacent regional polygon areas.
The best alternative ways of showing similar data would be to consider
using the ‘choropleth map’ or ‘Dorling cartogram’.
Charts Distortions
Dorling cartogram
ALSO KNOWN AS Demers cartogram
REPRESENTATION DESCRIPTION
A Dorling cartogram displays the quantitative values associated with
distinct, definable spatial regions on a map. Each geographic region is
represented by a circle which is proportionally sized to represent a
quantitative value. The placement of each circle generally resembles the
region’s geographic location with general preservation of
neighbourhood relationships between adjacent shapes. Colour is used to
associate the region with a categorical classification.
EXAMPLE Mapping the predicted electoral voting results for each
state in the 2012 Presidential Election.
Figure 6.55 Election Dashboard
HOW TO READ IT & WHAT TO LOOK FOR
Acquaint yourself with the geographic region you are presented with
and carefully consider the quantitative measure that is being
represented. Establish the quantitative value scales or categorical
classifications associated with the colour scale, usually found via a
legend. Glance across the entire chart to locate the big-, small- and
medium-sized shapes. Identify any noticeable exceptions and/or
outliers. Gradually zoom in your focus to perform increasingly local
comparisons between neighbouring regional areas to identify any
noticeable consistencies or inconsistencies between their values.
Estimate (or read, if labels are present) the absolute values of specific
regions of interest.
PRESENTATION TIPS
INTERACTIVITY: Interactive features that enable annotation for
category and value labelling can be useful to overcome the difficulties
associated with the geographic distortion.
ANNOTATION: Directly labelling the shapes with geographical
details and the value they hold is common, though you might restrict
this to the circles that have sufficient size to hold such annotation.
Otherwise you will need to decide how to handle the labelling of small
values.
COLOUR: Legends explaining the size scales and colour associations
should ideally be placed as close to the map display as possible. If
colours are being used to distinguish the different categories, ensure
these are as visibly different as possible.
COMPOSITION: Remember that preserving the adjacency with
neighbouring regions is important. Dorling cartograms tend not to allow
circles to overlap or occlude, so some accommodation of large values
might result in location distortion.
VARIATIONS & ALTERNATIVES
A variation on the approach, called the ‘Demers cartogram’, involves
the use of squares or rectangles instead of circles, which offers an
alternative way of connecting adjacent shapes. Other approaches would
be through the ‘area cartogram’ and the ‘choropleth map’.
Charts Distortions
Grid map
ALSO KNOWN AS Cartogram, bin map, equal-area cartogram,
hexagon bin map
REPRESENTATION DESCRIPTION
A grid map displays the quantitative values associated with distinct
definable spatial regions on a map. Each geographic region (or a
statistically consistent interval of space, known as a ‘bin’) is
represented by a fixed-size uniform shape, sometimes termed a ‘tile’.
The shapes used tend to be squares or hexagons, though any tessellating
shape would work in theory in order to help arrange all the regional
tiles into a collective shape that roughly fits the real-world geographical
adjacency. Colours are applied to each regional tile either to represent a
quantitative value or to associate the region with a categorical
classification. Note that the mark used for this chart type is a point
rather than an area mark as its size attributes are constant.
EXAMPLE Showing the percentage of household waste recycled in
each council region across London between April 2013 and March 2014.
Figure 6.56 London is Rubbish at Recycling and Many Boroughs are
Getting Worse
HOW TO READ IT & WHAT TO LOOK FOR
Acquaint yourself with the geographic region you are presented with
and carefully consider the quantitative measure that is being
represented. Identify the general layout of the constituent tiles to
determine how good a fit they are with their adjacent regions in
absolute and relative geographical terms. Establish the categorical or
quantitative classifications associated with the colour scale, usually
found via a legend. Glance across the entire chart to locate the big,
small and medium shaded tiles (if quantitative) or the main patterns
formed by the categorical colouring. Identify any noticeable exceptions
and/or outliers. Gradually zoom in your focus to perform increasingly
local comparisons between neighbouring regional areas to identify any
noticeable consistencies or inconsistencies between their values.
Estimate (or read, if labels are present) the absolute values of specific
regions of interest.
PRESENTATION TIPS
INTERACTIVITY: Interactive features that enable annotation for
category and value labelling can be useful to overcome the difficulties
associated with the geographic distortion.
ANNOTATION: Directly labelling the shapes with geographical
details is usually too hard. Some versions of the ‘grid map’ will include
abbreviated labels, maybe two characters, to indicate the region they
represent and to aid orientation. Otherwise it may require interactivity
to facilitate such annotations. Legends explaining the colour
associations should ideally be placed as close to the map display as
possible.
COLOUR: If colour is being used to distinguish the different
categories, ensure they are as visibly different as possible.
COMPOSITION: The main challenge is to find the most appropriate
and representative tile–region relationship (what is the right amount and
geographical level for each constituent tile?) and to optimise the best-fit
collective layout that preserves as many of the legitimate neighbouring
regions as possible.
VARIATIONS & ALTERNATIVES
‘Hexagon bin maps’ are specific deployments of the grid map that offer
a layout formed by a high resolution of smaller hexagons to preserve
localised details. Beyond geographical space, the grid map approach is
applicable to any spatial analysis such as in sports.
6.3 Influencing Factors and Considerations
Having covered the fundamentals of visual encoding and profiled many
chart type options that deploy different encoding combinations you now
need to consider the general factors that will influence your specific
choices for which chart or charts to use for your data representation.
Choosing which chart type(s) to use is, inevitably, not a single-factor
decision. Rather, as ever with data visualisation, it is an imperfect recipe
made up of many ingredients. A pragmatic balance has to be found
somewhere between taking on board the range of influencing factors that
shape selections and not becoming frozen with indecision caused by the
burden of having to consider so many different issues.
Firstly, you need to reflect on the relevant factors that emerge from the
first three ‘preparatory’ stages of the design process and then supplement
this by addressing the guidance offered by the three visualisation design
principles introduced in Chapter 1. It must be emphasised that there are no
direct answers provided for you here, simply guidance. How you might
resolve the unique challenges posed by your project has to be something
you arrive at yourself.
Formulating Your Brief
Skills and resources, frequency: What charts can you actually make
and how efficiently can you create them? This is the big question.
Having the ability to create a broad repertoire of different chart types
is the vocabulary of this discipline; judging when to use them is the
literacy. What will have a great influence on the ambitions of the type
of charts you might employ is the ‘expressiveness’ of your abilities
and that of the technology (applications, programs, tools) you have
access to. Expressiveness is a term I first heard used in this context by
Arvind Satyanarayan, a Computer Science PhD candidate at Stanford
University. It describes the amount of variety and extent of control
you are provided with by a given technology in the construction of
your visualisation solution, so long as you also possess the necessary
skills to exploit such features, of course:
In a data representation context, maximum expressiveness means
you can create any combination of mark and attribute encoding
to display your data – that is, you can create many different
charts. Programming libraries like D3.js and open source tools
like R offer broad libraries of different chart options and
customisations. The drawing-by-hand nature of Adobe Illustrator
would similarly enable you to create a wide range of solutions
(though unquestionably more manual in effort and less
replicable).
Restricted expressiveness means you have much more limited
scope to adapt different mark and attribute encodings. Indeed
you might be faced with assigning data to the fixed encoding
options afforded by a modest menu of chart types. A tool like
Excel has a relatively limited range of (useful) chart types in its
menu. While there are ways of enhancing the options through
plugins and different ‘workaround’ techniques that broaden its
scope, it is a relatively limited tool. It may, however, suffice for
most people’s visualisation ambitions. Elsewhere, there are
many web-based visualisation creation tools which are of value
for those who want quick and simple charting, though they
certainly reduce the range of options and the capability to
customise their appearance.
‘The capability to cope with the technological dimension is a key
attribute of successful students: coding – more as a logic and a mindset
than a technical task – is becoming a very important asset for designers
who want to work in Data Visualization. It doesn’t necessarily mean
that you need to be able to code to find a job, but it helps a lot in the
design process. The profile in the (near) future will be a hybrid one,
mixing competences, skills and approaches currently separated into
disciplinary silos.’ Paolo Ciuccarelli, discussing students on his
Communication Design Master Programme at Politecnico di
Milano
As you reflect on the gallery of charts, my advice would be to perform an
assessment of the charts you can make using a scoring system as follows:
For any of the charts that fail to score 3 points, here are some
strategies for dealing with this:
Tools are continually being enhanced. The applications you use
now that cannot create, for example, a Sankey diagram, may
well offer that in the next release. So wait it out!
For those charts that currently score 1 or 0 points, look around
the web for examples of workaround approaches that will help
you achieve them. For example, you might use conditional
formatting in an Excel worksheet to create a rudimentary heat
map. This is not a chart type offered as standard within the tool
but represents an innovative solution through appropriating
existing features intended to serve other purposes. Any such
solutions, though, have to be framed by the frequency of your
work – will this work realistically need to be replicable and
repeatable (for example, every month) and does my solution
make that achievable?
Invest time in developing skills in the other tools to broaden
your repertoire. Tools like R have a large community of users
sharing code, tutorials and examples, resources that would
greatly help to facilitate your learning.
Lower your ambitions. Sometimes the most significant discipline
to demonstrate is acknowledging what you cannot do and
accepting that (at least, for now) you might need to sacrifice the
ideal choices you would make for more pragmatic ones.
Purpose: Should you even seek to represent your data in chart form? Will
it add any value, enabling new insights or greater perceptual efficiency
compared with its non-visualised form? Will portraying your data via an
elegantly presented table, offering the viewer the ability to look up and
reference values, actually offer a more suitable solution? Do not rule out
the value of a table. Additionally, perhaps you are trying to represent
something in chart form that would actually be better displayed through
information-based (rather than data-based) explanations using imagery,
textual anecdotes, video and photos? Most of the time the charting of data
will be fit for purpose, but just keep reminding yourself that you do not
have to chart everything – just make sure you are doing it to add value.
‘I was in the middle of this huge project, juggling as fast and as focused
as I could, and I had this idea of a set of charts stuck in my head that
kept resurfacing. And then, as we were heading close to deadline, I
realized I couldn’t do it. I failed. I couldn’t make it work. Because we
had pictures of the children, and that was enough … I had to let it go.’
Sarah Slobin, Visual Journalist, discussing a project profiling a
group of families with children who have a fatal disease
Purpose map: In defining the ‘tone’ of the project, you were determining
what the optimum perceptibility of your data would be for your audience.
Your definitions were based on whether you were aiming to facilitate the
reading of the data or more a general feeling of the data. Were you
concerned with enabling precise and accurate perceptions of values or is it
more about the sense-making of the big, medium and small judgments –
getting the ‘gist’ of values more than reading back the values? Were there
emotional qualities that you wanted to emphasise perhaps at the
compromise of perceptual efficiency? Maybe there was a balance between
the two?
How these tonal definitions apply specifically to data representation
requires our appreciation of some fundamental theory about data
visualisation. In his book Sémiologie Graphique, published in 1967,
Jacques Bertin was the first, most notable author to propose the idea that
different ways of encoding data might offer varying degrees of
effectiveness in perception. In 1984 William Cleveland and Robert McGill
published a seminal paper, ‘Graphical Perception: Theory,
Experimentation, and Application to the Development of Graphical
Methods’, that offered more empirical evidence of Bertin’s thoughts. They
produced a general ranking that explained which attributes used to encode
quantitative values would facilitate the highest degree of perceptual
accuracy. In 1986, Jock Mackinlay’s paper, ‘Automating the Design of
Graphical Presentations of Relational Information’, further extended this
to include proposed rankings for encoding categorical nominal and
categorical ordinal data types as well as quantitative ones. The table shown
in Figure 6.57, adapted from Mackinlay’s paper, presents the ‘Ranking of
Perceptual Tasks’.
In a nutshell, this ancestry of studies reveals that certain attributes used to
encode data may make it easier, and others may make it harder, to judge
accurately the values being portrayed. Let’s illustrate this with a couple of
examples. Looking at Figure 6.58, ask yourself: if A is 10, how big is B in
the respective bar and circular displays?
In both cases the answer is B = 5, but while the B ‘bar’ being 5 feels about
right, the idea that the B ‘circle’ is 5 does not feel quite right. That is
because our ability to perform relative judgements for the length of bars is
far more precise and accurate than the relative judgements for the area of
circles. This is explained by the fact that when judging the variation in size
of a line (bar) you are detecting change in a linear dimension, whereas the
variation in size of a geometric area (circle) occurs across a quadratic
dimension. If you look at the rankings in Figure 6.57 in the ‘Quantitative’
column, you will see the encoding attribute of Length is ranked higher than
the attribute of Area.
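The arithmetic behind this demonstration can be sketched in a few lines of Python. This is only an illustrative sketch: the function names bar_length_ratio and circle_diameter_ratio are invented for this example, and it assumes the circles in the display have been sized correctly by area.

```python
import math


def bar_length_ratio(a: float, b: float) -> float:
    """Ratio of bar B's length to bar A's length when length encodes value."""
    return b / a


def circle_diameter_ratio(a: float, b: float) -> float:
    """Ratio of circle B's diameter to circle A's when area encodes value."""
    # Area grows with the square of the diameter, so the diameter
    # changes with the square root of the value ratio.
    return math.sqrt(b / a)


a, b = 10, 5
print(bar_length_ratio(a, b))                  # 0.5
print(round(circle_diameter_ratio(a, b), 3))   # 0.707
```

Under these assumptions, the bar for B is exactly half as long as the bar for A, whereas the circle for B is still about 71% as wide as the circle for A. That mismatch between visible width and encoded value is one plausible reason why the circle does not 'feel' like 5.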
Figure 6.57 The Ranking of Perceptual Tasks
Figure 6.58 Comparison of Judging Line Size vs Area Size
Now let’s consider an example (Figure 6.59) that shows the relative
accuracy of using different attributes, colour hue and shape, to represent
categorical nominal values. In the next pair of charts you can see different
attributes being used to represent the categorical groupings of the points in
the respective scatter plots. On the left you can see variation in the
attribute of colour hue (blue, orange and green) to separate the categories
visually; on the right you will see the attribute of shape (diamond, circle
and square) applied to the same category groupings. What you should be
experiencing is a far more immediate, effortless and accurate sense of the
groupings of the coloured category markers compared with the shaped
category markers. It is simply easier to spot the associations through
variation in colour than variation in shape. This explains why colour hue is
much higher in the proposed rankings for nominal data than shape.
Figure 6.59 Comparison of judging related items using variation in colour
(hue) vs variation in shape
So you can see from these simple demonstrations that there are clearly
ways of encoding data that will make it easier to read values accurately
and efficiently. However, as Cleveland and McGill stress in their paper,
this ranking should be taken as only one ingredient of guidance: ‘The
ordering does not result in a precise prescription for displaying data but
rather is a framework within which to work’.
This is important to note because you have to take into account other
factors. You have to decide whether precise perceiving is actually what
you need to facilitate for your readers. If you do, then the likes of the bar
chart – through the variation in length of a bar – will evidently offer a very
precise approach. As stated in Chapter 3, that is why they are such an
important part of your visual artillery.
However, sometimes getting a ‘gist’ of the data is sufficient. A few pages
ago I presented an image of a bubble chart on my website’s home page,
showing the popularity of my blog posts over the previous 100-day period.
The purpose of this display was purely to give visitors a sense of the
general order of magnitude from the most popular to the relative least
popular posts. I do not need visitors to form a precise understanding of
absolute values or exact rankings. I just want them to get a sense of the
ranking hierarchy. I can therefore justify moving down the quantitative
attribute rankings proposed and deploy a series of circles that encode the
visitor totals through the size of their area (colour is used to represent
different article categories). The level of perceptibility (accuracy and
efficiency) that I need to facilitate is adequately achieved by the resulting
‘frogspawn’-like display. Furthermore, it offers an appealing and varied
display that suits the purpose of this front-page navigation device.
In practice, what all this shows is that chart types vary in the relative
efficiency and accuracy of perception offered to a viewer. Moreover, many
of the charts shown in the gallery can therefore only ever facilitate a gist of
the values of data due to the complexity of their mark and attribute
combinations and the amount of data values they might typically contain
(e.g. the treemap often has many parts of a whole in a single display). It is
up to you to judge what the right threshold is for your purpose.
Working With Data
Data examination: Inevitably, the physical characteristics of your
data are especially influential. What types of data you are trying to
display will have a significant impact on how you are able to show
them. Only certain types of data can fit into certain chart types; only
certain chart types can accommodate certain types of data. That is
why it is often most useful practically to think of this task in terms of
chart types and particularly in terms of these as templates, able to
accommodate specific types of data.
For example, representing data through a bar chart requires one
categorical variable (e.g. department) and one quantitative variable
(e.g. maximum age). If you want to show a further categorical
variable (let’s say, to break down departments by gender) you are
going to need to switch ‘template’ and use something like a clustered
bar chart which can accommodate this extra dimension.
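One way to picture charts as 'templates' is as a lookup of the variable types each one can accept. The sketch below is purely illustrative: the TEMPLATES dictionary, its two entries and the fits function are hypothetical names invented for this example, not a complete catalogue of chart types.

```python
from collections import Counter

# Hypothetical templates: the count of each variable type a chart expects.
TEMPLATES = {
    "bar chart": Counter(categorical=1, quantitative=1),
    "clustered bar chart": Counter(categorical=2, quantitative=1),
}


def fits(chart: str, variables: dict) -> bool:
    """Return True if the variables' types match the chart's template exactly."""
    return Counter(variables.values()) == TEMPLATES[chart]


one_category = {"department": "categorical", "maximum age": "quantitative"}
two_categories = {**one_category, "gender": "categorical"}

print(fits("bar chart", one_category))               # True
print(fits("bar chart", two_categories))             # False
print(fits("clustered bar chart", two_categories))   # True
```

Adding the gender variable means the data no longer fits the bar chart template, which is the point at which you would switch to something like the clustered bar chart.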
I explained earlier how the shape of data influenced the viability of
the flower metaphor used in the ‘Better Life Index’. The range of
categorical and quantitative values will certainly influence the most
appropriate chart type choice. For example, suppose you want to
show some part-to-whole analysis and you have only three parts
(three sub-categories belonging to the major category or whole) then
a treemap really does not make a great deal of sense – they are better
at representing many parts to a whole. The unloved pie chart would
probably suffice if the percentage values were quite diverse; otherwise
the bar chart would be best.
Beyond the size and shape of your data you also might be influenced
by its inherent meaning. Sometimes, you will have scope in your
encoding choices to incorporate a certain amount of visual immediacy
in accordance with your topic. The flowers of the Better Life Index
feel consistent in metaphor with the idea of better life: the more in
bloom the flowers, the more colourful and proud each petal appears
and the better the quality of life in that country. There is a congruence
between subject matter and visual form. Think about the billionaires’
project from earlier in the chapter, with rankings displayed by
industry. Each point marking each billionaire was a small caricature
face. This is not necessary – a small circular mark for each person
would have been fine – but by using a face for the mark it creates a
more immediate recognition that the subject matter is about people.
Data exploration: One consistently useful pointer towards how you
might visually communicate your data to others is to consider which
techniques helped you to unearth key insights when you were visually
exploring the data. What chart types have you already tried out and
maybe found to reveal interesting patterns? Exploratory data analysis
is, in many ways, a bridge to visual communication: the charts you
use to inform yourself often represent prototype thinking on how you
might communicate with others. The design execution may end up
being different once you introduce the influence of audience
characteristics into your thinking, naturally, but if a method is already
working, why not utilise the same approach again?
‘Effective graphics conform to the Congruence Principle according to
which the content and format of the graphic should correspond to the
content and format of the concepts to be conveyed.’ Barbara Tversky
and Julie Bauer Morrison, taken from Animation: Can it Facilitate?
Establishing Your Editorial Thinking
Angle: When articulating the angles of analysis you intend to portray
to your viewers, you are effectively dictating which chart types might
be most relevant. If you intend to show how quantities have changed
over time, for example, there will be certain charts best placed to
portray that and many others that will not. By expressing your desired
editorial angles of analysis in language terms, this will be extremely
helpful in identifying the primary families of charts across the
CHRTS taxonomy that will provide the best option.
It is vital to treat every representation challenge on its own merits –
do not fall into the trap of going through the motions. Just because
you have spatial data does not mean that the most useful portrayal of
that data will be via a map. If the interesting insights are not
regionally and spatially significant, then the map may not provide the
most relevant window on that data. The composition of a map – the
shape, size and positioning of the world’s regions – is so diverse,
inconsistent and truly non-uniform that it may hinder your analysis
rather than illuminate it. So always make sure you have carefully
considered the relevance of your chosen angle through your editorial
thinking.
Trustworthy Design
Avoiding deception: In the discussion about tone I explained how
variations in the potential precision of perception may be appropriate
for the purpose and context of your work. Precision in perception is
one thing, but precision in design is another. Being truthful and
avoiding deception in how you portray data visually are fundamental
obligations.
There are many ways in which viewers can be deceived through
incorrect and inappropriate encoding choices. The main issues around
deception tend to concern encoding the size of quantities. For
beginners, these mistakes can be entirely innocent and unintended but
need to be eradicated immediately.
Geometric calculations – When using the area of shapes to
represent different quantitative values, the underlying geometry
needs to be calculated accurately. One of the common mistakes
when using circles, for example, is simply to modify the
diameters: if a quantitative value increases from 10 to 20, just
double the diameter, right? Wrong. That geometric approach
would be a mistake because, as viewers, when perceiving the
size of a circle, it is the area, not the width, of the circle upon
which we base our estimates of the quantitative value being
represented.
Figure 6.60 Illustrating the correct and incorrect circle size
encoding
The illustration in Figure 6.60 shows the incorrect and correct
ways of encoding two quantitative values through circle size,
where the value of A is twice the size of B. The orange circle for
B has half the diameter of A, the green circle for B has half the
area of A. The green circle area calculations are the correct way
to encode these two values, whereas the orange circle
calculations disproportionately shrink circle B by halving the
diameter rather than halving the area. This makes it appear much
smaller than its true value.
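The difference between the two approaches can be checked with a short calculation. The following is a minimal sketch, assuming you want the largest value drawn at a chosen maximum radius; the function names and the radius of 50 units are hypothetical choices made for this illustration.

```python
import math


def radius_for_value(value: float, max_value: float, max_radius: float) -> float:
    """Radius that makes the circle's AREA proportional to the value (correct)."""
    return max_radius * math.sqrt(value / max_value)


def naive_radius_for_value(value: float, max_value: float, max_radius: float) -> float:
    """Radius scaled linearly with the value (the mistaken approach)."""
    return max_radius * value / max_value


def area(radius: float) -> float:
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2


a, b = 20, 10          # A is twice the value of B
max_radius = 50        # hypothetical radius used for the larger circle A

correct = radius_for_value(b, a, max_radius)
naive = naive_radius_for_value(b, a, max_radius)

print(round(area(correct) / area(max_radius), 2))  # 0.5
print(round(area(naive) / area(max_radius), 2))    # 0.25
```

With the correct calculation, circle B covers half the area of circle A, matching the data. Halving the diameter instead leaves B with only a quarter of the area, so it appears to represent half of its true value.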
3D decoration – In the vast majority of circumstances the use of
3D charts is at best unnecessary and at worst hugely distorting in
the display of data. I have some empathy for those who might
volunteer that they have made and/or like the look of 3D charts.
In the past I did too. Sometimes we don’t know not to do
something until we are told. So this is me, here and now, telling
you.
The presence of 3D in visualisation tends to be motivated by a
desire to demonstrate technical competence with the features of
a tool in terms of ‘look how many things I know how to do with
this tool!’ (users of Excel, I am pointing an accusatory finger at
you right now). It is also driven by the appetite of rather
unsophisticated viewers who are still attracted by the apparent
novelty of 3D skeuomorphic form. (Middle and senior
management of the corporate world, with your ‘make me a fancy
chart’ commands, my finger of doom is now pointing in your
direction.)
Using pseudo-3D effects in your charts when you have only two
dimensions of data means you are simply decorating data. And
when I say ‘decorating’, I mean this with the same sneer that
would greet memories of avocado green bathrooms in 1970s
Britain. A 3D visualisation of 2D data is gratuitous and distorts
the viewer’s ability to read values within any degree of
acceptable accuracy. As illustrated in Figure 6.61, in perceiving
the value estimates of the angles and segments in the respective
pie charts, the 3D version makes it much harder to form accurate
judgements. The tilting of the isometric plane amplifies the front
part of the chart and diminishes the back. It also introduces a
raised ‘step’ which is purely decorative, thus embellishing the
judgement of the segment sizes.
Figure 6.61 Illustrating the Distortions Created by 3D
Decoration
Furthermore, for charts based on three dimensions of data, 3D
effects should only be considered if – and only if – the viewer is
provided with means to move around the chart object to establish
different 2D viewing angles and the collective representation of
all three dimensions of data makes sense in showing a whole ‘system’.
Truncated axis scales – When quantitative values are encoded
through the height or length components of size (e.g. for bar
charts and area charts), truncating the value axis (not starting the
range of quantitative values from the true origin of zero) distorts
the size judgements. I will look at this in more detail in the
chapter on composition because it is ultimately more about the
size considerations of scales and deployment of chart apparatus
than necessarily just the representation choices.
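To get a feel for how much a truncated axis can distort size judgements, consider this rough sketch. The values 100 and 90 and the baseline of 80 are hypothetical figures chosen purely for illustration, as is the function name drawn_height_ratio.

```python
def drawn_height_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights for values a and b.

    Each bar is drawn from the axis baseline up to its value, so its
    visible height is (value - baseline) rather than the value itself.
    """
    return (a - baseline) / (b - baseline)


# Axis starting at the true origin of zero: bar heights match the data.
print(round(drawn_height_ratio(100, 90), 2))               # 1.11

# Axis truncated to start at 80: the same two values look very different.
print(round(drawn_height_ratio(100, 90, baseline=80), 2))  # 2.0
```

In this example the first value is only about 11% larger than the second, yet with the axis starting at 80 its bar is drawn twice as tall.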
Accessible Design
The bullet chart is a derivative of the bar chart – the older, more
sophisticated brother of the idiot gauge chart – but I didn’t think it was
necessary to profile as a separate chart type.
Encoded overlays: Beyond the immediate combinations of marks
and attributes that comprise a given chart type, you may find value in
incorporating additional detail to help viewers with the perceiving
and interpretation task. Encoded overlays are useful to help explain
further the context of values and amplify the interpretation of the
good and the bad, the normal and the exceptional. In some ways these
features might be considered forms of annotation, but as they
represent data values (and therefore require encoding choices) it
makes sense to locate these options within this chapter. There are
many different types of visual overlays that may be useful to include:
Figure 6.62 Example of a Bullet Chart Using Banding Overlays
Figure 6.63 Excerpt from ‘What’s Really Warming the World?’
Bandings – These are typically shaded areas that provide some
sense of contrast between the main data value marks and
contextual judgements of historic or expected values. In a bullet
chart (Figure 6.62) there are various shaded bands that might
help to indicate whether the bar’s value should be considered
bad, average or good. In the line chart (Figure 6.63) here you can
see the observed rise in global temperatures. To facilitate
comparison with potentially influencing factors, in the
background there is a contextual overlay showing the change in
greenhouse gases with banding to indicate the 95% confidence
interval.
Markers – Adding points to a display might be useful to show
comparison against a target, forecast, a previous value, or to
highlight actual vs budget. Figure 6.64 shows a chart that
facilitates comparisons against a maximum value marker.
Figure 6.64 Example of Using Markers Overlays
Figure 6.65 Why Is Her Paycheck Smaller?
Reference lines – These are useful in any display that uses
position or size along an axis as an attribute for a quantitative
value. Line charts or scatter plots (Figure 6.65) are particularly
enhanced by the inclusion of reference lines, helping to direct
the eye towards calculated trends, constants or averages and,
with scatter plots specifically, the lines of best fit or correlation.
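As a hedged illustration of where such reference lines come from, the sketch below calculates an average and an ordinary least-squares line of best fit for a handful of points. The data values are made up for the example and the function name best_fit_line is hypothetical; your charting tool may well offer its own trend line feature.

```python
from statistics import mean


def best_fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired x and y values."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up scatter plot data for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

average_y = mean(ys)                 # position of a horizontal 'average' line
slope, intercept = best_fit_line(xs, ys)
print(slope, intercept)              # 2.0 1.0
```

The average gives the height of a horizontal reference line, while the slope and intercept define the line of best fit to draw across the scatter plot.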
Elegant Design
Visual appeal: This fits again with the thinking about ‘tone’ and may
also be informed by some of the mental visualisations that might have
formed in the initial stages of the process. Although you should not
allow yourself to be consumed by ideas over the influence of the data,
sometimes there is scope to squeeze out an extra sense of stylistic
association between the visual and the content. For example, the
‘pizza’ pie chart in Figure 6.66 presents analysis about the political
contributions made by companies in the pizza industry. The decision
to use pizza slices as the basis of a pie chart makes a lot of sense. The
graphic in Figure 6.67 displays the growth in online sales of razors.
Like the pizzas, the notion of creating bar charts by scraping away
lengths of shaving foam offers a clever, congruent and charming
solution.
Figure 6.66 Inside the Powerful Lobby Fighting for Your Right to Eat
Pizza
Figure 6.67 Excerpt from ‘Razor Sales Move Online, Away From
Gillette’
Summary: Data Representation
Visual Encoding All charts are based on combinations of marks and
attributes:
Marks: represent records (or aggregation of records) and can be
points, lines, areas or forms.
Attributes: represent variable values held for each record and can
include visual properties like position, size, colour, connection.
Chart Types If visual encoding is the fundamental theoretical
understanding of data representation, chart types are the practical
application. There are five families of chart types (CHRTS mnemonic):
Influencing Factors and Considerations
Formulating the brief: skills and resources – what charts can you
make and how efficiently? From the definitions across the ‘purpose
map’ what ‘tone’ did you determine this project might demonstrate?
Working with data: what is the shape of the data and how might that
impact on your chart design? Have you already used a chart type to
explore your data that might prove to be the best way to communicate
it to others?
Establishing your editorial thinking: what is the specific angle of the
enquiry that you want to portray visually? Is it relevant and
representative of the most interesting analysis of your data?
Trustworthy design: avoid deception through mistaken geometric
calculations, 3D decoration, truncated axis scales, corrupt charts.
Accessible design: the use of encoded overlays, such as bandings,
markers, reference lines, can aid readability and interpretation.
Elegant design: consider the scope of certain design flourishes that
might enhance the visual appeal through the form of your charts
whilst also preserving their function.
Tips and Tactics
Data is your raw material, not your ideas, so do not arrive at this stage
desperate and precious about wanting to use a certain data
representation approach.
Be led by the preparatory work (stages 1 to 3) but do use the chart
type gallery for inspiration if you need to unblock!
Be especially careful in how you think about representing instances of
zero, null (no available data) and nothing (no observation).
Do not be too proud to acknowledge when you have made a bad call
or gone down a dead end.
7 Interactivity
The advancement of technology has entirely altered the nature of how we
consume information. Whereas only a generation ago most visualisations
would have been created exclusively for printed consumption,
developments in device capability, Internet access and bandwidth
performance have created an incredibly rich environment for digital
visualisation to become the dominant output. The potential now exists for
creative and capable developers to produce powerful interactive and
engaging multimedia experiences for cross-platform consumption.
Unquestionably there is still a fundamental role for static (i.e. not
interactive) and print-only work: the scope offered by digital simply
enables you to extend your reach and broaden the possibilities. In the right
circumstances, incorporating features of interactivity into your
visualisation work offers many advantages:
It expands the physical limits of what you can show in a given space.
It increases the quantity and broadens the variety of angles of analysis
to serve different curiosities.
It facilitates manipulations of the data displayed to handle varied
interrogations.
It increases the overall control and potential customisation of the
experience.
It amplifies your creative licence and the scope for exploring different
techniques for engaging users.
The careful judgements that distinguish this visualisation design process
must be especially discerning when handling this layer of the anatomy.
Well-considered interactivity supports, in particular, the principle of
‘accessible’ design, ensuring that you are adding value to the experience,
not obstructing the facilitation of understanding. Your main concern in
considering potential interactivity is to ensure the features you deploy are
useful. This is easy to say in any context, but just because you can
does not mean that you should. For some who possess a natural
technical flair, there is often too great a temptation to create interactivity
where it is neither required nor helpful.
Having said that, beyond the functional aspects of interactive design
thinking, depending on the nature of the project there can be value
attached to the sheer pleasure created by thoughtfully conceived
interactive features. Even if these contribute only ornamental benefit there
can be merit in creating a sense of fun and playability so long as such
features do not obstruct access to understanding.
There is a lot on your menu when it comes to considering potential
interaction design features. As before, ahead of your decision making
about what you should do, you will first consider what you could do. To
help organise your thinking, your options are divided into two main groups
of features:
Data adjustments: Affecting what data is displayed.
Presentation adjustments: Affecting how the data is displayed.
There is an ever-increasing range of interfaces to enable interaction
events beyond the mouse/touch, such as gesture interfaces like the
Kinect device, the Oculus Rift, wands and control pads. These are beyond the
scope of this book but it is worth watching out for developments in the
future, especially with respect to the growing interest in exploring the
immersive potential of virtual reality (VR).
When considering potential interactive features you first need to recognise
the difference between an event, the control and the function. The event is
the input interaction (such as a click), applied to a control (maybe a button)
or element on your display, with the function being the resulting operation
that is performed (filter the data).
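The distinction between event, control and function can be sketched in a few lines of code. This is only an illustrative sketch: the `Control` class, the `keep_europe` function and the sample records are hypothetical names invented for the example, not part of any particular visualisation library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]
Function = Callable[[List[Record]], List[Record]]


@dataclass
class Control:
    """A hypothetical on-screen control, such as a button."""
    label: str
    handlers: Dict[str, Function] = field(default_factory=dict)

    def on(self, event: str, function: Function) -> None:
        # Bind an event name (e.g. "click") to the function it triggers.
        self.handlers[event] = function

    def fire(self, event: str, data: List[Record]) -> List[Record]:
        # An event with no bound function leaves the displayed data unchanged.
        function = self.handlers.get(event)
        return function(data) if function else data


def keep_europe(data: List[Record]) -> List[Record]:
    # The function: filter the data down to one continent.
    return [r for r in data if r["continent"] == "Europe"]


# Made-up records purely for illustration.
data = [
    {"country": "France", "continent": "Europe"},
    {"country": "Peru", "continent": "South America"},
]

button = Control("Europe only")      # the control
button.on("click", keep_europe)      # the event bound to the function
shown = button.fire("click", data)   # only the European record remains
unchanged = button.fire("hover", data)  # no hover handler: nothing happens
```

Keeping the three parts separate makes it easier to rebind the same function to a different event when a design has to work without a hover action.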
Where once we were limited to the mouse or the trackpad as the common
peripheral, over the past few years the emergence of touch-screens in the
shape of smartphones and tablets has introduced a whole new event
vocabulary. For the purposes of this chapter we focus on the language of
the mouse or trackpad, but here is a quick translation of the equivalent
touch events. Note that arguably the biggest difference in assigning events
to interactive data visualisations exists in the inability to register a
mouseover (or ‘hover’) action with touch-screens.
7.1 Features of Interactivity: Data Adjustments
This first group of interactive features covers the various ways in which
you can enable your users to adjust and manipulate your data. Specifically,
they influence what data is displayed at a given moment.
I will temporarily switch nomenclature to ‘user’ in this chapter because
a more active role is needed than ‘viewer’.
Framing: There is only so much one can show in a single
visualisation display and thus giving users the ability to modify
criteria to customise what data is visible at any given point is a strong
advantage. Going back to the discussion on editorial thinking, in
Chapter 5, this set of adjustments would specifically concern the
‘framing’ of what data to isolate, include or exclude from view.
For those of you familiar with databases, think of this group of features
as similar in scope to modifying the criteria when querying data in a
database.
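To make the database analogy concrete, here is a minimal sketch of a framing filter, assuming the data is held as a list of records and that each check-box list yields a set of selected values. The `frame` function and the field names are hypothetical and the records are made up for illustration.

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, str]


def frame(records: Iterable[Record], selected: Dict[str, Set[str]]) -> List[Record]:
    """Return only the records that match every active filter.

    `selected` maps a field name to the set of values ticked by the user.
    A field with no entry is treated as unfiltered.
    """
    return [
        r for r in records
        if all(r[name] in allowed for name, allowed in selected.items())
    ]


# Made-up records purely for illustration.
records = [
    {"sex": "female", "age_group": "under 30"},
    {"sex": "male", "age_group": "under 30"},
    {"sex": "male", "age_group": "30 and over"},
]

# Equivalent in spirit to: WHERE sex = 'male' AND age_group = 'under 30'
in_view = frame(records, {"sex": {"male"}, "age_group": {"under 30"}})

# Removing all filters restores the original view.
reset = frame(records, {})
```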
In ‘Gun Deaths’ (Figure 7.1), you can use the filters in the pop-up
check-box lists at the bottom to adjust the display of selected
categorical data parameters. The filtered data is then shown in
isolation above the line from all non-selected groups, which are
shown below the line. The ‘Remove filters’ link can be used to reset
the display to the original settings.
Figure 7.1 US Gun Deaths
In the bubble map view of the ‘FinViz’ stock market analysis site,
you can change the values of the handles along the axes to modify the
maximum and minimum axis range, which allows you effectively to
zoom in on the records that match this criterion. You can also select
the dropdown menus to change the variables plotted on each axis.
Notice the subtle transparency of the filter menu (in Figure 7.1) so that
it doesn’t entirely occlude the data displayed beneath.
Figure 7.2 FinViz: Standard and Poor’s 500 Index
Navigating: There are dynamic features that enable users to expand
or explore greater levels of detail in the displayed data. This includes
lateral movement and vertical drill-down capabilities.
You will see that many of these interactive projects include links to
share the project (or view of the project) with others via social media or
through offering code to embed work into other websites. This helps to
mobilise distribution and open up wider access to your work.
Figure 7.3 The Racial Dot Map
The dot map in Figure 7.3, showing the 2010 Census data, displays
population density across the USA. As a user you can use a scrollable
zoom or scaled zoom to zoom in and out of different map view levels.
The map can also be navigated laterally to explore different regions at
the same resolution.
This act of zooming to increase the magnification of the view is
known as a geometric zoom. This is considered a data adjustment
because through zooming you are effectively re-framing the window
of the included and excluded data at each level of view.
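The re-framing effect of a geometric zoom can be sketched as follows, assuming a simple flat coordinate space rather than a real map projection. The function names and the sample points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # x_min, x_max, y_min, y_max


def zoom_window(centre: Point, full_width: float, full_height: float,
                zoom: float) -> Window:
    """Return the visible window for a zoom factor of 1 (overview) or more."""
    if zoom < 1:
        raise ValueError("zoom must be at least 1")
    half_w = full_width / zoom / 2
    half_h = full_height / zoom / 2
    cx, cy = centre
    return (cx - half_w, cx + half_w, cy - half_h, cy + half_h)


def visible(points: List[Point], window: Window) -> List[Point]:
    """Return the points that fall inside the window."""
    x_min, x_max, y_min, y_max = window
    return [(x, y) for x, y in points
            if x_min <= x <= x_max and y_min <= y <= y_max]


# Made-up points on a 100 x 100 canvas.
points = [(10.0, 10.0), (50.0, 50.0), (90.0, 90.0)]

overview = zoom_window((50.0, 50.0), 100.0, 100.0, zoom=1)
close_up = zoom_window((50.0, 50.0), 100.0, 100.0, zoom=4)
panned = zoom_window((85.0, 85.0), 100.0, 100.0, zoom=4)
```

Zooming shrinks the window and so excludes data, while panning moves the same-sized window laterally to bring a different region into view.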
In the ‘Obesity Around the World’ visualisation (Figure 7.4),
selecting a continent connector expands the sub-category display to
show the marks for all constituent countries. Clicking on the same
connector collapses the countries to revert back to the main continent-
level view.
The ‘Social Progress Imperative’ project (Figure 7.5) provides an
example of features that enable users to view the tabulated form of
the data – the highest level of detail – by selecting the ‘Data Table’
tab. The data adjustment taking place here is through providing
access to the data in a non-visual form. Users can also export the data
by clicking on the relevant button to conduct further local analysis.
Figure 7.4 Obesity Around the World
Animating: Data with a temporal component often lends itself to
being portrayed via animated sequences. The data adjustment taking
place here involves the shifting nature of the timeframe in view at any
given point. Operations used to create these sequences may be
automatic and/or manual in nature.
Figure 7.5 Excerpt from ‘Social Progress Index 2015’
This next project (Figure 7.6) plots NFL players’ height and weight
over time using an animated heat map. When you land on the web
page the animation automatically triggers. Once completed, you can
also select the play button to recommence the animation as well as
moving the handle along the slider to manually control the sequence.
The gradual growth in the physical characteristics of players is clearly
apparent through the resulting effect.
Sequencing: In contrast to animated sequences of the same
phenomena changing over time, there are other ways in which a more
discrete sequenced experience can suit your needs. This commonly
exists by letting users navigate through predetermined, different
angles of analysis about a subject. As you navigate through the
sequence a narrative is constructed. This is a quintessential example
of storytelling with data exploring the metaphor of the anecdote: ‘this
happened’ and then ‘this happened’…
Figure 7.6 NFL Players: Height & Weight Over Time
The project ‘How Americans Die’ (Figure 7.7) offers a journey
through many different angles of analysis. Clicking on the series of
‘pagination’ dots and/or the navigation buttons will take you through
a pre-prepared sequence of displays to build a narrative about this
subject.
Figure 7.7 Excerpt from ‘How Americans Die’
Sometimes data exists in only two states: a before and after view.
Using normal animated sequences would be ineffective – too sudden
and too jumpy – so one popular technique, usually involving two
images, employs the altering of the position of a handle along a slider
to reveal/fade the respective views. This offers a more graduated
sequence between the two states and facilitates comparisons far more
effectively as exhibited by the project shown in Figure 7.8.
A different example of sequencing – and an increasingly popular
trend – is the vertical sequence. This article from the Washington Post
(Figure 7.9) profiles the beauty of baseball player Bryce Harper’s
swing and uses a very slick series of illustrations to break down four
key stages of his swing action. As you scroll down the page it acts
like a lenticular print or flip-book animation. Notice also how well
judged the styles of the illustrations are.
Figure 7.8 Model Projections of Maximum Air Temperatures Near
the Ocean and Land Surface
Figure 7.9 Excerpt from ‘A Swing of Beauty’
Contributing: So far the features covered modify the criteria of what
data is included/excluded, that then help you dive deeper into the
data, and move through sequenced views of that data. The final
component of ‘data adjustment’ concerns contributing data.
Sometimes there are projects that require user input, either for
collecting further records to append and save to an original dataset or
just for temporary (i.e. not held beyond the moment of usage)
participation. Additionally, there may be scope to invite users to
modify certain data in order to inform calculations or customise a
display. In each case, the events and controls associated with this kind
of interaction are designed to achieve one function: input data.
The first example ‘How well do you know your area?’ (Figure 7.10)
by ONS Digital, employs simple game/quiz dynamics to challenge
your knowledge of your local area in the UK. Using the handle to
modify the position along the slider you input a quantitative response
to the questions posed. Based on your response it then provides
feedback revealing the level of accuracy of your estimation.
Figure 7.10 How Well Do You Know Your Area?
In the next project (Figure 7.11), by entering personal details such as
your birth date, country and gender into the respective input boxes
you learn about your place in the world’s population with some rather
sobering details about your past, present and future on this planet.
Figure 7.11 Excerpt from ‘How Old Are You?’
Figure 7.12 shows an excerpt from ‘512 Paths to the White House’. In
this project the toggle buttons are used to switch between three
categorical data states (unselected, Democratic and Republican) to
build up a simulated election outcome based on the user’s predictions
for the winners in each of the key swing states. As each winner is
selected, only the remaining possible pathways to victory for either
candidate are shown.
Inevitably data privacy and intended usage are key issues of concern for
any project that involves personal details being contributed, so be
careful to handle this with integrity and transparency.
Adjusting the position of the handle along the slider in the Better Life
Index project (Figure 7.13) modifies the quantitative data value
representing the weighting of importance you would attach to each
quality of life topic. In turn, this modifies the vertical positioning of
the country flowers based on the recalculated average quality of life.
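A recalculation of this kind might look like the sketch below. It assumes a plain weighted mean, which is an assumption made for illustration rather than a description of the OECD's actual method. The country names, topics and scores are made up.

```python
from typing import Dict, List


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of topic scores, using the user's slider weights."""
    total_weight = sum(weights[topic] for topic in scores)
    if total_weight == 0:
        raise ValueError("at least one topic needs a non-zero weight")
    return sum(scores[topic] * weights[topic] for topic in scores) / total_weight


# Made-up scores purely for illustration.
countries = {
    "Country A": {"housing": 8.0, "health": 5.0},
    "Country B": {"housing": 5.0, "health": 7.0},
}


def ranking(weights: Dict[str, float]) -> List[str]:
    """Order countries from highest to lowest recalculated score."""
    return sorted(countries,
                  key=lambda name: weighted_score(countries[name], weights),
                  reverse=True)


equal = {"housing": 1, "health": 1}          # sliders left at equal weight
health_first = {"housing": 1, "health": 3}   # user drags 'health' higher
```

With equal weights Country A comes first; once health is weighted three times as heavily, Country B overtakes it, which is the kind of repositioning the sliders produce.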
Figure 7.12 512 Paths to the White House
Figure 7.13 OECD Better Life Index
7.2 Features of Interactivity: Presentation Adjustments
In contrast to the features of ‘data adjustment’, this second group of
interactive features does not manipulate the data but rather lets you
configure the presentation of your data in ways that assist
interpretation and enhance the overall experience.
Focusing: Whereas the ‘framing’ features outlined previously
modified what data would be included and excluded, ‘focus’ features
control what data is visually emphasised and, sometimes, how it is
emphasised. Applying such filters helps users select the values they
wish to bring to the forefront of their attention. This may be through
modifying the effect of depth through colour (foreground, mid-
ground and background) or a sorting arrangement. The main
difference with the framing features is that no data is eliminated from
the display but simply relegated in its contrasting prominence or
position.
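The contrast with framing can be sketched in code: a focus filter returns every record, changing only how prominently each is drawn. The `focus` function, the opacity values and the sample records below are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def focus(records: List[Record],
          is_selected: Callable[[Record], bool],
          dimmed_opacity: float = 0.2) -> List[Tuple[Record, float]]:
    """Pair every record with an opacity; nothing is removed from view.

    Selected records keep full opacity and the rest are lightened.
    """
    return [(r, 1.0 if is_selected(r) else dimmed_opacity) for r in records]


# Made-up records purely for illustration.
people = [
    {"name": "Person 1", "living": True},
    {"name": "Person 2", "living": False},
    {"name": "Person 3", "living": True},
]

styled = focus(people, lambda r: r["living"] is True)
opacities = [opacity for _, opacity in styled]
```

Note that the output has the same length as the input, which is the defining difference from a framing filter.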
Figure 7.14 Nobel Laureates
The example in Figure 7.14 provides a snapshot of a project which
demonstrates the use of a focus filter. It enables users to select a radio
button from the list of options to emphasise different cohorts of all
Nobel Laureates (as of 2015). As you can see the selections include
filters for women, shared winners and those who were still living at
the time. The selected Laureates are not coloured differently, rather
the unselected values are significantly lightened to create the contrast.
Figure 7.15 Geography of a Recession
The project shown in Figure 7.15 titled ‘Geography of a Recession’
allows users to select a link from the list of filters provided on the left
to emphasise different cohorts of counties across the USA. Once
again, the selected counties are not coloured differently here; rather, the
unselected regions are de-emphasised by washing out their original
shades.
Figure 7.16 How Big Will the UK Population be in 25 Years’ Time?
‘Brushing’ data is another technique used to apply focus filters. In
this next example (Figure 7.16), looking at the UK Census estimates
for 2011, you use the cursor to select a range of marks from within
the ‘violin plot’ display in order to view calculated statistics of those
chosen values below the chart.
The next example (Figure 7.17), portraying the increase or cuts in
Workers’ Compensation benefits by US state, demonstrates a
technique known as ‘linking’, whereby hovering over a mark in one
chart display will then highlight an associated mark in another chart
to draw attention to the relationship. In this case, hovering over a state
circle in any of the presented ‘grid maps’ highlights the same state in
the other two maps to draw your eye to their respective statuses. You
might also see this technique combined with a brushing event to
choose multiple data marks and then highlight all associations
between charts, as also demonstrated in the population ‘violin plot’ in
Figure 7.16.
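Brushing and linking can be sketched together, assuming each mark carries an identifier that is shared across charts. The function names, identifiers and values are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional, Set


def brush(marks: Dict[str, float], low: float, high: float) -> Set[str]:
    """Return the ids of marks whose value falls inside the brushed range."""
    return {mark_id for mark_id, value in marks.items() if low <= value <= high}


def brushed_summary(marks: Dict[str, float],
                    selected: Set[str]) -> Dict[str, Optional[float]]:
    """Calculate simple statistics for the brushed marks."""
    values = [marks[mark_id] for mark_id in selected]
    if not values:
        return {"count": 0, "mean": None}
    return {"count": len(values), "mean": mean(values)}


def linked_highlights(charts: Dict[str, Set[str]],
                      selected: Set[str]) -> Dict[str, Set[str]]:
    """For each chart, return the marks to highlight for the same selection."""
    return {name: ids & selected for name, ids in charts.items()}


# Made-up values purely for illustration.
ages = {"a": 21.0, "b": 34.0, "c": 36.0, "d": 70.0}
selected = brush(ages, low=30, high=40)
summary = brushed_summary(ages, selected)

# Two other charts that share some of the same mark ids.
charts = {"chart_one": {"a", "b", "d"}, "chart_two": {"b", "c"}}
highlights = linked_highlights(charts, selected)
```

A hover-based link is simply the special case where the selection contains a single identifier.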
Figure 7.17 Excerpt from ‘Workers’ Compensation Reforms by
State’
Sorting is another way of emphasising the presentation of data. In
Figure 7.18, featuring work by the Thomson Reuters graphics team,
‘ECB bank test results’, you see a tabular display with sorting
features that allow you to reorder columns of data by clicking on the
column headers. For categorical data this will sort values
alphabetically; for quantitative data, by value order. You can also
hand-pick individual records from the table to promote them to the
top of the display to facilitate easier comparisons through closer
proximity.
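A sortable table with hand-picked rows promoted to the top could be sketched as below. The `sort_table` function, the column names and the figures are hypothetical and are not taken from the Thomson Reuters project.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def sort_table(rows: List[Row], column: str, descending: bool = False,
               pinned: Sequence[int] = ()) -> List[Row]:
    """Sort rows by a column, keeping pinned rows at the top.

    Text columns sort alphabetically and numeric columns by value.
    Pinned rows appear first, in the order the user picked them.
    """
    by_id = {row["id"]: row for row in rows}
    pinned_ids = set(pinned)
    top = [by_id[row_id] for row_id in pinned if row_id in by_id]
    rest = [row for row in rows if row["id"] not in pinned_ids]
    rest.sort(key=lambda row: row[column], reverse=descending)
    return top + rest


# Made-up rows purely for illustration.
rows = [
    {"id": 1, "bank": "Bank C", "ratio": 9.5},
    {"id": 2, "bank": "Bank A", "ratio": 12.1},
    {"id": 3, "bank": "Bank B", "ratio": 7.3},
]

by_name = [r["id"] for r in sort_table(rows, "bank")]
by_ratio = [r["id"] for r in sort_table(rows, "ratio", descending=True)]
with_pin = [r["id"] for r in sort_table(rows, "ratio", descending=True, pinned=(3,))]
```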
Linking and brushing are particularly popular approaches used for
exploratory data analysis where you might have several chart panels and
wish to see how a single record shows up within each display.
Annotating: As you saw in the previous chapter on data
representation, certain combinations of marks and attributes may only
provide viewers with a sense of the order of magnitude of the values
presented. This might be entirely consistent with the intended tone of
the project. However, with interactivity, you can at least enable
viewers to interact with marks to view more details momentarily.
This temporary display is especially useful because most data
representations are already so busy that permanently including certain
annotated apparatus (like value labels, gridlines, map layers) would
overly clutter the display.
Figure 7.18 Excerpt from ‘ECB Bank Test Results’
The example in Figure 7.19 profiles the use of language throughout
the history of US Presidents’ State of the Union addresses, using
circle sizes to encode the frequency of different word mentions,
giving a gist of the overall quantities and how patterns have formed
over time. By hovering over each circle you get access to a tooltip
dialogue box which reveals annotations such as the exact word-use
quantities and extra contextual commentary.
One issue to be aware of when creating pop-up tooltips is to ensure
the place they appear does not risk obstructing the view of important
data in the chart beneath. This can be especially intricate to handle
when you have a lot of annotated detail to share. One tactic is to
utilise otherwise-empty space on your page display, occupying it with
temporary annotated captions only when triggered by a select or
hover event from within a chart.
Orientating: A different type of interactive annotation comes in the
form of orientation devices, helping you to make better sense of your
location within a display – where you are or what values you are
looking at. Some of these functions naturally supplement features
listed in the previous section about ‘data adjustment’ specifically for
navigation support.
Figure 7.19 Excerpt from ‘History Through the President’s Words’
This snapshot, again from the ‘How Americans Die’ project (Figure
7.20), dynamically reveals the values of every mark (both x and y
values) in this line chart depending on the hover position of the
cursor. This effect is reinforced by visual guides extending out to the
axes from the current position.
Figure 7.20 Excerpt from ‘How Americans Die’
Figure 7.21 Twitter NYC: A Multilingual Social City
Figure 7.21 displays the language of tweets posted over a period of
time from the New York City area. Given the density and number of
data points, displaying the details of the mapping layer would be quite
cluttered, yet this detail would provide useful assistance for judging
the location of the data patterns. The effective solution employed lets
you access both views by providing an adjustable slider that allows
you to modify the transparency of the network of roads to reveal the
apparatus of the mapping layer.
Figure 7.22 Killing the Colorado: Explore the Robot River
Finally, as mentioned in the previous section, navigating through
digital visualisation projects increasingly uses a vertical landscape to
unfold a story (some term this ‘scrollytelling’). Navigation is often
seamlessly achieved by using the scroll wheel to move up and down
through the display. To assist with orientation, especially when you
have a limited field of view of a spatial display, a thumbnail image
might be used to show your current location within the overall
journey to give a sense of progress. The project featured in Figure
7.22 is a great example of the value of this kind of interface,
providing a deep exploration of some of the issues impacting on the
Colorado River.
7.3 Influencing Factors and Considerations
You now have a good sense of the possibilities for incorporating
interactive features into your work, so let’s turn to consider the factors that
will have most influence on which of these techniques you might need to
or choose to apply.
Formulating Your Brief
Skills and resources: Interactivity is unquestionably something that
many people aspire to create in their visualisation work, but it is
something greatly influenced by the skills possessed, the technology
you have access to and what they offer. These will be the factors that
ultimately shape your ambitions. Remember, even in common
desktop tools like Excel and PowerPoint, which may appear more
limited on this front, there are ways to incorporate interactive controls
(e.g. using VBA in Excel) to offer various adjustment features (e.g.
links within PowerPoint slides to create sequences and navigate to
other parts of a document).
Timescales: It goes without saying that if you have a limited
timeframe in which to complete your work, even with extensive
technical skills you are going to be rather pushed in undertaking any
particularly ambitious interactive solutions. Just because you want to
does not mean that you will be able to.
Setting: Does the setting in which the visualisation solution will be
consumed lend itself to the inclusion of an interactive element to the
experience? Will your audience have the time and know-how to take
full advantage of multi-interactive features or is it better to look to
provide a relatively simpler, singular and more immediate static
solution?
Format: What will be the intended output format that this project
needs to be created for? What device specifications will it need to
work across? How adaptable will it need to be?
The range and varied characteristics of modern devices present
visualisers (or perhaps more appropriately, at this stage, developers)
with real challenges. Getting a visualisation to work consistently,
flexibly and portably across device types, browsers and screen
dimensions (smartphone, tablet, desktop) can be something of a
nightmare. Responsive design is concerned with integrating automatic
or manually triggered modifications to the arrangement of contents
within the display and also the type and extent of interactive features
that are on offer. Your aim is to preserve as much continuity in the
core experience as possible but also ensure that the same process and
outcome of understanding can be offered to your viewers.
While the general trend across web design practice is heading towards
a mobile-first approach, for web-based data visualisation
developments there is still a strong focus on maximising the
capabilities of the desktop experience and then maybe compromising,
in some way, the richness of the mobile experience.
For ProPublica’s work on ‘Losing Ground’ (Figure 7.23), the
approach to cross-platform compatibility was based around the rule of
thumb ‘smallify or simplify’. Features that worked on ProPublica’s
primary platform of the desktop would have to be either simplified to
function practically on the smartphone or simply reduced in size. You
will see in the pair of contrasting images how the map display is both
shrunk and cropped, and the introductory text is stripped back to only
include the most essential information.
Figure 7.23 Losing Ground
Other format considerations include whether your solution, if
primarily intended for the Web, will also need to work in print.
The proverb ‘horses for courses’ comes to mind here: solutions need
to be created to fit the format they will be consumed in. The design
features that make up an effective interactive project are unlikely to
translate directly into a static, print version. You might need to pursue
two parallel solutions to suit the respective characteristics of each
output format.
Another illustration of good practice from the ‘History through the
Presidents’ words’ (Figure 7.24) includes a novel ‘Download graphic’
function which, when selected, opens up an entirely different static
graphic designed to suit a printable, pdf format.
Figure 7.24 Excerpt from ‘History Through the President’s Words’
Purpose map: Interactivity does not only come into your thinking
when you are seeking to create ‘Exploratory’ experiences. You may
also employ interactive features for creating ‘Explanatory’
visualisations, such as portraying analysis across discrete sequenced
views or interactively enabling focus filters to emphasise certain
characteristics of the data. The general position defined on the
purpose map will not singularly define the need for interactivity,
rather it will inform the type of interactivity you may seek to
incorporate to create the experience you desire.
There will also often be scope for an integrated approach whereby
you might lead with an explanatory experience based around showing
headline insights and then transition into a more exploratory
experience through offering a set of functions to let users interrogate
data in more detail.
Working With Data
Data examination: As profiled with the functions to facilitate drill-down
navigation, one of the key benefits of interactivity arises when you
have data that is too big and too broad to show in one view. To
repeat, you can only show so much in a single-screen display. Often
you will need to slice up views across and within the various
hierarchies of your data.
One particular way the physical properties of the data will inform
your interaction design choices is with animation. To justify an
animated display over time, you will need to consider the nature of
the change that exists in your data. If your data is not changing much,
an animated sequence may simply not prove to be of value.
Conversely, if values are rapidly changing in all dimensions, an
animated experience will prove chaotic and a form of change
blindness will occur. It may be that the intention is indeed to exhibit
this chaos, but the value of animated sequences is primarily to help
reveal progressive or systematic change rather than random variation.
The speed of an animation is also a delicate matter to judge as you
seek to avoid the phenomenon of change blindness. Rapid sequences
will cause the stimulus of change to be missed; a tedious pace will
dampen the stimulus of change and key observations may be lost. The
overall duration will, of course, be informed by the range of values in
your temporal data variable. There is no right or wrong here, it is
something that you will get the best sense of by prototyping and
trialling different speeds.
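When prototyping speeds, it can help to see how the per-frame pace and the number of time steps combine into an overall duration. The sketch below assumes one frame per value of the temporal variable. The `animation_plan` function, the year range and the candidate speeds are hypothetical.

```python
from typing import Dict, List


def animation_plan(frames: List[int], seconds_per_frame: float) -> Dict[str, float]:
    """Summarise an animation that shows one frame per time step."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return {
        "frames": len(frames),
        "total_seconds": len(frames) * seconds_per_frame,
    }


# A made-up temporal range: one frame per year from 2000 to 2010 inclusive.
years = list(range(2000, 2011))

# Candidate speeds to trial, from brisk to leisurely.
plans = {speed: animation_plan(years, speed) for speed in (0.25, 0.5, 1.0)}
```

The numbers only set out the options. Which pace actually lets the change register is still a judgement to be made by watching each version.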
Establishing Your Editorial Thinking
Angle, framing and focus: If you have multiple different angles of
analysis you wish to portray then these will have to be accommodated
within the space allocated. Alternatively, using interactivity, you
could provide access to them via sequenced views or menus enabling
their selection. The value of incorporating the potential features to
achieve this – and the specific range of different options you do wish
to facilitate – will be informed by the scope of the decisions you
made in the editorial thinking stages.
Thinking again about animations, you must consider whether an
animated sequence will ultimately convey the clearest answer to an
angle of interest about how something has changed over time. This
really depends on what it is you want to show: the dynamics of a
‘system’ that changes over time or a comparison between different
states over time?
The animated project in Figure 7.25 shows the progressive clearing of
snow across the streets of New York City during the blizzard of
February 2014. The steady and connected fluidity of progress of the
snow-clearing is ideally illustrated through the intervals of change
across the 24 hours shown.
Figure 7.25 Plow: Streets Cleared of Snow in New York City
Sometimes, you might wish to compare one moment directly against
another. With animated sequences, there is a reliance on memory to
conduct this comparison of change. However, our ability to recall is
fleeting at best and weakens the further apart (in time) the basis of the
comparison has occurred. Therefore, to facilitate such a comparison
you ideally need to juxtapose individual frames within the same view.
The most common technique used to achieve this is through small
multiples, where you repeat the same representation for each moment
in time of interest and present them collectively in the same view,
often through a grid layout. This enables far more incisive
comparisons, as you can see through ‘The Horse in Motion’ work by
Eadweard Muybridge, which was used to learn about the galloping
form of a horse by seeing each stage of the motion through
individually framed moments.
‘Generations of masterpieces portray the legs of galloping horses
incorrectly. Before stop-gap photography, the complex interaction of
horses’ legs simply happened too fast to be accurately apprehended …
but in order to see the complex interaction of moving parts, you need
the motion.’ [Paraphrasing] Barbara Tversky and Julie Bauer
Morrison, taken from Animation: Can it Facilitate?
Figure 7.26 The Horse in Motion
Data Representation
Chart type choice: Some charts are inherently visually complex and
ideally need interactivity to make them more accessible and readable
for the viewer. The bump chart, chord diagram, and Sankey diagram
are just a few of the charts that are far more readable and, by
extension, usable if they can offer users the means to filter or focus on
certain selected components of the display through interactivity.
Trustworthy Design
Functional performance: Faith in the reliability, consistency and
general performance of a visualisation is something that impacts on
the perception of a project as ‘trustworthy’. Does it do what it
promises and can I trust the functions that it performs? Projects that
involve the collection of user-inputted data will carry extra risk
around trust: how will the data be used and stored? You need to
alleviate any such concerns upfront.
‘Confusing widgets, complex dialog boxes, hidden operations,
incomprehensible displays, or slow response times … may curtail
thorough deliberation and introduce errors.’ Jeff Heer and Ben
Shneiderman, taken from Interactive Dynamics for Visual Analysis
Accessible Design
Useful: Does it add value? Resort to interactivity only when you have
exhausted the possibility of an appropriate and effective static
solution. Do not underestimate how effective a well-conceived and
executed static presentation of data can be. This is not about holding a
draconian view about any greater merits offered by static or print
work, but instead recognising that the brilliance of interactivity is
when it introduces new means of engaging with data that simply
could not be achieved in any other way.
Unobtrusive: As with all decisions, an interactive project needs to
strive for the optimum ease of usability: minimise the friction
between the act of engaging with interactive features and the
understanding they facilitate. Do not create unnecessary obstacles that
stifle sparks of curiosity and the scent of intrigue that stirs within the
user. The main watchword here is affordance, making interactive
features seamless and either intuitive or at least efficiently
understandable.
Visual accessibility: To heighten the accessibility levels of your
work you may offer different presentations of it. For people with
visual impairments you might offer options to magnify the view of
your data and all accompanying text. For those with colour
deficiencies, as you will learn about shortly, you could offer options
to apply alternative, colour-blind friendly palettes. A further example
of this is seen with satellite navigation devices whereby the displayed
colour combinations change to better suit the surrounding lightness or
darkness at a given time of day.
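The palette option described above could be sketched along the following lines. This is only an illustration: the function name, the `colour_blind_friendly` flag and the default hex values are assumptions made for this example, and the alternative palette uses the widely cited Okabe-Ito colours rather than anything prescribed in this chapter.

```python
# Hypothetical default palette (illustrative hex values only).
DEFAULT_PALETTE = ["#d62728", "#2ca02c", "#1f77b4", "#ff7f0e"]

# Okabe-Ito palette, commonly recommended as colour-blind friendly.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]


def assign_colours(categories, colour_blind_friendly=False):
    """Map each category to a colour from the palette the viewer selected."""
    palette = OKABE_ITO if colour_blind_friendly else DEFAULT_PALETTE
    if len(categories) > len(palette):
        raise ValueError("more categories than distinct colours available")
    return dict(zip(categories, palette))
```

The point of the design is that the choice of palette is a presentation adjustment only: the data and the category-to-mark mapping stay the same, and only the colours change.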
Elegant Design
Feature creep: The discipline required to avoid feature creep is
indisputable. The gratuitous interactive operation of today is the
equivalent of the flashy, overbearing web design trends of the late
1990s and early 2000s. People were so quick and so keen to show
how competent and expressive they could be through this (relatively)
new technology that they forgot to judge if it added value.
If your audience is quite broad you may be (appropriately) inclined to
cover more combinations of features than are necessary in the hope of
responding to as many of the anticipated enquiries as well as possible
and serving the different types of viewer. Judging the degree of
flexibility is something of a balancing act within a single project: you
do not want to overwhelm the user with more adjustments than they
need, nor do you want to narrow the scope of their likely
interrogations. For a one-off project you have to form your own best
judgement; for repeatedly used projects you might have scope to
accommodate feedback and iteration.
Minimise the clicks: With visualisation you are aiming to make the
invisible (insights) visible. Conversely, to achieve elegance in design
you should be seeking to make visible design features as seamlessly
inconspicuous as possible. As Edward Tufte stated, ‘the best design is
invisible; the viewer should not see your design. They should only see
your content’.
Fun: A final alternative influence is to allow yourself room for at
least a little bit of fun. So long as the choices do not gratuitously
interrupt the primary objective of facilitating understanding, one
should not downplay the heightened pleasure that can be generated by
interactive features that might incorporate an essence of playability.
Summary: Interactivity
Data adjustments affect what data is displayed and may include the
following features:
Framing: isolate, include or exclude data.
Navigating: expand or explore greater levels of detail in the displayed
data.
Animating: portray temporal data via animated sequences.
Sequencing: navigate through discrete sequences of different angles
of analysis.
Contributing: customising experiences through user-inputted data.
Presentation adjustments affect how the data is displayed and may
include the following features:
Focusing: control what data is visually emphasised.
Annotating: interact with marks to bring up more detail.
Orientating: make better sense of your location within a display.
Influencing Factors and Considerations
Formulating the brief: skills and resources, timescales, setting, and
format will all influence the scope of interactivity. What experience
are you facilitating and how might interactive options help achieve
this?
Working with data: what range of data do you wish to include? Large
datasets with diverse values may need interactive features to help
users filter views and interrogate the contents.
Establishing your editorial thinking: choices made about your chosen
angle, as well as definitions for framing and focus will all influence
interactive choices, especially if users must navigate to view multiple
angles of analysis or representations portrayed through animated
sequences.
Data representation: certain chart choices may require interactivity to
enable readability.
Trustworthy design: functional performance and reliability will
substantiate the perception of trust from your users.
Accessible design: any interactive feature should prove to be useful
and unobtrusive. Interactivity can also assist with challenges around
visual accessibility.
Elegant design: beware of feature creep, minimise the clicks, but
embrace the pleasure of playability.
Tips and Tactics
Initial sketching of concepts will be worth doing first before investing
too much time jumping into prototype mode.
Project management is critical when considering the impact of
development of an interactive solution.
Backups, contingencies, version control.
Do not be precious about – nor overly impressed with – ‘cool’-
sounding interaction features that will disproportionately divert
precious resources (time, effort, people).
Beware of feature creep: keep focusing on what is important and
relevant. A technical achievement is great for you, but is it great for
the project?
Version control and file management will be important here.
8 Annotation
Annotation is the third layer of the visualisation design anatomy and is
concerned with the simple need to explain things: what is the right amount
and type of help your viewers will need when consuming the
visualisation?
Annotation is unquestionably the most often neglected layer of the
visualisation anatomy. Maybe this is because it involves the least amount
of pure design thinking relative to the other matters requiring attention,
like interactivity and colour. More likely, it is because effective annotation
requires visualisers truly to understand their intended audience. This can
be a hard frame of mind to adopt, especially when your potential viewers
are likely to have a diverse range of knowledge, interests and capabilities.
In contrast to the greater theoretical and technical concerns around data
representation, colour and interactivity, I find thinking about annotation
relatively refreshing. It is not only uncomplicated and based on a huge
dose of common sense, but also hugely influential, especially in directly
facilitating understanding.
Annotation choices often conform to the Goldilocks principle: too much
and the display becomes cluttered, overwhelming, and potentially
unnecessarily patronising; too little and the viewers may be
inappropriately faced with the prospect of having to find their own way
around a visualisation and form their own understanding about what it is
showing.
Later in this chapter we will look at the factors that will influence your
decision making but to begin with here is a profile of some of the key
features of annotated design that exist across two main groups:
Project annotations: helping viewers understand what the project is
about and how to use it.
Chart annotations: helping viewers perceive the charts and optimise
their potential interpretations.
8.1 Features of Annotation: Project Annotation
This collection of annotation options is related to decisions about how
much and what type of help you might need to offer your audiences in
their understanding of the background, function and purpose of your
project.
Headings: The titles and subtitles occupy such prime real estate
within your project’s layout, yet more often than not visualisers fail to
exploit these to best effect. There are no universal practices for what a
heading should do or say; this will vary considerably between subject
areas and industries, but should prove fundamentally useful.
Figure 8.1 A Sample of Project Titles
The primary aim of a title (and often subtitle combination) is to
inform viewers about the immediate topic or display, giving them a
fair idea about what they are about to see. You might choose to
articulate the essence of the curiosity that has driven the project by
framing it around a question or maybe a key finding you unearthed
following the work.
Subheadings, section headings and chart titles will tend to be more
functional in their role, making clear to the viewer the contents or
focus of attention associated with each component of the display.
Your judgement surrounds the level of detail and the type of language
you use in each case to fit cohesively with the overall tone of the
work.
Introductions: Essentially working in conjunction with titles,
introductions typically exist as short paragraphs that explain, more
explicitly than a title can, what the project is about. The content of
this introduction might usefully explain in clear language terms some
of the components you considered during the editorial thinking
activity, such as:
details of the reason for the project (source curiosity);
an explanation of the relevance of this analysis;
a description of the analysis (angle, framing) that is presented;
expression of the main message or finding that the work is about
to reveal (possibly focus).
Some introductions will extend beyond a basic description of the project to
include thorough details of where the data comes from and how it has been
prepared and treated in advance of its analysis (including any assumptions,
modifications or potential shortcomings). There may also be further links
to ‘read more’ detail or related articles about the subject.
Figure 8.2 Excerpt from ‘The Color of Debt’
Introductions may be presented as fixed text located near the top (or
start) of a project (usually underneath a title) as in Figure 8.2 or,
through interactivity, may be hidden from view and brought up in a
separate window or pop-up to provide the details upon request.
User guides: As you have seen, some projects can incorporate many
different features of interactivity. While they may not necessarily be
overly technical – and therefore not that hard to learn how to use
them – the full repertoire of features may be worth walking through,
as in Figure 8.3. This is important to consider so that, as a visualiser,
you can be sure your users are acquainted with the entire array of
options they have to explore, interrogate and control their experience.
You should want people to fully utilise all the different features you
have carefully curated and created, so it is in everyone’s interest to
think about including these types of user guides.
Figure 8.3 Excerpt from ‘Kindred Britain’
Multimedia: There is increasing potential and usage of broader
media assets in visualisation design work beyond charts, such as
video and imagery. In visualisation this is perhaps a relatively
contemporary trend (infographics have incorporated such media but
visualisations generally have done so far less) and, in some ways,
reflects the ongoing blurring of boundaries between this and other
related fields. Incorporating good-quality and sympathetically styled
assets like illustrations or photo-imagery can be a valuable
complementary device alongside your data representation elements.
In the ‘Color of Debt’ (Figure 8.4) project, different neighbourhoods
of Chicago that have been hardest hit by debt are profiled using
accompanying imagery to show more graphic context of the
communities affected, including a detailed reference map of the area
and an animated panel displaying a sequence of street view images.
Imagery, in particular, will be an interesting option to consider when
it adds value to help exhibit the subject matter in tangible form,
offering an appealing visual hook to draw people in or simply to aid
immediate recognition of the topic. In Bloomberg’s billionaires
project (Figure 8.5), each billionaire is represented by a pen-and-ink
caricature. This is elegant in choice and also dodges the likely flaws
of having to compose the work around individual headshot
photographs that would have been hard to frame and colour
consistently.
Figure 8.4 Excerpt from ‘The Color of Debt’
It was worth Bloomberg investing in the time/cost involved in
commissioning these illustrations, given that the project was not a
one-off but something that would be an ongoing, updated daily
resource.
Problems with the integration of such media within a visualisation
project will occur when unsuitable attempts are made to combine
imagery within the framework of a chart. Often the lack of cohesion
creates a significant hindrance whereby the data representations are
obscured or generally made harder to read, as the inherent form and
colour clashes undermine the functional harmony.
Researching, curating, capturing or creating assets of imagery
requires skill and a professional approach, otherwise the resulting
effect will look amateurish. Incorporating these media into a data
visualisation is not about quickly conducting some Google Image
fishing exercise. Determining what imagery you will be able to use
involves careful considerations around image suitability, quality and,
critically, usage rights. Beware the client or colleague who thinks
otherwise.
‘Although all our projects are very much data driven, visualisation is
only part of the products and solutions we create. This day and age
provides us with amazing opportunities to combine video, animation,
visualisation, sound and interactivity. Why not make full use of this? …
Judging whether to include something or not is all about editing: asking
“is it really necessary?”. There is always an aspect of “gut feel” or
“instinct” mixed with continuous doubt that drives me in these cases.’
Thomas Clever, Co-founder CLEVER°FRANKE, a data driven
experiences studio
A frequent simple example of incorporated imagery is when you have to
include logos according to the needs of the organisation for whom your
work is being created. Remember to consider this early so you at least
know in advance that you will have to assign some space to accommodate
this component elegantly.
Footnotes: Often the final visible feature of your display, footnotes
provide a convenient place to share important details that further
substantiate the explanation of your work. Sometimes this
information might be stored within the introduction component
(especially if that is interactively hidden/revealed to allow it more
room to accommodate detail):
Data sources should be provided, ideally in close proximity to
the relevant charts.
Credits will list the authors and main contributors of the work,
often including the provision of contact details.
Figure 8.5 Excerpt from ‘Bloomberg Billionaires’
Attribution is also important if you wish to recognise the
influence of other people’s work in shaping your ideas or to
acknowledge the benefits of using an open source application or
free typeface, for example.
Usage information might explain the circumstances in which the
work can be viewed or reused, whether there are any
confidentialities or copyrights involved.
Time/date stamps are often forgotten but they will give an
indication to viewers of the moment of production and from that
they might be able to ascertain the work’s current accuracy and
contextualise their interpretations accordingly.
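As a rough sketch of how the footnote details listed above might be assembled consistently for each chart, the following Python function joins data sources, credits, usage information and a time/date stamp into a single block of text. The function name, its parameters and the wording of each line are hypothetical choices made for this illustration.

```python
from datetime import datetime, timezone


def build_footnote(sources, credits, usage=None, produced_at=None):
    """Assemble footnote text from sources, credits, usage and a timestamp.

    `produced_at` should be a timezone-aware datetime; if omitted, the
    current UTC time is used so the stamp is never forgotten.
    """
    lines = [
        "Data sources: " + "; ".join(sources),
        "Credits: " + ", ".join(credits),
    ]
    if usage:
        lines.append("Usage: " + usage)
    stamp = produced_at or datetime.now(timezone.utc)
    lines.append("Last updated: " + stamp.strftime("%Y-%m-%d %H:%M %Z"))
    return "\n".join(lines)
```

Defaulting the stamp to the moment of production means a viewer can always judge how current the work is, even when nobody remembers to supply a date.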
Figure 8.6 Excerpt from ‘Gender Pay Gap US’
8.2 Features of Annotation: Chart Annotation
This second group of annotated features concerns the ways you provide
viewers with specific assistance for perceiving and interpreting the charts.
Think of these as being the features that refer directly to your charts or
exist directly within or in immediate proximity to each chart.
Reading guides: These are written or visual instructions that provide
viewers with a guide for how to read the chart or graphic and offer
greater detailed assistance than a legend (see later). The idea of
learnability in visualisation is an important consideration. It is a two-
way commitment requiring will and effort from the viewer and
sufficient assistance from the visualiser. This is something to be
discussed in Chapter 11 under ‘Visualisation literacy’.
Recognising that their readership may not necessarily understand
connected scatter plots, Bloomberg’s visual data team offer a ‘How to
Read this Graphic’ guide immediately as you land on the project
shown in Figure 8.7. This can be closed but a permanent ‘How to’
button remains for those who may need to refer to it again. The
connected scatter plot was the right choice for this angle of analysis,
so rather than use a different ‘safer’ representation approach (and
therefore alter what analysis was shown) it is to their credit that they
respected the capacity of their viewers to be willing to learn how to
read this unfamiliar graphical form.
Figure 8.7 Excerpt from ‘Holdouts Find Cheapest Super Bowl Tickets Late in the Game’
Figure 8.8 Excerpt from ‘The Life Cycle of Ideas’
The second example shown (Figure 8.8) is from the ‘How to Read’
guide taken from the ‘Life Cycle of Ideas’ graphic created by
Accurat, a studio renowned for innovative and expressive
visualisation work. Given the relative complexity of the encodings
used in this piece, it is necessary to equip the viewer with as much
guidance as possible to ensure its potential is fully realised.
Chart apparatus: Options for chart apparatus relate to the structural
components found in different chart types. Every visualisation
displayed in this book has different elements of chart apparatus
(Figure 8.9), specifically visible axis lines, tick marks or gridlines to
help viewers orient their judgements of size and position. There is no
right or wrong for including or excluding these features; it tends to be
informed by your tonal definitions based on how much precision in
the perceiving of values you wish to facilitate. I will discuss the range
of different structures underlying each chart type (such as Cartesian,
Radial or Spatial) in Chapter 10 on composition, as these have more
to do with issues of shape and dimension.
‘Labelling is the black magic of data visualization.’ Gregor Aisch,
Graphics Editor, The New York Times
Labels: There are three main labelling devices you will need to think
about using within your chart: axis titles, axis labels and value labels:
Axis titles describe what values are being referenced by each
axis. This might be a single word or a short sentence depending
on what best fits the needs of your viewers. Often the role of an
axis is already explained (or implied) by project annotations
elsewhere, such as titles or sub-headings, but do not always
assume this will be instantly clear to your viewers.
Axis labels provide value references along each axis to help
identify the categorical value or the date/quantitative value
associated with that scale position. For categorical axes (as seen
in bar charts and heat maps, for example) one of the main
judgements relates to the orientation of the label: you will need to
find sufficient room to fit the label but also preserve its
readability. For non-categorical data the main judgement will be
what scale intervals to use. This has to be a combination of what
is most useful for referencing values by your viewer, what is the
most relevant interval based on the nature of the data (e.g.
maybe a year-level label is more relevant than marking each
month), and also what feels like it achieves the best-looking
visual rhythm along the chart’s edge. This will be another matter
that is discussed more in the composition chapter.
Figure 8.9 Mizzou’s Racial Gap Is Typical On College Campuses
Value labels will appear in proximity to specific mark encodings
inside the chart. Typically, these labels will be used to reveal a
quantity, such as showing the percentage sizes of the sectors in a
pie chart or the height of bars. Judging whether to include such
annotations will refer back to your definition of the appropriate
tone: will viewers need to read off exact values or will their
perceived estimates of size and/or relationship be sufficient? The
need to include categorical labels will be a concern for maps
(whether to label locations?) or charts like the scatter plot seen in
Figure 8.9, where you may wish to draw focus to a select sample
of the categories plotted across the display.
As you have seen, one way of providing detailed value labels is
through interaction, maybe offering a pop-up/tooltip annotation
that is triggered by a hover or click event on different mark
encodings. Having the option for interactivity here is especially
useful as it enables you to reduce clutter from your display that
can develop as more annotated detail is added.
Redundancy in labelling occurs when you include value labelling of
quantities for all marks whilst also including axis-scale labelling. You
are effectively unnecessarily doubling the assistance being offered and
so, ideally, you should choose to include one or the other.
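The judgement about which scale intervals to use for axis labels on non-categorical data can be partly mechanised. The sketch below shows one common heuristic, choosing a step of 1, 2 or 5 times a power of ten; it is not a method prescribed by this chapter, and the function names and the `target_ticks` parameter are assumptions for illustration. The result is a starting point that still needs checking against the nature of the data and the visual rhythm along the chart’s edge.

```python
import math


def nice_step(data_min, data_max, target_ticks=5):
    """Pick a label interval of 1, 2, 5 or 10 times a power of ten."""
    span = data_max - data_min
    if span <= 0:
        raise ValueError("data_max must be greater than data_min")
    raw = span / target_ticks                      # ideal but awkward step
    magnitude = 10 ** math.floor(math.log10(raw))  # nearest lower power of ten
    residual = raw / magnitude                     # falls between 1 and 10
    if residual <= 1:
        factor = 1
    elif residual <= 2:
        factor = 2
    elif residual <= 5:
        factor = 5
    else:
        factor = 10
    return factor * magnitude


def axis_labels(data_min, data_max, target_ticks=5):
    """Return label values that cover the data range at the chosen step."""
    step = nice_step(data_min, data_max, target_ticks)
    start = math.floor(data_min / step) * step
    end = math.ceil(data_max / step) * step
    count = round((end - start) / step)
    # For fractional steps the values may carry floating-point noise,
    # so round them when formatting the labels for display.
    return [start + i * step for i in range(count + 1)]
```

For data running from 3 to 97, `axis_labels(3, 97)` gives labels at 0, 20, 40, 60, 80 and 100.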
Legend: A legend is an annotated feature within or alongside your
chart that presents one or several keys to help viewers understand the
categorical or quantitative meaning of different attributes.
Figure 8.10 Excerpt from ‘The Infographic History of the World’
For quantitative data the main role for a legend will be if the attribute
of area size has been used to encode values, as found on the bubble
plot chart type. The keys displayed there will provide a reference for
the different size scales. Which selection of sizes to show needs
careful thought: what is the most useful guide to help your viewers
make their perceptual judgements from a chart? This might not entail
showing only even interval sizes (50, 100, 150 etc.); rather, you might
offer viewers an indicative spread of sizes to best represent the
distribution of your data values. The example in Figure 8.10 shows
logical interval sizes to reflect the range of values in the data and also
helpfully includes reference to the maximum value size to explain
that no shape will be any larger than this. For categorical data you
also see a key showing the meaning of different colours and shapes
and their associated values.
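To make the idea of a size legend concrete, here is a minimal sketch, assuming circular marks whose area encodes the value. Because area grows with the square of the radius, the radius is scaled by the square root of the value. The function names, the `max_radius` default and the choice of quantiles are illustrative assumptions; the legend values are drawn from the data’s own distribution and always include the maximum.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius for a circle whose AREA is proportional to `value`."""
    return max_radius * math.sqrt(value / max_value)


def legend_keys(values, quantiles=(0.25, 0.5)):
    """Pick an indicative spread of values for the size legend.

    Values are taken at the given quantile positions of the sorted data,
    and the maximum is always included so viewers know no mark is larger.
    """
    ordered = sorted(values)
    picks = {ordered[int(q * (len(ordered) - 1))] for q in quantiles}
    picks.add(ordered[-1])
    return sorted(picks)
```

Each legend value would then be drawn at `bubble_radius(value, max(values))`, so the keys are sized by exactly the same rule as the marks in the chart.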
Figure 8.11 Twitter NYC: A Multilingual Social City
A nice approach to getting more out of your legends is shown in
Figure 8.11. Here you will see a key explaining the colour
associations combined with a bar chart to display the distribution of
quantities for each language grouping from this analysis of tweets
posted around New York City.
Captions: These exist typically as small passages of written analysis
that bring to the surface some of the main insights and conclusions
from the work. These might be presented close to related values
inside the chart or in separate panels to provide commentary outside
the chart.
In ‘Gun Deaths’ (Figure 8.12), there is a nice solution that combines
annotated captions with interactive data adjustments. Below the main
chart there is a ‘What This Data Reveals’ section which presents some of the
main findings from the analysis of the gun death data. The captions
double up as clickable shortcuts so that you can quickly apply the
relevant framing filters and update the main display to see what the
captions are referring to.
Figure 8.12 Excerpt from ‘US Gun Deaths’
As creative tools become more ubiquitous the possibility for
incorporating non-visual data in your work increases. As an alternative
to the written caption there is greater scope to consider using audio as
a means of verbally narrating a subject and explaining key messages.
Over the past few years one of the standout projects using this feature
was the video profiling ‘Wealth Inequality in America’ (Figure 8.13),
as introduced in Chapter 3, where the voiceover provides a very
compelling and cohesive narrative against the backdrop of the
animated visuals that present the data being described.
Figure 8.13 Image taken from ‘Wealth Inequality in America’
8.3 Typography
As you have seen, many features of annotation utilise text. This means
your choices will be concerned not just with what text to include, but also
with how it will look. This naturally merits a brief discussion about how
typography will have a significant role in the presentation of your work.
Firstly, some clarity about language. A typeface is a designed collection of
glyphs representing individual letters, numbers and other symbols of
language based on a cohesive style. A font is the variation across several
physical dimensions of the typeface, such as weight, size, condensation
and italicisation. A typeface can have one or many different fonts in its
family. Type effectively represents the collective appearance formed by the
choice of typeface and the font.
Tahoma and Century Gothic are different typefaces. This font and this font
both belong to the Georgia typeface family but display variations in size,
weight and italicisation.
I discussed earlier the distinction between definitions of data
visualisation and other related fields. I mentioned how the person
creating their design is not necessarily conscious or concerned about
what label is attached to their work; they are simply doing their work
regardless. The same could be applied to people’s interchangeable use
of and meaning of the terms typeface and font, the clarity of which has
been irreparably confused by Microsoft’s desktop tools in particular.
Serif typefaces add an extra little flourish in the form of a small line at the
end of the stroke in a letter or symbol. Garamond is an example of a serif
typeface. Serif typefaces are generally considered to be easier to read for long
sequences of text (such as the full body text) and are especially used in
print displays.
Sans-serif typefaces have no extra line extending the stroke for each
character. Verdana is an example of a sans-serif typeface. These typefaces
are commonly used for shorter sections of text, such as axis or value labels
or titles, and for screen displays.
In making choices about which type to use, there are echoes with the
thinking you are about to face on using colour. As you will see, colour
decisions concern legibility and meaning first, decoration last. With
typeface choices you are not dressing up your text, you are optimising its
readability and meaning across your display. The desired style of typeface
only comes into your thinking after legibility and meaning.
In terms of legibility, you need to choose a typeface and font combination
that will be suitable for the role of each element of text you are using.
Viewers need to be able to read the words and numbers on display without
difficulty. Quite obvious, really. Some typefaces (and specifically fonts)
are more easily read than others. Some work better to make numbers as
clearly readable as possible, others work better for words. There are plenty
of typefaces that might look cool and contemporary but if they make text
indecipherable then that is plain wrong.
Typeface decisions will often be taken out of your hands by the visual
identity guidelines of organisations and publications, as well as by
technical reasons relating to browser type, software compatibility and
availability.
Just as variation in colour implies meaning, so does variation in typeface
and font. If you make some text capitalised, large and bold-weight this will
suggest it carries greater significance and portrays a higher prominence
across the object hierarchy than any text presented in lower case, with a
smaller size and thinner weight. So you should seek to limit the variation
in font where possible.
Text-based annotations should be considered part of the supporting cast
and the way you consider typeface and font choices should reflect this
role. Typography in visualisation should be seen but really not heard.
Deciding on the most suitable type is something that can ultimately come
down to experience and influence through exposure to other work. Every
individual has their own relied-upon preferences. In practice, I find there is
a good chunk of trial and error as well as viewer testing that goes into
resolving the final selection. Across the spectrum of data visualisation
work being produced there are no significant trends to be informed by,
largely because judging the most suitable typography choices will be
unique to the circumstances influencing each project.
Typography is just another of the many individual ingredients relevant to
data visualisation that exists as a significant subject in its own right. It is
somewhat inadequate to allocate barely two pages of this book to
discussing its role in visualisation, but these will at least offer you a bite-
sized window into the topic.
‘Never choose Times New Roman or Arial, as those fonts are favored
only by the apathetic and sloppy. Not by typographers. Not by you.’
Matthew Butterick, Typographer, Lawyer and Writer
8.4 Influencing Factors and Considerations
Having become familiar with the principal options for annotating, you now
have to decide which features to incorporate into your work and how you
might deploy these.
Formulating Your Brief
‘Think of the reader – a specific reader, like a friend who’s curious but a
novice to the subject and to data-viz – when designing the graphic. That
helps. And I rely pretty heavily on that introductory text that runs with
each graphic – about 100 words, usually, that should give the new-to-
the-subject reader enough background to understand why this graphic is
worth engaging with, and sets them up to understand and contextualize
the takeaway. And annotate the graphic itself. If there’s a particular
point you want the reader to understand, make it! Explicitly! I often run
a few captions typeset right on the viz, with lines that connect them to
key elements in the design.’ Katie Peek, Data Visualisation Designer
and Science Journalist, on making complex and/or complicated
subject matter accessible and interesting to her audience
Audience: Given that most annotations serve the purpose of viewer
assistance, your approach will inevitably be influenced by the
characteristics of your intended audience. Having an appreciation of
and empathy towards the knowledge and capabilities of the different
cohorts of viewers is especially important with this layer of design.
How much help will they need to understand the project and also the
data being portrayed? You will need to consider the following:
Subject: how well acquainted will they be with this subject
matter? Will they understand the terminology, acronyms,
abbreviations? Will they recognise the relevance of this
particular angle of analysis about this subject?
Interactive functions: how sophisticated are they likely to be in
terms of being able to understand and utilise the different
features of interactivity made possible through your design?
Perceiving: how well equipped are they to work with this
visualisation? Is it likely that the chart type(s) will be familiar or
unfamiliar; if the latter, will they need support to guide them
through the process of perceiving?
Interpreting: will they have the knowledge required to form
legitimate interpretations of this work? Will they know how to
understand what is good or bad, what big and small mean, what
is important, or not? Alternatively, will you need to provide
some level of assistance to address this potential gap?
Purpose map: The defined intentions for the tone and experience of
your work will influence the type and extent of annotation features
required.
If you are working towards a solution that leans more towards the
‘reading’ tone you are placing an emphasis on the perceptibility of the
data values. It therefore makes sense that you should aim to provide
as much assistance as possible (especially through extensive chart
annotations) to maximise the efficiency and precision of this process.
If it is more about a ‘feeling’ tone then you may be able to justify the
absence of the same annotations. Your intent may be to provide more
of a general sense – a ‘gist’ – of the order of magnitude of values.
If you are seeking to provide an ‘explanatory’ experience it would be
logical to employ as many devices as possible that will help inform
your viewers about how to read the charts (assisting with the
‘perceiving’ stage of understanding) and also bring some of the key
insights to the surface, making clear the meaning of the quantities and
relationships displayed (thus assisting with the stage of
‘interpreting’). The use of captions and visual overlays will be
particularly helpful in achieving this, as will the potential for audio
accompaniments if you are seeking to push the explanatory
experience a step further.
‘Exploratory’ experiences are less likely to include layers of insight
assistance; instead the focus will be more towards project-level
annotation, ensuring that viewers (and particularly here, users) have
as much understanding as possible about how to use the project for
their exploratory benefit. You might find, however, that devices like
‘How to read this graphic’ are still relevant irrespective of the
definition of your intended experience.
Characteristically, ‘exhibitory’ work demonstrates far less annotated
assistance because, by intention, it is more about providing a visual
display of the data rather than offering an explanatory presentation or
the means for exploratory interrogation. The assumptions here are
that audiences will have sufficient domain and project knowledge not
to require extensive additional assistance. Common chart annotations
like value labels and legends, and project annotations like titles and
introductions, are still likely to be necessary, but these might be the
full extent of what is required.
Establishing Your Editorial Thinking
Focus: During your editorial thinking you considered focus and its
particular role in supporting explanatory thinking. Are there specific
value labels that you wish to display over others? Rather than
labelling all values, for example, have you determined that only
certain marks and attributes will merit labelling? As you saw earlier
in the example scatter plot about the under-representation of black
students in US colleges, only certain points were labelled, not all.
These would have been judged to have been the most relevant and
interesting elements to emphasise through annotation.
Trustworthy Design
Transparency: Annotation is one of the most important aids to
ensure that you secure and sustain trust from your viewers by
demonstrating integrity and openness:
Explain what the project is and is not going to show.
Detail where the data came from and what framing criteria were
used during the process of acquisition, and also make clear what has
ultimately been included in the chart(s).
Outline any data transformation treatments, assumptions and
calculations. Are there any limitations that viewers need to be
aware of?
Highlight and contextualise any findings to ensure accuracy in
interpretation.
With digital projects in particular, provide access to coding
repositories to lay open all routines and programmatic solutions.
Accessible Design
Understandable: If you recall, in the section profiling circumstances
you considered what the characteristics were of the setting or
situation in which your audience might consume your visualisation.
Well-judged project and chart annotations are entirely concerned with
providing a sufficient level of assistance to achieve understanding.
The key word there is ‘sufficient’ because there is a balance: too
much assistance makes the annotations included feel overburdening;
too little and there is far more room for wrong assumptions and
misconceptions to prosper. A setting that is consistent with the need
to deliver immediate insights will need suitable annotations to fulfil
this. There will be no time or patience for long introductions or
explanations in that setting. Conversely, a visualisation about a
subject matter that is inherently complex may warrant such
assistance.
Elegant Design
Minimise the clutter: A key concern about annotations is judging the
merits of including structural or textual assistance against the
potential disruption and obstruction caused by these to the view of the
data. Any annotation device added to your display has a spatial and
visual consequence that needs to be accommodated. Of course, as
mentioned, with the benefit of interactivity it is possible to show and
hide layers of detail. Overall, you will have to find the most elegant
solution for presenting your annotations to ensure you do not
inadvertently undermine the help you are trying to provide.
Summary: Annotation
Project annotations help viewers understand what the project is about and
how to use it, and may include the following features:
Headings: titles, sub-titles and section headings.
Introductions: providing background and aims of the project.
User guides: advice or instruction for how to use any interactive
features.
Multimedia: the potential to enhance your project using appropriate
imagery, videos or illustrations.
Footnotes: potentially includes data sources, credits, usage
information, and time/date stamps.
Chart annotations help viewers perceive the charts and optimise their
potential interpretations and may include the following features:
Chart apparatus: axis lines, gridlines, tick marks.
Labels: axis titles, axis labels, value labels.
Legend: providing detailed keys for colour or size associations.
Reading guides: detailed instructions advising readers how to
perceive and interpret the chart.
Captions: drawing out key findings and commentaries.
Typography: Most of the annotation features you include are based on text
and so you will need to consider carefully the legibility of the typeface you
choose and the logic behind the font-size hierarchy you display.
Influencing Factors and Considerations
Formulating the brief: consider the characteristics and needs of the
audience. Certain chart choices and subjects may require more
explanation. From the ‘purpose map’ what type of tone and
experience are you trying to create and what role might annotation
play?
Establishing your editorial thinking: what things do you want to
emphasise or direct the eye towards (focus)?
Trustworthy design: maximise the information viewers have to ensure
all your data work is transparent and clearly explained.
Accessible design: what is the right amount and type of annotation
suitable to the setting and complexity of your subject?
Elegant design: minimise the clutter.
Tips and Tactics
Attention to detail is imperative: all instructions, project information,
captions and value labels need to be accurate. Always spell-check
digitally and manually, and ask others to proofread if you are too
‘close’ to see.
Do not forget to check on permission to use any annotated asset, such
as imagery, photos, videos, quotations, etc.
9 Colour
Having established which charts you will use, the potential interactive
functions that might be required and the annotation features that will be
especially useful, you have effectively determined all the visible elements
that will be included in your project. The final two layers of design
concern not what elements will be included or excluded, but how they will
appear. After this chapter you will look at issues of composition, but
before that comes the rather weighty matter of colour.
As one of the most powerful sensory cues, colour is a highly influential
visual property. It is arguably the design decision that has the most
immediate impact on the eye of the viewer. All the design features of your
visualisation display hold some attribute of colour, otherwise they are
invisible:
Every mark and item of apparatus in your charts will be coloured;
indeed colour in itself may be an attribute that represents your data
values.
Interactive features do not always have an associated visible property
(some are indeed invisible and left as intuitively discoverable).
However, those features that involve buttons, menus, navigation tabs
and value sliders will always have a colour.
Annotation properties such as titles, captions and value labels will all
be coloured.
Composition design mainly concerns the arrangement of all the above
features, though you might use colour to help achieve a certain design
layout. As you will see, emptiness is a useful organising device –
leaving something blank is a colour choice.
Thankfully, there is a route through all of this potential complexity relying
on just a little bit of science mixed in with lots of common sense. By
replacing any arbitrary judgements that might have been previously based
on taste, and through increasing the sensitivity of your choices, colour
becomes one of the layers of visualisation design that can be most quickly
and significantly improved.
‘Colors are perhaps the visual property that people most often misuse in
visualization without being aware of it.’ Robert Kosara, Senior
Research Scientist at Tableau Software
The key factor in thinking about colour is to ensure you establish meaning
first and decoration last. That is not to rule out the value of certain
decorative benefits of colour, but to advise that these should be your last
concern. Besides, in dealing with meaningful applications of colour you
will already have gone a long way towards establishing the ‘decorative’
qualities of your project’s aesthetic appearance.
This chapter begins with a look at some of the key components of colour
science, offering a foundation for your understanding about this topic.
After that you will learn about the ways and places in which colour could
be used. Finally, you will consider the main factors that influence colour
decisions.
Colour thinking begins from inside the chart(s), working outwards
across the rest of the visualisation anatomy:
Data legibility.
Editorial salience.
Functional harmony.
9.1 Overview of Colour Theory
Colour in visualisation is something of a minefield. As with many of
these design layer chapters, an introduction to colour involves judging the
right amount of science and the right amount of practical application. What
does justice to the essence of the subject and gives you the most relevant
content to work with is a delicate balance.
When you lift the lid on the science behind colour you open up a world of
brain-ache. When this chapter is finalised I will have spent a great deal of
time agonising over how to explain this subject and what to leave in or
leave out because there is so much going on with colour. And it is tricky.
Why? Because you almost come face to face with philosophical questions
like ‘what is white?’ and the sort of mathematical formulae that you really
rather hoped had been left behind at school. You learn how the colours you
specify in your designs as X might be perceived by some people as Y and
others as Z. You discover that you are not just selecting colours from a
neat linear palette but rather from a multi-dimensioned colour space
occupying a cubic, cylindrical or spherical conceptual shape, depending on
different definitions.
The basis of this topic is the science of optics – the branch of physics
concerned with the behaviour and properties of light – as well as
colorimetry – the science and technology used to quantify and describe
human colour perception. Two sciences, lots of maths, loads of variables,
endless potential for optical illusions and impairment: that is why colour is
tricky and why you need to begin this stage of thinking with an
appreciation of some colour theory.
The most relevant starting point is to recognise that when dealing with
issues of colour in data visualisation you will almost always be creating
work on some kind of computer. Unless you are creating something by
hand using paints or colouring pencils, you will be using software viewed
through an electronic display.
This is important because a discussion about colour theory needs to be
framed around the RGB (Red, Green, Blue) colour model. This is used to
define the combination of light that forms the colours you see on a screen,
conceptually laid out in a cubic space based on variations across these
three attributes.
The output format of your work will vary between screen display and print
display. If you are creating something for print you will have to shift your
colour output settings to CMYK (Cyan, Magenta, Yellow and Black). This
is the model used to define the proportions of inks that make up a printed
colour. This is known as a subtractive model, which means that combining
all four inks produces black, whereas RGB is additive as the three screen
colours combine to produce white.
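To make the additive/subtractive distinction a little more concrete, here is a minimal Python sketch of the common textbook formula for translating an RGB colour into CMYK proportions. Treat it as an illustration under stated assumptions: the function name `rgb_to_cmyk` is hypothetical, and real print workflows rely on device colour profiles rather than this naive arithmetic.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-255 per channel) to CMYK (0.0-1.0 per ink) conversion.

    Assumption: a simple, device-independent formula for illustration only.
    """
    if (r, g, b) == (0, 0, 0):
        # No light at all on screen: represent with the black (key) ink only.
        return 0.0, 0.0, 0.0, 1.0
    r_, g_, b_ = r / 255, g / 255, b / 255
    k = 1 - max(r_, g_, b_)            # how much black ink is needed
    c = (1 - r_ - k) / (1 - k)         # cyan absorbs red light
    m = (1 - g_ - k) / (1 - k)         # magenta absorbs green light
    y = (1 - b_ - k) / (1 - k)         # yellow absorbs blue light
    return c, m, y, k


# Full red, green and blue light gives white on screen, which needs no ink.
print(rgb_to_cmyk(255, 255, 255))
# Pure screen red is printed by combining magenta and yellow inks.
print(rgb_to_cmyk(255, 0, 0))
```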
When you are creating work to be consumed on the Web through screen
displays, you will often program using HEX (Hexadecimal) codes to
specify the mix of red, green and blue light (in the form #RRGGBB
using codes 00 to FF).
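As a small illustration of how a HEX code maps onto the three RGB channels, the following Python sketch splits a `#RRGGBB` string into its red, green and blue values and back again. The helper names `hex_to_rgb` and `rgb_to_hex` are hypothetical, chosen only for this example.

```python
def hex_to_rgb(code):
    """Split a '#RRGGBB' string into three integers between 0 and 255."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a colour in the form #RRGGBB")
    # Each pair of hexadecimal digits (00 to FF) is one channel.
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(r, g, b):
    """Format three 0-255 channel values as a '#RRGGBB' string."""
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


print(hex_to_rgb("#FF8000"))   # full red, half green, no blue: an orange
print(rgb_to_hex(255, 128, 0))
```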
While CMYK communicates from your software to a printer, telling it
what colours to print as an output, it does not really offer a logical model
to think about the input decisions you will make about colour. Neither, for
that matter, does RGB: it just is not realistic to think in those terms when
considering what choices are needed in a visualisation design. There are
different levers to adjust and different effects being sought that require an
alternative model of thinking.
Figure 9.1 HSL Colour Cylinder
Figure 9.2 Colour Hue Spectrum
I share the belief with many in the field that the most accessible colour
model – in terms of considering the application of colour in data
visualisation – is HSL (Hue, Saturation, Lightness), which echoes the colour
system devised by Albert Munsell in the early 1900s. These three dimensions combine to make up what is
known as a cylindrical-coordinate colour representation of the RGB colour
model (I did warn you about the cylinders).
Hue is considered the true colour. With hue there are no shades
(adding black), tints (adding white) or tones (adding grey) – a
consideration of these attributes follows next. When you are
describing or labelling colours you are most commonly referring to
their hue: think of the colours of the rainbow ranging through various
mixtures of red, orange, yellow, green, blue, indigo and violet. Hue is
considered a qualitative colour attribute because it is defined by
difference and not by scale.
Saturation defines the purity or colourfulness of a hue. This does
have a scale from intense pure colour (high saturation) through
increasing tones (adding grey) to the no-colour state of grey (low
saturation). In language terms think vivid through to muted.
Figure 9.3 Colour Saturation Spectrum
Lightness defines the contrast of a single hue from dark to light. It is
not a measure of brightness – there are other models that define that –
rather a scale of light tints (adding white) through to dark shades
(adding black). In language terms I actually think of lightness more as
degrees of darkness, but that is just a personal mindset.
Figure 9.4 Colour Lightness Spectrum
Technically speaking, black, white and grey are not considered colours.
I have deliberately described these dimensions separately because, as you
will see when looking at the applications of colour in visualisation, your
decisions will often be defined by how you might employ these distinct
dimensions of colour to form your visual display. The main choices tend to
fall between employing difference in hue and variation in lightness, with
the different levels of saturation often being a by-product of the definitions
made for the other two dimensions.
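If you want to experiment with these three dimensions, the sketch below uses Python's standard `colorsys` module to translate between a HEX code and its hue, saturation and lightness. Note two assumptions: `colorsys` orders the values as hue, lightness, saturation (HLS) on a 0 to 1 scale, and the helper names `hex_to_hsl` and `hsl_to_hex` are hypothetical wrappers written for this illustration.

```python
import colorsys


def hex_to_hsl(code):
    """Return (hue in degrees, saturation 0-1, lightness 0-1) for '#RRGGBB'."""
    code = code.lstrip("#")
    r, g, b = (int(code[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)   # note the H, L, S ordering
    return h * 360, s, l


def hsl_to_hex(hue_deg, saturation, lightness):
    """Return a '#RRGGBB' string for the given hue, saturation and lightness."""
    r, g, b = colorsys.hls_to_rgb(hue_deg / 360, lightness, saturation)
    return "#{:02X}{:02X}{:02X}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


print(hex_to_hsl("#FF0000"))       # a fully saturated red at mid lightness
print(hsl_to_hex(0, 1.0, 1.0))     # maximum lightness: tinted all the way to white
print(hsl_to_hex(0, 1.0, 0.0))     # minimum lightness: shaded all the way to black
```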
Alternative models exist offering variations on a similar theme, such as
HSV (Hue, Saturation, Value), HSI (Hue, Saturation, Intensity), HSB
(Hue, Saturation, Brightness) and HCL (Hue, Chroma, Luminance).
These are all primarily representations of the RGB model space but
involve differences in the mathematical translation into/from RGB and
offer subtle differences in the meaning of the same terms (local definitions
of hue and saturation vary). The biggest difference relates to their
emphasis as a means of specifying either a colour quality (in an input,
created sense) or a colour perception (in how a colour is ultimately
experienced).
Pantone is another colour space that you might recognise. It offers a
proprietary colour-matching, identifying and communicating service for
print, essentially giving ‘names’ to colours based on the CMYK
process.
The argument against using the HSL model for defining colour is that,
while it is fine for colour setting (i.e. an intuitive way to think about and
specify the colours you want to set in your visualisation work), the
resulting colours will not be perceived uniformly from one
device to the next. This is because there are many variables at play in the
projection of light to display colour and the light conditions present in the
moment of perception. That means the same perceptual experience will not
be guaranteed. It is argued that more rigorous models (such as CIELAB)
offer an absolute (as opposed to a relative) definition of colour for both
input and output. My view is that they are just a little bit too hard to easily
translate into visualisation design thinking. Furthermore, trying to control
for all the subtleties of variation in consumption conditions is an extra
burden you should ideally avoid.
At this stage, it is important to be pragmatic about colour as much as
possible. The vast majority of your colour manipulating and perceptual
needs should be nicely covered by the HSL model. As and when you
develop a deeper, purist interest in colour you should then seek to learn
more about the nuances in the differences between the definitions of these
models and their application.
9.2 Features of Colour: Data Legibility
Data legibility concerns the use of the attribute of colour to encode data
values in charts. The objective here is to make the data being represented
by differences in colour as clearly readable and as meaningful as possible.
While you have probably already decided by now the chart or charts you
intend to use, you still need to think carefully – and separately – about
how you will specifically employ colour. To do this we first need to revisit
the classification of data types and consider how best to use colour for
representing each different type.
Nominal (Qualitative)
With nominal data colour is used to classify different categorical values.
The primary motive for the choice of colour is to create a visible
distinction between each unique categorical association, helping the eye to
discern the different categories as efficiently and accurately as possible.
Creating contrast is the main aim of representing nominal data. What you
are not seeking to show or even imply is any sense of an order of
magnitude. You want to help differentiate one category from the next –
and make it easily identifiable – but to do so in a way that preserves the
sense of equity among the colours deployed.
Figure 9.5 Excerpt from ‘Executive Pay by the Numbers’
Variation in hue is typically the colour dimension to consider using for
differentiating categories. Additionally, you might explore different tones
(variations in saturation across the hues). You should not, though, consider
using variations in the lightness dimension. That is because the result is
insufficiently discernible. As you can see demonstrated in Figure 9.5, the
lightness variation of a blue tone makes it quite hard to connect the colour
scale presented in the key at the top with the colours displayed in the
stacked bars underneath. With the shading in the column header and the
2011 grey bar also contributing similar tones to the overall aesthetic of the
table our visual processing system has to work much harder to determine
the associations than it should need to do.
Often the categories you will be differentiating with colour will be
relatively few in number, maybe two or three, such as in the separation
between political parties or plotting different values for gender, as seen in
Figure 9.6.
Figure 9.6 How Nations Fare in PhDs by Sex
Figure 9.7 How Long Will We Live – And How Well?
Beyond these small numbers, you still typically might only need to
contend with assigning colours to around four to six categories, perhaps in
analysis that needs to visually distinguish values for different continents of
the world, as seen in the scatter plot in Figure 9.7.
As the range of different categories grows, the ability to preserve clear
differentiation becomes harder. In expanding your required palette, the
colours used become decreasingly unique. The general rule of thumb is
that once you have more than 12 categories it will not be possible to find a
sufficiently different colour to assign to categories from 13 upwards.
Additionally, you are really increasing the demands of learning and
recognition for viewers. This then becomes quite a cognitive burden and
delays the process of understanding.
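One way to picture a nominal palette is as a set of hues spaced evenly around the colour wheel, with saturation and lightness held constant so that no category looks more important than another. The Python sketch below does this with the standard `colorsys` module. It is a rough illustration only: the function name `categorical_palette` and its default values are assumptions, equal steps in HSL hue are not equal steps in perceived difference, and the limit of 12 simply encodes the rule of thumb above.

```python
import colorsys


def categorical_palette(n, saturation=0.65, lightness=0.5, max_categories=12):
    """Return n hex colours that differ only in hue."""
    if n > max_categories:
        # Beyond this point, consider filtering or combining categories instead.
        raise ValueError("too many categories to colour distinctly")
    palette = []
    for i in range(n):
        hue = i / n  # evenly spaced positions around the colour wheel
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        palette.append("#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)
        ))
    return palette


# Six categories, for example one colour per continent.
print(categorical_palette(6))
```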
Figure 9.8 Charting the Beatles: Song Structure
There are two approaches for dealing with this. Firstly, consider offering interactive
filters to modify what categories are displayed in a visualisation – thus
potentially reducing the impact of so many being available. Secondly,
think about transforming your data by excluding or combining categories
into a reduced number of aggregate groupings.
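The second approach, combining categories, might be sketched in Python as follows. This is one possible tactic rather than a prescribed method: it keeps the most frequent categories and folds the remainder into a single aggregate group. The function name `collapse_categories` and the label 'Other' are hypothetical.

```python
from collections import Counter


def collapse_categories(labels, keep=5, other_label="Other"):
    """Keep the `keep` most frequent categories; relabel the rest as one group."""
    counts = Counter(labels)
    top = {label for label, _ in counts.most_common(keep)}
    return [label if label in top else other_label for label in labels]


# Hypothetical category values for illustration.
values = ["a", "a", "b", "b", "c", "d"]
print(collapse_categories(values, keep=2))
```

Whether frequency is the right basis for grouping depends on your subject; sometimes a meaningful hierarchy in the data offers a better way to aggregate.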
Depending on the subject of your data, sometimes you can look to
supplement the use of colour with texture or pattern to create further
visible distinctions. In Figure 9.8 you can see two patterns being used
occasionally as additive properties to show the structure of tracks on The
Beatles’ album.
Ordinal (Qualitative)
With ordinal data you are still dealing with categories but now they have a
natural hierarchy or ordering that can be exploited. The primary motive for
using colour in this case is not only to create a visible distinction between
each unique category association but also to imply some sense of an order
of magnitude through the colour variation. The colour dimensions used to
achieve this tend to employ variations of either the saturation or the
lightness (or a combination of both). You might also introduce different
hues when dealing with diverging (dual-direction) scales rather than
simply converging (single-direction) ones.
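A converging ordinal scale of this kind can be sketched by holding one hue constant and stepping the lightness from light to dark across the ordered categories. The Python example below uses the standard `colorsys` module. The function name `sequential_ramp`, the blue hue of 210 degrees and the lightness limits are all assumptions chosen for illustration.

```python
import colorsys


def sequential_ramp(categories, hue_deg=210, saturation=0.6,
                    lightest=0.85, darkest=0.25):
    """Map ordered categories to one hue, running from light to dark."""
    n = len(categories)
    ramp = {}
    for i, name in enumerate(categories):
        # Position along the scale: 0.0 for the first category, 1.0 for the last.
        t = i / (n - 1) if n > 1 else 0.0
        lightness = lightest + t * (darkest - lightest)
        r, g, b = colorsys.hls_to_rgb(hue_deg / 360, lightness, saturation)
        ramp[name] = "#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return ramp


print(sequential_ramp(["low", "medium", "high"]))
```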
Figure 9.9 displays a simple example of colour used to display a
converging ordinal variable. This is the teacup that I use in my office. On
the inside you can see it has a colour guide to help ascertain how much
milk you might need to add: going through Milky, Classic British,
Builder’s Brew, and finally Just Tea (zero milk).
Figure 9.9 Photograph of MyCuppa Mug
A typical example of a diverging ordinal scale might be seen in the stacked
bar chart showing the results of a survey question (Figure 9.10). The
answers are based on the strength of feeling: strongly agree, agree, neutral,
disagree, strongly disagree. Colouring the agreement in red (‘hot’
sometimes used to represent ‘good’) and the disagreement in blue (‘cold’
to mean ‘bad’) means a viewer can quickly perceive the general balance of
feelings being expressed.
Figure 9.10 Example of a Stacked Bar Chart Based on Ordinal Data
Another example of ordinal data might be to represent the notion of
recency. In Figure 9.11 you see a display plotting the 2013 Yosemite
National Park fire. Colour is used to display the recorded day-by-day
progress of the fire’s spread. The colour scale is based on a recency scale
with darker = most recent, lighter = longest ago (think faded memory).
Figure 9.11 The Extent of Fire in the Sierra Nevada Range and Yosemite
National Park, 2013
Interval and Ratio (Quantitative)
With quantitative data (ratio and interval) your motive, as it is with ordinal
data, is to demonstrate the differences between, and magnitudes of, a set of values. In the
choropleth map in Figure 9.12, showing the variation in electricity prices
across Switzerland, the darker shades of blue indicate the higher values,
the lighter tints the lower prices. This approach makes the viewer’s
perception of the map’s values immediate – it is quite intuitive to
recognise the implication of the general patterns of light and dark shades.
Figure 9.12 What are the Current Electricity Prices in Switzerland
[Translated]
Typically, using colour to represent quantitative data will involve breaking
up your data values into discrete classifications or ‘bins’. This makes the
task of reading value ranges from their associated colour shade or tone a
little easier than when using a continuous gradient scale. While our
capacity to judge exact variations in colour is relatively low (even with a
colour key for reference), we are very capable of detecting local variations
of colour through differences in tint, shade or tone. Assessing the relative
contrast between two colours is generally how we construct a quantitative
hierarchy.
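The idea of breaking values into discrete bins can be sketched in a few lines of Python using the standard `bisect` module. The break points and the four blue hex codes below are illustrative assumptions, not values taken from any of the figures, and the names `bin_index` and `bin_colour` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical break points: bins are <10, 10 to <20, 20 to <30, and 30+.
BREAKS = [10, 20, 30]
# Illustrative light-to-dark blues, one per bin.
BIN_COLOURS = ["#DEEBF7", "#9ECAE1", "#4292C6", "#08519C"]


def bin_index(value, breaks=BREAKS):
    """Return the position of the bin that contains `value`."""
    return bisect_right(breaks, value)


def bin_colour(value):
    """Return the colour assigned to the bin that contains `value`."""
    return BIN_COLOURS[bin_index(value)]


for v in (5, 10, 25, 35):
    print(v, bin_index(v), bin_colour(v))
```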
Look at the fascinating local patterns that emerge in the next map (Figure
9.13), comparing increases in the percentage of people gaining health
insurance in the USA (during 2013–14). The data is broken down to
county level detail with a colour scale showing a darker red for the higher
percentage increases.
Some of the most relevant colour practices for data visualisation come
from the field of cartography (as do many of the most passionate colour
purists). Just consider the amount of quantitative and categorical detail
shown in a reference map that relies on colour to differentiate types of
land, indicate the depth of water or the altitude of high ground, present
route features of road and rail networks, etc. The best maps pack an
incredible amount of detail into a single display and yet somehow they
never feel disproportionately overwhelming.
Figure 9.13 Excerpt from ‘Obama’s Health Law: Who Was Helped Most’
Aside from the big-picture observations of the darker shades in the west
and the noticeably lighter tints to the east and parts of the mid-west, take a
closer look at some of the interesting differences at a more local level. For
example, notice the stark contrast across state lines between the dark
regions of southern Kentucky (to the left of the annotated caption) and the
light regions in the neighbouring counties of northern Tennessee. Despite
their spatial proximity there are clearly strong differences in enrolment on
the programme amongst residents of these regions.
Both of these previous examples use a convergent colour scale, moving
through discrete variations in colour lightness to represent an increasing
scale of quantitative values, from zero or small through to large. As
illustrated with the stacked bar chart example shown earlier, portraying the
range of feelings from an ordinal dataset, sometimes you may need to
employ a divergent colour scale. This is when you want to show how
values are changing in two directions either side of a set breakpoint.
Figure 9.14 Daily Indego Bike Share Station Usage
Figure 9.14 shows a cropped view of a larger graphic comparing the
relative peaks and troughs of usage across all bike share stations in
Philadelphia over a 24-hour period. The divergent colour scale uses two
hues and variations in lightness to show the increasingly busy and
increasingly slow periods of station activity either side of a breakpoint,
represented by a very light grey to indicate the average point. The darkest
red means the station is full, the darkest blue means the station is empty.
Regardless of whether you are plotting a converging or diverging scale,
judging how you might divide up your colour scales into discrete value
bins needs careful thought. The most effective colour scales help viewers
perceive not just the relative order of magnitude – higher or lower – but
also a sense of the absolute magnitude – how different a value might be
compared to another value.
There is no universal rule about the number of value bins. Indeed, it is not
uncommon to see entirely continuous colour scales. However, a general
rule of thumb I use is that somewhere between four and nine
meaningful – and readable – value intervals should suffice. There are two
key factors to consider when judging your scales:
Are you plotting observed data or observable data? You might only
have collected data for a narrow range of quantities (e.g. 15 to 35) so
will your colour classifications be based on this observed range or on
the potentially observable data range i.e. the values you know
would/could exist with a wider sample size or on a different
collection occasion (e.g. 0 to 50)?
What are the range and distribution of your data? Does it make sense
to create equal intervals in your colour classifications or are there
more meaningful intervals that better reflect the shape of your data
and the nature of your subject? Sometimes, you will have legitimate
outliers that, if included, will stretch your colour scales far beyond the
meaningful concentration of most of your data values.
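To see how the first of these two factors changes a colour scale, the Python sketch below computes equal-interval break points for the observed range (15 to 35) and for the wider observable range (0 to 50) used in the example above. The function name `equal_interval_breaks` and the choice of four and five bins are assumptions made for illustration; equal intervals are only one option and may not suit a skewed distribution.

```python
def equal_interval_breaks(low, high, n_bins):
    """Return the interior break points that split low..high into equal bins."""
    step = (high - low) / n_bins
    return [low + step * i for i in range(1, n_bins)]


# Scale built on the observed data only.
print(equal_interval_breaks(15, 35, 4))
# Scale built on the values that could potentially be observed.
print(equal_interval_breaks(0, 50, 5))
```

The same value of, say, 22 would sit in the second of four classes on the observed scale but in the third of five on the observable scale, so the choice affects how dark it appears.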
Figure 9.15 Battling Infectious Diseases in the 20th Century: The Impact
of Vaccines
You can see this effect in Figure 9.15, showing the incidence of Hepatitis
A per 100,000 population. There are only three values that exceed 100
(you can see them on the top line for Alaska in the late 1970s). To
accommodate these outliers the colour scale becomes somewhat stretched out,
with a wide range of potential values being represented by a dark
yellow to red colour. With 99.9% of the values being under 100 there is
little discernible difference in the blue/green shades used for the lower values. If
outliers are your focus, it makes sense to include these and colour
accordingly to emphasise their exceptional quality. Otherwise if they risk
compromising the discrete detail of the lower values you might look to
create a broad classification that uses a single colour for any value beyond
a threshold of maybe 75, with even value intervals of maybe 15 below that
to help show the patterns of smaller values.
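The classification suggested here, even intervals of 15 up to a threshold of 75 with a single class for everything beyond, might be sketched in Python like this. The function names are hypothetical, and one assumption is made explicit in the code: a value exactly equal to the threshold is placed in the top class.

```python
from bisect import bisect_right


def capped_breaks(threshold=75, step=15):
    """Return even break points up to and including the threshold."""
    return list(range(step, threshold + 1, step))


def capped_bin(value, threshold=75, step=15):
    """Return the bin position for `value`; all outliers share the top bin.

    Assumption: a value exactly on the threshold falls into the top bin.
    """
    return bisect_right(capped_breaks(threshold, step), value)


print(capped_breaks())
for v in (10, 74, 75, 120):
    print(v, capped_bin(v))
```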
For diverging scales, the respective quantitative shades either side of a
breakpoint need to imply parity in both directions. For example, a shade of
colour that means +10% one side of the breakpoint should have an equal
shade intensity in a different hue on the other side to indicate the same
interval, i.e. −10%. Additionally, the darkest shades of hues at the extreme
ends of a diverging scale must still be discernible. Sometimes the darkest
shades will be so close to black that you will no longer be able to
distinguish the differences in their underlying hues when plotted in a chart
or map.
As well as considering the most appropriate discrete bins for your values,
for diverging scales one must also pay careful attention to the role of the
breakpoint. This is commonly set to separate values visually above or
below zero or those either side of a meaningful threshold, such as target,
average or median.
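The requirements for parity and discernible extremes can be illustrated with a short Python sketch using the standard `colorsys` module. Values the same distance either side of the breakpoint receive the same HSL lightness in different hues, and the darkest shade stops well short of black. The function name `diverging_colour`, the two hue angles, the lightness limits and the parameter names are all assumptions for illustration, and equal HSL lightness is only an approximation of equal perceived intensity.

```python
import colorsys


def diverging_colour(value, midpoint=0.0, max_distance=20.0,
                     low_hue=220, high_hue=10):
    """Return a hex colour for `value` on a two-hue diverging scale."""
    # Distance from the breakpoint, scaled to 0..1 and clamped at the extremes.
    distance = min(abs(value - midpoint) / max_distance, 1.0)
    hue = high_hue if value >= midpoint else low_hue
    # Lightness runs from 0.95 at the breakpoint down to 0.35, staying clear
    # of black so the two hues remain distinguishable at the extremes.
    lightness = 0.95 - 0.60 * distance
    # Zero saturation at the breakpoint itself gives a neutral light grey.
    saturation = 0.0 if distance == 0 else 0.7
    r, g, b = colorsys.hls_to_rgb(hue / 360, lightness, saturation)
    return "#{:02X}{:02X}{:02X}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


for v in (-20, -10, 0, 10, 20):
    print(v, diverging_colour(v))
```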
One of the most common mistakes in using colour to represent quantitative
data comes with use of the much-derided rainbow scale. Look at Figure
9.16, showing the highest temperatures across Australia during the first
couple of weeks in 2013. Consider the colour key to the right of the map
and ask yourself if this feels like a sufficiently intuitive scale. If the key
was not provided, would you be able to perceive the order of magnitude
relationship between the colours on the map? If you saw a purple colour
next to a blue colour, which would you expect to mean hotter and which
colder?
Figure 9.16 Highest Max Temperatures in Australia
While the general implication of blue = ‘colder’ through to red = ‘hotter’ is
included within sections of this temperature colour scale, it is the presence
of many other hues that obstructs the accessibility and creates
inconsistency in logic. For instance, do the colours used to show 24°C
(light blue) jumping to 26°C (dark green) make sense as a means for
showing an increasing temperature? How about 18°C (grey) to 20°C (dark
blue), or the choice of the mid-brown used for 46°C which interrupts the
increasingly dark red sequence? If you saw on the map a region with the
pink tone as used for 16°C would you be confident that you could easily
distinguish this from the lighter pink used to represent 38°C? Unless there
are meaningful thresholds within your quantitative data – justifiable
breakpoints – you should only vary your colour scales through the
lightness dimension, not the hue dimension.
One of the interesting recurring challenges faced by visualisers is how to
represent nothing. For example, if a zero quantity or no category is a
meaningful state to show, you still need to represent this visually
somehow, even though it might possess no size, no position and no area.
How do you distinguish between no data and a zero value?
Figure 9.17 State of the Polar Bear
Typically, using colour is one of the best ways to portray this. Figure 9.17
shows one solution to making ‘no data’ a visible value. This map displays
the population trends of the polar bear. Notice those significant areas of
grey representing ‘data deficient’. A subtle but quite effective political
point is being made here by including this status indicator. As I mentioned
before, sometimes the absence of data can be the message itself.
Figure 9.18 Excerpt from ‘Geography of a Recession’
When considering colour choices for quantitative classifications, you will
need to think especially carefully about the lowest value grouping: is it to
be representative of zero, an interval starting from zero up to a low value,
or an interval starting only from the minimum value and never including
zero? In this choropleth map (Figure 9.18) looking at the unemployment
rate across the counties of the USA, no value is as low as zero. There
might be values that are close, but nowhere is the unemployment rate at
0%. As you can see, the lowest tint used in this colour key is not white,
rather a light shade of orange, so as not to imply zero. Whilst not relevant
to this example, if you wanted to create a further distinction between the
lowest value interval and the ‘null’ or ‘no data’ state you could achieve
this by using a pure white/blank.
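The distinction between the lowest interval and a 'no data' state might be sketched as follows. The break points and hex tints are hypothetical placeholders, not the values used in 'Geography of a Recession'; the point is only that the lowest class gets a visible tint while missing values get pure white.

```python
from bisect import bisect_right

NO_DATA_COLOUR = "#ffffff"          # reserved for missing values only
BREAKS = [4.0, 6.0, 8.0, 10.0]      # hypothetical class boundaries (% unemployment)
TINTS = ["#fee6ce", "#fdae6b", "#f16913", "#d94801", "#8c2d04"]  # light to dark

def colour_for(rate):
    """Map an unemployment rate to a tint; None means no data was recorded."""
    if rate is None:
        return NO_DATA_COLOUR
    # bisect_right counts how many boundaries the rate has reached or passed,
    # which is the index of its class (a rate of 4.0 falls in the second class).
    return TINTS[bisect_right(BREAKS, rate)]

low = colour_for(0.5)       # lowest class: a light orange, not white
missing = colour_for(None)  # no data: white
```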
9.3 Features of Colour: Editorial Salience
Having considered options for the application of colour in facilitating data
legibility, the next concern is colour used for editorial salience. Whereas
data legibility was concerned with helping to represent data, using colour
for editorial salience is about drawing the viewer’s attention to the
significant or meaningful features of your display. Colour offers such a
potent visual stimulus and an influential means for drawing out key
aspects of your data and project that you might feel are sufficiently
relevant to make prominent.
Consider again the idea of photography and the effect of taking a
photograph of a landscape. You will find the foreground objects are darker
and more prominent than the faded view of the background in the distance
as light and colour diminish. Using colour to achieve editorial salience
involves creating a similar effect of depth across your visualisation’s
contents: if everything is shouting, nothing is heard.
The goal of using colour to facilitate editorial salience is a suitable
contrast. For things to stand out, you are in turn determining which other
things will not.
The degree of contrast you might seek to create will vary. Often you will
be seeking to draw a significant contrast, maximising the emphasis of a
value or subset of values so the viewer can quickly home in on what you
have elevated for their attention relative to everything else.
For this reason, grey will prove to be one of your strongest allies in data
visualisation. When contrasted with reasonably saturated hues, grey helps
to create depth. Elements coloured in greyscale will sit quietly at the back
of the view, helping to provide a deliberately subdued context that enables
the more emphasised coloured properties to stand proudly in the
foreground.
In Figure 9.19, the angle of analysis shows a summary of the most
prevalent men’s names featuring among the CEOs of the S&P 1500
companies. As you can see there are more guys named ‘John’ or ‘David’
than there are women CEOs combined. With the emphasis
of the analysis on this startling statement of inequality the bar for ‘All
women’ is emphasised in a burgundy colour, contrasting with the grey bars
of all the men’s names. Notice also that the respective axis and bar value
labels are both presented using a bold font, which further accentuates this
emphasis. It is also editorially consistent with the overriding enquiry of the
article. As discussed in Chapter 3, bringing to the surface key insights
from data displays in this way contributes towards facilitating an
‘explanatory’ experience.
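The grey-plus-one-highlight approach described above could be sketched like this. The hex values and function name are illustrative assumptions rather than the colours used in Figure 9.19; the resulting list could be passed to whichever charting tool draws the bars.

```python
CONTEXT_GREY = "#b3b3b3"   # subdued tone for everything that is context
HIGHLIGHT = "#800020"      # a burgundy-like tone for the emphasised item(s)

def salience_colours(labels, focus):
    """Return one colour per label: HIGHLIGHT for labels in focus, grey otherwise.

    focus must be a collection of labels (for example a list), not a bare string.
    """
    focus = set(focus)
    return [HIGHLIGHT if label in focus else CONTEXT_GREY for label in labels]

bars = ["John", "David", "All women"]
colours = salience_colours(bars, focus=["All women"])
```

Keeping the default as grey means every use of the highlight colour is a deliberate editorial decision.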
Figure 9.19 Fewer Women Run Big Companies Than Men Named John
Figure 9.20 NYPD, Council Spar Over More Officers
Sometimes, only noticeable contrast – not shouting, just being slightly
more distinguishable – may be appropriate. Compared with the previous
bar chart example, Figure 9.20 creates a more subtle distinction between
the slightly darker shade of green (and emboldened text) emphasising the
New York figures compared to the other listed departments in a slightly
lighter green. As with the CEOs’ example, the object of our attention is the
subject of focus in the analysis, in this case regarding a drive for more
NYPD officers. This does not need to be any more contrasting; it is just as
noticeable as the visualiser wishes it to be.
Sometimes you will seek to create several levels of visual ‘urgency’ in the
relative contrast of your display. The colour choices in Figure 9.21 give
foreground prominence to the yellow coloured markers and values (the
dots are also larger) and then mid-ground/secondary prominence to the
slightly muted red markers. In perceiving the values of the yellow markers,
the viewer is encouraged to concentrate on primarily comparing these with
the red markers. The subtle grey markers are far less visible – closer in
shade to the background than the foreground – and deliberately relegated
to a tertiary level so they do not clutter up the display and cause
unwarranted attention. They provide further context for the distribution of
the values but do not need to be any more prominent in their relationship
with the foreground and mid-ground colours.
Figure 9.21 Excerpt from a Football Player Dashboard
I touched on the use of encoded overlays earlier where coloured areas or
bandings can be used to help separate different regions of a display in
order to facilitate faster interpretation of the meaning of values. In the
bubble plot in Figure 9.22, you can see the circle markers are colour coded
to help viewers quickly ascertain the significance of each location on the
chart according to the quadrants in which they fall. Notice how in the
background the diagonal shading further emphasises the distinction
between above the line ‘improvement’ and below the line ‘worsening’, a
very effective approach.
Figure 9.22 Elections Performance Index
9.4 Features of Colour: Functional Harmony
After achieving data legibility and editorial salience through astute colour
choices, functional harmony is concerned with ensuring that any remaining
colour choices will aid, and not hinder, the functional effectiveness and
elegance of the overall visualisation.
‘When something is not harmonious, it’s either boring or chaotic. At
one extreme is a visual experience that is so bland that the viewer is not
engaged. The human brain will reject under-stimulating information. At
the other extreme is a visual experience that is so overdone, so chaotic,
that the viewer can’t stand to look at it. The human brain rejects what it
can not organise, what it cannot understand.’ Jill Morton, Colour
Expert and Researcher
You must judge the overall balance of and suitability of your collective
colour choices and not just see these as isolated selections. This is again
primarily a judgement about contrast – what needs to be prominent and
what needs to be less so. Such an apparent calming quality about a well-
judged and cohesive colour palette is demonstrated by Stefanie Posavec’s
choices in visualising the structure of Walter Benjamin’s essay ‘Art in the
age of mechanical reproduction’ (Figure 9.23). There is effortless harmony
here between the colour choices extending across the entire anatomy of
design: the petals, branches, labels, titles, legend, and background.
Remember that every design feature you incorporate into your
display will have a property of colour, otherwise it will be invisible. In
looking at data legibility and editorial salience you have considered your
colour choices for representing data. A desire to achieve functional
harmony means considering further colour decisions that will help
establish visual relationships across and between the rest of your
visualisation’s anatomy: its interactive features, annotations and
composition.
Figure 9.23 Art in the Age of Mechanical Reproduction: Walter Benjamin
Interactive features: Visible interactive features will include
controls such as dropdown menus, navigation buttons, time sliders
and parameter selectors. The colour of every control used will need to
be harmonious with the rest of the project but also, critically, must be
functionally clear. How you use colour to help the user discern what
is selected and what is not will need to be carefully judged.
To illustrate this, Figure 9.24 shows an interactive project that
examines the connected stories of the casualties and fatalities from
the Iraqi and Afghan conflicts. Here you can see that there are several
interactive features, all of which are astutely coloured in a way that
feels both consistent with the overall tone of the project but also
makes it functionally evident what each control’s selected status or
defined setting is. This is achieved through very subtle but effective
combinations of dark and light greys that help create intuitive clarity
about which values the user has selected or highlighted. When a
button has a toggle setting (on/off, something/something else), such
as the ‘Afghanistan’ or ‘Iraq’ tabs at the top, the selected tab is
highlighted in bright grey and the unselected tab in a more subdued
grey. Filters can either frame (include/exclude) or focus
(highlight/relegate) the data. The same approach to using brighter
greys for the selected parameter values makes it very clear what you
have chosen, but also what you have excluded (while making evident
the other currently-unselected values from which you can potentially
choose).
Figure 9.24 Casualties
Annotations: Chart annotations such as gridlines, axis lines and
value labels all need colouring in a way that will be sympathetic to
the colour choices already made for the data representation and,
possibly, editorial contrasting. As mentioned in the last chapter, many
annotation devices exist in the form of text and so the relative font
colour choices will need to be carefully considered. For any
annotation device the key guiding decision is to find the level at
which these are suitably prominent. Not loud, not hidden, just at the
right level. This will generally take a fair amount of trial and error to
get right but once again, depending on your context, your first
thought should be to consider the merits offered by different shades
of grey.
You might be starting to suspect I’m a lobbyist for the colour grey.
Nobody wants to live in a world of only grey. The point is more about how
its presence enables other colours to come alive. The great Bill Shankly
once said ‘Football is like a piano, you need 8 men to carry it and 3 who
can play the damn thing’. In data visualisation, grey does the heavy lifting
so the more vibrant colours can bring the energy and vibrancy to your
design.
Figure 9.25 First Fatal Accident in Spain on a High-speed Line
[Translated]
Another example of the role of greyscale is demonstrated by Figure 9.25,
illustrating key aspects of the tragic rail crash in Spain in 2013. The sense
of foreground and background is clearly achieved by the prominence of
the scarlet-coloured annotations and visual cues offset against the
backdrop of an otherwise greyscale palette.
Figure 9.26 Lunge Feeding
There are other features of annotation that will have an impact on
functional harmony through their colouring. Multimedia assets like
photos, embedded videos, images and illustrations need to be
consistent in tone according to their relative role on the display. If
they are to dominate the page then unleash the vibrancy of their
colours to achieve this; if they are playing more of a secondary or
supporting role then relegate their constituent colours to allow other
primary features due prominence.
Figure 9.26 includes small illustrations of a whale, showing how it
goes through the stages of lunge feeding. The elegance of the colours
used in these illustrations is entirely harmonious with the look and
feel of the overall piece. They are entirely at one with the rest of the
graphic.
Composition: The clarity in layout of a project will often be achieved
by the use of background colour to create logical organisation. In the
‘Lunge Feeding’ graphic the shading of the blue sea getting darker as
it moves down is not attempting to offer a precise representation of
the sea, but it gives a sense of depth and draws maximum attention to
that panel. It is also naturally congruent with the subject matter.
Figure 9.27 Examples of Common Background Colour Tones
In general, there are no fixed rules on the benefits of any particular
colour for background shading. Your choices will depend mostly on
the circumstances and conditions in which your viewers are
consuming the work. Usually, when there is no associated congruence
for a certain background colour, your options will tend to come from
one of the selection of neutral and/or non-colours (Figure 9.27). This
is because they particularly help to aid accentuation in combination
with foreground colours.
Typically, though, a white background (at least for your chart area)
gives viewers the best chance of being able to accurately perceive the
different colour attributes used in your data representation and the
contrasts you have created for editorial salience.
White – or more specifically emptiness – is one of your most
important options for creating functional meaning for nothingness,
something I touched on earlier. The emptiness of uncoloured space
can be used very effectively to direct the eye’s attention. It organises
the relationship between space on a page without the need for visible
apparatus, as seen in the left hand column of the lunge feeding
graphic. It can also be used to represent or emphasise values that
might have the state of ‘null’ or ‘zero’ to maximise contrast.
‘The single most overlooked element in visual design is emptiness.
Space must look deliberately used.’ Alex White, Author, The
Elements of Graphic Design
9.5 Influencing Factors and Considerations
Having mapped out the ways and places where colour could be used, you
will now need to consider the factors that will influence your decisions
about how colour should be used.
Formulating Your Brief
Format: This is a simple concern but always worth pointing out: if
you are producing something for screen display you will need to set
your colour output to RGB; if it is for print you will need CMYK.
Additionally, when you are preparing work for print, running off
plenty of proofs before finalising a design is imperative. What you are
preparing digitally is a step away from the form of its intended
output. What looks like a perfect colour palette on screen may not
ultimately look the same when printed.
Print quality and consistency is also a factor. Graphics editors who
create work for print newspapers or magazines will often consider
using colours as close in tone as possible to pure CMYK, especially if
their work is quite intricate in detail. This is because the colour plates
used in printing presses will not always be 100% aligned and thus
mixtures of colours may be slightly compromised.
As black and white printing is still commonplace, you need to be
aware of how your work might look if printed without colour. If you
are creating a visualisation that might possibly be printed by certain
users in black and white, the only colour property that you can
feasibly utilise will be the lightness dimension. Sometimes, as a
designer or author, you will be unaware of this intent and the
colourful design that you worked carefully towards will end up not
being remotely readable.
We all refer to black and white printing, but technically printers do not
actually print using white ink; it is just less black or no black.
Furthermore, there is an important difference in how colours appear when
published in colour and how they appear when published in black and
white. Hues inherently possess different levels of brightness: the purest
blue is darker than the purest yellow. If these were printed in black and
white, blue would therefore appear a darker, more prominent shade of
grey. If your printed work will need to be compatible for both colour and
black and white output, before finalising your decisions check that the
legibility and intended meaning of your colour choices are being
maintained across both forms.
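One way to run this check before printing is to estimate how light each colour will look once hue is removed. The sketch below uses the standard sRGB relative luminance weighting; it is an approximation under that assumption, since actual printers and greyscale conversions vary, and the function names are illustrative.

```python
def _linear(channel):
    """Convert one 0-255 sRGB channel to linear light (0.0-1.0)."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Approximate perceived lightness of an (r, g, b) colour: 0 = black, 1 = white."""
    r, g, b = (_linear(v) for v in rgb)
    # Green contributes most to perceived lightness and blue the least.
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def darkest_first(palette):
    """Order a dict of name -> (r, g, b) by how dark each would appear in greyscale."""
    return sorted(palette, key=lambda name: relative_luminance(palette[name]))

palette = {"pure blue": (0, 0, 255), "pure yellow": (255, 255, 0)}
order = darkest_first(palette)   # pure blue comes before pure yellow
```

If two colours that carry different meanings end up with nearly the same luminance, they are likely to merge into similar greys in a black and white printout.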
Setting: For digital displays, the conditions in which the work will be
consumed will have some influence over the choice between light and
dark backgrounds. The main factor is the relative contrast and the
stresses this can place on the eye to adjust against the surroundings. If
your work is intended for consumption in a light environment, lighter
backgrounds tend to be more fitting; likewise darker backgrounds
will work best for consuming in darker settings. For
tablets/smartphones, the bordering colour of the devices can also
influence the most suitable choice of background tone to most
sympathetically contrast with the surroundings.
Colour rules and identities: In some organisations there are style
guidelines or branding identities that require the strict use of only
certain colour options. Similar guidelines may exist if you are
creating work for publication in a journal, magazine, or on certain
websites. Guidelines like these are well intended, driven by a desire
to create conformity and consistency in style and appearance.
However, in my experience, the basis of such colour guides rarely
incorporates consideration for the subtleties of data visualisation. This
means that the resulting palettes are often a bad fit for ideal
visualisation colour needs, providing limited scope for the variation
and salience you might seek to portray.
Your first task should always be to find out if there is any
compromise – any chance of not having these colour restrictions
imposed. If there is no flexibility, then you will just have to accept
this and begin acquainting yourself with the colours you do have to
work with. Taking a more positive view, achieving consistency in the
use of colour for visualisation within an organisation does have merits
if the defined palettes offer suitably rich variety. Developing a
recognisable ‘brand’ and not having to think from scratch about what
colours to use every time you face a new project is something that can
be very helpful, especially across a team.
Purpose map: Does it need to be utilitarian or decorative? Should it
be functional or appealingly seductive? Does it lend itself to being
vivid and varied in colour or more muted and distinguished? Colour
is the first thing we notice as viewers when looking at a visualisation,
so your choices will play a huge part in setting the visible tone of
voice. How you define your thinking across the vertical dimension of
your purpose map will therefore have an influence on your colour
thinking.
Along the horizontal dimension, the main influencing consideration
will be a desire to offer an ‘explanatory’ experience. As mentioned,
some of the tactics for incorporating editorial salience will be of
specific value if you are seeking to emphasise immediately apparent,
curated insights.
Ideas and inspiration: In the process of sketching out your ideas and
capturing thoughts about possible sources of influence, maybe there
were already certain colours you had identified as being consistent
with your thinking about this subject? Additionally, you might have
already identified some colours you wish to avoid using.
Working With Data
Data examination: The characteristics of your data will naturally
have a huge impact on the decisions you make around data legibility.
Firstly, the type of data you are displaying (primarily nominal vs all
other types) will require a different colour treatment, as explained.
Secondly, the range of categorical colour associations (limits on
discernible hues) and the range and distribution of quantitative values
(numbers of divisions and definition of the intervals across your
classification scale) will be directly shaped by the work you did in the
examination stage.
In Figure 9.28 you can see a census of the prevalence and species of
trees found around the boroughs of New York City. This initial big-
picture view creates a beautiful tapestry made up of tree populations
across the region (notice the big void where JFK Airport is located).
Figure 9.28 Excerpt from ‘NYC Street Trees by Species’
To observe patterns for individual tree types is harder: with 52
different tree species there are simply too many classifications to be
able to allocate sufficiently unique colours to each. To overcome this,
the project features a useful pop-up filter list which then allows you to
adjust the data on view to reveal the species you wish to explore.
It is often the case when thinking about colour classifications that you
may need to revisit the data transformation actions to find new ways
of grouping your data to create better-fit quantitative value
classifications or to look at ways of grouping your categories. For the
latter, actions such as combining less important categories in an
‘other’ bin to reduce the variability or eliminating certain values from
your analysis may be necessary.
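Combining less important categories into an 'other' bin could look something like the sketch below. The species names and counts are invented for illustration, and the function name and the choice of how many categories to keep are assumptions; the right cut-off depends on how many hues your viewers can reliably tell apart.

```python
from collections import Counter

def group_small_categories(counts, keep=8, other_label="Other"):
    """Keep the `keep` largest categories and fold the rest into one bin."""
    ranked = Counter(counts).most_common()      # largest first
    grouped = dict(ranked[:keep])
    remainder = sum(n for _, n in ranked[keep:])
    if remainder:
        # Add to any existing bin with the same label rather than overwrite it.
        grouped[other_label] = grouped.get(other_label, 0) + remainder
    return grouped

# Hypothetical tree counts, not taken from the NYC street trees data.
trees = {"Oak": 50, "Maple": 30, "Pear": 10, "Ginkgo": 5, "Elm": 5}
reduced = group_small_categories(trees, keep=2)
# reduced == {"Oak": 50, "Maple": 30, "Other": 20}
```

The total is preserved, so the grouped data can still be shown against the same overall scale while needing far fewer distinct colours.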
‘If using colour to identify certain data, be careful to not accidentally
apply the same identity to a nearby part of the graphic. Don’t allow
colour to confuse just for the sake of aesthetics. I also like to use colour
to highlight. A single colour highlight on a palette of muted colours can
be a strong way to draw attention to key information.’ Simon Scarr,
Deputy Head of Graphics, Thomson Reuters
Establishing Your Editorial Thinking
Focus: When considering the perspective of ‘focus’ in the editorial
thinking stage, you were defining which, if any, elements of content
would merit being emphasised. Are there features of your analysis
that you might wish to accentuate? How might colour be used to
accentuate key insights in the foreground and push other (less
important) features into the background? What are the characteristics
of your data that you might want to emphasise through changes in
colour? For example, are there certain threshold values that will need
to be visually amplified if exceeded? Your decisions here will directly
influence your thinking about using colour to facilitate editorial
salience.
Data Representation
Chart type choice: Specifically in relation to data legibility,
depending on which chart type you selected to portray your data, this
may have attributes requiring decisions about colour. The heat map
and choropleth map are just two examples that use variation in colour
to encode quantitative value. Almost every chart has the potential to
use colour for categorical differentiation.
Trustworthy Design
Data classification: The decisions you make about how to encode
data through colour have a great bearing on the legibility and
accuracy of your design, especially with quantitative data. You will
need to ensure the classifications present a true reflection of the shape
and characteristics of your data and do not suppress any significant
interpretations.
‘Start with black and white, and only introduce color when it has
relevant meaning. In general, use color very sparingly.’ Nigel Holmes,
Explanation Graphic Designer
Meaningful: Eliminating arbitrary decisions is not just about
increasing the sophistication of your design thinking, it is also an
essential part of delivering a trustworthy design. If something looks
visually significant in its data or editorial colouring it will be read as
such, so make sure it is significant, otherwise remove it. You
especially want to avoid any connotation of significant meaning
across your functional or decorative colour choices. This will be
confusing at best, or will appear deceptive at worst.
Do not try to make something look more interesting than it
fundamentally is. Colour should not be used to decorate data. You
might temporarily boost the apparent appeal of your work in the eye
of the viewer but this will be short-lived and artificial.
Illusions: The relationship between a foreground colour and a
background one can create distorting illusions that modify the
perceived judgement of a colour. You saw an effect of this earlier
with the inverted area chart showing ‘Gun deaths in Florida’,
whereby the rising white mountain was seen by some as the
foreground data, when in fact it was the background emptiness
framed by the red area of data and the axis line. Illusions can affect
all dimensions of colour perception. There are simply too many to
mention here and they are hard to legislate for entirely; it is really
more about mentioning that you need to be aware of these as a
consequence of your colour choices.
Accessible Design
Consistency: Consistency in the use of colours helps to avoid visual
chaos and confusion and minimises cognitive effort. When you
establish association through colour you need to maintain that
meaning for as long as possible. Once a viewer has allocated time and
effort to learn what colours represent, that association becomes
locked down in the eye and the mind. However, if you then allocate
the same colour(s) to mean something different (within the same
graphic or on a different page/screen view) this creates an additional
cognitive burden. The viewer has almost to disregard the previous
association and learn the new one. This demands effort that
undermines the accessibility of your design.
Sometimes this can prove difficult, especially if you have a restricted
colour palette. The main advice here is to try to maximise the ‘space’
between occasions of the same colour meaning different things. This space
may be physical (different pages, interactive views), time (the simple
duration of reading between the associations being changed) or editorial
(new subject matter, new angle of analysis). Such space effectively helps
to clean the palate (pun intended). Of course, at the point of any new
assignment in your colour usage, clear explanations are mandatory.
Visual accessibility: Approximately 5% of the population have visual
impairments that compromise their ability to discern particular colours and
colour combinations. Deuteranopia is the most common form, often
known as red–green colour blindness, and is a particular genetic issue
associated with men. The traffic light scheme of green = ‘good’, red =
‘bad’ is a widespread approach for using colour as an indicator. It is a
convenient and common metaphor and the reasons for its use are entirely
understandable. However, as demonstrated in the pair of graphics in Figure
9.29, looking at some word-usage sentiment analysis, the reds and greens
that most of us would easily discern (from the left graphic) are often not at
all distinguishable for those with colour blindness (simulated on the right).
Figure 9.29 Demonstrating the Impact of Red-green Colour Blindness
(deuteranopia)
Of course, if you have a particularly known, finite and fixed audience then
you can easily discover if any colour-blindness issues do in fact exist.
However, if your audience is much larger, you risk alienating
potentially 1 person in every 14, in which case the use of the
default red–green colour combination is not acceptable. Be more sensitive
to your viewers by considering other options:
Figure 9.30 Colour-blind Friendly Alternatives to Green and Red
If you are working on an interactive solution, you may consider
having a toggle option to switch between different colour modes. For
print outputs you might normally have reduced flexibility, but in
certain circumstances the option of creating dual versions (second
output for colour-impaired viewers) may be legitimate.
Connotations and congruence: Whether it is in politics, sport,
brands or in nature, there are many subjects that already have
established colour associations you can possibly look to exploit. This
association may sit directly with the data, such as the normal colour
associations for political party categories, or more through the
meaning of the data, such as perhaps through the use of green to
present analysis about ecological topics.
In support of accessible design, exploiting pre-existing colour
associations in your work can create more immediacy in subject
recognition. You might also benefit from the colour learning
experiences your viewers may already have gone through. This
provides a shortcut to understanding through familiarity.
However, while some colour connotations can be a good thing, in
some cases they can be a bad thing and possibly should be avoided.
You need to be considerate of and sensitive to any colour usage to
ensure that you do not employ connotations that may have a negative
implication and may evoke strong emotions and reactions from
people.
Sometimes a colour is simply incongruent with a subject. You would
not use bright, happy colours if you were portraying data about death
or disease. Earlier, in the ‘Vision: Ideas’ section, I described a project
context where I knew I wanted to avoid the use of blue colours in a
particular project about psychotherapy treatment in the Arctic,
because it would carry an unwelcome clichéd association given the
subject matter. The use of ‘typical’ skin colours to represent ethnic
groups in a visualisation is something that would be immediately
clumsy (at best) and offensive (at worst).
Cultural sensitivities and inconsistencies are also important to
consider. In China, for example, red is a lucky colour and so the use
of red in their stock market displays indicates rising
values. A sea of red on the FTSE or Dow Jones implies the opposite.
In Western society red is often the signal for a warning or danger.
Occasionally established colour associations are out of sync with
contemporary culture or society. For example, when you think about
colour and the matter of gender, because it has been so endlessly
utilised down the years, it is almost impossible not to think
instinctively about the use of blue (boys) and pink (girls). My
personal preference is to avoid this association entirely. I agree with
so many commentators out there that the association of pink to
signify the female gender, in particular, is clichéd, outdated and no
longer fit for purpose. It is not too much to expect viewers to learn the
association of – at most – two new colours for representing gender.
Elegant Design
Unity: As I alluded to in the discussion about using colours for
editorial salience, colour choices are always about contrast. The effect
of using one colour is not isolated to just that instance of colour:
choosing one colour will automatically create a relationship with
another. There is always a minimum of two colours in any
visualisation – a foreground and background colour – but generally
there are many more.
We notice the impact of colour decisions more when they are done
badly. Inconsistent and poorly integrated colour combinations create
jarring and discordant results. If we do not consciously notice colour
decisions this probably means they have been seamlessly blended into
the fabric of the overall communication.
Neutral colouring: Even if there is no relevance in the use of colour
for quantitative or categorical classifications, you still have to give
your chart some colour, otherwise it will be invisible. The decision
you make will depend again on the relative harmony with other
colour features but should also avoid unnecessarily ‘using up’ a
useful colour. Suppose you colour your bars in blue but then
elsewhere across your visualisation project blue would have been a
useful colour to show something meaningful; you then have
unnecessarily taken blue out of the reckoning. My default choice is to
go with grey to begin with (Figure 9.31) and only use a colour if there
is a suitable and available colour not used elsewhere or if it needs to
be left as a back- or mid-ground artefact to preserve prominence
elsewhere in the display.
Figure 9.31 Excerpt from ‘Psychotherapy in The Arctic’
Justified: Achieving elegant design is about eliminating the arbitrary.
In thinking about colour usage I often get quite tough with myself. If I
want to show any feature on my visualisation display I have to seek
permission from myself to unlock access to the more vibrant colours
by justifying why I should be allowed to use and apply that colour (I
know what you’re thinking, ‘what a fun existence this guy leads’).
Elegance in visualisation design is often about using only the colours
you need to use and avoiding the temptation to inject unnecessary
decoration. The Wind Map project (Figure 9.32) demonstrates
unquestionable elegance and yet uses only a monochromatic palette.
There is no colouring of the sea, no topographic detail, no
emphasising of any extreme wind speed thresholds being reached.
The resulting elegance is quite evident: the map has artistic and
functional beauty.
To emphasise again, I am not advocating a need to pursue
minimalism: while you can create incredibly elegant and detailed
works from a limited palette of colours, justifying the use of colours
is not the same as unnecessarily restricting the use of colour.
Feels right: The last component of influence is yourself. Sometimes
you will just find colours that feel right and look good when you
apply them to your work. There is maybe no underlying science
behind such choices, and as such you will simply need to back your
own instinctive judgement as an astute visualiser and know when
something looks good. Creating the right type of visual appeal,
something that is pleasing to the eye and equally fit for purpose in all
the functional ways I have outlined, is a hard balance to achieve, but
you will find that weighing up all these different components of
influence alongside your own flair for design judgement will give you
the best chance of getting there.
Figure 9.32 Wind Map
Summary: Colour
Data legibility involves using colours to represent different types of data.
The most appropriate colour association or scale decisions will depend on
the data type: nominal (qualitative), ordinal (qualitative), interval and ratio
(quantitative).
Editorial salience is about using colour to direct the eye. For which
features and to what degree of emphasis do you want to create contrast?
Functional harmony concerns deciding about every other colour property
as applied to all interactive features, annotations and aspects of your
composition thinking.
Influencing Factors and Considerations
Formulating the brief: format, setting, colour rules and imposed
guidelines all have a significant impact. Your definitions about both
tone and experience, on the purpose map, will lead to specific
choices being more suitable than others. What initial ideas did you
form? Have any sources of inspiration already implanted ideas inside
your head about which colours you could use?
Working with data: what type of data and what range of
values/number of classifications have you got?
Establishing your editorial thinking: what things do you want to
emphasise or direct the eye towards (focus)?
Data representation: certain chart type choices will already include
colour as an encoded attribute.
Trustworthy design: ensure that your colour choices are faithful to the
shape of your data and the integrity of your insights. If something
looks meaningful it should be, otherwise it will confuse or deceive.
Accessible design: once you’ve committed colour to mean something
preserve the consistency of association for as long as possible. Be
aware of the sensitivities around visual accessibility and
positive/negative colour connotations.
Elegant design: the perception of colours is relative so the unity of
your choices needs to be upheld. Ensure that you can justify every dot
of colour used and, ultimately, rely on your own judgment to
determine when your final palette feels right.
Tips and Tactics
Use the squint test: shrink things down and/or half close your eyes to
see what coloured properties are most prominent and visible – are
these the right ones?
Experimentation: trial and error is still often required in colour,
despite the common sense and foundation of science attached to it.
Developing a personal style guide for colour usage saves you the pain
of having to think from scratch every time and will help your work
become more immediately identifiable (which may or may not be an
important factor).
Make life easier by ensuring your preferred (or imposed) colour
palettes are loaded up into any tool you are using, even if it is just the
tool you are using for analysis rather than for the final presentation of
your work.
If you are creating for print, make sure you do test print runs of the
draft work to see how your colours are looking – do not wait for the
first print when you (think you) have finished your process.
10 Composition
Composition concerns making careful decisions about the physical
attributes of, and relationships between, every visual property to ensure the
optimum readability and meaning of the overall, cohesive project.
Composition is the final layer of your design anatomy, but this should not
imply that it is the least important part of your design workflow. Far from
it. It is simply that now is the most logical time to think about this, because
only at this point will you have established clarity about what content to
include in your work. As I explained, this final layer of design thinking,
along with colour, is no longer about what elements will be included but
how they will appear. Composition is a critical component of any design
discipline. The care and attention afforded in the precision of your
composition thinking will continue until the final dot or pixel has been
considered.
Visual assets such as your chart(s), interactive controls and annotations all
occupy space. In this chapter you will be judging what is the best way to
use space in terms of the position, size and shape of every visible property.
In many respects these individual dimensions of thought are inseparable
and so, similar to the discussion about annotation, the thinking is
divided between project- and chart-level composition options:
Project composition: defining the layout and hierarchy of the entire
visualisation project.
Chart composition: defining the shape, size and layout choices for all
components within your charts.
10.1 Features of Composition: Project Composition
This first aspect of composition design concerns how you might lay out
and size all the visual content in your project to establish a meaningful
hierarchy and sequence. Content, in this case, means all of your charts,
interactive operations and elements of annotation.
Where will you put all of this, what size will it be and why? How will the
hierarchy (across views) and sequencing (within a view) best fit the space
you have to work in? How will you convey the relative importance and
provide a connected narrative where necessary?
I will shortly run through all the key factors that will influence your
decisions, but it is worth emphasising that so much about composition
thinking is rooted in common sense and involves a process of iteration
towards what feels like an optimum layout. Of course, there are certain
established conventions, such as the positioning of titles first or at the top
(usually left or centrally aligned). Introductions are inevitably useful to
offer early, whereas footnotes detailing data sources and credits might be
of least importance, relatively speaking. You might choose to show the
main features first, exploiting the initial attention afforded by your
audience, or you may wish to build up to this, starting off with contextual
content before the big ‘reveal’.
Figure 10.1 City of Anarchy
The hierarchy of content is not just a function of relative position through
layout design, it can also be achieved through the relative variation in size
of the contents. Just as variation in colour implies significance, so too does
variation in size: a chart that is larger than another chart will imply that the
analysis it is displaying carries greater importance.
The ‘City of anarchy’ infographic demonstrates a clear visual hierarchy
across its design. There is a primary focal point of the main subject
‘cutaway’ illustration in the centre with a small thumbnail image above it
for orientation. At the bottom there are small supplementary illustrations to
provide further information. It is clear through their relative placement at
the bottom of the page and their more diminutive stature that they are of
somewhat incidental import compared with the main detail in the centre.
There are generally two approaches for shaping your ideas about this
project-level composition activity, depending on your entry-point
perspective: wireframing and storyboarding. I profiled these at the start
of this part of the book, but it is worth reinforcing their role now you are
focusing on this section of design thinking.
Wireframing involves sketching the potential layout and size of all the
major contents of your design thinking across a single-page view.
This might be the approach you take when working on an infographic
or any digital project where all the interactive functions are contained
within a single-screen view rather than navigating users elsewhere.
Any interactive controls included would have a description within the
wireframe sketch to explain the functions they would trigger.
Figure 10.2 is an early wireframe drawn by Giorgia Lupi when
shaping up her early thoughts about the potential layout of a graphic
exploring various characteristics of Nobel prizes and laureates
between 1901 and 2012.
Figure 10.2 Wireframe Sketch
Storyboarding is something you would undertake with wireframing if
you have a project that will entail multiple pages or many different
views and you want to establish a high-level feel for the overall
architecture of content, its navigation and sequencing. This would be
an approach relevant for linear outputs like discrete sequences in
reports, presentation slides or video graphics, or for non-linear
navigation around different pages of a multi-faceted interactive. The
individual page views included as cells in this big-picture hierarchy
will each merit more detailed wireframing versions to determine how
their within-page content will be sized and arranged, and how the
navigation between views would operate.
With both wireframing and storyboarding activities all you are
working towards, at this stage, are low-fidelity sketched concepts.
Whether this sketching is on paper or using a quick layout tool does
not matter; it just needs to capture with moderate precision the
essence of your early thinking about the spatial consequence of
bringing all your design choices together. Gradually, through further
iteration, the precision and finality of your solution will emerge.
10.2 Features of Composition: Chart Composition
After establishing your thoughts about the overall layout, you will now
need to go deeper in your composition thinking and contemplate the
detailed spatial matters local to each chart, to optimise its legibility and
meaning. There are many different components to consider.
Chart size: Do not be afraid to shrink your charts. The eye can still
detect at quite small resolution and with great efficiency chart
attributes such as variation in size, position, colour, shape and pattern.
This supports the potential value of the small-multiples technique, an
approach that tends to be universally loved in data visualisation. As I
explained earlier, this technique offers an ideal solution for when you
are trying to display the same analysis for multiple categories or
multiple points in time. Providing all the information in a
simultaneous view means that viewers can efficiently observe overall
patterns as well as perform a more detailed inspection. Figure 10.3
provides a single view of a rugby team’s match patterns across the
first 12 matches of a season. Each line chart panel portrays the
cumulative scoring for the competing teams across the 80 minutes of
a match. The 12 match panels are arranged in chronological order,
from top left to bottom right, based on the date of the match.
Figure 10.3 Example of the Small Multiples Technique
The main obstacle to shrinking chart displays is the impact on text.
The eye will not cope too well with small fonts for value or category
labels, so there has to be a trade-off, as always, between the amount
of detail you show and the size you show it.
Chart scales: When considering your chart scales, try to think about
how you might use these to tell the viewer something meaningful.
This can be achieved through astute choices around the maximum
value ranges and also in the choice of suitable intervals for labelling
and gridline guides.
The maximum values that you assign to your chart scales, informed
by decisions around editorial framing, can be quite impactful in
surfacing key insights. You may recall the chart from earlier that
looked at the disproportionality of women CEOs amongst the S&P
1500 companies. Figure 10.4 is another graphic on a similar subject,
which contextualises the relative progress in the rise of women CEOs
amongst the Fortune 500 companies. By setting the maximum y-axis
value range to reflect the level at which equality would exist, the
resulting empty space emphasises the significant gap that still
persists.
Figure 10.4 Reworking of ‘The Glass Ceiling Persists’
Figure 10.5 Fast-food Purchasers Report More Demands on Their
Time
Figure 10.5 shows how the lack of careful thought about your scales
can undermine the ease of readability. This chart shows how
American adults spend their time on different activities. The analysis
is broken down into minutes and so the maximum is set at 1440
minutes in a day. For some reason, the y-axis labels and the
associated horizontal gridlines are displayed at intervals of 160
minutes. This is an entirely meaningless quantity of time so why
divide the day up into nine intervals? To help viewers perceive the
significance and size of the different stacked activities it would have
been far more logical to use 60-minute time intervals as that is how
we tend to think when dividing our daily schedule.
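The arithmetic behind this criticism is easy to check. The following is only a minimal sketch: the helper names `axis_ticks` and `as_clock` are my own illustrative inventions, not part of any charting tool, and the three-hour interval is simply one sparser alternative you might consider alongside the hourly one.

```python
def axis_ticks(maximum, interval):
    """Return tick positions from 0 up to and including `maximum`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, maximum + 1, interval))


def as_clock(minutes):
    """Format a minute count as hours:minutes, e.g. 160 -> '2:40'."""
    return f"{minutes // 60}:{minutes % 60:02d}"


MINUTES_PER_DAY = 24 * 60  # 1440, the axis maximum used in Figure 10.5

awkward = axis_ticks(MINUTES_PER_DAY, 160)       # the interval criticised above
hourly = axis_ticks(MINUTES_PER_DAY, 60)         # one gridline per hour
three_hourly = axis_ticks(MINUTES_PER_DAY, 180)  # a sparser alternative (assumption)

# Gridlines every 160 minutes fall at clock times nobody plans their day around.
print([as_clock(t) for t in awkward])
# Gridlines every 180 minutes fall on whole hours.
print([as_clock(t) for t in three_hourly])
```

Printing the two lists side by side makes the point: the 160-minute gridlines land on times such as 2:40 and 5:20, whereas intervals based on whole hours give labels that match how we naturally divide a day.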
Chart orientation: Decisions about the orientation of your chart and
its contents can sometimes help squeeze out an extra degree of
readability and meaning from your display.
Figure 10.6 Illustrating the Effect of Chart-orientation Decisions
The primary concern about chart orientation is the legibility
of labels along the axis. A vertical bar chart, with multiple categories
along the x-axis, will present a challenge of making the labels legible
and avoiding them overlapping. Ideally you would want to preserve
label reading in line with the eye, but you might need to adjust their
orientation to either 45° or 90°. My preference for handling this with
bar charts is to switch the orientation of the chart and to then have
much more compatible horizontal space to accommodate the labels.
The meaning of your subject’s data may also influence your choice.
While there may have been constraints on the dimension of space in
its native setting, Figure 10.6, portraying the split of political parties
in Germany, feels like a missed opportunity to display a political axis
of the Left and the Right through using a landscape rather than
portrait layout.
As you saw earlier, the graphic about ‘Iraq’s bloody toll’ (Figure
1.11) uses an inverted bar chart to create a potent display of data that
effectively conveys the subject matter, but importantly does so
without introducing any unnecessary obstacles in readability.
In the previous section I presented a wireframe sketch of a graphic
about Nobel prize winners. Figure 10.7 shows the final design. Notice
how the original concept of the novel diagonal orientation was
accomplished in the final composition, exploiting the greater room
that this dimension of space offers within the page. It feels quite
audacious to do this in a newspaper setting.
Figure 10.7 Nobels no Degrees
Figure 10.8 Kasich Could Be The GOP’s Moderate Backstop
Figure 10.8, from FiveThirtyEight, rotates the scatter plot by 45° and
then overlays a 2 × 2 grid which helps to guide the viewer’s
interpretation by making it easier to observe which values are located
in each quadrant. It is also used to emphasise the distinction between
location in the top and bottom halves of the chart along the axis of
popularity, essentially the primary focus of the analysis.
Although the LATCH and CHRTS acronyms share some similarities,
the application of each concerns entirely different aspects of your design
thinking. They are independent of one another. A bar chart, which
belongs to the categorical (C) family of charts, could have its data
potentially sorted by location, alphabet, time, category or hierarchy.
Chart value sorting: Sorting content within a chart is important for
helping viewers to find and compare quickly the most relevant
content. One of the best ways to consider the options for value sorting
comes from using the LATCH acronym, devised by Richard Saul
Wurman, which stands for the five ways of organising displays of
data: Location, Alphabet, Time, Category or Hierarchy.
Location sorting involves sequencing content according to the order
of a spatial dimension. This does not refer to sorting data on a map,
where locations are fixed; rather, it could be sorting data by geographical
spatial relationships (such as presenting data for all the stops along a
subway route) or a non-geographical spatial relationship (like a
sequence based on the position of major parts of the body from head
to toe). You should order by location only when you believe it offers
the most logical sequence for the readability of the display or if there
is likely to be interest or significance in the comparison of
neighbouring values. An example of location sorting is displayed in
‘On Broadway’ (Figure 10.9) on the following page, an interactive
installation that stitches together a sequenced compilation of data and
media related to 30 metre intervals of life along the 13 miles (21 km)
of Broadway that stretches across the length of Manhattan. This
continuous narrative offers compelling views of the fluctuating
characteristics as you transport yourself down the spine of the city.
Figure 10.9 On Broadway
Alphabetical sorting is a cataloguing approach that facilitates efficient
lookup and reference. Only on rare occasions, when you are
especially keen to offer convenient ordering for looking up
categorical values, will you find that alphabetical sorting alone offers
the best sequence. In Figure 10.10, investigating different measures of
waiting times in emergency rooms across the United States, the bar
charts are presented based on the alphabetical sorting of each state.
This is the default setting but users can also choose to reorder the
table hierarchically based on the increasing/decreasing values across
the four columns.
Data representation techniques that display overlapping connections,
like Sankey diagrams, slope graphs and chord diagrams, also introduce
the need to contemplate value sorting in the z-dimension: that is, which
of these connections will be above and which will be below, and why.
Alphabetical sorting might be seen as a suitably diplomatic option
should you not wish to imply any ranking significance that would be
displayed when sorting by any other dimension. Additionally, there is
a lot of sense in employing alphabetical ordering for values listed in
dropdown menus as this offers the most immediate way for viewers
to quickly find the options they are interested in selecting.
Figure 10.10 ER Wait Watcher: Which Emergency Room Will See
You the Fastest?
Time-based sorting is used when the data has a relevant chronological
sequence and you wish to display and compare how changes have
progressed over time. In Figure 10.11, you can see a snapshot of a
graphic that portrays the rain patterns in Hong Kong since 1990. Each
row of data represents a full year of 365/366 daily readings running
from left to right. The subject matter and likely interest in the
seasonality of patterns make chronological ordering a common-sense
choice.
Figure 10.11 Rain Patterns
Categorical sorting can be usefully applied to a sequence of
categories that have a logical hierarchy implied by their values or
unique to the subject matter. For example, if you were presenting
analysis about football players you might organise a chart based on
the general order of their typical positions in a team (goalkeeper >
defenders > midfielders > forwards) or use seniority levels as a way
to present analysis about staff numbers. Alternatively, if you have
ordinal data you can logically sort the values according to their
inherent hierarchy. In Figure 10.12, which you saw earlier in the profile
of ordinal colours, the columns are sequenced left to right in order
from ‘major deterioration’ to ‘major improvement’, to help reveal the
balance of treatment outcomes from a sample of psychotherapy
clients.
Figure 10.12 Excerpt from ‘Psychotherapy in The Arctic’
Hierarchical sorting organises data by increasing or decreasing
quantities so a viewer can efficiently perceive the size, distribution
and underlying ranking of values. In Figure 10.13, showing the
highest typical salaries for women in the US, based on analysis of
data from the US Bureau of Labour Statistics, the sorting arrangement
presents the values by descending quantity to reveal the
highest-ranking values.
Figure 10.13 Excerpt from ‘Gender Pay Gap US’
In Figure 10.12 the bubbles in each column do not need to be coloured
as their position already provides a visual association with the
‘deterioration’ through to ‘improvement’ ordinal categories. The
attribute of colour, specifically, can therefore be considered redundant
encoding. However, you might still choose to include this redundancy if
you believed it aided the immediacy of association and distinction. In
this case, the chart was part of a larger graphic that employed the same
colour associations across several different charts and therefore it made
sense to preserve this association.
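To make the five LATCH options concrete, here is a minimal sketch that sorts one small set of records five different ways. Everything about the data is hypothetical: the stops, their positions along an imaginary transit line, the opening years, the zone labels and the rider counts are all invented for illustration, as is the `ZONE_ORDER` lookup that gives the zones their logical sequence.

```python
from operator import itemgetter

# Hypothetical records: four stops along an imaginary transit line.
stops = [
    {"name": "Museum",  "position": 3, "opened": 1975, "zone": "core",  "riders": 2600},
    {"name": "Central", "position": 1, "opened": 1982, "zone": "inner", "riders": 6100},
    {"name": "Harbour", "position": 4, "opened": 1998, "zone": "outer", "riders": 4200},
    {"name": "Arcade",  "position": 2, "opened": 2006, "zone": "inner", "riders": 9100},
]

# Categories have no natural sort order of their own, so their logical
# sequence has to be stated explicitly (an assumption of this example).
ZONE_ORDER = {"core": 0, "inner": 1, "outer": 2}


def names(rows):
    """Return just the stop names, in the order given."""
    return [row["name"] for row in rows]


by_location = sorted(stops, key=itemgetter("position"))          # L: order along the line
by_alphabet = sorted(stops, key=itemgetter("name"))              # A: lookup order
by_time = sorted(stops, key=itemgetter("opened"))                # T: chronological
by_category = sorted(stops, key=lambda r: ZONE_ORDER[r["zone"]])  # C: logical category order
by_hierarchy = sorted(stops, key=itemgetter("riders"), reverse=True)  # H: largest first

for label, rows in [
    ("Location", by_location),
    ("Alphabet", by_alphabet),
    ("Time", by_time),
    ("Category", by_category),
    ("Hierarchy", by_hierarchy),
]:
    print(f"{label:<10}{names(rows)}")
```

The same four records produce a different sequence under almost every option, which is the point: the sorting choice is an editorial decision about which comparison you want to make easiest, not a property of the data.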
10.3 Influencing Factors and Considerations
You are now familiar with the array of various aspects of composition
thinking. At this point you will need to weigh up your decisions on how
you might employ these in your own work. Here are some of the specific
factors to bear in mind.
Formulating Your Brief
Format: Naturally, as composition is about spatial arrangement, the
nature and dimensions of the canvas you have to work with will have
a fundamental bearing on the decisions you make. There are two
concerns here: what will be the shape and size of the primary format
and how transferable will your solution be across the different
platforms on which it might be used or consumed?
Another factor surrounding format concerns the mobility of viewing
the work. If the form of your output enables viewers to easily move a
display or move around a display in a circular plane (such as looking
at a printout or work on a tablet) this means that issues such as label
orientation can be largely cast aside. If your output is going to be
consumed in a relatively fixed setting (desktop/laptop or via a
presentation) the flexibility of viewing positions will be restricted.
Working With Data
Data examination: Not surprisingly, the shape and size of your data
will directly influence your chart composition decisions. When
discussing physical properties in Chapter 4, I described the influence
of quantitative values with legitimate outliers distorting ideal scale
choices. One solution for dealing with this is to use a non-linear
logarithmic (often just known as a ‘log’) scale. Essentially, each
major interval along a log scale increases the value at that marked
position by a factor of 10 (or by one order of magnitude) rather than
by equal increments. In Figure 10.14, looking at ratings for thousands
of different board games, the x-axis is presented on a log scale in
order to accommodate the wide range of values for the ‘Number of
ratings’ measure and to help fit the analysis into a square-chart layout.
Had the x-axis remained on a linear scale, preserving a square layout
would have meant squashing values below 1000 into such a tightly
packed space that you would hardly see the patterns. Alternatively, a
wide rectangular chart would have been necessary but impractical
given the limitations of the space this chart would occupy.
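The squashing effect can be sketched numerically. This is only an illustration under assumed numbers: the axis range of 10 to 100,000 ratings and the 400-pixel width are hypothetical and are not taken from Figure 10.14, and the two helper functions are my own names rather than those of any charting library.

```python
import math


def linear_position(value, vmin, vmax, width):
    """Distance along an axis of `width` units on a linear scale."""
    return (value - vmin) / (vmax - vmin) * width


def log_position(value, vmin, vmax, width):
    """Distance along an axis of `width` units on a base-10 log scale."""
    if value <= 0 or vmin <= 0 or vmax <= vmin:
        raise ValueError("a log scale needs positive values and vmax > vmin")
    span = math.log10(vmax) - math.log10(vmin)
    return (math.log10(value) - math.log10(vmin)) / span * width


# Hypothetical axis: 10 to 100,000 ratings drawn across 400 pixels.
VMIN, VMAX, WIDTH = 10, 100_000, 400

for ratings in (10, 100, 1_000, 10_000, 100_000):
    lin = linear_position(ratings, VMIN, VMAX, WIDTH)
    log = log_position(ratings, VMIN, VMAX, WIDTH)
    print(f"{ratings:>7} ratings  linear: {lin:6.1f}px  log: {log:6.1f}px")
```

With these assumed numbers, every value up to 1000 is crammed into the first four or so pixels of the linear axis, whereas on the log axis each tenfold increase occupies the same 100 pixels.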
I have great sympathy for the challenges faced by designers like
Zimbabwe-based Graham van de Ruit, when working on typesetting a
book titled Millions, Billions, Trillions: Letters from Zimbabwe,
2005−2009 in 2014. The book was all text, apart from one or two
tables. One of the tables of data supplied to Graham showed
Zimbabwe’s historical monthly inflation rates, which, as you can see
(Figure 10.15), included some incredibly diverse values.
I love the subtle audacity of Graham’s solution. Even though it is
presented in tabular form there is a strong visual impact created by
allowing the sheer spatial consequence of the exceptional mid-2008
numbers to cause the awkward widening of the final column. I think
this makes the point much more effectively than a chart might, in this
case.
Figure 10.14 The Worst Board Games Ever Invented
Figure 10.15 From Millions, Billions, Trillions: Letters from Zimbabwe,
2005−2009
‘I thought that a graph might be more effective, but I quickly realised
that the scale would be a big challenge… The whole point of graphing
would have been to show the huge leap in 2008, something that I felt
the log scale would detract from and was impractical with the space
constraints. I also felt that a log scale might not be intuitive to the target
audience.’ Graham van de Ruit, Editorial and Information Designer
Establishing Your Editorial Thinking
Angles: The greater the number of different angles of analysis you
wish to cover in your work, the greater the challenge will be to
seamlessly accommodate the resulting chart displays in one view. The
more content you include, the greater the need to contemplate
reductions in the size of charts or a non-simultaneous arrangement,
perhaps through multi-page sequences with interactive navigation.
In defining your editorial perspectives, you will have likely
established some sense of hierarchy that might inform which angles
should be more prominent (regarding layout position and size) and
which less so. There might also be some inherent narrative binding
each slice of analysis that lends itself to being presented in a
deliberate sequence.
Data Representation
Chart type choice: Different charts have different spatial
consequences. A treemap generally occupies far more space than a
pie chart simply because there are many more ‘parts’ being shown. A
polar chart is circular in shape, whereas a waffle chart is squared.
With each chart you include you will have a uniquely shaped piece
that will form part of the overall jigsaw puzzle. Inevitably there will
be some shuffling of content to find the right size and placement
balance.
The table in Figure 10.16 summarises the main chart structures and
the typical shapes they occupy. This list is based only on the charts
included in the Chapter 6 gallery but still offers a reasonable
compilation of the main structures. These are ordered in descending
frequency as per the distribution of the different structures of charts in
the gallery.
Figure 10.16 List of chart structures
Trustworthy Design
Chart-scale optimisation: Decisions about chart scales concern the
maximum, minimum and interval choices that ensure integrity
through the representation as well as optimise readability.
Firstly, let’s look at decisions around minimum values used on the
quantitative value axis, known as the origin, and the reasons why it is
not OK for you to truncate the axis in methods like the bar chart. Any
data representation where the attribute of size is used to encode a
quantitative value needs to show the full, true size, nothing more and
nothing less. The origin needs to be zero. When you truncate a bar
chart’s quantitative value axis you distort the perceived length or
height of the bar. Visualisers are often tempted to crop axis scales
when values are large and the differences between categories are
small. However, as you can see in Figure 10.17, the consequence is
that it creates the impression of highly noticeable relative difference
between values when the absolute values do not support this.
Figure 10.17 Illustrating the Effect of Truncated Bar Axis Scales
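The distortion can be quantified with a little arithmetic. In this minimal sketch the two values (98 and 95) and the truncated origin of 90 are hypothetical numbers chosen for illustration, not those plotted in Figure 10.17, and `apparent_ratio` is my own helper name.

```python
def apparent_ratio(a, b, origin=0.0):
    """Ratio of the drawn bar lengths for values a and b when the
    quantitative axis starts at `origin` instead of zero."""
    if b <= origin:
        raise ValueError("value b must be greater than the axis origin")
    return (a - origin) / (b - origin)


# Hypothetical values: large numbers with a small difference between them.
value_a, value_b = 98, 95

true_ratio = apparent_ratio(value_a, value_b, origin=0)        # what the data says
truncated_ratio = apparent_ratio(value_a, value_b, origin=90)  # what the bars show

print(f"Zero origin:      bar A is {true_ratio:.2f}x the length of bar B")
print(f"Origin set at 90: bar A is {truncated_ratio:.2f}x the length of bar B")
```

Under these assumptions a genuine difference of roughly 3% is drawn as a bar that is 60% longer, which is exactly the kind of exaggerated relative difference that the absolute values do not support.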
The single instance in which it is remotely reasonable to truncate an
axis would be if you had a main graphic which effectively offered a
thumbnail view of the whole chart for orientation positioned
alongside a separate associated chart (similar to that on the right).
This separate chart might have a truncated axis that would provide a
magnified view of the main chart, showing just the tips of the bar, to
help viewers see the differences close up.
In contrast to the bar chart, a line chart does not necessarily need
always to have a zero origin for the value axis (normally the y-axis).
A line chart’s encoding involves a series of connected lines (marks)
joining up continuous values based on their absolute position along a
scale (attribute). It therefore does not encode quantitative values
through size, like the bar chart does, so the truncation of a value axis
will not unduly impact on perceiving the relative values against the
scale and the general trajectory. For some data contexts the notion of
a zero quantity might be impossible to achieve. In Figure 10.18,
showing 100m sprint record times, no human is ever going to be able
to run 100m in anywhere near zero seconds. Times have improved, of
course, but there is a physical limit to what can be achieved. To show
this analysis with the y-axis starting from zero would be unnecessary
and even more so if you plotted similar analysis for longer distance
races.
However, if you were to plot the 100m results and the 400m results
on the same chart, you would need to start from zero to enable
orientation of the scale of comparable values. This sense of
comparable scale is missing from Figure 10.19, where including
the full quantitative value range down to zero would be necessary to
perceive the relative scale of attitudes towards same-sex marriage.
The chart’s y-axis appears to start from an origin of 20 but as we are
looking at part-to-whole analysis, the y-axis should really be
displayed from an origin of zero. The maximum doesn’t need to go
up to 100% (the highest observed value is fine in this case), but it
could be interesting to set the maximum range to 100% in order to
create a similar sense of the gap to be bridged before 100% of
respondents are in agreement.
Figure 10.18 Excerpt from ‘Doping under the Microscope’
Figure 10.19 Record-high 60% of Americans Support Same-sex
Marriage
Aspect ratios: The aspect ratio of a line chart, as derived from the
height and width dimensions of the chart area, can have a large
impact on the perceived trends presented. If the chart is too narrow,
the steepness of connections will be embellished and look more
significant; if the chart is stretched out too wide, the steepness of
slopes will be much more dampened and key trends may be
somewhat disguised. There is no absolutely right or wrong approach
here but clearly there is a need for sensitivity to avoid the possibility
of unintended deception. A general rule of thumb is to seek a chart
area that enables the average slope to be presented at 45°, though this
is not something that can be easily and practically applied, especially
as there are many other variables at play, such as the range of
quantitative and time values and the scales being used. My advice is
just to make a pragmatic judgement by eye to find the ratio that you
think is faithful to the significance of the trends in your data.
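If you want a starting point before judging by eye, the 45° rule of thumb can be roughly approximated in code. The sketch below is one possible reading of that rule, not a definitive method: it uses the median absolute slope of the line segments as a stand-in for the 'average slope', and the function name bank_to_45 and the sample series are hypothetical.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Return a height/width ratio that draws the median absolute slope at 45 degrees."""
    points = list(zip(xs, ys))
    if len(points) < 2:
        raise ValueError("need at least two points")
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    if x_range == 0 or y_range == 0:
        raise ValueError("data must vary on both axes")

    # Slope of each segment once both axes are rescaled to a unit square.
    slopes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx = (x1 - x0) / x_range
        dy = (y1 - y0) / y_range
        if dx != 0:
            slopes.append(abs(dy / dx))

    typical = median(slopes)
    if typical == 0:
        raise ValueError("median slope is zero; the rule gives no usable ratio")
    # On a chart of height h and width w the drawn slope is s * h / w,
    # so requiring typical * h / w == 1 gives h / w == 1 / typical.
    return 1.0 / typical


# Hypothetical yearly series, for illustration only.
years = [2000, 2001, 2002, 2003, 2004]
values = [10, 12, 11, 15, 14]
ratio = bank_to_45(years, values)
print(f"suggested height/width ratio: {ratio:.2f}")
```

Treat the result as a first suggestion only: the practical limits noted above still apply, and the final ratio remains a judgement call.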
Mapping projections: One of the most contentious matters in the
visual representation of data relates to thematic mapping and
specifically to the choice of map projection used. The Earth is not flat
(hopefully no contention there, otherwise this discussion is rather
academic), yet the dominant form through which maps are presented
portrays the Earth as being just that. Features such as size, shape and
distance can be measured accurately on Earth but when projected on a
flat surface a compromise has to occur. Only some of these qualities
can be preserved and represented accurately.
I qualify this with ‘dominant’ because, increasingly, advances in
technology (such as WebGL) mean we can now interact with spherical
portrayals of the Earth within a 2D space.
There are lots of exceptionally complicated calculations attached to
the variety of spatial projections. The main things you need to know
about projection mapping are that:
every type of map projection has some sort of distortion;
the larger the area of the Earth portrayed as a flat map, the
greater the distortion;
there is no single right answer – it is often about choosing the
least-worst case.
Thematic mapping (as opposed to mapping spatially for navigation or
reference purposes) is generally best portrayed using mapping projections
based on ‘equal-area’ calculations (so the sacrifice is more on the shape,
not the size). This ensures that the phenomena per unit – the values you are
typically plotting – are correctly represented by proportion of regional
area. For choosing the best specific projection, in the absence of perfect,
damage limitation is often the key: that is, which choice will distort the
spatial truth the least given the level of mapping required. There are so
many variables at play, however, based on the scope of view (world,
continent, or country/sub-region), the potential distance from the equator
of your region of focus and whether you are focusing on land, sea or sky
(atmosphere), to name but a few. As with many other topics in this field, a
discussion about mapping projections requires a dedicated text but let me
at least offer a brief outline of five different projections to begin your
acquaintance:
Many tools that offer rudimentary mapping options will tend to only
come with a default (non-adjustable) projection, often the Mercator (or
Web Mercator). The more advanced geospatial analysis tools will offer
pre-loaded or add-in options to broaden and customise the range of
projections. Hopefully, in time, an increasing range of the more
pragmatic desktop tools will enhance projection customisations.
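As a rough illustration of why the default Mercator is a poor fit for thematic work, the following Python sketch compares local area distortion under the Mercator with that of a simple cylindrical equal-area projection (Lambert's). It assumes a spherical Earth, and the function names are hypothetical. On the spherical Mercator the local linear scale is 1/cos(latitude) in every direction, so areas inflate by its square; the equal-area projection stretches east-west by the same factor but compresses north-south to compensate.

```python
import math


def mercator_area_scale(latitude_deg):
    """Local area inflation of the spherical Mercator relative to the equator."""
    if not -90 < latitude_deg < 90:
        raise ValueError("Mercator is undefined at the poles")
    k = 1.0 / math.cos(math.radians(latitude_deg))  # linear scale factor
    return k * k


def equal_area_cylindrical_scales(latitude_deg):
    """(east-west stretch, north-south stretch) for Lambert's cylindrical equal-area."""
    if not -90 < latitude_deg < 90:
        raise ValueError("stretch factors are undefined at the poles")
    c = math.cos(math.radians(latitude_deg))
    return 1.0 / c, c


for lat in (0, 30, 45, 60, 75):
    ew, ns = equal_area_cylindrical_scales(lat)
    print(
        f"lat {lat:>2}: Mercator area x{mercator_area_scale(lat):.2f}, "
        f"equal-area stretch {ew:.2f} by {ns:.2f} (product {ew * ns:.2f})"
    )
```

The equal-area product stays at 1 at every latitude, so regional areas keep their true proportions, while shape is sacrificed as the two stretch factors diverge.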
Figure 10.20 A Selection of Commonly Deployed Mapping Projections
Accessible Design
Good design is unobtrusive: One of the main obstructions to
facilitating understanding through a visualisation design is when
viewers are required to rely on their memory to perform comparisons
between non-simultaneous views.
When the composition layout requires viewers to flick between pages
or interactively generated views, they have to try to store one view in
their mind and then mentally compare that against the live view that
has arrived on the screen. This is too hard and too likely to fail given
the relatively weak performance of the brain’s working memory.
Content that warrants direct comparison should be enabled through
proximity to and alignment with related items. I mentioned in the
section on animation that if you want to compare different states over
time, rather than see the connected system of change, you will need to
have access to the ‘moment’ views simultaneously and without a
reliance on memory.
‘Using our eyes to switch between different views that are visible
simultaneously has much lower cognitive load than consulting our
memory to compare a current view with what was seen before.’
Tamara Munzner, taken from Visualization Analysis and Design
Elegant Design
‘I’m obsessed with alignments. Sloppy label placement on final files
causes my confidence in the designer to flag. What other details haven’t
been given full attention? Has the data been handled sloppily as well?
… On the flip side, clean, layered and logically built final files are a
thing of beauty and my confidence in the designer, and their attention to
detail, soars.’ Jen Christiansen, Graphics Editor at Scientific
American
Unity: As I discussed with colour, composition decisions are always
relative: an object’s place and its space occupied within a display
immediately create a relationship with everything else in the display.
Unity in composition provides a similar sense of harmony and
balance between all objects on show as was sought with colour. The
flow of content should feel logical and meaningful.
The enduring idea that elegance in design is most appreciated when it
is absent is just as relevant with composition. Look around and open
your eyes to composition that works and does not work, and
recognise the solutions that felt effortless as you read them and those
that felt punctured and confusing. This is again quite an elusive
concept and one that only comes with a mixture of common-sense
judgement, experience and exposure to inspiration from elsewhere.
Thoroughness: Precision positioning is the demonstration of
thoroughness and care that is so important in the pursuit of elegance.
You should aim to achieve pixel-perfect accuracy in the position and
size of every single property.
Think of the importance of absolute positioning in the context of
detailed architectural plans that outline the position of every fine
detail down to power sockets, door handles and the arc of a window’s
opening manoeuvre. A data visualiser has to commit to ultimate
precision and consistency because any shortcomings will be
immediately noticeable and will fundamentally impact on the
function of the work. If you do not feel a warm glow from every
emphatic snap-to-grid resize operation or upon seeing the results of a
mass alignment of page objects, you are not doing it right. (Honestly,
I am loads of fun to be around.)
Summary: Composition
Project composition defines the layout and hierarchy of the entire
visualisation project and may include the following features:
Visual hierarchy – layout: how to arrange the position of elements?
Visual hierarchy – size: how to manage the hierarchy of element
sizes?
Absolute positioning: where specifically should certain elements be
placed?
Chart composition defines the shape, size and layout choices for all
components within your charts and may include the following features:
Chart size: don’t be afraid to shrink charts, so long as any labels are
still readable, and especially embrace the power of small multiples.
Chart scales: what is the most meaningful range of values given the
nature of the data?
Chart orientation: which way is best?
Chart value sorting: consider the most meaningful sorting
arrangement for your data and editorial focus, based on the LATCH
acronym.
Influencing Factors and Considerations
Formulating the brief: what space have you got to work within?
Working with data: what is the shape and size of your data and how
might this affect your chart design architecture?
Establishing your editorial thinking: how many different angles
(charts) might you need to include? Is there any specific focus for
these angles that might influence a sequence or hierarchy between
them?
Data representation: any chart has a spatial consequence – different
charts have different structures that will create different dimensions
that will need to be accommodated.
Trustworthy design: the integrity and meaning of your chart scale,
chart dimensions, and (for mapping) your projection choices are
paramount.
Accessible design: remember that good design is unobtrusive – if you
want to facilitate comparisons between different chart displays these
ideally need to be presented within a simultaneous view.
Elegant design: unity of arrangement is another of the finger-tip sense
judgements but will be something achieved by careful thinking about
the relationships between all components of your work.
Tips and Tactics
You will find that as you reach the latter stages of your design
process, the task of nudging things by fractions of a pixel and
realigning features will dominate your attention. As energy and
attention start to diminish you will need to maintain a commitment to
thoroughness and a pride in precision right through to the end!
Empty space is like punctuation in visual language: use it to break up
content when it needs that momentary pause, just as a comma or
full stop is needed in a sentence. Do not be afraid to use empty space
more extensively across larger regions as a device to create impact.
Like the notes not played in jazz, effective visualisation design can
also be about the relationship between something and nothing.
Part C Developing Your Design Solution
6 Data Representation
7 Interactivity
8 Annotation
9 Colour
10 Composition