So, let’s move on to the next point for making good visualizations, which is context. And we hear “context” a lot in software development. What is it from a visualization standpoint?
Data can be misleading, even if it’s completely accurate, if it isn’t presented with some sort of context. The example I gave here was two data points from the 1950s showing traffic deaths in Connecticut in 1955 and in 1956. And there’s a notation here saying that speed limits were more strictly enforced during this period. Well, the numbers go down significantly, from 325 to I think 277 or so. You can see a diagonal line going down.
So, you would think just from those two points that more strictly enforced speed limits saved lives, but we really don’t know if that’s true. What we would really like to know is what happened before 1955. Were traffic deaths already going down? Was there some other factor in play?
What happened after 1956? Did traffic deaths stay low or did they spike right back up? Was it just an anomaly? And so, to give this better context, you really need to provide some data around it. What happened before? What happened after? Maybe compare that data to other similar states in that region. Find out what happened during that period in neighboring states such as New York and Rhode Island, states that didn’t have that stricter enforcement of speed limits. Did their traffic deaths go down?
And it turns out that the stricter enforcement of speed limits did save lives. Traffic deaths had been going up in Connecticut beforehand, and they were going up all across New England. They went down in Connecticut after the stricter enforcement, and they did not go down in the other states. When the data is presented with that context, it becomes a whole lot more meaningful.
And that’s something you’d definitely apply to the reports you’re generating from your line of business applications. So, the key points to take away from this are that you want to make sure the representation of your data is proportional to the data itself, and that if the data is one-dimensional, then the graph itself should be one-dimensional.
You want to provide context for your data. And I didn’t really mention this before, but if you’re dealing with dollar amounts over time you should always use real dollars. Real dollars, of course, are adjusted for inflation, as opposed to nominal dollars. Nominal dollars are just the dollars you actually paid out at that time. The reason this matters is that $1 today is worth much less than $1 was in 1970. Inflation takes its toll on that. And so, when presenting monetary information over time you need to adjust for that. It’s just more realistic.
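To make the real-versus-nominal point concrete, here is a minimal sketch of the usual adjustment: multiply each nominal amount by the ratio of the base-year price index to that year’s price index. The function name `to_real_dollars`, the index values, and the sales figures are all made up for illustration; in practice you would substitute a published price index.

```python
def to_real_dollars(nominal, index_then, index_base):
    """Convert a nominal amount into base-year dollars using a price index."""
    if index_then <= 0 or index_base <= 0:
        raise ValueError("price index values must be positive")
    return nominal * index_base / index_then


# Hypothetical price index and nominal sales figures (illustrative only).
price_index = {2000: 100.0, 2010: 125.0, 2020: 150.0}
nominal_sales = {2000: 1000.0, 2010: 1250.0, 2020: 1500.0}

base_year = 2020
real_sales = {
    year: to_real_dollars(amount, price_index[year], price_index[base_year])
    for year, amount in nominal_sales.items()
}

for year in sorted(real_sales):
    print(year, nominal_sales[year], real_sales[year])
```

With these made-up numbers the nominal series appears to grow by 50 percent, while the real series is flat, which is exactly the kind of misleading impression that adjusting for inflation is meant to avoid.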
The next point that we’re going to talk about is data-ink. And just to be clear, this is “ink” as in a pen, because if I just say “data ink” out loud, people could get a different idea. What does that mean from a visualization standpoint with Tufte?
This is maybe the single biggest point that I took away from this book. The word “ink” here refers to the dark parts of your charts and graphs. It comes from the days when we used to print everything. Nowadays you probably don’t print much anymore, but we still use the word “ink” to represent the dark parts of the screen. So, there’s “ink,” which is all the dark parts, and then there’s “data-ink,” which is just the ink that represents the data.
And Tufte actually has a data-ink ratio, and that’s the data-ink divided by the total ink. That’s the amount of ink that actually represents data in your graph, as a share of all the ink in the graph, including the ink devoted to non-data stuff. The goal should be to get that number as high as possible, within reason, and you do that by eliminating things that are unnecessary, things that don’t represent the data.
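As a rough illustration of that ratio, here is a small sketch that treats “ink” as a count of dark pixels. The function name `data_ink_ratio` and the pixel counts are hypothetical; measuring ink on a real chart would take more work, so this only shows the arithmetic.

```python
def data_ink_ratio(data_ink, total_ink):
    """Return the share of a chart's ink that actually represents data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must be between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical dark-pixel counts for the same chart before and after cleanup.
# The data-ink stays the same; only the non-data ink (borders, heavy
# gridlines, shading) is removed.
before = data_ink_ratio(data_ink=1200, total_ink=4800)
after = data_ink_ratio(data_ink=1200, total_ink=1600)

print(before, after)
```

In this made-up case the data itself has not changed at all; the ratio goes from 0.25 to 0.75 purely by erasing ink that was not carrying any data.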
One example is a bar chart. Just think about one single bar in a bar chart. That bar represents one number, and yet the bar has two vertical lines and a line across the top, and sometimes they’ll print the number on top of that. It might be shaded, too. There’s a lot of ink in that bar, and it only represents one number. That data-ink ratio is very low.
It’s a tiny amount of data and a lot of ink, and for a three-dimensional bar the ratio is lower still. So, these are just things to be conscious of: all of this extra ink is distracting the viewer. If you only want to communicate this one piece of data, you don’t need all these extra lines to do that.