Telling the Truth with Your Data

This is the continuation of the transcript of a webinar hosted by InetSoft in January 2018 on the topic of "Data Visualization How To Techniques." The speaker is Abhishek Gupta, product manager at InetSoft.

The next point is about telling the truth with your data, which is called graphical integrity. As a viewer and a consumer of data and graphics, there’s nothing more annoying than being lied to by a report, a chart, or a graph, and yet this happens all the time.

There are a few different ways that people will lie with their graphs. One way is that they will misrepresent the data that they’re showing. They’ll misrepresent changes in the data. Very often they’ll use the length of a line or the size of an object to represent the relative growth of something. So, you’ll see lines getting longer and longer over time to indicate that a number gets larger and larger over time.

But unless that object grows in the same proportion as the number, the object is lying to you. For example, here’s a graph that shows the mandated fuel economy for automobiles in the late 1970s and early 1980s. The government mandated that average fuel economy be 18 miles per gallon in 1978, and the standard rose every year until it reached 27.5 miles per gallon in 1985.

So over that period, mandated fuel economy went up by about 50% (18 to 27.5 is closer to 53%). But the graph represents those numbers with lines, and the lines grow to roughly nine times their starting length. So a roughly 50% increase in the data is shown by about an 800% increase in the length of the lines that represent it.


The Graph Is Lying

The graph is totally lying to us, and Edward Tufte actually came up with a formula to measure this. He called it the lie factor: the size of the effect shown in the graphic divided by the size of the effect in the data. So, if you’ve got an 800% increase in the size of your graphic to represent only a 50% increase in your data, then your lie factor is 800% divided by 50%, which is 16.

So, you’re lying quite a bit. And this is the picture we’re talking about here; it’s from the New York Times. These are credible news sources doing this. Sometimes they do it deliberately, but more often than not they do it because they want the chart to look pretty. They think making it look pretty is more important than showing the data truthfully.
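Because the lie factor is just a ratio of two percentage changes, it can be sketched in a few lines of Python. The helper names `pct_change` and `lie_factor` are hypothetical, and the line lengths (1.0 and 9.0 units) are placeholder values chosen to match the speaker’s “about 800%” figure, not measurements of the newspaper graphic.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new as a fraction (0.5 means +50%)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old


def lie_factor(data_old: float, data_new: float,
               graphic_old: float, graphic_new: float) -> float:
    """Tufte's lie factor: size of effect in the graphic / size of effect in the data."""
    return pct_change(graphic_old, graphic_new) / pct_change(data_old, data_new)


# Mandated fuel economy: 18 mpg (1978) rising to 27.5 mpg (1985).
# Hypothetical line lengths chosen to reproduce the "about 800%" growth.
data_effect = pct_change(18.0, 27.5)      # about 0.53, i.e. +53%
graphic_effect = pct_change(1.0, 9.0)     # 8.0, i.e. +800%
print(f"data +{data_effect:.0%}, graphic +{graphic_effect:.0%}, "
      f"lie factor {lie_factor(18.0, 27.5, 1.0, 9.0):.1f}")
```

With the exact data change (a 53% increase rather than a rounded 50%), the ratio comes out near 15 instead of 16; a faithful graphic would score 1.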

How do you show data that is one-dimensional? How do people show it, and do they really show it fairly? This is a really common problem, because data points tend to be one-dimensional. I’ve got a graph I’m looking at here. It’s the percentage of doctors devoted to family practice in different years.

In 1964 it was 27%, in 1975 it was 16%, and in 1990 it was 12%. Those are three numbers, single points. Each number represents just a single dimension of data. But the graph I’m looking at represents each one of those points with a picture of a doctor, a nice little drawing of a doctor holding a clipboard.

And the problem is that these pictures are two-dimensional. The height of each doctor is proportional to those numbers: 27, 16, and 12. But the picture doesn’t just have height. It also has width, and as you know, the area of an object is proportional to both its height and its width.


And so, if you present one-dimensional data with a two-dimensional object, the object’s size gets exaggerated as it grows, because the width and the height are both growing. In this case, between 12% and 27%, the value a little more than doubled, an increase of roughly 100% (125%, to be exact).

But even though the tallest doctor is only about twice as high as the shortest, it’s about four times the size, because when you double the height you also double the width. Four times the area is a 300% increase, so a roughly 100% increase in the data is shown by a 300% increase in size: a lie factor of about three. That’s a really common problem.
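The doctor example can be checked the same way. This Python sketch assumes that each icon is scaled uniformly (width grows with height) and that viewers judge the icons by area; `icon_area_ratio` and `icon_lie_factor` are hypothetical names used only for illustration.

```python
def icon_area_ratio(height_ratio: float) -> float:
    """Area ratio of an icon scaled uniformly, so width grows with height."""
    # Both dimensions scale by the same factor, so area scales with its square.
    return height_ratio ** 2


def icon_lie_factor(old_value: float, new_value: float) -> float:
    """Lie factor when icon height tracks the value but the eye reads area."""
    data_effect = (new_value - old_value) / old_value
    area_effect = icon_area_ratio(new_value / old_value) - 1
    return area_effect / data_effect


# Share of doctors in family practice: 12% (1990) versus 27% (1964).
print(f"{icon_area_ratio(27 / 12):.2f}x the area")
print(f"lie factor {icon_lie_factor(12, 27):.2f}")
```

For a growth ratio r, the area grows by r^2 - 1 while the data grows by r - 1, so the lie factor simplifies to r + 1. The exact 12% and 27% values give 3.25 and a clean doubling gives 3, which means the exaggeration itself grows as the real change gets larger.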

What Are Some Other Ways a Chart Can Be Misleading?

Charts and graphs are powerful tools for visualizing data, but they can be manipulated—intentionally or unintentionally—to mislead or distort the interpretation of that data. Here are several ways in which a chart can be misleading:

1. Omitting the Baseline (Truncated Y-Axis)

  • How It Misleads: When a chart doesn't start the Y-axis at zero, it can exaggerate differences between data points, making small changes seem more significant.
  • Example: A bar chart that starts at 50 instead of zero could make a 3% increase appear like a huge jump when it's actually quite minor.

2. Inconsistent Scale

  • How It Misleads: Using inconsistent or irregular intervals on the X or Y axes distorts the proportional relationships between data points. This makes it difficult to compare data accurately.
  • Example: A line chart that uses unequal time intervals (e.g., monthly data for part of the chart and yearly data for another part) can skew trends and suggest a more dramatic change over time than what actually exists.

3. Selective Data Representation

  • How It Misleads: Presenting only a selective portion of data, or omitting key data points, can give a biased view of a trend or comparison.
  • Example: Showing data from only one good quarter of sales while omitting previous poor performance misleads the viewer into thinking that the company's sales are trending upwards when they might not be.

4. Improper Use of Logarithmic Scales

  • How It Misleads: Using a logarithmic scale where it is not appropriate can confuse viewers, because it compresses large values. This can hide large variations between data points.
  • Example: Displaying financial growth or election results on a log scale can make differences between percentages or quantities appear less significant than they are.

5. Exaggerating Proportions in Pie Charts

  • How It Misleads: Pie charts that don't add up to 100% (because of data omission or overlapping categories) or use disproportionate segment sizes can be misleading.
  • Example: If one slice of a pie chart is exaggerated in size through perspective or 3D effects, it can make that segment appear disproportionately large, even though its value may not be as significant.

6. Cherry-Picking the Time Frame

  • How It Misleads: Choosing a specific time frame that highlights a trend or outlier while ignoring the broader context can mislead viewers about overall performance or patterns.
  • Example: A stock price chart showing only a brief period of high growth might omit the subsequent crash, giving the false impression of sustained success.

7. Overuse of 3D Graphs

  • How It Misleads: 3D charts can distort the perception of data by making it harder to judge the size of data points or bars accurately. The added depth can create confusion about actual proportions.
  • Example: A 3D bar chart may make bars in the foreground appear larger than those in the background, even if their values are equal.

8. Manipulating Aspect Ratio

  • How It Misleads: Changing the aspect ratio (the ratio of a chart's width to its height) can distort the appearance of trends, making them appear steeper or flatter than they are.
  • Example: Stretching a line chart horizontally or vertically can either exaggerate or downplay fluctuations in the data, misleading the viewer's interpretation of trends.

9. Misleading Cumulative Charts

  • How It Misleads: Cumulative charts, where data builds up over time, can give the impression of continuous growth even if the rate of growth is slowing or stagnating.
  • Example: A cumulative sales chart might show a steady upward line even if sales are flatlining in the most recent period, giving a false sense of ongoing progress.

10. Using Unlabeled or Confusing Axes

  • How It Misleads: If a chart's axes are not labeled clearly or are mislabeled, viewers can misinterpret the data being presented, leading to incorrect conclusions.
  • Example: A line graph without a time axis label could confuse viewers as to whether they are looking at daily, monthly, or yearly trends.

11. Correlation vs. Causation

  • How It Misleads: Charts can present two variables that appear correlated without demonstrating a causal relationship. This can lead viewers to incorrectly assume that one variable directly influences the other.
  • Example: A line chart showing a correlation between ice cream sales and crime rates during the summer could imply that ice cream causes crime, when in fact both rise because of warmer weather.

12. Misleading Use of Colors

  • How It Misleads: Using colors in a way that conveys meaning (e.g., red for bad, green for good) can manipulate how viewers interpret the data. Subtle color changes can also obscure differences.
  • Example: A heatmap where similar data points are colored in shades that are too close together can make large differences appear smaller than they are.

13. Using Irrelevant Comparisons

  • How It Misleads: Comparing unrelated data sets can create false associations between data points, misleading viewers about trends or relationships.
  • Example: A bar chart that places revenue for each division of a company next to the number of employees in the same divisions, without showing any meaningful connection between the two, invites viewers to read a relationship into the pairing.

14. Distorting Proportions in Area Charts

  • How It Misleads: Area charts can mislead if the areas being compared do not scale linearly with the data. This can make certain data points seem more significant than they are.
  • Example: If the area of one segment of a chart is exaggerated relative to another, it can make that segment look disproportionately important.

15. Inconsistent Grouping or Binning

  • How It Misleads: Grouping data points into inconsistent intervals or bins can distort the visualization and hide important patterns or outliers.
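  • Example: A histogram with unequal bin widths can make one data range seem more prevalent than it actually is, because a wider bin collects more observations simply by spanning a wider range. Plotting density (count divided by bin width) instead of raw counts keeps unequal bins comparable.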
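Three of the distortions above (a truncated baseline, a cumulative series hiding a slowdown, and unequal histogram bins) come down to simple arithmetic. The Python sketch below quantifies each using only the standard library; the function names and all sample numbers are hypothetical illustrations, not data from the webinar.

```python
from itertools import accumulate
from typing import Iterable, List, Sequence


def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """How many times taller bar b looks than bar a when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)


def cumulative(values: Iterable[float]) -> List[float]:
    """Running total: the series a cumulative chart actually draws."""
    return list(accumulate(values))


def histogram_density(counts: Sequence[int], edges: Sequence[float]) -> List[float]:
    """Count per unit of bin width, normalized so the bars' total area is 1."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    total = sum(counts)
    return [c / (total * (hi - lo)) for c, lo, hi in zip(counts, edges, edges[1:])]


# 1. Truncated axis: a 3% increase (100 -> 103).
print(apparent_ratio(100, 103))               # 1.03 with a zero baseline
print(apparent_ratio(100, 103, baseline=95))  # 1.6: the second bar looks 60% taller

# 9. Cumulative chart: monthly sales flatten, yet the running total keeps climbing.
monthly = [10, 12, 15, 15, 15]
print(cumulative(monthly))                    # [10, 22, 37, 52, 67]

# 15. Unequal bins: the same count of 30 in a 10-wide and a 30-wide bin.
print(histogram_density([30, 30], [0, 10, 40]))  # the wide bin is one third as dense
```

The density calculation follows the same normalization as NumPy's `numpy.histogram(..., density=True)`: each count is divided by the total count and by its bin width, so the bars' areas sum to 1 regardless of how the bins are drawn.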