In the last post, we covered what works best with line charts. In this post, we’ll cover another standard chart: the bar chart.
The Bar Chart
Bar charts, as opposed to line charts, are best when each observation is relatively independent of the others. When that’s the case, reordering observations by value doesn’t obscure the results; it actually makes them clearer.
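To make the reordering idea concrete, here is a minimal sketch in Python. The region names and counts are made up for illustration, and `order_for_bar_chart` is a hypothetical helper, not part of any charting library.

```python
# Hypothetical data: the categories and values are invented for this example.
sales_by_region = {"North": 120, "South": 340, "East": 95, "West": 210}

def order_for_bar_chart(values):
    """Return (label, value) pairs sorted from the largest value to the smallest."""
    return sorted(values.items(), key=lambda pair: pair[1], reverse=True)

ordered = order_for_bar_chart(sales_by_region)
for label, value in ordered:
    # Each row is one bar; the largest category comes first.
    print(f"{label:<6} {value}")
```

Because the regions have no intrinsic order, sorting them this way loses nothing. Doing the same to the time axis of a line chart would scramble the trend.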
Three Types of Variables
Another way of saying “relatively independent” or “no intrinsic order” is to say that one of your variables is categorical. Since we’re on the subject, it may be a good time to go over the three different types of variables.
Numeric variables, as the name implies, are variables that can be expressed as a number. Car accidents per year would be a numeric variable. Number of squirrels on the road would also be a numeric variable. Two important properties of numeric variables are that they have intrinsic order (14 is greater than 6) and that they have magnitude (14 is exactly 8 units greater than 6). Those two properties give us the ability to find where one observation falls in relation to another.
Categorical variables are those that are not expressed as numbers. A very common example of this would be Male vs Female. One is not intrinsically greater or less than another (though my wife may disagree!). There is no sense of order or magnitude with categorical variables; they just exist more or less independently of each other.
This is where things get a little sticky. Ordinal variables look like numeric variables to the untrained eye. They are usually expressed as numbers and have intrinsic order (hence “ordinal”). The difference is that they lack magnitude. A perfect example of this is a 5-star rating system. 5 stars is definitely better than 4 stars, but is it really 25% better than 4 stars? That’s much harder to answer.
Line Charts vs Bar Charts
In light of the three types of variables, we can actually better define what is required for different graphs:
|Line Chart|Two or more numeric variables, one of which is time|
|Bar Chart|One or more numeric variables and one or more categorical variables|
The complication arises when you realize that, in most cases, you can replace a numeric or categorical variable with an ordinal one and still do just fine. I believe that’s one of the reasons why some people have a hard time choosing the right graph. Hopefully now that we know the difference between these three, things will be just a little bit easier.
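The minimum requirements in the table can be sketched as a small Python function. This is only a rough rule of thumb: `suggest_charts` and its parameters are hypothetical names, and the sketch assumes an ordinal variable can always stand in for a numeric or categorical one, which the text says holds in most cases, not all.

```python
from collections import Counter

def suggest_charts(variable_types, includes_time=False):
    """Suggest chart types that meet the minimum requirements.

    variable_types: list of "numeric", "categorical", or "ordinal" strings.
    includes_time: True when one of the numeric variables is time.
    """
    counts = Counter(variable_types)
    numeric = counts["numeric"]
    categorical = counts["categorical"]
    ordinal = counts["ordinal"]  # assumption: may fill either role

    suggestions = []
    # Line chart: two or more numeric-like variables, one of which is time.
    if includes_time and numeric >= 1 and numeric + ordinal >= 2:
        suggestions.append("line")
    # Bar chart: at least one numeric-like and one categorical-like variable,
    # and they must be two different variables.
    enough_variables = numeric + categorical + ordinal >= 2
    if enough_variables and numeric + ordinal >= 1 and categorical + ordinal >= 1:
        suggestions.append("bar")
    return suggestions

print(suggest_charts(["numeric", "numeric"], includes_time=True))
print(suggest_charts(["numeric", "categorical"]))
print(suggest_charts(["numeric", "ordinal"], includes_time=True))
```

The last call returns both options, which is exactly the situation where choosing the right graph gets harder.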
Now that we’ve gone over the minimum requirements, let’s go over a few best practices as well.
Long labels? Consider a horizontal bar chart
When you think of a bar chart, you probably picture one where the bars are vertical. There’s nothing wrong with the standard, but when your categories have long titles, you may want to change the orientation to help out. For example, this is what the first chart would look like if it were a vertical bar chart.
If you want to see all the labels, then you’ll have to read them at an awkward angle. It’s not a huge problem, but it can be a little annoying. However, if you use horizontal bars, the labels are oriented for natural reading.
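As a rough illustration, the sketch below draws horizontal bars as plain text, so each long label sits on its own row and reads left to right. The function name `horizontal_bars` and the survey data are assumptions made up for this example.

```python
def horizontal_bars(data, width=30):
    """Render (label, value) pairs as text bars, one row per category.

    Assumes at least one value is greater than zero.
    """
    label_width = max(len(label) for label, _ in data)
    largest = max(value for _, value in data)
    lines = []
    for label, value in data:
        # Scale each bar relative to the largest value.
        bar = "#" * round(value / largest * width)
        # Left-align labels so the bars all start in the same column.
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines

# Hypothetical categories with long names.
survey = [
    ("Strongly prefers working from home", 42),
    ("Prefers a mix of home and office", 35),
    ("Strongly prefers the office", 11),
]
for line in horizontal_bars(survey):
    print(line)
```

Plotting libraries usually offer the same orientation directly; in matplotlib, for instance, `barh` draws horizontal bars.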
Start at zero
Whenever you’re representing data, it’s very important that you not only present it accurately, but honestly – and there is a difference.
Both charts are technically accurate, and yet they look like very different results at first. Why? The left chart’s y-axis doesn’t start at zero, making the difference look much larger than it actually is. It can be a fairly effective way of nudging end users toward a particular interpretation, but a good data analyst shouldn’t be in the business of manipulation.
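To put a number on the distortion, the sketch below compares the drawn heights of two bars when the axis starts at zero and when it starts higher. The values 100 and 95 and the axis start of 90 are hypothetical, not taken from the charts in this post, and `apparent_ratio` is a made-up helper name.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b.

    A bar's drawn height is its value minus the point where the axis starts.
    """
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)

# Axis starts at zero: the bars look about 5% apart, matching the data.
honest = apparent_ratio(100, 95)
# Axis starts at 90: the first bar is drawn twice as tall as the second.
truncated = apparent_ratio(100, 95, axis_start=90)

print(f"axis from 0:  {honest:.2f}x")
print(f"axis from 90: {truncated:.2f}x")
```

The data hasn’t changed between the two calls; only the baseline has, and with it the impression the reader takes away.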
Now, you may ask, “What if my values are very similar, and starting at zero makes them look equal?” Well, there are exceptions to this rule; that would be one of them. If you do need to start the y-axis above zero, be sure you make it very clear that you have. We’ll cover a few ways to do that in a future post.