How To Make a Boxplot in Excel

In this post we’ll cover how to create box plots in Excel. Along with scatter plots, box plots are a standard in statistical analysis – especially during the exploratory phase, when you’re just familiarizing yourself with the data. That’s because they’re great at showing the spread of values – what’s your maximum, minimum, and important in-between values.

The following tutorial was written for Excel 2013 and later, but can also be done in earlier versions. Most steps should be similar enough for you to figure out, but if you have any trouble, feel free to let me know in the comments section below!

Setting Up the Data

Let’s say we have Test and Control groups for which we’ve taken some measurements. We now want a graphical representation of the two group so we can make some visual comparison. The data is stored in Boxplot_Example.xlsx, which you can download from my GitHub page, if you’d like to follow along.

Calculate the Min/Max and Quartiles

Calculating the quartiles for each group separately is a great way to summarize their spread. The easiest formula to do this in Excel is the =QUARTILE.INC() function. This function will calculate whatever quartile you specify when you give it some data. In case you were wondering, the .INC part means that we are INCLUDING the minimum and maximum values.

Here’s a summary of the function:

=QUARTILE.INC(array, quart)

Input Description Values
array Range of cells for which to calculate a quartile Any range of cells containing numerical values
quart The quartile to calculate 0 – Minimum
1 – First quartile (25th percentile)
2 – Median (50th percentile)
3 – Third quartile (75th percentile)
4 – Maximum

The formulas should end up looking like the following:

2. True Calcs

Calculate Graph Values

We’re not quite ready for a chart yet. First, we need a version of the data that works well with a stacked bar chart, since that’s really all a boxplot is.

Right now, if we were to throw the true quartile values straight into a graph, the sum of the values would end up being more than the true maximum, making the bar chart much taller than it should be. So in order to make this work, we should calculate the cumulative versions of each quartile. That way, as you stack one value on top of the previous, their cumulative sum equals the true value you want. In other words:

Cumulative Value True Value
Min = Min
Min + Q1 = Q1
Min + Q1 + Q2 = Q2
Min + Q1 + Q2 + Q3 = Q3
Min + Q1 + Q2 + Q3 + Max = Max

Doing a little bit of algebra, we get the following calculations for each cumulative value.

Cumulative Value True Value
Min = Min
Q1 = Q1 – Min
Q2 = Q2 – Q1
Q3 = Q3 – Q2
Max = Max – Q3

We can put those calculations right below the true values, like in the picture below.

3.Cumulative Values

Create the Graph

This is where things start coming together. Highlight cells D12:F17, and insert a stacked bar chart. At first, the chart will look like this:

Chart1

But we can switch the axes by right-clicking on the chart, selecting ‘Select Data…’, and clicking the ‘Switch Row/Column’ button.

Chart2

So far things still aren’t looking much like a boxplot, but we’re almost there! Right-click the bottom blue section of the bars, and select ‘Format Data Series…’. Under ‘Fill’ select ‘Solid Fill’ and increase the transparency to 100%.

Chart3

Working our way up, left-click the next (orange) section. Click on the plus sign near the top right of the chart. Click on the arrow next to ‘Error Bars’, and select ‘More Options…’. In the menu, select ‘Minus’, ‘No Cap’, and set the percent to 100%. This gives us an error bar that spans the entire length (100%) of this orange section of the graph.

Now that we have the error bar, select the orange section again, and set it’s transparency to 100% as well. And while we’re here, why don’t we change the color of the error bar to a nice shade of blue.

Chart4

The next gray and yellow sections represent the interquartile range. The bottom of the gray is Q1, the top of the yellow is Q3, and where they meet is the median. We’ll format both of these regions the same way by making them light blue, with a darker blue border.

Chart5

Now just add the the top whisker the same way we did the bottom, by adding another error bar.

Chart6

And look what we have here – a boxplot! Not bad, but it’s still missing something: the average. So let’s calculate that for each group, and store it in E18:F18. Now, right-click on the chart, select “Select Data…”, and click “Add”. For the series name, enter “Avg”, and for the series values, highlight E18:F18. The chart will look screwed up at first.

Chart7

To fix it, right-click on the green bars and select “Change Series Chart Type…” In the resulting window, scroll down to Avg, and select “Line”.

Chart8

Format the line to include some nice blue data markers, and delete the line itself. Then just clean it up with a little extra formatting and that’s how you make data look good!

Chart9

P.S. If all of this formatting was too tedious for you (I don’t blame you), I’ll cover a great time-saving tip in my next post.

One thought on “How To Make a Boxplot in Excel

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s