When you're analyzing data, outliers can skew the results and make them less reliable. For example, if one or more of your values are substantially higher than the majority of the data, they make the mean higher, which might not reflect the data as a whole. Removing these outlying data points is, therefore, an important step in performing reliable statistical calculations. With no dedicated outliers function, the easiest way to perform an outlier test in Excel is to use the interquartile range to identify outliers, but you can also use the Trimmean function to approximate the same result.
Calculating the Interquartile Range
The interquartile range of data is the range covered by the "box" on a box-and-whisker plot, or more precisely, the result of subtracting the value for the first quartile of the data from that of the third quartile. Excel contains a built-in feature to calculate any quartile for your data. Find a spare cell and type "=QUARTILE([data range], [quartile number])" and put the range of cells for your data where it says "[data range]" and the quartile you want where it says "[quartile number]."
For example, if you have data in cells A2 to A101 and you want the value for the first quartile, you enter "=QUARTILE(A2:A101, 1)" to find the value. For the first part of the argument, you can highlight the relevant cells with your mouse, but after the comma, you need to write the number of the quartile you want. For the third quartile with the same data, you type "=QUARTILE(A2:A101, 3)" to get the result.
Video of the Day
Using another empty cell, subtract the value of the first quartile cell from the value of the third quartile cell. If the first quartile is in cell C2 and the third quartile is in cell D2, type "=D2-C2" to get the result. This is the interquartile range.
Outlier Analysis in Excel
To find outliers, you can now use the interquartile range in the outlier formula, which states that the upper limit of the data is the value of the third quartile plus 1.5 times the interquartile range, and the lower limit is the value of the first quartile minus 1.5 times the interquartile range.
If the first quartile value is in C2, the third quartile value is in cell D2, and the interquartile range is in cell E2, you would type "=C2-(1.5 E2)" to find the lower limit and "=D2+(1.5 E2)" to find the upper limit. In general, you enter "=[first quartile] – (1.5 [interquartile range])" to find the lower limit and "=[third quartile] + (1.5 [interquartile range])" to find the upper limit.
Anything below the lower limit or above the upper limit is an outlier.
To finish the outlier test in Excel, use the logical "OR" function to identify which values in your data class are outliers in an efficient manner. Enter "=OR([data cell]>[upper limit], [data cell]<[lower limit])" to find the outliers, with relevant cell references in place of the quantities in square brackets. For example, if the data is in cells A2 to A101, the upper limit is in cell F2, and the lower limit is in cell G2, go to cell B2 and type "=OR(A2>$F$2, A2<$G$2)" to use the function (where the dollar signs before "F," "G" and "2" tell Excel that this shouldn't change when you drag the formula down).
If the value in A2 is above the upper limit or below the lower limit, it displays "TRUE," indicating that the value is an outlier. You can drag this formula down by clicking the bottom right corner of the cell with the formula and dragging it down so that it finishes beside the final data cell to perform the same calculation on each data point.
You can also highlight the data, and go to "Conditional Formatting" in the "Styles" section of the "Home" tab if you want to change the formatting for the outliers. Choose "New Rule" and highlight the "Use a formula to determine which cells to format" option. Type the same formula as in the previous paragraph and then click the "Format" option to choose the unique formatting for outliers.
The "Trimmean" function is a simpler way of identifying outliers. Type "=TRIMMEAN([data range], [proportion to trim])" to use the function, with the range of cells containing data in place of "[data range]" and a decimal percentage you want to trim where it says "[proportion to trim]." This removes the extreme values at the top and bottom and then calculates the mean based on those remaining. So, if you trimmed 10 percent, it would remove the top 5 percent and the bottom 5 percent before calculating the mean.
If the data runs from A2 to A101 and you want to trim the extreme 5 percent of values, you enter "=TRIMMEAN(A2:A101, 0.05)" to find the adjusted mean. You could trim 15 percent by writing "=TRIMMEAN(A2:A101, 0.15)" instead.