How to Remove Outliers in SPSS

Outliers in statistical analyses are extreme values that do not fit the pattern of the rest of a data set. If not removed, these values can strongly affect the conclusions drawn from the data, because they can distort correlation coefficients and pull lines of best fit away from the trend in the remaining data. SPSS is one of several statistical analysis programs that can be used to examine a data set and to identify and remove outlying values.
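
As a rough illustration of how far a single point can move a correlation, here is a minimal Python sketch. Python is used only to show the arithmetic and is not part of the SPSS workflow. The data are made up: eight points lying close to a straight upward line, plus one hypothetical extreme value added at the end.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y rises steadily with x ...
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2]
r_clean = pearson_r(xs, ys)

# ... until one extreme case is appended.
r_outlier = pearson_r(xs + [9], ys + [-20.0])

print(f"r without outlier: {r_clean:.3f}")
print(f"r with outlier:    {r_outlier:.3f}")
```

With these numbers the coefficient drops from nearly +1 to slightly below zero, so one case out of nine reverses the apparent direction of the relationship.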

Exploratory Data Analysis

Step

Click on "Analyze." Select "Descriptive Statistics" followed by "Explore."

Step

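Drag and drop the variables (columns) you want to check for outliers into the box labeled "Dependent List," then click "OK." By default, Explore produces a stem-and-leaf plot and a boxplot for each variable in the list.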

Step

Remove any outliers that SPSS marks in the stem-and-leaf plots or boxplots by deleting those individual data points. Alternatively, you can set up a filter that excludes them from analyses while keeping them in the data file.
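
SPSS boxplots mark a point as an outlier (a circle) when it lies more than 1.5 box lengths beyond the edge of the box, and as an extreme value (an asterisk) when it lies more than 3 box lengths beyond it. This minimal Python sketch applies that rule by hand, assuming the box edges are Tukey's hinges; SPSS's exact quartile calculation may differ slightly, so treat the labels as an approximation. The height values and helper names are hypothetical.

```python
import statistics


def tukey_hinges(values):
    """Return (lower hinge, upper hinge): the medians of the lower and
    upper halves of the sorted data. When n is odd, the median is
    included in both halves (Tukey's convention)."""
    data = sorted(values)
    half = (len(data) + 1) // 2
    return statistics.median(data[:half]), statistics.median(data[-half:])


def classify_boxplot_points(values):
    """Label each value 'normal', 'outlier' (more than 1.5 box lengths
    beyond the box) or 'extreme' (more than 3 box lengths beyond it)."""
    lower, upper = tukey_hinges(values)
    box_length = upper - lower
    labels = []
    for v in values:
        # How far the value sits outside the box; 0 if it is inside.
        distance = max(lower - v, v - upper, 0)
        if distance > 3 * box_length:
            labels.append("extreme")
        elif distance > 1.5 * box_length:
            labels.append("outlier")
        else:
            labels.append("normal")
    return labels


# Hypothetical heights in inches.
heights = [64.0, 65.5, 66.0, 67.0, 67.5, 68.0, 69.0, 70.0, 71.5, 78.0, 84.0]
for h, label in zip(heights, classify_boxplot_points(heights)):
    if label != "normal":
        print(h, label)
```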

Step

Select "Data" and then "Select Cases," and click on the variable that has outliers you wish to exclude. Determine a cutoff value for this variable that excludes only the outliers and none of the non-outlying data points.
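
One way to find such a cutoff is to take the smallest and largest values among the cases you want to keep; a condition that keeps values within that range excludes everything beyond it. Here is a minimal Python sketch of that check, where `exclusion_bounds` is a hypothetical helper and the flagged positions are assumed to come from your own reading of the boxplot.

```python
def exclusion_bounds(values, flagged):
    """Return (low, high) such that 'low <= x <= high' keeps every
    unflagged value and excludes every flagged one.

    `flagged` is a set of positions in `values` (hypothetical input,
    e.g. the cases you marked as outliers on a boxplot)."""
    kept = [v for i, v in enumerate(values) if i not in flagged]
    low, high = min(kept), max(kept)
    for i in flagged:
        if low <= values[i] <= high:
            # A flagged value inside the kept range cannot be removed
            # by a simple range condition.
            raise ValueError(f"value {values[i]} lies inside the kept range")
    return low, high


heights = [64.0, 65.5, 66.0, 67.0, 67.5, 68.0, 69.0, 70.0, 71.5, 78.0, 84.0]
low, high = exclusion_bounds(heights, flagged={9, 10})
print(f"height >= {low} & height <= {high}")
```

Any upper cutoff from 71.5 up to, but not including, 78.0 would work equally well here, and when all outliers lie on one side you only need that side of the condition.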

Step

Choose "If Condition is Satisfied" in the "Select" box and then click the "If" button just below it. In the box at the upper right, enter the condition that the cases you want to keep must meet, based on the cutoff you determined in the previous step. For example, to exclude measurements above 74.5 inches from the variable "height," you would enter "height <= 74.5." Click "Continue" and then "OK" to activate the filter.
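
A filter marks cases rather than deleting them: by default, Select Cases records the selection in a filter variable (usually named filter_$) and leaves unselected cases in the data file while excluding them from later analyses. This Python sketch mimics that idea with a list of 1/0 flags; the heights are hypothetical.

```python
# Hypothetical heights (inches) for eight cases.
heights = [64.0, 66.0, 67.5, 69.0, 70.0, 71.5, 78.0, 84.0]

# Mimic a filter variable: 1 = case selected, 0 = case filtered out.
# Nothing is deleted; the original list is left untouched.
filter_flag = [1 if h <= 74.5 else 0 for h in heights]

# Analyses would then use only the selected cases.
selected = [h for h, keep in zip(heights, filter_flag) if keep]
print(filter_flag)
print(selected)
```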

Regression Analysis

Step

In the "Analyze" menu, select "Regression" and then "Linear." Select the dependent and independent variables you want to analyze.

Step

Click "Save" and then select "Cook's" under "Distances." Click "Continue" and then "OK" to run the regression. The Cook's distance values will be saved in your data file as a new variable named "COO_1."
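
SPSS calculates Cook's distance for you, but it helps to know what the number measures: how much the fitted regression would change if that case were left out. For each case it combines the residual e_i and the leverage h_i as D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2, where p is the number of model parameters. This minimal Python sketch uses that textbook formula for simple linear regression only (one predictor, so p = 2) with hypothetical data. It is not SPSS's own code, although for the same one-predictor model the values should agree with COO_1.

```python
def cooks_distance(xs, ys):
    """Cook's distance for each case in a simple linear regression
    y = a + b*x, using D_i = e_i**2 / (p * MSE) * h_i / (1 - h_i)**2."""
    n = len(xs)
    p = 2  # parameters: intercept and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x

    # Residuals and mean squared error of the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)

    # Leverage (hat value) of each case in simple regression.
    leverages = [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

    return [e ** 2 / (p * mse) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverages)]


# Hypothetical data: the last case sits far above the trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 30.0]
d = cooks_distance(xs, ys)
for case, value in enumerate(d, start=1):
    print(case, round(value, 3))
```

In this made-up example the last case, which lies far above the trend at the highest x value, gets by far the largest distance.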

Step

Create a boxplot by selecting "Graphs" followed by "Boxplot" (in newer versions of SPSS, "Graphs," "Legacy Dialogs" and then "Boxplot"). Click "Simple," select "Summaries of Separate Variables" and click "Define." Enter "COO_1" into the box labeled "Boxes Represent," and then enter an ID or name variable by which to identify the cases in the "Label Cases by" box.

Step

Enlarge the boxplot in the output window by double-clicking it. Make a note of the cases that lie beyond the whiskers, the short horizontal lines above and below the box; these are your outliers. You may choose to remove all of the outliers, which SPSS marks with a circle, or only the extreme outliers, which are marked with an asterisk (*).
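
The choice between removing every flagged case and removing only the extreme ones can be sketched in Python. `cases_to_remove` is a hypothetical helper that assumes the box edges are Tukey's hinges; the case IDs and Cook's distances are made up. Outliers (circles) lie more than 1.5 box lengths beyond the box and extremes (asterisks) more than 3.

```python
import statistics


def cases_to_remove(case_ids, cooks, extreme_only=False):
    """Return the IDs of cases a boxplot of Cook's distances would flag.

    With extreme_only=False, every case more than 1.5 box lengths beyond
    the box is returned; with extreme_only=True, only cases more than
    3 box lengths beyond it. Box edges are assumed to be Tukey's hinges."""
    data = sorted(cooks)
    half = (len(data) + 1) // 2
    lower = statistics.median(data[:half])
    upper = statistics.median(data[-half:])
    limit = (3 if extreme_only else 1.5) * (upper - lower)
    return [cid for cid, d in zip(case_ids, cooks)
            if max(lower - d, d - upper) > limit]


# Hypothetical case labels (the "Label Cases by" variable) and COO_1 values.
ids = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9"]
cooks = [0.01, 0.02, 0.03, 0.02, 0.05, 0.04, 0.03, 0.12, 0.40]

print("all outliers:", cases_to_remove(ids, cooks))
print("extremes only:", cases_to_remove(ids, cooks, extreme_only=True))
```

Here P8 is an ordinary outlier and P9 an extreme one, so the stricter policy removes only P9.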

Step

Go back to the data file and locate the cases that need to be erased. Working from the bottom of the file up, so that deleting one row does not change the row numbers of outliers you have yet to remove, click the case number in the gray column at the far left so that the entire row is selected. Click "Edit" and select "Clear." Repeat this step for each outlier you identified from the boxplot.
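
Deleting a row renumbers every row beneath it, which is why the order of deletion matters. A small Python sketch of the same effect on a list, with hypothetical rows and positions standing in for SPSS case numbers:

```python
def delete_rows(rows, positions, bottom_up=True):
    """Delete the rows at the given 0-based positions and return the rest.

    bottom_up=True deletes the highest position first, so earlier
    deletions never shift rows that are still waiting to be removed."""
    remaining = list(rows)
    for pos in sorted(positions, reverse=bottom_up):
        del remaining[pos]
    return remaining


rows = ["case1", "case2", "case3", "case4", "case5", "case6"]
print(delete_rows(rows, [1, 4]))                   # intended rows removed
print(delete_rows(rows, [1, 4], bottom_up=False))  # second deletion hits the wrong row
```

Deleting positions 1 and 4 from the bottom up removes case2 and case5 as intended; from the top down, the second deletion lands on case6 instead.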