Doing Social Research

Step 1. Open SPSS on your computer

The first screen that’ll pop up asks what data you want to open. I usually just ‘x’ out of this screen or hit cancel.

Step 2. You’ll end up on a blank spreadsheet. To open the data:

1) Click “File” in the top left-hand corner, then

2) Open, then

3) Data

Note: Other types of files you are likely to need are “syntax,” where you can type in code, and “output,” where you will get your results.

Side note: On the right is what an output file looks like. If this pops up on your screen, for now, you can ignore it. Eventually, this is what you’ll be looking at to find the results of your analyses.

Step 3. After you click Data, you’ll get a file menu where you can find the data. For this course, we are using GSS27.sav. Double click on your data file, or click once and then hit ‘open.’

Note: Make sure you have downloaded the data first!

Now your data are open. You should end up with a screen that looks something like one of the examples to the right.

To toggle back and forth between these two screens, click the “Data View” or “Variable View” tabs at the bottom of the screen.

Note: All of the values shown in the data view on this sheet were changed to protect respondents’ anonymity, so the numbers will not match the real data.

Data View 

The data view screen shows you the particular responses for each person.

Each ROW represents one case (e.g., one person, one school, or whatever your unit of analysis is).

Each COLUMN represents one variable; each cell in that column holds the value of the participant’s response.

NAMES for the variables are shown in the very first row.

Thus, the cell highlighted in yellow shows that the 2nd person interviewed is coded as 1 for “sex.” But what does that mean? We’ll need to look at the variable view to better understand.
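If it helps to think of the data view in code terms, here is a minimal Python sketch of the same layout. It is only an illustration, not how SPSS stores anything: the variable names `recid` and `agegr10` are made up, and so are all of the values, except that the 2nd person is coded 1 for “sex” as in the example above.

```python
# A tiny stand-in for the Data View: each dict is one ROW (one case),
# and each key is one COLUMN (one variable name).
# "recid" and "agegr10" are hypothetical names; all values are invented.
data_view = [
    {"recid": 1, "sex": 2, "agegr10": 3},
    {"recid": 2, "sex": 1, "agegr10": 5},
    {"recid": 3, "sex": 2, "agegr10": 1},
]

# The variable NAMES, like the header row of the Data View.
variable_names = list(data_view[0].keys())

# The cell for the 2nd person interviewed (index 1, because Python counts from 0).
second_person_sex = data_view[1]["sex"]

print(variable_names)
print(second_person_sex)
```

The printed code for “sex” is still just a number; on its own it does not say what 1 stands for.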

Variable View

The VARIABLE VIEW spells out more details about each variable. I’ve given a description of the columns that will matter to you, skipping ones that you can ignore for now.

1. The first column is the NAME, which corresponds with the first row in the data view.

2. TYPE refers to what SPSS understands the values to be. Most often this will just say “numeric,” which means SPSS treats the responses as numbers. Other options are: comma, dot, scientific notation, date, dollar, custom currency, string, or restricted numeric (integer with leading zeros). If it says “string,” that means SPSS will view each value as a word and will not be able to calculate any statistics for it until you transform it into a “numeric” value. Sometimes if you can’t get SPSS to run, this is why. Even if the values look like numbers in the data view, if the variable is typed as a string, SPSS will see all the 1’s as though they are the word “one.”

3. LABELS is one of the most useful columns, telling us what the often incomprehensible name means. In this case, the first variable is “Record identification” and the second is “Age group of respondent (groups of 10).” If you can’t read the entire label, click on the line separating the label and values columns (marked with a blue arrow) and drag it to make the column wider.

4. VALUES is tied with Labels for most useful. This tells you how the variables are coded. Most variables, such as “sex,” do not have an inherently meaningful number associated with them. So one data set might code men as 1 whereas another would code them as 0. To see what the values are in the data, expand the values column, then click on the cell you’re interested in. You’ll see a little blue box with three dots. Click on the little box and a screen will pop up telling you what each value label is.

Note! If the variable is interval-ratio, there often will not be labels for the values since, as in this example at the bottom for number of children in the household, 1 means 1 child and 2 means 2 children. It would be redundant to label these, but they do label 4 since 4 doesn’t mean 4 children, it means 4 or more children. 0 was labeled as none to make it clear that there were no children (I guess).

Note! Most Statistics Canada data (and much of the data collected around the world) will use the values of 6, 7, 8, 9 or 96, 97, 98, 99, or 996, 997, 998, 999 (depending on the width of the variable) to indicate missing data or values that do not really answer the question of the variable. We usually code these values as missing data. If all you see in the value labels screen are these numbers, this probably means that you’re dealing with an interval-ratio variable.

5. MISSING shows which values SPSS is going to treat as missing and leave out of calculations. There are times when the value labels indicate that, e.g., 6-9 are missing codes, but this column is empty in SPSS. Those codes mean that the response was skipped, unknown, refused, or otherwise not there, so you need to tell SPSS that a 6, for example, should not be included in any calculations. Take, for example, the variable family size: if we left the ‘9’ as ‘9,’ SPSS would just assume that the person has a family size of 9, when really we just don’t know how big the family is. Their family might be 1 person or 125 people; we just don’t know, so we don’t want to include the 9 and miscalculate statistics about family size.

(skipping columns that really don’t matter for our purposes)

6. MEASURE is a tricky column you need to be careful with. This indicates the level of measurement that the variable is assigned. However, these are often entered inaccurately, or they represent how the variable was originally collected but not the state that it is in now. For example, “RECORD IDENTIFICATION” is listed as a ‘scale’ variable to indicate interval-ratio, but an ID is always nominal. Although we can technically rank the IDs from high to low or low to high, each number simply identifies a person; it is not a meaningful quantity. The ID numbers are arbitrarily assigned, so it means nothing to say that one person has a greater or lesser ID number than another. Thus, I recommend ignoring this column and using your own sense to decide what statistics you can calculate.
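To make the TYPE, VALUES, and MISSING columns a bit more concrete, here is a rough Python sketch of the ideas behind them. It is not SPSS syntax and not how SPSS works internally. The family-size responses, the sex coding (1 = Male, 2 = Female), and the helper names `label_for` and `drop_missing` are all assumptions made up for this illustration; always check the value labels in your own data.

```python
from statistics import mean

# --- TYPE: values stored as strings only look like numbers ---
# Hypothetical family-size responses typed as "string".
famsize_as_strings = ["2", "4", "9", "3", "1"]
# Adding two strings glues them together instead of doing arithmetic.
glued = "1" + "1"                                # "11", not 2
# Converting to integers is the analogue of making the variable "numeric".
famsize = [int(v) for v in famsize_as_strings]

# --- VALUES: value labels say what each code means ---
# Assumed coding for illustration only; another data set could differ.
sex_labels = {1: "Male", 2: "Female"}

def label_for(code, labels):
    """Return the label for a code, or the code itself if it is unlabeled."""
    return labels.get(code, str(code))

# --- MISSING: codes that must be left out of calculations ---
# Assume 9 means "not stated" for family size, as in the example above.
MISSING_CODES = {9}

def drop_missing(values, missing_codes):
    """Keep only the values that are not missing-data codes."""
    return [v for v in values if v not in missing_codes]

naive_mean = mean(famsize)                       # treats the 9 as a real family of 9
valid_famsize = drop_missing(famsize, MISSING_CODES)
correct_mean = mean(valid_famsize)               # uses only the known family sizes

print(glued)                      # 11
print(label_for(1, sex_labels))   # Male
print(naive_mean)                 # 3.8
print(correct_mean)               # 2.5
```

With these made-up numbers, leaving the 9 in pushes the average family size from 2.5 up to 3.8, which is the kind of miscalculation the MISSING column is there to prevent.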