- Always keep a clean copy of your data that you do not touch (especially if accessing your data is difficult). Never save over your only data file with recoded variables.
- Always use syntax, so you have a record of what you are doing
- Keep your syntax nicely organized
- Use notes liberally in your syntax; in SPSS and Stata this means starting a line with *
- Organize syntax into sections, e.g.
  - recodes for the dependent variable(s); then
  - recodes for the main independent variables; then
  - other control variables; then
  - descriptive statistics; then
  - t-tests (or chi-square, ANOVA, or whatever you’re using to test differences between groups); then
  - main analyses; then
  - sensitivity tests
- Always recode your variables (even if you are not transforming them), for the following reasons:
  - you get a better feel for the data: how they are coded, missing values, shape of the distributions
  - new variables tend to end up at the end of the list of variables, which makes them easier to find if you’re looking at your list of variables
  - you can give them names that are more intuitively memorable than QA_E84C or whatever they start out as
- When making dummies, DO NOT name them after the original variable; name them after the category you have assigned a value of 1 to. For example, if you turn gender into a dummy, don’t call it “dumgender” if 1=male and 0=female; call it “dummale” instead
- Always label variables and values with meaningful words that make sense
- Always check your recodes (usually with cross-tabs)
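
The recoding advice above can be sketched as follows. This is an illustration in Python rather than SPSS or Stata syntax, and the data are made up: the coding of QA_E84C (1 = male, 2 = female, 9 = no answer) is an assumption for the example, not something taken from a real codebook.

```python
from collections import Counter

# Hypothetical raw records; QA_E84C stands in for an opaque original name,
# with ASSUMED codes 1 = male, 2 = female, 9 = no answer.
raw = [
    {"id": 1, "QA_E84C": 1},
    {"id": 2, "QA_E84C": 2},
    {"id": 3, "QA_E84C": 2},
    {"id": 4, "QA_E84C": 9},
    {"id": 5, "QA_E84C": 1},
]


def recode_male(code):
    """Return 1 for male, 0 for female, None (missing) for any other code."""
    return {1: 1, 2: 0}.get(code)


# Recode into a NEW variable named for the category coded 1;
# the original variable is left untouched.
for row in raw:
    row["dummale"] = recode_male(row["QA_E84C"])

# Cross-tab of old code by new code: every original code should land
# in exactly one cell, and "no answer" should end up missing.
crosstab = Counter((row["QA_E84C"], row["dummale"]) for row in raw)
for (old, new), n in sorted(crosstab.items(), key=lambda item: item[0][0]):
    print(f"QA_E84C={old} -> dummale={new}: n={n}")
```

If an original code shows up in more than one row of the cross-tab, or a substantive code maps to missing, the recode needs another look.
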
- Always check your missing data
  - How many cases are missing?
    - Less than 10%: listwise deletion is probably okay
    - More than 10%: perhaps explore multiple imputation or other variables
  - Are they missing at random or is there some kind of pattern to the missing cases?
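
A minimal Python sketch of the missing-data check, assuming a toy data set in which None marks a missing value. The variable names and values are hypothetical; the 10% cut-off is the rule of thumb from the list above, and comparing missingness across groups is only a crude first look at whether there is a pattern.

```python
# Hypothetical toy data: (group, income, age); None marks a missing value.
records = [
    ("a", 52, 34), ("a", 48, None), ("a", 61, 45), ("a", 39, 29), ("a", 55, 51),
    ("b", 43, 38), ("b", None, 42), ("b", 47, 36), ("b", None, 58), ("b", 50, 33),
]
rows = [dict(zip(("group", "income", "age"), rec)) for rec in records]


def share_missing(rows, variables):
    """Share of cases with a missing value on ANY of the given variables."""
    dropped = sum(any(r[v] is None for v in variables) for r in rows)
    return dropped / len(rows)


# Missing share per variable.
for var in ("income", "age"):
    print(f"{var}: {share_missing(rows, [var]):.0%} missing")

# Share of cases that listwise deletion would drop from a model using both.
listwise = share_missing(rows, ["income", "age"])
if listwise < 0.10:
    verdict = "listwise deletion is probably okay"
else:
    verdict = "consider multiple imputation"
print(f"listwise: {listwise:.0%} dropped, {verdict}")

# Crude pattern check: does missingness on income differ by group?
for group in ("a", "b"):
    members = [r for r in rows if r["group"] == group]
    print(f"group {group}: {share_missing(members, ['income']):.0%} missing on income")
```
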
- If a result seems contrary to the literature, question your result before you question the literature.
  - Did you make an error in the coding?
  - Are you interpreting the coding correctly, i.e., do high values indicate high levels of what you think they measure? For example, if a scale runs from strongly agree (1) to strongly disagree (5), higher scores mean stronger disagreement. Make sure variables are coded in the direction in which you are interpreting them.
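
When a scale runs in the opposite direction from how you want to interpret it, reverse-coding it into a new variable is one way to avoid the mix-up. A small Python sketch, assuming a 1-5 agreement item where 1 = strongly agree and 5 = strongly disagree; the function and variable names are made up for the example.

```python
def reverse_code(value, low=1, high=5):
    """Flip a scale so the lowest code becomes the highest; keep missing as None."""
    if value is None:
        return None
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {low}-{high} scale")
    return low + high - value


# Hypothetical responses: 1 = strongly agree ... 5 = strongly disagree.
disagreement = [1, 2, 5, None, 3]

# After reversing, higher scores mean stronger AGREEMENT.
agreement = [reverse_code(v) for v in disagreement]
print(agreement)  # [5, 4, 1, None, 3]
```

Naming the new variable for the direction it now measures follows the same logic as naming a dummy for its 1 category.
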
- After recoding all of your variables, save the syntax without saving the data file. Then reopen a clean data file and run your syntax file again to make sure that you did not accidentally recode something during your testing stage that got saved incorrectly.
- Depending on the number of variables and how big your data set is, you might want to have one syntax file for recoding and then a separate syntax file for analysis. This way you can delete all the variables you’re not using and start the analysis with a fresh, fully recoded data set, without having to run the recode code every time.
  - Pros: Quicker data runs and easier to find the variables you need
  - Cons: If you realize later that there’s a variable you need to include, you have to go through everything again.
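
The "rerun from a clean file" and "separate recode file" ideas can be sketched in Python as below. Everything here is hypothetical: the file names, the variables, and the coding of QA_E84C (assumed 1 = male, 2 = female) exist only for the example. The point is that the recode step only reads the raw file and writes its result to a different file that keeps just the variables needed for analysis.

```python
import csv
import tempfile
from pathlib import Path


def run_recodes(raw_path, recoded_path, keep):
    """Read the raw file (never written to) and write a separate recoded file."""
    with open(raw_path, newline="") as src, open(recoded_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=keep)
        writer.writeheader()
        for row in reader:
            # ASSUMED coding: "1" = male, "2" = female, anything else is missing.
            row["dummale"] = {"1": "1", "2": "0"}.get(row["QA_E84C"], "")
            # Keep only the variables the analysis needs.
            writer.writerow({name: row[name] for name in keep})


# Stand-in for the clean data file; in practice this already exists.
workdir = Path(tempfile.mkdtemp())
raw_path = workdir / "survey_raw.csv"
raw_path.write_text("id,QA_E84C,unused\n1,1,x\n2,2,y\n3,9,z\n")
raw_before = raw_path.read_text()

# Recode step: can be rerun from scratch at any time.
recoded_path = workdir / "survey_recoded.csv"
run_recodes(raw_path, recoded_path, keep=["id", "dummale"])

# The raw file is unchanged; the analysis step would start from the recoded file.
assert raw_path.read_text() == raw_before
print(recoded_path.read_text())
```

If a variable turns out to be needed later, it has to be added to the list of kept variables and the recode step rerun, which is the drawback noted above.
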