Samples vs. Populations

When doing social research, what we care about most are populations, but what we have access to are samples. Samples are subsets of a population that we use to make estimates about that population. For example, if I wanted to know about American attitudes toward the president, I couldn’t just pick up the phone and call the 327+ million American people and say “hey, gurrrl, what do you think about the prez?” That would be extremely annoying for the people I’m calling, and it would take me forever. So, instead of making full use of my cross-country calling package, I would collect a sample of people from across the U.S.

Importantly, however, each person would need the same basic chance of being picked. If some people have a higher or lower chance of being picked, then our sample will over-represent or under-represent that kind of person, which makes it biased. Let’s say I want to know about all Americans, but I only call people with landlines who live in upstate New York. My sample is extremely unlikely to represent all Americans, since landline holders in upstate New York are going to be a pretty distinctive lot of people (no offense).
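
To make the "same chance of being picked" idea concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: the population is fake, the regional shares are made-up numbers, and "Northeast only" is just a stand-in for calling upstate New York landlines.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 people; the regional shares are made up.
regions = ["Northeast", "South", "Midwest", "West"]
weights = [0.17, 0.38, 0.21, 0.24]
population = rng.choices(regions, weights=weights, k=100_000)

# Simple random sample: every person has the same chance of being picked.
srs = rng.sample(population, 1_000)

# Restricted sample: only people in one region can ever be picked.
northeast_only = [person for person in population if person == "Northeast"]
convenience = rng.sample(northeast_only, 1_000)

def shares(sample):
    """Return the share of the sample that falls in each region."""
    counts = Counter(sample)
    return {region: counts[region] / len(sample) for region in regions}

pop_shares = shares(population)
srs_shares = shares(srs)
conv_shares = shares(convenience)

print("population: ", pop_shares)
print("random:     ", srs_shares)
print("restricted: ", conv_shares)
```

The random sample's regional mix should land close to the population's mix, while the restricted sample is 100% Northeast no matter how carefully people are drawn within that group.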

Sometimes, we are only able to get one particular group of people. This isn’t necessarily a bad thing, but it means that we have to change what we say our population is. So, maybe we can only get a random sample of upstate New Yorkers with a landline. We can do that, but then we would need to be clear that we are making estimates about the population of upstate New Yorkers with a landline, not all Americans.

Whether or not a sample is representative of a population depends entirely on how the sample was selected. Random sampling is required for a sample to be treated as representative of a population for the use of inferential statistics. However, not every research project necessarily wants or needs a random sample. Check out my flowchart for deciding which you might want.

Also, although some people are impressed by large samples, bigger is not necessarily better. A sample of a million people who clicked on a link at their favorite website is far less useful for making estimates about a population than a sample of 2,000 people who were randomly selected from Census data. Larger samples are better at estimating populations, but they have to be randomly selected for size to matter.
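
Here is a rough sketch of why size alone doesn't save a sample. The population, the share of website fans, and the approval rates are all invented for illustration, and I'm assuming that everyone who is a fan of the site clicks the link.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 200,000 people (all numbers are made up):
# 30% are fans of a website; fans approve of the president at 80%,
# everyone else at 40%.
population = []
for _ in range(200_000):
    is_fan = rng.random() < 0.30
    approves = rng.random() < (0.80 if is_fan else 0.40)
    population.append((is_fan, approves))

true_rate = sum(approves for _, approves in population) / len(population)

# Huge self-selected sample: only fans of the site click the link.
click_sample = [approves for is_fan, approves in population if is_fan]
click_estimate = sum(click_sample) / len(click_sample)

# Small random sample: 2,000 people, each with the same chance of selection.
random_sample = rng.sample(population, 2_000)
random_estimate = sum(approves for _, approves in random_sample) / len(random_sample)

print(f"true approval rate:       {true_rate:.3f}")
print(f"click sample (n={len(click_sample)}): {click_estimate:.3f}")
print(f"random sample (n={len(random_sample)}):  {random_estimate:.3f}")
```

In this setup the click sample is many times larger than the random one, yet its estimate sits far from the true rate because it only hears from fans. The 2,000 randomly selected people should land within a few percentage points of the truth.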