This lesson covers the definition, formula, and use of the sample proportion. It also briefly introduces the margin of error and how to choose a sample size.
Sample and Population Proportions
Mr. Smith is a poll worker on the campaign of Bill Jones. He wants to predict the percentage of the state’s citizens who will vote for Mr. Jones. But there are millions of voters in the state, and he cannot poll every single voter. So he must estimate the population proportion by taking a sample (polling).
Proportion is the decimal form of a percentage, so 100% would be a proportion of 1.000, 50% would be a proportion of 0.500, and so on. The proportion of the population voting for Mr. Jones is represented by the symbol p. The proportion has the following formula:
p = (number of favorable outcomes) / (number of outcomes in the population)
If we are talking about the proportion of a sample rather than a population, then we would use this slightly modified formula:
p-hat = (number of favorable outcomes in the sample) / (number of outcomes in the sample)
In this second formula, the symbol on the left is called p-hat and refers to the proportion of a sample of the population, as opposed to the proportion of the whole population. While the formulas look very similar, the difference is important: in many situations, polling or sampling the entire population is difficult, too expensive, or impossible.
Also note that in statistics, the word ‘favorable’ simply refers to the outcomes we are studying and is not always what we would consider a ‘good’ outcome.
Mr. Smith polls a sample of n = 1,000 likely voters in the state. Of those polled, 520 say that they intend to vote for Mr. Jones. This means that we have 520 favorable outcomes out of the 1,000 polled, so p-hat = 520 / 1,000 = 0.520.
Since this is only an estimate of the true proportion voting for Jones, it is likely different from p. In statistics, we call this sampling error. Error in this case does not mean a mistake. It refers to a difference between the sample and population that results from not having access to the entire population.
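The arithmetic behind Mr. Smith's estimate can be sketched in a few lines of Python. The function name `sample_proportion` is only an illustrative choice, not something defined in the lesson.

```python
def sample_proportion(favorable: int, n: int) -> float:
    """Return p-hat: favorable outcomes in the sample divided by the sample size."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return favorable / n


# Mr. Smith's poll: 520 of the 1,000 likely voters favor Mr. Jones.
p_hat = sample_proportion(520, 1000)
print(f"p-hat = {p_hat:.3f}")  # prints "p-hat = 0.520"
```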
To explore sampling further, let’s suppose that in reality 54% of the population will vote for Mr. Jones, so p = 0.540. We could simulate our sampling process by polling 1,000 voters, measuring the proportion of ‘yes’ votes, and then repeating that process 100 times, each time with a new sample of 1,000 voters. We list the first 10 results:
| Sample # | Proportion of 1,000 Voting for Jones | Percent of 1,000 Voting for Jones |
A histogram graph of all 100 samples of 1,000 voters can be seen here:
The horizontal axis shows the sample proportion; the vertical axis shows the number of times that sample proportion occurred. Every one of our 100 samples resulted in between 51% and 57% voting for Jones, and as expected, the graph is centered on 54% since that is the population proportion.
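A repeated-sampling experiment like this one can be sketched with Python's standard `random` module. This is only an illustration: the function name `simulate_polls` and the fixed seed are assumptions, each simulated voter is treated as an independent "yes" with probability p, and the numbers it prints will not match the lesson's table exactly.

```python
import random


def simulate_polls(p: float, n: int, num_samples: int, seed: int = 0) -> list:
    """Simulate repeated polls and return the sample proportion from each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each polled voter says "yes" with probability p.
        yes_votes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(yes_votes / n)
    return proportions


# 100 polls of 1,000 voters each, with a true population proportion of 0.540.
proportions = simulate_polls(p=0.54, n=1000, num_samples=100)
print("first 10:", [f"{x:.3f}" for x in proportions[:10]])
print("smallest:", min(proportions), "largest:", max(proportions))
print("average:", sum(proportions) / len(proportions))
```

The smallest and largest values show how tightly the sample proportions cluster around p, and the average should land close to 0.540.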
Is the Sample Representative of the Population?
As you can see, there is very little variation even though we only sampled 1,000 voters. As surprising as this may seem, if you want to estimate the proportion to within plus or minus 0.03, or 3 percentage points, you only need a sample of about 1,100 regardless of the population size.
It is important, however, to have a sample that is representative of the population. Suppose you want to know what percentage of the U.S. population feels that men and women get equal pay. You should make sure that your sample includes a roughly equal number of people of each gender, or you may wind up with a deceptively high or low percentage.
How to Determine the Needed Sample Size
Keep in mind that if your sample proportion is going to be used reliably to estimate the population proportion, then the product of the sample size and the smaller of p-hat and 1 – p-hat must be at least 5.
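This rule of thumb is easy to check in code. The sketch below uses a hypothetical helper name, `is_large_enough`, and the second call uses made-up numbers purely to show a failing case.

```python
def is_large_enough(n: int, p_hat: float, threshold: float = 5) -> bool:
    """Check that n times the smaller of p-hat and 1 - p-hat is at least the threshold."""
    return n * min(p_hat, 1 - p_hat) >= threshold


# Mr. Smith's poll: 1,000 * 0.480 = 480, comfortably above 5.
print(is_large_enough(1000, 0.520))  # True

# A hypothetical small poll: 20 * 0.10 = 2, which is below 5.
print(is_large_enough(20, 0.10))  # False
```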
Another important step is to determine how big a sample you need. We can use this formula, where E is the desired margin of error expressed as a decimal:
n = 4 × p-hat × (1 – p-hat) / E^2
Of course, we don’t know what our sample proportion is since we haven’t taken a sample yet, and we haven’t taken a sample yet because we don’t know what our sample size is – kind of a Catch-22! So we assume a sample proportion of 0.500. If the actual proportion is different, the n we get will be an overestimate if anything. So a p-hat of 0.500 is a good conservative assumption to make.
Incidentally, what if n is a fraction (which it probably will be)? In that case, we always round up. If our formula gives us n = 10.001, we would need to sample 11. Rounding down to 10 would give us an error higher than the one we are looking for.
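The sample-size reasoning can be sketched as follows. The code assumes the formula n = 4 × p-hat × (1 – p-hat) / E^2, a common version for roughly 95% confidence that becomes 1 / E^2 when p-hat = 0.500. Both that assumption and the function name `required_sample_size` are illustrative.

```python
import math


def required_sample_size(margin_of_error: float, p_hat: float = 0.5) -> int:
    """Estimate the sample size needed for a given margin of error.

    Assumes n = 4 * p_hat * (1 - p_hat) / E^2 and always rounds up.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin of error must be a decimal between 0 and 1")
    n = 4 * p_hat * (1 - p_hat) / margin_of_error ** 2
    return math.ceil(n)  # round up so the error is not larger than requested


# Conservative assumption p-hat = 0.500: 1 / 0.0009 = 1111.1..., rounded up.
print(required_sample_size(0.03))  # 1112

# With p-hat = 0.520 the requirement is slightly smaller: 1109.3..., rounded up.
print(required_sample_size(0.03, 0.52))  # 1110
```

The second call shows why 0.500 is the conservative choice: any other p-hat gives a smaller n, so the 0.500 assumption can only overestimate the sample needed.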
Lesson Summary
Proportion is the decimal form of the percentage of a population or sample meeting a certain criterion. Since we often don’t have the access or resources to poll or sample an entire population, we use the sample proportion to estimate the population proportion. It is important to keep sampling error in mind, which is the difference between the sample and population that results from not having access to the entire population.
Optimally, we want the product of the sample size and the smaller of p-hat or 1 – p-hat to be at least 5.
Although estimating the sample size needed for a poll is a complicated topic, a basic rule of thumb is to use the formula n = 1 / E^2, where E is the margin of error expressed as a decimal. Just as important as having a good sample size is making sure that we have an unbiased sample, in other words, that our sample is representative of the population.
After completing this lesson, students should be able to:
- Define proportion
- Describe how to calculate a sample proportion
- Identify the meaning of sampling error