
Unit 5 Overview: Sampling Distributions

4 min read, December 31, 2022

Jed Quiaoit

Josh Argo

Harrison Burnside



"This unit applies probabilistic reasoning to sampling, introducing students to sampling distributions of statistics they will use when performing inference in Units 6 and 7. Students should understand that sample statistics can be used to estimate corresponding population parameters and that measures of center (mean) and variability (standard deviation) for these sampling distributions can be determined directly from the population parameters when certain sampling criteria are met. For large enough samples from any population, these sampling distributions can be approximated by a normal distribution. Simulating sampling distributions helps students to understand how the values of statistics vary in repeated random sampling from populations with known parameters." -- College Board, AP Statistics course description

What is a Sampling Distribution?

A sampling distribution is the distribution of a statistic across ALL possible samples of a given size: we compute the statistic for every sample and put those sample statistics together as a data set.
For example, let's say we are looking at the average number of snap peas per plant in a field. If we take all possible samples of 30 plants, average each sample, and then average those averages together, we get exactly the population parameter (the population mean), although collecting every possible sample is unrealistic in practice. Sampling distributions are important because they lead the way to statistical inference: the act of making a prediction or testing a claim regarding a population parameter.
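To make "all possible samples" concrete, here is a minimal Python sketch on a tiny population of five snap-pea counts. The counts are made up purely for illustration. The sketch lists every sample of size 2 drawn without replacement, computes each sample mean, and then averages those means, using exact fractions so there is no rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population: snap peas counted on each of 5 plants (made-up data)
population = [3, 5, 6, 8, 10]
sample_size = 2

# Every possible sample of size 2, drawn without replacement
samples = list(combinations(population, sample_size))

# The sampling distribution: one sample mean per possible sample (exact fractions)
sample_means = [Fraction(sum(s), sample_size) for s in samples]

population_mean = Fraction(sum(population), len(population))
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(len(samples))          # 10
print(population_mean)       # 32/5
print(mean_of_sample_means)  # 32/5
```

The average of the 10 sample means matches the population mean of 6.4 exactly. For a real field, listing every sample like this would be impractical.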
https://cdn.pixabay.com/photo/2017/10/25/22/29/bayesian-2889576_960_720.png

image courtesy of: pixabay.com

Sampling Distribution for Proportions

The first type of sampling distribution you will encounter is a sampling distribution for proportions used to estimate a population proportion.
For a sampling distribution for proportions, we take the sample proportion from all possible samples of our given size. The mean of those sample proportions, which is the center of the sampling distribution, equals the population proportion p. Our standard deviation is found using the formula given on the reference page, √(p(1 − p)/n), where n is the sample size. Once you have those two things, you have the crux of a sampling distribution for a sample proportion.
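The center p and the spread √(p(1 − p)/n) can be checked numerically. The sketch below assumes independent draws, which gives a binomial model for the number of successes. The 10% condition is what lets us treat the draws this way. The values n = 50 and p = 0.3 are made up, and `phat_distribution` is a hypothetical helper, not a library function.

```python
from math import comb, isclose, sqrt

def phat_distribution(n, p):
    """Exact sampling distribution of p-hat = k/n for n independent draws
    with success probability p (binomial model). Hypothetical helper."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

n, p = 50, 0.3  # illustrative values, not taken from the original text
dist = phat_distribution(n, p)

# Mean and standard deviation of the sampling distribution of p-hat
mean_phat = sum(value * prob for value, prob in dist.items())
sd_phat = sqrt(sum((value - mean_phat) ** 2 * prob for value, prob in dist.items()))

print(round(mean_phat, 6))              # 0.3
print(round(sd_phat, 6))                # agrees with the next line
print(round(sqrt(p * (1 - p) / n), 6))  # the reference-page formula
```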

Conditions for Sampling Distribution

As we get into statistical inference, you'll find that sampling distributions hinge on certain conditions that make them an accurate portrayal of how sample proportions vary around our population proportion.

Random

The first and possibly most important condition for creating a sampling distribution is that our sample is randomly selected. If our sample is not randomly selected, all of our math and calculations are for naught because our point estimate, or sample statistic, may be biased. 😱

Independence (10% Condition)

In order for the standard deviation formula to be accurate, the individual observations in our sample have to be independent of one another. Since we are sampling without replacement, this is technically impossible. However, by checking the 10% condition, we can determine that the amount of dependence is so small that our observations are essentially independent.
In order to check this condition, you need to make sure that the population is at least 10 times our sample size (in other words, the sample is no more than 10% of the population)! ✅

Normality (Large Counts Condition)

In order to eventually calculate the probability of obtaining certain samples using a sampling distribution, we need to verify that our sampling distribution is approximately normal.
For categorical data (proportions), we need to check the large counts condition, which states that the expected numbers of successes and failures are both at least 10. In other words, np ≥ 10 and n(1 − p) ≥ 10.
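The three checks for proportions can be bundled into a small Python sketch. `check_proportion_conditions` is a hypothetical helper written for illustration, and the school example values are made up. Whether a sample is random cannot be computed from the numbers, so it is passed in as a yes/no judgment.

```python
def check_proportion_conditions(n, p, population_size, is_random):
    """Check the conditions for the sampling distribution of p-hat.
    Hypothetical helper for illustration only."""
    return {
        # Random: a judgment about how the sample was collected
        "random": is_random,
        # 10% condition: population at least 10 times the sample size
        "independence (10%)": population_size >= 10 * n,
        # Large counts: expected successes np and failures n(1 - p) both at least 10
        "normality (large counts)": n * p >= 10 and n * (1 - p) >= 10,
    }

# Made-up example: random sample of 80 students from a school of 1,200, with p = 0.15
print(check_proportion_conditions(n=80, p=0.15, population_size=1200, is_random=True))
```

Here np = 12 and n(1 − p) = 68, and 1,200 is at least 10 × 80, so every check passes. With n = 40 and p = 0.1, the expected number of successes would be only 4 and the large counts check would fail.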

Sampling Distribution for Means

When dealing with means, our center is the average of all of our sample means from all possible samples of size n. In other words, it's the average of the averages, and it equals the population mean μ. Our standard deviation is found by dividing our population standard deviation by the square root of our sample size, σ/√n. As our sample size increases, our standard deviation decreases. That is a big part of why a large sample size is vital for accurately estimating our population mean. 🤓
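Here is a minimal exact check of the σ/√n rule. It assumes a tiny made-up population and sampling with replacement, so the observations are truly independent, which is the situation the 10% condition approximates. To keep the arithmetic exact with fractions, it compares variances, which are the squares of standard deviations. A standard deviation of σ/√n is the same as a variance of σ²/n.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population (made-up values); samples are drawn WITH replacement
# so that observations are independent, as the sigma / sqrt(n) formula assumes
population = [3, 5, 6, 8, 10]

def variance(values):
    """Population variance of a list of numbers (divide by N, not N - 1)."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

sigma_squared = variance(population)  # 146/25

for n in (1, 2, 3, 4):
    # Every ordered sample of size n drawn with replacement, and its mean
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    # The spread of the sampling distribution shrinks: variance is sigma^2 / n
    print(n, variance(means), sigma_squared / n)
```

Each printed row shows the same fraction twice, and the variance of the sample means drops as n grows.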

Conditions for Sampling Distribution

As with proportions, sampling distributions for means hinge on certain conditions that make them an accurate portrayal of how sample means vary around our population mean.

Random

Just as with estimating population proportions, it is essential that our sampling distribution is based on random samples. No mathematics or fancy statistics can "fix" a biased sample. 😕

Independence (10% Condition)

Again, we check the 10% condition the same way as we do for population proportions: the population must be at least 10 times our sample size.

Normality (Central Limit Theorem)

Our check that the sampling distribution is approximately normal is different from our condition for population proportions. To make sure the sampling distribution of our sample mean is approximately normal, we must verify one of two things: either our population is normally distributed, or our sample size is at least 30. The second option comes from the central limit theorem. It says that for large enough samples, the sampling distribution of the sample mean is approximately normal no matter the shape of the population.
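The checks for means can also be written as a small Python sketch. `check_mean_conditions` is a hypothetical helper, and the example values are made up. As with proportions, randomness and the shape of the population are judgments passed in as yes/no values. The normality check passes if either the population is normal or n ≥ 30.

```python
def check_mean_conditions(n, population_size, is_random, population_is_normal):
    """Check the conditions for the sampling distribution of a sample mean.
    Hypothetical helper for illustration only."""
    return {
        "random": is_random,
        # 10% condition: population at least 10 times the sample size
        "independence (10%)": population_size >= 10 * n,
        # Normal population, OR a large sample (n >= 30) via the central limit theorem
        "normality": population_is_normal or n >= 30,
    }

# Made-up example: random sample of 35 bags from a shipment of 2,000, shape unknown
print(check_mean_conditions(n=35, population_size=2000,
                            is_random=True, population_is_normal=False))
```

With only 12 bags and an unknown population shape, the normality check would fail. It would pass if the population were known to be normal.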

Sampling Distributions for the Differences in Means and Proportions

The last type of sampling distribution we encounter is used when we are checking whether there is a difference between two populations. In this type of sampling distribution, our center is the difference between the two population parameters (μ1 − μ2 or p1 − p2), which is 0 if the two populations are not different. The necessary formulas for the center and spread of these sampling distributions can be found on the reference page. This plays a huge part in statistical inference when checking whether two populations are in fact different, which is essential in experimental studies.
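The standard formulas for two independent random samples, which should match the reference page, are as follows. For means, the center is μ1 − μ2 and the spread is √(σ1²/n1 + σ2²/n2). For proportions, the center is p1 − p2 and the spread is √(p1(1 − p1)/n1 + p2(1 − p2)/n2). Below is a minimal Python sketch of both. The function names are hypothetical, and the example values are made up.

```python
from math import sqrt

def diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2):
    """Center and spread of the sampling distribution of x-bar1 - x-bar2,
    assuming two independent random samples. Hypothetical helper."""
    center = mu1 - mu2
    spread = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return center, spread

def diff_of_proportions(p1, n1, p2, n2):
    """Center and spread of the sampling distribution of p-hat1 - p-hat2,
    assuming two independent random samples. Hypothetical helper."""
    center = p1 - p2
    spread = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return center, spread

# Made-up parameters: equal population means, so the center is 0
print(diff_of_means(mu1=50, sigma1=6, n1=36, mu2=50, sigma2=8, n2=64))
# center 0, spread sqrt(36/36 + 64/64) = sqrt(2)
```

When the two populations have the same mean (or proportion), the center is 0. This is the "no difference" situation described above.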

Conditions for Inference

In order to check the conditions for inference when there are two samples, you are basically doing the same checks as above, but twice: checking randomness, independence, and normality for each sample. 🏡