Chapter 7 — Sampling Distributions

Lecture Notes

1. Sampling Distribution of \(\bar{X}\)

When we repeatedly draw samples of size \(n\) from a population and compute the sample mean \(\bar{X}\) each time, the distribution of all those sample means is called the sampling distribution of \(\bar{X}\).

[Diagram: samples are drawn from the population (\(\mu, \sigma\)) at four universities, giving CSUF → \(\bar{X}_1\), UCI → \(\bar{X}_2\), USC → \(\bar{X}_3\), UCLA → \(\bar{X}_4\).]

Each university yields a sample mean; together they form the sampling distribution.

Key Formulas

Mean: \(\mu_{\bar{X}} = \mu\)
Standard error: \(\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\)
Distribution: \(\bar{X} \sim N\!\left(\mu,\;\dfrac{\sigma}{\sqrt{n}}\right)\)
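The two formulas above can be checked empirically. The following is a minimal simulation sketch, not part of the original notes: the population values (`mu = 100`, `sigma = 15`), the sample size, the number of repetitions, and the helper name `simulate_sample_means` are all illustrative assumptions, and the population is assumed to be Normal.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

# Hypothetical population values, chosen only for illustration.
mu, sigma, n = 100.0, 15.0, 25
means = simulate_sample_means(mu, sigma, n, reps=5000)

# The mean of the sample means should land close to mu,
# and their standard deviation close to sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 2))
print("sd of sample means:  ", round(statistics.stdev(means), 2))
print("theoretical SE:      ", sigma / n ** 0.5)
```

The simulated spread will not match \(\sigma/\sqrt{n}\) exactly, but it should get closer as the number of repetitions grows.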

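When Is It Normal?

\(\bar{X}\) is Normal if the population is Normal; otherwise it is approximately Normal when \(n \ge 30\) (Central Limit Theorem).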

2. Sampling Distribution of \(\hat{p}\)

For a categorical variable (yes/no), we estimate the population proportion \(p\) with the sample proportion \(\hat{p}\). Repeated sampling gives a distribution of \(\hat{p}\) values.

Key Formulas

Mean: \(\mu_{\hat{p}} = p\)
Standard error: \(\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}\)
Distribution: \(\hat{p} \sim N\!\left(p,\;\sqrt{\dfrac{p(1-p)}{n}}\right)\)

Normality Check

The sampling distribution of \(\hat{p}\) is approximately Normal when both conditions hold:

\(np \ge 5 \quad\text{and}\quad n(1-p) \ge 5\)
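As a sketch of how the standard error and the normality check for \(\hat{p}\) might be computed, the two hypothetical helpers below (`p_hat_standard_error` and `normal_approx_ok` are names introduced here, not from the notes) apply the formulas directly. The example values of \(p\) and \(n\) are made up for illustration.

```python
import math

def p_hat_standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def normal_approx_ok(p, n, threshold=5):
    """True when both np and n(1-p) reach the threshold used in these notes."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Hypothetical inputs for illustration.
print(normal_approx_ok(0.50, 40))   # np = 20, n(1-p) = 20, so both conditions hold
print(normal_approx_ok(0.05, 40))   # np = 2, so the first condition fails
print(p_hat_standard_error(0.50, 100))
```

The threshold is a parameter because some textbooks use a stricter cutoff than 5.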

3. Z-Score Conversions

To find probabilities, convert to the standard Normal distribution using the appropriate Z-score formula.

For \(\bar{X}\): \(Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}\)
For \(\hat{p}\): \(Z = \dfrac{\hat{p} - p}{\sigma_{\hat{p}}}\)
[Figure: Normal curve centered at \(\mu\) (or \(p\)), with \(-Z\) and \(+Z\) marked on either side.]
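The two conversions can be written as small functions. This is an illustrative sketch: the function names and the input values are assumptions, not part of the notes.

```python
import math

def z_for_mean(x_bar, mu, sigma, n):
    """Z-score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

def z_for_proportion(p_hat, p, n):
    """Z-score of a sample proportion: (p_hat - p) / sqrt(p(1-p)/n)."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)

# Hypothetical values: both cases sit about one standard error above center.
print(z_for_mean(103, mu=100, sigma=15, n=25))   # (103 - 100) / 3
print(z_for_proportion(0.55, p=0.50, n=100))     # 0.05 / 0.05, up to float rounding
```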

4. Problem Types

N1 (Left Tail): \(P(\bar{X} < a)\) or \(P(\hat{p} < a)\)
N2 (Right Tail): \(P(\bar{X} > a)\) or \(P(\hat{p} > a)\)
N3 (Between): \(P(a < \bar{X} < b)\) or \(P(a < \hat{p} < b)\)
Tip: For N2 (right-tail) problems, use \(P(Z > z) = 1 - P(Z < z)\). For N3 (between) problems, find the left-tail area at each endpoint and subtract: \(P(z_1 < Z < z_2) = P(Z < z_2) - P(Z < z_1)\).
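In place of a Z-table, the three problem types can be sketched with `statistics.NormalDist` from the Python standard library. The function names `left_tail`, `right_tail`, and `between` are labels introduced here for N1, N2, and N3.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

def left_tail(z):
    """N1: P(Z < z)."""
    return Z.cdf(z)

def right_tail(z):
    """N2: P(Z > z) = 1 - P(Z < z)."""
    return 1 - Z.cdf(z)

def between(z_low, z_high):
    """N3: P(z_low < Z < z_high) as a difference of two left-tail areas."""
    return Z.cdf(z_high) - Z.cdf(z_low)

print(left_tail(0.0))
print(round(right_tail(1.96), 4))
print(round(between(-1.96, 1.96), 4))
```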

5. Worked Examples

Example 1 — Fullerton Household Incomes

Problem Setup

Let \(X\) = Fullerton household income. The population has \(\mu = 72{,}000\) and \(\sigma = 6{,}000\). Samples are drawn from various universities (CSUF, UCI, USC, UCLA). Describe the sampling distribution of \(\bar{X}\).

Mean of \(\bar{X}\): \(\mu_{\bar{X}} = \mu = 72{,}000\)
Standard error: \(\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{6{,}000}{\sqrt{n}}\)
Shape: Normal if the population is Normal; approximately Normal if \(n \ge 30\) (CLT).
Key Insight: The mean of the sampling distribution always equals the population mean, regardless of sample size. Increasing \(n\) only reduces the spread.
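To see the spread shrink as \(n\) grows, the short sketch below evaluates \(6{,}000/\sqrt{n}\) for a few sample sizes. The sample sizes are hypothetical, since the example leaves \(n\) unspecified.

```python
import math

sigma = 6_000  # population standard deviation from Example 1

# Hypothetical sample sizes; the example itself does not fix n.
standard_errors = {n: sigma / math.sqrt(n) for n in (25, 100, 400)}

for n, se in standard_errors.items():
    print(f"n = {n:>3}: standard error = {se}")
# n =  25: standard error = 1200.0
# n = 100: standard error = 600.0
# n = 400: standard error = 300.0
```

Quadrupling the sample size halves the standard error, while the center stays at \(72{,}000\).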

Example 2 — Trader Joe's Customers

Problem Setup

76% of Trader Joe's customers read ingredients before purchasing (\(p = .76\)). A random sample of \(n = 400\) customers is selected.

Standard error: \(\sigma_{\hat{p}} = \sqrt{\dfrac{.76 \times .24}{400}} = \sqrt{\dfrac{.1824}{400}} = \sqrt{.000456} = .0214\)
Normality check: \(400 \times .76 = 304 \ge 5\) and \(400 \times .24 = 96 \ge 5\)
Distribution: \(\hat{p} \sim N(.76,\;.0214)\)

(i) Find \(P(\hat{p} > .75)\)  — N2

Convert: \(Z = \dfrac{.75 - .76}{.0214} = \dfrac{-.01}{.0214} = -.4673\)
\(P(\hat{p} > .75) = P(Z > -.4673) = P(Z < .4673) = .6799\)

(ii) Find \(P(.73 < \hat{p} < .79)\)  — N3

Lower: \(Z_1 = \dfrac{.73 - .76}{.0214} = -1.4019\)
Upper: \(Z_2 = \dfrac{.79 - .76}{.0214} = 1.4019\)
\(P(.73 < \hat{p} < .79) = P(-1.40 < Z < 1.40) = 2 \times P(Z < 1.40) - 1 \approx .8385\)

(iii) Find \(P(\hat{p} < .75)\)  — N1

From part (i): \(Z = -.4673\)
\(P(\hat{p} < .75) = P(Z < -.4673) = 1 - .6799 = .3201\)
Key Insight: Parts (i) and (iii) are complements — they must sum to 1. This is a great self-check.
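The three parts can be cross-checked in software. The sketch below uses `statistics.NormalDist` with the unrounded standard error, so its results may differ from the hand calculation in the third or fourth decimal place (the notes round the standard error to .0214 and, in part (ii), the Z-scores to 1.40).

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.76, 400
se = sqrt(p * (1 - p) / n)            # about .0214
p_hat = NormalDist(mu=p, sigma=se)    # approximate sampling distribution of p-hat

right = 1 - p_hat.cdf(0.75)                    # (i)   P(p-hat > .75)
middle = p_hat.cdf(0.79) - p_hat.cdf(0.73)     # (ii)  P(.73 < p-hat < .79)
left = p_hat.cdf(0.75)                         # (iii) P(p-hat < .75)

print(round(se, 4), round(right, 4), round(middle, 4), round(left, 4))
print("parts (i) and (iii) sum to:", right + left)
```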

Example 3 — First-Time Amazon Customers

Problem Setup

30% of customers are first-time Amazon buyers (\(p = .30\)). A sample of \(n = 100\) is drawn.

Standard error: \(\sigma_{\hat{p}} = \sqrt{\dfrac{.30 \times .70}{100}} = \sqrt{\dfrac{.21}{100}} = \sqrt{.0021} = .0458\)
Normality check: \(100 \times .30 = 30 \ge 5\) and \(100 \times .70 = 70 \ge 5\)
Distribution: \(\hat{p} \sim N(.30,\;.0458)\)
Key Insight: Even with a relatively low proportion (\(p = .30\)), a sample of 100 is more than enough to satisfy the normality conditions.
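A quick simulation can show what this distribution looks like in practice. The sketch below is an illustration, not part of the notes: it draws many hypothetical samples of 100 customers, each a first-time buyer with probability .30, and summarizes the resulting \(\hat{p}\) values. The helper name, the seed, and the number of repetitions are assumptions.

```python
import random
import statistics

def simulate_p_hats(p, n, reps, seed=0):
    """Return `reps` sample proportions, each from n independent yes/no draws."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(reps)
    ]

p_hats = simulate_p_hats(p=0.30, n=100, reps=5000)

# Expect a mean near .30 and a standard deviation near .0458.
print("mean of p-hat values:", round(statistics.fmean(p_hats), 4))
print("sd of p-hat values:  ", round(statistics.stdev(p_hats), 4))
```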

Example 4 — Fullerton Solar Energy

Problem Setup

20% of Fullerton households use solar energy (\(p = .20\)). A sample of \(n = 100\) is drawn.

Standard error: \(\sigma_{\hat{p}} = \sqrt{\dfrac{.20 \times .80}{100}} = \sqrt{\dfrac{.16}{100}} = \sqrt{.0016} = .04\)
Normality check: \(100 \times .20 = 20 \ge 5\) and \(100 \times .80 = 80 \ge 5\)
Distribution: \(\hat{p} \sim N(.20,\;.04)\)
Key Insight: The standard error \(\sigma_{\hat{p}}\) is also called the standard deviation of \(\hat{p}\) — both terms are used interchangeably.
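A short derivation shows why the two names coincide. It assumes that \(X\), the number of "yes" responses in the sample (a symbol introduced here, not used in the notes above), follows a Binomial\((n, p)\) distribution, so that \(\hat{p} = X/n\):

\[
\operatorname{Var}(\hat{p}) = \operatorname{Var}\!\left(\dfrac{X}{n}\right) = \dfrac{np(1-p)}{n^2} = \dfrac{p(1-p)}{n}
\quad\Longrightarrow\quad
\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}
\]

With \(p = .20\) and \(n = 100\) this gives \(\sqrt{.0016} = .04\), matching the standard error computed in this example.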

Example 5 — Car Insurance

Problem Setup

The mean annual cost of car insurance is \(\mu = \$939\) with \(\sigma = \$245\). A random sample of \(n = 50\) policies is selected.

(a) Mean: \(\mu_{\bar{X}} = \mu = 939\)
(b) Standard error: \(\sigma_{\bar{X}} = \dfrac{245}{\sqrt{50}} = \dfrac{245}{7.071} = 34.65\)
(c) Shape: Approximately Normal, since \(n = 50 \ge 30\) (CLT). Thus \(\bar{X} \sim N(939,\;34.65)\), approximately.

(d) Find \(P(\bar{X} < 964)\)

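[Figure: sampling distribution of \(\bar{X}\) centered at 939, with 964 marked and the area \(P(\bar{X} < 964)\) indicated.]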
Convert to Z: \(Z = \dfrac{964 - 939}{34.65} = \dfrac{25}{34.65} = 0.72\)
Look up: \(P(\bar{X} < 964) = P(Z < 0.72) = .7642\)
Key Insight: There is about a 76.4% chance that the sample mean insurance cost is less than $964 — even though individual policies vary widely (\(\sigma = 245\)), the sampling distribution is much tighter (\(\sigma_{\bar{X}} = 34.65\)).
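Parts (a) through (d) can be reproduced without a Z-table. The sketch below uses `statistics.NormalDist` and keeps the Z-score unrounded, so the probability may differ slightly from the table value of .7642.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 939, 245, 50

se = sigma / sqrt(n)                   # (b) standard error, about 34.65
x_bar = NormalDist(mu=mu, sigma=se)    # (c) approximate distribution of the sample mean

z = (964 - mu) / se                    # about 0.72
prob = x_bar.cdf(964)                  # (d) P(X-bar < 964)

print(round(se, 2), round(z, 2), round(prob, 4))
```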

📝 Practice Problems

Test your understanding.

1. A population has \(\mu = 500\) and \(\sigma = 80\). If you take a sample of \(n = 64\), what is the standard error \(\sigma_{\bar{X}}\)?

2. A poll finds that \(p = 0.45\) of voters support a measure. If \(n = 200\), what is \(\sigma_{\hat{p}}\)?

3. Can you use the Normal approximation for \(\hat{p}\) if \(p = 0.02\) and \(n = 100\)?

4. Household income has \(\mu = \$50{,}000\) and \(\sigma = \$12{,}000\). For a sample of \(n = 36\), find \(P(\bar{X} > 53{,}000)\).

5. In a city, 60% of residents recycle. A sample of \(n = 150\) is taken. Find \(P(\hat{p} < 0.55)\).

6. Which is larger: the standard error when \(n = 25\) or when \(n = 100\)? Why?
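The notes do not include worked answers, so the sketch below is one way to check your own results after attempting the problems. It applies the chapter's formulas directly; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal

# Problem 1: standard error of the sample mean
se1 = 80 / sqrt(64)

# Problem 2: standard error of the sample proportion
se2 = sqrt(0.45 * 0.55 / 200)

# Problem 3: normality check with the np >= 5 and n(1-p) >= 5 rule
ok3 = 100 * 0.02 >= 5 and 100 * 0.98 >= 5

# Problem 4: right-tail probability for a sample mean
z4 = (53_000 - 50_000) / (12_000 / sqrt(36))
p4 = 1 - Z.cdf(z4)

# Problem 5: left-tail probability for a sample proportion
z5 = (0.55 - 0.60) / sqrt(0.60 * 0.40 / 150)
p5 = Z.cdf(z5)

# Problem 6: standard error in units of sigma for each sample size
se_25, se_100 = 1 / sqrt(25), 1 / sqrt(100)

print(se1, round(se2, 4), ok3)
print(round(z4, 2), round(p4, 4))
print(round(z5, 2), round(p5, 4))
print(se_25 > se_100)
```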