Sampling Theorems
Central Limit Theorem (CLT)
For a large sample size \(n\), the sample mean \(\bar{X}\) is approximately normally distributed:
\(\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)\)
Law of Large Numbers
As \(n \to \infty\): \(\bar{X} \to \mu\) (sample mean converges to population mean)
Standard Error
\(SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}\)
Key Point
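As a rule of thumb, the CLT approximation is usually adequate for \(n \geq 30\). It requires a finite population variance, and strongly skewed populations may need a larger \(n\).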
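The behaviour of the sample mean can be checked with a small simulation. This is a minimal sketch, not a proof: the exponential population (mean 1, standard deviation 1), the sample size of 50, and the 5000 repetitions are all illustrative assumptions. It compares the spread of the simulated sample means with the standard error \(\sigma/\sqrt{n}\).

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

RATE = 1.0   # assumed population: Exp(1), mean 1 and sd 1, clearly skewed
N = 50       # assumed size of each sample
REPS = 5000  # assumed number of sample means to collect


def sample_mean(n):
    """Mean of n independent draws from the exponential population."""
    return statistics.fmean(random.expovariate(RATE) for _ in range(n))


means = [sample_mean(N) for _ in range(REPS)]

observed_center = statistics.fmean(means)   # should sit near the population mean
observed_spread = statistics.stdev(means)   # should sit near sigma / sqrt(n)
predicted_spread = 1.0 / N ** 0.5           # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {observed_center:.3f} (population mean 1.0)")
print(f"sd of sample means:   {observed_spread:.3f} (predicted {predicted_spread:.3f})")
```

Even though the population is skewed, the sample means cluster around 1 with a spread close to the predicted standard error.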
Conditional Probability
Definition
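Probability of event A given that event B has occurred (requires \(P(B) > 0\)):
\(P(A|B) = \frac{P(A \cap B)}{P(B)}\)
Bayes’ Theorem
\(P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\)
Total Probability
For a partition \(A_1, \dots, A_k\) of the sample space: \(P(B) = \sum_{i=1}^{k} P(B|A_i) \cdot P(A_i)\)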
Independence
Events A and B are independent if \(P(A \cap B) = P(A) \cdot P(B)\); equivalently, when \(P(B) > 0\), \(P(A|B) = P(A)\)
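A short sketch of how total probability and Bayes' theorem combine in practice. The function names and the screening-test numbers (1% prevalence, 95% sensitivity, 10% false-positive rate) are hypothetical and chosen only for illustration.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum over i of P(B|A_i) * P(A_i), for a partition A_1..A_k."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes_posterior(priors, likelihoods, i):
    """P(A_i|B) = P(B|A_i) * P(A_i) / P(B)."""
    return priors[i] * likelihoods[i] / total_probability(priors, likelihoods)


# Hypothetical screening test.
priors = [0.01, 0.99]        # P(disease), P(no disease)
likelihoods = [0.95, 0.10]   # P(positive | disease), P(positive | no disease)

p_positive = total_probability(priors, likelihoods)
p_disease_given_positive = bayes_posterior(priors, likelihoods, 0)

print(f"P(positive) = {p_positive:.4f}")
print(f"P(disease | positive) = {p_disease_given_positive:.4f}")
```

With these assumed numbers, a positive result raises the probability of disease from 1% to roughly 9%, because most positives come from the much larger healthy group.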
Measures of Central Tendency
Mean, Median, Mode
Arithmetic Mean
\(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\)
Median
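- For odd \(n\): Middle value when the data is ordered
- For even \(n\): Average of the two middle values
- Position: \(\frac{n+1}{2}\)th term of the ordered data; for even \(n\) this falls halfway between the \(\frac{n}{2}\)th and \(\left(\frac{n}{2}+1\right)\)th terms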
Mode
Most frequently occurring value in the dataset
Relationship
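For skewed distributions the three generally differ: typically Mode < Median < Mean under right (positive) skew, and Mean < Median < Mode under left (negative) skew. For a symmetric unimodal distribution all three coincide.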
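The three measures can be computed with Python's standard `statistics` module. The data sets below are made-up examples: one small right-skewed sample with odd \(n\), and one with even \(n\) to show the averaging of the two middle values.

```python
import statistics

# Hypothetical right-skewed sample, n = 7 (odd); the value 12 pulls the mean up.
data = [2, 3, 3, 4, 5, 6, 12]

mean = statistics.mean(data)      # sum 35 divided by 7
median = statistics.median(data)  # 4th value of the ordered data
mode = statistics.mode(data)      # most frequent value

# Even n: the median is the average of the two middle values (3 and 5).
even = [1, 3, 5, 7]
median_even = statistics.median(even)

print(mean, median, mode)  # 5 4 3
print(median_even)         # 4.0
```

Here the ordering mode < median < mean matches the usual pattern for right-skewed data.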
Standard Deviation
Standard Deviation & Variance
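Population Variance
\(\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2\)
Sample Variance
\(s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\)
Standard Deviation
\(\sigma = \sqrt{\sigma^2}\) (population), \(s = \sqrt{s^2}\) (sample)
Alternative Formula
\(\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2\)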
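A brief sketch contrasting the population variance (divide by the number of values) with the sample variance (divide by one less), and checking the shortcut "mean of squares minus square of the mean". The data set is a made-up example with mean 5.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; squared deviations sum to 32

pop_var = statistics.pvariance(data)   # divides by N = 8
samp_var = statistics.variance(data)   # divides by n - 1 = 7
pop_sd = statistics.pstdev(data)       # square root of the population variance

# Alternative (computational) formula for the population variance.
mean_of_squares = sum(x * x for x in data) / len(data)
shortcut_var = mean_of_squares - statistics.mean(data) ** 2

print(f"population variance: {pop_var}")
print(f"sample variance:     {samp_var:.4f}")
print(f"population sd:       {pop_sd}")
print(f"shortcut variance:   {shortcut_var}")
```

The sample variance is always slightly larger than the population variance computed from the same values, since it divides the same sum of squares by a smaller number.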
Random Variables
Definition
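Random Variable (RV): A function that assigns a numerical value to each outcome of a random experiment
Expected Value
Discrete: \(E[X] = \sum_x x \cdot P(X = x)\); continuous: \(E[X] = \int_{-\infty}^{\infty} x f(x)\,dx\)
Variance
\(Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2\)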
Properties
- \(E[aX + b] = aE[X] + b\)
- \(E[X + Y] = E[X] + E[Y]\)
- If X, Y independent: \(Var(X + Y) = Var(X) + Var(Y)\)
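The properties above can be verified exactly on a small discrete example. This sketch assumes a fair six-sided die and uses `fractions.Fraction` so that no rounding is involved; the helper names are illustrative.

```python
from fractions import Fraction

# PMF of a fair six-sided die (assumed example).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expected_value(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = sum of (x - E[X])^2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


e_x = expected_value(pmf)   # 7/2
var_x = variance(pmf)       # 35/12

# Linearity: Y = aX + b has E[Y] = a E[X] + b.
a, b = 2, 3
pmf_y = {a * x + b: p for x, p in pmf.items()}
e_y = expected_value(pmf_y)

# Sum of two independent dice: joint probabilities multiply.
pmf_sum = {}
for x, px in pmf.items():
    for y, py in pmf.items():
        pmf_sum[x + y] = pmf_sum.get(x + y, 0) + px * py
var_sum = variance(pmf_sum)

print(e_x, var_x)            # 7/2 35/12
print(e_y == a * e_x + b)    # True
print(var_sum == 2 * var_x)  # True
```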
Discrete Distributions
Probability Mass Function (PMF)
\(p(x) = P(X = x)\), with \(p(x) \geq 0\) and \(\sum_x p(x) = 1\)
Cumulative Distribution Function (CDF)
\(F(x) = P(X \leq x) = \sum_{t \leq x} p(t)\)
Common Discrete Distributions
- Bernoulli: \(X \sim \text{Ber}(p)\)
- Binomial: \(X \sim \text{Bin}(n,p)\)
- Poisson: \(X \sim \text{Pois}(\lambda)\)
- Geometric: \(X \sim \text{Geom}(p)\)
Continuous Distributions
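Probability Density Function (PDF)
\(f(x) \geq 0\), \(\int_{-\infty}^{\infty} f(x)\,dx = 1\), and \(P(a \leq X \leq b) = \int_a^b f(x)\,dx\)
Cumulative Distribution Function (CDF)
\(F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t)\,dt\)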
Common Continuous Distributions
- Uniform: \(X \sim U(a,b)\)
- Normal: \(X \sim N(\mu, \sigma^2)\)
- Exponential: \(X \sim \text{Exp}(\lambda)\)
Poisson Distribution
Definition
Models the number of events in a fixed interval of time or space: \(X \sim \text{Pois}(\lambda)\)
PMF
\(P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}\), \(k = 0, 1, 2, \dots\)
Parameters
- \(E[X] = \lambda\)
- \(Var(X) = \lambda\)
- \(\sigma = \sqrt{\lambda}\)
Applications
- Number of calls per hour
- Number of defects per unit
- Approximation to Binomial when \(n\) large, \(p\) small, \(np = \lambda\)
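The binomial approximation in the last item can be checked numerically. This is a minimal sketch using the standard Poisson and binomial probability formulas; the values \(n = 1000\), \(p = 0.003\) (so \(\lambda = np = 3\)) are assumed purely for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) = lam^k * e^(-lam) / k! for X ~ Pois(lam)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 1000, 0.003   # assumed: n large, p small
lam = n * p          # matching Poisson rate, lambda = np = 3

for k in range(6):
    print(f"k={k}: binomial {binomial_pmf(k, n, p):.5f}, poisson {poisson_pmf(k, lam):.5f}")

# Largest pointwise difference over k = 0..20.
max_gap = max(abs(poisson_pmf(k, lam) - binomial_pmf(k, n, p)) for k in range(21))

# Sanity checks on the Poisson PMF: total mass and mean (truncated at k = 59).
total_mass = sum(poisson_pmf(k, lam) for k in range(60))
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
```

For these values the two probability tables agree to about three decimal places, and the Poisson mean comes out equal to \(\lambda\).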
Normal Distribution
Definition
\(X \sim N(\mu, \sigma^2)\) - Bell-shaped curve
Standard Normal
\(Z \sim N(0,1)\): \(Z = \frac{X - \mu}{\sigma}\)
Properties
- \(P(\mu - \sigma < X < \mu + \sigma) \approx 0.68\)
- \(P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95\)
- \(P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997\)
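These three probabilities can be reproduced from the normal CDF, which can be written in terms of the error function available as `math.erf`. The helper names and the example parameters \(\mu = 100\), \(\sigma = 15\) are assumptions for illustration; standardizing with \(Z = (X - \mu)/\sigma\) means the result does not depend on them.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via Z = (x - mu) / sigma and erf."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma)."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)


mu, sigma = 100.0, 15.0  # assumed example parameters
for k in (1, 2, 3):
    print(f"within {k} sd: {prob_within(k, mu, sigma):.4f}")
```

The printed values round to the 0.68, 0.95, and 0.997 figures listed above.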
Binomial Distribution
Definition
\(X\) counts the successes in \(n\) independent trials, each with success probability \(p\): \(X \sim \text{Bin}(n,p)\)
PMF
\(P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\), \(k = 0, 1, \dots, n\)
Parameters
- \(E[X] = np\)
- \(Var(X) = np(1-p)\)
- \(\sigma = \sqrt{np(1-p)}\)
Normal Approximation
When \(np \geq 5\) and \(n(1-p) \geq 5\): \(X \approx N(np, np(1-p))\)
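A quick numerical comparison of an exact binomial probability with its normal approximation. The values \(n = 40\), \(p = 0.5\) and the cutoff 22 are assumed for illustration. The sketch also applies a continuity correction (evaluating the normal CDF at 22.5 rather than 22), a common refinement that is not part of the rule stated above.

```python
import math


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p), summing the binomial probabilities."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


n, p = 40, 0.5                          # assumed example
rule_ok = n * p >= 5 and n * (1 - p) >= 5

mu = n * p                              # np
sigma = math.sqrt(n * p * (1 - p))      # sqrt(np(1 - p))

exact = binomial_cdf(22, n, p)
approx = normal_cdf(22 + 0.5, mu, sigma)  # continuity correction (assumption)

print(f"rule of thumb satisfied: {rule_ok}")
print(f"exact P(X <= 22):  {exact:.4f}")
print(f"normal approx:     {approx:.4f}")
```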
Correlation Analysis
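Pearson Correlation Coefficient
\(r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\)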
Properties
- \(-1 \leq r \leq 1\)
- \(r = 1\): Perfect positive correlation
- \(r = -1\): Perfect negative correlation
- \(r = 0\): No linear correlation
Interpretation
- \(|r| \geq 0.8\): Strong correlation
- \(0.5 \leq |r| < 0.8\): Moderate correlation
- \(|r| < 0.5\): Weak correlation
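A minimal sketch of computing the Pearson coefficient from the sums of squares and cross-products, \(r = S_{xy} / \sqrt{S_{xx} S_{yy}}\). The function name and the five data points are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy), using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)                              # S_xy = 6, S_xx = 10, S_yy = 6
r_line = pearson_r(xs, [2 * x + 1 for x in xs])    # points on an increasing line
r_neg = pearson_r(xs, [-3 * x + 7 for x in xs])    # points on a decreasing line

print(f"r = {r:.4f}")
print(f"increasing line: {r_line:.4f}, decreasing line: {r_neg:.4f}")
```

For this data \(r \approx 0.77\), which falls in the moderate band of the interpretation list, while points lying exactly on a line give \(r = \pm 1\).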
Regression Analysis
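Simple Linear Regression
\(\hat{y} = a + bx\), where \(a\) is the intercept and \(b\) is the slope
Least Squares Estimates
\(b = \frac{S_{xy}}{S_{xx}}\), \(a = \bar{y} - b\bar{x}\), with \(S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})\) and \(S_{xx} = \sum (x_i - \bar{x})^2\)
Coefficient of Determination
\(R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}\), which equals \(r^2\) in simple linear regression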
Key Point
The least squares regression line always passes through \((\bar{x}, \bar{y})\)
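A small sketch of a least squares fit, using slope \(b = S_{xy}/S_{xx}\) and intercept \(a = \bar{y} - b\bar{x}\), and confirming that the fitted line passes through the point of means. The function name and data are hypothetical; \(R^2\) is computed here as one minus the ratio of residual to total sum of squares.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Coefficient of determination: 1 - SSE / SST.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - sse / sst

print(f"y_hat = {a:.2f} + {b:.2f} x")
print(f"value at mean of x: {a + b * mean_x:.2f} (mean of y is {mean_y:.2f})")
print(f"R^2 = {r_squared:.2f}")
```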
Important Formulas Summary
Quick Formula Reference
Distribution Parameters
| Distribution | Mean | Variance |
|---|---|---|
| Binomial | \(np\) | \(np(1-p)\) |
| Poisson | \(\lambda\) | \(\lambda\) |
| Normal | \(\mu\) | \(\sigma^2\) |
| Uniform | \(\frac{a+b}{2}\) | \(\frac{(b-a)^2}{12}\) |
Key Relationships
- \(P(A|B) = \frac{P(A \cap B)}{P(B)}\)
- \(Var(X) = E[X^2] - (E[X])^2\)
- \(r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\)
- \(b = \frac{S_{xy}}{S_{xx}}\), \(a = \bar{y} - b\bar{x}\)