Probability and Statistics Notes for GATE Electrical Engineering (EE)

Sampling Theorems

Sampling Theorems

Central Limit Theorem (CLT)

For large sample size \(n\), the sample mean \(\bar{X}\) is approximately normally distributed:

\[\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\]

Law of Large Numbers

As \(n \to \infty\): \(\bar{X} \to \mu\) (sample mean converges to population mean)

Standard Error

\[SE = \frac{\sigma}{\sqrt{n}}\]
where \(\sigma\) is the population standard deviation

Key Point

As a common rule of thumb, the CLT approximation is considered adequate for \(n \geq 30\), regardless of the shape of the population distribution (provided the population variance is finite)
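A minimal Python sketch of the CLT and standard error (all numbers illustrative): sample means drawn from a non-normal Uniform(0, 1) population have a spread close to \(\sigma/\sqrt{n}\).

```python
import random
import statistics

# Parent population (illustrative choice): Uniform(0, 1), so mu = 0.5, sigma^2 = 1/12.
mu, sigma = 0.5, (1 / 12) ** 0.5

def sample_mean(n):
    """Mean of one random sample of size n drawn from Uniform(0, 1)."""
    return statistics.fmean(random.random() for _ in range(n))

random.seed(0)
for n in (5, 30, 200):
    means = [sample_mean(n) for _ in range(5000)]
    observed_se = statistics.stdev(means)      # spread of the sample means
    theoretical_se = sigma / n ** 0.5          # SE = sigma / sqrt(n)
    print(f"n={n:4d}  observed SE={observed_se:.4f}  theoretical SE={theoretical_se:.4f}")
```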

Conditional Probability

Conditional Probability

Definition

Probability of event A given event B has occurred:

\[P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0\]

Bayes’ Theorem

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]

Total Probability

\[P(B) = \sum_{i=1}^{n} P(B|A_i) \cdot P(A_i)\]
where \(A_1, A_2, \ldots, A_n\) are mutually exclusive and exhaustive events (a partition of the sample space)

Independence

Events A and B are independent if: \(P(A|B) = P(A)\) or \(P(A \cap B) = P(A) \cdot P(B)\)
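A short Python sketch combining the total probability rule with Bayes' theorem; the two-machine defect scenario and its numbers are hypothetical, chosen only to illustrate the formulas.

```python
# Hypothetical example: a factory has two machines.
# A1 produces 60% of items with a 2% defect rate; A2 produces 40% with a 5% defect rate.
priors = {"A1": 0.60, "A2": 0.40}          # P(A_i), a partition of the sample space
likelihoods = {"A1": 0.02, "A2": 0.05}     # P(B | A_i), B = "item is defective"

# Total probability: P(B) = sum_i P(B | A_i) * P(A_i)
p_b = sum(likelihoods[a] * priors[a] for a in priors)

# Bayes' theorem: P(A1 | B) = P(B | A1) * P(A1) / P(B)
p_a1_given_b = likelihoods["A1"] * priors["A1"] / p_b

print(f"P(B) = {p_b:.4f}")                 # 0.0320
print(f"P(A1 | B) = {p_a1_given_b:.4f}")   # 0.3750
```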

Measures of Central Tendency

Mean, Median, Mode

Arithmetic Mean

\[\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}\]

Median

  • For odd \(n\): Middle value when data is ordered

  • For even \(n\): Average of two middle values

  • Position: \(\frac{n+1}{2}\)th term of the ordered data

Mode

Most frequently occurring value in the dataset

Relationship

For a symmetric distribution: Mean \(=\) Median \(=\) Mode. For skewed distributions: Mean \(\neq\) Median \(\neq\) Mode, with the empirical relation Mode \(\approx\) 3 Median \(-\) 2 Mean for moderate skew
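A quick Python check of the three measures using the standard library statistics module (the dataset is illustrative):

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 3, 9]   # illustrative dataset (n = 8, even)

mean = statistics.fmean(data)      # arithmetic mean
median = statistics.median(data)   # average of the two middle values here
mode = statistics.mode(data)       # most frequently occurring value

print(mean, median, mode)          # 5.0 4.0 3
```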

Standard Deviation

Standard Deviation & Variance

Population Variance

\[\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2\]

Sample Variance

\[s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\]

Standard Deviation

\[\sigma = \sqrt{\sigma^2}, \quad s = \sqrt{s^2}\]

Alternative Formula

\[\sigma^2 = E[X^2] - (E[X])^2\]
\[s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n-1}\]
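A small Python sketch comparing the population variance, the sample variance, and the computational form \(s^2 = (\sum x_i^2 - n\bar{x}^2)/(n-1)\) on an illustrative dataset:

```python
import statistics

data = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0]       # illustrative dataset

# Treating the data as the whole population:
pop_var = statistics.pvariance(data)         # (1/N) * sum (x_i - mu)^2
# Treating the data as a sample (n - 1 in the denominator):
samp_var = statistics.variance(data)         # (1/(n-1)) * sum (x_i - xbar)^2

# Alternative (computational) form of the sample variance
n = len(data)
xbar = statistics.fmean(data)
samp_var_alt = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

print(pop_var, samp_var, samp_var_alt)       # 2.9166..., 3.5, 3.5
```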

Random Variables

Random Variables

Definition

Random Variable (RV): a function that assigns a numerical value to each outcome of a random experiment

Expected Value

\[E[X] = \sum_{i} x_i P(X = x_i) \quad \text{(Discrete)}\]
\[E[X] = \int_{-\infty}^{\infty} x f(x) dx \quad \text{(Continuous)}\]

Variance

\[Var(X) = E[X^2] - (E[X])^2\]
\[Var(aX + b) = a^2 Var(X)\]

Properties

  • \(E[aX + b] = aE[X] + b\)

  • \(E[X + Y] = E[X] + E[Y]\)

  • If X, Y independent: \(Var(X + Y) = Var(X) + Var(Y)\)
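A minimal Python sketch of \(E[X]\), \(Var(X)\), and the linear-transformation properties for a simple discrete RV (a fair six-sided die, chosen only as an example):

```python
# Example RV: X = outcome of a fair six-sided die, P(X = k) = 1/6.
pmf = {k: 1 / 6 for k in range(1, 7)}

e_x = sum(k * p for k, p in pmf.items())           # E[X] = sum x_i P(X = x_i)
e_x2 = sum(k ** 2 * p for k, p in pmf.items())     # E[X^2]
var_x = e_x2 - e_x ** 2                            # Var(X) = E[X^2] - (E[X])^2

a, b = 2, 3                                        # linear transform Y = aX + b (illustrative a, b)
e_y = a * e_x + b                                  # E[aX + b] = aE[X] + b
var_y = a ** 2 * var_x                             # Var(aX + b) = a^2 Var(X)

print(e_x, var_x)      # 3.5  2.9166...
print(e_y, var_y)      # 10.0 11.666...
```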

Discrete Distributions

Discrete Distributions

Probability Mass Function (PMF)

\[P(X = k), \quad \sum_k P(X = k) = 1\]

Cumulative Distribution Function (CDF)

\[F(x) = P(X \leq x) = \sum_{k \leq x} P(X = k)\]

Common Discrete Distributions

  • Bernoulli: \(X \sim \text{Ber}(p)\)

  • Binomial: \(X \sim \text{Bin}(n,p)\)

  • Poisson: \(X \sim \text{Pois}(\lambda)\)

  • Geometric: \(X \sim \text{Geom}(p)\)
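A small Python sketch of a PMF and its CDF, using the Geometric(\(p\)) distribution with \(P(X = k) = (1-p)^{k-1}p\) as an illustrative case (the value \(p = 0.3\) is arbitrary):

```python
# Geometric(p): number of trials until the first success.
p = 0.3   # illustrative parameter

def pmf(k):
    return (1 - p) ** (k - 1) * p

def cdf(x):
    # F(x) = P(X <= x) = sum of the PMF over k <= x
    return sum(pmf(k) for k in range(1, int(x) + 1))

print(sum(pmf(k) for k in range(1, 200)))   # ~1.0: total probability
print(cdf(3))                               # P(X <= 3) = 1 - (1 - p)^3 = 0.657
```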

Continuous Distributions

Continuous Distributions

Probability Density Function (PDF)

\[f(x) \geq 0, \quad \int_{-\infty}^{\infty} f(x) dx = 1\]
\[P(a < X < b) = \int_a^b f(x) dx\]

Cumulative Distribution Function (CDF)

\[F(x) = P(X \leq x) = \int_{-\infty}^x f(t) dt\]
\[f(x) = \frac{dF(x)}{dx}\]

Common Continuous Distributions

  • Uniform: \(X \sim U(a,b)\)

  • Normal: \(X \sim N(\mu, \sigma^2)\)

  • Exponential: \(X \sim \text{Exp}(\lambda)\)
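A minimal Python sketch using the Exponential(\(\lambda\)) density as an illustrative case, with a simple midpoint-rule numerical integrator (an illustration device, not part of the notes), checking that the PDF integrates to 1 and that \(P(a < X < b) = F(b) - F(a)\):

```python
import math

# Exponential(lam): f(x) = lam * exp(-lam * x) for x >= 0, F(x) = 1 - exp(-lam * x).
lam = 2.0                                   # illustrative parameter
f = lambda x: lam * math.exp(-lam * x)

def integrate(g, a, b, steps=100_000):
    """Simple midpoint-rule numerical integration of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

# Total probability integrates to (almost) 1, truncating the infinite tail at 20:
print(integrate(f, 0.0, 20.0))              # ~1.0
# P(a < X < b) as an integral of the PDF matches F(b) - F(a):
a, b = 0.5, 1.5
print(integrate(f, a, b))                   # ~0.318
print((1 - math.exp(-lam * b)) - (1 - math.exp(-lam * a)))
```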

Poisson Distribution

Poisson Distribution

Definition

Models number of events in fixed interval: \(X \sim \text{Pois}(\lambda)\)

PMF

\[P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots\]

Parameters

  • \(E[X] = \lambda\)

  • \(Var(X) = \lambda\)

  • \(\sigma = \sqrt{\lambda}\)

Applications

  • Number of calls per hour

  • Number of defects per unit

  • Approximation to Binomial when \(n\) large, \(p\) small, \(np = \lambda\)
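A short Python sketch of the Poisson PMF, its mean and variance, and the binomial approximation; \(\lambda = 2\), \(n = 1000\), \(p = \lambda/n\) are illustrative values:

```python
import math

lam = 2.0                        # illustrative rate
n, p = 1000, lam / 1000          # large n, small p, np = lam

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Poisson PMF closely matches Bin(n, p) for these parameters:
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 5), round(binom_pmf(k, n, p), 5))

# Mean and variance are both lam:
mean = sum(k * poisson_pmf(k, lam) for k in range(100))
var = sum(k ** 2 * poisson_pmf(k, lam) for k in range(100)) - mean ** 2
print(mean, var)                 # both ~2.0
```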

Normal Distribution

Normal Distribution

Definition

\(X \sim N(\mu, \sigma^2)\): symmetric, bell-shaped density centered at \(\mu\)

PDF

\[f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]

Standard Normal

\(Z \sim N(0,1)\): \(Z = \frac{X - \mu}{\sigma}\)

\[\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}\]

Properties

  • \(P(\mu - \sigma < X < \mu + \sigma) \approx 0.68\)

  • \(P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95\)

  • \(P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997\)
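A minimal Python check of the three probabilities above: after standardizing with \(Z = (X-\mu)/\sigma\), they reduce to \(\Phi(k) - \Phi(-k)\) for \(k = 1, 2, 3\), computed here via the error function, \(\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)\):

```python
import math

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The event mu - k*sigma < X < mu + k*sigma becomes -k < Z < k after
# standardization, so its probability is Phi(k) - Phi(-k).
for k in (1, 2, 3):
    prob = std_normal_cdf(k) - std_normal_cdf(-k)
    print(f"within {k} sigma: {prob:.4f}")   # ~0.6827, 0.9545, 0.9973
```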

Binomial Distribution

Binomial Distribution

Definition

\(n\) independent trials, each with success probability \(p\): \(X \sim \text{Bin}(n,p)\)

PMF

\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n\]
where \(\binom{n}{k} = \frac{n!}{k!(n-k)!}\)

Parameters

  • \(E[X] = np\)

  • \(Var(X) = np(1-p)\)

  • \(\sigma = \sqrt{np(1-p)}\)

Normal Approximation

A common rule of thumb: when \(np \geq 5\) and \(n(1-p) \geq 5\), \(X \approx N(np, np(1-p))\)
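A small Python sketch of the binomial PMF, its mean and variance, and the normal approximation; \(n = 40\), \(p = 0.3\) are illustrative, and the half-unit continuity correction used below is a common refinement not stated in these notes:

```python
import math

n, p = 40, 0.3                     # illustrative parameters: np = 12, n(1-p) = 28

def binom_pmf(k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

mean = n * p                       # E[X] = np
var = n * p * (1 - p)              # Var(X) = np(1-p)

# Exact P(X <= 15) versus the N(np, np(1-p)) approximation
exact = sum(binom_pmf(k) for k in range(16))
z = (15 + 0.5 - mean) / math.sqrt(var)       # continuity correction (common refinement)
approx = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(mean, var)                   # 12.0 8.4
print(round(exact, 4), round(approx, 4))
```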

Correlation Analysis

Correlation Analysis

Pearson Correlation Coefficient

\[r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}\]
Alternative: \(r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\)

Properties

  • \(-1 \leq r \leq 1\)

  • \(r = 1\): Perfect positive correlation

  • \(r = -1\): Perfect negative correlation

  • \(r = 0\): No linear correlation

Interpretation

  • \(|r| \geq 0.8\): Strong correlation

  • \(0.5 \leq |r| < 0.8\): Moderate correlation

  • \(|r| < 0.5\): Weak correlation
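A minimal Python computation of \(r\) from the \(S_{xy}\), \(S_{xx}\), \(S_{yy}\) sums on illustrative data:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]     # illustrative data
y = [2.1, 3.9, 6.2, 8.0, 9.8]     # roughly linear in x

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
s_xx = sum((xi - xbar) ** 2 for xi in x)
s_yy = sum((yi - ybar) ** 2 for yi in y)

r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson correlation coefficient
print(round(r, 4))                 # close to +1 for this nearly linear data
```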

Regression Analysis

Regression Analysis

Simple Linear Regression

\[y = a + bx + \epsilon\]
where \(y\) is the dependent variable, \(x\) is the independent variable, and \(\epsilon\) is the random error term

Least Squares Estimates

\[b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}\]
\[a = \bar{y} - b\bar{x}\]

Coefficient of Determination

\[R^2 = r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}}\]
\(R^2\) is the proportion of the total variation in \(y\) explained by the regression

Key Point

Regression line always passes through \((\bar{x}, \bar{y})\)
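A small Python sketch of the least-squares estimates, \(R^2\), and the fact that the fitted line passes through \((\bar{x}, \bar{y})\), on the same style of illustrative data:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]     # illustrative data
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
s_xx = sum((xi - xbar) ** 2 for xi in x)
s_yy = sum((yi - ybar) ** 2 for yi in y)

b = s_xy / s_xx                   # slope estimate
a = ybar - b * xbar               # intercept estimate

# R^2 = explained variation / total variation = r^2 for simple linear regression
r2 = s_xy ** 2 / (s_xx * s_yy)

print(a, b, r2)
# The fitted line passes through (xbar, ybar):
print(abs((a + b * xbar) - ybar) < 1e-12)   # True
```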

Important Formulas Summary

Quick Formula Reference

Distribution Parameters

  • Binomial: Mean \(np\), Variance \(np(1-p)\)

  • Poisson: Mean \(\lambda\), Variance \(\lambda\)

  • Normal: Mean \(\mu\), Variance \(\sigma^2\)

  • Uniform \(U(a,b)\): Mean \(\frac{a+b}{2}\), Variance \(\frac{(b-a)^2}{12}\)

Key Relationships

  • \(P(A|B) = \frac{P(A \cap B)}{P(B)}\)

  • \(Var(X) = E[X^2] - (E[X])^2\)

  • \(r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\)

  • \(b = \frac{S_{xy}}{S_{xx}}\), \(a = \bar{y} - b\bar{x}\)