Probability Theory

Last updated on Oct 29, 2021

Probability

Probability Space

A probability space is a triple $(\Omega, \mathcal{A}, \mathbb{P})$ where

  • $\Omega$ is the sample space.
  • $\mathcal{A}$ is a σ-algebra on $\Omega$.
  • $\mathbb{P}$ is a probability measure.

The sample space $\Omega$ is the set of all possible outcomes.

What is a σ-algebra and a probability measure?

Sigma Algebra

A nonempty collection of subsets of $\Omega$, $\mathcal{A} \subseteq 2^\Omega$, is a sigma algebra (σ-algebra) on $\Omega$ if the following conditions hold:

  1. $\Omega \in \mathcal{A}$
  2. If $A \in \mathcal{A}$, then $(\Omega \setminus A) \in \mathcal{A}$
  3. If $A_1, A_2, \ldots \in \mathcal{A}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$

The smallest σ-algebra is $\{\emptyset, \Omega\}$ and the largest one is $2^\Omega$ (in cardinality terms).

Suppose $\Omega = \mathbb{R}$. Let $\mathcal{C} = \{(a, b] : a < b < \infty\}$. Then the Borel σ-algebra on $\mathbb{R}$ is defined by $\mathcal{B}(\mathbb{R}) = \sigma(\mathcal{C})$.
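For a finite sample space the three conditions can be checked mechanically, since countable unions reduce to finite ones. Below is a minimal sketch in Python (the function `is_sigma_algebra` and the example collections are illustrative, not from any library); it brute-forces all unions, so it is only meant for tiny examples.

```python
from itertools import combinations

def is_sigma_algebra(omega, collection):
    """Check the three defining conditions on a finite sample space."""
    sets = {frozenset(s) for s in collection}
    # 1. Omega itself is in the collection
    if frozenset(omega) not in sets:
        return False
    # 2. Closure under complements
    if any(frozenset(omega) - s not in sets for s in sets):
        return False
    # 3. Closure under unions (finite here, since Omega is finite)
    for r in range(2, len(sets) + 1):
        for combo in combinations(sets, r):
            if frozenset().union(*combo) not in sets:
                return False
    return True

omega = {1, 2, 3, 4}
print(is_sigma_algebra(omega, [set(), omega]))                  # True: smallest
print(is_sigma_algebra(omega, [set(), {1, 2}, {3, 4}, omega]))  # True
print(is_sigma_algebra(omega, [set(), {1, 2}, omega]))          # False: no {3, 4}
```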

Probability Measure

A probability measure $\mathbb{P} : \mathcal{A} \to [0, 1]$ is a set function with domain $\mathcal{A}$ and codomain $[0, 1]$ such that

  1. $\mathbb{P}(A) \geq 0 \quad \forall A \in \mathcal{A}$
  2. $\mathbb{P}$ is σ-additive: if $A_n \in \mathcal{A}$ are pairwise disjoint events ($A_j \cap A_k = \emptyset$ for $j \neq k$), then $\mathbb{P}\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mathbb{P}(A_n)$
  3. $\mathbb{P}(\Omega) = 1$

Properties

Some properties of probability measures (a numerical sanity check follows the list):

  • $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$
  • $\mathbb{P}(\emptyset) = 0$
  • For $A, B \in \mathcal{A}$, $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$
  • For $A, B \in \mathcal{A}$, if $A \subseteq B$ then $\mathbb{P}(A) \leq \mathbb{P}(B)$
  • For $A_n \in \mathcal{A}$, $\mathbb{P}\left(\bigcup_{n=1}^{\infty} A_n\right) \leq \sum_{n=1}^{\infty} \mathbb{P}(A_n)$
  • For $A_n \in \mathcal{A}$, if $A_n \uparrow A$ then $\lim_{n \to \infty} \mathbb{P}(A_n) = \mathbb{P}(A)$
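As a quick sanity check, these identities can be verified by simulation; the sketch below (assuming NumPy) estimates both sides of the complement and inclusion-exclusion rules for two arbitrary events of a fair die.

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)  # fair six-sided die

A = rolls % 2 == 0  # event: the roll is even
B = rolls <= 3      # event: the roll is at most 3

# P(A ∪ B) = P(A) + P(B) − P(A ∩ B): both sides ≈ 5/6
print((A | B).mean(), A.mean() + B.mean() - (A & B).mean())
# P(A^c) = 1 − P(A): both sides ≈ 1/2
print((~A).mean(), 1 - A.mean())
```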

Conditional Probability

Let $A, B \in \mathcal{A}$ with $\mathbb{P}(B) > 0$. The conditional probability of $A$ given $B$ is $$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}$$

Two events $A$ and $B$ are independent if $\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)$.
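A small simulation with arbitrary die events illustrates both definitions: the conditional probability as a ratio of relative frequencies, and a pair of events that fails the product rule for independence.

```python
import numpy as np

rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=1_000_000)

A = rolls == 2      # event: the roll is a 2
B = rolls % 2 == 0  # event: the roll is even

# P(A | B) = P(A ∩ B) / P(B) ≈ 1/3
print((A & B).mean() / B.mean())
# not independent: P(A ∩ B) = 1/6, while P(A) P(B) = 1/12
print((A & B).mean(), A.mean() * B.mean())
```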

Law of Total Probability

Theorem (Law of Total Probability)

Let $(E_n)_{n \geq 1}$ be a finite or countable partition of $\Omega$. Then, if $A \in \mathcal{A}$, $$\mathbb{P}(A) = \sum_n \mathbb{P}(A \mid E_n)\,\mathbb{P}(E_n)$$

Bayes Theorem

Theorem (Bayes Theorem)

Let $(E_n)_{n \geq 1}$ be a finite or countable partition of $\Omega$, and suppose $\mathbb{P}(A) > 0$. Then, $$\mathbb{P}(E_n \mid A) = \frac{\mathbb{P}(A \mid E_n)\,\mathbb{P}(E_n)}{\sum_m \mathbb{P}(A \mid E_m)\,\mathbb{P}(E_m)}$$

For a single event $E \subseteq \Omega$, $$\mathbb{P}(E \mid A) = \frac{\mathbb{P}(A \mid E)\,\mathbb{P}(E)}{\mathbb{P}(A)}$$
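A standard illustration is a diagnostic test. The prevalence, sensitivity, and false-positive numbers below are hypothetical, chosen only to show that a positive result can still leave a low posterior; note that the denominator is exactly the law of total probability from the previous section.

```python
# Hypothetical diagnostic-test numbers, for illustration only
p_disease = 0.01            # P(E): prior prevalence
p_pos_given_disease = 0.99  # P(A | E): sensitivity
p_pos_given_healthy = 0.05  # P(A | E^c): false-positive rate

# Law of total probability: P(A) = Σ P(A | E_m) P(E_m)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes: P(E | A) = P(A | E) P(E) / P(A)
print(p_pos_given_disease * p_disease / p_pos)  # ≈ 0.167
```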

Random Variables

Definition

A random variable $X$ on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ is a (measurable) mapping $X : \Omega \to \mathbb{R}$ such that $$\forall B \in \mathcal{B}(\mathbb{R}), \quad X^{-1}(B) \in \mathcal{A}$$

The measurability condition states that the inverse image is a measurable set of $\Omega$, i.e. $X^{-1}(B) \in \mathcal{A}$. This is essential since probabilities are defined only on $\mathcal{A}$.

In words, a random variable is a mapping from outcomes to real numbers such that every interval on the real line can be mapped back to an element of the σ-algebra (possibly the empty set).

Distribution Function

Let $X$ be a real-valued random variable. The distribution function (also called cumulative distribution function) of $X$, commonly denoted $F_X(x)$, is defined by $$F_X(x) = \Pr(X \leq x)$$

Properties

  • $F$ is monotone non-decreasing
  • $F$ is right continuous
  • $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$

The random variables $(X_1, \ldots, X_n)$ are independent if and only if $$F_{(X_1, \ldots, X_n)}(x) = \prod_{i=1}^{n} F_{X_i}(x_i) \quad \forall x \in \mathbb{R}^n$$

Density Function

Let $X$ be a real-valued random variable. $X$ has a probability density function if there exists $f_X(x)$ such that for all measurable $A \subseteq \mathbb{R}$, $$\mathbb{P}(X \in A) = \int_A f_X(x)\,dx$$
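The relation between densities and probabilities can be checked numerically. The sketch below (assuming SciPy) integrates the standard normal density over an interval and compares the result with the difference of distribution-function values $F_X(b) - F_X(a)$.

```python
from scipy import integrate, stats

# P(X ∈ (a, b]) as the integral of the standard normal density over (a, b]
a, b = -1.0, 2.0
area, _ = integrate.quad(stats.norm.pdf, a, b)

# Same probability from the distribution function: F(b) − F(a)
print(area, stats.norm.cdf(b) - stats.norm.cdf(a))  # both ≈ 0.8186
```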

Moments

Expected Value

The expected value of a random variable, when it exists, is given by $$\mathbb{E}[X] = \int_\Omega X(\omega)\,d\mathbb{P}$$ When $X$ has a density, then $$\mathbb{E}[X] = \int_{\mathbb{R}} x f_X(x)\,dx = \int_{\mathbb{R}} x\,dF_X(x)$$

The empirical expectation (or sample average) is given by $$\mathbb{E}_n[x_i] = \frac{1}{n}\sum_{i=1}^{n} x_i$$
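By the law of large numbers, the sample average approximates the expectation for large $n$; a short Monte Carlo check, using an exponential distribution with known mean as an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=1_000_000)  # E[X] = 2

# the sample average approximates the expectation
print(x.mean())  # ≈ 2.0
```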

Variance and Covariance

The covariance of two random variables $X, Y$ defined on $\Omega$ is $$\text{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])] = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$$ In vector notation, $\text{Cov}(X, Y) = \mathbb{E}[XY^\top] - \mathbb{E}[X]\,\mathbb{E}[Y]^\top$.

The variance of a random variable $X$, when it exists, is given by $$\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$$ In vector notation, $\text{Var}(X) = \mathbb{E}[XX^\top] - \mathbb{E}[X]\,\mathbb{E}[X]^\top$.

Properties

Let $X, Y, Z, T \in L^2$ and $a, b, c, d \in \mathbb{R}$.

  • $\text{Cov}(X, X) = \text{Var}(X)$
  • $\text{Cov}(X, Y) = \text{Cov}(Y, X)$
  • $\text{Cov}(aX + b, Y) = a\,\text{Cov}(X, Y)$
  • $\text{Cov}(X + Z, Y) = \text{Cov}(X, Y) + \text{Cov}(Z, Y)$
  • $\text{Cov}(aX + bZ, cY + dT) = ac\,\text{Cov}(X, Y) + ad\,\text{Cov}(X, T) + bc\,\text{Cov}(Z, Y) + bd\,\text{Cov}(Z, T)$

Let $X, Y \in L^1$ be independent. Then, $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$.

If $X$ and $Y$ are independent, then $\text{Cov}(X, Y) = 0$.

Note that the converse does not hold: $\text{Cov}(X, Y) = 0 \not\Rightarrow X \perp\!\!\!\perp Y$. For example, if $X \sim N(0, 1)$ and $Y = X^2$, then $\text{Cov}(X, Y) = \mathbb{E}[X^3] = 0$ even though $Y$ is a deterministic function of $X$ (see the sketch below).
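A minimal numerical version of this counterexample, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = x**2  # fully determined by x, hence strongly dependent

# Cov(X, Y) = E[X^3] − E[X] E[X^2] = 0 by symmetry, despite the dependence
print(np.cov(x, y)[0, 1])  # ≈ 0
```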

Sample Variance

The sample variance is given by $$\text{Var}_n(x_i) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$ where $\bar{x} = \mathbb{E}_n[x_i] = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Finite Sample Bias Theorem

Theorem: The expected sample variance $$\mathbb{E}[\sigma_n^2] = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} (y_i - \mathbb{E}_n[Y])^2\right]$$ gives an estimate of the population variance that is biased by a factor of $\frac{n-1}{n}$ and is therefore referred to as the biased sample variance.

Proof:
$$
\begin{aligned}
\mathbb{E}[\sigma_n^2] &= \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \mathbb{E}_n[Y]\right)^2\right] \\
&= \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \frac{1}{n}\sum_{j=1}^{n} y_j\right)^2\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}\left[y_i^2 - \frac{2}{n} y_i \sum_{j=1}^{n} y_j + \frac{1}{n^2}\sum_{j=1}^{n} y_j \sum_{k=1}^{n} y_k\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} \left[\frac{n-2}{n}\mathbb{E}[y_i^2] - \frac{2}{n}\sum_{j \neq i}\mathbb{E}[y_i y_j] + \frac{1}{n^2}\sum_{j=1}^{n}\sum_{k \neq j}\mathbb{E}[y_j y_k] + \frac{1}{n^2}\sum_{j=1}^{n}\mathbb{E}[y_j^2]\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} \left[\frac{n-2}{n}(\mu^2 + \sigma^2) - \frac{2}{n}(n-1)\mu^2 + \frac{1}{n^2}n(n-1)\mu^2 + \frac{1}{n^2}n(\mu^2 + \sigma^2)\right] \\
&= \frac{n-1}{n}\sigma^2
\end{aligned}
$$
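The $\frac{n-1}{n}$ factor is easy to reproduce by simulation: averaging the biased estimator over many samples of size $n$ (here $n = 5$, an arbitrary choice) recovers $\frac{n-1}{n}\sigma^2$, while dividing by $n - 1$ instead removes the bias. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2 = 5, 1.0
samples = rng.standard_normal((200_000, n))  # 200k samples of size n

# biased estimator divides by n (np.var uses ddof=0 by default)
print(samples.var(axis=1, ddof=0).mean(), (n - 1) / n * sigma2)  # both ≈ 0.8
# unbiased estimator divides by n − 1 (ddof=1)
print(samples.var(axis=1, ddof=1).mean())                        # ≈ 1.0
```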

Inequalities

  • Triangle Inequality: if $\mathbb{E}[X] < \infty$, then $|\mathbb{E}[X]| \leq \mathbb{E}[|X|]$

  • Markov’s Inequality: if $\mathbb{E}[X] < \infty$, then $\Pr(|X| > t) \leq \frac{1}{t}\mathbb{E}[|X|]$

  • Chebyshev’s Inequality: if $\mathbb{E}[X^2] < \infty$, then $\Pr(|X - \mu| > t\sigma) \leq \frac{1}{t^2}$, or equivalently $\Pr(|X - \mu| > t) \leq \frac{\sigma^2}{t^2}$

  • Cauchy-Schwarz’s Inequality: $\mathbb{E}[|XY|] \leq \sqrt{\mathbb{E}[X^2]\,\mathbb{E}[Y^2]}$

  • Minkowski Inequality: $\left(\sum_{k=1}^{n} |x_k + y_k|^p\right)^{1/p} \leq \left(\sum_{k=1}^{n} |x_k|^p\right)^{1/p} + \left(\sum_{k=1}^{n} |y_k|^p\right)^{1/p}$

  • Jensen’s Inequality: if $g(\cdot)$ is concave (e.g. the logarithmic function), then $\mathbb{E}[g(X)] \leq g(\mathbb{E}[X])$. Similarly, if $g(\cdot)$ is convex (e.g. the exponential function), then $\mathbb{E}[g(X)] \geq g(\mathbb{E}[X])$. A numerical check of Chebyshev’s and Jensen’s inequalities follows this list.
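A rough numerical check of the Chebyshev bound and of Jensen's inequality for the convex $g(x) = e^x$, using a standard normal sample (so $\mu = 0$ and $\sigma = 1$):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000)  # mu = 0, sigma = 1

# Chebyshev: P(|X − mu| > t sigma) <= 1/t^2
for t in (1.5, 2.0, 3.0):
    print(t, (np.abs(x) > t).mean(), "<=", 1 / t**2)

# Jensen, convex case: E[e^X] = e^{1/2} ≈ 1.65 >= e^{E[X]} = 1
print(np.exp(x).mean(), ">=", np.exp(x.mean()))
```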

Law of Iterated Expectations

Theorem (Law of Iterated Expectations) $$\mathbb{E}(Y) = \mathbb{E}_X[\mathbb{E}(Y \mid X)]$$

> This states that the expectation of the conditional expectation is the unconditional expectation.
>
> In other words, the average of the conditional averages is the unconditional average.

Law of Total Variance

Theorem (Law of Total Variance) $$\text{Var}(Y) = \text{Var}_X(\mathbb{E}[Y \mid X]) + \mathbb{E}_X[\text{Var}(Y \mid X)]$$

Since variances are always non-negative, the law of total variance implies $$\text{Var}(Y) \geq \text{Var}_X(\mathbb{E}[Y \mid X])$$
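Both laws can be verified on a simple two-group mixture, where the conditional means and variances are known in closed form; the group parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

# X picks one of two groups; Y | X is normal within the group
x = rng.integers(0, 2, size=n)  # X ∈ {0, 1} with probability 1/2 each
mu, sd = np.array([0.0, 3.0]), np.array([1.0, 2.0])
y = rng.normal(mu[x], sd[x])

# law of iterated expectations: E[Y] = E_X[E[Y | X]] = 1.5
print(y.mean(), mu.mean())
# law of total variance: Var(Y) = Var_X(E[Y|X]) + E_X[Var(Y|X)] = 2.25 + 2.5
print(y.var(), mu.var() + (sd**2).mean())
```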

Distributions

Normal Distribution

We say that a random variable $Z$ has the standard normal distribution, or Gaussian, written $Z \sim N(0, 1)$, if it has the density $$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right), \quad -\infty < x < \infty$$ If $Z \sim N(0, 1)$ and $X = \mu + \sigma Z$ for $\mu \in \mathbb{R}$ and $\sigma \geq 0$, then $X$ has a univariate normal distribution, written $X \sim N(\mu, \sigma^2)$. If $\sigma > 0$, then by change-of-variables $X$ has the density $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$$
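The location-scale construction $X = \mu + \sigma Z$ is also how normal variates are commonly simulated; a quick check with arbitrary $\mu$ and $\sigma$, assuming SciPy for the reference distribution function:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu, sigma = 2.0, 1.5

# location-scale construction: X = mu + sigma * Z with Z ~ N(0, 1)
x = mu + sigma * rng.standard_normal(1_000_000)

print(x.mean(), x.std())  # ≈ 2.0 and ≈ 1.5
# empirical CDF at a point vs the N(mu, sigma^2) distribution function
print((x <= 3.0).mean(), stats.norm.cdf(3.0, loc=mu, scale=sigma))
```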

Multivariate Normal Distribution

We say that the $k$-vector $Z$ has a multivariate standard normal distribution, written $Z \sim N(0, I_k)$, if it has the joint density $$f(x) = \frac{1}{(2\pi)^{k/2}} \exp\left(-\frac{x^\top x}{2}\right), \quad x \in \mathbb{R}^k$$ If $Z \sim N(0, I_k)$ and $X = \mu + BZ$, then the $k$-vector $X$ has a multivariate normal distribution, written $X \sim N(\mu, \Sigma)$, where $\Sigma = BB^\top \geq 0$. If $\Sigma > 0$, then by change-of-variables $X$ has the joint density function $$f(x) = \frac{1}{(2\pi)^{k/2} \det(\Sigma)^{1/2}} \exp\left(-\frac{(x - \mu)^\top \Sigma^{-1} (x - \mu)}{2}\right), \quad x \in \mathbb{R}^k$$
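The construction $X = \mu + BZ$ doubles as a sampling recipe: factor $\Sigma = BB^\top$, e.g. by Cholesky (which requires $\Sigma > 0$), and transform standard normal draws. A sketch with arbitrary $\mu$ and $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(8)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# factor Sigma = B B' and set X = mu + B Z with Z ~ N(0, I_k)
B = np.linalg.cholesky(Sigma)
Z = rng.standard_normal((1_000_000, 2))
X = mu + Z @ B.T

print(X.mean(axis=0))           # ≈ mu
print(np.cov(X, rowvar=False))  # ≈ Sigma
```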

Properties

  1. The expectation and covariance matrix of $X \sim N(\mu, \Sigma)$ are $\mathbb{E}[X] = \mu$ and $\text{Var}(X) = \Sigma$.
  2. If $(X, Y)$ are multivariate normal, $X$ and $Y$ are uncorrelated if and only if they are independent.
  3. If $X \sim N(\mu, \Sigma)$ and $Y = a + BX$, then $Y \sim N(a + B\mu, B\Sigma B^\top)$.
  4. If $X \sim N(0, I_k)$, then $X^\top X \sim \chi^2_k$, chi-square with $k$ degrees of freedom.
  5. If $X \sim N(0, \Sigma)$ with $\Sigma > 0$, then $X^\top \Sigma^{-1} X \sim \chi^2_k$ where $k = \dim(X)$.
  6. If $Z \sim N(0, 1)$ and $Q \sim \chi^2_k$ are independent, then $\frac{Z}{\sqrt{Q/k}} \sim t_k$, Student's t with $k$ degrees of freedom.

Normal Distribution Relatives

These distributions are relatives of the normal distribution

  1. $\chi^2_q \sim \sum_{i=1}^{q} Z_i^2$ where $Z_i \sim N(0, 1)$
  2. $t_n \sim \frac{Z}{\sqrt{\chi^2_n / n}}$
  3. $F(n_1, n_2) \sim \frac{\chi^2_{n_1} / n_1}{\chi^2_{n_2} / n_2}$

The t distribution is approximately standard normal but has heavier tails. The approximation is good for $n \geq 30$: $$t_{n \geq 30} \approx N(0, 1)$$
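All three constructions can be reproduced directly from standard normal draws; the sketch below checks a known moment or tail probability for each (the degrees of freedom are arbitrary).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, q, m = 500_000, 4, 6

# chi-square: sum of q squared standard normals, E[chi2_q] = q
chi2_q = (rng.standard_normal((n, q)) ** 2).sum(axis=1)
print(chi2_q.mean(), "vs", q)

# Student t: Z / sqrt(chi2_q / q) has heavier tails than N(0, 1)
t = rng.standard_normal(n) / np.sqrt(chi2_q / q)
print((np.abs(t) > 2).mean(), "vs normal tail", 2 * stats.norm.sf(2))

# F: ratio of independent scaled chi-squares, E[F(q, m)] = m / (m − 2)
chi2_m = (rng.standard_normal((n, m)) ** 2).sum(axis=1)
print(((chi2_q / q) / (chi2_m / m)).mean(), "vs", m / (m - 2))
```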
