OLS Algebra
Last updated on Oct 29, 2021
The Gauss Markov Model
Definition
A statistical model for regression data is the Gauss Markov Model if each of its distributions satisfies the conditions

- Linearity: a statistical model over data $\{(y_i, x_i)\}_{i=1}^n$ satisfies linearity if, for each element $i$ of the sample, the data can be decomposed as $y_i = x_i' \beta + \varepsilon_i$.
- Strict Exogeneity: $E[\varepsilon_i \mid X] = 0$.
- No Multicollinearity: $E[x_i x_i']$ is strictly positive definite almost surely. Equivalent to requiring $\mathrm{rank}(X) = k$ with probability $1$. Intuition: no regressor is a linear combination of the other regressors.
- Spherical Error Variance: $E[\varepsilon_i^2 \mid X] = \sigma^2 > 0$ and $E[\varepsilon_i \varepsilon_j \mid X] = 0$ for all $i \neq j$.

The Extended Gauss Markov Model also satisfies the assumption

- Normal error term: $\varepsilon \mid X \sim N(0, \sigma^2 I_n)$, i.e. the errors are jointly normal and independent conditional on $X$.
Implications
- Note that by (2) and (4) you get homoskedasticity: $Var(\varepsilon_i \mid X) = E[\varepsilon_i^2 \mid X] - (E[\varepsilon_i \mid X])^2 = \sigma^2$.
- Strict exogeneity implies $E[\varepsilon_i] = 0$ by the LIE. This part of the assumption is not restrictive, since it is sufficient to include a constant in the regression to enforce it.
- Strict exogeneity also implies $E[x_{jk} \varepsilon_i] = 0$ for all $i, j, k$, again by the LIE.
- These two conditions together imply $Cov(x_{jk}, \varepsilon_i) = 0$.
Projection
The Gauss Markov Model assumes that the conditional expectation function (CEF) $E[y_i \mid x_i]$ is linear in $x_i$. In that case the linear projection of $y_i$ on $x_i$, i.e. the map $x_i \mapsto x_i'\beta$ with $\beta = E[x_i x_i']^{-1} E[x_i y_i]$, coincides with the CEF: $E[y_i \mid x_i] = x_i'\beta$.
Code - DGP
This code draws 100 observations from the model $y = X\beta + \varepsilon$, where $x_i \sim U[0,1]^2$, $\varepsilon_i \sim N(0, \sigma^2)$ with $\sigma^2 = 1$, and $\beta = (2, -1)'$.
# Load packages
using Random, Distributions, LinearAlgebra, Statistics

# Set seed
Random.seed!(123);
# Set the number of observations
n = 100;
# Set the dimension of X
k = 2;
# Draw a sample of explanatory variables
X = rand(Uniform(0,1), n, k);
# Draw the error term
σ = 1;
ε = rand(Normal(0,1), n, 1) * sqrt(σ);
# Set the parameters
β = [2; -1];
# Calculate the dependent variable
y = X*β + ε;
The OLS estimator
Definition
The sum of squared residuals (SSR) is given by

$$ SSR(\beta) = \sum_{i=1}^n (y_i - x_i'\beta)^2. $$

Consider a dataset $\{(y_i, x_i)\}_{i=1}^n$ with the observations stacked into a vector $y \in \mathbb{R}^n$ and a matrix $X \in \mathbb{R}^{n \times k}$. When the data are stacked this way, we can write

$$ SSR(\beta) = (y - X\beta)'(y - X\beta). $$

The OLS estimator of $\beta$ is the minimizer of the SSR: $\hat{\beta}_{OLS} = \arg\min_{\beta} SSR(\beta)$.
Derivation
Theorem
Under the assumption that $X'X$ is invertible (no multicollinearity), the OLS estimator has the closed form

$$ \hat{\beta}_{OLS} = (X'X)^{-1} X'y. $$

Proof
Taking the FOC:

$$ \frac{\partial SSR(\beta)}{\partial \beta} = -2 X'(y - X\beta) = 0 \quad \Rightarrow \quad X'X \hat{\beta}_{OLS} = X'y. $$

Finally, since $X'X$ is invertible, $\hat{\beta}_{OLS} = (X'X)^{-1} X'y$.
The second order condition is satisfied because $\frac{\partial^2 SSR(\beta)}{\partial \beta \partial \beta'} = 2X'X$ is positive definite, so the FOC identifies a minimum.
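Since the estimator is defined as the minimizer of the SSR, it can also be recovered numerically. Below is a minimal sketch (not part of the derivation above) that minimizes the SSR by plain gradient descent on the simulated X and y from the DGP code and compares the result with the closed-form solution; the function name, step size, and iteration count are arbitrary choices for this toy example.
# Gradient descent on the SSR (illustration only; step size and iterations are arbitrary)
function ols_gradient_descent(X, y; step = 0.5, iters = 5_000)
    n, k = size(X)
    β = zeros(k, 1)
    for _ in 1:iters
        # The gradient of the SSR is -2X'(y - Xβ); step in the descent direction, rescaled by 1/n
        β += step * X' * (y - X * β) / n
    end
    return β
end
# The numerical minimizer matches the closed-form OLS solution
[ols_gradient_descent(X, y) (X'*X)\(X'*y)]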
Further Objects
- Fitted coefficient: $\hat{\beta} = (X'X)^{-1}X'y$.
- Fitted residual: $\hat{\varepsilon}_i = y_i - x_i'\hat{\beta}$.
- Fitted value: $\hat{y}_i = x_i'\hat{\beta}$.
- Predicted coefficient: the leave-one-out estimator $\hat{\beta}_{(-i)}$, obtained by estimating $\beta$ on the sample excluding observation $i$.
- Prediction error: $\tilde{\varepsilon}_i = y_i - x_i'\hat{\beta}_{(-i)}$.
- Predicted value: $\tilde{y}_i = x_i'\hat{\beta}_{(-i)}$.
Notes on Orthogonality Conditions
- The normal equations $X'(y - X\hat{\beta}) = 0$ are equivalent to the moment condition $\frac{1}{n}\sum_{i=1}^n x_i \hat{\varepsilon}_i = 0$.
- The algebraic result $X'\hat{\varepsilon} = 0$ is called the orthogonality property of the OLS residual $\hat{\varepsilon}$.
- If we have included a constant in the regression, $\frac{1}{n}\sum_{i=1}^n \hat{\varepsilon}_i = 0$.
- $E[\varepsilon_i] = 0$ by strict exogeneity (assumed in GM), but in general $\hat{\varepsilon}_i \neq \varepsilon_i$. This is why $\hat{\varepsilon}_i$ is just an estimate of $\varepsilon_i$.
- Calculating OLS is like replacing the population moment conditions $E[x_i \varepsilon_i] = 0$ with their sample analogues $\frac{1}{n}\sum_{i=1}^n x_i \hat{\varepsilon}_i = 0$ and forcing them to hold (reminiscent of GMM).

These conditions are checked numerically after the projection-matrix code below.
The Projection Matrix
The projection matrix is given by

$$ P = X(X'X)^{-1}X'. $$

It is symmetric and idempotent, and it satisfies $PX = X$ and $Py = X\hat{\beta} = \hat{y}$. Its $i$-th diagonal element, the leverage $h_{ii} = x_i'(X'X)^{-1}x_i$, is a normalized length of the observed regressor vector $x_i$. In the OLS regression framework it captures the relative influence of observation $i$ on the estimated coefficient. Note that $\sum_{i=1}^n h_{ii} = \mathrm{tr}(P) = k$.
The Annihilator Matrix
The annihilator matrix is given by

$$ M = I_n - P = I_n - X(X'X)^{-1}X'. $$

It is also symmetric and idempotent, and it satisfies $MX = 0$. Then we can equivalently write the vector of OLS residuals $\hat{\varepsilon}$ (defined by stacking the $\hat{\varepsilon}_i = y_i - x_i'\hat{\beta}$ into a vector) as $\hat{\varepsilon} = My = M\varepsilon$.
Estimating Beta
# Estimate beta
β_hat = inv(X'*X)*(X'*y)
## 2×1 Array{Float64,2}:
## 1.8821600407711814
## -0.9429354944506099
# Equivalent but faster formulation
β_hat = (X'*X)\(X'*y)
## 2×1 Array{Float64,2}:
## 1.8821600407711816
## -0.9429354944506098
# Even faster (but less intuitive) formulation
β_hat = X\y
## 2×1 Array{Float64,2}:
## 1.8821600407711807
## -0.9429354944506088
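For comparison, the same coefficients can be obtained from a regression package. This is an optional sketch that assumes the GLM.jl package is installed; lm and coef are its model-fitting and coefficient-extraction functions, and no intercept is added since X does not contain a constant column.
# Fit the same regression with GLM.jl (assumes the package is installed)
using GLM
ols_fit = lm(X, vec(y));
coef(ols_fit)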
Equivalent Formulation?
Generally it’s not true that $\hat{\beta}_{OLS} = \widehat{Var}(X)^{-1} \widehat{Cov}(X, y)$:
# Wrong formulation
β_wrong = inv(cov(X)) * cov(X, y)
## 2×1 Array{Float64,2}:
## 1.8490257777704475
## -0.9709213554007003
Equivalent Formulation (correct)
But it’s true for the slope coefficients if you include a constant in the regression:
# Correct, with constant
α = 3;
y1 = α .+ X*β + ε;
β_hat1 = [ones(n,1) X] \ y1
## 3×1 Array{Float64,2}:
## 3.0362313477745615
## 1.8490257777704477
## -0.9709213554007007
β_correct1 = inv(cov(X)) * cov(X, y1)
## 2×1 Array{Float64,2}:
## 1.8490257777704477
## -0.9709213554007006
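The reason the covariance formulation recovers the slope coefficients once a constant is included is that adding a constant is equivalent to demeaning the data first (a Frisch-Waugh-Lovell style argument). A minimal sketch of this equivalence, reusing X and y1 from above (the _d suffix is just a label for the demeaned objects):
# Demean X and y1, then run OLS without a constant: same slopes as above
X_d = X .- mean(X, dims=1);
y_d = y1 .- mean(y1);
β_demeaned = (X_d'*X_d)\(X_d'*y_d)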
Some More Objects
# Predicted y
y_hat = X*β_hat;
# Residuals
ε_hat = y - X*β_hat;
# Projection matrix
P = X * inv(X'*X) * X';
# Annihilator matrix
M = I - P;
# Leverage
h = diag(P);
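The algebraic properties discussed above can be checked numerically with these objects; the sketch below reuses X, y, ε_hat, M, and h from the snippets above. The orthogonality and annihilator checks should return zero up to floating-point error, the residuals do not sum to zero because there is no constant in the regression, and the leverages should sum to k = 2.
# Orthogonality property: X'ε̂ = 0
maximum(abs.(X'*ε_hat))
# Residuals do not sum to zero here, since no constant is included in X
sum(ε_hat)
# Annihilator properties: MX = 0 and ε̂ = My
maximum(abs.(M*X))
maximum(abs.(M*y - ε_hat))
# Leverages sum to tr(P) = k
sum(h)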
OLS Residuals
Homoskedasticity
The error is homoskedastic if $E[\varepsilon_i^2 \mid x_i] = \sigma^2$ does not depend on $x_i$.
The error is heteroskedastic if $E[\varepsilon_i^2 \mid x_i] = \sigma^2(x_i)$ depends on $x_i$.
Residual Variance
The OLS residual variance can be an object of interest even in a heteroskedastic regression. Its method of moments estimator is given by

$$ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n \hat{\varepsilon}_i^2 = \frac{1}{n}\hat{\varepsilon}'\hat{\varepsilon}. $$

Note that, using $\hat{\varepsilon} = M\varepsilon$, $\hat{\sigma}^2$ can be rewritten as

$$ \hat{\sigma}^2 = \frac{1}{n}\varepsilon' M \varepsilon. $$

However, the method of moments estimator is a biased estimator. In fact,

$$ E[\hat{\sigma}^2 \mid X] = \frac{1}{n} E[\varepsilon' M \varepsilon \mid X] = \frac{1}{n} \mathrm{tr}\big( M \, E[\varepsilon\varepsilon' \mid X] \big). $$

Under conditional homoskedasticity, the above expression simplifies to

$$ E[\hat{\sigma}^2 \mid X] = \frac{\sigma^2}{n} \mathrm{tr}(M) = \frac{n-k}{n}\sigma^2 < \sigma^2. $$
Sample Variance
The OLS residual sample variance is denoted by

$$ s^2 = \frac{1}{n-k}\sum_{i=1}^n \hat{\varepsilon}_i^2 = \frac{1}{n-k}\hat{\varepsilon}'\hat{\varepsilon}. $$

The sum of squared residuals can be rewritten as $\hat{\varepsilon}'\hat{\varepsilon} = (M\varepsilon)'(M\varepsilon) = \varepsilon' M \varepsilon$.
Under conditional homoskedasticity, the OLS residual sample variance is an unbiased estimator of the error variance $\sigma^2$: $E[s^2 \mid X] = \frac{1}{n-k} E[\varepsilon' M \varepsilon \mid X] = \frac{\sigma^2}{n-k}\mathrm{tr}(M) = \sigma^2$.
Another unbiased estimator of $\sigma^2$ exploits the fact that $E[\hat{\varepsilon}_i^2 \mid X] = (1 - h_{ii})\sigma^2$ under homoskedasticity:

$$ \bar{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n \frac{\hat{\varepsilon}_i^2}{1 - h_{ii}}. $$
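A small simulation can illustrate the bias factor $(n-k)/n$ of the method of moments estimator and the unbiasedness of $s^2$. This is a minimal sketch that holds X fixed and redraws the error term; the number of replications and the variable names are arbitrary choices.
# Monte Carlo: redraw the errors and recompute both variance estimators
n_sim = 10_000;
σ2_mm = zeros(n_sim);   # method of moments estimator (biased)
s2_mc = zeros(n_sim);   # sample variance (unbiased)
for s in 1:n_sim
    ε_s = rand(Normal(0,1), n, 1) * sqrt(σ);
    y_s = X*β + ε_s;
    e_s = y_s - X*((X'*X)\(X'*y_s));
    σ2_mm[s] = (e_s'*e_s)[1] / n;
    s2_mc[s] = (e_s'*e_s)[1] / (n-k);
end
# Simulated means (first column) vs. theoretical values σ²(n-k)/n and σ² (second column)
[mean(σ2_mm) σ*(n-k)/n; mean(s2_mc) σ]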
Uncentered R^2
One measure of the variability of the dependent variable is the sum of uncentered squares $\sum_{i=1}^n y_i^2 = y'y$.
The uncentered $R^2$ measures the share of this variability explained by the fitted values:

$$ R^2_{uc} = \frac{\hat{y}'\hat{y}}{y'y} = 1 - \frac{\hat{\varepsilon}'\hat{\varepsilon}}{y'y}. $$
Centered R^2
A more natural measure of variability is the sum of centered squares $\sum_{i=1}^n (y_i - \bar{y})^2$, where $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$.
The coefficient of determination, or centered $R^2$, is

$$ R^2 = \frac{\sum_{i=1}^n (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^n (y_i - \bar{y})^2}. $$

Always use the centered $R^2$, unless you really know what you are doing.
Code - Variance
# Biased variance estimator
σ_hat = ε_hat'*ε_hat / n;
# Unbiased estimator 1
σ_hat_2 = ε_hat'*ε_hat / (n-k);
# Unbiased estimator 2
σ_hat_3 = mean( ε_hat.^2 ./ (1 .- h) );
Code - R^2
# R squared - uncentered
R2_uc = (y_hat'*y_hat)/ (y'*y);
# R squared
y_bar = mean(y);
R2 = ((y_hat .- y_bar)'*(y_hat .- y_bar))/ ((y .- y_bar)'*(y .- y_bar));
Finite Sample Properties of OLS
Conditional Unbiasedness
Theorem
Under the GM assumptions (1)-(3), the OLS estimator is conditionally unbiased, i.e. the distribution of $\hat{\beta}_{OLS}$ conditional on $X$ is centered at the true value: $E[\hat{\beta}_{OLS} \mid X] = \beta$.
Proof
Substituting $y = X\beta + \varepsilon$ into the OLS formula and using strict exogeneity,

$$ E[\hat{\beta}_{OLS} \mid X] = E[(X'X)^{-1}X'y \mid X] = \beta + (X'X)^{-1}X' E[\varepsilon \mid X] = \beta. $$
OLS Variance
Theorem
Under the GM assumptions (1)-(4), the conditional variance of the OLS estimator is

$$ Var(\hat{\beta}_{OLS} \mid X) = \sigma^2 (X'X)^{-1}. $$

Proof:

$$ Var(\hat{\beta}_{OLS} \mid X) = Var\big((X'X)^{-1}X'\varepsilon \mid X\big) = (X'X)^{-1}X'\, E[\varepsilon\varepsilon' \mid X]\, X(X'X)^{-1} = \sigma^2 (X'X)^{-1}. $$

Higher correlation of the regressors implies a higher variance of the OLS estimator.
Intuition: individual observations carry less information, since you are exploring a smaller region of the regressor space.
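Both finite sample results can be illustrated by simulation: holding X fixed and redrawing the errors, the average of the OLS estimates across replications should be close to β and their sampling covariance close to σ²(X'X)⁻¹. A minimal sketch (the replication count is an arbitrary choice):
# Monte Carlo: sampling distribution of the OLS estimator with X held fixed
n_sim = 10_000;
B = zeros(n_sim, k);
for s in 1:n_sim
    ε_s = rand(Normal(0,1), n, 1) * sqrt(σ);
    y_s = X*β + ε_s;
    B[s, :] = vec((X'*X)\(X'*y_s));
end
# Mean across replications ≈ β (conditional unbiasedness)
mean(B, dims=1)
# Covariance across replications ≈ σ²(X'X)⁻¹
cov(B)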
BLUE
Theorem
Under the GM assumptions (1)-(4), the OLS estimator $\hat{\beta}_{OLS}$ is the Best Linear Unbiased Estimator (BLUE): it has the smallest conditional variance among all linear unbiased estimators of $\beta$.
Theorem
Under the GM assumptions (1)-(4), any other linear unbiased estimator $\tilde{\beta} = Cy$ of $\beta$ satisfies $Var(\tilde{\beta} \mid X) \geq \sigma^2 (X'X)^{-1} = Var(\hat{\beta}_{OLS} \mid X)$.
BLUE Proof
Consider four steps:
- Define three objects: (i) a generic linear estimator $\tilde{\beta} = Cy$, where $C$ is a $k \times n$ matrix that may depend on $X$; (ii) $D$ such that $C = (X'X)^{-1}X' + D$; and (iii) the decomposition $y = X\beta + \varepsilon$.
- Decompose $\tilde{\beta}$ as
$$ \tilde{\beta} = Cy = \big((X'X)^{-1}X' + D\big)(X\beta + \varepsilon) = \beta + DX\beta + \big((X'X)^{-1}X' + D\big)\varepsilon. $$
- By assumption, $\tilde{\beta}$ must be unbiased for every $\beta$: $E[\tilde{\beta} \mid X] = \beta + DX\beta = \beta$. Hence, it must be that $DX = 0$.
BLUE Proof (2)
- We know by (2)-(3) that, given $DX = 0$, $\tilde{\beta}$ is conditionally unbiased: $E[\tilde{\beta} \mid X] = \beta$. We can now calculate its variance:
$$ Var(\tilde{\beta} \mid X) = \sigma^2 \big((X'X)^{-1}X' + D\big)\big((X'X)^{-1}X' + D\big)' = \sigma^2 \big((X'X)^{-1} + DD'\big) \geq \sigma^2 (X'X)^{-1}, $$
since $DX = 0$ implies that the cross terms vanish, and $DD'$ is positive semi-definite. The inequality $Var(\tilde{\beta} \mid X) \geq Var(\hat{\beta}_{OLS} \mid X)$ is meant in a positive definite sense.
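As a numerical illustration of the theorem, one can compare the OLS variance with the variance of another linear unbiased estimator. The sketch below uses a weighted least squares estimator with an arbitrary diagonal weight matrix W, chosen purely for illustration; under spherical errors its conditional variance exceeds the OLS variance in the positive semi-definite sense.
# An alternative linear unbiased estimator: WLS with arbitrary weights
W = Diagonal(rand(Uniform(0.5, 1.5), n));
A = inv(X'*W*X) * (X'*W);       # β̃ = A*y is linear and unbiased, since A*X = I
var_β_wls = σ * A * A';         # Var(β̃|X) = σ² A A' under spherical errors
var_β_ols = σ * inv(X'*X);
# The difference is positive semi-definite (smallest eigenvalue non-negative up to floating-point error)
eigmin(Symmetric(var_β_wls - var_β_ols))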
Code - OLS Variance
# Ideal variance of the OLS estimator
var_β = σ * inv(X'*X)
## 2×2 Array{Float64,2}:
## 0.0609402 -0.0467732
## -0.0467732 0.0656808
# Standard errors
std_β = sqrt.(diag(var_β))
## 2-element Array{Float64,1}:
## 0.24686077212177054
## 0.25628257446345265
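In practice σ² is unknown, so the variance above is infeasible. A minimal sketch of a feasible counterpart, replacing σ² with the unbiased estimate σ_hat_2 computed earlier (the variable names are just for illustration):
# Feasible variance and standard errors, using the unbiased estimate of σ²
var_β_hat = σ_hat_2[1] * inv(X'*X);
std_β_hat = sqrt.(diag(var_β_hat))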