2021-10-29
A real \(n \times m\) matrix \(A\) is an array
\[ A= \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1m} \newline a_{21} & a_{22} & \dots & a_{2m} \newline \vdots & \vdots & \ddots & \vdots \newline a_{n1} & a_{n2} & \dots & a_{nm} \end{bmatrix} \]
We write \([A]_ {ij} = a_ {ij}\) to indicate the \((i,j)\)-element of \(A\).
We will usually take the convention that a real vector \(x \in \mathbb R^n\) is identified with an \(n \times 1\) matrix.
The \(n \times n\) identity matrix \(I_n\) is given by
\[
[I_n] _ {ij} = \begin{cases} 1 \ \ \ \text{if} \ i=j \newline
0 \ \ \ \text{if} \ i \neq j \end{cases}
\]
The trace of a square matrix \(A\) with dimension \(n \times n\) is \(\text{tr}(A) = \sum_{i=1}^n a_{ii}\).
The determinant of a square \(n \times n\) matrix \(A\), written \(\det(A)\), can be defined in one of several equivalent ways (for example via cofactor expansion or the Leibniz formula).
Vectors \(x_1,...,x_k\) are linearly independent if the only solution to the equation \(b_1x_1 + ... + b_k x_k=0, \ b_j \in \mathbb R\), is \(b_1=b_2=...=b_k=0\).
The rank of a matrix \(A\), \(rank(A)\), is the maximal number of linearly independent rows (equivalently, columns) of \(A\).
Let \(A\) be an \(n \times n\) matrix. The \(n \times 1\) vector \(x \neq 0\) is an eigenvector of \(A\) with corresponding eigenvalue \(\lambda\) if \(Ax = \lambda x\).
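As a quick numerical illustration of the trace, rank, and eigenvector definitions above, here is a small NumPy sketch (the \(3 \times 3\) matrix is an arbitrary example of my own, not taken from these notes):

```python
import numpy as np

# Arbitrary 3 x 3 example matrix, chosen only for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

print(np.trace(A))               # tr(A) = sum of diagonal entries = 7.0
print(np.linalg.matrix_rank(A))  # maximal number of linearly independent rows

# Eigenvalues and eigenvectors: column i of V satisfies A v_i = lambda_i v_i.
lam, V = np.linalg.eig(A)
print(np.allclose(A @ V[:, 0], lam[0] * V[:, 0]))  # True
```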
Theorem: Let \(A\) be an \(n \times n\) symmetric matrix. Then \(A\) can be factored as \(A = C \Lambda C'\) where \(C\) is orthogonal and \(\Lambda\) is diagonal.
If we postmultiply \(A\) by \(C\), we get \(AC = C \Lambda C'C = C\Lambda\).
This is a matrix equation which can be split into columns. The \(i\)th column of the equation reads \(A c_i = \lambda_i c_i\) which corresponds to the definition of eigenvalues and eigenvectors. So if the decomposition exists, then \(C\) is the eigenvector matrix and \(\Lambda\) contains the eigenvalues.
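The decomposition is easy to verify numerically. Below is a short sketch using NumPy's symmetric eigensolver (the example matrix is an assumption made purely for illustration):

```python
import numpy as np

# Symmetric example matrix (assumed for illustration).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is specialized to symmetric matrices: the eigenvalues are real and
# the matrix of eigenvectors C is orthogonal.
lam, C = np.linalg.eigh(A)
Lam = np.diag(lam)

print(np.allclose(A, C @ Lam @ C.T))    # A = C Lambda C'
print(np.allclose(A @ C, C @ Lam))      # AC = C Lambda, i.e. A c_i = lambda_i c_i
print(np.allclose(C.T @ C, np.eye(3)))  # C'C = I, so C is orthogonal
```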
Theorem: The rank of a symmetric matrix equals the number of nonzero eigenvalues.
Proof: Since \(C\) is orthogonal (hence invertible), \(rank(A) = rank(C\Lambda C') = rank(\Lambda) = | \lbrace i: \lambda_i \neq 0 \rbrace |\). \[\tag*{$\blacksquare$}\]
Theorem: The nonzero eigenvalues of \(AA'\) and \(A'A\) are identical.
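This can be checked numerically as well; in the sketch below the \(4 \times 2\) matrix is my own example, so \(AA'\) is \(4 \times 4\) and \(A'A\) is \(2 \times 2\):

```python
import numpy as np

# Hypothetical 4 x 2 example with rank 2.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0],
              [1.0, 0.0]])

ev_big   = np.linalg.eigvalsh(A @ A.T)  # 4 eigenvalues, two of them (numerically) zero
ev_small = np.linalg.eigvalsh(A.T @ A)  # 2 eigenvalues, both nonzero

nonzero_big = np.sort(ev_big[np.abs(ev_big) > 1e-10])
print(np.allclose(nonzero_big, np.sort(ev_small)))  # True: the nonzero eigenvalues coincide
```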
Theorem: The trace of a symmetric matrix equals the sum of its eigenvalues.
Proof: \(tr(A) = tr(C \Lambda C') = tr((C \Lambda)C') = tr(C'C \Lambda) = tr(\Lambda) = \sum_ {i=1}^n \lambda_i.\) \[\tag*{$\blacksquare$}\]
Theorem: The determinant of a symmetric matrix equals the product of its eigenvalues.
Proof: \(det(A) = det(C \Lambda C') = det(C)det(\Lambda)det(C') = det(C)det(C')det(\Lambda) = det(CC') det(\Lambda) = det(I)det(\Lambda) = det(\Lambda) = \prod_ {i=1}^n \lambda_i.\) \[\tag*{$\blacksquare$}\]
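Both the trace and the determinant results can be confirmed numerically (a minimal sketch; the symmetric example matrix is again assumed for illustration):

```python
import numpy as np

# Symmetric example matrix (assumed for illustration).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam = np.linalg.eigvalsh(A)
print(np.isclose(np.trace(A), lam.sum()))        # tr(A)  = sum of the eigenvalues
print(np.isclose(np.linalg.det(A), lam.prod()))  # det(A) = product of the eigenvalues
```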
Theorem: For any symmetric matrix \(A\), the eigenvalues of \(A^2\) are the squares of the eigenvalues of \(A\), and the eigenvectors are the same.
Proof: \(A = C \Lambda C' \implies A^2 = C \Lambda C' C \Lambda C' = C \Lambda I \Lambda C' = C \Lambda^2 C'\) \[\tag*{$\blacksquare$}\]
Theorem: For any symmetric matrix \(A\), and any integer \(k>0\), the eigenvalues of \(A^k\) are the \(k\)th power of the eigenvalues of \(A\), and the eigenvectors are the same.
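A quick numerical check of the \(k\)-th power statement (the example matrix and the choice \(k=3\) are my own):

```python
import numpy as np

# Symmetric example matrix (assumed for illustration).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, C = np.linalg.eigh(A)
k = 3
Ak = np.linalg.matrix_power(A, k)

# Same eigenvectors C, eigenvalues raised to the k-th power.
print(np.allclose(Ak, C @ np.diag(lam**k) @ C.T))  # True
```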
Theorem: Any square symmetric matrix \(A\) with positive eigenvalues can be written as the product of a lower triangular matrix \(L\) and its (upper triangular) transpose \(L' = U\). That is, \(A = LU = LL'\).
Note that \[ A = LL' = LU = U'U, \qquad A^{-1} = (L')^{-1}L^{-1} = U^{-1}(U')^{-1}, \] where \(L^{-1}\) is lower triangular and \(U^{-1}\) is upper triangular. You can check this for the \(2 \times 2\) case. Also note that the validity of the theorem extends to symmetric matrices with non-negative eigenvalues by a limiting argument; however, the proof is then no longer constructive.
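NumPy's Cholesky routine computes exactly this lower triangular factor; the sketch below (with an assumed positive definite example matrix) verifies \(A = LL'\) and \(A^{-1} = (L')^{-1}L^{-1}\):

```python
import numpy as np

# Symmetric matrix with positive eigenvalues (positive definite), chosen for illustration.
A = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])

L = np.linalg.cholesky(A)  # lower triangular factor
U = L.T                    # upper triangular factor

print(np.allclose(A, L @ U))  # A = L L' = L U
print(np.allclose(np.linalg.inv(A),
                  np.linalg.inv(L.T) @ np.linalg.inv(L)))  # A^{-1} = (L')^{-1} L^{-1}
```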
A quadratic form in the \(n \times n\) matrix \(A\) and \(n \times 1\) vector \(x\) is defined by the scalar \(x'Ax\). A symmetric matrix \(A\) is positive definite (PD) if \(x'Ax > 0\) for all \(x \neq 0\), positive semidefinite (PSD) if \(x'Ax \geq 0\) for all \(x\), and negative definite (ND) if \(x'Ax < 0\) for all \(x \neq 0\).
Theorem: Let \(A\) be a symmetric matrix. Then \(A\) is PD (ND) \(\iff\) all of its eigenvalues are positive (negative).
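In practice this gives a simple numerical test for definiteness (a sketch; the helper function, tolerance, and example matrices below are my own assumptions):

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """Test positive definiteness of a symmetric matrix via its eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A_pd = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 1 and 3: PD
A_nd = -A_pd                               # eigenvalues -1 and -3: ND, not PD

print(is_positive_definite(A_pd))  # True
print(is_positive_definite(A_nd))  # False
```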
Some more results:
Theorem: If \(A\) is \(n\times k\) with \(n>k\) and \(rank(A)=k\), then \(A'A\) is PD and \(AA'\) is PSD.
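A small numerical illustration (the \(4 \times 2\) full-column-rank matrix is an assumed example):

```python
import numpy as np

# n = 4 > k = 2 and rank(A) = 2: the columns are linearly independent.
A = np.array([[1.0,  0.0],
              [0.0,  1.0],
              [1.0,  1.0],
              [2.0, -1.0]])

print(np.linalg.eigvalsh(A.T @ A))  # all strictly positive            -> A'A is PD
print(np.linalg.eigvalsh(A @ A.T))  # n - k values ~ 0, rest positive  -> AA' is PSD
```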
The semidefinite partial order is defined by \(A \geq B\) iff \(A-B\) is PSD.
Theorem: Let \(A\), \(B\) be square, symmetric, PD, and conformable. Then \(A-B\) is PD iff \(B^{-1}-A^{-1}\) is PD.
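A numerical spot-check of one direction of this equivalence (the example matrices are my own assumptions):

```python
import numpy as np

def is_pd(M, tol=1e-10):
    # Positive definiteness check for a symmetric matrix via its eigenvalues.
    return bool(np.all(np.linalg.eigvalsh(M) > tol))

A = np.array([[3.0, 1.0], [1.0, 3.0]])  # PD, eigenvalues 2 and 4
B = np.eye(2)                           # PD (identity)

print(is_pd(A - B))                                # True: A - B is PD
print(is_pd(np.linalg.inv(B) - np.linalg.inv(A)))  # True: B^{-1} - A^{-1} is PD
```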
We first define matrices blockwise when the blocks are conformable. In particular, if \(A_1, A_2, A_3, A_4\) are matrices with appropriate dimensions, then the matrix \[ A = \begin{bmatrix} A_1 & A_2 \newline A_3 & A_4 \end{bmatrix} \] is defined in the obvious way.
Let \(F: \mathbb R^{m \times n} \rightarrow \mathbb R^{p \times q}\) be a matrix-valued function. More precisely, given a real \(m \times n\) matrix \(X\), \(F(X)\) returns the \(p \times q\) matrix
\[
\begin{bmatrix}
f_ {11}(X) & ... & f_ {1q}(X) \newline \vdots & \ddots & \vdots \newline
f_ {p1}(X)& ... & f_ {pq}(X)
\end{bmatrix}
\]
The derivative of \(F\) with respect to the matrix \(X\) is the \(mp \times nq\) matrix \[
\frac{\partial F(X)}{\partial X} = \begin{bmatrix}
\frac{\partial F(X)}{\partial x_ {11}} & ... & \frac{\partial F(X)}{\partial x_ {1n}} \newline \vdots & \ddots & \vdots \newline
\frac{\partial F(X)}{\partial x_ {m1}} & ... & \frac{\partial F(X)}{\partial x_ {mn}}
\end{bmatrix}
\] where each \(\frac{\partial F(X)}{\partial x_ {ij}}\) is a \(p\times q\) matrix given by
\[
\frac{\partial F(X)}{\partial x_ {ij}} = \begin{bmatrix}
\frac{\partial f_ {11}(X)}{\partial x_ {ij}} & ... & \frac{\partial f_ {1q}(X)}{\partial x_ {ij}} \newline
\vdots & \ddots & \vdots \newline
\frac{\partial f_ {p1}(X)}{\partial x_ {ij}} & ... & \frac{\partial f_ {pq}(X)}{\partial x_ {ij}}
\end{bmatrix}
\] The most important case is \(F: \mathbb R^n \rightarrow \mathbb R\), since this is the case needed to derive the least squares estimator. The trickiest part of such calculations is making sure that the dimensions are correct.
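As a concrete instance of the scalar-valued case, consider the quadratic form \(f(x) = x'Ax\), whose derivative with respect to \(x\) is \((A + A')x\). The sketch below (my own example, not from the notes) checks this formula against finite differences:

```python
import numpy as np

# f(x) = x'Ax is scalar-valued; its derivative w.r.t. the n x 1 vector x is (A + A')x.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 2.0]])
x = np.array([1.0, -2.0, 0.5])

f = lambda v: v @ A @ v
analytic = (A + A.T) @ x

# Central finite-difference approximation of each partial derivative.
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])

print(np.allclose(analytic, numeric))  # True (up to numerical error)
```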