MPO 581 Class number 16/27 Mar 21, 2011
1. Principal components / EOFs
Suppose you have a 2D data matrix X = {X_ij} of size m x n, with
i = 1..m and j = 1..n. Perhaps X is a function of space and time,
X(x,t), with m spatial points and n temporal points. "Space" x might be
2D or 3D, but you've used reshape() (Matlab) or reform() (IDL) to
collapse those dimensions into a single dimension of length m. In fact
it doesn't need to be "space": it can be a set of different variables
(as in the principal component analysis of the mud core elemental data).
Basically, it is whatever "structural" dimension you want results as a
function of, while the t dimension (with n values) is your
"statistical" dimension of samples or realizations.
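A minimal NumPy sketch of the collapse step, assuming a hypothetical field stored as (lat, lon, time), with random numbers standing in for real data. NumPy's reshape plays the role of reshape()/reform() here. Matlab reshapes in column-major order while NumPy defaults to row-major, so undo the collapse with the same tool (and ordering) that made it.

```python
import numpy as np

# Hypothetical gridded field: 4 latitudes x 5 longitudes x 10 time steps
# (random numbers stand in for real data).
nlat, nlon, nt = 4, 5, 10
rng = np.random.default_rng(0)
field = rng.standard_normal((nlat, nlon, nt))

# Collapse the two spatial dimensions into one "structural" dimension of
# length m = nlat * nlon; time stays as the n "samples" dimension.
m = nlat * nlon
X = field.reshape(m, nt)          # row = spatial point, column = time step

# The mapping is reversible: any length-m vector (e.g. an EOF) can be put
# back onto the lat/lon grid with the inverse reshape.
map_back = X[:, 0].reshape(nlat, nlon)
```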
You could form the spatial matrix of temporal covariances C (size
m x m), or the temporal matrix of spatial covariances C' (size n x n).
Both are symmetric positive semi-definite matrices (Hermitian positive
semi-definite if X is complex).
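Both covariance matrices can be formed directly from the anomaly matrix. This sketch uses random stand-in data and scales both by 1/(n-1), an assumption made only so their eigenvalues line up (the scaling does not change the eigenvectors). For complex X, replace `.T` with `.conj().T`.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 40                      # m spatial points, n time samples
X = rng.standard_normal((m, n))   # stand-in data matrix

# Remove the time mean at each spatial point (anomalies).
Xa = X - X.mean(axis=1, keepdims=True)

# Spatial matrix of temporal covariances: m x m.
C = Xa @ Xa.T / (n - 1)
# Temporal matrix of spatial covariances: n x n (same scaling, for comparison).
Cp = Xa.T @ Xa / (n - 1)

# Both are symmetric with no negative eigenvalues (up to round-off).
eig_C = np.linalg.eigvalsh(C)     # ascending order
eig_Cp = np.linalg.eigvalsh(Cp)   # ascending order
```

The two matrices share their non-zero eigenvalues, which is why one can work with whichever of C or C' is smaller.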
Any m x m symmetric (or Hermitian) positive semi-definite matrix C has
a set of m real, non-negative eigenvalues λ_i and corresponding
orthogonal eigenvectors e_i, satisfying

  $C\,\mathbf{e}_i = \lambda_i\,\mathbf{e}_i, \qquad i = 1, \dots, m.$
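The eigenvalue relation can be checked numerically. This sketch assumes NumPy and uses `numpy.linalg.eigh`, which is intended for symmetric/Hermitian matrices and returns real eigenvalues in ascending order; the test matrix is the covariance of random numbers.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 30))
C = np.cov(A)                     # a 5 x 5 symmetric positive semi-definite matrix

# eigh: real eigenvalues (ascending) and orthonormal eigenvectors as the
# columns of E.
lam, E = np.linalg.eigh(C)

# Re-order so the largest eigenvalue comes first (the usual EOF convention).
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

# Column i of the residual is C e_i - lambda_i e_i (should be ~0).
residual = C @ E - E * lam
```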
There's a good MIT OpenCourseWare course on linear algebra, with demos
and video lectures (Strang, 18.06):
http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/
Principal components analysis (PCA) finds these eigenvectors and
eigenvalues of a correlation or covariance matrix, often through the
matrix method of singular value decomposition (SVD).
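The SVD route can be compared against the eigen-decomposition of C. In this sketch (NumPy, random stand-in data), the left singular vectors of the anomaly matrix are the eigenvectors of C up to a sign, and the eigenvalues are the squared singular values divided by n-1 under the covariance scaling assumed here. Working on the data matrix avoids forming C explicitly, which is one common reason to prefer SVD.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 50
X = rng.standard_normal((m, n))
Xa = X - X.mean(axis=1, keepdims=True)        # anomalies about the time mean

# Route 1: eigen-decomposition of the covariance matrix.
C = Xa @ Xa.T / (n - 1)
lam, E = np.linalg.eigh(C)
lam, E = lam[::-1], E[:, ::-1]                # largest first

# Route 2: SVD of the anomaly matrix itself, Xa = U diag(s) Vt.
U, s, Vt = np.linalg.svd(Xa, full_matrices=False)

# Eigenvalues are squared singular values scaled by 1/(n-1); the left
# singular vectors match the eigenvectors up to a sign flip per column.
lam_from_svd = s**2 / (n - 1)
signs = np.sign(np.sum(U * E, axis=0))        # +1 or -1 per column
```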
The eigenvectors are called Empirical Orthogonal Functions, each of
which is a set of "coefficients", and the eigenvalues indicate how much
of the total variance they represent. The projection or inner product
of the data onto these structures at each time level gives a time
series called the Principal Component (PC) Time Series, or "scores".
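A sketch of the EOF/PC bookkeeping described above, assuming NumPy, random stand-in data, and EOFs normalized to unit length (other normalizations are common). The PCs come out as the projection of the anomalies onto the EOFs; they are mutually uncorrelated, with variances equal to the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 10, 60
X = rng.standard_normal((m, n))
Xa = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(Xa, full_matrices=False)
eofs = U                                      # column k is EOF_k(x), unit length
lam = s**2 / (n - 1)                          # eigenvalue = variance of each mode

# Fraction of the total variance carried by each EOF.
frac = lam / lam.sum()

# PC time series ("scores"): inner product of the data with each EOF at
# every time step; row k is PC_k(t).
pcs = eofs.T @ Xa
```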
In essence we model

  $X(x,t) = \mathrm{EOF}_1(x)\,\mathrm{PC}_1(t) + \mathrm{EOF}_2(x)\,\mathrm{PC}_2(t) + \dots = \sum_{k=1}^{m} \mathrm{EOF}_k(x)\,\mathrm{PC}_k(t)$
Each term captures as much variance as it can (the first pair the most,
each later pair the most of what remains), and the full set of m
EOF/PC pairs is enough to reconstruct the whole data set (all variance
is retained). For more theory, read the handout (Hsieh Ch. 2), a very
concise treatment, or the other resources linked on the web page: the
vSZ textbook, or Tim DelSole's chapter, which is especially good.
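The expansion can be truncated after k terms. This sketch (NumPy, random stand-in anomalies, hypothetical helper names `reconstruct` and `variance_captured`) shows that keeping all pairs reproduces the data exactly, while the first k pairs capture the sum of the first k eigenvalues as a fraction of the total. Only min(m, n) pairs can carry non-zero variance, so when n < m the remaining EOFs contribute nothing.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 12, 40
Xa = rng.standard_normal((m, n))
Xa -= Xa.mean(axis=1, keepdims=True)          # anomalies

U, s, Vt = np.linalg.svd(Xa, full_matrices=False)
pcs = U.T @ Xa                                # PC_k(t) in row k

def reconstruct(k):
    """Sum of the first k terms EOF_j(x) PC_j(t), j = 1..k."""
    return U[:, :k] @ pcs[:k, :]

def variance_captured(k):
    """Fraction of total variance in the k-term reconstruction."""
    return np.sum(reconstruct(k) ** 2) / np.sum(Xa ** 2)

full = reconstruct(len(s))                    # all EOF/PC pairs
```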
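Canonical Correlation Analysis (CCA) extends this to two variables (two
data sets sharing the same samples), finding pairs of patterns whose
time series are maximally correlated with each other.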
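CCA is only named here, so the following is one standard formulation sketched under stated assumptions: whiten each data set with the inverse square root of its covariance, then take the SVD of the whitened cross-covariance; the singular values are the canonical correlations. The helper names `inv_sqrt` and `cca` are hypothetical, and the sketch assumes many more samples than variables so both covariance matrices are invertible (in practice PCA is often applied first and CCA run on the leading PCs for that reason).

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca(X, Y):
    """Canonical correlations between the rows of X (p x n) and Y (q x n).

    Columns are the n shared samples (times). Assumes n is much larger
    than p and q so both covariance matrices are invertible.
    """
    n = X.shape[1]
    Xa = X - X.mean(axis=1, keepdims=True)
    Ya = Y - Y.mean(axis=1, keepdims=True)
    Cxx = Xa @ Xa.T / (n - 1)
    Cyy = Ya @ Ya.T / (n - 1)
    Cxy = Xa @ Ya.T / (n - 1)
    # Whiten each data set, then SVD the whitened cross-covariance.
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, rho, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U          # columns: weight vectors applied to X
    B = Wy @ Vt.T       # columns: weight vectors applied to Y
    return rho, A, B

rng = np.random.default_rng(6)
X = rng.standard_normal((3, 200))
Y_linked = rng.standard_normal((2, 3)) @ X    # exact linear function of X
Y_indep = rng.standard_normal((2, 200))       # unrelated to X
rho_linked, _, _ = cca(X, Y_linked)
rho_indep, A_i, B_i = cca(X, Y_indep)
```

For `Y_linked`, an exact linear function of X, both canonical correlations come out as 1; for independent noise they are small but not exactly zero because of sampling.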
How to: see the online help for Matlab and IDL.
Open questions, assignments, and loose ends for next class:
Testable questions about today's material: