MPO 581 Class number 16/27  Mar 21, 2011

1. Principal components/ EOFs

Suppose you have a 2D data matrix X = {Xij}, of size m x n (i = 1..m, j = 1..n).

Perhaps X is a function of space and time X(x,t), with m spatial points, n temporal points.

"Space" x might be 2D or 3D, but you've used reshape() (Matlab) or reform() (IDL) to collapse
those into one m dimension. In fact it doesn't need to be "space", it can be a set of different
variables (like the principal component of the mud core elemental data).
Basically it is whatever "structural" dimension you want results as a function of,
while the t dimension (with n values) is your "statistical or samples or realizations" dimension.
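
As a minimal sketch of that collapsing step, here is the NumPy analogue of reshape()/reform(). Python is used purely for illustration, and the array sizes and names (field, nlat, nlon, ntime) are made up. NumPy flattens in row-major order, unlike Matlab, so the ordering of points along m differs, but any consistent ordering works as long as the same one is used to map patterns back onto the grid.

```python
import numpy as np

# Hypothetical gridded field: 4 latitudes x 5 longitudes x 12 times.
nlat, nlon, ntime = 4, 5, 12
rng = np.random.default_rng(0)
field = rng.standard_normal((nlat, nlon, ntime))

# Collapse the two spatial dimensions into one "structural" dimension m.
m, n = nlat * nlon, ntime
X = field.reshape(m, n)

# Row k of X is the time series at grid point (k // nlon, k % nlon).
k = 7
assert np.array_equal(X[k], field[k // nlon, k % nlon])

# After the analysis, any length-m pattern can be mapped back onto the grid.
pattern_map = X[:, 0].reshape(nlat, nlon)
print(X.shape, pattern_map.shape)
```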

You could make the spatial matrix of temporal covariances C (with size m x m).
You could make the temporal matrix of spatial covariances C' (with size n x n).
Both are positive semi-definite matrices (even if X is complex).
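
A rough sketch of building both matrices from random stand-in data follows. Two choices here are assumptions, not something these notes specify: the normalisations (n - 1 and m - 1), and forming C' from the same time-mean-removed anomalies rather than removing a spatial mean at each time.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 40                          # made-up sizes: 6 points, 40 times
X = rng.standard_normal((m, n))

# Anomalies: remove the time mean at each spatial point.
Xa = X - X.mean(axis=1, keepdims=True)

# Spatial matrix of temporal covariances, m x m.
C = Xa @ Xa.T / (n - 1)

# Temporal matrix of spatial covariances, n x n (assumed normalisation).
C_prime = Xa.T @ Xa / (m - 1)

# C matches NumPy's built-in estimate (rows = variables, columns = samples).
assert np.allclose(C, np.cov(X))

# Positive semi-definite: smallest eigenvalue is zero up to round-off.
min_eig_C = np.linalg.eigvalsh(C).min()
min_eig_Cp = np.linalg.eigvalsh(C_prime).min()
print(C.shape, C_prime.shape, min_eig_C, min_eig_Cp)
```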

Any symmetric (Hermitian, if complex) positive semi-definite matrix C of size m x m has a set of m real, non-negative eigenvalues λi and corresponding orthogonal eigenvectors ei (i = 1, ..., m).

C ei = λi ei
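
The eigenvalue relation can be checked numerically. This is a sketch on random stand-in data, using numpy.linalg.eigh (the routine for symmetric/Hermitian matrices); the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 50                          # made-up sizes
X = rng.standard_normal((m, n))
C = np.cov(X)                         # m x m covariance matrix

# eigh returns real eigenvalues in ascending order and orthonormal
# eigenvectors as the columns of E.
lam, E = np.linalg.eigh(C)

# Reorder so the largest eigenvalue comes first.
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

# Check C e_i = lambda_i e_i for every i (column i of E scaled by lam[i]).
residual = C @ E - E * lam

# Check that the eigenvectors are orthonormal.
ortho_err = E.T @ E - np.eye(m)

print(np.abs(residual).max(), np.abs(ortho_err).max())
```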

There's a good course at MIT about linear algebra with demos and video: (Strang)
http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/

Principal components analysis (PCA) finds these eigenvectors and eigenvalues of a correlation or covariance matrix, often by singular value decomposition (SVD) of the data matrix. The eigenvectors are called Empirical Orthogonal Functions (EOFs); each is a set of "coefficients", one per structural point, and each eigenvalue, divided by the sum of all eigenvalues, gives the fraction of the total variance its EOF represents. The projection (inner product) of the data onto each EOF at each time level gives a time series called the Principal Component (PC) time series, or "scores".
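
One way the SVD route might look in NumPy is sketched below, on synthetic data with made-up per-row amplitudes. The names eofs, pcs and var_frac are illustrative, and the (n - 1) normalisation is an assumption chosen to match a sample covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 200                         # made-up sizes
amps = np.array([[3.0], [2.0], [1.0], [0.5]])   # arbitrary row amplitudes
X = rng.standard_normal((m, n)) * amps
Xa = X - X.mean(axis=1, keepdims=True)          # remove time mean

# Thin SVD of the anomaly matrix: Xa = U diag(s) Vt.
U, s, Vt = np.linalg.svd(Xa, full_matrices=False)

eofs = U                              # column i is EOF i (a spatial pattern)
lam = s**2 / (n - 1)                  # eigenvalues of the covariance matrix
var_frac = lam / lam.sum()            # fraction of total variance per EOF
pcs = eofs.T @ Xa                     # row i is the PC time series of EOF i

# Cross-check against a direct eigen-decomposition of the covariance matrix.
lam_eig = np.linalg.eigvalsh(np.cov(X))[::-1]
assert np.allclose(lam, lam_eig)

print(np.round(var_frac, 3))
```

The PC time series come out mutually uncorrelated, with variances equal to the eigenvalues.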

In essence we model X(x,t) = EOF1(x) PC1(t) + EOF2(x) PC2(t) + ...

Each term in turn represents as much of the remaining variance as it can, and the full set of m EOF/PC pairs is enough to reconstruct the whole data set (all variance is retained). For more theory, read the handout (Hsieh Ch2), a very concise treatment, or the other resources linked on the web page: the vSZ textbook and Tim DelSole's chapter, which is especially good.
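
The sum of EOF/PC terms can be tried directly. The sketch below, on random stand-in data, truncates the sum after k terms and reports the fraction of variance retained; reconstruct is a hypothetical helper name. Note that it rebuilds the anomalies, so the time mean would have to be added back to recover the original X.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 100                         # made-up sizes
X = rng.standard_normal((m, n))
Xa = X - X.mean(axis=1, keepdims=True)

U, s, Vt = np.linalg.svd(Xa, full_matrices=False)
pcs = U.T @ Xa                        # PC time series, one per row


def reconstruct(k):
    """Sum of the first k terms EOF_i(x) * PC_i(t)."""
    return U[:, :k] @ pcs[:k, :]


total_var = np.sum(Xa**2)
for k in range(1, m + 1):
    retained = np.sum(reconstruct(k)**2) / total_var
    print(k, round(retained, 3))

# With all m pairs the anomalies are recovered to round-off.
full_error = np.abs(reconstruct(m) - Xa).max()
```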

Canonical Correlation Analysis (CCA) is an extension to 2 variables: it finds pairs of patterns, one in each data set, whose time series are maximally correlated.

how to:   online help   Matlab   IDL 

Open questions, assignments, and loose ends for next class:

Testable questions about today's material: