MPO 581 (Applied Data Analysis), Spring 2011

Brian Mapes, mapes at miami edu.
First time teaching this... page under heavy construction...

Course and grading matters:

Note: The syllabus (original plan, .pdf) has been superseded somewhat since I discovered wikis.
  • We have small-team projects, and 5 homeworks, each led by one student via Wiki.
  • Other students will earn credit through
    • translating core codes into another language
    • tackling one or more of the "extra credit" extensions of each homework
    • contributing especially well to their team
  • There is a Wiki page of "testable questions": students (and I) contribute, and it is open to all at all times
At grading time, each student proposes their grade (don't be shy: A) and sends me a set of links pointing (broadly) to their areas of main contribution: parts of their team's work, homework extensions or translations, testable questions added, etc.

Resources:

Books/notes:

Link to ebrary book, "Statistical Analysis in Climate Research", von Storch and Zwiers (vSZ); off-campus access link
Link to "Machine Learning Methods in the Environmental Sciences", Hsieh, as handed out in class.
Similar course on climate statistics at Wisconsin, using Hartmann/Wallace (U. of Washington) course notes.

Computing:

Computer and language links

Math symbols for your Web pages: a cut-paste cheat sheet
IDL-Matlab-Python translation sheet (PDF) from Mathesaurus


Homework leaders and team leaders - table

Homework Wikis - "TA" leads; we all build up the core code examples; individual student pages display results and link to the actual code file you used (with your own unique comments in it).

HW1 Wiki - (Emily is "TA") Read netCDF {var1, var2,...} as f(lon, lat, mo)  & some basic stats (motivated by science)
HW2 Wiki - (Adam is "TA") Histograms, and bin averages of other fields (aka composites)
HW3 Wiki - (Milan is "TA") Covariance, correlation, regression and curve fitting
HW4 Wiki - (Changheng is "TA") Scale decomposition, spectral (Fourier) methods 
HW5 Wiki - (Hosmay is "TA") Eigen-analysis of correlation matrices


Team Wikis -- Leader creates it, members can edit it, all can see

Time-height data team (atmosphere: ARM site data, radars, etc.): Siwon (lead), Emily; Katinka, Sil (auditors)
Eddies/weather (structures in complicated 2D fields): Greta (lead), Changheng, Joni, David; Trip (auditor) ?
Climate diagnostics (mostly monthly maps) - Angela (lead), Jie, Johnna, Hosmay
Exploiting historical data: Teddy (lead), Milan, Amelia
SST (Minnett team) - Adam, Yang, Elizabeth; Trip (auditor) ?


Class Sessions: 27 of them

  1. (Jan 19) Introduction to course
  2. (Jan 24) Bits, bytes, words; numbers and characters; data types and operations; arrays
  3. (Jan 26) Fields and lists. Imperfect data challenges. Closer to assigning Teams and roles...
  4. (Jan 31) Basic stats: averages, variance, etc.  Finally, computing something!   --> introduce HW1 Wiki
  5. (Feb 2) averages, moments, on to distributions
  6. (Feb 7) The Most Interesting 5% of Statistics. Stray link: Wolfram statistics
  7. (Feb 9) Distributions of under-sampled Normal Populations (t, χ², F); CDFs & histogram demo
  8. (Feb 14) HW1 due; 2D binning example material.
  9. (Feb 16) Histograms, scatterplots, multidimensional histograms. Assign HW2.
  10. (Feb 21) Histogram bins (modes or quantiles) as key dates for composites.
  11. (Feb 23) Automation in code. Decomposability of variance into averages and deviations (pptx).
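A minimal sketch of the variance decomposition from session 11, assuming population variance (divide by N) and a grouping by arbitrary labels. The function name `decompose_variance` and the toy data are made up for illustration; they are not taken from the class codes. The total variance splits into the variance of the group averages about the grand mean plus the average variance of the deviations within each group.

```python
from statistics import fmean, pvariance


def decompose_variance(values, labels):
    """Split population variance into between-group and within-group parts.

    Hypothetical helper: returns (total, between, within), where
    total == between + within up to floating-point rounding.
    """
    grand_mean = fmean(values)
    total = pvariance(values, mu=grand_mean)

    # Collect the values belonging to each label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n = len(values)
    between = 0.0  # variance of group averages about the grand mean
    within = 0.0   # size-weighted average of the variances inside each group
    for members in groups.values():
        weight = len(members) / n
        group_mean = fmean(members)
        between += weight * (group_mean - grand_mean) ** 2
        within += weight * pvariance(members, mu=group_mean)
    return total, between, within


# Toy data (assumed): four values in two groups.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
total, between, within = decompose_variance(values, labels)
print(total, between, within)
```

With monthly data, the labels could be calendar months, in which case the "between" part would be the variance of the mean seasonal cycle and the "within" part the variance of the anomalies.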
Start working from book:
Hsieh, Machine Learning Methods in the Environmental Sciences, Cambridge University Press, 2009.
His chapters 1-3 are the most compact treatment of the essentials of the correlation-regression-covariance-spectra-EOFs nexus of methods that I have found. He is trying to get them out of the way to get on to fancier ideas in that book, so the treatment is very brisk. That is great for us, so I will basically go through Chapters 1-3 as lectures in class, to fix notation and ideas. A time-longitude data set of a few variables along the equator will give us material for all the remaining homeworks and examples, relevant to all teams without being any team's particular work.
  12. (Feb 28) Hsieh Ch. 1: "Basic notions in classical data analysis". Regression, line fitting, correlation.
  13. (Mar 2) Hsieh Ch. 1 and 2; HW2 review
  14. (Mon Mar 7) Hsieh Ch. 2: Correlation line = Principal Component #1
  15. (Wed Mar 9) Hsieh Ch. 2: Principal component/EOF methods. Prof. Tim Del Sole's notes (PDF); vSZ book chapter

  16. (Mon Mar 21) Principal components. How-to: online help PDFs (Matlab, IDL)
  17. (Wed Mar 23) Assign HW 3-4-5 w/ same datasets: everybody pick 2 for correlation/covariance/cospectra/CEOF
  18. (Mon Mar 28) Fourier crash course: Convolution theorem, Parseval's theorem, FT pairs (Gaussian<->Gaussian, wow).
  19. (Wed Mar 30) Wavelet analysis (Pedro DiNezio)
  20. (Mon Apr 4) Cospectra: coherence and phase between 2 fields. Handout: Hsieh Ch. 3. HW3 (partially) due.
  21. (Apr 6)
  22. (Apr 11)
  23. (Apr 13)
  24. (Apr 18) HW4 results presentation (Changheng Chen)
  25. (Apr 20) Historical team; ARM (time-height) team.
  26. (Apr 25) HW5 results presentation, SST team.
  27. (Apr 27) Climate var team, complex fields team

Team pages

Testable questions compilation