ESS 116
Introduction to Data Analysis in Earth Science
Image Credit: NASA
Instructor: Mathieu Morlighem
E-mail: mmorligh@uci.edu (include ESS116 in subject line)
Office Hours: 3218 Croul Hall, Friday 2:00 pm – 3:00 pm
This content is protected and may not be shared uploaded or distributed
Course introduction
Syllabus review
Why this class matters
Logistics
Lecture 1 – Introduction to MATLAB
Why MATLAB?
Starting with MATLAB
Path and common file operations
Variable types and Casting
Random numbers
Today’s lecture
Mathieu Morlighem
Professor of Earth System Science
Ice sheet and glacier dynamics, sea level rise
Developer of a new-generation JPL/UCI ice sheet model
Simulation on big computers
Office Hours:
Fridays 8:30 – 10:30 am or by appointment (Zoom)
Office:
3218 Croul Hall (3rd floor)
Email: mathieu.morlighem@uci.edu
Please include ESS116 in the subject line
About Me
Where I am from
2008-2014 NASA JPL
What I am working on
You are here
About Me
Our TA
Chia-Chun Liang
chiachl6@uci.edu
Office hours:
Thursday 2-3pm
Zoom
About you
About you
Before the next lecture, please e-mail me your answers to the following (include ESS116 in the subject line):
Name
Major
Why are you taking this class? (be honest…)
What do you expect from this class?
Have you had any programming experience before?
First assignment (ungraded)
This course is highly multi-disciplinary:
Statistics (mean, standard deviation, mode, confidence intervals…)
Probability (probability density functions, normal distributions,…)
Data visualization (histogram, curves, image processing,…)
Hypothesis testing (t-test, χ² test)
Interpolation, Extrapolation, Predictions,…
Required (new) skill: Programming
MATLAB
No previous programming experience required
This course will bump up your CV!
About this course
The National Association of Colleges and Employers (NACE) surveys employers nationally and asks them to rate the top qualities and skills they seek in new college hires.
Why this course matters
Ability to work well on a team
Leadership
Written communications skills
Problem-solving skills
Strong work ethic
Analytical / quantitative skills
Technical skills
Verbal communication skills
Initiative
Computer skills
Flexibility / adaptability
Interpersonal skills
Detail-oriented
Organizational ability
Strategic planning skill
This course directly targets 7 of the top 15!
Counting indirect benefits, it targets 13 of the top 15!
Labs are mandatory and very important: it will be impossible to pass this class if you do not attend.
ESS 116: Data Analysis Syllabus
Suggested reading
Stormy Attaway: “MATLAB: A Practical Introduction to Programming and Problem Solving” (3rd–5th Ed.)
Deborah Rumsey: “Statistics Essentials For Dummies”
Martin H. Trauth: “MATLAB Recipes for Earth Sciences” (4th Ed.)
Available on class website https://canvas.eee.uci.edu/courses/23989
Class website: https://canvas.eee.uci.edu/courses/23989 (Syllabus, Lecture Notes, Labs, etc.)
Provisional grading scheme:
Component | Points | Weight
Participation (watch all lectures, do all quizzes) | 50 pts | 5%
Lab Assignments (HW) | 450 pts | 45%
Mid-term | 250 pts | 25%
Final Exam | 250 pts | 25%
Extra Credit (Evaluations) | 10 pts | 1%
ESS 116: Website & Syllabus Review
Do you have any programming experience?
Yes, only MATLAB
Yes, other than MATLAB (Python, Java, C, C++, etc.)
Yes, MATLAB and other languages as well
No
Example i>Clicker question
Week | Topic | Lab
1 | Introduction and MATLAB | Introduction to MATLAB
2 | File I/O, vectors and matrices in MATLAB | Vectors and matrices
3 | MATLAB programming | Scripts and functions
4 | Selection statements and loops | Loops
5 | Descriptive statistics | Statistics
6 | Hypothesis testing | t-tests
7 | Curve fitting and interpolation (+ midterm) | MIDTERM
8 | Image processing | Glacier retreat
9 | Time series and bivariate statistics | GRACE and El Niño
10 | Review session | FINAL EXAM (lab)
11 | FINAL EXAM (June 11th, 11:30am-1:30pm) | (none)
Class schedule
Midterm exam:
Week 7, part 1 (in class) Tuesday May 12th, after a short lecture, 2:30-3:20pm
Week 7, part 2 (in lab) during your regular lab session (Wednesday May 13th 9am or 1pm)
Final exam:
Week 10, part 1 (in lab) during your regular lab session (Wednesday June 3rd 9am or 1pm)
Week 11, part 2 (in class) Thursday June 11th 11:30am-1:30pm
During lab sessions:
Chia-Chun (and I) will be available to assist you
Wednesday 9:00-11:50am and 1:00-3:50pm
Office hours
Chia-Chun: Thursday 2-3pm
Me: Fridays 8:30am-10:30am
All Zoom links will be available in the syllabus (Canvas)
Email us outside of these times
PLEASE! Reach out, we are here to help!
Zoom
Cheating in any form is never allowed. If a student is caught cheating, the guidelines established by the UCI Academic Senate will be followed and, at the very least, the student will receive a zero for the work in question.
UCI Academic Senate guidelines are on the web in detail at: https://aisc.uci.edu/students/academic-integrity/index.php
This includes:
Sharing answers for homework and exams (i.e., don’t copy)
Copying and pasting code from a web source
Using other people’s i>Clicker
Posting course materials online (e.g., Course Hero, Koofers, Chegg, etc.) violates university policy. Students caught doing so are subject to a failing grade in the course and disciplinary action.
Academic Honesty and Civility
Any Questions so far?
Questions
Lecture 1 – Introduction to MATLAB
Why MATLAB?
Earth Scientists Deal with Large Datasets
Processing and Visualization Should be Automated
Can make your own tools
Research is new, so the tools you need may not exist yet
Why Learn To Code?
Ice flow and Bed topography of Store Gletscher, West Greenland
It doesn’t really matter
Once you know one language, you can learn new ones
Most languages are more similar than different
Commonly used programming languages in Earth sciences
MATLAB / Octave
Python
C / C++
Fortran
Perl
Java / JavaScript
Mathematica / Maple
R
Various Linux/UNIX shell scripting languages (sh, bash, csh, tcsh, ksh, etc…)
Lots of others, too…
All have strengths and weaknesses
What Language Should I Learn?
Scientists use MATLAB a lot. Why?
Easier than traditional programming
Runs on all major platforms (Win, Mac, Linux/UNIX)
It is a full programming language (unlike Excel), so tasks can be fully automated (saves time)
Has tons of built-in functions that do complex math
diff, inv, chol, svd, fft, eig, and lots more!
Has built-in plotting and visualization tools
MATLAB’s Strengths
Above: Mathematical Inversion of GPS data for strain in SoCal
Above: Fault slip rates on the Sierra Madre fault in SoCal
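To give a feel for the built-in functions listed above, here is a minimal sketch that applies diff, inv, eig and fft to small arrays. The values are made up and were chosen only so that the results are easy to check by hand:

```matlab
% Made-up example values, chosen so the results are easy to check by hand
h = [1 4 9 16];           % e.g., a short series of measurements
dh = diff(h);             % differences between neighbors: [3 5 7]

A = [2 0; 0 4];           % small diagonal matrix
Ainv = inv(A);            % inverse: [0.5 0; 0 0.25]
lambda = eig(A);          % eigenvalues: 2 and 4

s = [1 1 1 1];            % constant signal
S = fft(s);               % discrete Fourier transform: [4 0 0 0]

disp(dh)
disp(Ainv)
disp(lambda')
disp(abs(S))
```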
MATLAB is not free
Can become expensive if you need toolboxes
FREE for UCI students!! (see class website)
Can be slow for some operations
Launching MATLAB is slow
Interpreted language (not compiled ahead of time): code is translated into machine language on the fly
Good news: most built-in commands are highly optimized
Awkward handling of non-numeric data
MATLAB’s Weaknesses
Slip rate vectors on the Hollywood fault, CA
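One way to see both points above (explicit loops can be slow, built-in commands are optimized) is to compute the same sum with a loop and with the built-in sum function. This is only a sketch: the size n is an arbitrary choice, and the timings depend on your machine and MATLAB version. Only the two results are guaranteed to match, not the times:

```matlab
% Add up the integers 1..n two ways and time each (n is an arbitrary size)
n = 1e5;
x = 1:n;

tic
total = 0;
for i = 1:n
    total = total + x(i);      % explicit loop: one addition per pass
end
tLoop = toc;

tic
totalBuiltin = sum(x);         % built-in, optimized function
tBuiltin = toc;

fprintf('loop: %d in %.4f s | sum: %d in %.4f s\n', total, tLoop, totalBuiltin, tBuiltin);
```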
Nearly anything that you need to know is in the MATLAB documentation
Online: http://www.mathworks.com/help/matlab/
In MATLAB Command Window: “doc matlab”
Don’t be afraid to Google it!
Don’t copy code verbatim!
You must be able to explain what you did
Code in assignments must only use what we have covered in class
The Attaway text is also a nice reference
MATLAB is Well-Documented
Starting MATLAB
The MATLAB desktop includes:
The Command Window
Command History
Workspace (list/info of defined variables)
Current Folder & File Info (the Current Folder is clickable)
Pay close attention to the current folder!
When you save something, this is where it is saved (unless you specify another location)
If you want to execute a script or function, it must be in the current folder or on your MATLAB search path
To automate many tasks, you must know how to navigate the file system with keyboard commands!
For these reasons, we must learn about paths and associated commands
File operations
MATLAB offers a set of commands that perform common file operations.
http://www.mathworks.com/help/matlab/file-operations.html
You should know how to use
ls, pwd, cd, copyfile, delete, mkdir, movefile, rmdir
MATLAB: Common File Operations
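The sketch below chains the common file commands together inside a throw-away folder created with tempname, so none of your own files are touched. The names notes.txt and archive are hypothetical and exist only for this demo:

```matlab
% Practice the file commands inside a throw-away folder.
% tempname returns a unique, not-yet-existing path in the system's temp area.
startDir = pwd;                          % remember where we started
demoDir  = tempname;
mkdir(demoDir);                          % create the folder
cd(demoDir);                             % make it the current folder

fid = fopen('notes.txt', 'w');           % hypothetical file for this demo
fprintf(fid, 'hello\n');
fclose(fid);

copyfile('notes.txt', 'notes_copy.txt'); % duplicate the file
mkdir('archive');                        % create a subfolder
movefile('notes_copy.txt', 'archive');   % move the copy into it
delete('notes.txt');                     % delete the original
ls                                       % only "archive" is left here

% Record what happened before cleaning up
copyMoved    = exist(fullfile(demoDir, 'archive', 'notes_copy.txt'), 'file') == 2;
originalGone = exist(fullfile(demoDir, 'notes.txt'), 'file') == 0;

% Clean up: rmdir only removes empty folders, so delete the file first
delete(fullfile('archive', 'notes_copy.txt'));
rmdir('archive');
cd(startDir);                            % leave the folder before removing it
rmdir(demoDir);
```

Deleting the file inside archive first keeps every rmdir call working on an empty folder, which is the safest way to practice.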
Hard disks are subdivided into directories and subdirectories
Sometimes called “folders”
The top-level directory is called the root directory
“C:” on Windows (sometimes D: or any other letter)
“/” on Mac and Linux/UNIX
MATLAB supports both conventions: use the one that matches your operating system
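Sample File System Directory Tree
[Figure: directory tree rooted at / (or C:), which contains Users, Library, Network and System. Users contains Guest, Shared and mmorligh; mmorligh contains Desktop, Documents and other folders, and Documents contains MATLAB. Other folders shown include Adobe, Apple, C/C++, Downloads, ISSM, Microsoft, Pictures, scripts and seism.]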
Path: “Address” of a file or directory in a file system
Use “/” or “\” to separate directory names (MATLAB on Windows accepts both; on Mac and Linux/UNIX, use /)
Absolute path: start from the root directory (C: or /)
Relative path: start from the current directory
. means “current directory”
.. means “parent directory”
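You can try relative paths, . and .. safely on a small scratch tree. This sketch makes no assumptions about your own folders. It builds hypothetical Documents/MATLAB and Desktop folders under a temporary location (tempname), moves around with relative paths, records pwd, and removes everything afterwards:

```matlab
% Build a small scratch tree (hypothetical folder names) under a temporary root:
%   <root>/Documents/MATLAB   and   <root>/Desktop
startDir = pwd;                 % remember where we started
root = tempname;                % unique path that does not exist yet
mkdir(root);
mkdir(fullfile(root, 'Documents'));
mkdir(fullfile(root, 'Documents', 'MATLAB'));
mkdir(fullfile(root, 'Desktop'));

cd(fullfile(root, 'Documents', 'MATLAB'));   % absolute path
cd('..');                       % ".." = parent folder: now in Documents
inDocuments = pwd;
cd('../Desktop');               % up to <root>, then down into Desktop
cd('.');                        % "." = current folder: nothing changes
inDesktop = pwd;

cd(startDir);                   % go back, then remove the scratch folders
rmdir(fullfile(root, 'Documents', 'MATLAB'));
rmdir(fullfile(root, 'Documents'));
rmdir(fullfile(root, 'Desktop'));
rmdir(root);
```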
To change the current location, call cd followed by the path of the folder you want to go to
Absolute Path: begins with the root directory
>> cd /Users/mmorligh
>> cd C:\Users\mmorligh
You can specify a relative or absolute path:
Absolute Path: begins with the root directory
>> cd /Users/mmorligh/Documents
Relative Path: assumes you are starting from your pwd
>> cd Documents
You can specify a relative or absolute path:
Absolute Path: begins with the root directory
>> cd /Users/mmorligh/Documents/MATLAB
Relative Path: assumes you are starting from your pwd
>> cd Documents/MATLAB
Absolute vs. Relative Paths
Absolute vs. Relative Paths
You can specify a relative or absolute path:
Absolute Path: begins with the root directory
>> cd /Users/Shared
Relative Path: assumes you are starting from your pwd
>> cd ../../../Shared
i>Clicker question
You are in “MATLAB”. How do you get to “Desktop”?
>> cd ../Desktop
>> cd ../../Desktop
>> cd ../../../Desktop
>> cd /Users/Network/mmorligh/Desktop
MATLAB provides a command to display the contents of a directory
>> ls
Sometimes you may only want to see certain files.
Use the “*” wildcard.
Only list files ending in “.txt”
>> ls *.txt
Only list files beginning with “data”
>> ls data*
Only list files that have “temp” anywhere in the file name
>> ls *temp*
Listing Directory Contents & Wildcards
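ls only prints names. The related dir command accepts the same wildcard patterns but returns a struct array, which is handy for counting files or looping over them in a script. Here is a sketch with a few hypothetical empty files created in a scratch folder. The last pattern is the one used in the i>Clicker question that follows:

```matlab
% Create a few empty files with hypothetical names in a scratch folder
startDir = pwd;
scratch = tempname;
mkdir(scratch);
cd(scratch);
names = {'data_2019.txt', 'data_2020.csv', 'airtemp_max.txt', 'file_412_a.dat'};
for k = 1:numel(names)
    fclose(fopen(names{k}, 'w'));   % open for writing, then close: an empty file
end

ls *.txt                            % prints the two .txt files
txtFiles  = dir('*.txt');           % same pattern, but returns a struct array
dataFiles = dir('data*');           % names beginning with "data"
tempFiles = dir('*temp*');          % "temp" anywhere in the name
datFiles  = dir('file*412*.dat');   % starts with "file", ends with ".dat", "412" in between
fprintf('%d  %d  %d  %d\n', numel(txtFiles), numel(dataFiles), numel(tempFiles), numel(datFiles));
disp({datFiles.name})               % the matching file names

% Clean up
for k = 1:numel(names)
    delete(names{k});
end
cd(startDir);
rmdir(scratch);
```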
What does this list?
>> ls file*412*.dat
All the files that start with “file”
All the files that start with “.dat”
412 files that start with “file” and end with “.dat”
All the files that start with “file”, end with “.dat” and have “412” somewhere in between
i>Clicker question
Let’s try it out!
>> cd
>> pwd
>> ls
>> movefile
>> copyfile
>> mkdir
>> delete
>> rmdir
>> %comments
MATLAB Time
Basic MATLAB usage
MATLAB stores values in variables
The variable name goes on the left, the value (expression) goes on the right
>> variablename = expression
>> a = 6
>> 6 = a %This gives an error!
>> mass = 2.5e9
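MATLAB follows the standard order of operations; use parentheses () to change it
Don’t mix (), [], {} for grouping
>> badIdea = ([2+3]*4)+6
Brackets [] and braces {} have special meanings (arrays and cell arrays), so use only () for grouping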
Assigning Variables
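Here is a short sketch of the assignment rule and the order of operations described above. The names b, c and goodIdea are illustrative and do not come from the original slides:

```matlab
a = 6;                         % name on the left, value on the right
mass = 2.5e9;                  % scientific notation: 2.5 x 10^9
b = 2 + 3 * 4;                 % multiplication before addition: 14
c = (2 + 3) * 4;               % parentheses first: 20
goodIdea = ((2 + 3) * 4) + 6;  % group with () only: 26
disp([a b c goodIdea])
disp(mass)
```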
Must begin with a letter of the alphabet
>> myvar = 4
>> 2num = 6 % Does not work !
>> rad23 = 2.3e3
Variable names have a maximum length: 63 characters (namelengthmax returns the limit)
MATLAB is case-sensitive, so capitalization matters
>> mynum = 4
>> MYNUM = 6
>> myNum = 8
Certain words are reserved and cannot be used as variable names
>> for = 4 %this is not allowed
Names of built-in functions should not be used as variable names
>> sin = 2 %technically allowed, but may cause problems
Variable names should be mnemonic & short
>> aabcaded = 2 %what is aabcaded?
>> temperatureInIrvineCAAug19RecordedByChrisSmithTannerTheSecond=2 %Yuk!
>> tempCA = 2 %much better
Variable Name Rules
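MATLAB can check these rules for you. isvarname tests whether a name is a valid variable name, iskeyword tests whether a word is reserved, and namelengthmax returns the maximum name length. A minimal sketch:

```matlab
ok1 = isvarname('myvar')       % 1 (true): valid name
ok2 = isvarname('2num')        % 0 (false): starts with a digit
reserved = iskeyword('for')    % 1 (true): reserved word, not usable as a name
maxLen = namelengthmax         % longest allowed variable name

% Capitalization matters: these are three different variables
mynum = 4;
MYNUM = 6;
myNum = 8;
disp([mynum MYNUM myNum])
```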
All functions in MATLAB use a similar basic syntax:
functionCall(arguments)
examples:
>> sin(4*pi)
>> sqrt(64)
Text (non-numeric) arguments are surrounded by single quotes
>> disp('Hello')
Many functions return values that can be stored:
output = functionCall(arguments)
example:
>> angleRad=atan(0.5)
Let’s try it out!
Basic MATLAB Usage
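Below are the calls above, with their outputs stored in variables. The degree conversion at the end is an added illustration and was not part of the original example. Note that sin(4*pi) is tiny but may not be exactly zero, because 4*pi cannot be stored exactly in floating point:

```matlab
x = sin(4*pi)                  % tiny, but may not be exactly 0 (floating-point round-off)
r = sqrt(64)                   % 8
disp('Hello')                  % text goes in single quotes
angleRad = atan(0.5)           % store a function's output in a variable
angleDeg = angleRad * 180/pi   % radians to degrees (added for illustration)
```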
MATLAB allows for several different classes of variables
double (double precision): the default type for floating point numbers
(e.g. 2.3, 5.1234, -2.543e+12, etc)
int8, int16, int32, int64 (integer): integers (saves memory; see Ch. 1 of Attaway)
(e.g. 1, 2, 100, 123, -530, etc)
char (character): single character (e.g. 'x' or 'z'). A group of characters is called a string (e.g. 'hello' or 'thisIsAString')
logical: stores true/false values
Once a variable has been defined, it can be converted into another type by type casting, e.g., x = double(x)
Use whos to see the list of variables and their types
Variable Types and Casting
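The following sketch casts between types and uses class to check the type of each result. It also shows two behaviors worth knowing:
- Integer types round and saturate at their limits (int8 holds -128 to 127).
- Arithmetic on an integer variable keeps the integer type, so convert to double first if you need decimals.

```matlab
x = 2.7;
class(x)                  % 'double': the default numeric type
n = int8(x)               % cast to int8: rounds to 3
class(n)                  % 'int8'
big = int8(200)           % int8 holds -128..127, so 200 saturates to 127
z = n / 2                 % still int8: 1.5 is rounded to 2
y = double(n) / 2         % cast to double first to keep decimals: 1.5
flag = logical([0 2 0])   % any nonzero value becomes true: [0 1 0]
c = 'hello';
class(c)                  % 'char'
whos                      % list variables with their sizes and classes
```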
MATLAB can generate pseudo-random numbers
Useful for creating synthetic data, or adding noise
rand gives a uniformly distributed random floating point number between 0 and 1
>> rand
To get random integers, use round
>> round(rand*10) %gives random integers from 0 to 10
Random Numbers
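One caveat about round(rand*10): it does produce integers from 0 to 10, but 0 and 10 come up only about half as often as 1 through 9. Only values of rand*10 below 0.5 round to 0, and only values from 9.5 up round to 10. randi draws each integer with equal probability. Here is a sketch that compares the two and builds a hypothetical synthetic data set with added noise. In MATLAB, calling rng(seed) first makes the random sequence reproducible:

```matlab
r  = rand(1000, 1);              % 1000 uniform random numbers between 0 and 1
k1 = round(rand(1000, 1) * 10);  % integers 0..10, but 0 and 10 are about half as likely
k2 = randi([0 10], 1000, 1);     % integers 0..10, all equally likely

% Hypothetical synthetic data: a straight line plus uniform noise in [-0.25, 0.25]
x = (0:0.1:10)';
yClean = 2*x + 1;
yNoisy = yClean + 0.5 * (rand(size(x)) - 0.5);

fprintf('share of 0s: round %.3f, randi %.3f\n', mean(k1 == 0), mean(k2 == 0));
```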
MATLAB Commands to remember
Lab 1: Introduction to MATLAB
DUE: one week after the lab starts (canvas)
Bring a USB drive !!
What’s next?
Lecture 2: File Input/output, vectors and matrices
[Figure: Ice flow and bed topography of Store Gletscher, West Greenland (about 70.4°N–70.6°N, 50.0°W–50.5°W). Panel a: ice velocity (m/yr, color scale 1 to 10,000); panel b: bed topography (m, −1000 to 1000); scale bar 10 km.]
ESS116: MATLAB Cheat Sheet
1 Path and file operations
cd Change Directory (followed by absolute or relative path of a directory)
cd ../../Shared (relative path)
cd /Users/Shared (absolute path)
pwd displays the absolute path of the current directory (Print Working Directory)
ls display list of files and directories in the current directory
(can be followed by a path and/or file name pattern with *)
ls ../file*mat
ls *.txt
ls /Users/mmorligh/Desktop/
copyfile copy existing file into a new directory, and/or rename a file
copyfile('/Users/Shared/foo.txt','.');
copyfile('foo.txt','bar.txt');
mkdir create a directory
mkdir Lab1
2 Fundamental MATLAB classes
double floating point number (1.52, pi, …) → MATLAB’s default type
int8 Integer between -128 and 127 (8 bits, saves memory)
uint8 Unsigned integer between 0 and 255 (used primarily for images)
int16 Integer between -32768 and 32767 (16 bits)
logical true/false
char character array, used for text (str = 'This is a string';)
cell cell array, used by textscan
3 Matrices
Use square brackets [] to create a matrix, and ; to separate rows
A=[1 2 3;4 5 6;7 8 9];
ones, zeros create a matrix full of ones or zeros
A=ones(5,2);
' transpose a matrix
B=A';
length returns the length of a vector (do not use for matrices)
size returns the size of a matrix
(number of rows then columns, then 3rd dimension if 3D, etc)
[nrows,ncols]=size(A); [nrows,ncols,nlayers]=size(A3D);
linspace and : to create vectors
A=2:3:100;
A=linspace(2,100,10);
find returns the linear indices where a condition on the elements of a matrix is met
pos=find(A==-9999);
pos=find(A>100);
Extract the first 10 even-numbered columns of a matrix (columns 2, 4, …, 20)
B=A(:,2:2:20);
Removing elements: use empty brackets
A(:,2)= [];
Concatenate matrices
A='This is '; B=[A 'an example'];
Replacing elements in a matrix (use either linear or row,col notation)
A(10,3)=5.5;
pos=find(A==-9999);
A(pos)= 0;
Element-by-element operation: use a dot (.) before the operator
A= C.*D;
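Here is a compact sketch that exercises the matrix operations above on small made-up arrays. T and its -9999 missing-value flags are hypothetical data:

```matlab
A = [1 2 3; 4 5 6; 7 8 9];     % 3x3 matrix, rows separated by ;
B = A';                        % transpose
[nrows, ncols] = size(A);      % 3 and 3
v = 2:3:14;                    % start:step:stop -> [2 5 8 11 14]
w = linspace(0, 1, 5);         % 5 evenly spaced points: [0 0.25 0.5 0.75 1]

T = [12 -9999 15 -9999 20];    % hypothetical data, -9999 marks missing values
pos = find(T == -9999);        % linear indices of missing values: [2 4]
T(pos) = 0;                    % replace them: [12 0 15 0 20]

C = A;
C(:, 2) = [];                  % remove the 2nd column: C is now 3x2
P = A .* A;                    % element-by-element product (squares each entry)
M = A * A;                     % matrix product (rows times columns)
```

The last two lines show why the dot matters: A .* A and A * A are different operations with different results.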
4 I/O
load loads a MATLAB file (*.mat) into the workspace, or a text file that contains only numbers and has a consistent number of columns
load('data.mat');
data=load('data.txt');
textscan loads a text file into a cell array (one element per column in the file)
Use %d for integers, %f for floating point numbers, %s for strings
fid = fopen('filename');
data = textscan(fid,'%d %f %s %s','HeaderLines',5);
fclose(fid);
%Put first column in A, and second column in B
A = data{1}; B = data{2};
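To try textscan without an existing data file, this sketch first writes a small hypothetical file, then reads it back. The file has two header lines followed by the day, the maximum temperature and the station name. It lives in the temporary folder and is deleted afterwards:

```matlab
% Write a small hypothetical data file with 2 header lines
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Station temperatures\n');
fprintf(fid, 'day  tmax  station\n');
fprintf(fid, '1 25.5 Irvine\n2 27.0 Irvine\n3 24.2 Irvine\n');
fclose(fid);

% Read it back: one cell per column, skipping the 2 header lines
fid = fopen(fname);
data = textscan(fid, '%d %f %s', 'HeaderLines', 2);
fclose(fid);
delete(fname);

day     = data{1};    % integers (int32)
tmax    = data{2};    % doubles
station = data{3};    % cell array of strings
fprintf('Mean tmax over %d days: %.2f\n', numel(day), mean(tmax));
```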
5 fprintf
fprintf prints text (and variables) to the screen. First argument is a string with placeholders.
fprintf('The radius is %7.2f and A = %d !!\n',EarthRadius,10);
– Special characters: \n (new line), %% (percent sign), '' (apostrophe: two single quotes)
– Variable specifiers: %s (string) %d (integer) %e (exponential) %f (float)
– %010.3f: zero-padded, 10 characters wide, 3 decimals. Ex: pi prints as 000003.142
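sprintf uses the same format specifiers as fprintf but returns the text instead of printing it, which makes the formats easy to check. In this sketch, EarthRadius is set to the mean Earth radius in km. That value is an assumption for the example, because the original does not define the variable:

```matlab
EarthRadius = 6371.0;    % mean Earth radius in km (assumed value for this example)
fprintf('The radius is %7.2f and A = %d !!\n', EarthRadius, 10);

s1 = sprintf('%010.3f', pi)                        % '000003.142'
s2 = sprintf('%e', EarthRadius)                    % exponential notation
s3 = sprintf('100%% of the data isn''t missing')   % %% gives %, '' gives '
```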
6 Visualization
plot displays a list of points (x,y)
plot(x,y,'-r');
plot(x,y,'r+:','MarkerFaceColor','g','MarkerSize',5,'LineWidth',2);
axis controls x and y axes
axis([xmin xmax ymin ymax]);
axis equal tight
legend adds a legend to previously plotted curves
legend('First curve','second curve')
figure creates a new figure window
figure(2)
xlabel/ylabel/title control x/y axis labels and plot title
xlabel('Distance (km)');
hold on keep current plot so that whatever follows is plotted on the same plot
subplot divide figure into several subplots
subplot(2,3,1)
histogram make a histogram for a vector
histogram(tmax,20);
histogram(tmax,round(sqrt(length(tmax))));
7 Relational and Logical operators
== equal to, ~= not equal to, > greater than, >= greater than or equal to,
< less than, <= less than or equal to.
&& and, || or, ~ not.
A=( (1>10)|| (3~=4));
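Here is a sketch of the operators above:
- && and || expect scalar (single true/false) conditions and stop evaluating as soon as the answer is known.
- For arrays, the element-wise & and | operators compare every element.

The temperature values are made up:

```matlab
A = ((1 > 10) || (3 ~= 4))       % false OR true  -> 1 (true)
B = (5 >= 5) && ~(2 == 3)        % true AND true  -> 1 (true)

% For arrays, use the element-wise & and | (&& and || need scalar conditions)
T = [-5 12 30 105 -2];           % hypothetical temperatures
extreme = (T > 100) | (T < 0)    % [1 0 0 1 1]
mild    = (T >= 0) & (T <= 100)  % [0 1 1 0 0]
```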
8 If/elseif/else and for loops (examples)
Counting algorithm
%Initialize counters
counter1 = 0;
counter2 = 0;
%Go over all of the elements of T and increment counters when a condition is met
for i=1:length(T)
    if T(i)>100
        counter1 = counter1+1;
    elseif T(i)<0
        counter2 = counter2+1;
    end
end
fprintf('Found %d days with T>%d, and %d with T<%d\n',counter1,100,counter2,0)
Extracting (after counting!)
%You first need to count how many times T>100 (for example), then: allocate memory
hotdays = zeros(counter1,1);
%Go through T, again, and store temperatures>100 in hotdays
count = 1;
for i=1:length(T)
    if T(i)>100
        hotdays(count)=T(i);
        count = count+1;
    end
end
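You can check the counting and extracting loops above against MATLAB's logical indexing, which does both in one step:
- T > 100 is a true/false array.
- sum counts the true entries.
- T(mask) extracts the matching values.

Here is a sketch with hypothetical temperatures. The loops are repeated from above so that the block runs on its own:

```matlab
T = [98 103 -4 75 110 -1 101];   % hypothetical daily temperatures

% Loop version (counting, then extracting)
counter1 = 0;
for i = 1:length(T)
    if T(i) > 100
        counter1 = counter1 + 1;
    end
end
hotdays = zeros(counter1, 1);
count = 1;
for i = 1:length(T)
    if T(i) > 100
        hotdays(count) = T(i);
        count = count + 1;
    end
end

% Logical-indexing version: a true/false mask counts and extracts at once
isHot = T > 100;
nHot  = sum(isHot);              % number of true entries
hot2  = T(isHot)';               % the matching values, as a column
nCold = sum(T < 0);
```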
9 Statistics
mean computes mean of a vector
median computes median of a vector
std computes standard deviation
min returns minimum value in a vector
max returns maximum value in a vector
skewness returns skewness
kurtosis returns kurtosis
normcdf/tcdf/chi2cdf cumulative distribution function for the normal, t and χ2 distributions
norminv/tinv/chi2inv inverse of the cumulative distribution function
p0 = normcdf(x0,mu,sigma);
x0 = norminv(p0,mu,sigma);
p0 = tcdf(x0,V);
x0 = tinv(p0,V);
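Here is a sketch of the core statistics functions on a made-up sample. Note that std divides by N-1 by default, while std(x,1) divides by N. skewness, kurtosis, normcdf, tcdf, chi2cdf, norminv, tinv and chi2inv belong to the Statistics and Machine Learning Toolbox. The last lines therefore compute a normal CDF value with the core erfc function instead, using the identity normcdf(x0,mu,sigma) = 0.5*erfc(-(x0-mu)/(sigma*sqrt(2))):

```matlab
x = [2 4 4 4 5 5 7 9];        % made-up sample
m  = mean(x)                  % 5
md = median(x)                % 4.5 (average of the two middle values)
s  = std(x)                   % sample standard deviation (divides by N-1)
s0 = std(x, 1)                % population standard deviation (divides by N): 2
lo = min(x)                   % 2
hi = max(x)                   % 9

% Normal CDF without the toolbox (same value normcdf(x0, mu, sigma) would give)
mu = 0; sigma = 1; x0 = 1.96;
p0 = 0.5 * erfc(-(x0 - mu) / (sigma * sqrt(2)))   % about 0.975
```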
10 Polynomials and interpolations
polyval returns the value of a polynomial (represented by its coefficients, highest power first) at the given x
coeff = [3 2.7 1 -5.7];
x=0:0.2:0.6;
y=polyval(coeff,x)
polyfit returns the coefficients of the polynomial that best fits the data points (least squares)
coeff = polyfit(datax,datay,3); %3 means cubic polynomial
interp1 interpolates between data points (spline or linear) at the query points xq
y1=interp1(datax,datay,xq,'linear');
y2=interp1(datax,datay,xq,'spline');
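Here is a sketch that ties the three functions together. It evaluates the cubic from above with polyval, generates exact sample points from it, recovers the coefficients with polyfit, and interpolates at a few query points xq. The sample points and xq values are arbitrary choices for illustration:

```matlab
coeff = [3 2.7 1 -5.7];              % 3x^3 + 2.7x^2 + x - 5.7 (highest power first)
x = 0:0.2:0.6;
y = polyval(coeff, x)                % value of the polynomial at each x

% Sample points lying exactly on that cubic (arbitrary choice for illustration)
datax = linspace(-1, 1, 9);
datay = polyval(coeff, datax);
cfit  = polyfit(datax, datay, 3)     % least-squares cubic: matches coeff up to round-off

% interp1 needs the query points xq where new values are wanted
xq = [-0.9 0.05 0.5];
y1 = interp1(datax, datay, xq, 'linear')
y2 = interp1(datax, datay, xq, 'spline')
```

Because the sample points lie exactly on a cubic, polyfit recovers coeff up to round-off. The spline, which uses not-a-knot end conditions, also reproduces the cubic, while linear interpolation only approximates it between points.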
11 Image processing
imread loads an image (as a matrix) into the workspace
A=imread('image.png')
image display an image (2D or 3D)
imagesc display a 2D image and scale indices to use all the colors in the color map
colormap set a colormap (only for indexed images)
Indexed Images (2D)
Indexed image, need two matrices:
iMat a 2D matrix with indices
cMap an n×3 matrix with the RGB values (between 0 and 1) for each index
To display this image:
image(iMat);
colormap(cMap);
If the colormap is not consistent with indices, you need to use imagesc(iMat).
True-color image (3D, RGB)
No need to prescribe a colormap:
iMat(:,:,1) Red matrix (between 0–255 if uint8, or 0–1 if double)
iMat(:,:,2) Green matrix
iMat(:,:,3) Blue matrix
To display this image: image(iMat).
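Here is a sketch of how a true-color image is stored in memory. It is built by hand with cat so the pixel values are easy to follow, and the 2×2 image and its colors are made up. Converting to double and dividing by 255 gives the 0 to 1 range that image expects for double RGB data:

```matlab
% Hypothetical 2x2 true-color image: one matrix per color channel (uint8, 0-255)
R = uint8([255   0;   0 255]);
G = uint8([  0 255;   0 255]);
B = uint8([  0   0; 255 255]);
iMat = cat(3, R, G, B);          % 2x2x3: top-left red, top-right green,
                                 % bottom-left blue, bottom-right white
red = iMat(:, :, 1);             % the red channel is an ordinary 2D matrix
iDouble = double(iMat) / 255;    % same image as doubles between 0 and 1
size(iMat)                       % 2 2 3
% image(iMat) or image(iDouble) would display it
```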
12 Miscellaneous
rand returns a random floating point number between 0 and 1
x=rand
x=rand(10,2)
round round input to closest integer
A=round(rand*10);
whos displays list of all variables in MATLAB workspace
tic/toc displays the elapsed (wall-clock) time for a chunk of code
sqrt square root
13 Functions
Calling a function: [output1,output2] = functionname(arg1,arg2);
Function header (top lines of the file that implements this function):
function [output1,output2] = functionname(arg1,arg2)
% H1 line: describe what the function does
ESS116, M. Morlighem, Updated: March 14, 2019