ESS116
Introduction to Data Analysis in Earth Science
Image Credit: NASA
Instructor: Mathieu Morlighem
E-mail: mmorligh@uci.edu (include ESS116 in subject line)
Office Hours: 3218 Croul Hall, Friday 2:00 pm – 3:00 pm
This content is protected and may not be shared, uploaded, or distributed
WEEK | TOPIC | LAB
1 | Introduction and MATLAB | Introduction to MATLAB
2 | File I/O, vectors and matrices in MATLAB | Vectors and matrices
3 | MATLAB programming | Scripts and functions
4 | Selection statements and loops | Loops
5 | Descriptive statistics | Statistics
6 | Hypothesis testing | t-tests
7 | Curve fitting and interpolation (+MIDTERM) | MIDTERM
8 | Image processing | Glacier retreat
9 | Time series and bivariate statistics | GRACE and El Niño
10 | Review session | FINAL EXAM (lab)
11 | FINAL EXAM (June 11th 11:30am-1:30pm) |
Class schedule
Midterm exam:
Week 7, part 1 (in class) Tuesday May 12th after short lecture 2:30-3:20pm
Week 7, part 2 (in lab) during your regular lab session (Wednesday May 13th 9am or 1pm)
Final exam:
Week 10, part 1 (in lab) during your regular lab session (Wednesday June 3rd 9am or 1pm)
Week 11, part 2 (in class) Thursday June 11th 11:30am-1:30pm
Lecture 3 quick review
fprintf
scripts
functions
Lecture 4 – Selection statements and loops
Relational expressions
If/elseif/else
For loop
Today’s lecture
Lecture 3 – review
fprintf: prints numeric values and literal text to the screen
Special Characters:
\n : new line
%% : percent sign (%)
'' : apostrophe (two single quotes)
Variable specifiers (placeholder for MATLAB variables):
%s : string
%d : integer
%e : exponential notation
%f : floating point number
Output formatting: fprintf('%010.3f\n',pi) → 000003.142
0 indicates that we want leading zeros
10 we want the entire variable to occupy 10 total spaces
.3 means that we want 3 decimals
Output Statements: fprintf
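The specifiers above can be tried with a short sketch like the one below. The variable names station and ndays and their values are made up for illustration. sprintf is used only because it returns the formatted text instead of printing it, which makes the result easy to inspect.

```matlab
% Leading zeros, total width of 10 characters, 3 decimals
s = sprintf('%010.3f', pi);
fprintf('%s\n', s);                 % prints 000003.142

% Mix literal text, a string, an integer and a float (hypothetical values)
station = 'Irvine';
ndays   = 12;
fprintf('%s: %d hot days (%.1f%% of the year)\n', station, ndays, ndays/365*100);
```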
MATLAB file that executes a series of MATLAB commands
Has no input or output
Stored in a text file with a .m extension
To create a new script called, for example, myscript.m:
>> edit myscript
To execute a script: enter its name in the command window (no extension), or click on “Run”
>> myscript
When execution completes, the variables remain in the MATLAB workspace.
MATLAB script
MATLAB file that executes a series of MATLAB commands
Takes inputs and returns outputs (there might not be any)
Stored in a text file with a .m extension
To create a new function called, for example, myfunction.m:
>> edit myfunction
To execute a function: enter its name in the command window:
>> [output1, output2] = myfunction(arg1,arg2)
Any variables that you create within a function are stored within a workspace specific to that function, which is separate from the base workspace.
MATLAB function
myfunction takes 0 arguments and returns 0 outputs
myfunction takes 1 argument and returns 0 outputs
myfunction takes 1 argument and returns 1 output
myfunction takes 1 argument and returns 2 outputs
i>Clicker question
function [a,b] = myfunction(str1)
% print input variable to the screen
%
% myfunction.m
% Mathieu Morlighem, Aug 26th 2018
%Use fprintf to print first argument str1
fprintf('%s\n',str1);
%Define two variables
a=1; b=2;
MATLAB function
Function name (same as file name): myfunction
Inputs (arguments): str1
Outputs (returned): a, b
Lecture 4 – Selection statements and loops
What we have done so far:
Scripts/Functions have executed all commands in order, no matter what
What we often need:
A piece of code that executes a series of commands, if and only if some condition is met
MATLAB provides several built-in statements that allow for conditional behavior
if/elseif/else
switch
menu
Selection Statements
Relational expressions
New class of variable: Logical
Only two possible values:
1 (true)
0 (false)
Let's look at some sample Boolean expressions
Relational/Boolean Expressions
To make selections, we must be able to determine if a condition is met
Logical expression
Relational expression
Relational/Boolean Expressions
Relational operators in MATLAB
Logical operators in MATLAB
Truth Table for Logical Operators
Examples:
a=3; b=5;
(a<5 && b>7 ) → false
(a<5 || b>7 ) → true
(a==3 && ~(b~=5)) → true
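The three examples above can be checked directly in MATLAB. This is a minimal sketch; the names r1, r2 and r3 are introduced here only to hold the results.

```matlab
a = 3; b = 5;

r1 = (a<5 && b>7);        % true AND false
r2 = (a<5 || b>7);        % true OR false
r3 = (a==3 && ~(b~=5));   % true AND NOT(false)

% Logical values print as 1 (true) or 0 (false)
fprintf('%d %d %d\n', r1, r2, r3);           % prints 0 1 1
fprintf('r1 is of class %s\n', class(r1));   % prints r1 is of class logical
```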
We define the following variables:
alpha = 20; beta = 0;
Is the following statement true or false?
(alpha~=21 || beta==alpha)
True
False
I don’t know
i>Clicker question
We have 2 variables, x and y, and we would like to test whether they satisfy:
x needs to be between 0 and 10, AND
y needs to be greater than or equal to 5
(x>0 && x<10 && y>=5)
(x>0 && x<10) && (y>5 || y==5)
x>0 && x<10 && ~(y<5)
All of the Above
i>Clicker question
Selection statements
Nearly all programming languages have something similar
Can be entered in the command window
More commonly used in scripts/functions
The If Statement
if condition
action(s)
end
The “if” keyword (“if” is a reserved word in MATLAB)
A logical or relational condition
Warning: the condition should be a scalar (for an array, the test passes only if all elements are true)
Actions to be performed if condition(s) is(are) TRUE
Note: The actions should be indented (easier to read)
The “end” keyword that ends the “if” statement
The If Statement
Script
% Define age
age = 12;
% Can you drink a beer?
if age>=21
    fprintf('your age is %d, you can have a beer!\n',age);
end
Function
function [] = testage(age)
% Can you drink a beer?
if age>=21
    fprintf('your age is %d, you can have a beer!\n',age);
end
What if more than one condition needs to be tested?
Use a nested if-else statement
Can execute different actions depending on whether the condition is met
More efficient than separate if statements
The If-Else Statement
if condition
    action(s)    % only executed if condition is TRUE
else
    action(s)    % only executed if condition is FALSE
end
Do NOT do this!
It may work, but this is poor coding
Each if statement must always be checked
Inefficient
Use nested if-else statements
Poor Usage of If Statements
% Can you drink a beer?
if age>=21
    fprintf('your age is %d, you can have a beer!\n',age);
end
if age<21
    fprintf('your age is %d, you cannot drink\n',age);
end
Same thing, but using nested if-else statements
More efficient
Better style
Proper Use of Nested If-Else Statement
% Can you drink a beer?
if age>=21
    fprintf('your age is %d, you can have a beer!\n',age);
else
    fprintf('your age is %d, you cannot drink\n',age);
end
What if multiple conditions need to be tested?
Each one results in different actions
Use a nested if-elseif-else statement
MUCH more efficient than separate if statements
Can have as many elseif statements as needed
The If-Elseif-Else Statement
if condition1
    action(s)    % only executed if condition1 is TRUE
elseif condition2
    action(s)    % only executed if condition1 is FALSE and condition2 is TRUE
else
    action(s)    % only executed if condition1 and condition2 are BOTH FALSE
end
No two of the conditions tested here can be true at the same time: they are “mutually exclusive”
There is a more elegant and efficient way to code this
Nested if, elseif, else statements
Poor Usage of If Statements
% Test whether you can drink and vote
if age>=21
    fprintf('You can drink and vote!\n');
end
if age>=18 && age<21
    fprintf('You can vote but cannot drink\n');
end
if age<18
    fprintf('You can neither vote, nor drink, sorry...\n');
end
If conditions are mutually exclusive
Use nested if, elseif, else
Nested if statements save space and CPU time
Also, much easier to read
Proper Use of Nested If-Elseif-Else
% Test whether you can drink and vote
if age>=21
    fprintf('You can drink and vote!\n');
elseif age>=18 && age<21
    fprintf('You can vote but cannot drink\n');
else
    fprintf('You can neither vote, nor drink, sorry...\n');
end
For-Loops
Think back to any very difficult quantitative problem that you had to solve in some science class
How long did it take?
How many times did you solve it?
What if you had millions of data points that had to be processed using the same solution?
Only need to solve it once
Tell the computer to repeat it n times
This is the basis for the programming concept called a loop statement
Repeated Tasks are for Computers
Used with a pre-defined number of iterations of the loop variable
The For Loop
for loopVar = range
    action(s)
end
The “for” keyword (“for” is a reserved word in MATLAB)
The loop variable, which can be any valid variable name
Traditionally, we use i, j, k, or other single letters
The range or list of values the loopVar will take on
Can be any vector, but typically the colon operator is used
The action(s) that will be performed during each iteration
The “end” keyword. Signifies that the code should go back to the beginning of the loop statement if more iterations remain
Let's make a simple for loop to see how everything works
During each iteration
Loop variable i is changed
i is printed using fprintf
Note that the actions should be indented (MATLAB does not require it)
Makes code easier to read
MATLAB editor does this automatically
First For Loop
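A first loop along the lines described above might look like the following minimal sketch. The range 1:5 is an arbitrary choice for illustration.

```matlab
% The loop variable i takes the values 1, 2, 3, 4, 5 in turn
for i = 1:5
    % The action: print the current value of i
    fprintf('i = %d\n', i);
end
% After the loop, i keeps its last value (5)
```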
The loop variable can change by any increment
Most commonly the increment is 1
The loop iterations can be defined by any vector (not typical)
For Loop Examples
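The sketch below illustrates the three points above: a positive increment, a negative increment, and a loop defined by an arbitrary vector. The vector depths and its values are hypothetical.

```matlab
% Increment of 2: k = 0, 2, 4, 6, 8
for k = 0:2:8
    fprintf('k = %d\n', k);
end

% Negative increment: k = 10, 7, 4, 1
for k = 10:-3:1
    fprintf('k = %d\n', k);
end

% Any row vector can define the iterations (not typical)
depths = [5 20 100];   % hypothetical values
for d = depths
    fprintf('d = %d\n', d);
end
```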
What will we see on the screen?
e = 0
e = 2
e = 8
e = 9
i>Clicker question
for a=0:2:9
    b=a+10;
    c=a^2;
    d=a+b+c;
    e = a;
end
fprintf('e = %d \n',e);
A common use of a loop:
perform an operation on each and every data point, one by one
Doesn’t matter how many data points there are*
For Loop Examples
*Obviously, there is some limit based on your computer’s RAM, CPU speed, and operating system
Let's mimic the behavior of the MATLAB function “sum”
Use a for loop
For Loop Examples
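One possible way to mimic sum with a for loop is sketched below. The vector v and the name total are chosen for illustration only.

```matlab
v = [4 8 15 16 23 42];   % example data (arbitrary values)

% Start from zero and add each element, one by one
total = 0;
for i = 1:length(v)
    total = total + v(i);
end

% Compare with the built-in function
fprintf('loop: %d, sum: %d\n', total, sum(v));
```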
What is the value of A at the end of the program?
A = 1 2
A = 1 2 1 2 0
A = 0 1 2
A = 1 2 0 1 2
I don’t know
i>Clicker question
A = [1 2];
for i=0:2
    A=[A,i];
end
CPU time
Because MATLAB code is interpreted on the fly
(i.e. not compiled into binary executable files)
Each time the loop restarts, the statements inside the loop must be interpreted again
For most operations, loops are still fast enough
You can vectorize your code… (See Attaway book, Chap. 5)
Drawbacks of Loops in MATLAB
Each time an entry is added to a matrix, the whole matrix must be recreated
To measure execution times of scripts, MATLAB provides timer functions
“tic” starts the timer
“toc” reads the timer and prints the elapsed time since “tic” to the command window
Makes most sense to use in scripts/functions
Using tic toc, we can determine if code is efficient
Timing CPU Time in Scripts/Functions
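As a rough sketch of how tic and toc can be used, the code below times three ways of building the same vector: growing it inside a loop, filling a preallocated vector, and a vectorized expression. The size n and the variable names are assumptions for illustration, and the measured times depend on the machine and MATLAB version.

```matlab
n = 1e5;   % number of elements (arbitrary)

% 1) Grow the vector at every iteration (it is recreated each time)
tic
A = [];
for i = 1:n
    A(i) = i^2;
end
tGrow = toc;

% 2) Allocate the vector first, then fill it
tic
B = zeros(1,n);
for i = 1:n
    B(i) = i^2;
end
tPrealloc = toc;

% 3) Vectorized: no loop at all
tic
C = (1:n).^2;
tVec = toc;

fprintf('grow: %.4f s, preallocated: %.4f s, vectorized: %.4f s\n', tGrow, tPrealloc, tVec);
```

Calling toc with an output argument, as done here, stores the elapsed time in a variable instead of printing it.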
Combining For and If
A loop to find the maximum of a vector
Combining Loops With If Statements
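A loop that finds the maximum of a vector could be sketched as follows. The vector v and the names maxval and maxpos are illustrative. The built-in max function does the same job and is used here only as a check.

```matlab
v = [3 -7 12 5 12 0];   % example data (arbitrary values)

% Assume the first element is the maximum, then compare with the others
maxval = v(1);
maxpos = 1;
for i = 2:length(v)
    if v(i) > maxval
        maxval = v(i);   % new maximum found
        maxpos = i;      % remember where it is
    end
end

fprintf('max = %d at position %d\n', maxval, maxpos);   % prints max = 12 at position 3
```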
Important Algorithms
We can count the number of times a value meets some criteria. (Example: days hotter than 100 F)
Algorithm #1: COUNTING
% Generate random temperatures for 365 days (range is 0-120 F)
temps = rand(365,1)*120;
% Count the number of days hotter than 100 F
count = 0;
for i=1:length(temps)
    if temps(i)>100
        count = count+1;
    end
end
fprintf('%.2f%% days were >100 F\n',count/length(temps)*100);
For loop
We can count the number of times a value meets some criteria. (Example: days hotter than 100 F)
Algorithm #1: COUNTING
%Generate random temperatures for 365 days (range is 0-120 F)
temps = rand(365,1)*120;
% Count the number of days hotter than 100 F
count = length(find(temps>100));
fprintf('%.2f%% days were >100 F\n',count/length(temps)*100);
Vectorized option (faster)
Extract the temps of hot days only (> 100 F)
Algorithm #2: EXTRACTING
% Generate random temperatures for 365 days (range is 0-120 F)
temps = rand(365,1)*120;
% STEP1: Count the number of days hotter than 100 F
count = 0;
for i=1:length(temps)
    if temps(i)>100
        count = count+1;
    end
end
% STEP2: Allocate vector of days > 100 F (just use 0s for now)
hotdays = zeros(count,1);
% STEP3: Extract temperatures of hot days only
count = 1;
for i=1:length(temps)
    if temps(i)>100
        hotdays(count) = temps(i);
        count = count+1;
    end
end
For loop
Extract the temps of hot days only (> 100 F)
Algorithm #2: EXTRACTING
%Generate random temperatures for 365 days (range is 0-120 F)
temps = rand(365,1)*120;
% STEP1: Find the indices for which temp > 100 F
pos = find(temps>100);
% STEP2: Extract all days > 100 F
hotdays = temps(pos);
Vectorized code
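The vectorized extraction can be written even more compactly with logical indexing, a standard MATLAB feature that is not shown in the slides. The sketch below checks that both forms give the same result. The name hotdays2 is introduced here for the comparison.

```matlab
% Generate random temperatures for 365 days (range is 0-120 F)
temps = rand(365,1)*120;

% Version with find: indices first, then extraction
pos     = find(temps>100);
hotdays = temps(pos);

% Version with logical indexing: the condition itself selects the elements
hotdays2 = temps(temps>100);

% Counting can be vectorized the same way (true counts as 1)
count = sum(temps>100);
fprintf('%d hot days, identical results: %d\n', count, isequal(hotdays, hotdays2));
```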
Nested for Loops
We can nest “for” loops into inner and outer loops
Often useful for dealing with 2D data
Nested For Loops
for loopVar1 = range1        % the outer loop: “loopVar1” is iterated over “range1”
    for loopVar2 = range2    % the inner loop: “loopVar2” is iterated over “range2”
        action(s)
    end
end
Let's try a simple nested for loop.
Nested For Loops
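A simple nested loop might look like the sketch below. The ranges 1:3 and 1:2 and the counter n are arbitrary choices for illustration. The inner loop runs completely for each value of the outer loop variable.

```matlab
n = 0;   % counts how many times the action is executed
for i = 1:3          % outer loop
    for j = 1:2      % inner loop: runs fully for each value of i
        fprintf('i = %d, j = %d\n', i, j);
        n = n+1;
    end
end
fprintf('The action was executed %d times\n', n);   % 3 x 2 = 6 times
```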
Let's try another simple nested for loop.
Nested For Loops
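Nested loops are a natural fit for filling a 2D matrix element by element. The sketch below builds a small multiplication table. The names nrows, ncols and M, and the table itself, are illustrative assumptions.

```matlab
nrows = 3; ncols = 4;

% Allocate the matrix before filling it
M = zeros(nrows, ncols);

for r = 1:nrows          % outer loop goes over rows
    for c = 1:ncols      % inner loop goes over columns
        M(r,c) = r*c;    % element in row r, column c
    end
end
disp(M);
```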
This can be very handy for making a grid of data points (A common task in any quantitative science)
This is NOT the way to do it!
Grid of XY Points?
Let's try a nested for loop
To make a 2D grid we need a nested for loop
Outer loop: x-range; Inner loop: y-range
Grid of XY Points
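One way to build the grid with the outer loop over x and the inner loop over y is sketched below. The ranges and the names X, Y and k are assumptions for illustration.

```matlab
x = 0:2;         % x-range (arbitrary values)
y = 10:10:30;    % y-range (arbitrary values)

% One (X,Y) pair per grid point: allocate first
npts = length(x)*length(y);
X = zeros(npts,1);
Y = zeros(npts,1);

k = 0;   % index of the current grid point
for i = 1:length(x)          % outer loop: x-range
    for j = 1:length(y)      % inner loop: y-range
        k = k+1;
        X(k) = x(i);
        Y(k) = y(j);
    end
end

plot(X, Y, 'r+');   % every combination of x and y is now a point
```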
MATLAB Commands to remember
Lab 4: Loops and selection statements
DUE: at the end of the week (Canvas)
Bring a USB drive
Lecture 5: Descriptive statistics
What’s next?
ESS116: MATLAB Cheat Sheet
1 Path and file operations
cd Change Directory (followed by absolute or relative path of a directory)
cd ../../Shared (relative path)
cd /Users/Shared (absolute path)
pwd display current directory’s absolute path (Print Working Directory)
ls display list of files and directories in the current directory
(can be followed by a path and/or file name pattern with *)
ls ../file*mat
ls *.txt
ls /Users/mmorligh/Desktop/
copyfile copy existing file into a new directory, and/or rename a file
copyfile('/Users/Shared/foo.txt','.');
copyfile('foo.txt','bar.txt');
mkdir create a directory
mkdir Lab1
2 Fundamental MATLAB classes
double floating point number (1.52, pi, …) → MATLAB’s default type
int8 Integer between -128 and 127 (8 bits, saves memory)
uint8 Unsigned integer between 0 and 255 (used primarily for images)
int16 Integer between -32768 and 32767 (16 bits)
logical true/false
string data type for text (str = 'This is a string';)
cell cell array, used by textscan
3 Matrices
Use square brackets [] to create a matrix, and ; to separate rows
A=[1 2 3;4 5 6;7 8 9];
ones, zeros create a matrix full of ones or zeros
A=ones(5,2);
' transpose a matrix
B=A';
length return length of a vector (do not use for matrices)
size returns the size of a matrix
(number of rows then columns, then 3rd dimension if 3D, etc)
[nrows,ncols]=size(A); [nrows,ncols,nlayers]=size(A3D);
linspace and : to create vectors
A=2:3:100;
A=linspace(2,100,10);
find return the linear indices where a condition on the elements of a matrix is met
pos=find(A==-9999);
pos=find(A>100);
Extract the first 10 even columns of a matrix
B=A(:,2:2:20);
Removing elements: use empty brackets
A(:,2)= [];
Concatenate matrices
A='This is '; B=[A 'an example'];
Replacing elements in a matrix (use either linear or row,col notation)
A(10,3)=5.5;
pos=find(A==-9999);
A(pos)= 0;
Element-by-element operation: use a dot (.) before the operator
A= C.*D;
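The find-and-replace pattern listed in this section can be tried on a small matrix. In the sketch below, the data values and the use of -9999 as a "missing value" flag are assumptions for illustration.

```matlab
% Small matrix with two missing values flagged as -9999
A = [12.5 -9999 14.1; -9999 15.0 13.2];

% Linear indices are counted column by column
pos = find(A==-9999);
fprintf('Found %d missing values\n', length(pos));

% Replace the flagged elements using linear indexing
A(pos) = 0;

% The same element can also be addressed with (row,col) notation
A(2,1) = 0;
```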
4 I/O
load loads a MATLAB file (*.mat) into the workspace, or a text file with only numbers
and consistent number of columns
load('data.mat');
data=load('data.txt');
textscan loads a text file into a cell array (as many elements as there are columns in the file)
Use %d for integers, %f for floating point numbers, %s for strings
fid = fopen('filename');
data = textscan(fid,'%d %f %s %s','Headerlines',5);
fclose(fid);
%Put first column in A, and second column in B
A = data{1}; B = data{2};
5 fprintf
fprintf print text (and variables) to the screen. First argument is a string with placeholders.
fprintf('The radius is %7.2f and A = %d !!\n',EarthRadius,10);
– Special characters: \n (new line) %% (percent sign) '' (apostrophe)
– Variable specifiers: %s (string) %d (integer) %e (exponential) %f (float)
– %010.3f: leading 0, 10 total spaces, 3 decimals. Ex: 000003.142
6 Visualization
plot displays a list of points (x,y)
plot(x,y,'-r');
plot(x,y,'r+:','MarkerFaceColor','g','MarkerSize',5,'LineWidth',2);
axis controls x and y axes
axis([xmin xmax ymin ymax]);
axis equal tight
legend adds a legend to previously plotted curves
legend('First curve','second curve')
figure creates a new figure window
figure(2)
xlabel/ylabel/title control x/y axis labels and plot title
xlabel('Distance (km)');
hold on keep current plot so that whatever follows is plotted on the same plot
subplot divide figure into several subplots
subplot(2,3,1)
histogram make a histogram for a vector
histogram(tmax,20);
histogram(tmax,round(sqrt(length(tmax))));
7 Relational and Logical operators
== equal to, ~= not equal to, > greater than, >= greater than or equal to,
< less than, <= less than or equal to.
&& and, || or, ~ not.
A=( (1>10)|| (3~=4));
8 If/elseif/else and for loops (examples)
Counting algorithm
%Initialize counters
counter1 = 0;
counter2 = 0;
%Go over all of the elements of T and increment counters when a condition is met
for i=1:length(T)
    if T(i)>100
        counter1 = counter1+1;
    elseif T(i)<0
        counter2 = counter2+1;
    end
end
fprintf('Found %d days with T>%d, and %d with T<%d\n',counter1,100,counter2,0);
Extracting (after counting!)
%You first need to count how many times T>100 (for example), then: allocate memory
hotdays = zeros(counter1,1);
%Go through T, again, and store temperatures>100 in hotdays
count = 1;
for i=1:length(T)
if T(i)>100
hotdays(count)=T(i);
count = count+1;
end
end
9 Statistics
mean computes mean of a vector
median computes median of a vector
std computes standard deviation
min returns minimum value in a vector
max returns maximum value in a vector
skewness returns skewness
kurtosis returns kurtosis
normcdf/tcdf/chi2cdf cumulative distribution function for normal, t and χ2 distributions
norminv/tinv/chi2inv inverse of the cumulative distribution function
p0 = normcdf(x0,mu,sigma);
x0 = norminv(p0,mu,sigma);
p0 = tcdf(x0,V);
x0 = tinv(p0,V);
10 Polynomials and interpolations
polyval returns the value of a polynomial (represented by its coefficients) for some x
coeff = [3 2.7 1 -5.7];
x=0:0.2:0.6;
y=polyval(coeff,x)
polyfit returns the coefficients of the polynomial that best fits data points
coeff = polyfit(datax,datay,3); %3 means cubic polynomial
interp1 interpolates between data points (spline or linear) at the query points xq
y1=interp1(datax,datay,xq,'linear');
y2=interp1(datax,datay,xq,'spline');
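The three functions of this section can be combined in a short sketch. The data below (points on the parabola y = x^2) and the query points xq are made up for illustration, so that the expected answers are easy to check by hand.

```matlab
% Hypothetical data: points sampled from y = x^2
datax = 0:5;
datay = datax.^2;

% Fit a degree-2 polynomial: coefficients should be close to [1 0 0]
coeff = polyfit(datax, datay, 2);

% Evaluate the fitted polynomial at new x values
xq   = 0.5:1:4.5;
yfit = polyval(coeff, xq);

% Interpolate between the data points at the same query points
ylin = interp1(datax, datay, xq, 'linear');   % straight lines between points
yspl = interp1(datax, datay, xq, 'spline');   % smooth curve through points

fprintf('x = %.1f: fit %.2f, linear %.2f, spline %.2f\n', [xq; yfit; ylin; yspl]);
```

Linear interpolation gives 0.5 at x = 0.5 (halfway between 0 and 1), whereas the fitted polynomial gives a value close to 0.25, which shows the difference between the two approaches.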
11 Image processing
imread loads an image (as a matrix) into the workspace
A=imread('image.png')
image display an image (2D or 3D)
imagesc display a 2D image and scale indices to use all the colors in the color map
colormap set a colormap (only for indexed images)
Indexed Images (2D)
Indexed image, need two matrices:
iMat is a 2D matrix of indices
cMap is an n×3 matrix with the RGB code for each index
To display this image:
image(iMat);
colormap(cMap);
If the colormap is not consistent with indices, you need to use imagesc(iMat).
True-color image (3D, RGB)
No need to prescribe a colormap:
iMat(:,:,1) Red matrix (between 0–255 if uint8, or 0–1 if double)
iMat(:,:,2) Green matrix
iMat(:,:,3) Blue matrix
To display this image: image(iMat).
12 Miscellaneous
rand returns a random floating point number between 0 and 1
x=rand
x=rand(10,2)
round round input to closest integer
A=round(rand*10);
whos displays list of all variables in MATLAB workspace
tic/toc displays elapsed time for a chunk of code
sqrt square root
13 Functions
Calling a function: [output1,output2] = functionname(arg1,arg2);
Function header (top lines of the file that implements this function):
function [output1,output2] = functionname(arg1,arg2)
% H1 line: describe what the function does
ESS116, M. Morlighem, Updated: March 14, 2019