ESS 116
Introduction to Data Analysis in Earth Science

Image Credit: NASA
Instructor: Mathieu Morlighem
E-mail: mmorligh@uci.edu (include ESS116 in subject line)
Office Hours: 3218 Croul Hall, Friday 2:00 pm – 3:00 pm
This content is protected and may not be shared, uploaded, or distributed

Exceptionally:
No office hours this Friday

Monday May 6 3:00-4:15pm
Announcement

Lecture 4 quick review

Lecture 5 – Descriptive statistics
Sampling data
Descriptive Statistics
Histograms
Theoretical distribution
Distributions and probabilities
Sampling distribution (time permitting…)
Today’s lecture

Relational/Boolean expression

Examples:
var1=10; var2=5;

(var1>=5 && var2>7 ) → false
(var1<5 || var2>3 ) → true
(var1==10 && var2~=5) → false
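
These three results can be checked at the MATLAB prompt. The sketch below re-evaluates the expressions from the examples; the names r1, r2 and r3 are introduced here for illustration only.

```matlab
% Same values as in the examples
var1 = 10; var2 = 5;

% r1, r2, r3 are illustrative names (not from the lecture)
r1 = (var1 >= 5 && var2 > 7);    % true && false
r2 = (var1 < 5 || var2 > 3);     % false || true
r3 = (var1 == 10 && var2 ~= 5);  % true && false

fprintf('r1 = %d, r2 = %d, r3 = %d\n', r1, r2, r3);
```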

The If-Elseif-Else Statement

if condition1
action(s)
elseif condition2
action(s)
else
action(s)
end

Only executed if condition1 is true

Only executed if condition1 is FALSE and condition2 is TRUE

Only executed if condition1 and condition2 are BOTH FALSE

if condition
action(s)
else
action(s)
end

Actions only executed if condition is TRUE

Actions only executed if condition is FALSE
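
As a small sketch of the if-elseif-else pattern, the code below sorts a temperature into one of three classes. The value of T, the thresholds and the name label are all made up for illustration.

```matlab
% Hypothetical temperature in degrees Celsius (illustration only)
T = 25;

if T > 30
    label = 'hot';       % only runs if T > 30
elseif T > 15
    label = 'mild';      % only runs if T <= 30 and T > 15
else
    label = 'cold';      % only runs if both conditions are false
end

fprintf('T = %d is %s\n', T, label);
```

Only one of the three branches runs, even when several conditions would be true on their own.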

Used with a pre-defined number of iterations of the loop variable
The For Loop

for loopVar = range
action(s)
end

The “for” keyword

The loop variable, which can be any valid variable name
Traditionally, we use i, j, k, or other single letters

The range or list of values the loopVar will take on
Can be any vector, but typically the colon operator is used

The action(s) that will be performed during each iteration

The “end” keyword. Signifies that the code should go back to beginning of the loop statement if more iterations remain
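
A minimal sketch of these pieces working together, using a colon range and an accumulator variable (the names total and k are illustrative):

```matlab
% Add up the integers 1 to 4 with a for loop
total = 0;            % illustrative accumulator
for k = 1:4           % k takes the values 1, 2, 3, 4 in turn
    total = total + k;
end
fprintf('Sum = %d\n', total);
```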

What is the value of A at the end of the program?
A = 2
A = 4
A = 6
A = 8
I don’t know
i>Clicker question
A = 1;
for i=1:3,
A=A*2;
end

Lecture 5 – Descriptive Statistics

Population and Sample

Data variables in Earth science are usually continuous
But samples of data are inherently discrete
Consequence: most datasets in Earth Sciences have a limited sample size and contain a number of uncertainties
e.g., air temperature, precipitation, topography, …

Sampling

World weather stations

Argo array (monitor ocean chemistry)

Population: the actual properties of the real world
Sample: set of values imperfectly representing the population

WARNING: Unfortunately, the term “sample” is employed with different meanings in geology and statistics

Population vs Sample
Geology      Statistics
Collection   Sample
Sample       Observation

Sample size: number of measurements in a sample

We want to predict the characteristics of a larger population from a much smaller sample
Only when the sample is very large does it become a reasonable representation of the population

Parameters: refer to the population (e.g., μ and σ)
Statistics: refer to the sample (e.g., x̄ and s)

Population vs Sample

Which is the most accurate representation of π ?

3.2
3.2415926536
i>Clicker question

Which is the most precise representation of π ?

3.2
3.2415926536
i>Clicker question

Accuracy: quality of being close to the true value
Ex: π = 3.1415926536
3.2 is a more accurate representation of π than 3.24159265

Precision:
Mathematics: number of significant digits in a numerical value (measurements or calculation)
Experiments: related to instrumental resolution (ability to replicate measurements)
Ex: π = 3.1415926536
3.2 is a less precise representation of π than 3.24159265
Accuracy vs Precision
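
The accuracy comparison can be checked numerically by measuring the distance of each value from π. This is only a sketch; the variable names are illustrative.

```matlab
coarse = 3.2;            % 2 significant digits
longer = 3.2415926536;   % 11 significant digits, but the second decimal is wrong

errCoarse = abs(coarse - pi);   % distance from the true value
errLonger = abs(longer - pi);

fprintf('Error of 3.2          : %.4f\n', errCoarse);
fprintf('Error of 3.2415926536 : %.4f\n', errLonger);
```

The value with more digits is more precise, but its error is larger, so it is less accurate.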

While precision is obvious to assess, accuracy is not. To a large extent, what is behind statistics is an effort to evaluate accuracy.
Accuracy vs Precision

Sample Visualization

We divide the interval of variation of the data into bins (generally of the same length)
All observations are assigned to their corresponding bin
Result: count or relative frequency of each class

Frequency Table

Example: Maximum temperature in SNA Aug 2014 to Aug 2015 (from NOAA: http://www.ncdc.noaa.gov)
Frequency Table

Frequency Table
[Figure: temperature axis with bin edges at 14, 16, 18, 20 and 22 °C and bin centers at 15, 17, 19 and 21 °C. For each bin: how many values fall in this bin?]
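
Counting values per bin can be sketched with MATLAB's histcounts function. The eight temperatures below are chosen for illustration, and the edges match the 14 to 22 °C axis described above.

```matlab
% Eight temperatures in degrees Celsius (illustration only)
T = [15.2 15.4 16.0 18.0 18.2 19.0 20.8 21.0];

edges   = 14:2:22;                           % bin edges: 14, 16, 18, 20, 22
centers = edges(1:end-1) + diff(edges)/2;    % bin centers: 15, 17, 19, 21

% Each bin includes its left edge; the last bin also includes its right edge
counts = histcounts(T, edges);
freq   = 100 * counts / numel(T);            % relative frequency in percent

for b = 1:numel(counts)
    fprintf('%2d-%2d C: count = %d, frequency = %.1f%%\n', ...
        edges(b), edges(b+1), counts(b), freq(b));
end
```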

Frequency Table
 #   Bin/Class   Count   Frequency (%)
 1   12-14 °C        0        0
 2   14-16 °C        1        0.27
 3   16-18 °C       10        2.74
 4   18-20 °C       38       10.41
 5   20-22 °C       48       13.15
 6   22-24 °C       69       18.90
 7   24-26 °C       46       12.60
 8   26-28 °C       67       18.36
 9   28-30 °C       43       11.78
10   30-32 °C       14        3.84
11   32-34 °C       21        5.75
12   34-36 °C        4        1.10
13   36-38 °C        3        0.82
14   38-40 °C        1        0.27

A histogram is a graphical representation of a frequency table
Histograms

Rules for choosing histogram bins

The number of bins should be large enough to bring out the shape of the data
But not so large that empty bins appear across the histogram
Rule of thumb:
number of bins ≈ √(number of data values)

All data values must be covered
Bin widths should be equal if possible.
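
One common way to apply the rule of thumb in MATLAB is to round the square root of the sample size, as in the histogram(tmax,round(sqrt(length(tmax)))) call used in this lecture. The sketch below assumes N = 365, the total count in the frequency table example.

```matlab
N = 365;                     % number of values (total count in the frequency table)
nbins = round(sqrt(N));      % rule of thumb: about sqrt(N) bins
fprintf('Suggested number of bins: %d\n', nbins);
```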

Histograms

Effect of bin spacing

Histograms in MATLAB
>> histogram(tmax);
If only one argument is given to “histogram”…
MATLAB automatically chooses the number of bins and their properties
To better control the bins, you can optionally provide a second argument to “histogram”. Either:
A number, to specify a different number of bins
>> histogram(tmax,20);
>> histogram(tmax,round(sqrt(length(tmax))));
Or a vector of length N+1 specifying the desired bin edges (where N is the number of bins)
>> histogram(tmax,12:2:40);

Display each observation xi versus the proportion of the sample that is not larger than xi.
Cumulative Frequency

88% of the temperature measurements are <30˚C
MATLAB: use “ecdf” (Empirical Cumulative Distribution Function)
>> [cf,x] = ecdf(tmax);
>> plot(x,cf*100,'+')
Cumulative Frequency
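
The 88% figure can be recovered from the frequency table by accumulating the bin counts. This is a sketch based on the binned counts rather than on the raw tmax data (which ecdf would use); the names upperEdges, cumFreq and below30 are illustrative.

```matlab
% Counts from the frequency table (bins 12-14 C up to 38-40 C)
counts = [0 1 10 38 48 69 46 67 43 14 21 4 3 1];
upperEdges = 14:2:40;                          % upper edge of each bin

cumFreq = 100 * cumsum(counts) / sum(counts);  % cumulative frequency in percent

% Share of values in the bins that end at 30 C or lower
below30 = cumFreq(upperEdges == 30);
fprintf('%.1f%% of the values are below 30 C\n', below30);
```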

Descriptive Statistics

We have collected N measurements xi from a specific object (e.g., ocean pH, glacier thickness, etc)

Size of x might be large: we use descriptive statistics to summarize the characteristics of the data:
Median
Mean
Standard Deviation
etc

Statistical description of the data

Outliers: values so markedly different from the rest of the sample that they raise the suspicion that they may come from a different population or be in error, doubts that are frequently hard to resolve. In any sample, outliers are always few, if any.
Outliers

Summary Statistics

The histogram displays graphically the essential properties of univariate data:
Central tendency (or location): “average” position of the whole distribution along the scale (mean, median, mode)
Dispersion: the extent to which the distribution is spread along the scale from the central value (range and standard deviation)
General shape: both the symmetry and the pattern of the frequency distribution (skewness and kurtosis)

Properties of Histograms

Arithmetic Mean (or mean): $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$

Sensitive to outliers (extreme values that may be very different from the majority of the data)

Central Tendency: MEAN

Median: the x-value that is in the middle of the dataset
50% of the observations < median
50% of the observations > median

For a dataset sorted in ascending order:

Central Tendency: MEDIAN

If N is odd: $\tilde{x} = x_{(N+1)/2}$
If N is even: $\tilde{x} = \frac{x_{(N/2)} + x_{(N/2)+1}}{2}$
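
The two cases can be sketched in MATLAB as follows, assuming the usual definition: the middle element of the sorted data for odd N, and the average of the two middle elements for even N. The data values are made up.

```matlab
xOdd  = [3 1 4 1 5];     % N = 5 (odd), made-up values
xEven = [8 2 6 4];       % N = 4 (even), made-up values

% Odd N: the middle element of the sorted data
s = sort(xOdd);  N = numel(s);
medOdd = s((N+1)/2);

% Even N: the average of the two middle elements
s = sort(xEven); N = numel(s);
medEven = (s(N/2) + s(N/2+1)) / 2;

fprintf('Median (odd N)  = %g, MATLAB median = %g\n', medOdd,  median(xOdd));
fprintf('Median (even N) = %g, MATLAB median = %g\n', medEven, median(xEven));
```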

After sorting our temperature measurements:

T = [15.2 15.4 16.0 18.0 18.2 19.0 20.8 21.0]

What is the median of this dataset?

16.0
18.0
18.1
18.2
i>Clicker question

Robustness: ability of statistical methods to work well not only under ideal conditions, but in the presence of data problems, mild to moderate departures from assumptions, or both.

Example: in the presence of large errors, the median is a more robust statistic than the mean

Robustness
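
A quick way to see this robustness is to corrupt one value in a small dataset and compare the two statistics. The numbers below are made up for illustration.

```matlab
clean = [10 11 12 13 14];        % made-up measurements
dirty = [10 11 12 13 1000];      % same data with one gross error

fprintf('Clean: mean = %.1f, median = %.1f\n', mean(clean), median(clean));
fprintf('Dirty: mean = %.1f, median = %.1f\n', mean(dirty), median(dirty));
```

The single bad value pulls the mean far from the bulk of the data, while the median does not move.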

Mode: the most frequent x-value or the center of the class with the largest number of observations
e.g., no mode, unimodal, bimodal, trimodal, multimodal

Central Tendency: MODE

Central Tendency

Range: difference between the highest and lowest value in the dataset

Defined by two extreme data points: very sensitive to outliers

Dispersion: RANGE

Standard deviation: average deviation of each data point from the mean

Variance: square of the standard deviation

Dispersion
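
The sketch below computes the sample variance and standard deviation by hand and compares them with MATLAB's var and std. It assumes the N-1 normalization, which is also the default in those two functions; the data are made up.

```matlab
x = [2 4 4 4 5 5 7 9];               % made-up data, mean = 5
N = numel(x);

xbar = mean(x);
s2 = sum((x - xbar).^2) / (N - 1);   % sample variance (divide by N-1)
s  = sqrt(s2);                       % sample standard deviation

fprintf('Manual: s^2 = %.4f, s = %.4f\n', s2, s);
fprintf('MATLAB: var = %.4f, std = %.4f\n', var(x), std(x));
```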

Which of the following distributions has the highest standard deviation?
i>Clicker question

[Figure: two histograms, A and B; axis ticks run from 0 to 300 and from 0 to 12]

Skewness: measure of the asymmetry of the tails of a distribution

Shape: Skewness
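
The sign of the skewness can be illustrated on a small made-up dataset and its mirror image. The sketch uses a moment-based definition (third central moment divided by the second central moment to the power 3/2); this normalization is an assumption and may differ by a constant factor from other definitions, but the sign is the same.

```matlab
x = [1 2 2 3 3 3 4 10];   % made-up data with a long tail to the right
y = -x;                   % mirrored data: long tail to the left

% Moment-based skewness (assumed definition, see lead-in)
skew = @(v) mean((v - mean(v)).^3) / mean((v - mean(v)).^2)^1.5;

fprintf('Right-tailed data: skewness = %+.2f\n', skew(x));
fprintf('Left-tailed data : skewness = %+.2f\n', skew(y));
```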

Which of the following histograms has a negative skew?
i>Clicker question

[Figure: two histograms, A and B]

Kurtosis: measure of whether the data are peaked or flat relative to a normal distribution
Shape: Kurtosis

Platykurtic: kurtosis < 3 (flat)
Mesokurtic: kurtosis = 3 (“normal”)
Leptokurtic: kurtosis > 3 (peaked)

Central Tendency:
Mean (average)
Median (50% higher, 50% lower)
Mode(s) (peak value(s))
Dispersion:
Range (max – min)
Standard deviation (average distance to mean)
Shape:
Skewness (positive: tail to the right, negative: tail to the left)
Kurtosis (<3: flat, >3: peaked)

Know how they relate to visual features on a histogram

Statistical parameters
what you need to remember

Probability Density Function

Probability is a measure of the likelihood that an event, A, may occur. It is commonly denoted by P(A)

0 ≤ P(A) ≤ 1
P(A) = 0, A is an impossible event
P(A) = 1, A will happen with certainty

P(A) is a degree of belief in A, even if no random process is involved nor a count is possible

Probability

Histograms: empirical frequency distribution of our sample.
If we sample the variable sufficiently often and the output ranges are narrow, we obtain a very smooth version of the histogram
A histogram for N → ∞ and an infinitely small bin size will produce a Probability Density Function (PDF)
The probability that x is between x1 and x2 is: $P(x_1 < x < x_2) = \int_{x_1}^{x_2} f(x)\,dx$

Probability Density Functions

Theoretical Distributions

Normal distribution
A.k.a. Gaussian distribution
2 parameters:
Mean, μ
Standard deviation, σ
99.7% of its values lie in μ ±3σ
Theoretical Distributions: NORMAL

Example: μ = 3

Normal Distributions
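
The 99.7% rule can be checked by integrating the normal PDF numerically between μ − 3σ and μ + 3σ. In this sketch μ = 3 follows the example above, while σ = 1 is an assumption; the result does not depend on either value.

```matlab
mu = 3;  sigma = 1;      % mu from the lecture example; sigma = 1 is an assumption

% Normal probability density function
f = @(x) exp(-0.5*((x - mu)/sigma).^2) / (sigma*sqrt(2*pi));

% Probability of falling within mu +/- 3 sigma = area under the PDF
p3 = integral(f, mu - 3*sigma, mu + 3*sigma);
fprintf('P(mu-3s < x < mu+3s) = %.4f\n', p3);
```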

For a distribution f, the probability that x < […]
[…] > 30, almost identical to a z-distribution
Used in hypothesis testing
Theoretical Distributions: t

χ2 distribution (Chi square):
Number of degrees of freedom: Φ
In this course: Φ = N-1

Properties:
Used in hypothesis testing

Theoretical Distributions: χ2

Normal (μ,σ)
Given x0, find p0
>> p0 = normcdf(x0,mu,sigma);
Given p0, find x0
>> x0 = norminv(p0,mu,sigma);
Z-distribution
>> p0 = normcdf(x0);
>> x0 = norminv(p0);
t-distribution
>> p0 = tcdf(x0,V);
>> x0 = tinv(p0,V);
χ2-distribution
>> p0 = chi2cdf(x0,V);
>> x0 = chi2inv(p0,V);

MATLAB theoretical distributions
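
As an illustration of these calls, consider a normally distributed quantity with mean 35 and standard deviation 3 (the seawater salinity values used in Example 1). The sketch below finds the value that covers 90% of the distribution and then checks it by going back to a probability. It assumes the Statistics and Machine Learning Toolbox is installed, since norminv and normcdf belong to it.

```matlab
mu = 35; sigma = 3;     % salinity mean and standard deviation (per mil)
p0 = 0.90;              % fraction of values to cover

x0 = norminv(p0, mu, sigma);       % value with P(x < x0) = p0
pCheck = normcdf(x0, mu, sigma);   % going back should return p0

fprintf('x0 = %.2f per mil, P(x < x0) = %.2f\n', x0, pCheck);
```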

p0 = P(x < x0), e.g.: 0.88 = P(x < 1.17)

The average concentration of salt in seawater is 35 ‰ (standard deviation 3 ‰).
What is the maximum salinity you should be able to measure to cover 90% of the possible values?
Example 1

Distribution of seawater salinity: μ = 35 ‰, σ = 3 ‰ (ppt)
P(can measure) = P(S < […]
Example 1

i>Clicker question

What would happen to the standard deviation of the sample mean (right) if we increased the number of rolls in each sample (n=20 or n=100)?

The standard deviation would increase
The standard deviation would decrease
The standard deviation would remain unchanged
Don’t know….

Variability of X is measured by the standard deviation σ
There might be a “gap” between the sample mean x̄ and the population mean μ
Standard Error: variability in the sample mean
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, where σ is the population standard deviation and n is the sample size
Decreases as the sample size increases (more precise)
Standard Error (SE)
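
The effect of the sample size can be sketched with the standard error formula σ/√n. The example assumes the rolls are rolls of a fair six-sided die, whose standard deviation is √(35/12); that choice is an assumption made here for illustration.

```matlab
sigma = sqrt(35/12);        % standard deviation of one roll of a fair six-sided die
n = [20 100];               % rolls per sample

se = sigma ./ sqrt(n);      % standard error of the sample mean
fprintf('n = %3d: SE = %.3f\n', [n; se]);
```

Going from 20 to 100 rolls per sample shrinks the standard error by a factor of √5.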

MATLAB Commands to remember

Lab 5: Descriptive statistics
DUE: one week after the lab starts (canvas)
Bring a USB drive
Lecture 6: Hypothesis testing

What’s next?

Open until next lecture (0.5% Extra Credit!!)
• What should be improved?
• What can I do to help you learn, or is there something that isn’t working?
• Are the quick reviews useful?
• Are the labs useful? Should they be longer?
• Do you like working with real data or would you like simpler formats?
• Do you want more MATLAB, more stats, or is this a good balance?

Midterm Evaluation


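Sample: $x = (x_1, x_2, x_3, \ldots, x_N)$

Mean: $\bar{x} = \frac{x_1 + x_2 + \ldots + x_N}{N} = \frac{1}{N} \sum_{i=1}^{N} x_i$

Median (N odd): $\tilde{x} = x_{(N+1)/2}$

Median (N even): $\tilde{x} = \frac{x_{(N/2)} + x_{(N/2)+1}}{2}$

Range: $\Delta x = x_{\max} - x_{\min}$

Variance: $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$

Standard deviation: $s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$

Skewness: $\text{skewness} = \sum_{i=1}^{N} \frac{(x_i - \bar{x})^3}{s^3}$

Kurtosis: $\text{kurtosis} = \sum_{i=1}^{N} \frac{(x_i - \bar{x})^4}{s^4}$

Large-sample limit: $N \to \infty$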

Probability between x1 and x2: $P(x_1 < x < x_2) = \int_{x_1}^{x_2} f(x)\,dx$

Normal PDF: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$

Cumulative probability: $P(x \le x_1) = \int_{-\infty}^{x_1} f(x)\,dx$

Example: $P(x \le 1.17) = \int_{-\infty}^{1.17} f(z)\,dz = 0.88$

Standard normal (z) PDF: $f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$

Standardization: $Z = \frac{X - \mu}{\sigma}$

t-distribution PDF: $f(x) = \frac{\Gamma\left(\frac{\Phi+1}{2}\right)}{\Gamma\left(\frac{\Phi}{2}\right)} \frac{1}{\sqrt{\Phi\pi}} \frac{1}{\left(1 + \frac{x^2}{\Phi}\right)^{\frac{\Phi+1}{2}}}$

Limit of many degrees of freedom: $\Phi \to \infty$

χ2-distribution PDF: $f(x) = \frac{1}{2^{\Phi/2}\,\Gamma(\Phi/2)}\, x^{\frac{\Phi-2}{2}}\, e^{-\frac{x}{2}}$

Sample mean: $\bar{x}$

Standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

ESS116: MATLAB Cheat Sheet

1 Path and file operations
cd Change Directory (followed by absolute or relative path of a directory)
cd ../../Shared (relative path)
cd /Users/Shared (absolute path)

pwd display current directory’s absolute path (Print Working Directory)
ls display list of files and directories in the current directory (can be followed by a path and/or file name pattern with *)
ls ../file*mat
ls *.txt
ls /Users/mmorligh/Desktop/

copyfile copy existing file into a new directory, and/or rename a file
copyfile('/Users/Shared/foo.txt','.');
copyfile('foo.txt','bar.txt');

mkdir create a directory
mkdir Lab1

2 Fundamental MATLAB classes
double floating point number (1.52, pi, ...) → MATLAB’s default type
int8 Integer between -128 and 127 (8 bits, saves memory)
uint8 Unsigned integer between 0 and 255 (used primarily for images)
int16 Integer between -32768 and 32767 (16 bits)
logical true/false
string data type for text (str = 'This is a string';)
cell cell array, used by textscan

3 Matrices
Use square brackets [] to create a matrix, and ; to separate rows
A=[1 2 3;4 5 6;7 8 9];

ones, zeros create a matrix full of ones or zeros
A=ones(5,2);

' transpose a matrix
B=A';

length returns the length of a vector (do not use for matrices)
size returns the size of a matrix (number of rows then columns, then 3rd dimension if 3D, etc)
[nrows,ncols]=size(A);
[nrows,ncols,nlayers]=size(A3D);

linspace and : to create vectors
A=2:3:100;
A=linspace(2,100,10);

find returns the linear indices where a condition on the elements of a matrix is met
pos=find(A==-9999);
pos=find(A>100);

Extract the first 10 even columns of a matrix
B=A(:,2:2:20);

Removing elements: use empty brackets
A(:,2)= [];

Concatenate matrices
A='This is '; B=[A 'an example'];

Replacing elements in a matrix (use either linear or row,col notation)
A(10,3)=5.5;

pos=find(A==-9999);
A(pos)= 0;

Element-by-element operation: use a dot (.) before the operator
A= C.*D;
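
The difference between the element-by-element product and the matrix product is easy to see on two small matrices. The values of C and D below are made up for illustration.

```matlab
C = [1 2; 3 4];        % made-up 2x2 matrices
D = [10 20; 30 40];

elementwise = C.*D;    % multiplies matching entries: C(i,j)*D(i,j)
matrixProd  = C*D;     % usual matrix product (rows times columns)

disp(elementwise);
disp(matrixProd);
```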

4 I/O
load loads a MATLAB file (*.mat) into the workspace, or a text file with only numbers and a consistent number of columns
load('data.mat');
data=load('data.txt');

textscan loads a text file into a cell array (as many elements as there are columns in the file)
Use %d for integers, %f for floating point numbers, and %s for strings
fid = fopen('filename');
data = textscan(fid,'%d %f %s %s','Headerlines',5);
fclose(fid);

%Put first column in A, and second column in B

A = data{1}; B = data{2};

5 fprintf
fprintf print text (and variables) to the screen. First argument is a string with placeholders.

fprintf('The radius is %7.2f and A = %d !!\n',EarthRadius,10);

– Special characters: \n (new line) %% (percent sign) '' (apostrophe)
– Variable specifiers: %s (string) %d (integer) %e (exponential) %f (float)
– %010.3f: leading zeros, 10 characters in total, 3 decimals. Ex: 000003.142
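
The %010.3f specifier can be tried out with sprintf, which accepts the same format strings as fprintf but returns the text instead of printing it. This is a small sketch using pi as the value.

```matlab
s = sprintf('%010.3f', pi);   % zero-padded, 10 characters wide, 3 decimals
fprintf('[%s] has %d characters\n', s, length(s));
```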

6 Visualization
plot displays a list of points (x,y)

plot(x,y,'-r');
plot(x,y,'r+:','MarkerFaceColor','g','MarkerSize',5,'LineWidth',2);

axis controls x and y axes
axis([xmin xmax ymin ymax]);

axis equal tight

legend adds a legend to previously plotted curves
legend('First curve','second curve')

figure creates a new figure window
figure(2)

xlabel/ylabel/title control x/y axis labels and plot title
xlabel('Distance (km)');

hold on keep current plot so that whatever follows is plotted on the same plot
subplot divide figure into several subplots

subplot(2,3,1)

histogram make a histogram for a vector
histogram(tmax,20);

histogram(tmax,round(sqrt(length(tmax))));

7 Relational and Logical operators
== equal to, ~= not equal to, > greater than, >= greater than or equal to,
< less than, <= less than or equal to.
&& and, || or, ~ not.
A=( (1>10)|| (3~=4));

8 If/elseif/else and for loops (examples)
Counting algorithm
%Initialize counters
counter1 = 0;
counter2 = 0;
%Go over all of the elements of T and increment counters when a condition is met
for i=1:length(T)
    if T(i)>100
        counter1 = counter1+1;
    elseif T(i)<0
        counter2 = counter2+1;
    end
end
fprintf('Found %d days with T>%f, and %d with T<%f\n',counter1,100,counter2,0)

Extracting (after counting!)
%You first need to count how many times T>100 (for example), then: allocate memory
hotdays = zeros(counter1,1);
%Go through T, again, and store temperatures>100 in hotdays
count = 1;
for i=1:length(T)
    if T(i)>100
        hotdays(count)=T(i);
        count = count+1;
    end
end

9 Statistics
mean computes mean of a vector
median computes median of a vector
std computes standard deviation
min returns minimum value in a vector
max returns maximum value in a vector
skewness returns skewness
kurtosis returns kurtosis

normcdf/tcdf/chi2cdf cumulative distribution function for normal, t and χ2 distributions
norminv/tinv/chi2inv inverse of the cumulative distribution function

p0 = normcdf(x0,mu,sigma);

x0 = norminv(p0,mu,sigma);

p0 = tcdf(x0,V);

x0 = tinv(p0,V);

10 Polynomials and interpolations
polyval returns the value of a polynomial (represented by its coefficients) for some x

coeff = [3 2.7 1 -5.7];
x=0:0.2:0.6;

y=polyval(coeff,x)
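
To make the coefficient ordering explicit, the sketch below evaluates the same polynomial with polyval and by writing out each term. Coefficients are listed from the highest power down to the constant; the name yManual is illustrative.

```matlab
coeff = [3 2.7 1 -5.7];     % 3x^3 + 2.7x^2 + x - 5.7
x = 0:0.2:0.6;

y = polyval(coeff, x);

% Same polynomial written out term by term, for comparison
yManual = 3*x.^3 + 2.7*x.^2 + 1*x - 5.7;

disp([y; yManual]);
```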

polyfit returns the coefficients of the polynomial that best fits the data points
coeff = polyfit(datax,datay,3); %3 means cubic polynomial

interp1 interpolates between data points (spline or linear)
%x contains the points at which to interpolate
y1=interp1(datax,datay,x,'linear');

y2=interp1(datax,datay,x,'spline');

11 Image processing
imread loads an image (as a matrix) into the workspace

A=imread('image.png')

image display an image (2D or 3D)
imagesc display a 2D image and scale indices to use all the colors in the color map
colormap set a colormap (only for indexed images)

Indexed Images (2D)
Indexed image, need two matrices:

iMat a 2D matrix with indices
cMap an n×3 matrix with the RGB code for each index

To display this image:

image(iMat);

colormap(cMap);

If the colormap is not consistent with indices, you need to use imagesc(iMat).

True-color image (3D, RGB)
No need to prescribe a colormap:

iMat(:,:,1) Red matrix (between 0–255 if uint8, or 0–1 if double)
iMat(:,:,2) Green matrix
iMat(:,:,3) Blue matrix

To display this image: image(iMat).

12 Miscellaneous
rand returns a random floating point number between 0 and 1

x=rand

x=rand(10,2)

round round input to closest integer
A=round(rand*10);

whos displays list of all variables in MATLAB workspace
tic/toc displays elapsed time for a chunk of code
sqrt square root

13 Functions
Calling a function: [output1,output2] = functionname(arg1,arg2);

Function header (top lines of the file that implements this function):
function [output1,output2] = functionname(arg1,arg2)

% H1 line: describe what the function does

ESS116, M. Morlighem, Updated: March 14, 2019