请提交一份简洁的书面报告和 一份Python代码的打印件。 python代码将被标记为良好的 编程⻛格(例如,注释、结构/ 缩进、变量名、子例程的使用 等)。所有其他的标记都是基于 报告的,也就是说,你需要在 报告中提供所有问题的答案, 我们不会在你的代码中寻找它 们。请按照你的报告中练习(第 1节)的章节/小节结构。请将您 的报告和电子代码通过黑板作
为一个单一的PDF文件。不要 上传压缩档案。
cipal component mode of North Atlantic/European sea level pressure and investigate its association with European surface climate.
Please hand in a concise written report and a printout of your Python code. The python code will be marked for good programming style (e.g., comments, structure/indentation, variable names, use of subroutines as ap- propriate). All other marks will be based on the report, i.e. you will need to provide answers to all questions in the report, we will not look for them in your code. Please follow the section/subsection structure of the exercise (section 1) in your report. Please hand in your report and code electron- ically through Blackboard as a single PDF document. Do not upload zip archives.
Please do not forget to read the supporting information in sections 2 and 3 before you start. 在开始之前,请不要忘记阅读第2节和第3节中的支持信息。
Due: Monday, 20th April 2020, 12 noon
1 Exercise: PCA definition of the SNAO 1.1 Mean field (7 points)
The file MTMG06 ass3 slp.nc contains the sea level pressure data required for this analysis in NetCDF format. Load the data into a python array. What is the size of the array dimensions and what variables do the dimen- sions stand for? Familiarise yourself with the data: Where and at which
resolution are the data available in space and time, and what are the units?
文件MTMG06 ass3 slp。nc包含本分析所需的NetCDF格式的海平面压力数据。将数据加载到python数组中。
数组的维数是多少?代表什么变量?熟悉数据:数据在空间和时间中的位置和分辨率,以及单位是什么? 1
MTMG06 Assignment 3: The Summer North Atlantic Oscillation (SNAO)
夏季北大⻄洋振荡
Reinhard Schiemann
National Centre for Atmospheric Science, Department of Meteorology, University of Reading
March 18, 2020
在这次作业中,你将把SNAO[1]定义为北大⻄洋/欧洲海平面气压的主要的prin- cipal分量模式,并研究它与欧洲表面气候的关系。 In this assignment, you will define the SNAO [1] as the leading prin-
Calculate the climatological-mean pressure field and make a contour plot of this field over a map of the North Atlantic/Europe.
计算气候平均气压场,并在北大⻄洋/欧洲地图上绘制该气压场的等高线图。
python hints: particularly sections 3.1, 3.2, 3.3
1.2 Data matrix and covariance matrix (11 points) 数据矩阵和协方差矩阵
Principal component analysis (PCA) will be conducted using the variance-
covariance matrix of the data. First, create the data matrix X from the
array you loaded in section 1.1. What dimensions does it have, and what
are the rows and columns? Now calculate the covariance matrix S from X.
What are the dimensions and properties of S? Remember that the pressure
data are given on a regular latitude-longitude grid. Have you scaled X (and
S) accordingly? How? 主成分分析(PCA)将使用数据的方差-协方差矩阵进行。首先,从1.1节中加载的数组创建数据矩阵X。它的维数是什么,行和列是什么?现在从x中
计算协方差矩阵S。S的维数和性质是什么?请记住,压力数据是在一个常规的经纬度网格上给出的。你对X(和S)进行了相应的缩放了吗?如何? python hints: particularly section 3.2
1.3 Principal component loadings (7 points) 主成分载荷
Conduct PCA by determining the eigenvalues and eigenvectors of S. Make
contour plots to show the four leading principal component (PC) loadings.
Can you think of a way to scale these loadings such that they have the
physical units of pressure? (Hint: Complete section 1.4 first if you do not
have a good idea how to scale). We define the SNAO as the first PC mode.
Does the loading for this mode resemble that of the winter NAO that you
1.4 Principal component scores (9 points) 主成分得分
1.4.1 SNAO score time series SNAO得分时间序列
Calculate the PC scores by projecting the data onto the PC loadings. This can be achieved by a single matrix multiplication: U = XE, where E is the matrix whose columns contain the PC loadings. What are the dimensions of U, and which elements of U contain the PC scores for the kth PCA mode? Show the time series of principial component scores for the leading (SNAO) mode. Are there any obvious long-term trends? (No formal test is required here.)
通过将数据投影到PC负载来计算PC分数。这可以通过一个矩阵乘法实现:U = XE,其中E是列包含PC负载的矩阵。U的维数是多少? U的哪些元 素包含第k个PCA模式的PC分数?显示领先(SNAO)模式的主要成分得分的时间序列。有没有什么明显的⻓期趋势?(这里不需要正式的测试。)
2
know from your meteorology classes or the literature?
通过确定s的特征值和特征向量进行主成分分析,绘制等高线图来表示四种主要的主成分(PC)载荷。你能想出一种方法来衡量这些负荷,使它们有压力的物理单位 吗?(提示:如果您不知道如何缩放,请先完成1.4节)。我们将SNAO定义为第一种PC模式。这个模式的负载是否类似于你从气象学课程或文献中了解到的冬季NAO ?
python hints: particularly section 3.4
方差-协方差矩阵及其特征值 和特征向量是从有限数据样 本中计算出来的,因此受到 sam- pling不确定性的影响。 在一些比较宽松的假设条件 下(多元高斯数据,大量的独 立观测值N), S的第k个特征 值的抽样分布可以表示为
1.5.2 Sampling uncertainty under asymtoptic assumptions 非对称假设下的采样不确定性
The variance-covariance matrix S and its eigenvalues and eigenvectors are calculated from a finite data sample and are therefore subject to sam- pling uncertainty. Under some fairly generous assumptions (multivariate- Gaussian data, large number N of independent observations), the sampling distribution for the kth eigenvalue of S can be shown to be
ˆk ⇠ N ✓ k, 2 2k◆ , (1) N
and is approximately independent of that of all other eigenvalues. Equiva-
1.4.2 Hypothetical pressure fields 假设压力领域
Given all PC loadings, the complete set of PC scores for a given day con-
tains all the information needed to construct the pressure field on that day.
Consider the following hypothetical situations:
给定所有的PC负荷,给定一天的PC分数的完整集包含了当天构建压力场所需的所有信息。考虑以下假设情况: 1. a day for which all PC scores are equal to zero,
这一天,所有的电脑分数都等于零,
2. a day for which the SNAO score is equal to and all other scores are equal to zero,
一天的SNAO得分=σ和所有其他分数都等于零
3. a day for which the SNAO score is equal to and all other scores
are equal to zero,
一天的SNAO得分=-σ和所有其他分数都等于零,
where is the standard deviation of the SNAO score time series. What is
the pressure field for these three situations? Make the corresponding contour
plots.
σ是SNAO评分时间序列的标准差。这三种情况的压强场是什么?制作相应的等值线图。
python hints: particularly sections 3.2, 3.3
1.5 Scree plot and sampling uncertainty (9 points)
1.5.1 Scree plot
Make a scree plot including the first 12 PC modes. Also show the individ-
ually and cumulatively explained variance fraction (you can tabulate these
碎石图和采样不确定度
values or include them in the plot).
制作一个包括前12个PC模式在内的剧情。还显示独立和累积解释的方差分数(您可以将这些值制成表格或包含在图中)。
lently, 近似独立于所有其它特征s值。
N ˆk k
⇠ N (0, 1) , (2) 3
2 k
!
在2007年7月20日做一个压 力分布的等高线图,放大到一 个地区,包括不列颠群岛和周 围海域和附近的大陆。你应该 能看到英格兰东南部和英吉利 海峡上空的低气压。这个系统 在英格兰和威尔士造成了大量 的降雨和大面积的洪水。
你相信你计算的置信区间吗?你认为最关键的假设是什么?
1.6 Principal component filtering (9 points)
主成分筛选
Make a contour plot of the pressure distribution on 20th July 2007 zooming in on a region including the British Isles and the sourrounding seas and near Continent. You should see a depression over the southeast of England and the Channel. This system caused large amounts of rainfall and widespread flooding in England and Wales [3].
Approximate the sea level pressure field on this day by projecting the
data onto a truncated principal-component basis where the leading (i) 5, (ii)
20 and (iii) 200 eigenvectors are retained. Make a contour plot for all three
cases. Is the depression captured? 将数据投影到一个截断的主成分基上,保留领先的(i) 5、(ii) 20和(iii) 200个特征向量,
从而近似这一天的海平面压力场。为这三种情况绘制等高线图。大萧条被捕捉到了吗? 1.7 Composites and multiple testing 复合材料和多重测试
1.7.1 Composites (13 points)
左侧的量服从标准正态分布(高斯分布),均值为0,方差为1。λk(真正的)人口提取和ˆλk样本估计,N是抽样的数量乘以。 i.e. the quantity on the left-hand side follows a standard normal (Gaussian)
distribution with mean 0 and variance 1. k is the (true) population eigen- value and ˆk is the sample estimate, N is the number of sampling times.
Use this result to estimate 95% confidence intervals for the first 12 eigen-
values and add them to your scree plot. Is the first eigenvalue well separated
from the second? Do you trust the confidence intervals you have calculated?
What do you think is the most critical assumption?
使用这个结果来估计前12个特征值的95%置信区间,并将它们添加到您的碎石图中。第一个特征值与第二个特征值分离得好吗?
复合是研究两个气候变量之间关 Composites are a simple means to investiagte the relationship between two
系的一种简单方法。文件 MTMG06 ass3档案。nc包含每日
climate variables. The file MTMG06 ass3 precip.nc contains daily European land precipitation data. Calculate the precipitation composite, i.e. the mean
欧洲陆地降水数据。计算SNAO评
分为正的所有天的降水组合,即 precipitation, for all days when the SNAO score is positive. Also calculate
平均降水。同时计算SNAO负分数
的合成。绘制SNAO+和SNAO-复 the composite for negative SNAO scores. Make contour plots of the SNAO+
and SNAO- composites, and of the di↵erence SNAO+ – SNAO-.
合材料的等高线图,以及SNAO+ – Choose a suitable statistical test to determine if the SNAO+ and SNAO-
SNAO-的差异。
precipitation are significantly di↵erent from one another. Conduct the test
for each land grid box with data. Show areas where the composite di↵erence
is significant at the 95% level in your contour plots, for example by stippling
these areas. 选择合适的统计检验来确定SNAO+和SNAO-降水是否存在显著性差异。对每个带数据的土地方格进行测试。
在您的等高线图中,通过点画这些区域来显示在95%水平上复合差异显著的区域。
Carry out the same analysis for the surface temperature data in the file
MTMG06 ass3 temp.nc and also create the three contour plots. 对MTMG06 ass3 temp.nc文件中的地表温度数据进行相同的分析,并创建三个等高线图。
4
的,即温度与SNAO之间不存在关
系?
relationship between temperature and SNAO?
In reality, of course, the tests for di↵erent grid boxes are not independent.
Can you outline a method to test if there is a significant relationship between
the SNAO and the temperature field as a whole? (You do not need to do it,
当然,在现实中,不同网格盒的测试并不是独立的。你能概括出一种方法来测试SNAO和整体温度场之间是否有显著的关系吗? (你不需要这样做,你可能有任何想法的草图就足够了。)
2 Data files
• MTMG06 ass3 slp.nc:
Daily July/August sea level pressure in the North Atlantic and Europe
in hPa. Pressure variable is msl with coordinates time, lon, lat.
北大⻄洋和欧洲hPa七月/八月的每日海平面压力。压力变量为msl,坐标为时间,经度,纬度。数据期为1900年7月1日至2010年8月31日。 • MTMG06 ass3 precip.nc:
1.7.2 Multiple testing (6 points)
用于测试复合差异是否显著的带 What is the total number of grid boxes with temperature data you have used 有温度数据的网格盒的总数是多 for testing whether the composite di↵erence is significant? How many of
少?这些方框中有多少产生了重
要/有意义的结果?假设不同格子these boxes yield a significant/insignifiant result? Assuming the significance
的显著性检验是独立的,那么有 tests for the di↵erent grid boxes are independent, how many significant 多少显著性检验结果是完全随机 test results would you expect solely by chance, i.e. even if there was no
sketching any idea you may have is enough.)
Coding style: 6 points Total: 77 points
Data period is 01/07/1900 – 31/08/2010.
Daily July/August precipitation in Europe in mm per day. Precip-
itation variable is precip with coordinates time, lon, lat. Data
7月/ 8月欧洲每日降水量(毫米)。降雨-降雨变量是降雨与坐标时间,经度,纬度。数据期为1950年7月1日至2013年8月31日。 • MTMG06 ass3 temp.nc:
Daily July/August 2m temperature in Europe in degrees Celsius. Tem- perature variable is temp with coordinates time, lon, lat. Data period is 01/07/1950 – 31/08/2013.
7月/ 8月欧洲每日200万摄氏度的高温。温度变量为温度,坐标为时间、经度、纬度。数据期为1950年7月1日至2013年8月31日。 Python coding hints
period is 01/07/1950 – 31/08/2013.
3
Below some code snippets that you may find useful to adapt in your own program. 下面是一些代码片段,您可以在自己的程序中使用它们。
5
NetCDF是在大气科学中经常使
用的一种二进制格式。这个示
例向您展示了如何从提供的
NetCDF文件中读取海平面压力
数据。变量类型numpy。ndarray
是为压力数据和坐标轴创建
的。函数num2date用于将“原始”
时间轴数据转换为python能够理
3.1 Read NetCDF data from file 从文件中读取NetCDF数据
NetCDF is a binary format used frequently in the atmospheric sciences.
This example shows you how to read the sea level pressure data from the NetCDF file provided. Variables of type numpy.ndarray are created for the pressure data and the coordinate axes. The function num2date is used to convert the “raw” time axis data into a time format python understands.
# a range of useful packages
# they will be assumed to be loaded in subsequent examples
import matplotlib . pyplot as plt import numpy as np
# open the NetCDF file
input file = ’./MTMG06 ass3 slp.nc’ nc = NetCDFFile( input file )
# the NetCDF variable name,
# here ‘‘msl’’ for mean sea level (pressure)
ncvar = ’msl’
# longitude and latitude values
lons = nc.variables[’lon’][:] lats = nc.variables[’lat’][:]
# array with pressure data
a p = nc.variables[ncvar][:]
# read time axis
Similarly for the precipitation and temperature data (we do not have these data for the early part of the period): 同样,降水和温度数据(我们在前期没有这些数据):
解的时间格式。
1 2 3 4 5 6 7
8
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1
2
import numpy.ma as ma
import datetime as dt
from netCDF4 import num2date , date2num , Dataset as
NetCDFFile
from mpl toolkits . basemap import Basemap import scipy . stats
raw times = nc.variables[’time’][:] times = num2date( raw times ,
nc . c l o s e ( )
units=nc. variables [ ’time ’ ]. units , calendar=nc. variables [ ’time ’ ]. calendar)
p ffn = ’./MTMG06 ass3 precip.nc’ p nc = NetCDFFile( p ffn)
6
p lons = p nc.variables[’lon’][:]; n p lon = len( p lons) p lats = p nc.variables[’lat’][:]; n p lat = len( p lats) a pr = p nc.variables[’precip’][:]
p rtimes = p nc.variables[’time’][:]
p times = num2date( p rtimes ,
3
4 5 6 7 8 9
10 11 12 13
降水和温度数据将作为numpy.ma.core返回。带有组件数据(包含实际数据)和掩码的MaskedArray——指示变量,告诉我们在时间/空间中的每个点是否有有效的数据。 3.2 Working with arrays使用数组
We recommend that you work with the arrays provided by the numpy pack-
age. See here for an introduction on how to work with them:
http://wiki.scipy.org/Tentative_NumPy_Tutorial
我们建议您使用numpy pack- age提供的数组。有关如何使用它们的介绍,请参⻅这里:http://wiki.scipy.org/Tentative_NumPy_Tutorial Some examples/notes:
• calculate the mean field averaging over the first array coordinate (index
units=p nc.variables[’time’].units, calendar=p nc.variables[’time’].
calendar ) pnc.close()
# indices of overlap between precip and msl data
The precipitation and temperature data will be returned as a
numpy.ma.core.MaskedArray with components data (containing the actual
data) and mask – an indicator variable telling us whether there is valid data
pr ii = np.where( [t in p times for t in times])[0]
从每个数组元素中减去平均 值。数组a p和mu具有不同的 维数,由于Python的一个概 念称为广播(请参阅教程), 所以采用不同的方式工作。
for each point in time/space.
计算第一个数组坐标上的平均场(索引0) 1 mu = a p.mean( axis=0)
• Subtract the mean from each array element. The arrays a p and mu have di↵erent dimensions, taking the di↵erence works anyway because of a Python concept called broadcasting (see tutorial).
1 apc=ap mu
• Simple arithmetic functions need to be preceeded by the package name,
e.g. 简单的算术函数需要前面加上包名,
1 # import numpy package
2 import numpy as np
3 …
4 sclats = np.sqrt( np.cos( lats ⇤ np.pi / 180))
7
0)
种方法是使用Basemap包。你可 以修改这里提供的例子:http:// matplotlib.org/basemap/users/ examples。超文本标记语言互联 网上有更多的文档。本例绘制 了定义在经纬度上的阵列mu的 等高线图:
1 2 3
4 5 6 7 8 9
10 11 12 13 14 15
16
17
• Other functions you may find useful to construct the data matrix are numpy.swapaxes and numpy.resize. 对于构造数据矩阵,您可能会发现其他有用的函数是numpy. swapaxes和numpy.resize。
• You can use numpy.cov to calculate the covariance matrix from the data matrix. Make sure you set the rowvar argument correctly.
您可以使用numpy。cov从数据矩阵中计算协方差矩阵。确保正确设置了rowvar参数。
• numpy.dot( a, b) calculates the matrix product if a and b are 2D
arrays, and the scalar product if a and b are vectors. numpy。点(a, b)计算矩阵乘积,如果a和b是二维数组,则计算标量积,如果a和b是向量。
• Determine the array index corresponding to a particular day: 确定特定日期对应的数组索引:
1 2 3 4 5
在地图上绘制数据
You will need to make contour plots on a geographical map throughout this assignment. One way is to use the Basemap package. You can adapt the ex- amples provided here: http://matplotlib.org/basemap/users/examples. html. There is a lot more documentation on the internet. This example draws a contour map of the array mu defined at longitudes lons and lati- tudes lats:
import datetime as dt
…
d flood = dt.datetime( 2007, 7, 20, 12) fl i = np.where( times == d flood)[0][0] …
在整个作业中,你需要在一张 3.3 地理地图上画出等高线图。一
Plotting data on maps
fig mu = plt.figure()
ax mu = fig mu.add subplot( 111)
axmu.set title( ’MeanMSLP field (hPa; JUL/AUG, 1900 2010,
daily)’)
mmu = Basemap( width=9000000,height=5500000, resolution=’l’, projection=’laea’,
lat ts=50, lat 0=50, lon 0= 10.) m mu. drawcoastlines ()
mmu.drawparallels( np.arange( 25, 75, 10)) m mu. drawmeridians ( np . arange ( 60, 60 , 10) )
grid = np.meshgrid( lons , lats)
x, y=mmu( ⇤grid)
clevs = np.arange( 1002, 1024, 3)
cs =mmu.contourf( x, y, mu, clevs, extend=’both’, cmap=’
RdYlBu r’)
cbar = mmu.colorbar( cs, location=’bottom’)
8
18
1 2
3 4 5 6
7 8 9
10 11 12 13 14 15 16
1
fig mu.show()
Here an example showing how to stipple areas on an existing contour
map: 这里的一个例子显示如何点画区域在现有的等高线地图:
…
# contour levels , here for precipiation composite difference
dp clevs = [ 2, 1, 0.5,0,0.5,1,2]
# first contour plot showing composite difference
cscd=mcd.contourf( px, py, prpmean prnmean,
dp clevs, extend=’both’, cmap=’BrBG’
)
cbarcd=mcd.colorbar( cscd, location=’right’)
# contour levels for the p values of test results
pv levs = [0 ,0.05 ,1]
# overlay second contour plot to show hatched areas # with test result contained in pr pvalues
mcd.contourf( px, py, prpvalues, pvlevs, extend=’neither’, colors=’none’,
…
3.4
Eigenvalue decomposition
特征值分解
hatches=[’.’, None])
计算矩阵S的特征值分解: lam contains the eigenvalues and the columns of E contain the eigenvectors.
lam包含特征值,E的列包含特征向量。
3.5 Hypothesis testing 假设检验
The package scipy.stats contains a range of functions for statistical hy- pothesis tests. See documentation.
包scipy。stats包含一系列用于统计hy- pothesis测试的函数。⻅文档。 References
[1] Folland, C. K., J. Knight, H. W. Linderholm, D. Fereday, S In- eson, and J. W. Hurrell, 2009: The Summer North Atlantic Os- cillation: Past, Present, and Future J. Climate, 22, 1082–1103, doi:10.1175/2008JCLI2459.1
9
This calculates the eigenvalue decomposition of the matrix S: lam, E=np.linalg.eig( S)
[2] Daniel S. Wilks. Statistical Methods in the Atmospheric Sciences. Else- vier, Academic Press, 2nd edition, 2006.
[3] https://www.metoffice.gov.uk/about-us/who/how/case-studies/ summer-2007
10