Data Focused Python
Homework 2
Mini 3 2020

Due at 9:00 pm on Tuesday, Jan. 28, 2020

You will lose 1 point for every 5 minutes after that time.

(70 points) Handling Idiosyncratically Formatted Data

In Python terms, the purpose of this part of the homework is to gain experience with input and output files, variables, decisions, loops, string processing (including slices and formatting), conversions (string to number or number to string), and the like.

Some data sources are in convenient formats (CSV, JSON, HTML, XML, and so forth), and others are mostly unformatted (documents, email messages, system and web logs, etc.). There are also idiosyncratically formatted files, with their own strange formats often made up years in the past, before standards like CSV or JSON were invented. You must be able to handle all of these kinds of data sources.

Commodity futures and option contracts of many kinds are traded on NYMEX, owned by CME Group. Each evening of each trading day, sometime between about 6:00 pm and 8:00 pm Central Time, a SPAN (Standard Portfolio Analysis of Risk) file is posted to ftp://ftp.cmegroup.com/pub/span/data/cme/ containing information about the day's trading. For a given day, the name of this file is cme.YYYYMMDD.c.pa2.zip, where YYYYMMDD is the 8-digit year, month, and day of the file. Files for months prior to the current month are moved into an archive subdirectory.
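
As a small illustration of the naming pattern above, the following sketch builds the file name for a given trading day. The function name span_file_name is made up for this example; only the cme.YYYYMMDD.c.pa2.zip pattern comes from the text.

```python
from datetime import date


def span_file_name(trading_day):
    """Return the zipped SPAN file name for one trading day (hypothetical helper)."""
    # date objects accept strftime codes inside a format specification
    return "cme.{:%Y%m%d}.c.pa2.zip".format(trading_day)


print(span_file_name(date(2020, 1, 17)))  # cme.20200117.c.pa2.zip
```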

Download the zipped SPAN file for Friday, January 17, 2020, cme.20200117.c.pa2.zip. Unzip, then display this SPAN file. You will see that it is an enormous text file with its own unique format, unfortunately not something simple and convenient like CSV or XML or JSON.

The settlement prices (in U.S. dollars) contained in the SPAN file are used to mark to market each trader's account, so that gains/losses can be credited/debited each day to reduce the risk of counterparty default (a trader who has to cover modest losses each day is less likely to default than a trader who has to cover huge losses at the end of a year, say). Your job is to extract these settlement prices, as well as contract expiration dates (last trading dates), for one of the globally most heavily traded energy contracts: West Texas Intermediate (WTI) Crude Oil.

To learn about WTI Crude Oil futures contract details, see: http://www.cmegroup.com/trading/energy/crude-oil/light-sweet-crude_contract_specifications.html

Notice that the CME Globex Product Code is CL; you will need this for scanning the SPAN file. Using other tabs at the top of this web page, you can see current quotes, recent settlements, volume, etc. If you click the Options button, just to the right of the Futures button near the upper left, you will see information about options contracts based on the underlying futures contracts. There are about two dozen different types of option contracts for this underlying; we are interested in the American Options. When you look at the contract specifications, you will discover that its Product Code is LO.

Write a Python program named hw2.1.py that reads cme.20200117.c.pa2 as its input file, and produces CLexpirationsandsettlements.txt as its output file. The output should be in exactly this form:

Futures   Contract   Contract   Futures     Options   Options
Code      Month      Type       Exp Date    Code      Exp Date

CL        202003     Fut        20200220
CL        202004     Fut        20200320
and so forth, through contract month 202112
CL        202003     Opt                    LO        20200214
CL        202004     Opt                    LO        20200317
and so forth, through contract month 202112

Futures   Contract   Contract   Strike   Settlement
Code      Month      Type       Price    Price

CL        202003     Fut                 58.58
CL        202004     Fut                 58.51
and so forth, through contract month 202112
CL        202003     Call       10.50    48.08
CL        202003     Put        10.50    0.01
CL        202003     Call       11.00    47.84
CL        202003     Put        11.00    0.01
and so forth, through contract month 202112

Do not try to create a better output format: it needs to be very easy for us to compare your output to our solution output, and to other students' outputs. We will take off points if your output format varies too much from what is shown above. Our output format takes into account the order in which records appear in the SPAN file, so you don't have to remember or accumulate much information as you go. In particular, you do not have to process the contents of cme.20200117.c.pa2 more than once in order to create the table.
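
Fixed-width rows like those in the first table can be produced with str.format. The sketch below is only an illustration: the column widths and the helper name expiration_row are guesses made for this example, not the official layout.

```python
# Column widths are illustrative guesses, not an official layout.
ROW_FORMAT = "{:<10}{:<11}{:<11}{:<12}{:<10}{:<10}"


def expiration_row(fut_code, month, ctype, fut_exp="", opt_code="", opt_exp=""):
    """Build one left-aligned row; columns that do not apply stay blank."""
    row = ROW_FORMAT.format(fut_code, month, ctype, fut_exp, opt_code, opt_exp)
    return row.rstrip()  # drop the padding after the last filled column


fut_row = expiration_row("CL", "202003", "Fut", fut_exp="20200220")
opt_row = expiration_row("CL", "202003", "Opt", opt_code="LO", opt_exp="20200214")
print(fut_row)
print(opt_row)
```

Leaving the unused columns as empty strings keeps the futures rows and the options rows aligned under the same headers.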

Do not include contract months earlier than 202003, or later than 202112.
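
The month limits above can be checked with plain string comparison. This is a minimal sketch that assumes the contract month has already been extracted as a six-character YYYYMM string, in which case alphabetical order and chronological order agree; the names used are hypothetical.

```python
FIRST_MONTH = "202003"  # earliest contract month to report
LAST_MONTH = "202112"   # latest contract month to report


def month_wanted(contract_month):
    """True if a YYYYMM string falls inside the reporting window."""
    return FIRST_MONTH <= contract_month <= LAST_MONTH


for month in ("202002", "202003", "202112", "202201"):
    print(month, month_wanted(month))
```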

Since there are many, many strike prices for options on futures contracts, the output file is going to be very long, but not nearly as long as the SPAN file itself.

Fortunately, there is documentation online that describes the contents of CME SPAN files. If you Google for "cme span pa2 file format" you will find a page named "Risk Parameter File Layouts for the Positional Formats SPAN". You will want to look at "Type B Records, Expanded Format" and "Type 8 Records, Expanded Format" to learn how to obtain the contract name, type, month, expiration date, strike, and settlement prices that you need.
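
One way to pick out only the record types of interest is to test the Record ID at the start of each line. This sketch assumes the Record ID occupies the first two characters of a line ("B " for Type B, "81" for Type 8 subtype 1); the sample lines are shortened placeholders, not real SPAN data, and is_wanted is a name invented here.

```python
def is_wanted(line):
    """True for Type B records and for Type 8, subtype 1 records."""
    return line.startswith("B ") or line.startswith("81")


# Shortened placeholder lines standing in for lines of a SPAN file.
sample_lines = ["0 header", "B NYMCL", "81NYMCL", "82NYMCL", "B NYMNG"]

kept = [line for line in sample_lines if is_wanted(line)]
print(kept)
```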

Here are detailed examples of the structure of the Type B and Type 81 records for a different energy product, Natural Gas, for an earlier year.

B NYMNG FUT201810 000000000900000001100030000330000000021643800000
001000020180926NG 00000000 0010000000000000 00 00 010000000000P 00

According to the Type B Expanded documentation at https://www.cmegroup.com/confluence/display/pubspan/Type+B+-+Expanded

the Record ID (record type) is "B ",
the Exchange Acronym is NYM,
the Commodity Code is NG (Natural Gas futures code),
the Product Type Code is FUT (futures contract type),
the Contract Month is 201810 (the October 2018 contract month),
and the Expiration (Settlement) Date is 20180926 (futures expiration date).

You need to extract and reformat the Commodity Code, Product Type Code, Contract Month, and Expiration Date for Crude Oil (CL) records for the top half of the first table. In your output table, these will be the Futures Code, Contract Type (display "Fut" rather than "FUT"), Contract Month, and Futures Exp Date, respectively.

B NYMON OOF201810 201810 002093720900000001100030000330000000021369900
000001000020180925NG M 00000000N02805000010000000000000 00 00 010000000000P 00

The Commodity Code is ON (LO for WTI crude; this is the options code),
the Product Type Code is OOF (option on futures contract type),
the Contract Month is 201810,
the expiration date is 20180925 (options expiration date),
and the Underlying Commodity Code for this option is NG.

This provides what you need to extract and reformat for the bottom half of the first table. These correspond to the Options Code, Contract Type ("Opt" rather than "OOF"), Contract Month, and Options Exp Date. The Futures Code for ON options is NG; the Futures Code for LO options is CL; you can learn this by looking at the CME contract specification.
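
The code translations described above can be held in two small dicts. This is a sketch under the assumption that the Commodity Code and Product Type Code have already been sliced out of a Type B record; the function name table_codes and the return layout are invented for illustration.

```python
# Display labels for the Product Type Code, as given in the text.
TYPE_LABEL = {"FUT": "Fut", "OOF": "Opt"}

# Underlying futures code for each options code, as given in the text.
UNDERLYING = {"LO": "CL", "ON": "NG"}


def table_codes(commodity_code, product_type):
    """Return (futures code, contract type label, options code)."""
    label = TYPE_LABEL[product_type]
    if product_type == "OOF":
        # For options, the commodity code is the options code.
        return UNDERLYING[commodity_code], label, commodity_code
    return commodity_code, label, ""


print(table_codes("CL", "FUT"))
print(table_codes("LO", "OOF"))
```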

For the second table, you will need to extract and reformat data from the Type 81 records. Here are examples for Natural Gas.

For the first part of the second table:

81NYMNG NG FUT 201810 000000000000000000036700367003670036700733007330073300000000280500N

The futures code is NG (commodity/product code),
the contract month is 201810 (futures contract month),
the contract type is FUT,
and the settlement price is 00000000280500, which for natural gas you need to divide by 100000.0 to get 2.805 (natural gas prices are displayed to tenths of cents, unlike crude oil futures prices, which are displayed to cents). For WTI crude, you will need to figure out the correct divisor by comparing the contents of the settlement price field with actual current WTI crude futures contract prices (Google is your friend here).

For the second part of the second table:

81NYMON NG OOFC201810 201810 000275000087001000031400133001230031100558003860031400000000000820N

81NYMON NG OOFP201810 201810 000275000087001000005300233002440005500175003470042000000000000420N

The first record is for a call option (Option Right Code C), which is an option to buy a futures contract; the second record is for a put option (Option Right Code P), which is an option to sell a futures contract.

The Underlying Commodity (Product) Code (futures code) is NG,
the Product Type Code is OOF (option on futures),
the Option Strike Price is 0002750, which you need to divide by 1000.0 to get 2.750,
and the settlement price is 00000000000820 for the Call and 00000000000420 for the Put.

Dividing by 10000.0, you get 0.082 as the price of a Call option, and 0.042 as the price of a Put option. For WTI crude, you will only need to figure out one divisor, not two.
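
The Natural Gas conversions quoted above can be reproduced in a few lines. The helper name to_number is made up for this sketch, and the divisors are the Natural Gas ones from the text; the WTI crude divisor is deliberately not shown.

```python
def to_number(raw_field, divisor):
    """Convert a zero-padded digit string to a float, scaled by divisor."""
    return int(raw_field) / divisor


futures_price = to_number("00000000280500", 100000.0)
strike = to_number("0002750", 1000.0)
call_price = to_number("00000000000820", 10000.0)
put_price = to_number("00000000000420", 10000.0)

# Three decimal places suit these Natural Gas examples.
print("{:.3f}".format(futures_price))  # 2.805
print("{:.3f} {:.3f} {:.3f}".format(strike, call_price, put_price))  # 2.750 0.082 0.042
```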

A few more hints:

(a) Notice that the documentation counts character (column) positions from 1, whereas in your code you will need to count character positions from 0 for str slices.

(b) Check the contract specifications to discover the number of decimal places you should display for the price (in U.S. dollars) of WTI Crude Oil futures and options contracts.

(c) Approach the program in stages: first, make sure you can write a program that simply copies the SPAN file to the output file; next, modify your program to copy the type B and type 8 records from the SPAN file to the output file; next, modify your program to copy the type B CL and type 8 CL records; and so forth, making definite steady progress with each revision. As your coding skills improve, you can do two or three or four things in each revision step. Eventually, you will find that you can write dozens of lines of code encompassing many different tasks and goals, and it will work the first time! Or maybe not.

(d) There are subtypes of the type 8 records: it turns out you can just use the type 8 subtype 1 (or just "type 81") records, and ignore the type 82 records. For WTI Crude Oil, there are no type 83 records.

(e) There is a brief description of string formatting in McKinney's book, under 2.3 Python Language Basics, Scalar Types, Strings. Many more examples can be found here: https://docs.python.org/3.7/tutorial/inputoutput.html

(f) Remember that collaboration is encouraged: in addition to your homework partners, feel free to compare what you are doing with other students as well. Just make sure you submit your own homework team's code, after whatever discussions you have with others.

(g) Please feel free to email the TA or myself with any questions you may have.
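
The hint about column positions can be captured in a tiny helper. This sketch assumes a field is described by a 1-based start column and a length; the name field and the sample string are invented for illustration and are not taken from the SPAN documentation.

```python
def field(record, start, length):
    """Return the field that begins at 1-based column `start`."""
    begin = start - 1  # documentation counts from 1, str slices count from 0
    return record[begin:begin + length]


sample = "ABCDEFGHIJ"
print(field(sample, 1, 2))  # AB
print(field(sample, 4, 3))  # DEF
```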

(30 points) Lists, Tuples, Sets, Dicts, and Comprehensions

expenses.txt is a small text file describing business expenses. Each line after the header gives the money amount, category, date, and description of an expense.

Create a Python script file named hw2.2.py. In this script, define an empty list named records, then read the lines from expenses.txt and append each line (excluding its terminating newline character) to the records list. Add this code to display the lines from records:

for line in records:
    print(line)

Confirm that the output is not double-spaced; that is, confirm that each line string in the records list does not include a terminating newline.

Close the open expenses.txt file, then open expenses.txt again. Use list comprehension notation to create and initialize a new list, records2, from the lines in the expenses.txt file, excluding the terminating newline characters. Confirm that you have done this correctly, by adding this code at the end of the script:

print('\nrecords == records2:',
      records == records2, '\n')

This should display records == records2: True.
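
The loop version and the list comprehension version can be compared side by side. In this sketch an io.StringIO object stands in for the open expenses.txt file, and the two sample lines are modeled on the expected output shown in this handout, with colons as field separators.

```python
import io

# Stand-in for the contents of expenses.txt (sample data only).
SAMPLE_TEXT = (
    "Amount:Category:Date:Description\n"
    "5.25:supply:20170222:box of staples\n"
)

# Explicit loop: append each line without its trailing newline.
records = []
for line in io.StringIO(SAMPLE_TEXT):
    records.append(line.rstrip("\n"))

# The same result with list comprehension notation.
records2 = [line.rstrip("\n") for line in io.StringIO(SAMPLE_TEXT)]

print(records == records2)
```

Using rstrip("\n") removes only the newline, so any other trailing whitespace in a description is preserved.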

Close the open expenses.txt file, and open expenses.txt again. Learn about the str class's split function. Fields in the expenses.txt file are separated with colon characters, ':', since expense descriptions often contain commas. Use nested tuple comprehension notation to create and initialize a new tuple of tuples, records3, in which each inner tuple has the form (amount, category, date, description), and the outer tuple contains one inner tuple for each line of input. We use a tuple of tuples because tuples are immutable, and we want to protect the input data from accidental change.

Add this code to display the tuple of tuples records3:

for tup in records3:
    print(tup)

The output from this loop should look like:

('Amount', 'Category', 'Date', 'Description')
('5.25', 'supply', '20170222', 'box of staples')
...
('8.98', 'supply', '20170325', 'Flair pens')
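
Python has no literal syntax for a tuple comprehension, so one common reading of the instruction is a generator expression passed to tuple(). The sketch below takes that approach, with io.StringIO standing in for the open file and sample lines modeled on the output above.

```python
import io

# Stand-in for the contents of expenses.txt (sample data only).
SAMPLE_TEXT = (
    "Amount:Category:Date:Description\n"
    "5.25:supply:20170222:box of staples\n"
    "8.98:supply:20170325:Flair pens\n"
)

# Outer tuple: one entry per line; inner tuple: the colon-separated fields.
records3 = tuple(
    tuple(line.rstrip("\n").split(":")) for line in io.StringIO(SAMPLE_TEXT)
)

for tup in records3:
    print(tup)
```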

A function is a mapping from arguments to values. A sequence or map (dict) can also be thought of as a mapping from arguments to values. Creation of sequences/maps from data can simplify function definitions, or even eliminate the need for some of them. A list or tuple is a mapping from an integer subscript to a value; a set is a mapping from a value to True (in the set) or False (not in the set); and a dict is a mapping from a key to a value.
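
The three kinds of mapping described above can be seen in a few lines. The values are sample data borrowed from the expense examples in this handout; the variable names are invented for illustration.

```python
amounts = [5.25, 8.98, 212.06]         # list: integer subscript -> value
categories = {"supply", "util"}        # set: value -> True or False
amount_by_description = {              # dict: key -> value
    "box of staples": 5.25,
    "Flair pens": 8.98,
}

print(amounts[2])                            # 212.06
print("meal" in categories)                  # False
print(amount_by_description["Flair pens"])   # 8.98
```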

Using set comprehension notation with records3, define: catset, the set of categories (do not include the string 'Category') in the expense records; and dateset, the set of dates (again, do not include the string 'Date') in the expense records. Add this code to display these two sets:

print('Categories:', catset, '\n')
print('Dates:     ', dateset, '\n')

Since sets are unordered, your exact output may differ, but the output should look something like:

Categories: {'supply', 'meal', 'travel', 'util'}

Dates:      {'20170222', '20170223', ..., '20170325'}
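
A set comprehension that skips the header row might look like the following. This sketch assumes records3 has the header as its first inner tuple, with category at index 1 and date at index 2; the three data rows are samples taken from the outputs shown in this handout.

```python
records3 = (
    ("Amount", "Category", "Date", "Description"),
    ("5.25", "supply", "20170222", "box of staples"),
    ("8.98", "supply", "20170325", "Flair pens"),
    ("212.06", "util", "20170308", "Duquesne Light"),
)

# Slicing from index 1 leaves out the header tuple.
catset = {rec[1] for rec in records3[1:]}
dateset = {rec[2] for rec in records3[1:]}

print(sorted(catset))
print(sorted(dateset))
```

Sorting before printing is optional; it only makes the display order repeatable.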

Using dict comprehension notation with records3, define a dict named recnumtorecord in which each entry's key is the record (line) number, and each entry's value is the tuple representing the data. Hint: use a combination of range and zip along with records3. In recnumtorecord, store the field names as record number 0.

Add this code to display recnumtorecord:

for rn in range(len(recnumtorecord)):
    print('{:3d}: {}'.format(rn,
          recnumtorecord[rn]))

The output from this loop should look like:

  0: ('Amount', 'Category', 'Date', 'Description')
  1: ('5.25', 'supply', '20170222', 'box of staples')
...
 22: ('212.06', 'util', '20170308', 'Duquesne Light')
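
Following the hint about range and zip, one possible dict comprehension is sketched here. The records3 value is a shortened sample (the real file has more rows), so the record numbers run only from 0 to 2.

```python
records3 = (
    ("Amount", "Category", "Date", "Description"),
    ("5.25", "supply", "20170222", "box of staples"),
    ("8.98", "supply", "20170325", "Flair pens"),
)

# zip pairs each record number with its tuple; the header becomes record 0.
recnumtorecord = {
    rn: rec for rn, rec in zip(range(len(records3)), records3)
}

for rn in range(len(recnumtorecord)):
    print("{:3d}: {}".format(rn, recnumtorecord[rn]))
```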

Add this code, using the items iterable, to display recnumtorecord:

for i in recnumtorecord.items():
    print('{:3d}: {}'.format(i[0], i[1]))

Since a dict was traditionally unordered, the output will be the same as before, but perhaps with the lines in a different order (or perhaps not: from Python 3.7 onward, dicts preserve insertion order).

Alternatively, using tuple unpacking into two loop variables, you can use for example:

for k, v in recnumtorecord.items():
    print('{:3d}: {}'.format(k, v))

When finished, put your hw2.1.py and hw2.2.py source code files into a zip archive named TeamNHW2.zip, where N is your team number, and upload your .zip file to Canvas.