COMP2420/COMP6420 – Introduction to Data Management,
Analysis and Security

Mid-Semester Exam (Sample 2)

Instructions
Maximum Marks: 100
Weightage: 18% of the Total Course Grade
Duration: 15 min Reading + 90 min Typing
Permitted Material: This is an open book exam. Any course or online material can be used.

There are four questions. All answers are to be submitted via GitLab before the end of the exam time period.
Save your changes frequently, so that you do not lose your work! Do not change the names of the directories or of the files.
You can import any additional Python modules you may need for your analysis in the first code block. DO NOT try to install any modules other than those present in the Anaconda distribution.
For all coding questions please write your code after the comment YOUR CODE HERE.
In the process of testing your code, you can insert more cells or use print statements for debugging, but when submitting your file remember to remove these cells and calls respectively.
You will be marked on correctness and readability of your code/analysis/explanation. If your marker can’t understand your code/analysis/explanation, then marks may be deducted.

# Feel free to import other modules, provided they are a part of the standard conda distribution.
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from scipy import stats
from itertools import combinations
import matplotlib.pyplot as plt
import seaborn as sns
# In Matplotlib 3.6 and later this style is named 'seaborn-v0_8'.
plt.style.use('seaborn')
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

Question 1: Short Answer [10 marks]
Answer the following questions in the raw cell left below the question.

1.0) Consider the process of creating a test/train split for a given data set. When creating the test set, is it better to take a random sample of rows from the data set or a section of contiguous rows from the start/end of the data set?
Be sure to justify your answer with an explanation.
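
As a minimal sketch of the two options, assuming a small synthetic data set whose rows are sorted by label (the DataFrame `toy` and its columns are hypothetical and not part of the exam data), the following compares a contiguous slice from the end of the data with a random sample drawn by `train_test_split`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows sorted so that all label-0 rows come first.
toy = pd.DataFrame({"x": np.arange(100), "label": [0] * 50 + [1] * 50})

# Contiguous split: the last 20 rows become the test set.
contiguous_test = toy.iloc[-20:]

# Random split: 20 rows sampled at random (fixed seed for repeatability).
train, random_test = train_test_split(toy, test_size=0.2, random_state=0)

# The contiguous test set contains only label 1, so it cannot measure
# performance on label 0; a random sample is far more likely to mix both.
print(contiguous_test["label"].value_counts().to_dict())
print(random_test["label"].value_counts().to_dict())
```

If the rows are ordered in time and the goal is to predict future rows, a contiguous hold-out can still be the appropriate choice, so the answer depends on how the rows were collected.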

1.1) Explain the difference between Python comments that begin with the # symbol and those delimited by the ''' symbols. Provide an example of each.
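
One possible illustration, using a hypothetical function `area` that does not appear elsewhere in the exam: a `#` comment is ignored by the interpreter, whereas a `'''` block is a string literal that becomes a docstring when it is the first statement of a function, class or module.

```python
# A line comment: everything after the hash is ignored by the interpreter.
def area(radius):
    '''Return the area of a circle.

    This triple-quoted string is the first statement in the function,
    so Python stores it as the docstring.
    '''
    pi = 3.14159  # an inline comment after code
    return pi * radius ** 2

'''
A triple-quoted string that is not assigned to anything is evaluated
and discarded, which is why it is often used as a multi-line comment.
'''

# The docstring is available at run time; a # comment is not.
print(area.__doc__.splitlines()[0])
```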

1.2) Consider the following graphs. For each graph, state the problem with how the data is represented and an alternative plotting method that would better convey the information.
[4 marks: 2 per graph]

Question 2: Data Analysis & Visualisation [40 marks]
All the COMP2420 tutors have been enjoying the recent popular Netflix series “The Last Dance”, a basketball documentary. Due to this, many disagreements and discussions have been had around basketball in recent tutor meetings. Your task for this section is to use statistics taken from the site Basketball Reference to resolve some of these disagreements.

The below table is a description of the dataset:

Field        Description
player_id    Unique identifier of a player
season_id    Season year the statistics are from
player_age   Age of the player during that season
min          Total number of minutes spent on the court
fg_pct       Percentage of field goals made
fg3_pct      Percentage of 3-point shots made
ft_pct       Percentage of free throws made
reb          Total number of rebounds
ast          Total number of assists
stl          Total number of steals
blk          Total number of shots blocked
tov          Total number of turnovers
pts          Total number of points scored

The questions are as follows:

2.0) Import the Data
The data file is called player_data.csv in the data directory.
The resulting variable containing the dataframe should be called player_data for testing purposes.
Do not use player_id as an index. Instead, allow for pandas to create a default index.
Print out the first 5 rows of the data set along with the names of the columns.

# YOUR CODE HERE
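player_data = pd.read_csv("data/player_data.csv")
# Note: dropna() returns a new DataFrame; the result is not assigned here,
# so player_data itself is left unchanged.
player_data.dropna()
player_data.head()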

player_id season_id player_age min fg_pct fg3_pct ft_pct reb ast stl blk tov pts
0 147 1997-98 25 1706.0 0.478 0.342 0.728 195.0 155 56.0 14.0 132.0 771
1 147 1998-99 26 1238.0 0.403 0.262 0.791 154.0 93 50.0 15.0 72.0 542
2 147 1999-00 27 2978.0 0.471 0.393 0.827 387.0 320 84.0 49.0 188.0 1457
3 147 2000-01 28 2943.0 0.457 0.339 0.828 359.0 435 65.0 43.0 211.0 1478
4 147 2001-02 29 3156.0 0.455 0.362 0.839 373.0 355 78.0 45.0 201.0 1696

2.1) Good Ol’ Days
Each player is listed along with the year of the season they played in. A common sentiment amongst older tutors is that players who played before 1990 are better than those who played after 1990.

Visualise the average number of ast and tov for players before and after the 1990 season to determine who is correct.

Note: For the category ast a higher value is better; for tov, however, a lower value is better.
[10 marks]

# YOUR CODE HERE
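# Seasons are strings such as '1997-98', so a string comparison with '1990' selects seasons that started before 1990.
pre1990_rec = player_data[player_data['season_id'] < '1990']
# A player with at least one season before 1990 is treated as a pre-1990 player.
pre1990_player_id = pre1990_rec['player_id'].unique()
pre1990_player = player_data[player_data['player_id'].isin(pre1990_player_id)]
post1990_player = player_data[~player_data['player_id'].isin(pre1990_player_id)]
# Select the numeric columns before averaging so the string column season_id is not included.
pre1990_perf = pre1990_player[["ast", "tov"]].mean()
post1990_perf = post1990_player[["ast", "tov"]].mean()
# Side-by-side bars: pre-1990 to the left of each tick, post-1990 to the right.
plt.bar(["ast", "tov"], [pre1990_perf["ast"], pre1990_perf["tov"]], align='edge', width=-0.2, label='pre-1990')
plt.bar(["ast", "tov"], [post1990_perf["ast"], post1990_perf["tov"]], align='edge', width=0.2, label='post-1990')
plt.ylabel('Average per season')
plt.legend()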

Written answer here

2.2) The Worm
Dennis ‘The Worm’ Rodman (player_id: 23) was a very popular player in the 90’s and was known as a very talented rebounder. In more than 10 seasons in his career, he had more total rebounds than points, an incredible feat.

How many players have also gotten more reb than pts for at least 10 of the seasons they have played?

[10 marks]

# YOUR CODE HERE
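# Keep only the seasons in which a player had more rebounds than points.
reb_pts = player_data[player_data['reb'] > player_data['pts']]
# Number of such seasons for each player.
seasons_per_player = reb_pts.groupby('player_id').size()
# "At least 10 seasons" means >= 10, so players with exactly 10 are counted.
count = (seasons_per_player >= 10).sum()
print(count)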

# player_data["reb>pts"] = 0
# player_data.loc[player_data["reb"]>player_data["pts"],"reb>pts"] = 1

# target = player_data.groupby("player_id").sum()
# len(target[target["reb>pts"]>=10]["reb>pts"])

Written answer here
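
The counting logic can be checked on a small example. The sketch below uses a made-up DataFrame `toy` (not the exam data) in which one player has exactly ten seasons with more reb than pts and another has nine, which shows why the "at least 10" condition needs `>=` rather than `>`:

```python
import pandas as pd

# Hypothetical toy data: player 1 has 10 seasons with reb > pts, player 2 has 9.
toy = pd.DataFrame({
    "player_id": [1] * 10 + [2] * 10,
    "reb": [500] * 10 + [500] * 9 + [100],
    "pts": [400] * 20,
})

# Flag each season, then count the flagged seasons per player.
more_reb = toy["reb"] > toy["pts"]
seasons_per_player = more_reb.groupby(toy["player_id"]).sum()

# ">= 10" includes players with exactly ten such seasons; "> 10" would not.
n_players = int((seasons_per_player >= 10).sum())
print(n_players)  # 1
```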

2.3) Hogging the Court
Players with a free throw percentage ft_pct > 80% get more minutes min on court than those who shoot under 80% on free throws. – Skip Bayless (probably)

This is a very interesting claim and the answer is not immediately obvious from looking at the data. We would however like to fact check this to ensure no misinformation is being spread around the basketball statistics community. To test this claim, perform a Hypothesis Test (T-test) for the above statement, stating your hypotheses, the results, and the final acceptance or rejection statements.

[15 marks]

H0: Players with a free throw percentage ft_pct > 80% get, on average, the same or fewer minutes min on court than those who shoot 80% or under on free throws.
HA: Players with a free throw percentage ft_pct > 80% get, on average, more minutes min on court than those who shoot 80% or under on free throws.
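
In a one-sided comparison of two group means, the directional claim being tested is conventionally placed in the alternative hypothesis. As a hedged sketch of how such a test can be run, the example below uses synthetic minutes (the arrays `high_ft` and `low_ft` are made up for illustration) and assumes SciPy 1.6 or later, where `ttest_ind` accepts the `alternative` argument:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic minutes for two hypothetical groups of player-seasons.
high_ft = rng.normal(loc=1350, scale=800, size=500)  # stands in for ft_pct > 0.8
low_ft = rng.normal(loc=1050, scale=800, size=500)   # stands in for ft_pct <= 0.8

# H0: mean(high_ft) <= mean(low_ft);  HA: mean(high_ft) > mean(low_ft).
# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_stat, p_one_sided = stats.ttest_ind(high_ft, low_ft, equal_var=False,
                                      alternative="greater")

# When the t statistic is positive, the one-sided p-value is half the two-sided one.
_, p_two_sided = stats.ttest_ind(high_ft, low_ft, equal_var=False)

alpha = 0.05
print("reject H0" if p_one_sided < alpha else "fail to reject H0")
```

Halving a two-sided p-value is only valid when the observed difference lies in the direction stated by HA; passing `alternative` avoids having to check this by hand.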

# YOUR CODE HERE
# Split by free throw percentage and drop rows with missing values.
players1 = player_data[player_data["ft_pct"] > 0.8].dropna()
players2 = player_data[player_data["ft_pct"] <= 0.8].dropna()
print(players1["min"].mean())
print(players2["min"].mean())
# ttest_ind returns a two-sided p-value; halving it gives the one-sided value
# because the observed difference is in the direction of the claim.
t, p = stats.ttest_ind(players1["min"], players2["min"])
print("p-value is: ", p/2)

1355.1180217937972
1061.4892564224226
p-value is: 1.0496958809138981e-98

Written answer here

Question 3: Classification [30 Marks]
Afzal has come into a large sum of money and is entering a new Canberra team into the Australian 'NBL' Basketball League. He has decided to take a statistically sound approach to choosing players for his new team. He wants to use the above data to determine which attributes of a given player he should be looking for when selecting new talent and we are going to help him out! The following questions all use the same basketball statistics dataset as question 2:

3.0) Import the Data, Prepare for Classification
Import the data from the player_data.csv file in the data directory to a new variable classification_player_data. This is both for testing purposes and in case any changes were made to the dataframe in the previous question.

Afzal likes fast paced games where lots of points are scored so we are going to divide our players into three categories:

Category               Number of points
Low scoring players    pts <= 200
Mid scoring players    pts > 200 and pts <= 800
High scoring players   pts > 800

Your task is to split the data into these three categories, ensuring that the category each player falls into is based on that player's total pts in a single season. Note that this means some players may fall into different categories in different seasons.

Hint: It may be useful to use numerical categories instead of the string names.
Hint: Make sure to handle any NaN or null values within the data.
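
One way to follow both hints is `pd.cut` with numeric labels. This is a minimal sketch on made-up season totals (the DataFrame `toy` and the column name `category` are assumptions for illustration); the bin edges come from the three categories defined for this question:

```python
import numpy as np
import pandas as pd

# Hypothetical season totals, including one missing value.
toy = pd.DataFrame({"pts": [150, 200, 201, 800, 801, np.nan]})

# Drop rows with a missing pts value before categorising.
toy = toy.dropna(subset=["pts"]).copy()

# Right-inclusive bins: (-inf, 200], (200, 800], (800, inf).
# Labels 0, 1, 2 stand for low, mid and high scoring seasons.
toy["category"] = pd.cut(toy["pts"], bins=[-np.inf, 200, 800, np.inf],
                         labels=[0, 1, 2]).astype(int)
print(toy["category"].tolist())  # [0, 0, 1, 1, 2]
```

Because each row is one player-season, the same player can receive different labels in different seasons.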

# YOUR CODE HERE
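classification_player_data = pd.read_csv("data/player_data.csv")
# Note: dropna() returns a new DataFrame; the result is not assigned here,
# so classification_player_data still contains any NaN rows.
classification_player_data.dropna()
classification_player_data
# Keys of the (player_id, player_age) groups, i.e. one key per player-season.
mean_points = classification_player_data[["player_id","player_age","fg_pct","ast","blk","tov","pts"]].groupby(["player_id","player_age"]).groups.keys()
mean_points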

# mean_points

dict_keys([(2, 23), (2, 24), (2, 25), (2, 26), (2, 27), (2, 28), (2, 29), (2, 30), (2, 31), (2, 32), (2, 33), (2, 34), (2, 35), (2, 36), (3, 23), (3, 24), (3, 25), (3, 26), (3, 27), (3, 28), (3, 29), (3, 30), (3, 31), (3, 32), (3, 33), (3, 34), (3, 35), (3, 36), (3, 37), (7, 23), (7, 24), (7, 25), (7, 26), (7, 27), (7, 28), (7, 29), (7, 30), (7, 31), (7, 32), (7, 33), (7, 34), (7, 35), (7, 36), (7, 37), (7, 38), (7, 39), (7, 40), (9, 22), (9, 23), (9, 24), (9, 25), (9, 26), (9, 27), (9, 28), (9, 29), (9, 30), (9, 31), (9, 32), (9, 33), (9, 34), (9, 35), (12, 24), (12, 26), (12, 29), (15, 24), (15, 25), (15, 26), (15, 27), (15, 28), (15, 29), (15, 30), (15, 31), (15, 32), (15, 33), (15, 34), (15, 35), (15, 36), (15, 37), (17, 22), (17, 23), (17, 24), (17, 25), (17, 26), (17, 27), (17, 28), (17, 29), (17, 30), (17, 31), (17, 32), (17, 33), (17, 34), (17, 35), (17, 36), (21, 24), (21, 25), (21, 26), (21, 27), (21, 28), (21, 29), (21, 30), (21, 31), (21, 32), (21, 33), (21, 34), (22, 22), (22, 23), (22, 24), (22, 25), (22, 26), (22, 27), (22, 28), (22, 29), (22, 30), (22, 31), (22, 32), (22, 33), (23, 26), (23, 27), (23, 28), (23, 29), (23, 30), (23, 31), (23, 32), (23, 33), (23, 34), (23, 35), (23, 36), (23, 37), (23, 38), (23, 39), (24, 24), (24, 25), (24, 26), (26, 23), (26, 24), (26, 25), (26, 26), (26, 27), (26, 28), (26, 29), (26, 30), (26, 31), (26, 32), (28, 23), (28, 24), (28, 25), (28, 26), (28, 27), (28, 28), (28, 29), (28, 30), (28, 31), (28, 32), (28, 33), (28, 34), (29, 22), (29, 23), (29, 24), (29, 25), (29, 26), (29, 27), (29, 28), (30, 24), (30, 26), (30, 27), (31, 23), (31, 24), (31, 25), (31, 26), (31, 27), (31, 28), (31, 30), (32, 22), (32, 23), (32, 24), (32, 25), (35, 24), (35, 25), (35, 26), (35, 28), (35, 29), (36, 26), (36, 27), (36, 28), (36, 29), (36, 31), (36, 32), (36, 33), (37, 23), (37, 24), (37, 25), (37, 26), (37, 27), (38, 23), (38, 24), (38, 25), (38, 26), (38, 27), (38, 28), (38, 29), (38, 30), (38, 31), (38, 32), (38, 33), (41, 23), 
(41, 24), (41, 25), (41, 26), (41, 27), (41, 28), (42, 23), (42, 24), (42, 25), (42, 26), (42, 27), (42, 28), (42, 29), (42, 30), (42, 31), (43, 22), (43, 23), (43, 24), (43, 25), (43, 26), (43, 27), (43, 28), (43, 29), (43, 30), (43, 31), (43, 32), (45, 23), (45, 24), (45, 25), (45, 26), (45, 28), (45, 29), (45, 30), (45, 31), (45, 32), (45, 33), (45, 34), (45, 35), (46, 23), (46, 24), (46, 25), (47, 22), (47, 23), (47, 24), (47, 25), (47, 26), (47, 27), (47, 28), (47, 29), (47, 30), (47, 31), (47, 32), (47, 33), (49, 23), (49, 24), (49, 25), (49, 26), (49, 28), (49, 29), (51, 22), (51, 23), (51, 24), (51, 25), (51, 26), (51, 27), (51, 28), (51, 29), (51, 32), (52, 22), (52, 23), (52, 24), (52, 25), (52, 26), (52, 27), (52, 28), (52, 29), (52, 31), (52, 32), (53, 27), (53, 28), (53, 29), (53, 30), (53, 31), (53, 32), (53, 33), (53, 34), (53, 35), (53, 36), (53, 37), (54, 23), (54, 24), (54, 25), (54, 26), (54, 27), (54, 28), (54, 30), (55, 24), (55, 25), (55, 26), (55, 27), (55, 28), (55, 29), (55, 30), (55, 31), (55, 32), (56, 22), (56, 23), (56, 24), (56, 25), (56, 26), (56, 27), (56, 28), (56, 29), (56, 30), (56, 31), (56, 32), (56, 33), (56, 34), (56, 35), (56, 36), (56, 37), (56, 38), (57, 23), (57, 24), (57, 25), (57, 26), (57, 27), (57, 28), (57, 29), (57, 30), (57, 31), (57, 32), (57, 33), (57, 34), (57, 35), (57, 36), (57, 37), (61, 25), (61, 26), (61, 27), (61, 28), (61, 29), (61, 30), (61, 31), (61, 32), (63, 23), (63, 24), (63, 25), (63, 26), (63, 27), (63, 28), (63, 29), (64, 24), (64, 25), (64, 26), (64, 27), (64, 28), (64, 29), (64, 30), (64, 31), (64, 32), (64, 33), (64, 34), (64, 35), (64, 36), (64, 37), (64, 38), (64, 39), (64, 40), (65, 23), (65, 24), (65, 25), (65, 26), (65, 27), (66, 26), (67, 23), (67, 24), (67, 25), (67, 26), (67, 27), (67, 28), (67, 29), (67, 30), (67, 31), (67, 32), (67, 33), (67, 34), (70, 23), (70, 24), (70, 25), (70, 26), (70, 27), (70, 28), (70, 29), (70, 30), (70, 31), (70, 32), (70, 33), (70, 34), (70, 35), (70, 36), 
(70, 37), (71, 24), (71, 25), (71, 26), (71, 27), (71, 28), (71, 29), (71, 30), (71, 31), (71, 32), (71, 33), (72, 21), (72, 22), (72, 23), (72, 24), (72, 25), (72, 26), (72, 27), (72, 28), (72, 29), (72, 30), (72, 31), (72, 32), (72, 33), (72, 34), (73, 24), (73, 25), (73, 26), (73, 27), (73, 28), (73, 29), (73, 30), (73, 31), (73, 32), (73, 33), (73, 34), (73, 35), (73, 36), (74, 22), (76, 21), (76, 22), (76, 23), (76, 24), (76, 25), (76, 26), (76, 27), (76, 28), (76, 29), (76, 30), (76, 31), (77, 23), (77, 24), (77, 25), (77, 26), (77, 27), (77, 28), (77, 29), (77, 30), (77, 31), (77, 32), (77, 34), (78, 22), (78, 23), (78, 24), (78, 25), (78, 26), (78, 27), (78, 28), (78, 29), (78, 30), (78, 31), (78, 32), (78, 33), (78, 34), (78, 35), (78, 36), (80, 23), (80, 24), (80, 25), (80, 26), (80, 27), (80, 28), (80, 29), (80, 30), (80, 31), (80, 32), (81, 23), (81, 24), (81, 25), (81, 26), (81, 27), (81, 28), (81, 29), (81, 31), (81, 32), (82, 23), (82, 24), (82, 25), (82, 26), (82, 27), (82, 28), (82, 31), (82, 32), (82, 33), (82, 34), (82, 35), (82, 36), (82, 37), (84, 22), (84, 23), (84, 24), (84, 25), (84, 26), (84, 27), (84, 28), (84, 29), (84, 30), (84, 31), (84, 32), (84, 33), (84, 34), (85, 24), (85, 25), (85, 27), (85, 28), (85, 29), (85, 30), (85, 31), (85, 32), (85, 33), (87, 26), (87, 27), (87, 28), (87, 29), (87, 30), (87, 31), (87, 32), (87, 33), (87, 34), (87, 35), (87, 36), (87, 37), (87, 38), (87, 39), (87, 40), (87, 41), (87, 42), (87, 43), (88, 23), (88, 24), (88, 25), (88, 26), (88, 27), (88, 28), (88, 29), (89, 22), (89, 23), (89, 24), (89, 25), (89, 26), (89, 27), (89, 28), (89, 29), (89, 30), (89, 31), (89, 32), (89, 33), (89, 34), (93, 23), (93, 24), (93, 25), (93, 26), (93, 27), (93, 28), (93, 29), (93, 30), (93, 31), (93, 32), (93, 33), (93, 34), (95, 24), (95, 25), (95, 26), (95, 27), (95, 28), (95, 29), (95, 30), (95, 31), (95, 32), (95, 33), (95, 34), (95, 35), (95, 36), (95, 37), (95, 38), (96, 23), (96, 24), (96, 25), (96, 26), (96, 27), 
(96, 28), (96, 29), (96, 30), (96, 31), (96, 32), (96, 33), (96, 34), (96, 35), (96, 36), (96, 37), (96, 38), (97, 24), (97, 27), (97, 30), (97, 31), (98, 22), (98, 23), (98, 24), (98, 25), (98, 26), (98, 27), (98, 28), (98, 29), (98, 30), (98, 31), (98, 32), (98, 33), (98, 34), (100, 23), (100, 24), (100, 26), (100, 27), (100, 28), (100, 29), (100, 30), (100, 31), (100, 32), (100, 33), (101, 23), (101, 24), (101, 25), (101, 26), (101, 27), (101, 28), (101, 29), (101, 30), (101, 31), (101, 32), (103, 23), (103, 24), (103, 25), (103, 26), (103, 27), (103, 28), (103, 30), (103, 31), (104, 21), (104, 22), (104, 23), (104, 24), (104, 25), (104, 26), (104, 27), (104, 28), (104, 29), (104, 30), (104, 31), (104, 32), (104, 33), (104, 34), (104, 35), (105, 23), (105, 24), (105, 25), (105, 26), (105, 27), (105, 28), (105, 29), (105, 30), (105, 31), (105, 32), (105, 33), (105, 34), (105, 35), (105, 36), (107, 23), (107, 24), (107, 25), (107, 26), (107, 27), (107, 28), (107, 29), (107, 30), (107, 31), (107, 32), (107, 33), (107, 34), (107, 35), (107, 36), (107, 37), (107, 38), (107, 39), (109, 22), (109, 23), (109, 24), (109, 25), (109, 26), (109, 27), (109, 28), (109, 29), (109, 30), (109, 31), (109, 32), (109, 33), (109, 34), (109, 35), (109, 36), (109, 37), (111, 23), (111, 24), (111, 25), (111, 26), (111, 27), (111, 28), (111, 29), (111, 30), (111, 31), (111, 32), (111, 33), (112, 21), (112, 22), (112, 23), (112, 24), (114, 22), (114, 23), (114, 24), (114, 25), (116, 24), (116, 26), (116, 28), (116, 29), (116, 30), (116, 31), (116, 32), (117, 23), (117, 24), (117, 25), (117, 26), (117, 27), (117, 28), (117, 29), (117, 30), (117, 31), (117, 32), (117, 33), (117, 34), (117, 35), (120, 23), (120, 24), (120, 25), (120, 26), (120, 27), (120, 28), (120, 29), (120, 30), (120, 31), (120, 32), (120, 33), (120, 34), (120, 35), (120, 36), (121, 23), (121, 24), (121, 25), (121, 26), (121, 27), (121, 28), (121, 29), (121, 30), (121, 31), (121, 32), (121, 33), (121, 34), (121, 35), 
(121, 36), (121, 37), (121, 38), (121, 39), (122, 23), (122, 24), (122, 25), (122, 26), (122, 27), (122, 28), (122, 29), (122, 30), (122, 31), (122, 32), (122, 34), (122, 35), (122, 36), (123, 23), (123, 24), (123, 25), (123, 26), (123, 27), (123, 28), (123, 29), (123, 30), (123, 31), (123, 32), (123, 33), (123, 34), (123, 35), (124, 22), (124, 23), (124, 24), (124, 25), (124, 26), (124, 27), (124, 28), (124, 29), (124, 30), (124, 31), (124, 32), (124, 33), (124, 34), (124, 35), (124, 36), (124, 37), (128, 24), (128, 25), (128, 26), (128, 27), (128, 28), (128, 30), (128, 33), (129, 27), (129, 28), (129, 29), (129, 30), (132, 23), (133, 23), (133, 24), (133, 25), (133, 26), (133, 27), (133, 28), (133, 29), (133, 30), (133, 31), (133, 32), (133, 33), (133, 34), (133, 35), (133, 36), (134, 22), (134, 23), (134, 24), (134, 25), (134, 26), (134, 27), (134, 28), (134, 29), (134, 30), (134, 31), (134, 32), (134, 34), (136, 24), (136, 25), (136, 26), (136, 27), (136, 28), (136, 29), (136, 30), (136, 31), (136, 32), (136, 33), (136, 34), (136, 35), (136, 36), (136, 37), (136, 38), (137, 23), (137, 24), (137, 25), (137, 26), (137, 27), (137, 28), (137, 29), (137, 30), (137, 31), (137, 32), (137, 33), (137, 34), (137, 35), (138, 23), (138, 24), (138, 25), (138, 26), (138, 29), (138, 30), (140, 24), (140, 26), (140, 27), (140, 28), (140, 29), (140, 30), (140, 31), (140, 32), (141, 26), (143, 22), (143, 23), (143, 24), (143, 25), (143, 26), (143, 27), (145, 21), (145, 22), (145, 23), (145, 24), (145, 25), (145, 26), (145, 27), (145, 28), (145, 29), (145, 30), (145, 31), (145, 32), (146, 23), (146, 24), (146, 25), (146, 26), (146, 27), (146, 28), (146, 29), (146, 30), (146, 31), (146, 32), (146, 33), (146, 34), (147, 22), (147, 23), (147, 24), (147, 25), (147, 26), (147, 27), (147, 28), (147, 29), (147, 30), (147, 31), (147, 32), (147, 33), (147, 34), (149, 23), (149, 24), (149, 25), (149, 26), (149, 27), (149, 28), (149, 29), (149, 30), (149, 31), (149, 32), (149, 33), (154, 
23), (154, 24), (154, 28), (154, 29), (154, 30), (154, 31), (156, 23), (156, 24), (156, 25), (156, 26), (156, 27), (156, 28), (156, 29), (156, 30), (156, 31), (156, 32), (156, 33), (1
