Wk2-3-practice

COMP2420/COMP6420 – Introduction to Data Management, Analysis and Security
Lecture 6 – Practice Example

Author in R – – 10 Mar 2019
Python Conversion –

import pandas as pd
import numpy as np
import statistics as stats
from scipy import stats as spystats
import matplotlib.pyplot as plt
%matplotlib inline

Practice Questions

Practice 1: Muffled Maydow

As a large number of stock traders work in New York City, it may be true that unseasonably good or bad weather there affects the performance of the stock market. The dataset ‘maydow’ contains data for the Dow Jones Industrial Average (DJIA) and maximum temperatures in Central Park for May 2003.

maydow_data = pd.read_csv('data/maydow.csv', index_col=0)
maydow_data.head()

Your task is as follows:

– Make a scatterplot of the two variables (DJIA and maximum temperature) and find the correlation.
– Calculate R-squared from the correlation.
– Is there any relationship between the DJIA and temperature?

# Your answer here
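
The steps in this task can be sketched as follows. This is only an illustration on synthetic stand-in data, not the maydow values: the column names `max_temp` and `DJIA` and the generated numbers are assumptions, so swap in the actual columns of `maydow_data`. For a simple two-variable linear relationship, R-squared is the square of the Pearson correlation.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (NOT the maydow values); column names are assumptions.
rng = np.random.default_rng(0)
temp = rng.uniform(55, 85, size=21)
djia = 8500 + 5 * temp + rng.normal(0, 100, size=21)
demo = pd.DataFrame({"max_temp": temp, "DJIA": djia})

# Scatterplot of the two variables; with %matplotlib inline the notebook renders it.
fig, ax = plt.subplots()
ax.scatter(demo["max_temp"], demo["DJIA"])
ax.set_xlabel("Maximum temperature")
ax.set_ylabel("DJIA")

# Series.corr uses the Pearson correlation by default.
r = demo["max_temp"].corr(demo["DJIA"])

# For simple linear regression, R-squared equals the squared correlation.
r_squared = r ** 2

print(f"correlation = {r:.3f}, R-squared = {r_squared:.3f}")
```

An R-squared close to 0 would indicate that temperature explains very little of the variation in the index, whatever the sign of the correlation.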

Practice 2: Baffling Batting

The dataset batting contains baseball statistics for the 2002 Major League Baseball season. The import is shown below:

batting_data = pd.read_csv('data/Batting.csv', header=None)
batting_data.columns = ['playerID', 'year', 'stint', 'teamID', 'lgID', 'Games', 'All Bats', 'Runs', 'Hits', 'Doubles', 'Triples', 'Homeruns', 'Runs Batted In', 'Stolen Bases', 'Caught Stealing', 'Base on Balls', 'Strikeouts', 'Intentional Walks', 'Hit by Pitch', 'Sacrifice Hits', 'Sacrifice Flies', 'Grounded into Double Plays']
batting_data.head()

Your task is as follows:

– What is the correlation between the number of strikeouts (‘Strikeouts’) and number of home runs (‘Homeruns’)?
– Does the data suggest that in order to hit a lot of home runs one should strike out a lot?
– Make a scatterplot to visually assess the trend.

# Your Answer Here
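
One way to think about the second question is that raw counts can be correlated simply because both grow with playing time. The sketch below uses synthetic data, not the Batting file, in which strikeouts and home runs are generated independently per at-bat. The per-at-bat probabilities are made up for illustration, and the column labels mirror the ones assigned above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the Batting file); probabilities are assumptions.
rng = np.random.default_rng(1)
at_bats = rng.integers(50, 600, size=200)
demo = pd.DataFrame({
    "All Bats": at_bats,
    # Each outcome is drawn independently for every at-bat.
    "Strikeouts": rng.binomial(at_bats, 0.20),
    "Homeruns": rng.binomial(at_bats, 0.03),
})

# Correlation of the raw counts, as asked in the task.
raw_r = demo["Strikeouts"].corr(demo["Homeruns"])

# Correlation of the per-at-bat rates, which removes the shared effect of playing time.
so_rate = demo["Strikeouts"] / demo["All Bats"]
hr_rate = demo["Homeruns"] / demo["All Bats"]
rate_r = so_rate.corr(hr_rate)

print(f"raw counts: r = {raw_r:.3f}; per-at-bat rates: r = {rate_r:.3f}")
```

In this synthetic setting the raw counts are strongly correlated even though the two outcomes are unrelated, so a high correlation in the real data would not by itself show that striking out causes home runs.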

Practice 3: Willful Weightloss

The dataset wtloss contains measurements of a patient’s weight in kilograms during a weight-rehabilitation program. The import is shown below:

weightloss_data = pd.read_csv('data/wtloss.csv', index_col=0)
weightloss_data.head()

Your task is as follows:

– Make a scatterplot showing how the variable Weight decays as a function of Days.
– What is the correlation of the two variables?
– Does the data appear appropriate for a regression model?

# Your Answer Here
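
A strong correlation does not guarantee that a straight line is the right model. As a hedged sketch of one possible check, the code below fits a line to synthetic decaying data (not the wtloss values; the decay curve and noise level are assumptions) and inspects the residuals for a systematic pattern.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the wtloss values): an assumed exponential decay plus noise.
rng = np.random.default_rng(2)
days = np.arange(0, 250, 5)
weight = 80 + 100 * np.exp(-days / 140) + rng.normal(0, 0.5, size=days.size)
demo = pd.DataFrame({"Days": days, "Weight": weight})

# Pearson correlation between the two variables.
r = demo["Days"].corr(demo["Weight"])

# Least-squares straight line: np.polyfit returns [slope, intercept] for deg=1.
slope, intercept = np.polyfit(demo["Days"], demo["Weight"], deg=1)
residuals = demo["Weight"] - (slope * demo["Days"] + intercept)

# Residuals at the start, middle and end of the programme.
middle = len(residuals) // 2
print(f"r = {r:.3f}")
print(residuals.iloc[[0, middle, -1]].round(2).tolist())
```

If the residuals are positive at both ends and negative in the middle, the line is missing curvature, and a linear regression may be a poor fit even when the correlation is close to -1.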

Practice 4: Grand Galton

The galton dataset contains data collected by Galton in 1885. Each data point contains a child’s height and an average of his or her parents’ heights.

galton_data = pd.read_csv('data/galton.csv', index_col=0)
galton_data.head()

Your task is as follows:

– Perform a t-test to see whether the mean heights of children and parents differ. (Assume a paired t-test is appropriate.)
– What problem is there with the above assumption?

# Your answer here
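
A paired t-test can be sketched with `scipy.stats.ttest_rel`, which is equivalent to a one-sample t-test on the within-pair differences. The data below is synthetic, not the galton values, and the column names `child` and `parent` are assumptions, so check them against `galton_data.head()`.

```python
import numpy as np
import pandas as pd
from scipy import stats as spystats

# Synthetic stand-in data (NOT the galton values); column names are assumptions.
rng = np.random.default_rng(3)
parent = rng.normal(68.3, 1.8, size=100)
child = 68.1 + 0.65 * (parent - 68.3) + rng.normal(0, 2.2, size=100)
demo = pd.DataFrame({"child": child, "parent": parent})

# Paired t-test: each child is matched with the average of his or her parents.
paired = spystats.ttest_rel(demo["child"], demo["parent"])

# The same test written as a one-sample t-test on the pairwise differences.
diffs = demo["child"] - demo["parent"]
one_sample = spystats.ttest_1samp(diffs, popmean=0.0)

print(f"paired:     t = {paired.statistic:.3f}, p = {paired.pvalue:.3f}")
print(f"one-sample: t = {one_sample.statistic:.3f}, p = {one_sample.pvalue:.3f}")
```

Writing the test in terms of differences makes the pairing assumption explicit, which is a useful starting point for the second question.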

Practice 5: Coloured Candies

A large bag of M&Ms candies is filled from batches that contain a specified percentage of each of six colors. These percentages are given in the ‘mandms’ dataset.

eminems = pd.read_csv('data/mandms.csv')
eminems.columns = ['Type', 'Blue', 'Brown', 'Green', 'Orange', 'Red', 'Yellow']
eminems.head()

Assume a package of candies contains the following color distribution: 15 blue, 34 brown, 7 green, 19 orange, 29 red, and 24 yellow.

Your task is as follows:

– Perform a chi-squared test with the null hypothesis that the candies are from a milk chocolate package.
– Repeat assuming the candies are from a Peanut package.
– Based on the p-values, which would you suspect is the true source of the candies?

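# Counts observed in the package (Blue, Brown, Green, Orange, Red, Yellow); despite the name, these are the observed frequencies.
Expected_outputs = [15, 34, 7, 19, 29, 24]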
# Your Answer Here
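
A goodness-of-fit test compares observed counts with the counts expected under a hypothesised colour mix. The sketch below uses `scipy.stats.chisquare` with the observed counts from the task. The two percentage rows are placeholders assumed for illustration and are not read from the mandms file, so replace them with the relevant rows of `eminems`.

```python
import numpy as np
from scipy import stats as spystats

# Observed counts from the task, in column order: Blue, Brown, Green, Orange, Red, Yellow.
observed = np.array([15, 34, 7, 19, 29, 24])

# Placeholder colour percentages (assumptions, NOT taken from mandms.csv).
assumed_pct = {
    "assumed mix 1": [10, 30, 10, 10, 20, 20],
    "assumed mix 2": [20, 20, 10, 10, 20, 20],
}

results = {}
for label, pct in assumed_pct.items():
    # Expected counts must sum to the same total as the observed counts.
    expected = np.array(pct) / 100 * observed.sum()
    results[label] = spystats.chisquare(f_obs=observed, f_exp=expected)

for label, res in results.items():
    print(f"{label}: chi2 = {res.statistic:.3f}, p = {res.pvalue:.3f}")
```

A small p-value is evidence against the hypothesised mix, so the package type with the larger p-value is the more plausible source of the two.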
