Wk2-2-practice
COMP2420/COMP6420 – Introduction to Data Management, Analysis and Security
Week 2 – Lecture 1 – Practice Example
Author in R – – 10 Mar 2019
Python Conversion –
import pandas as pd
import numpy as np
import statistics as stats
from scipy import stats as spystats
import matplotlib.pyplot as plt  # the plotting functions live in the pyplot submodule
%matplotlib inline
Practice Questions¶
Practice 1: Running Rivers¶
The dataset rivers contains the lengths (in miles) of 141 major rivers in North America. The data is imported below:
rivers_data = pd.read_csv("data/rivers.csv", index_col=0)
rivers_data.columns = ["Length"]
rivers_data.head()
Your task is as follows:
– Compare the mean, median and mode of the river lengths. Is there a big difference between the three?
– What are the 25th, 50th and 75th percentiles (the 0.25, 0.50 and 0.75 quantiles) of the river lengths?
– What proportion of the rivers are shorter than the mean length?
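One way to approach these three tasks is sketched below. It is a minimal sketch on a small, made-up Series named lengths (hypothetical values, not the contents of rivers.csv); with the real data you would use rivers_data["Length"] instead.

```python
import pandas as pd

# Hypothetical river lengths in miles (NOT the real rivers.csv values).
lengths = pd.Series([200, 250, 250, 300, 400, 600, 1500], name="Length")

# Measures of centre. mode() returns a Series because ties are possible.
mean_len = lengths.mean()
median_len = lengths.median()
modes = lengths.mode()

# The 25th, 50th and 75th percentiles are the 0.25, 0.50 and 0.75 quantiles.
quartiles = lengths.quantile([0.25, 0.50, 0.75])

# A boolean Series averages to the proportion of True values.
prop_below_mean = (lengths < mean_len).mean()

print("mean:", mean_len)
print("median:", median_len)
print("mode(s):", modes.tolist())
print(quartiles)
print("proportion below the mean:", round(prop_below_mean, 3))
```

In this toy sample a single very long river pulls the mean well above the median, which is the pattern to look for in the real data.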
# Answers here
Practice 2: Math¶
The dataset Maths contains a single column of numbers for you to work with.
maths_data = pd.read_csv("data/math.csv", index_col=0)
maths_data.head()
Your task is as follows:
– Make a histogram and try to guess the mean, median and standard deviation.
– Check your guesses by running the appropriate Python commands.
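The guess-then-check workflow could look like the sketch below. The data is synthetic (drawn from a normal distribution with a fixed seed) and the name values is hypothetical; with the real data you would pass the single column of maths_data. Note that pandas and NumPy use different defaults for the standard deviation, so it helps to know which one you are comparing your guess against.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the single column in math.csv (an assumption).
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(loc=50, scale=10, size=500), name="x")

# Histogram: eyeball the centre and spread before computing anything.
fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(values, bins=20)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")

# Check the guesses.
mean_x = values.mean()
median_x = values.median()
std_sample = values.std()                  # pandas default: ddof=1 (sample)
std_population = np.std(values.to_numpy()) # NumPy default: ddof=0 (population)

print("mean:", round(mean_x, 2))
print("median:", round(median_x, 2))
print("sample std:", round(std_sample, 2))
print("population std:", round(std_population, 2))
```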
# Answers here
Practice 3: Babyboom¶
The dataset ‘babyboom’ contains data on the births of 44 children in a one-day period at a Brisbane, Australia, hospital.
babybooming = pd.read_csv('data/babyboom.csv', index_col=0)
babybooming.head()
Your task is as follows:
– Make a histogram of the wt (Weight) variable.
– Is the data symmetric or skewed?
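As a sketch of how the visual judgement can be backed up numerically, the example below plots a histogram and then compares the mean with the median and computes the sample skewness. The weights are made-up values in grams (hypothetical, not the babyboom data); with the real data you would use babybooming["wt"], assuming the column is named wt as in the task.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical birth weights in grams (NOT the real babyboom.csv values).
wt = pd.Series(
    [1800, 2400, 2900, 3200, 3300, 3400, 3500, 3500, 3600, 3700, 3800, 4100],
    name="wt",
)

# Histogram of the weights.
fig, ax = plt.subplots()
ax.hist(wt, bins=8)
ax.set_xlabel("Weight (g)")
ax.set_ylabel("Number of births")

# Numerical checks: a mean below the median and a negative skewness
# both point to a longer left tail (left skew); the reverse signs
# point to right skew; values near zero suggest rough symmetry.
mean_wt = wt.mean()
median_wt = wt.median()
skewness = wt.skew()

print("mean:", round(mean_wt, 1))
print("median:", median_wt)
print("skewness:", round(skewness, 2))
```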
# Answers here
Practice 4: Executive Pay¶
Sometimes a dataset is so skewed that it can help if we transform the data prior to looking at it. A common transformation for long-tailed data is to take the logarithm of the data. The ‘exec.pay’ data is highly skewed.
exec_pay = pd.read_csv('data/execpay.csv', index_col=0)
exec_pay.head()
Your task is as follows:
– Generate histograms of the data before making any adjustments and after taking a logarithmic transform.
– Which is better for showing the data, and why?
– Find the median and mean of the transformed data.
– How do they compare with the untransformed mean and median?
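The before-and-after comparison can be sketched as follows. The pay figures are made up (hypothetical values, not the execpay data) and the natural logarithm is an assumed choice; any base gives the same shape. The natural log needs strictly positive values, so if the real column contains zeros, np.log1p is one possible alternative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical, strongly right-skewed pay figures (NOT the real data).
pay = pd.Series([10, 12, 15, 20, 25, 40, 80, 250, 2000], name="pay")
log_pay = np.log(pay)  # natural log; requires every value to be > 0

# Side-by-side histograms: raw scale versus log scale.
fig, (ax_raw, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
ax_raw.hist(pay, bins=10)
ax_raw.set_title("Raw pay")
ax_log.hist(log_pay, bins=10)
ax_log.set_title("log(pay)")

# Summary statistics on the log scale, then mapped back with exp().
back_median = np.exp(log_pay.median())  # the log is monotone, so with an
                                        # odd sample size this is the raw median
back_mean = np.exp(log_pay.mean())      # this is the geometric mean, which is
                                        # never larger than the arithmetic mean

print("raw mean:", round(pay.mean(), 1), "raw median:", pay.median())
print("log mean:", round(log_pay.mean(), 3), "log median:", round(log_pay.median(), 3))
print("exp(log median):", round(back_median, 1))
print("exp(log mean):", round(back_mean, 1))
```

The median survives the transform because taking logs preserves order, whereas the mean does not: exponentiating the mean of the logs gives the geometric mean, which is much less sensitive to the few very large values.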
# Answers here