
Wk2-2-practice

COMP2420/COMP6420 – Introduction to Data Management, Analysis and Security
Week 2 – Lecture 1 – Practice Example

Author in R – – 10 Mar 2019
Python Conversion –

import pandas as pd
import numpy as np
import statistics as stats
from scipy import stats as spystats
import matplotlib.pyplot as plt
%matplotlib inline

Practice Questions

Practice 1: Running Rivers

The dataset rivers contains the lengths (in miles) of 141 major rivers in North America. The data is imported below:

rivers_data = pd.read_csv("data/rivers.csv", index_col=0)
rivers_data.columns = ["Length"]
rivers_data.head()

Your task is as follows:

– Compare the mean, median and mode of the length of the rivers. Is there a big difference between the three?
– What are the 25th, 50th and 75th percentiles (the quartiles) of the length of the rivers?
– What proportion of the rivers are shorter than the mean length?

# Answers here
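One way to approach these three tasks is sketched below. It is a minimal illustration, not the official solution: the `lengths` values are made up so the block runs without data/rivers.csv, and in the notebook you would use `rivers_data["Length"]` instead.

```python
import pandas as pd

# Hypothetical stand-in for rivers_data["Length"]; these values are invented.
lengths = pd.Series([200, 250, 250, 300, 400, 500, 700, 2200], name="Length")

# Measures of centre. mode() returns a Series because ties are possible.
mean_len = lengths.mean()
median_len = lengths.median()
mode_len = lengths.mode()

# 25th, 50th and 75th percentiles (pandas interpolates linearly by default).
quartiles = lengths.quantile([0.25, 0.50, 0.75])

# The mean of a boolean mask is the proportion of True values.
prop_below_mean = (lengths < mean_len).mean()

print("mean:", mean_len)
print("median:", median_len)
print("mode(s):", mode_len.tolist())
print(quartiles)
print("proportion below mean:", prop_below_mean)
```

In this toy data a single long river pulls the mean (600) well above the median (350), so three quarters of the values fall below the mean. A gap of that kind between mean and median is worth looking for in the real data.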

Practice 2: Math

The dataset Maths contains a single column of numbers for you to work on.

maths_data = pd.read_csv("data/math.csv", index_col=0)
maths_data.head()

Your task is as follows:

– Make a histogram and try to guess the mean, median and standard deviation.
– Check your guesses by running the appropriate Python commands.

# Answers here
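A possible way to check your guesses is sketched below. The `values` series is invented so the block runs on its own; in the notebook you would use the single column of `maths_data`. The bin counts are computed with `np.histogram` here so the sketch needs no plotting backend.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the single column of maths_data; values are invented.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

# Bin counts and bin edges: the numbers a histogram would draw.
counts, edges = np.histogram(values.to_numpy(), bins=4)

mean_val = values.mean()
median_val = values.median()

# pandas defaults to the sample standard deviation (ddof=1);
# pass ddof=0 for the population version, which is NumPy's default.
sample_std = values.std()
population_std = values.std(ddof=0)

print("counts:", counts.tolist())
print("edges:", edges.tolist())
print("mean:", mean_val, "median:", median_val)
print("sample std:", sample_std, "population std:", population_std)
```

To draw the chart itself in the notebook, `maths_data.hist()` should be enough. Note that the two standard deviations differ, so decide which one your guess refers to before comparing.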

Practice 3: Babyboom

The dataset ‘babyboom’ contains data on the births of 44 children in a one-day period at a Brisbane, Australia, hospital.

babybooming = pd.read_csv('data/babyboom.csv', index_col=0)
babybooming.head()

Your task is as follows:

– Make a histogram of the wt (Weight) variable.
– Is the data symmetric or skewed?

# Answers here
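Besides judging the histogram by eye, skewness can be checked numerically. The sketch below assumes a `wt` column as named in the task; the weights are invented for illustration, so the conclusion here says nothing about the real babyboom data.

```python
import pandas as pd

# Hypothetical stand-in for babybooming["wt"]; these weights are invented.
wt = pd.Series([1700, 2800, 3100, 3300, 3400, 3500, 3600, 3700], name="wt")

# Sample skewness: negative suggests a longer left tail,
# positive a longer right tail, near zero rough symmetry.
skewness = wt.skew()

# A quick cross-check: in a left-skewed sample the mean tends to sit below the median.
mean_wt = wt.mean()
median_wt = wt.median()

print("skewness:", skewness)
print("mean:", mean_wt, "median:", median_wt)
```

In the notebook, `babybooming["wt"].hist()` would draw the histogram to compare against these numbers.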

Practice 4: Executive Pay

Sometimes a dataset is so skewed that it can help if we transform the data prior to looking at it. A common transformation for long-tailed data is to take the logarithm of the data. The ‘exec.pay’ data is highly skewed.

exec_pay = pd.read_csv('data/execpay.csv', index_col=0)
exec_pay.head()

Your task is as follows:

– Generate histograms before making any adjustments and after taking a logarithmic transform.
– Which is better for showing the data, and why?
– Find the median and mean of the transformed data.
– How do they correspond to the untransformed mean and median?

# Answers here
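The relationship between the transformed and untransformed statistics can be sketched as follows. The `pay` values are invented, and base 10 is chosen only to keep the numbers readable (`np.log` would work the same way). If the real column contains zeros, the logarithm gives `-inf` for them, so they would need handling first.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the exec_pay column; these values are invented.
pay = pd.Series([1, 10, 10, 100, 10000])

# Log transform compresses the long right tail.
log_pay = np.log10(pay)

raw_mean, raw_median = pay.mean(), pay.median()
log_mean, log_median = log_pay.mean(), log_pay.median()

# Undo the transform to compare on the original scale.
# The log is monotonic, so with an odd number of values the median maps back exactly.
back_median = 10 ** log_median
# The back-transformed mean of the logs is the geometric mean, not the arithmetic mean.
back_mean = 10 ** log_mean

print("raw mean:", raw_mean, "raw median:", raw_median)
print("log mean:", log_mean, "log median:", log_median)
print("back-transformed median:", back_median)
print("back-transformed mean (geometric mean):", back_mean)
```

In this toy data the back-transformed median equals the raw median (10), while the back-transformed mean (about 40) is far below the raw mean (2024.2), because the log damps the influence of the largest value.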
