Wk2-2-demo
COMP2420/COMP6420 – Introduction to Data Management, Analysis and Security
Week 2 – Lecture 1 – DEMO
Author in R – – 10 Mar 2019
Python Conversion – Alex,
# Important Imports
import pandas as pd
import numpy as np
import statistics as stats
from scipy import stats as spystats
import matplotlib.pyplot as plt
%matplotlib inline
Based on the lecture content, you have now been introduced to the different types of data used in data analysis and visualisation, and to ways of summarising that data. This demo shows those ideas in practice.
Categorical data
We are going to start with the most common type of data: categorical data. For the purposes of this piece, we will use the following definition of a categorical variable:
[…] A categorical variable is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property. (reference)
Web browser preferences
We are going to start with a (fake) example of people’s favourite web browsers. Each response is recorded only as a number, so we are going to pretend the numbers map to browsers as follows:
Preference # Web Browser
1 Internet Explorer (IE)
2 Microsoft Edge
3 Google Chrome
4 Mozilla Firefox
Why did we choose these? Because 1 is the most common occurrence, and naturally IE is the most popular and best web browser 😉
# defining the preferences and creating a dataframe
# yes, you can create a dataframe from a list, or dictionary, or many other items
# don’t forget to name the column
webBrowser = [3,4,1,1,3,4,3,3,1,3,2,1,2,1,2,3,2,3,1,1,1,1,4,3,1]
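The comments above mention creating a dataframe and naming its column, but the cell stops at the list. A minimal sketch of that step might look like the following; the column names `Preference` and `Browser` and the `browser_names` lookup are assumptions made for illustration, based on the table above.

```python
import pandas as pd

webBrowser = [3,4,1,1,3,4,3,3,1,3,2,1,2,1,2,3,2,3,1,1,1,1,4,3,1]

# Build a one-column dataframe from the list and name the column
# ("Preference" is an assumed column name)
browser_df = pd.DataFrame({"Preference": webBrowser})

# Hypothetical lookup that mirrors the preference table above
browser_names = {
    1: "Internet Explorer (IE)",
    2: "Microsoft Edge",
    3: "Google Chrome",
    4: "Mozilla Firefox",
}

# Add a readable label next to each numeric preference
browser_df["Browser"] = browser_df["Preference"].map(browser_names)

print(browser_df.head())
```

Keeping the numeric code and the label side by side makes it easy to count or plot by either one.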
Visual Representation
Obviously we should show the results, so let’s start with the obvious method: visualisations.
# Let’s show the best preferences, using a decent graph
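One possible way to draw this graph is a bar chart of how often each browser was chosen. The sketch below assumes the preference numbers map to browsers as in the table above; the `browser_names` lookup, the `counts` variable and the axis labels are illustrative choices, not part of the original notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt

webBrowser = [3,4,1,1,3,4,3,3,1,3,2,1,2,1,2,3,2,3,1,1,1,1,4,3,1]

# Hypothetical lookup that mirrors the preference table above
browser_names = {
    1: "Internet Explorer (IE)",
    2: "Microsoft Edge",
    3: "Google Chrome",
    4: "Mozilla Firefox",
}

# Count how many people chose each browser (sorted from most to least common)
counts = pd.Series(webBrowser).map(browser_names).value_counts()

# One bar per browser, with the bar height showing the number of people
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(counts.index, counts.values)
ax.set_xlabel("Web browser")
ax.set_ylabel("Number of people")
ax.set_title("Favourite web browser")
plt.show()
```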
# Above is one way to plot, however this isn’t the only way we can use a bar chart. For small values above it is easy to tell the difference,
# but how do we tell the difference for large numbers such as below?
webPreferences = {"IE": 45, "Firefox": 44, "Chrome": 46}
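One way to make such close values distinguishable is to narrow the y-axis range and label each bar with its value. This is only a sketch of one option: the axis limits of 40 to 47 and the side-by-side layout are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt

webPreferences = {"IE": 45, "Firefox": 44, "Chrome": 46}
browsers = list(webPreferences.keys())
votes = list(webPreferences.values())

fig, (ax_full, ax_zoom) = plt.subplots(1, 2, figsize=(10, 4))

# Left: y-axis starts at zero, so the three bars look almost identical
ax_full.bar(browsers, votes)
ax_full.set_title("Axis starting at zero")
ax_full.set_ylabel("Number of people")

# Right: a narrower y-axis range (assumed limits) makes the gaps visible
ax_zoom.bar(browsers, votes)
ax_zoom.set_ylim(40, 47)
ax_zoom.set_title("Axis restricted to 40-47")

# Write the exact value above each bar; categorical bars sit at x = 0, 1, 2
for position, value in enumerate(votes):
    ax_zoom.text(position, value + 0.1, str(value), ha="center")

plt.show()
```

Starting the axis above zero makes small gaps visible but also exaggerates them, so the axis range should always be easy to read off the chart.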
Statistical Representation
How else should we show the data?
sample_Data = [18,19,18,20,18,18,20,21,37,18]
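A minimal sketch of a statistical description of `sample_Data`, using the standard-library `statistics` module imported at the top of the notebook. Which measures the original demo computed is not shown, so the selection below (mean, median, mode, sample standard deviation) is an assumption.

```python
import statistics as stats

sample_Data = [18,19,18,20,18,18,20,21,37,18]

mean_value = stats.mean(sample_Data)      # arithmetic mean: 20.7
median_value = stats.median(sample_Data)  # middle value: 18.5
mode_value = stats.mode(sample_Data)      # most frequent value: 18
std_value = stats.stdev(sample_Data)      # sample standard deviation

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
print("Standard deviation:", round(std_value, 2))
```

The single value of 37 pulls the mean (20.7) well above the median (18.5), which is why it helps to report more than one measure.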
Summarising data
For those who have done statistics before, the five-number summary is a familiar and useful way to describe how a dataset is spread out.
The five-number summary consists of the following values:
The minimum value
The lower quartile (0.25)
The median (or middle quartile)
The upper quartile (0.75)
The maximum value
sample_Five = [10,12,15,17,21]
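The five values listed above can be computed with NumPy percentiles. This is a sketch under one assumption worth noting: `np.percentile` uses linear interpolation by default, and other quartile conventions can give slightly different results. The helper name `five_number_summary` is hypothetical.

```python
import numpy as np

def five_number_summary(values):
    """Return [min, lower quartile, median, upper quartile, max].

    Uses NumPy's default (linear interpolation) percentile method.
    """
    return [float(q) for q in np.percentile(values, [0, 25, 50, 75, 100])]

sample_Data = [18,19,18,20,18,18,20,21,37,18]
sample_Five = [10,12,15,17,21]

print(five_number_summary(sample_Data))  # [18.0, 18.0, 18.5, 20.0, 37.0]
print(five_number_summary(sample_Five))  # [10.0, 12.0, 15.0, 17.0, 21.0]
```

With this method, `sample_Five` is its own five-number summary, since its five values fall exactly on the 0, 25, 50, 75 and 100 percent positions.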
Graphical Summaries of Data
Let’s talk about some ways we can build graphs.
# Graphical Summaries of Data
Histogram
# Classic Histogram
# 5 Break Histogram
# Normalised Histogram
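The three histogram variants named above could be sketched as follows. The original cells do not show which dataset was plotted, so using `sample_Data` is an assumption, as is reading "5 break" as five bins and "normalised" as a density histogram whose bar areas sum to 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed dataset: the sample used earlier in this demo
sample_Data = [18,19,18,20,18,18,20,21,37,18]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Classic histogram: Matplotlib's default number of bins
axes[0].hist(sample_Data)
axes[0].set_title("Classic histogram")
axes[0].set_ylabel("Frequency")

# Histogram with 5 equal-width bins
counts5, edges5, _ = axes[1].hist(sample_Data, bins=5)
axes[1].set_title("Histogram with 5 bins")

# Normalised histogram: bar heights are densities rather than counts
density, density_edges, _ = axes[2].hist(sample_Data, bins=5, density=True)
axes[2].set_title("Normalised histogram")
axes[2].set_ylabel("Density")

plt.show()

# Check that the normalised bars cover a total area of 1
print(float((density * np.diff(density_edges)).sum()))
```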
# Boxplot – Original
# Boxplot – Notch
# Boxplot – Horizontal
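A sketch of the three boxplot variants named above, again assuming `sample_Data` is the dataset being plotted (the original cells do not say). The `notch` and `vert` arguments are existing `boxplot` parameters; recent Matplotlib releases may prefer an `orientation` argument over `vert`.

```python
import matplotlib.pyplot as plt

# Assumed dataset: the sample used earlier in this demo
sample_Data = [18,19,18,20,18,18,20,21,37,18]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Original: a plain vertical boxplot
original = axes[0].boxplot(sample_Data)
axes[0].set_title("Boxplot - original")

# Notch: the notch marks a confidence interval around the median
axes[1].boxplot(sample_Data, notch=True)
axes[1].set_title("Boxplot - notch")

# Horizontal: same plot, drawn along the x-axis
axes[2].boxplot(sample_Data, vert=False)
axes[2].set_title("Boxplot - horizontal")

plt.show()
```

In each plot the value 37 lies far beyond the upper whisker and is drawn as a separate outlier point.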
Bivariate Data
While we have so far spoken about univariate data (data on a single variable), we can also plot bivariate data. Consider the dataset below.
For the purposes of this course, Bivariate Data is defined as:
Data on each of two variables, where each value of one of the variables is paired with a value of the other variable
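To illustrate the definition, a scatter plot is one common way to show paired values. The original dataset is not reproduced in this export, so the values below are made up purely for illustration; the names `hours_studied`, `exam_mark` and `bivariate_df` are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up illustrative data (NOT the dataset from the original demo):
# each position pairs one value of the first variable with one of the second
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_mark = [52, 55, 61, 60, 68, 72, 75, 81]

# Each row of the dataframe is one paired observation
bivariate_df = pd.DataFrame({"hours_studied": hours_studied,
                             "exam_mark": exam_mark})

# One point per pair: x is the first variable, y is the second
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(bivariate_df["hours_studied"], bivariate_df["exam_mark"])
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam mark")
ax.set_title("Bivariate data as a scatter plot")
plt.show()
```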