COMP2420/COMP6420 – Introduction to Data Management, Analysis and Security
Live Coding Lecture – Independent Sample T Tests
# Important Imports
# MAKE SURE YOU RUN THIS CELL FIRST
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# hypothesis testing imports
from scipy import stats
# ignore warnings
import warnings
warnings.filterwarnings('ignore')
dataset = sns.load_dataset('iris')
dataset.head()
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
dataset.species.unique()
array(['setosa', 'versicolor', 'virginica'], dtype=object)
setosa = dataset[dataset.species == 'setosa']
versicolor = dataset[dataset.species == 'versicolor']
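plt.figure(figsize=(12, 8))
# distplot is deprecated in recent seaborn; histplot with a KDE overlay draws the same density view
sns.histplot(setosa.sepal_length, kde=True, stat='density', color='b', label='setosa')
sns.histplot(versicolor.sepal_length, kde=True, stat='density', color='orange', label='versicolor')
# Dashed vertical lines mark each species' mean sepal length
plt.axvline(np.mean(setosa.sepal_length), color='b', linestyle='dashed', linewidth=5)
plt.axvline(np.mean(versicolor.sepal_length), color='orange', linestyle='dashed', linewidth=5)
plt.legend();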
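The dashed lines mark each species' sample mean, and the t-test weighs the gap between those means against the spread within each group. A small helper for reading the same quantities off as numbers is sketched below, assuming a DataFrame with species and sepal_length columns like the seaborn iris data; the function name sepal_length_summary is hypothetical, not part of pandas or seaborn.

```python
import pandas as pd


def sepal_length_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-species mean, sample standard deviation and count of sepal_length.

    Hypothetical helper; assumes df has 'species' and 'sepal_length' columns,
    as the seaborn iris DataFrame does.
    """
    # pandas' std uses ddof=1 (sample standard deviation) by default
    return df.groupby("species")["sepal_length"].agg(["mean", "std", "count"])


# Usage with the notebook's DataFrame:
# sepal_length_summary(dataset)
```

Comparing the mean column for setosa and versicolor against their std values gives a rough feel for whether the gap between the dashed lines is large relative to the spread.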
Null Hypothesis: The mean sepal lengths of the two populations are equal (setosa and versicolor do not differ in mean sepal length).
Alternate Hypothesis: The mean sepal lengths of the two populations are not equal (setosa and versicolor differ in mean sepal length).
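With its default settings, stats.ttest_ind runs Student's two-sample t-test, which assumes equal population variances. The statistic is t = (mean_a - mean_b) / (s_p * sqrt(1/n_a + 1/n_b)), where s_p^2 = ((n_a - 1) * s_a^2 + (n_b - 1) * s_b^2) / (n_a + n_b - 2) is the pooled variance. The sketch below computes this by hand and checks it against scipy; the helper name pooled_t_test is hypothetical, and the samples are synthetic numbers drawn for illustration, not the iris measurements.

```python
import numpy as np
from scipy import stats


def pooled_t_test(a, b):
    """Student's two-sample t-test assuming equal variances.

    Returns (t, p, dof) with a two-sided p-value.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances (ddof=1)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
    # Standard error of the difference between the two means
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (a.mean() - b.mean()) / se
    # Two-sided p-value from the t distribution with dof degrees of freedom
    p = 2 * stats.t.sf(abs(t), dof)
    return t, p, dof


# Synthetic data for illustration only (not the iris measurements)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.9, scale=0.5, size=20)
group_b = rng.normal(loc=5.0, scale=0.35, size=20)

t_manual, p_manual, dof = pooled_t_test(group_a, group_b)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The two printed pairs should agree up to floating-point rounding, which confirms that the test statistic is just the difference in means scaled by its pooled standard error.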
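N = 20  # sample size per species; assumed value, since N is not defined in the original cell
# np.random.choice samples with replacement by default, so the results change on every run
setosa_sample = np.random.choice(setosa.sepal_length, N)
versicolor_sample = np.random.choice(versicolor.sepal_length, N)
# Student's independent two-sample t-test (scipy assumes equal variances by default)
t_value, p_value = stats.ttest_ind(versicolor_sample, setosa_sample)
print(f"T Value obtained is {t_value}.")
print(f"P Value obtained is {p_value}.")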
T Value obtained is 6.12675097082924.
P Value obtained is 3.814930721300677e-07.
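Because np.random.choice draws a fresh random sample each time, rerunning the cell gives different t and p values from the ones printed above. A minimal sketch of making the draw reproducible with a seeded NumPy Generator follows; the helper name sample_and_test and the stand-in arrays are hypothetical, chosen only so the block runs on its own.

```python
import numpy as np
from scipy import stats


def sample_and_test(a, b, n, seed):
    """Draw n values (with replacement) from each array, then run ttest_ind.

    A seeded Generator makes the draw, and hence t and p, reproducible.
    """
    rng = np.random.default_rng(seed)
    sample_a = rng.choice(a, size=n, replace=True)
    sample_b = rng.choice(b, size=n, replace=True)
    return stats.ttest_ind(sample_a, sample_b)


# Stand-in arrays for illustration; with the notebook's data you would pass
# versicolor.sepal_length.to_numpy() and setosa.sepal_length.to_numpy()
a = np.linspace(4.9, 7.0, 50)
b = np.linspace(4.3, 5.8, 50)

first = sample_and_test(a, b, n=20, seed=42)
second = sample_and_test(a, b, n=20, seed=42)
print(first.statistic == second.statistic)  # True: same seed, same sample
```

Changing the seed gives a different sample and so a slightly different t and p, which is a quick way to see how much the result depends on the particular draw.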
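A large absolute t-score means the difference between the two sample means is large relative to the variability within the samples, which is evidence that the population means differ.
A small absolute t-score means the difference in means is small relative to that variability, so the data are consistent with the two groups having the same mean.
For the run shown, p ≈ 3.8e-07 is far below the usual significance level of 0.05, so we reject the null hypothesis: setosa and versicolor differ in mean sepal length.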
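To make "large" and "small" concrete: for a fixed number of degrees of freedom, the two-sided p-value shrinks as |t| grows, and the test rejects the null hypothesis when p falls below the chosen significance level. The sketch below assumes Student's pooled test (degrees of freedom n1 + n2 - 2) and a 0.05 level; two_sided_p is a hypothetical helper name, and the degrees of freedom value is illustrative.

```python
from scipy import stats


def two_sided_p(t_value, dof):
    """Two-sided p-value for a t statistic with dof degrees of freedom."""
    # Probability of seeing |t| at least this large if the null hypothesis is true
    return 2 * stats.t.sf(abs(t_value), dof)


dof = 38  # illustrative only: two samples of 20 give 20 + 20 - 2 = 38
alpha = 0.05  # conventional significance level

for t in [0.5, 2.0, 6.0]:
    p = two_sided_p(t, dof)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"t = {t}: p = {p:.3g} -> {decision}")
```

The sign of t only says which group has the larger mean; for a two-sided test only its magnitude affects the p-value.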