The table below shows the first five rows of the dataset.
In [3]:
import pandas as pd
df = pd.read_csv("./data/survey.csv")
df.head()
Out[3]:
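| | Q1a | Q1b | Q2: CSE 3241 or CSE 5241 | Q2: CSE 2331 or 5331 | Q2: Stats 3301 | Q2: ISE 3200 | Q3: Numerical Methods | Q3: Linear Algebra | Q3: Prob and Stats | Q3: Data & Discrete Structures | Q3: Algorithms | Q4: R | Q4: Python | Q4: IDEs | Q4: Jupyter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CSE | Undergrad | yes | yes | no | no | 0 | 1 | 2 | 2 | 1 | 1 | 2.0 | 2 | 0 |
| 1 | Other | Grad | no | yes | no | no | 3 | 3 | 2 | 2 | 2 | 1 | 2.0 | 0 | 2 |
| 2 | CSE | Undergrad | yes | yes | no | no | 0 | 2 | 2 | 2 | 2 | 0 | 2.5 | 2 | 2 |
| 3 | Other | Grad | no | no | no | no | 1 | 2 | 2 | 2 | 1 | 0 | 2.0 | 2 | 2 |
| 4 | Other | Grad | yes | yes | no | no | 2 | 2 | 2 | 1 | 2 | 0 | 1.0 | 1 | 1 |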
Let's first check how the class is distributed across disciplines.
In [42]:
import matplotlib.pyplot as plt
# Pie chart, where the slices will be ordered and plotted counter-clockwise:
# explode = [0.2, 0.1, 0.1, 0.1]
# Count the respondents in each discipline
sizes = df.groupby('Q1a').size()
print(sizes)
fig1, ax1 = plt.subplots(figsize=(18,8)) # <-- Increase the size of the plot
ax1.pie(sizes, autopct='%1.1f%%', shadow=False, startangle=90)
ax1.set_title('Spread of the class')
ax1.legend(sizes.index)
ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()
Q1a
CSE      11
ME        2
Math      1
Other    12
Stats     3
dtype: int64
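The pie chart labels each slice with its percentage share. A minimal sketch of computing those shares directly, assuming the counts printed above: the `counts` Series is rebuilt by hand so the snippet runs without the survey file, and `shares` is a name introduced here for illustration only.

```python
import pandas as pd

# Counts copied from the printed output above; with the real data this
# would be df.groupby('Q1a').size().
counts = pd.Series(
    {"CSE": 11, "ME": 2, "Math": 1, "Other": 12, "Stats": 3},
    name="Q1a",
)

# Percentage share of each discipline, rounded to one decimal place
# to match the '%1.1f%%' format passed to autopct.
shares = (counts / counts.sum() * 100).round(1)
print(shares)
```

Printing the shares next to the chart makes it easier to quote exact numbers than reading them off the slices.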

What about the class in terms of progress towards their degrees? How many graduate and undergraduate students are there in the class?
In [35]:
import numpy as np
# Set up the figure for the bar chart below.
fig, ax = plt.subplots(figsize=(8,9))
ax.set_title("Degrees")
ax.set_ylabel("Freq.")
# Count the respondents at each degree level.
sizes = df.groupby('Q1b').size()
# Fix the tick positions first, then label them with the group names.
ax.set_xticks(range(len(sizes.index)))
ax.set_xticklabels(sizes.index)
# One bar per degree level.
ax.bar(range(len(sizes.index)), sizes, align='center', alpha=0.5)
Out[35]:

For those who haven't taken Stats 3301, I want to know how their scores for Q3: Probability and Statistics are distributed.
In [39]:
import numpy as np
# Set up the figure for the bar chart below.
fig, ax = plt.subplots(figsize=(8,9))
ax.set_title("Histogram of Q3: Probability and Statistics for students who didn't take Stats 3301")
ax.set_ylabel("Freq.")
ax.set_xlabel("proficiency")
# Keep only the students who answered "no" to Stats 3301.
filtered_df = df[df['Q2: Stats 3301'] == "no"]
# Count the students at each proficiency level.
sizes = filtered_df.groupby('Q3: Prob and Stats').size()
# Fix the tick positions first, then label them with the proficiency levels.
ax.set_xticks(range(len(sizes.index)))
ax.set_xticklabels(sizes.index)
# One bar per proficiency level.
ax.bar(range(len(sizes.index)), sizes, align='center', alpha=0.5)
Out[39]:

Where the proficiency levels are:
0. Zilch
1. Beginner
2. Intermediate
3. Expert
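Because the bar chart is built from `groupby(...).size()`, a level that nobody picked is dropped from the x-axis, and the ticks show bare numbers. A minimal sketch of one way to keep all four levels and label them by name, using the scale listed above. The `answers` Series is hypothetical data standing in for `filtered_df['Q3: Prob and Stats']`, and `LEVELS` is a name introduced here.

```python
import pandas as pd

# Scale taken from the list above.
LEVELS = {0: "Zilch", 1: "beginner", 2: "intermediate", 3: "expert"}

# Hypothetical answers standing in for filtered_df['Q3: Prob and Stats'].
answers = pd.Series([2, 2, 1, 2, 0, 2, 1], name="Q3: Prob and Stats")

# Count each level; reindex so that levels nobody picked show up as 0
# instead of disappearing from the chart.
sizes = answers.value_counts().reindex(list(LEVELS), fill_value=0)

# Replace the numeric codes with their names for use as tick labels.
sizes.index = [LEVELS[code] for code in sizes.index]
print(sizes)
```

With the index holding names, `ax.bar(sizes.index, sizes)` should place each bar under its label without separate `set_xticks` and `set_xticklabels` calls.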
What about those who did take the class?
In [43]:
import numpy as np
# Set up the figure for the bar chart below.
fig, ax = plt.subplots(figsize=(8,9))
ax.set_title("Histogram of Q3: Probability and Statistics for students who did take Stats 3301")
ax.set_ylabel("Freq.")
ax.set_xlabel("proficiency")
# Keep only the students who answered "yes" to Stats 3301.
filtered_df = df[df['Q2: Stats 3301'] == "yes"]
# Count the students at each proficiency level.
sizes = filtered_df.groupby('Q3: Prob and Stats').size()
# Fix the tick positions first, then label them with the proficiency levels.
ax.set_xticks(range(len(sizes.index)))
ax.set_xticklabels(sizes.index)
# One bar per proficiency level.
ax.bar(range(len(sizes.index)), sizes, align='center', alpha=0.5)
Out[43]:
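
The two charts above answer the same question for two groups, so it can help to see both groups in one table. A minimal sketch using `pd.crosstab`, assuming the same two column names as the cells above. The `toy` DataFrame holds hypothetical rows so the snippet runs without the survey file; it is not the survey data.

```python
import pandas as pd

# Hypothetical rows standing in for the survey data; only the two
# columns used in the cells above are included.
toy = pd.DataFrame({
    "Q2: Stats 3301": ["no", "no", "yes", "no", "yes", "no"],
    "Q3: Prob and Stats": [2, 1, 3, 2, 2, 0],
})

# Rows: took Stats 3301 or not. Columns: proficiency level. Cells: counts.
table = pd.crosstab(toy["Q2: Stats 3301"], toy["Q3: Prob and Stats"])
print(table)
```

Calling `table.T.plot.bar()` on the result would draw the "yes" and "no" groups side by side for each proficiency level.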
