Pandas_Visualizations
Visual Analytics – Pandas Visualizations
Pandas has a built-in plotting interface built on top of matplotlib. Here are some examples of what you can do with it.
Install necessary libraries
#!pip install numpy
#!pip install pandas
#!pip install matplotlib
Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Create sample data
df1 = pd.DataFrame(np.random.randn(50, 4), columns=['A', 'B', 'C', 'D'])
df1.head()
          A         B         C         D
0  0.011407 -2.573139  0.629844  0.863108
1  1.000630  0.093029  1.749768  0.904575
2  0.510490  1.356406  1.458783  0.225151
3 -0.395484  0.294955 -0.902467 -0.747303
4 -0.542393 -0.740049 -0.022740  0.376725
df2 = df1 + 10
df2.head()
           A          B          C          D
0  10.011407   7.426861  10.629844  10.863108
1  11.000630  10.093029  11.749768  10.904575
2  10.510490  11.356406  11.458783  10.225151
3   9.604516  10.294955   9.097533   9.252697
4   9.457607   9.259951   9.977260  10.376725
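np.random.randn draws new values on every run, so the tables above will differ each time the notebook is executed. If repeatable output is wanted, one option (an assumption, not part of the original notebook) is to draw from a seeded NumPy Generator; the seed 42 is an arbitrary choice. Adding a scalar to a DataFrame is applied element-wise, so df2 is simply df1 shifted by 10.

```python
import numpy as np
import pandas as pd

# Seeded generator (42 is arbitrary): the same seed reproduces the same
# 50x4 table of standard-normal values on every run.
rng = np.random.default_rng(42)
df1 = pd.DataFrame(rng.standard_normal((50, 4)), columns=['A', 'B', 'C', 'D'])

# Scalar addition is element-wise, so every cell of df2 is 10 larger.
df2 = df1 + 10

print(df1.shape)                                # (50, 4)
print(np.allclose((df2 - df1).to_numpy(), 10))  # True
```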
Matplotlib provides several style sheets that change the appearance of a plot. Import matplotlib.pyplot as plt and call plt.style.use() with a style name before drawing the plot.
plt.style.use('bmh')
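plt.style.use() changes the style for every plot drawn afterwards. The sketch below lists the installed style names through plt.style.available and uses plt.style.context() to apply a style only inside a with-block; the random DataFrame df stands in for df1.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Names accepted by plt.style.use(), e.g. 'bmh', 'ggplot', 'dark_background'.
print(sorted(plt.style.available))

df = pd.DataFrame(np.random.randn(50, 4), columns=['A', 'B', 'C', 'D'])

# The style applies only to figures created inside the with-block;
# the previous global style is restored when the block exits.
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    df['A'].plot.hist(ax=ax)
```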
Plot Types
There are several plot types built into pandas, each available as a method of the plot accessor:
df.plot.area
df.plot.barh
df.plot.density
df.plot.hist
df.plot.line
df.plot.scatter
df.plot.bar
df.plot.box
df.plot.hexbin
df.plot.kde
df.plot.pie
Each of these can also be drawn by passing the kind argument to plot(); for example, df.plot(kind='hist') is equivalent to df.plot.hist(). To make the other plots with this syntax, set kind to the corresponding name from the list above (e.g. 'box', 'barh').
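As a quick check that the two call styles are interchangeable, this sketch draws the same column both ways on side-by-side axes (created with plt.subplots so each call gets its own axes) and compares the resulting bar heights.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(50, 4), columns=['A', 'B', 'C', 'D'])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
df['A'].plot.hist(ax=ax1, title="plot.hist()")           # accessor method
df['A'].plot(kind='hist', ax=ax2, title="kind='hist'")   # kind argument

# Same data and same default bins, so the bars should match exactly.
heights1 = [p.get_height() for p in ax1.patches]
heights2 = [p.get_height() for p in ax2.patches]
print(heights1 == heights2)  # True
```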
Histogram
Use df['col_name'].plot.hist() (or df['col_name'].hist()) to plot a histogram of value counts.
df1['A'].plot.hist()
df1['A'].plot.hist(rwidth=0.9)  # bars fill 90% of each bin's width
df1['A'].plot.hist(edgecolor='black')  # draw a black outline around each bar
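To see what the bars represent, this sketch compares the plotted bar heights with NumPy's np.histogram on the same data: both split the range of the values into equal-width bins (10 unless bins is given) and count the values that fall in each bin. rwidth and edgecolor change only how the bars look, not the counts.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(50), name='A')

# Counts per bin and the bin edges, computed directly with NumPy.
counts, edges = np.histogram(s, bins=10)

fig, ax = plt.subplots()
s.plot.hist(bins=10, rwidth=0.9, edgecolor='black', ax=ax)

# One Rectangle patch per bin; its height is that bin's count.
bar_heights = np.array([p.get_height() for p in ax.patches])
print(np.array_equal(bar_heights, counts))  # True
print(counts.sum())                          # 50
```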
plt.style.use('dark_background')  # change style
df1['B'].hist()
plt.style.use('ggplot')
df1['A'].plot.hist(bins=50)  # use 50 bins instead of the default 10
df2.plot.area(alpha=0.4)
df2.loc[0:10].plot.bar()
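The bar chart above uses df2.loc[0:10], which selects rows by label. Label-based slices include both endpoints, so the chart shows 11 rows (0 through 10), not 10. Position-based .iloc slices exclude the stop, as plain Python slices do:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.randn(50, 4), columns=['A', 'B', 'C', 'D']) + 10

# .loc slices by label and includes BOTH endpoints: labels 0..10 -> 11 rows.
print(len(df2.loc[0:10]))   # 11
# .iloc slices by position and excludes the stop: positions 0..9 -> 10 rows.
print(len(df2.iloc[0:10]))  # 10
```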
Stacked Bar Plots
df2.loc[0:10].plot.bar(stacked=True)
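With stacked=True, each column's bar is drawn on top of the previous column's bar, so the full height of each stack is that row's total. This sketch checks that by reading the bar heights back from the axes. It assumes the patches are ordered column by column (all of A's bars, then B's, and so on), which is how pandas adds them when it draws one column at a time.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.randn(50, 4), columns=['A', 'B', 'C', 'D']) + 10
subset = df2.loc[0:10]  # 11 rows

fig, ax = plt.subplots()
subset.plot.bar(stacked=True, ax=ax)

# Assumption: one Rectangle per cell, grouped by column -> shape (4 columns, 11 rows).
heights = np.array([p.get_height() for p in ax.patches]).reshape(4, len(subset))
stack_totals = heights.sum(axis=0)
print(np.allclose(stack_totals, subset.sum(axis=1)))  # True
```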
Line Plots
df1.plot.line(y='B', figsize=(12, 3), lw=1)
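In the call above, y picks the column to draw, figsize is the figure's (width, height) in inches, and lw is matplotlib's short alias for the line width in points. Passing a list to y draws several columns on the same axes; this sketch keeps the other keyword arguments unchanged.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(50, 4), columns=['A', 'B', 'C', 'D'])

# One line per listed column; figsize is (width, height) in inches.
ax = df1.plot.line(y=['A', 'B'], figsize=(12, 3), lw=1)
print(len(ax.get_lines()))           # 2
print(ax.figure.get_size_inches())   # width and height in inches
```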
Scatter Plots
df1.plot.scatter(x='A', y='B')
Color based on another column
You can use color to represent a third variable with the c argument. Here the points are colored by the values in column 'C'.
df1.plot.scatter(x='A', y='B', c='C', cmap='coolwarm')
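When c names a numeric column, pandas passes that column's values to matplotlib, which maps each value through the colormap (cmap) and adds a colorbar that acts as the legend for the colors. This sketch reads the mapped values back from the scatter's PathCollection to confirm that they are column C.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(50, 4), columns=['A', 'B', 'C', 'D'])

fig, ax = plt.subplots()
df1.plot.scatter(x='A', y='B', c='C', cmap='coolwarm', ax=ax)

# The scatter points live in ax.collections[0]; get_array() returns the
# values that were mapped to colors.
mapped = np.asarray(ax.collections[0].get_array())
print(np.allclose(mapped, df1['C']))  # True
```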
Size based on another column
You can use marker size to represent a third variable with the s argument. Here the size of each point is based on column 'C'.
# Marker sizes (s, in points squared) must be non-negative; negative values in 'C'
# would produce NaN sizes and an "invalid value encountered in sqrt" warning,
# so shift 'C' to start just above zero before scaling.
sizes = (df1['C'] - df1['C'].min() + 0.1) * 200
df1.plot.scatter(x='A', y='B', s=sizes)
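Because marker sizes cannot be negative, a column that contains negative values has to be rescaled before it is passed to s. One approach, sketched here with a hypothetical helper scale_sizes (not a pandas or matplotlib function), maps the column linearly onto a fixed range of marker areas. s is measured in points squared, and the bounds 10 and 300 are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(50, 4), columns=['A', 'B', 'C', 'D'])

def scale_sizes(values, smin=10.0, smax=300.0):
    """Hypothetical helper: linearly map values onto [smin, smax] (points^2)."""
    lo, hi = values.min(), values.max()
    return smin + (values - lo) / (hi - lo) * (smax - smin)

sizes = scale_sizes(df1['C'])  # smallest C -> 10, largest C -> 300

fig, ax = plt.subplots()
df1.plot.scatter(x='A', y='B', s=sizes, ax=ax)
```

Unlike a simple shift, this keeps the marker areas within the same range regardless of how widely the column's values are spread.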
df2.plot.box()
Hexagonal Bin Plot
This plot is useful for bivariate data, especially when there are too many points for a readable scatter plot.
df3 = pd.DataFrame(np.random.randn(2000, 2), columns=['A', 'B'])
df3.plot.hexbin(x='A', y='B', gridsize=25, cmap='Oranges')
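Each hexagon is colored by the number of points that fall inside it, and gridsize sets how many hexagons span the x-direction. Reading the per-hexagon counts back from the plot shows that every one of the 2,000 points is counted exactly once.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.random.randn(2000, 2), columns=['A', 'B'])

fig, ax = plt.subplots()
df3.plot.hexbin(x='A', y='B', gridsize=25, cmap='Oranges', ax=ax)

# The hexagons form a PolyCollection; get_array() holds one count per hexagon.
counts = np.asarray(ax.collections[0].get_array())
print(int(np.nansum(counts)))  # 2000
```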
Kernel Density Estimation (KDE) Plot
df2[‘B’].plot.kde()
df1.plot.density()
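plot.kde() and its alias plot.density() estimate a smooth density with scipy.stats.gaussian_kde, so SciPy must be installed. To show what that estimate is, here is a NumPy-only sketch of a Gaussian KDE: one Gaussian bump per data point, each with standard deviation h, averaged together. The default bandwidth follows Scott's rule, h = n^(-1/5) × sample standard deviation, which matches SciPy's default for one-dimensional data. kde_manual is a hypothetical helper, not a pandas function; in pandas, a different smoothing factor can be passed as bw_method, e.g. df2['B'].plot.kde(bw_method=0.3).

```python
import numpy as np
import pandas as pd

def kde_manual(values, grid, bw_factor=None):
    """Hypothetical helper: 1-D Gaussian kernel density estimate on `grid`."""
    x = np.asarray(values, dtype=float)
    n = x.size
    if bw_factor is None:
        bw_factor = n ** (-1 / 5)         # Scott's rule for 1-D data
    h = bw_factor * x.std(ddof=1)         # kernel standard deviation
    # Distance from every grid point to every sample, in units of h.
    z = (grid[:, None] - x[None, :]) / h
    # Average of the Gaussian kernels, normalized so the total area is 1.
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

s = pd.Series(np.random.randn(50) + 10, name='B')  # stands in for df2['B']
grid = np.linspace(s.min() - 3, s.max() + 3, 1000)
density = kde_manual(s, grid)

# A density estimate should integrate to approximately 1.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```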