Seaborn_Visualizations
Visual Analytics – Seaborn¶
Seaborn is a Python data visualization library built on top of matplotlib.
Categorical Data Plots:¶
violinplot
catplot (formerly factorplot)
Distribution Plots:¶
Regression Plots:¶
Grid Plots:¶
Matrix Plots:¶
clustermap
Install necessary libraries¶
!pip install numpy
!pip install pandas
!pip install matplotlib
!pip install seaborn
Requirement already satisfied: numpy in c:\users\roman\anaconda3\lib\site-packages (1.20.1)
Requirement already satisfied: pandas in c:\users\roman\anaconda3\lib\site-packages (1.2.4)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\users\roman\anaconda3\lib\site-packages (from pandas) (2.8.1)
Requirement already satisfied: numpy>=1.16.5 in c:\users\roman\anaconda3\lib\site-packages (from pandas) (1.20.1)
Requirement already satisfied: pytz>=2017.3 in c:\users\roman\anaconda3\lib\site-packages (from pandas) (2021.1)
Requirement already satisfied: six>=1.5 in c:\users\roman\anaconda3\lib\site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
Requirement already satisfied: matplotlib in c:\users\roman\anaconda3\lib\site-packages (3.3.4)
Requirement already satisfied: cycler>=0.10 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: pillow>=6.2.0 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib) (8.2.0)
Requirement already satisfied: numpy>=1.15 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib) (1.20.1)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib) (1.3.1)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib) (2.8.1)
Requirement already satisfied: six in c:\users\roman\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib) (1.15.0)
Requirement already satisfied: seaborn in c:\users\roman\anaconda3\lib\site-packages (0.11.1)
Requirement already satisfied: matplotlib>=2.2 in c:\users\roman\anaconda3\lib\site-packages (from seaborn) (3.3.4)
Requirement already satisfied: numpy>=1.15 in c:\users\roman\anaconda3\lib\site-packages (from seaborn) (1.20.1)
Requirement already satisfied: scipy>=1.0 in c:\users\roman\anaconda3\lib\site-packages (from seaborn) (1.6.2)
Requirement already satisfied: pandas>=0.23 in c:\users\roman\anaconda3\lib\site-packages (from seaborn) (1.2.4)
Requirement already satisfied: cycler>=0.10 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib>=2.2->seaborn) (0.10.0)
Requirement already satisfied: pillow>=6.2.0 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib>=2.2->seaborn) (8.2.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib>=2.2->seaborn) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib>=2.2->seaborn) (1.3.1)
Requirement already satisfied: python-dateutil>=2.1 in c:\users\roman\anaconda3\lib\site-packages (from matplotlib>=2.2->seaborn) (2.8.1)
Requirement already satisfied: six in c:\users\roman\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib>=2.2->seaborn) (1.15.0)
Requirement already satisfied: pytz>=2017.3 in c:\users\roman\anaconda3\lib\site-packages (from pandas>=0.23->seaborn) (2021.1)
Import necessary libraries¶
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline
Import data¶
Let’s load seaborn's tips example dataset to plot with and preview its first rows:
df = sns.load_dataset('tips')
df.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
Categorical Plot Examples¶
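barplot shows the mean of a numeric variable for each category, with an error bar for the uncertainty of that mean.
sns.barplot(x='sex', y='total_bill', data=df)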
countplot¶
countplot shows the number of observations in each category as bars.
sns.countplot(x='sex', data=df)
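The bar heights drawn by countplot are just the number of rows in each category. A minimal sketch of the same tally with pandas, using a small made-up frame (the `toy` data is an assumption for illustration, not the tips dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the 'sex' column of the tips dataset
toy = pd.DataFrame({"sex": ["Male", "Female", "Male", "Male", "Female"]})

# value_counts() tallies the rows per category, which is what countplot draws as bars
counts = toy["sex"].value_counts()
print(counts)
```

Comparing such a tally with the plot is a quick sanity check that the bars mean what you expect.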
Box plots show the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset, while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range. Here they are used to compare the distribution of a quantitative variable across the levels of a categorical one.
sns.boxplot(x="day", y="total_bill", data=df)
The hue argument can be used to split the data based on a third feature. This causes each existing box to be split into one box per level of that feature (two here, for smokers and non-smokers).
sns.boxplot(x="day", y="total_bill", hue="smoker", data=df)
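The outlier rule mentioned in the box plot description can be sketched with NumPy. This assumes the conventional 1.5 × IQR fences, which to my understanding is the default (`whis=1.5`). The `values` array is made up for illustration:

```python
import numpy as np

# Hypothetical sample; 40.0 is deliberately far from the rest
values = np.array([10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 40.0])

# Quartiles and inter-quartile range (the height of the box)
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, q3, iqr, outliers)
```

The whiskers then extend to the most extreme data points that still lie inside the fences.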
violinplot¶
Violin plots are used for similar reasons to box plots. A violin plot shows the distribution of quantitative data across several levels of one (or more) categorical variables so that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual data points, the violin plot features a kernel density estimate of the underlying distribution.
sns.violinplot(x="day", y="total_bill", data=df)
The hue argument can similarly be used with violin plots. Violin plots also take an additional argument called split: when hue has exactly two levels, split=True draws one level on each half of the violin.
sns.violinplot(x="day", y="total_bill", data=df, hue='sex', split=True)
stripplot¶
The stripplot draws a scatter plot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.
sns.stripplot(x="day", y="total_bill", data=df)
The jitter argument adds a small random offset along the categorical axis so that overlapping points spread out and we can see how dense the data is.
sns.stripplot(x="day", y="total_bill", data=df, jitter=True)
The hue argument allows us to color the points based on a third variable. The additional dodge argument separates the hue levels into side-by-side strips instead of drawing them on top of each other.
w/o dodge¶
sns.stripplot(x="day", y="total_bill", data=df, jitter=True, hue='sex', dodge=False)
w/ dodge¶
sns.stripplot(x="day", y="total_bill", data=df, jitter=True, hue='sex', dodge=True)
swarmplot¶
The swarmplot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).
sns.swarmplot(x="day", y="total_bill", data=df)
sns.swarmplot(x="day", y="total_bill", hue='sex', data=df, dodge=True, size=4)
catplot is the most general form of a categorical plot. It can take in a kind parameter to adjust the plot type:
sns.catplot(x='sex', y='total_bill', data=df, kind='bar')
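To illustrate how the kind parameter switches between plot types, here is a small sketch that draws the same data with several kinds. The `toy` frame is made up, and the list of kinds is only a subset of what catplot accepts:

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data with the same column names as the tips dataset
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "total_bill": [20.5, 15.0, 31.2, 18.7, 12.4, 25.1],
})

# Each call returns a FacetGrid; only the kind argument changes
grids = {}
for kind in ["bar", "box", "violin", "strip"]:
    grids[kind] = sns.catplot(x="sex", y="total_bill", data=toy, kind=kind)
```

Because catplot is figure-level, each call creates its own figure rather than drawing onto an existing axes.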
Distribution Plots¶
sns.distplot(df['total_bill'])
C:\Users\roman\anaconda3\lib\site-packages\seaborn\distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
sns.distplot(df['total_bill'], kde=False, bins=30)
jointplot¶
jointplot() combines a bivariate plot of two variables with the univariate distribution of each variable on the margins. The kind parameter selects how the data is drawn: 'scatter', 'kde', 'hist', 'hex', 'reg', or 'resid':
sns.jointplot(x='total_bill', y='tip', data=df, kind='hex')
sns.jointplot(x='total_bill', y='tip', data=df, kind='resid')
pairplot plots pairwise relationships across an entire DataFrame (for the numerical columns) and supports a hue argument for coloring by a categorical column.
sns.pairplot(df)
sns.pairplot(df, hue='sex')
A rugplot is a very simple concept: it draws a dash mark for every point of a univariate distribution. Rug plots are the building block of a KDE plot:
sns.rugplot(df['total_bill'])
kdeplot draws a Kernel Density Estimation plot: every observation is replaced with a Gaussian (normal) distribution centered on that value, and these curves are summed to give a smooth estimate of the density.
sns.kdeplot(df['total_bill'])
sns.rugplot(df['total_bill'])
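The idea behind a KDE, one Gaussian per observation combined into a single curve, can be sketched directly in NumPy. This is an illustration only, not seaborn's implementation: the function name `gaussian_kde_sketch`, the sample values, and the hand-picked bandwidth are all assumptions (seaborn chooses a bandwidth automatically).

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps, one centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    # Shape (n_grid, n_samples): standardised distance from each grid point to each sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    # Averaging (rather than summing) keeps the total area at 1
    return bumps.mean(axis=1)

samples = np.array([10.0, 12.0, 15.0, 21.0, 24.0])  # hypothetical observations
grid = np.linspace(-10.0, 45.0, 1101)
density = gaussian_kde_sketch(samples, grid, bandwidth=2.0)  # bandwidth chosen by hand

# A density should integrate to roughly 1 over the grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one smooths them together.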
Regression plots¶
lmplot draws a scatter plot with a fitted linear regression line:
sns.lmplot(x='total_bill', y='tip', data=df)
Using hue¶
sns.lmplot(x='total_bill', y='tip', data=df, hue='sex')
Using col¶
sns.lmplot(x='total_bill', y='tip', data=df, col='sex')
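The straight line in these plots is, by default, an ordinary least-squares fit. A minimal sketch of such a fit with NumPy on synthetic data; the variable names mirror the tips columns, but the numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(42)
total_bill = rng.uniform(5.0, 50.0, size=100)             # hypothetical bills
tip = 1.0 + 0.1 * total_bill + rng.normal(0, 0.5, 100)    # hypothetical tips with noise

# Degree-1 polynomial fit = least-squares straight line
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(f"tip = {intercept:.2f} + {slope:.2f} * total_bill")
```

The fitted slope and intercept should land close to the values used to generate the data (0.1 and 1.0).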
Grid Plots¶
irisDF = sns.load_dataset('iris')
irisDF.head()
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Map to all¶
# Plot to pairgrid
g = sns.PairGrid(irisDF)
g.map(plt.scatter) # feature wise scatter plot
Map to regions¶
# Map to upper,lower, and diagonal
g = sns.PairGrid(irisDF)
g.map_diag(plt.hist)
g.map_upper(plt.scatter)
g.map_lower(sns.kdeplot)
pairplot is a simpler version of PairGrid
sns.pairplot(irisDF)
sns.pairplot(irisDF, hue='species')
FacetGrid¶
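FacetGrid is the general way to draw the same kind of plot on a grid of subplots, with one panel for each combination of the row and col variables:
g = sns.FacetGrid(df, col="time", row="smoker", hue='sex')
g = g.map(plt.hist, "total_bill")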
JointGrid¶
JointGrid is the general version of jointplot()-type grids:
g = sns.JointGrid(x="total_bill", y="tip", data=df)
g = g.plot(sns.regplot, sns.distplot)
C:\Users\roman\anaconda3\lib\site-packages\seaborn\distributions.py:1647: FutureWarning: The `vertical` parameter is deprecated and will be removed in a future version. Assign the data to the `y` variable instead.
warnings.warn(msg, FutureWarning)
Matrix Plots¶
Matrix plots allow you to plot data as color-encoded matrices.
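# Keep only numeric columns; recent pandas versions no longer drop non-numeric columns automatically
sns.heatmap(df.select_dtypes("number").corr())
w/ annot (annotation)¶
sns.heatmap(df.select_dtypes("number").corr(), annot=True)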
flights = sns.load_dataset('flights')
pvflights = flights.pivot_table(values='passengers', index='month', columns='year')
sns.heatmap(pvflights)
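The pivot_table step turns long-format rows into the matrix that heatmap expects. A small sketch on a made-up frame with the same column names (the values are for illustration only):

```python
import pandas as pd

# Hypothetical long-format data: one row per (year, month) observation
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [112, 118, 115, 126],
})

# Rows become months, columns become years, cells hold the passenger counts
wide = long_df.pivot_table(values="passengers", index="month", columns="year")
print(wide)
```

Each cell of the resulting matrix maps to one colored square in the heatmap.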
w/ separation using linecolor and linewidths¶
sns.heatmap(pvflights, cmap='magma', linecolor='white', linewidths=1)
clustermap¶
The clustermap uses hierarchical clustering to produce a clustered version of the heatmap.
sns.clustermap(pvflights)
w/ scaling from 0-1 using standard_scale¶
sns.clustermap(pvflights,standard_scale=1)
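As I understand it, standard_scale=1 rescales each column to the range 0 to 1 before clustering. A minimal pandas sketch of that kind of min-max scaling on a made-up matrix (this illustrates the idea and is not seaborn's own code):

```python
import pandas as pd

# Hypothetical matrix with columns on very different scales
matrix = pd.DataFrame({"a": [1.0, 3.0, 5.0], "b": [10.0, 20.0, 40.0]})

# Column-wise min-max scaling: subtract each column's minimum, divide by its range
scaled = (matrix - matrix.min()) / (matrix.max() - matrix.min())
print(scaled)
```

Scaling this way keeps columns with large raw values from dominating the color range and the clustering distances.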