Lecture 1: Hello-World Machine Learning Project
Instructor:
A Step-by-step guide
Modules (libraries)
Versions (suggested)
Hello World Project
Why Python
Popular in Industry/Academia
Development community
Source code available
Easy to use
Variables can be used without declaration
Classes can be defined, but their use is not enforced
Why Python
Built-in or Third-party Libraries/Modules
Data loading, visualization, statistics, natural language processing, machine learning, image processing
Interact directly with the code, using a terminal or other tools like the Jupyter Notebook.
Creating complex graphical user interfaces (GUIs)
Integration into existing systems (key)
Built-in Libraries
scikit-learn
Jupyter Notebook
NumPy
SciPy
matplotlib
pandas
Installation using conda
Available for Windows, macOS, and Linux (e.g., Ubuntu)
Download and install conda
Installing packages with conda prompt:
conda install numpy scipy scikit-learn matplotlib pandas pillow graphviz python-graphviz
A. Library: scikit-learn
Open-source project (free, source code available)
Active user community: state-of-the-art machine learning algorithms
Widely used in industry and academia
User Guide and API documentation:
https://scikit-learn.org/stable/user_guide.html
B. Environment: Jupyter Notebook
Interactive environment for running Python code (and other languages) in the browser
Exploratory data analysis
All the code for this lecture is provided in Lecture-HelloWorld.ipynb
C. Library: NumPy
Functionality for multi-dimensional arrays, high-level mathematical functions (linear algebra operations), Fourier transform, pseudorandom number generators.
To use scikit-learn, all data have to be converted to a NumPy array
A NumPy array looks like this:
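The original slide's example is not recoverable here, so the following is only a minimal sketch of what a NumPy array looks like. The values in `x` are made up for illustration.

```python
import numpy as np

# A 2-D array (2 rows, 3 columns) built from a nested Python list.
# The numbers are illustrative only.
x = np.array([[1, 2, 3],
              [4, 5, 6]])

print("x:\n{}".format(x))
print("shape:", x.shape)                 # (rows, columns)
print("column means:", x.mean(axis=0))   # one mean per column
```

In scikit-learn's convention, each row of such an array is one sample and each column is one feature.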
D. Library: SciPy
A collection of functions for scientific computing: function optimization, signal processing, statistical distributions
scikit-learn uses SciPy to implement its algorithms
scipy.sparse: sparse matrices, another data representation in scikit-learn
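As a rough illustration of the sparse representation, the sketch below converts a small dense matrix to SciPy's Compressed Sparse Row (CSR) format. The identity matrix is an arbitrary example, chosen because most of its entries are zero.

```python
import numpy as np
from scipy import sparse

# Dense 4x4 identity matrix: ones on the diagonal, zeros elsewhere
eye = np.eye(4)

# CSR format stores only the nonzero entries and their positions
sparse_matrix = sparse.csr_matrix(eye)

print("Sparse representation:\n{}".format(sparse_matrix))
print("Number of stored values:", sparse_matrix.nnz)
```

Sparse formats pay off when a dataset is mostly zeros, since the zeros take no storage.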
E. Library: matplotlib
Primary scientific plotting library in Python
High-quality visualizations: line charts, histograms, scatter plots, etc.
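A minimal sketch of a line chart with matplotlib follows. The sine curve is just sample data, and the explicit `Agg` backend is an assumption made so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Sample data: 100 points of a sine curve between -10 and 10
x = np.linspace(-10, 10, 100)
y = np.sin(x)

# Draw a line chart with a marker at every data point
fig, ax = plt.subplots()
line, = ax.plot(x, y, marker="x")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
```

In a Jupyter Notebook with the backend line removed, the figure is typically rendered directly below the cell.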
F. Library: pandas
A library for data analysis
Data structure: DataFrame, a table, together with a collection of methods to modify and operate on this table.
Allows SQL-like queries and joins of tables
Individual columns may have different data types (integers, dates, floats, strings)
Functions for loading data from a variety of sources, e.g., SQL databases, Excel files, comma-separated values (CSV) files.
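The DataFrame idea can be sketched as follows. The names, locations, and ages are hypothetical toy data, not part of the lecture material.

```python
import pandas as pd

# Hypothetical toy table: each column holds its own data type
data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Location": ["New York", "Paris", "Berlin", "London"],
    "Age": [24, 13, 53, 33],
}
df = pd.DataFrame(data)
print(df)

# SQL-like query: keep only the rows whose Age exceeds 30
older_than_30 = df[df["Age"] > 30]
print(older_than_30)
```

Data stored in a CSV file could be loaded into the same kind of table with `pd.read_csv`.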
F. Library: pandas
Guides and Tutorials: https://pandas.pydata.org/pandas-docs/stable/tutorials.html
Versions (recommended)
There is no need to strictly follow these versions
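The version table from the original slide did not survive extraction. As a stand-in, this sketch prints whichever versions are installed locally, assuming the packages from the conda command are present.

```python
import sys

import matplotlib
import numpy as np
import pandas as pd
import scipy as sp
import sklearn

# Report the Python version, then the version of each library
print("Python version: {}".format(sys.version))
modules = (np, sp, matplotlib, pd, sklearn)
for module in modules:
    print("{} version: {}".format(module.__name__, module.__version__))
```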
Hello World: Flower Classification
A machine learning project that can classify Iris species
Hello World: Flower Classification
Task: learn a flower classifier that can distinguish flower species
Major Steps
Collect a set of flowers
Ask experts to provide each flower with a label (setosa, versicolor, or virginica)
Measure each flower's petal length, petal width, sepal length, and sepal width; hopefully these measurements can be used to separate flower species.
Learn to make sense of these data
For a new flower and its measurements, apply the above learned knowledge to identify its flower type
Dataset: https://archive.ics.uci.edu/ml/datasets/Iris
Flower Types:
— Iris Setosa
— Iris Versicolour
— Iris Virginica
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
Steps in Python
Four measurements per flower
Splitting Data
Plotting Data
Build the first model: K-Nearest Neighbors
Making predictions
Evaluations
The dataset is included in scikit-learn and can be loaded with the load_iris function:
Access data in iris_dataset, a dictionary-like object
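Loading and inspecting the dataset might look like the sketch below. It uses only `load_iris` and the keys of the object it returns; which fields are printed is an arbitrary choice.

```python
from sklearn.datasets import load_iris

# load_iris returns a dictionary-like object holding the data and metadata
iris_dataset = load_iris()

print("Keys:", list(iris_dataset.keys()))
print("Target names:", iris_dataset["target_names"])
print("Feature names:", iris_dataset["feature_names"])

# One row per flower, one column per measurement
print("Shape of data:", iris_dataset["data"].shape)
print("First five rows:\n{}".format(iris_dataset["data"][:5]))

# Species are encoded as integers that index into target_names
print("Shape of target:", iris_dataset["target"].shape)
```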
Splitting data
Split the 150 flowers into two sets: one for training models, the other for testing
scikit-learn's train_test_split function shuffles the dataset and splits it (75% for training vs. 25% for testing, by default)
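A possible version of the splitting step is sketched here. The `random_state=0` seed is an assumption added only to make the shuffle reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()

# Shuffle and split with the default 75% / 25% ratio.
# random_state=0 is an arbitrary seed so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)
```

With 150 flowers, the default split leaves 112 for training and 38 for testing.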
Plotting data
Exploring data is a critical step before actually writing machine learning code
Pair plot: a scatter plot for each pair of features
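One way to draw a pair plot is pandas' `scatter_matrix`, sketched below on the training data. The figure size, marker size, transparency, and the `Agg` backend are all illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in Jupyter
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

# Label the columns with the feature names so the axes are readable
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset["feature_names"])

# One scatter plot per pair of features, colored by species label;
# the diagonal shows a histogram of each single feature
axes = pd.plotting.scatter_matrix(
    iris_dataframe, c=y_train, figsize=(10, 10), marker="o",
    hist_kwds={"bins": 20}, s=40, alpha=0.8)
```

If the three colors form visibly separate clusters, a classifier based on these measurements has a good chance of working.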
K-Nearest Neighbor Methods
Nearest Neighbor Method: to classify a new data point, find the data point in the training set that is closest to the new point, and assign the label of this training point to the new data point
K-NN method: instead of finding only the nearest one, retrieve the K nearest training points and use the labels of these training points to vote for the prediction (e.g., by majority voting)
Setting K=1 implies the nearest neighbor method
Quiz: for two samples, how to measure their closeness (similarities)?
Answer: Euclidean distance, e.g., the point (0, 0) is closer to (0, 1) than to (2, 1)
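The quiz answer can be checked numerically with a small sketch. The helper `euclidean_distance` is a hypothetical name introduced here, not a function from the lecture code.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

origin = (0, 0)
d1 = euclidean_distance(origin, (0, 1))  # sqrt(0^2 + 1^2) = 1
d2 = euclidean_distance(origin, (2, 1))  # sqrt(2^2 + 1^2) = sqrt(5)

print("distance to (0, 1):", d1)
print("distance to (2, 1):", d2)
print("(0, 1) is closer:", d1 < d2)
```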
scikit-learn provides a range of machine learning models, including KNN
Import libraries/functions
Create a knn instance and fit it with training data
Output of knn.fit(): the fitted classifier object itself, which a notebook displays as a string representation
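The import, create, and fit steps might look like this sketch. Choosing `n_neighbors=1` and `random_state=0` is an assumption for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

# K = 1 corresponds to the plain nearest neighbor method
knn = KNeighborsClassifier(n_neighbors=1)

# For KNN, fitting essentially stores the training set; fit returns the classifier
fitted = knn.fit(X_train, y_train)
print(fitted)
```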
Predictions
A new data point
Making predictions using the knn object:
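A prediction for a single new flower could be sketched as follows. The measurements in `X_new` are hypothetical values chosen for illustration, and `n_neighbors=1` and `random_state=0` are likewise assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm).
# scikit-learn expects a 2-D array, so the single sample is one row.
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])

prediction = knn.predict(X_new)
predicted_name = iris_dataset["target_names"][prediction][0]
print("Predicted class index:", prediction)
print("Predicted species:", predicted_name)
```

The very short petal of this made-up flower places it among the setosa samples, so that is the label the classifier assigns.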
Evaluations
Assess how well a model works
The number of errors made among the predictions on a set of testing data points
Generating predictions
Calculate accuracy
Implement KNN using four lines of code
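The evaluation steps can be put together as in the sketch below. After data loading, the core is four lines: create, fit, predict, and compute accuracy. `n_neighbors=1` and `random_state=0` are assumed values, and the accuracy obtained depends on them.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)   # 1. create the model
knn.fit(X_train, y_train)                   # 2. fit on the training data
y_pred = knn.predict(X_test)                # 3. predict the test labels
accuracy = np.mean(y_pred == y_test)        # 4. fraction of correct predictions

print("Test set predictions:", y_pred)
print("Test set accuracy: {:.2f}".format(accuracy))
```

The classifier's `knn.score(X_test, y_test)` method computes the same accuracy in a single call.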
Take-home Quiz (Extra Credits)
How does the parameter K affect the prediction accuracy?
Please try a number of different values of K while running the code, and draw your conclusions from the comparisons.
What to submit?
Prepare a .pdf to summarize your findings. Your writeup should be up to one page and might include figures, tables, and/or text
This is an extra credit assignment