Lecture 1: Hello-World Machine Learning Project
Instructor:

A Step-by-step guide
Modules (libraries)
Versions (suggested)
Hello World Project

Why Python
Popular in Industry/Academia
Development community
Source code available
Easy to use
Variables do not need to be declared
Classes can be defined, but their use is not enforced

Why Python
Built-in or Third-party Libraries/Modules
Data loading, visualization, statistics, natural language processing, machine learning, image processing
Interact directly with the code, using a terminal or other tools like the Jupyter Notebook.
Create complex graphical user interfaces (GUIs)
Integration into existing systems (key)

Libraries and Tools
scikit-learn
Jupyter Notebook
NumPy
SciPy
matplotlib
pandas

Installation using conda
Available for Windows, macOS, and Linux (e.g., Ubuntu)
Download and install conda

Install the packages from the conda prompt:
conda install numpy scipy scikit-learn matplotlib pandas pillow graphviz python-graphviz

A. Library: scikit-learn
Open-source project (free, with source code available)
Active user community; provides state-of-the-art machine learning algorithms
Widely used in industry and academia
User Guide and API documentation:
https://scikit-learn.org/stable/user_guide.html

B. Environment: Jupyter Notebook
Interactive environment for running Python code (and other languages) in the browser
Exploratory data analysis
All the code for this lecture is provided in Lecture-HelloWorld.ipynb

C. Library: NumPy
NumPy provides multi-dimensional arrays, high-level mathematical functions (e.g., linear algebra operations), the Fourier transform, and pseudorandom number generators
To use scikit-learn, all data have to be converted to a NumPy array
A NumPy array looks like this:
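A minimal example of creating a two-dimensional NumPy array, with one row per sample and one column per feature (the values are arbitrary):

```python
import numpy as np

# A 2-D NumPy array: 2 rows (samples) x 3 columns (features); values are arbitrary
x = np.array([[1, 2, 3], [4, 5, 6]])
print("x:\n{}".format(x))
print("shape:", x.shape)  # (2, 3)
```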

D. Library: SciPy
SciPy is a collection of functions for scientific computing, e.g., function optimization, signal processing, statistical distributions
scikit-learn uses SciPy to implement its algorithms
scipy.sparse provides sparse matrices, another data representation used in scikit-learn
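A short sketch of why sparse matrices help: a mostly-zero matrix can be converted to SciPy's Compressed Sparse Row (CSR) format, which stores only the non-zero entries. The identity matrix here is just an illustrative choice.

```python
import numpy as np
from scipy import sparse

# A 4x4 identity matrix stored densely: 12 of its 16 entries are zero
eye = np.eye(4)

# CSR format keeps only the non-zero entries (and their positions)
sparse_matrix = sparse.csr_matrix(eye)
print(sparse_matrix)
print("non-zero entries:", sparse_matrix.nnz)  # 4
```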

Library: matplotlib
Primary scientific plotting library in Python
High-quality visualizations: line charts, histograms, scatter plots, etc.
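A minimal line-chart sketch with matplotlib. It selects the non-interactive "Agg" backend and saves the figure to a file (the filename sine.png is arbitrary) so it also runs outside a notebook; in Jupyter, drop the `matplotlib.use` line and the figure is shown inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced values between -10 and 10, and their sine
x = np.linspace(-10, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, marker="x")  # line chart with a marker at each data point
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
fig.savefig("sine.png")  # arbitrary output filename
```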

E. Library: pandas
A library for data analysis
Core data structure: the DataFrame, a table with a collection of methods to modify and operate on it
Supports SQL-like queries and joins of tables
Individual columns can have different data types (integers, dates, floats, strings)
Functions for loading data from many sources, e.g., SQL databases, Excel files, and comma-separated values (CSV) files

F. Library: pandas
Guides and Tutorials: https://pandas.pydata.org/pandas-docs/stable/tutorials.html
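A small sketch of the DataFrame features listed above: columns with different data types, an SQL-like query, and a join. The people and cities are made-up illustration data.

```python
import pandas as pd

# A made-up table; each column holds its own data type (strings, integers)
data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Location": ["New York", "Paris", "Berlin", "London"],
    "Age": [24, 13, 53, 33],
}
data_pandas = pd.DataFrame(data)
print(data_pandas)
print(data_pandas.dtypes)

# SQL-like query: select all rows where Age > 30
older = data_pandas[data_pandas.Age > 30]
print(older)

# SQL-like inner join on the shared "Location" column (made-up lookup table)
cities = pd.DataFrame({"Location": ["Paris", "Berlin"], "Country": ["France", "Germany"]})
joined = data_pandas.merge(cities, on="Location", how="inner")
print(joined)
```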

Versions (recommended)
There is no need to strictly follow these versions
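To compare your own environment with the recommended versions, a sketch like the following prints the `__version__` attribute of each library used in this lecture:

```python
import sys

import matplotlib
import numpy as np
import pandas as pd
import scipy as sp
import sklearn

# Print the installed versions of Python and each library
print("Python version:", sys.version)
print("pandas version:", pd.__version__)
print("matplotlib version:", matplotlib.__version__)
print("NumPy version:", np.__version__)
print("SciPy version:", sp.__version__)
print("scikit-learn version:", sklearn.__version__)
```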

Hello World: Flower Classification
A machine learning project that can classify Iris species

Hello World: Flower Classification
Task: learn a flower classifier that can distinguish flower species
Major Steps
Collect a set of flowers
Ask experts to provide each flower with a label (setosa, versicolor, or virginica)
Measure each flower's petal length, petal width, sepal length, and sepal width; hopefully these measurements can be used to separate the species
Learn to make sense of these data
Given a new flower and its measurements, apply the learned knowledge to identify its species

Dataset: https://archive.ics.uci.edu/ml/datasets/Iris

Flower Types:
— Iris Setosa
— Iris Versicolour
— Iris Virginica

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm

Steps in Python
Four measurements per flower
Splitting Data
Plotting Data
Build the first model: K-Nearest Neighbors
Making predictions
Evaluations

The dataset is included in scikit-learn and can be loaded with the load_iris function:
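A minimal loading sketch; `load_iris` ships with scikit-learn, so no download is needed:

```python
from sklearn.datasets import load_iris

# load_iris returns a Bunch object holding the measurements, labels, and metadata
iris_dataset = load_iris()
print(type(iris_dataset))
```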

Access data in iris_dataset, a Bunch object that works like a dictionary (values are accessed by key)
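A sketch of inspecting the dictionary-like object: its keys, the species and feature names, the 150 x 4 measurement array, and the integer-encoded labels.

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Keys of the Bunch (dictionary-like) object
print("Keys:", list(iris_dataset.keys()))

# Names of the three species and of the four measurements
print("Target names:", iris_dataset["target_names"])
print("Feature names:", iris_dataset["feature_names"])

# data: one row per flower, one column per measurement (in cm)
print("Shape of data:", iris_dataset["data"].shape)  # (150, 4)
print("First five rows:\n", iris_dataset["data"][:5])

# target: species of each flower, encoded as 0, 1, or 2
print("Target:", iris_dataset["target"])
```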

Splitting data
Split the 150 flowers into two sets: a training set for building the model and a test set for evaluating it
scikit-learn's train_test_split function shuffles the dataset and splits it (by default, 75% for training and 25% for testing)
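A sketch of the split with `train_test_split`. The `random_state=0` argument is an assumption added here so the shuffle, and therefore the split, is reproducible; any fixed integer works.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()

# Shuffle, then split 75% / 25% (the default proportions);
# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

print("X_train shape:", X_train.shape)  # (112, 4)
print("X_test shape:", X_test.shape)    # (38, 4)
```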

Plotting data
Exploring data is a critical step before actually writing machine learning code
Pair plot: a scatter plot for each pair of features
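One way to draw the pair plot, sketched with `pandas.plotting.scatter_matrix` on the training data, coloring points by species. The split's `random_state=0`, the plot styling, and the output filename are illustrative choices; the "Agg" backend line is only there so the script runs without a display and can be dropped in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

# Wrap the training data in a DataFrame, labeling columns with the feature names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)

# One scatter plot per pair of features, histograms on the diagonal,
# points colored by their species label
axes = pd.plotting.scatter_matrix(
    iris_dataframe, c=y_train, figsize=(10, 10), marker="o",
    hist_kwds={"bins": 20}, s=40, alpha=0.8)
plt.savefig("iris_pair_plot.png")  # arbitrary output filename
```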

K-Nearest Neighbor Methods
Nearest neighbor method: to classify a new data point, find the point in the training set that is closest to it, and assign that training point's label to the new point
K-NN method: instead of only the single nearest point, retrieve the K nearest training points and let their labels vote on the prediction (e.g., by majority vote)
Setting K = 1 gives the nearest neighbor method
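To make the voting procedure concrete, here is a from-scratch sketch of K-NN with Euclidean distance and majority voting, written with NumPy only. The function name `knn_predict` and the toy data are hypothetical illustrations; this is not how scikit-learn implements the classifier internally.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=1):
    """Hypothetical helper: predict x_new's label by majority vote of its k nearest training points."""
    # Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances (closest first)
    nearest = np.argsort(distances)[:k]
    # Majority vote; on a tie, the label met first among the sorted neighbors wins
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]


# Made-up toy data: two clusters labeled 0 and 1
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=1))  # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5]), k=3))  # 1
```

With k=1 the helper reduces to the nearest neighbor method described above.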

Quiz: how can we measure the closeness (similarity) of two samples?
Answer: e.g., the Euclidean distance: the point (0, 0) is closer to (0, 1) than to (2, 1)
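For two samples x and y with n features each, the Euclidean distance and the quiz's two comparisons work out as follows (1 < 2.24, so (0, 1) is the closer point):

```latex
\begin{aligned}
d(\mathbf{x}, \mathbf{y}) &= \sqrt{\textstyle\sum_{i=1}^{n} (x_i - y_i)^2} \\
d\big((0,0), (0,1)\big) &= \sqrt{(0-0)^2 + (0-1)^2} = 1 \\
d\big((0,0), (2,1)\big) &= \sqrt{(0-2)^2 + (0-1)^2} = \sqrt{5} \approx 2.24
\end{aligned}
```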

scikit-learn provides many machine learning models, including KNN (the KNeighborsClassifier class)
Import libraries/functions

Create a knn instance and fit it with training data

Output of knn.fit(): the classifier object itself, which Jupyter displays as a string showing the classifier and its parameters
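A sketch of the import, create, and fit steps. `n_neighbors=1` (K = 1) and the split's `random_state=0` are illustrative choices; the exact string printed for the classifier depends on the scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

# Create a KNN classifier that uses a single neighbor (K = 1)
knn = KNeighborsClassifier(n_neighbors=1)

# fit() stores the training set and returns the classifier object itself
result = knn.fit(X_train, y_train)
print(result)  # string representation of the classifier and its parameters
```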

Predictions
A new data point

Making predictions using the knn object:
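A prediction sketch for one new flower. The measurements in `X_new` are hypothetical values chosen for illustration. scikit-learn expects a 2-D array of shape (n_samples, n_features), hence the double brackets even for a single flower.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Hypothetical new flower: sepal length 5 cm, sepal width 2.9 cm,
# petal length 1 cm, petal width 0.2 cm (one row, four features)
X_new = np.array([[5, 2.9, 1, 0.2]])

prediction = knn.predict(X_new)  # integer label(s), one per row of X_new
print("Prediction:", prediction)
print("Predicted target name:", iris_dataset["target_names"][prediction])
```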

Evaluations
Assess how well a model works
e.g., count the errors the model makes when predicting the labels of the test set, or compute the accuracy (the fraction of test points predicted correctly)

Generating predictions

Calculate accuracy
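A sketch of both evaluation steps: predict the species of every test flower, then count the errors and compute the accuracy, once by hand with NumPy and once with the classifier's `score` method. The split and K = 1 follow the earlier illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Generate predictions for every flower in the test set
y_pred = knn.predict(X_test)
print("Test set predictions:\n", y_pred)

# Number of errors, and accuracy = fraction of test flowers predicted correctly
errors = np.sum(y_pred != y_test)
accuracy = np.mean(y_pred == y_test)
print("Errors: {} of {}".format(errors, len(y_test)))
print("Test set score: {:.2f}".format(accuracy))

# knn.score() computes the same accuracy in one call
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))
```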

Implement KNN in four lines of code
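The whole workflow condensed to four statements after the imports and data loading (split, create, fit, evaluate), again assuming `random_state=0` and K = 1 for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()

# 1. split, 2. create, 3. fit, 4. evaluate
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))
```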

Take-home Quiz (Extra Credit)
How does the parameter K affect the prediction accuracy?

Please try a number of different values of K when running the code, and draw your conclusions from the comparisons.
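A possible starting point for the experiment: train one classifier per value of K on the same split and record each test accuracy. The list of K values and `random_state=0` are arbitrary choices; drawing conclusions from the numbers is the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

# Arbitrary K values to compare; extend or change as needed
k_values = [1, 3, 5, 7, 9, 15]
scores = {}
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = knn.score(X_test, y_test)
    print("K = {:2d}: test accuracy = {:.2f}".format(k, scores[k]))
```

Repeating the loop with several different `random_state` values shows how much of the variation comes from the particular split rather than from K.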

What to submit?
Prepare a PDF summarizing your findings. The write-up should be at most one page and may include figures, tables, and/or text
This is an extra-credit assignment