Spark MLlib
Spark’s scalable machine learning library
linear SVM and logistic regression
classification and regression trees
recommendation via alternating least squares
clustering via k-means, Gaussian mixtures, and power iteration clustering

High-quality algorithms, up to 100x faster than MapReduce
Most machine learning algorithms are iterative
Note: As of Spark 2.0, the primary Machine Learning API for Spark is the DataFrame-based API (spark.ml). The RDD-based APIs (spark.mllib) have entered maintenance mode and were originally expected to be removed in Spark 3.0; spark.mllib is still included in Spark 3.x releases.

Unsupervised learning
Examples
Clustering
Probability distribution estimation
Association rule mining
Dimension reduction

Supervised learning
Examples
Prediction
Classification, regression

ML Pipelines
Inspired by scikit-learn
DataFrame
Pipeline components:
Transformer
Estimator
Parameters

Transformers
Converts one DataFrame into another
Must implement a method transform()
Examples:
Model:
DataFrame[id: int, feature_vector: Vector] =>
DataFrame[id: int, label: string]
Feature transformer:
DataFrame[id: int, text: string] =>
DataFrame[id: int, feature_vector: Vector]
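
As a rough illustration of the transform() contract, here is a plain-Python sketch (not Spark code). Rows are modelled as dicts standing in for DataFrame rows. The class `ToyTokenizer` and the column names are hypothetical; they only mimic the shape of a feature transformer, which leaves its input untouched and returns a new table with an extra column.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a feature transformer (not a Spark class)."""

    def __init__(self, input_col, output_col):
        self.input_col = input_col
        self.output_col = output_col

    def transform(self, rows):
        # Build and return new rows; the input rows are not modified,
        # mirroring the fact that a DataFrame is immutable.
        return [
            {**row, self.output_col: row[self.input_col].lower().split()}
            for row in rows
        ]


rows = [{"id": 0, "text": "Spark MLlib"}, {"id": 1, "text": "ML Pipelines"}]
tokenized = ToyTokenizer("text", "words").transform(rows)
print(tokenized[0])
```

The output rows carry every input column plus the new one, which is what lets transformers be chained.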

Estimators
Input: DataFrame
Output: Model
Must implement a method fit()
Example:
LogisticRegression is an Estimator.
Calling fit() trains a LogisticRegressionModel, which is a Model (hence also a Transformer).
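
The fit() contract can be sketched in plain Python as well (again not Spark code). `ToyLogisticRegression` and `ToyLogisticRegressionModel` are hypothetical names, the single feature column `x` is an assumption, and training uses plain batch gradient descent, which is not necessarily the optimizer Spark uses. The point is only the shape: fit() consumes data and returns a model, and the model is itself a transformer.

```python
import math


def sigmoid(z):
    # Logistic function: maps any real score into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


class ToyLogisticRegressionModel:
    """Hypothetical trained model: a transformer that adds a prediction."""

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def transform(self, rows):
        # Predict 1 when the estimated probability exceeds 0.5, else 0.
        return [
            {**row,
             "prediction": 1 if sigmoid(self.weight * row["x"] + self.bias) > 0.5 else 0}
            for row in rows
        ]


class ToyLogisticRegression:
    """Hypothetical estimator: fit() returns a model, which is a transformer."""

    def __init__(self, max_iter=200, step=0.5):
        self.max_iter = max_iter
        self.step = step

    def fit(self, rows):
        w, b = 0.0, 0.0
        n = len(rows)
        for _ in range(self.max_iter):
            grad_w = grad_b = 0.0
            for row in rows:
                error = sigmoid(w * row["x"] + b) - row["label"]
                grad_w += error * row["x"]
                grad_b += error
            # Gradient descent step on the average log-loss.
            w -= self.step * grad_w / n
            b -= self.step * grad_b / n
        return ToyLogisticRegressionModel(w, b)


train = [{"x": -2.0, "label": 0}, {"x": -1.0, "label": 0},
         {"x": 1.0, "label": 1}, {"x": 2.0, "label": 1}]
model = ToyLogisticRegression().fit(train)
predictions = model.transform([{"x": -1.5}, {"x": 1.5}])
print([row["prediction"] for row in predictions])
```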

Parameters
Both transformers and estimators can have parameters
Set parameters:
lr = LogisticRegression()
lr.setMaxIter(10)
Pass a ParamMap to fit() or transform().
A ParamMap is a set of (parameter, value) pairs.
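
In Spark, parameters passed in a ParamMap override values set earlier through setter methods. The plain-Python sketch below imitates that precedence rule. `ToyEstimator`, its method names, and the default values are hypothetical, and a dict stands in for the ParamMap.

```python
class ToyEstimator:
    """Hypothetical estimator used only to illustrate parameter handling."""

    def __init__(self):
        # Illustrative defaults, stored as (parameter, value) pairs.
        self.params = {"maxIter": 100, "regParam": 0.0}

    def set_max_iter(self, value):
        self.params["maxIter"] = value
        return self  # returning self allows chained setter calls

    def fit(self, rows, param_map=None):
        # Entries in param_map take precedence over values set earlier
        # and apply to this call only; self.params is left unchanged.
        effective = {**self.params, **(param_map or {})}
        return effective  # a real estimator would train a model with these


est = ToyEstimator().set_max_iter(10)
used_by_setter = est.fit(rows=[])
used_by_map = est.fit(rows=[], param_map={"maxIter": 30, "regParam": 0.1})
print(used_by_setter)
print(used_by_map)
```

Passing the map per call makes it easy to train several models from one estimator instance with different settings.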

Logistic regression
Training data: feature vectors with binary labels
The trained model is a nonlinear function f(x) that maps a feature vector x to a value in [0, 1], interpreted as the probability of label 1
Returns 1 if f(x) > 0.5
Returns 0 if f(x) < 0.5
See example:

Training
Pipeline (Estimator)
transformer, transformer, transformer, estimator
DataFrame

Trained
PipelineModel (transformer)
Replace the estimator in the training Pipeline with the trained model, which is a transformer
See example

Cross Validation
Training
Testing
Train-test split
5-Fold Cross Validation
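
The 5-fold cross-validation split mentioned above can be sketched in plain Python. The helper `k_fold_indices` is a hypothetical name, not a Spark function, and the sketch does not shuffle the rows, which a practical split would usually do first. Each row lands in the test set of exactly one fold and in the training set of the other four.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_rows))
    # Spread rows as evenly as possible: the first n_rows % k folds
    # receive one extra row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model is trained and evaluated once per fold, and the k evaluation scores are averaged, so every row is used for testing exactly once.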