Review of Course/Syllabus
Introduction to Machine Learning
Part 1
Learning Objectives for This Class
Distinguish between learning and non-learning in AI
Know when to apply neural nets
Be comfortable with “genetic algorithm”
By covering the basics of various techniques, we will compare how they can be applied. We will restrict this to two major learning approaches.
Neural Nets: What?
A problem-solving technique that simulates neurons and their interaction.
https://en.wikipedia.org/wiki/Neuron
Neural nets are based on aspects of the brain. A neural net consists of neurons: cells that take input from and provide output to other neurons. Importantly, a single neuron does not seem to encode knowledge as we understand it. Knowledge is encoded by the set of connections between neurons.
Application Examples of Neural Nets
Checking loan applications (Chase)
input: past application / success pairs
result: net usable for new applications
Recognizing handwriting (Apple)
Input: sample pairs of script / print
Credit card fraud detection
Predicting investments (Nikko; Fidelity)
Real estate appraisal
Predicting hospital stays …
The idea of simulating interconnected neurons has led to many applications, some of which are listed here. The rate and scale of applications have become immense, with far too many to list.
Examples of Neural Net Applications
System that learns to discriminate sounds
Diagnostics
Medicine
Autos
Plant repair
….
Neural Nets: Use when …
No model or expertise is known
Input/(output) examples are known
Input and output can be encoded as vectors
[Figure: Input → Model → Output]
We typically use neural nets when we don’t have a model of how a process actually works, but we do have a set of actual input/output examples, mostly from the real world. The input and output of a neural net must each be a vector, i.e., an array of numbers.
Neural Nets: Input
Vector of real numbers
(or convert to this form)
[Figure: Input → Model → Output]
In many cases, input that is not already numeric can be represented as an array of numbers.
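As a rough illustration of converting input to vector form, the sketch below encodes a made-up loan application as a list of real numbers. The field names, the scaling constants, and the `encode_application` helper are all hypothetical; they are not taken from any of the systems mentioned on the earlier slides.

```python
# Hypothetical loan-application record; field names and scaling are illustrative only.
def encode_application(app):
    """Convert a dict describing a loan application into a list of floats."""
    employment_types = ["salaried", "self_employed", "unemployed"]
    # One-hot encode the categorical field: one slot per category.
    one_hot = [1.0 if app["employment"] == e else 0.0 for e in employment_types]
    return [
        app["income"] / 100_000.0,        # scale income to a small range
        app["loan_amount"] / 100_000.0,   # scale loan amount the same way
        float(app["years_at_address"]),
    ] + one_hot


example = {
    "income": 50_000,
    "loan_amount": 20_000,
    "years_at_address": 3,
    "employment": "salaried",
}
vector = encode_application(example)
print(vector)  # [0.5, 0.2, 3.0, 1.0, 0.0, 0.0]
```

Numeric fields are rescaled and the categorical field becomes a group of 0/1 entries, so the whole record ends up as one fixed-length vector.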
Neural Nets: Output
[Figure: Input → Model → Output]
Vector of real values, giving answers … consistent with performance on examples
Once the architecture of a neural net has been decided (i.e., the interconnection pattern and the nature of each neuron’s data processing), it is fed a set of input/output pairs. These form the training set. When successfully trained, the net gives the respective output for each input to within a tolerable error. It is then ready to be tested and used on any input.
Neural Nets: Model
Set up processing elements (“neurons” or “nodes”) that behave like human brain neurons, with connections. Adapt the weighting on the connections to adapt the neural net to the problem.
In many ways (even though they are modeled on brains), neural nets are “dumb.” Once we set one up, it learns from input/output data by making many numerical adjustments. We don’t model how neural nets actually work beyond that: we often can’t form much of a vision about why exactly they end up the way they do. Sometimes we get an idea of this, however.
An important skill is in how we architect (“wire” so to speak) each neural net application in the first place, i.e., prior to training it. This will be explored later in the course when we focus on neural nets.
Modelling Neuronal I/O
[Figure: a neuron (“Process”) receives the input wo + w’o’ over connections with weights w and w’ and applies a transfer function f.]
Output = f(wo + w’o’)
In software, we model each connection with a number (the weight) that reflects its relative strength.
Each neuron in a neural net takes as input the sum of the outputs of other neurons, each weighted by its connection strength. It then applies a function (a transfer function) to this quantity. The output of this function becomes an input to other neurons (after weighting), or else it is the output of the whole neural net.
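The computation Output = f(wo + w’o’) can be sketched in a few lines. The choice of tanh as the transfer function and the particular weights and inputs are assumptions made for illustration; the slide does not fix any of them.

```python
import math


def neuron_output(inputs, weights, transfer=math.tanh):
    """Return f(w*o + w'*o' + ...) for one simulated neuron."""
    # Weighted sum of the outputs arriving from other neurons.
    weighted_sum = sum(w * o for w, o in zip(weights, inputs))
    # The transfer function turns the sum into this neuron's output.
    return transfer(weighted_sum)


o, o_prime = 1.0, 0.5      # outputs of two upstream neurons (made-up values)
w, w_prime = 0.8, -0.4     # connection weights (made-up values)
result = neuron_output([o, o_prime], [w, w_prime])
print(result)              # tanh(0.8*1.0 + (-0.4)*0.5) = tanh(0.6)
```

Passing a different function as `transfer` changes the neuron's behaviour without touching the weighted-sum step.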
Modelling Neuronal I/O
[Figure: neuron simulation. A neuron receives outputs o and o’ from other neurons (“Process” nodes) over connections with weights w and w’, giving the input wo + w’o’.]
This figure shows how a (simulated) neuron interacts with other neurons.
As with (what we know about) biological neural networks, learning consists mainly of modifying weights.
Backpropagation
[Figure: a network with weights wij, annotated with “Sample input data”, “Corresponding output”, and “Actual output”. Values shown: 16.2, -1.34, 112.3, -3.3, 12.1, -0.2, 2.1, -2.2.]
1. Assign random weights initially.
2. ….
Backpropagation
[Figure: the same network and sample values as on the previous slide.]
1. Assign random weights initially.
2. The error (sum of squared differences) is differentiated w.r.t. each weight.
3. Adjust weights.
Consider the program that produces the error for a given input/output data pair. It can be considered a function of its weights and can therefore be differentiated with respect to each weight.
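A minimal sketch of this idea for a single neuron follows. It treats the squared error as a function of the weights, differentiates it with the chain rule, and takes one adjustment step. The tanh transfer function, the starting weights, the data pair, and the learning rate are all assumed values; full backpropagation applies the same chain rule layer by layer through a larger net.

```python
import math


def predict(weights, inputs):
    """Output of one tanh neuron for the given weights and inputs."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)))


def squared_error(weights, inputs, target):
    """Squared difference between the neuron's output and the target."""
    return (predict(weights, inputs) - target) ** 2


def gradient(weights, inputs, target):
    """Derivative of the squared error with respect to each weight."""
    out = predict(weights, inputs)
    # Chain rule: d(error)/d(out) = 2*(out - target),
    # d(tanh(s))/ds = 1 - out**2, and ds/dw_i = x_i.
    common = 2.0 * (out - target) * (1.0 - out ** 2)
    return [common * x for x in inputs]


weights = [0.3, -0.1]              # stand-in for random initial weights
inputs, target = [1.0, 2.0], 0.5   # one made-up input/output pair
learning_rate = 0.1                # assumed step size

before = squared_error(weights, inputs, target)
grad = gradient(weights, inputs, target)
# Adjust each weight a small step against its derivative.
weights = [w - learning_rate * g for w, g in zip(weights, grad)]
after = squared_error(weights, inputs, target)
print(before, after)               # the error is smaller after the step
```

Repeating the adjustment over many input/output pairs is what training amounts to in this simplified picture.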
TensorFlow Playground Demo
https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.29166&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false
The purpose here is to show how a neural net can recognize whether a point in the unit square belongs to the blue set or the orange set. This is based on a given (“training”) set of blue and orange points. The “Playground” allows you to try various training sets and neural net architectures.
After Training
Deep Learning
Neural net architecture
Organized in layers
Each layer of higher order than the one before
Example: text summary
Sentence syntax
Sentence sense
Story sense
A major reason that neural nets have been so successful in the second decade of the 21st century is deep learning, which is characterized by multiple layers of neurons and massive amounts of training data. Multi-layer nets were tried for many years without much success. Success eventually came from persistence, various new techniques, the availability of massive data sets, and increased computation power; we will describe these later in the course (in the neural net modules).
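To make “multiple layers of neurons” concrete, here is a small sketch in which the outputs of one layer become the inputs of the next. The layer sizes, the weights, and the tanh transfer function are arbitrary choices for illustration, and bias terms are omitted to keep it short.

```python
import math


def layer_forward(weight_rows, inputs):
    """One layer: each row of weights feeds one neuron in the layer."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)))
        for row in weight_rows
    ]


def network_forward(layers, inputs):
    """Pass the input vector through each layer in turn."""
    activations = inputs
    for weight_rows in layers:
        activations = layer_forward(weight_rows, activations)
    return activations


# Made-up weights: 2 inputs -> 3 neurons -> 2 neurons -> 1 output neuron.
layers = [
    [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]],
    [[0.2, -0.5, 0.7], [0.6, 0.1, -0.4]],
    [[1.0, -1.0]],
]
output = network_forward(layers, [1.0, 0.5])
print(output)  # a single value between -1 and 1
```

Adding depth in this sketch only means appending another list of weight rows; what changes in practice is how much data and computation are needed to train the weights.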
Architecture (Whole … Parts to Follow)
The figure (from the paper) suggests what brought about the dramatic improvement, as well as the use of hidden layers.
Demonstration: MNIST
https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb
MNIST, from the National Institute of Standards and Technology, is a standard example for neural nets. The input is a grid of grey-scale numbers and the output is 0, 1, …, or 9.
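The sketch below shows how such an input and output can be put in vector form, assuming the usual MNIST layout of 28 × 28 pixels with grey levels from 0 to 255. The blank image and the ten output scores are placeholders, not real MNIST data or real network output, and the helper names are hypothetical.

```python
def flatten_and_scale(image):
    """Turn a grid of 0-255 grey levels into a flat list of values in [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]


def predicted_digit(scores):
    """Return the index (0-9) of the largest of the ten output values."""
    return max(range(len(scores)), key=lambda i: scores[i])


blank_image = [[0] * 28 for _ in range(28)]   # placeholder image
input_vector = flatten_and_scale(blank_image)

# Placeholder output vector: one score per digit 0..9.
made_up_scores = [0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]
print(len(input_vector), predicted_digit(made_up_scores))  # 784 3
```

The net itself sits between these two steps: it maps the 784-element input vector to the ten output scores.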
Summary of Part 1
Machine Learning: applications that learn from data / environment / behavior of agents
Neural Nets learn from voluminous data, typically input/output