Data Mining and Machine Learning
Introduction to Artificial Neural Networks
Peter Jančovič
Objectives
Introduce Artificial Neural Networks (ANNs)
Feed-forward ANNs – Multi-Layer Perceptrons (MLPs)
Basic MLP calculations
Geometric interpretation of MLPs
Artificial Neural Networks
(Artificial) Neural Networks (NNs) offer another approach to data analysis
Popularised in 1980s, resurgence in 2000s
“Machine learning” (or most recently “AI”) is often used synonymously with the use of NNs
Inspiration for the basic elements of a NN (artificial neuron) comes from biology, but analogy stops there
ANNs are just a computational device for processing patterns – not “artificial brains”
Multi-Layer Perceptron – a Feed-Forward Neural Network
[Figure: an MLP with an input layer (input units), hidden layers (hidden units) and an output layer (output units); each node is an artificial neuron]
A simple model of a neuron
A Simple Artificial Neuron
[Figure: a unit u receiving inputs i_1, i_2, i_3 through weights w_1,u, w_2,u, w_3,u]
Basic idea:
– if the input i_u to unit u is big enough, the neuron ‘fires’
– otherwise nothing happens
How do we calculate the input to u?
Artificial Neuron (2)
[Figure: units 1, 2 and 3 with outputs o_1, o_2, o_3 feeding unit u through weights w_1,u, w_2,u, w_3,u]
Suppose the inputs to units 1, 2 and 3 are i_1, i_2 and i_3, and these are also the outputs o_1, o_2 and o_3
Then the input to u is:
i_u = o_1 w_1,u + o_2 w_2,u + o_3 w_3,u
In general, for an artificial neuron u that receives input from N units, the input to unit u is:
i_u = Σ_{n=1}^{N} o_n w_n,u
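As a quick sketch (not from the slides), the weighted-sum input to a unit can be computed directly; the output and weight values below are made up for illustration:

```python
# Input to a unit u as a weighted sum of the outputs of the N units
# feeding it: i_u = sum_n o_n * w_{n,u}

def unit_input(outputs, weights):
    return sum(o * w for o, w in zip(outputs, weights))

# Three source units with outputs o_1, o_2, o_3 and weights
# w_1,u, w_2,u, w_3,u (illustrative values):
i_u = unit_input([0.5, 1.0, -1.0], [2.0, 0.5, 1.0])  # 1.0 + 0.5 - 1.0 = 0.5
print(i_u)
```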
The sigmoid activation function
[Figure: unit u receiving outputs o_1, o_2, o_3 through weights w_1,u, w_2,u, w_3,u]
The activation function defines the output of a neuron – whether the neuron should “fire”
A typical activation function is the sigmoid function g:
g(x) = 1 / (1 + e^{-kx})
The output of u is then:
o_u = g(i_u) = g(Σ_{n=1}^{N} o_n w_n,u)
Activation functions
Linear activation function (output equals input):
g(x) = x
Sigmoid activation function:
g(x) = 1 / (1 + e^{-kx})
The sigmoid is a ‘soft’ threshold function
[Figure: plot of the sigmoid activation function]
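A minimal sketch of the two activation functions above; the slope parameter k controls how ‘soft’ the sigmoid threshold is (the example values are illustrative):

```python
import math

def linear(x):
    # Linear activation: output equals input
    return x

def sigmoid(x, k=1.0):
    # Sigmoid activation: g(x) = 1 / (1 + e^{-kx})
    return 1.0 / (1.0 + math.exp(-k * x))

# g(0) = 0.5 for any k; a larger k makes the sigmoid approach a
# hard 0/1 step around x = 0, i.e. a 'soft' threshold.
print(sigmoid(0.0))          # 0.5
print(sigmoid(1.0, k=1.0))   # ~0.731
print(sigmoid(1.0, k=10.0))  # ~1.0
```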
The ‘bias’
As described, the neuron will ‘fire’ only if its input is greater than 0
We can change the point of firing by introducing a bias
This is an additional input unit whose input is fixed at 1
[Figure: unit u with inputs i_1, i_2, i_3 and a bias input fixed at 1, through weights w_1,u, w_2,u, w_3,u and w_b,u]
How the bias works…
According to the sigmoid activation function, the artificial neuron u ‘fires’ if the input to u is greater than or equal to 0, i.e.:
i_u = o_1 w_1,u + o_2 w_2,u + o_3 w_3,u + w_b,u ≥ 0
But this happens only if:
i_1 w_1,u + i_2 w_2,u + i_3 w_3,u ≥ -w_b,u
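The equivalence above can be checked with a small sketch (the weight and input values are illustrative, not from the slides): with bias weight w_b, the unit fires exactly when the weighted inputs reach -w_b.

```python
def fires(inputs, weights, w_b):
    # Bias input is fixed at 1, so it contributes 1 * w_b to i_u;
    # the unit 'fires' when i_u >= 0.
    i_u = sum(i * w for i, w in zip(inputs, weights)) + 1.0 * w_b
    return i_u >= 0

# With weights [3, 1] and bias weight -2, firing requires
# 3*x + 1*y >= -w_b = 2:
print(fires([1.0, 0.0], [3.0, 1.0], -2.0))  # True:  3 >= 2
print(fires([0.5, 0.0], [3.0, 1.0], -2.0))  # False: 1.5 < 2
```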
Example (2D)
Suppose u has a sigmoid activation function. Then, for weights 3 and 1 and bias weight -2, u will ‘fire’ if:
i_u = 3x + y - 2 ≥ 0, i.e. y ≥ -3x + 2
[Figure: unit u with inputs x, y and bias 1, and weights 3, 1 and -2]
Example (continued)
[Figure: the boundary line y = -3x + 2, crossing the y-axis at 2 and the x-axis at 2/3, with the point [2,2]^T on the firing side and [-2,-2]^T on the other side]
A single artificial neuron defines a linear decision boundary
Example (continued)
Assume:
– Linear activation functions for units u1, u2 and u3
– Sigmoid activation function for u
Case 1: input to u1 is 2 and input to u2 is 2, then:
– Input i_u to u is 2 × 3 + 2 × 1 + 1 × (-2) = 6
– Hence output o_u from u is g(6) = 0.998
Case 2: input to u1 is -2 and input to u2 is -2, then:
– Input i_u to u is (-2) × 3 + (-2) × 1 + 1 × (-2) = -10
– Hence output o_u from u is g(-10) = 4.54 × 10^-5 ≈ 0
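The two cases can be reproduced with a short sketch of this single neuron (weights 3, 1 and bias weight -2, sigmoid with k = 1):

```python
import math

def g(x):
    # Sigmoid activation with k = 1
    return 1.0 / (1.0 + math.exp(-x))

def neuron(x, y):
    # i_u = 3x + y - 2, then the sigmoid output
    return g(3 * x + 1 * y - 2)

print(round(neuron(2, 2), 3))  # 0.998  (input 6)
print(neuron(-2, -2))          # ~4.54e-05, effectively 0 (input -10)
```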
Example 2
[Figure: unit u with inputs x, y and bias 1, and weights 2, -1 and -1; the boundary line crosses the x-axis at 1/2 and the y-axis at -1]
If i_u = 2x - y - 1 ≥ 0, then y ≤ 2x - 1
Combining 2 Artificial Neurons
[Figure: unit u1 with inputs x, y, 1 and weights 3, 1, -2 (boundary y = -3x + 2, x-intercept 2/3), alongside unit u2 with inputs x, y, 1 and weights 2, -1, -1 (boundary y = 2x - 1, y-intercept -1)]
Combining neurons – artificial neural networks
[Figure: a two-layer network. Inputs x and y and a bias 1 feed u1 (weights 3, 1, -2) and u2 (weights 2, -1, -1); the outputs of u1 and u2 feed u with weights 20 and -20, and u has bias weight -2]
Combining neurons
[Figure: the ‘firing region’ of the combined network – the region above both boundary lines y = -3x + 2 and y = 2x - 1, where u1 fires and u2 does not; intercepts at 2, 2/3 and -1]
Combining neurons
Input to u1 is 3x + y - 2
Input to u2 is 2x - y - 1
When x = 3, y = 0:
– Input i_u1 to u1 is 7, input i_u2 to u2 is 5
– Output o_u1 from u1 is 1.00, output o_u2 from u2 is 0.993
– Input i_u to u is 1.00 × 20 + 0.993 × (-20) - 2 ≈ -1.88
– Output o_u from u is g(-1.88) = 0.13
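The whole two-neuron network can be sketched in a few lines; with the weights given above it reproduces this calculation and the table of outputs that follows:

```python
import math

def g(x):
    # Sigmoid activation with k = 1
    return 1.0 / (1.0 + math.exp(-x))

def network(x, y):
    # Hidden units: u1 (weights 3, 1, bias -2), u2 (weights 2, -1, bias -1)
    o_u1 = g(3 * x + y - 2)
    o_u2 = g(2 * x - y - 1)
    # Output unit u: weights 20 and -20, bias weight -2
    return g(20 * o_u1 - 20 * o_u2 - 2)

print(round(network(3, 0), 2))     # 0.13
print(round(network(0.5, 2), 2))   # 1.0
print(round(network(0.5, -2), 2))  # 0.0
print(round(network(-1, 0), 2))    # 0.06
```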
Outputs

 i1    i2    o_u
  3     0    0.13
  0.5   2    1.00
  0.5  -2    0.00
 -1     0    0.06
Single hidden layer Multi-Layer Perceptron (MLP)
I units in the input layer
H × I weight matrix W1
H units in the hidden layer
O × H weight matrix W2
O units in the output layer
Single hidden layer MLP
Can characterize arbitrary convex regions
Defines the region using linear decision boundaries
Two hidden layer MLP
I units in the input layer
H1 × I weight matrix W1
H1 units in the first hidden layer
H2 × H1 weight matrix W2
H2 units in the second hidden layer
O × H2 weight matrix W3
O units in the output layer
Two hidden layer MLP
An MLP with two hidden layers can characterize arbitrary shapes
First hidden layer characterises convex regions
Second hidden layer combines these convex regions
In theory, there is no advantage in having more than two hidden layers
In practice, multiple hidden layer “deep” neural networks give the best performance (e.g. in speech recognition)
Formal definition: MLP with a single hidden layer
A single hidden layer MLP consists of:
1. A set of I input units, and for each input unit i an activation function gi (typically linear)
2. A set of H hidden units, and for each hidden unit h an activation function gh (typically sigmoid)
3. A set of O output units, and for each output unit o an activation function go
4. An H × I weight matrix W1, which maps the outputs of the input units to the inputs of the hidden units
5. An O × H weight matrix W2, which maps the outputs of the hidden units to the inputs of the output units
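The definition above can be sketched as a minimal forward pass (linear input and output units, sigmoid hidden units; the weight values here are arbitrary placeholders, only the shapes matter):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a weight matrix (list of rows) by a vector
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(x, W1, W2):
    # W1 is H x I: maps input outputs to hidden inputs
    hidden = [sigmoid(a) for a in matvec(W1, x)]
    # W2 is O x H: maps hidden outputs to output inputs (linear outputs)
    return matvec(W2, hidden)

W1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # H = 3, I = 2
W2 = [[1.0, 1.0, 1.0], [0.5, 0.5, 0.5]]    # O = 2, H = 3
print(forward([1.0, -1.0], W1, W2))        # ~[1.425, 0.713]
```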
Example
Inputs: i(i_1) = 0.9, i(i_2) = -0.5
[Figure: a 2–3–2 MLP with weight matrices W1 and W2]
2 unit input layer, linear activation (I = 2)
Single 3 unit hidden layer, sigmoid activation (H = 3)
2 unit output layer, linear activation (O = 2)
A 3 × 2 weight matrix W1 between input and hidden layer
A 2 × 3 weight matrix W2 between hidden and output layer
Example continued
W1 = [  2.6  -1.7
        0.2   1.0
       -4.0   2.5 ]

W2 = [  1.0  -0.5   1.0
        0.5   0.6   1.0 ]

Input: [0.9, -0.5]^T
Output from the first layer (linear activation): [0.9, -0.5]^T
Example (continued)
Inputs to the hidden layer:
i(h_1) = w1_11 o_1 + w1_12 o_2 = 2.6 × 0.9 + (-1.7) × (-0.5) = 2.34 + 0.85 = 3.19
i(h_2) = w1_21 o_1 + w1_22 o_2 = 0.2 × 0.9 + 1.0 × (-0.5) = 0.18 - 0.5 = -0.32
i(h_3) = w1_31 o_1 + w1_32 o_2 = (-4.0) × 0.9 + 2.5 × (-0.5) = -3.6 - 1.25 = -4.85
In matrix notation: i(h) = W1 o
Outputs from the hidden layer:
o(h_1) = 1 / (1 + e^{-3.19}) = 0.96
o(h_2) = 1 / (1 + e^{0.32}) = 0.42
o(h_3) = 1 / (1 + e^{4.85}) = 0.008
Example (continued)
Inputs to the output layer:
i(o_1) = w2_11 o(h_1) + w2_12 o(h_2) + w2_13 o(h_3) = 1 × 0.96 + (-0.5) × 0.42 + 1 × 0.008 = 0.96 - 0.21 + 0.008 = 0.758
i(o_2) = w2_21 o(h_1) + w2_22 o(h_2) + w2_23 o(h_3) = 0.5 × 0.96 + 0.6 × 0.42 + 1 × 0.008 = 0.48 + 0.252 + 0.008 = 0.740
In matrix notation: i(o) = W2 o(h)
Linear output unit activation:
o(o_1) = 0.758, o(o_2) = 0.740
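The worked example can be verified end-to-end with a short sketch using the weight matrices given above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a weight matrix (list of rows) by a vector
    return [sum(w * x for w, x in zip(row, v)) for row in W]

W1 = [[2.6, -1.7], [0.2, 1.0], [-4.0, 2.5]]   # 3 x 2
W2 = [[1.0, -0.5, 1.0], [0.5, 0.6, 1.0]]      # 2 x 3

x = [0.9, -0.5]                    # linear input units pass x through
i_h = matvec(W1, x)                # hidden inputs: [3.19, -0.32, -4.85]
o_h = [sigmoid(a) for a in i_h]    # hidden outputs: ~[0.96, 0.42, 0.008]
o_o = matvec(W2, o_h)              # linear output units
print([round(v, 3) for v in o_o])  # [0.758, 0.74]
```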
Summary
Introduction to neural networks
Definition of an ‘artificial neuron’
Activation functions – linear and sigmoid
Linear boundary defined by a single neuron
Convex region defined by a single hidden layer MLP
Two hidden layer MLPs
Forward propagation in an MLP (calculation)