Artificial Neural Networks
MAST90083 Computational Statistics and Data Mining
Karim Seghouane
School of Mathematics & Statistics The University of Melbourne
Artificial Neural Networks 1/43
Outline
§i. Introduction
§ii. The McCulloch-Pitts Neuron
§iii. The Single-Layer Perceptrons
§iv. Feedforward Single-Layer Networks
§v. Multilayer Perceptron
Introduction
Artificial neural networks (ANNs) were influenced by the field of artificial intelligence which seeks to answer questions such as:
How do humans solve problems ?
What makes the human brain such a formidable machine in
processing cognitive thought ?
What is the nature or definition of this thing called “intelligence” ?
Introduction
These questions of “mind” and “intelligence” form the essence of “cognitive science”
Cognitive science: is a discipline that focuses on the study of interpretation and learning
Interpretation: deals with the thought process resulting from exposing the senses to some type of input (music, speech, a scientific manuscript, a computer program, …)
Learning: deals with questions of how to learn from knowledge accumulated by studying examples having certain characteristics.
Introduction
There are many different theories and models for how the mind and brain work
One such theory is “connectionism”, which draws analogies to neurons and their connections
Concepts of neuron firing, activation functions and the ability to modify those connections combine to form algorithms for artificial neural networks
Introduction
This formulation introduces a relationship between the three notions of mind, brain and computation
where information is processed by the brain through massively parallel computations (simultaneously)
unlike standard serial computations, which carry out one instruction at a time in sequential fashion
Introduction
Sophisticated types of ANN have for example been used to model human ability to learn a language
As an overly simplified model of the neuron activity in the brain, ANN were originally designed to mimic brain activity
Now, ANN are treated more abstractly, as a network of highly interconnected nonlinear computing elements
ANN are now used to solve problems of pattern classification and prediction (examples include speech recognition, handwritten character recognition and face recognition)
The common features to all these problems are high-dimensional data and large sample sizes
Brain as a neural network
The largest part of the brain is the cerebral cortex, which consists of a vast network of interconnected cells called neurons.
Neurons are elementary nerve cells that form the building blocks of the nervous system
In the human brain, for example, there are about 10 billion neurons of more than a hundred different types, as defined by their size, shape and the neurochemicals they produce
Brain as a neural network
Schematic diagram of a biological neuron
Biological neuron
The cell body of a typical neuron contains its nucleus and two types of processes (or projections): dendrites and axons.
The neuron receives signals from other neurons via its many dendrites, which operate as input devices
Each neuron has a single axon, a long fiber that operates as an output device
The end of the axon branches into strands and each strand terminates in a synapse
Each synapse may either connect to a dendrite or the cell body of another neuron, or terminate in muscle tissue.
Biological neuron
A neuron maintains on average about a thousand synaptic connections with other neurons (some may have 10-20 thousand such connections)
The entire collection of neurons in the brain yields a rich network of neural connections
Under appropriate conditions, an activated neuron fires an electrical pulse (called action potential or spike) of fixed amplitude and duration
The brain“learns”by changing the strengths of the connections between neurons or by adding or removing such connections.
The McCulloch-Pitts Neuron
This is a simplified abstraction of the process of neuron activity in the human brain
The inputs are denoted by x1, …, xr and each has a value of either 0 (off) or 1 (on).
The signal at each input connection depends upon whether the synapse is excitatory or inhibitory.
If any one of the inhibitory synapses transmits the value 1, the neuron is prevented from firing (the output is 0).
If no inhibitory synapse is active, the inputs are summed to produce the total excitation $U = \sum_{i=1}^{r} x_i$
If $U \geq \theta$ the output $y$ is 1 (the neuron fires); otherwise the output $y$ is 0 and the neuron does not fire
The McCulloch-Pitts Neuron
The output corresponds to the indicator $y = I_{[U-\theta \geq 0]}$
If θ ≥ r, the number of inputs, the neuron will never fire
If θ = 0 and there are no inhibitory synapses, the output will always have the value 1
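This firing rule can be sketched in a few lines of Python (a toy illustration; the function name and argument layout are my own, not part of the original formulation):

```python
import numpy as np

def mcculloch_pitts(x, inhibitory, theta):
    """Simplified McCulloch-Pitts unit.

    x          : 0/1 input vector
    inhibitory : boolean mask marking which synapses are inhibitory
    theta      : firing threshold
    """
    x = np.asarray(x)
    inhibitory = np.asarray(inhibitory)
    # Any active inhibitory input vetoes firing outright
    if np.any(x[inhibitory] == 1):
        return 0
    # Otherwise sum the excitatory inputs and compare to the threshold
    U = x[~inhibitory].sum()
    return int(U >= theta)
```

With θ = 2, two active excitatory inputs make the unit fire, while a single active inhibitory input suppresses it regardless of the excitation.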
The McCulloch-Pitts Neuron
The McCulloch-Pitts Neuron is usually referred to as a threshold logic unit (TLU)
It is designed to compute simple logical functions of r arguments where y = 1 corresponds to true and y = 0 to false.
Example of logical functions AND and OR for three inputs
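For three inputs, AND and OR fall out of a single threshold choice: θ = r = 3 gives AND (all inputs must be on), θ = 1 gives OR (at least one input on). A minimal sketch (function names are my own):

```python
def tlu(x, theta):
    # Threshold logic unit: output 1 (true) iff the input sum reaches theta
    return int(sum(x) >= theta)

def AND3(x):
    # AND of three binary inputs: fires only when every input is 1 (theta = 3)
    return tlu(x, theta=3)

def OR3(x):
    # OR of three binary inputs: fires when at least one input is 1 (theta = 1)
    return tlu(x, theta=1)
```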
Logic functions
The AND and OR functions form the basis set of logical functions.
Other logical functions can be computed by building up larger networks consisting of several layers of McCulloch-Pitts neurons
Problem: there are no adjustable parameters or weights in the network
The Single-Layer Perceptrons
A perceptron is a McCulloch-Pitts neuron, but with each input $x_i$ equipped with a real-valued connection weight $\beta_i$, $i = 1, \ldots, r$.
The Single-Layer Perceptrons
A weighted sum of the input values, $U = \sum_{j=1}^{r} \beta_j x_j$, is computed, and the output $y = 1$ only if $U \geq \theta$, where $\theta$ is the threshold; otherwise $y = 0$.
The threshold can be absorbed by introducing a bias element $\beta_0 = -\theta$, so that $U - \theta = \beta_0 + U$, and then comparing $U = \sum_{j=0}^{r} \beta_j x_j$ to 0 with $x_0 = 1$.
If $U \geq 0$, then $y = 1$; otherwise $y = 0$.
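The absorbed-threshold computation can be sketched directly (a minimal illustration; the function name is my own):

```python
import numpy as np

def perceptron_output(x, beta, beta0):
    # y = 1 iff beta0 + x.beta >= 0, with the threshold absorbed into the bias
    U = beta0 + np.dot(x, beta)
    return int(U >= 0)
```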
Feedforward Single-Layer Networks
In a feedforward network, information flows in one direction only from input nodes to output nodes.
The network nodes are organized into two separate groups: $r$ input nodes $x_1, \ldots, x_r$ and $s$ output nodes $y_1, \ldots, y_s$
Only the output nodes involve significant amounts of computation
The input nodes involve no computation and hence do not count as a layer of learnable nodes.
Feedforward Single-Layer Networks
Every connection $x_j \to y_l$ between the input nodes and the output nodes carries a connection weight $\beta_{jl}$, which quantifies the strength of the connection
Positive weights represent excitatory signals, negative weights represent inhibitory signals, and zero weights represent connections that do not exist in the network.
The architecture of the network consists of the nodes, the directed edges and the connection weights.
Activation functions
Let $X = (X_1, \ldots, X_r)^\top$ represent a random input vector and $x = (x_1, \ldots, x_r)^\top$ a realization of $X$.
Given $x$, each output node computes an activation value using
$$U_l = \beta_{0l} + \sum_{j=1}^{r} \beta_{jl} x_j = \beta_{0l} + x^\top \beta_l$$
where $\beta_{0l}$ is a constant or bias related to the threshold for the neuron to fire, and $\beta_l = (\beta_{1l}, \ldots, \beta_{rl})^\top$ is an $r$-vector of connection weights, $l = 1, \ldots, s$.
Activation functions
In matrix form,
$$U = \beta_0 + BX$$
where $U = (U_1, \ldots, U_s)^\top$, $\beta_0 = (\beta_{01}, \ldots, \beta_{0s})^\top$ is an $s$-vector of biases, and $B = (\beta_1, \ldots, \beta_s)^\top$ is an $(s \times r)$ matrix of connection weights.
The activation values are then filtered through a nonlinear threshold activation function $f(U_l)$ to form the value of the $l$th output node, $l = 1, \ldots, s$:
$$f(U) = f(\beta_0 + BX)$$
where $f(U) = (f(U_1), \ldots, f(U_s))^\top$ is an $s$-vector function. The simplest form of $f$ is the identity function $f(u) = u$.
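The matrix-form computation is a few lines of NumPy; the sketch below uses a logistic $f$ and small arbitrary dimensions (the shapes and random weights are illustrative assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
r, s = 4, 3                       # number of input and output nodes
B = rng.normal(size=(s, r))       # (s x r) connection-weight matrix
beta0 = rng.normal(size=s)        # s-vector of biases
x = rng.normal(size=r)            # one realization of the input vector

U = beta0 + B @ x                 # activation values U_l, one per output node
y = 1.0 / (1.0 + np.exp(-U))      # elementwise logistic activation f(U)
```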
Activation functions
Example of activation functions
Activation functions
The most widely used functions are the sigmoidal (S-shaped) functions, such as the logistic and hyperbolic tangent functions
Activation functions
A sigmoidal function $\sigma(\cdot)$ has the following properties: $\sigma(u) \to 0$ as $u \to -\infty$ and $\sigma(u) \to 1$ as $u \to +\infty$
A sigmoidal function $\sigma(\cdot)$ is symmetric if $\sigma(+u) + \sigma(-u) = 1$ and asymmetric if $\sigma(+u) + \sigma(-u) = 0$
The logistic function is symmetric, whereas the tanh function is asymmetric.
If $f(u) = (1 + e^{-u})^{-1}$, then
$$\frac{\partial f(u)}{\partial u} = \frac{e^{-u}}{(1+e^{-u})^2} = f(u)(1 - f(u)).$$
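The identity $f'(u) = f(u)(1 - f(u))$ is easy to check numerically against a central-difference derivative (a quick sanity check, not part of the original slides):

```python
import numpy as np

f = lambda u: 1.0 / (1.0 + np.exp(-u))   # logistic function

u = np.linspace(-5.0, 5.0, 101)
h = 1e-6
numeric = (f(u + h) - f(u - h)) / (2 * h)  # central-difference derivative
closed_form = f(u) * (1.0 - f(u))          # the identity derived above
```

The two agree to within the discretization error of the finite difference.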
Rosenblatt’s Single-Unit Perceptron
In a binary classification problem, each of the $n$ input vectors $x_1, \ldots, x_n$ is to be classified as a member of one of two classes, $\Pi_1$ or $\Pi_2$.
A single-layer feedforward neural network consisting of only a single output node can be used for this type of application
A single-unit perceptron (Rosenblatt) is a single-layer feedforward network with a single output node that computes a linear combination of the input variables, $\beta_0 + x^\top \beta$, and delivers its sign,
$$\operatorname{sign}(\beta_0 + x^\top \beta)$$
which is essentially the threshold logic unit (with the threshold taken as zero).
Rosenblatt’s Single-Unit Perceptron
A generalized version of the single-unit perceptron is
$$f(\beta_0 + x^\top \beta)$$
where $f(\cdot)$ is an activation function, usually sigmoidal.
The Perceptron learning rule
Assume $x_1, \ldots, x_n$ are independent realizations of $X$, drawn from two classes $\Pi_1$ and $\Pi_2$.
Assume the observations are linearly separable → there is a $\beta^*$ such that the hyperplane $x^\top \beta^* = 0$ separates the data $x_1, \ldots, x_n$.
The update rule is a gradient-descent algorithm which operates sequentially on each input vector → online learning (classification errors are corrected as they occur)
The input vectors are examined one at a time and classified to one of the two classes.
The Perceptron learning rule
The algorithm proceeds by relabeling the $\{x_i\}$ one at a time, so that at the $h$th iteration we are dealing with $x_h$, $h = 1, 2, \ldots$
If $\beta_h$ correctly classifies $x_h$, we do not change it: $\beta_{h+1} = \beta_h$ (if either $x_h^\top \beta_h \geq 0$ and $x_h \in \Pi_1$, or $x_h^\top \beta_h < 0$ and $x_h \in \Pi_2$)
If $\beta_h$ misclassifies $x_h$, the connection weight vector is updated as follows:
if $x_h^\top \beta_h \geq 0$ but $x_h \in \Pi_2$, then $\beta_{h+1} = \beta_h - \eta x_h$
if $x_h^\top \beta_h < 0$ but $x_h \in \Pi_1$, then $\beta_{h+1} = \beta_h + \eta x_h$, where $\eta$ is the learning-rate parameter (taken here as $\eta = 1$)
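The two update cases can be condensed into one line by encoding the classes as $t = +1$ for $\Pi_1$ and $t = -1$ for $\Pi_2$, so that a misclassified point always contributes $\eta\, t\, x$. A sketch under that encoding (the helper name, epoch cap and stopping rule are my own; the input matrix is assumed to carry a leading column of 1s so the bias is learned as $\beta_0$):

```python
import numpy as np

def perceptron_train(X, labels, eta=1.0, max_epochs=100):
    """Online perceptron updates; labels are +1 (class Pi_1) / -1 (class Pi_2)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(X, labels):
            predicted = 1 if x @ beta >= 0 else -1
            if predicted != t:            # misclassified: move beta by eta * t * x
                beta = beta + eta * t * x
                errors += 1
        if errors == 0:                   # every point correctly classified
            break
    return beta
```

On linearly separable data the loop terminates with zero errors, consistent with the convergence result on the next slide.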
The Perceptron learning rule
If a solution vector $\beta^*$ exists, the algorithm will find a solution in a finite number $h_{\max}$ of iterations.
The perceptron can learn to distinguish two classes only if the classes are linearly separable; this, however, is not always the case.
Artificial intelligence and expert systems
The shortcomings of the perceptron led to the introduction of rule-based expert systems as the main area of research into AI.
In early AI systems, problems were solved in a sequential step-by-step fashion using a set of rules /knowledge available on a particular subject.
Expert systems are knowledge-based systems, where “knowledge” represents a repository of data, well-known facts, specialized information, and heuristics which experts in a field (e.g., medicine) would agree upon. Such expert systems are interactive computer programs that provide users (e.g., physicians) with computer-based consultative advice.
Expert systems were successful only in specialized situations and were not able to learn from their own experiences
Artificial intelligence and expert systems
Expert systems didn’t “possess” cognition, which was the primary goal of AI.
The failure was due to the fact that they did not deliver a realistic model of the structure of the brain
Whereas human brains consist of massively parallel systems of neurons, AI digital computers were serial machines; overall, they were incredibly slow by comparison.
Multilayer Perceptron
The limitations of the perceptron could be overcome by “layering” the perceptrons and applying nonlinear transformations prior to combining the transformed weighted inputs
Multilayer Perceptron
A multilayer feedforward neural network (perceptron) is a multivariate statistical technique that maps the inputs $X = (X_1, \ldots, X_r)^\top$ nonlinearly to the output variables $Y = (Y_1, \ldots, Y_s)^\top$.
Between the input and output variables there are hidden variables arranged in layers
The hidden and output variables are traditionally called nodes, neurons or processing units.
ANN can be used to model regression or classification problems.
Network architecture
A multilayer perceptron has $r$ input nodes $x_1, \ldots, x_r$
One or more layers of hidden nodes → “hidden layers”
$s$ output nodes $y_1, \ldots, y_s$
If there is a single hidden layer the network is a“two-layer” network
If there are L hidden layers, the network is described as being an (L + 1)−layer network
There are fully connected networks and partially connected networks
A single hidden layer
Assume we have a two-layer network with $r$ input nodes $x_i$, $i = 1, \ldots, r$
A single layer ($L = 1$) of $t$ hidden nodes $z_j$, $j = 1, \ldots, t$, and $s$ output nodes $y_k$, $k = 1, \ldots, s$
$\beta_{mj}$ is the weight of the connection $x_m \to z_j$, with bias $\beta_{0j}$: $z_j = f_j(U_j)$ where $U_j = \beta_{0j} + x^\top \beta_j$, $j = 1, \ldots, t$
$\alpha_{jk}$ is the weight of the connection $z_j \to y_k$, with bias $\alpha_{0k}$: $\mu_k(x) = g_k(v_k)$ where $v_k = \alpha_{0k} + z^\top \alpha_k$, $k = 1, \ldots, s$
Here $\beta_j = (\beta_{1j}, \ldots, \beta_{rj})^\top$ and $\alpha_k = (\alpha_{1k}, \ldots, \alpha_{tk})^\top$.
A single hidden layer
The value of the $k$th output node can be expressed as $y_k = \mu_k(x) + \varepsilon_k$, with
$$\mu_k(x) = g_k\left(\alpha_{0k} + \sum_{j=1}^{t} \alpha_{jk}\, f_j\left(\beta_{0j} + \sum_{m=1}^{r} \beta_{mj} x_m\right)\right)$$
where $k = 1, \ldots, s$, and $f_j(\cdot)$, $j = 1, \ldots, t$, and $g_k(\cdot)$, $k = 1, \ldots, s$, are activation functions for the hidden and output layers of nodes, respectively.
The activations {fj (.)} are usually taken to be nonlinear continuous functions with sigmoidal shape (logistic or tanh)
The functions {gk(.)} are often taken to be linear (regression) or sigmoidal (classification).
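A forward pass through this architecture is short to write down; the sketch below uses logistic hidden activations and linear outputs (the regression case), with small arbitrary dimensions and random weights purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
r, t, s = 3, 5, 2                 # input, hidden and output dimensions

B  = rng.normal(size=(t, r))      # beta_mj: input-to-hidden weights
b0 = rng.normal(size=t)           # hidden biases beta_0j
A  = rng.normal(size=(s, t))      # alpha_jk: hidden-to-output weights
a0 = rng.normal(size=s)           # output biases alpha_0k

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

x  = rng.normal(size=r)           # one input vector
z  = sigmoid(b0 + B @ x)          # hidden-node values z_j = f_j(U_j)
mu = a0 + A @ z                   # output activations (linear g_k, regression case)
```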
A single hidden layer
The error terms $\varepsilon_k$ can be taken to be $N(0, \sigma^2)$.
Example
Let $s = 1$, so that we have a single output node
Assume that all hidden nodes in the single hidden layer have the same sigmoidal activation function $\sigma(\cdot)$
We further take the output activation function $g(\cdot)$ to be linear
Then the network output reduces to $y = \mu(x) + \varepsilon$, where
$$\mu(x) = \alpha_0 + \sum_{j=1}^{t} \alpha_j\, \sigma\left(\beta_{0j} + \sum_{m=1}^{r} \beta_{mj} x_m\right)$$
and the network is equivalent to a single layer perceptron.
If, alternatively, both $f(\cdot)$ and $g(\cdot)$ are linear, then we just have a linear combination of the inputs.
Approximation of continuous functions
The use of neural networks is motivated by Kolmogorov's universal approximation theorem, which states:
Any continuous function defined on a compact subset of Rr can be uniformly approximated (in an appropriate metric) by a function of the form μ(x).
In other words, we can approximate a continuous function by a two-layer network incorporating a single hidden layer, with a large number of hidden nodes of continuous sigmoidal nonlinearities, linear output units, and suitable connection weights.
Furthermore, the closer the approximation desired, the larger the number of hidden nodes required.
This result doesn’t specify how to find that approximation: how to determine the weights, or the number $t$ of hidden nodes
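One way to see the approximation property in action without any training algorithm is to draw the hidden weights at random and fit only the linear output weights by least squares (a random-features sketch under assumed scales and seed; the target function, node count and error tolerance are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-3.0, 3.0, 200)
target = np.sin(2.0 * x)                  # continuous function to approximate

t = 50                                     # number of hidden nodes
w = rng.normal(scale=3.0, size=t)          # random input weights
b = rng.uniform(-9.0, 9.0, size=t)         # random biases, spreading the sigmoid centres

# Hidden-layer activations: sigma(w_j x + b_j) for each node j
Z = 1.0 / (1.0 + np.exp(-(np.outer(x, w) + b)))
Z1 = np.column_stack([np.ones_like(x), Z])       # prepend a column for alpha_0

# Linear output unit: fit the output weights alpha by least squares
alpha, *_ = np.linalg.lstsq(Z1, target, rcond=None)
rmse = np.sqrt(np.mean((Z1 @ alpha - target) ** 2))
```

With 50 sigmoidal nodes the fit is already close; adding more nodes tightens the approximation, consistent with the statement above.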
More than one hidden layer
Matrix notation for the one-hidden-layer neural network is given by
$$\mu(X) = g\left(\alpha_0 + A f(\beta_0 + BX)\right)$$
$B = (\beta_{ij})$ is a $(t \times r)$ matrix of weights between the input nodes and the hidden layer
$A = (\alpha_{jk})$ is an $(s \times t)$ matrix of weights between the hidden layer and the output layer
$\beta_0 = (\beta_{01}, \ldots, \beta_{0t})^\top$ and $\alpha_0 = (\alpha_{01}, \ldots, \alpha_{0s})^\top$ are the bias vectors
$f = (f_1, \ldots, f_t)^\top$ and $g = (g_1, \ldots, g_s)^\top$ are the vectors of nonlinear activation functions
More than one hidden layer
When $\{f_j\}$ and $\{g_k\}$ are taken to be the identity function, we obtain
$$\mu(X) = \mu + ABX$$
where $\mu = \alpha_0 + A\beta_0$
We could therefore use the single matrix $C = AB$ in a single-layer network and the result would be the same
The results change only when we use nonlinear activation functions at the hidden nodes.
A network with r input nodes, a single hidden layer with t nodes, s output nodes, and sigmoidal activation functions at the hidden nodes can be viewed as a nonlinear generalization of multivariate reduced-rank regression.
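The collapse of a two-layer linear network to $\mu + ABX$ is easy to verify numerically (a quick check with arbitrary random weights and dimensions of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(2)
r, t, s = 4, 3, 2
B, A = rng.normal(size=(t, r)), rng.normal(size=(s, t))
b0, a0 = rng.normal(size=t), rng.normal(size=s)

X = rng.normal(size=r)

# Two-layer network with identity activations at every node
two_layer = a0 + A @ (b0 + B @ X)

# Equivalent single-layer form: mu + (AB) X with mu = a0 + A b0
single_layer = (a0 + A @ b0) + (A @ B) @ X

assert np.allclose(two_layer, single_layer)
```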
Learning Criteria
The parameter vector $\theta$, which has $(st + rt + t + s)$ components, is obtained via the minimization of the error sum of squares
$$ESS(\theta) = \sum_{i=1}^{n} \| y_i - \tilde{y}_i \|^2$$
For multiclass classification problems,
$$E(\theta) = -\sum_{i=1}^{n} \sum_{k} y_{i,k} \log \tilde{y}_{i,k}, \qquad \tilde{y}_{i,k} = \frac{e^{v_{i,k}}}{\sum_{l} e^{v_{i,l}}}$$
where $y_{i,k} = 1$ if $x_i \in \Pi_k$ and zero otherwise, and $v_{i,k} = \alpha_{0,k} + z_i^\top \alpha_k$ is the value of $v_k$ for the $i$th input vector $x_i$.
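The softmax-and-cross-entropy criterion can be sketched as follows (the function name and the max-shift stabilization trick are my own additions; the formula itself follows the definitions above):

```python
import numpy as np

def softmax_cross_entropy(V, Y):
    """E(theta) for one-hot targets Y and output activations V, both (n x s).

    Row i of V holds the activations v_{i,k}; row i of Y is the one-hot
    class indicator (y_{i,k} = 1 iff x_i belongs to class Pi_k).
    """
    V = V - V.max(axis=1, keepdims=True)              # shift for numerical stability
    Ytilde = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)   # softmax outputs
    return -np.sum(Y * np.log(Ytilde))
```

For two classes with equal activations the loss per observation is $-\log(1/2) = \log 2$; raising the activation of the true class lowers it.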
Projection pursuit regression
The regression function is taken to be
$$\mu(x) = \alpha_0 + \sum_{j=1}^{t} f_j\left(\beta_{0j} + x^\top \beta_j\right)$$
or
$$\mu(x) = \alpha_0 + \sum_{j=1}^{t} \alpha_j\, f_j\left(\beta_{0j} + x^\top \beta_j\right)$$
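Evaluating the first form is a one-liner once the ridge functions $f_j$ and directions $\beta_j$ are given; the sketch below is a generic evaluator (the function and argument names are my own, and in practice the $f_j$ would be estimated from data rather than fixed):

```python
import numpy as np

def ppr_mu(x, alpha0, beta0s, betas, ridge_fns):
    # mu(x) = alpha_0 + sum_j f_j(beta_0j + x.beta_j)   (the first form above)
    return alpha0 + sum(
        f(b0 + x @ b) for f, b0, b in zip(ridge_fns, beta0s, betas)
    )
```

For example, with a single tanh ridge function along the first coordinate, the value at a point on the second axis is just the intercept.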