Machine Learning Lecture:
Two-Layer Artificial Neural Networks (ANNs)
C.-C. Hung
Slides used in the classroom only

Textbook
Chapter 18 (Section 18.7), pages 727–737.

Outline
What are ANNs?
Biological Neural Networks
ANN – The basics
Feed forward net
Training
Testing
Example – Voice recognition
Some ANNs
Recurrency
Elman nets
Hopfield nets
Characterizing artificial neural networks

What are Artificial Neural Networks?
Models of the brain and nervous system
Highly parallel
Process information much more like the brain than a serial computer
Learning
Very simple principles
Very complex behaviours
Applications
As powerful problem solvers
As biological models

Biological Neural Nets
Pigeons as art experts (Watanabe et al. 1995)

Experiment:
Pigeon in Skinner box (psychological experiments).
Present paintings of two different artists (e.g. Chagall / Van Gogh).
Reward for pecking when presented a particular artist (e.g. Van Gogh).


Chagall and his paintings
Marc Chagall was born Moishe/Marc Shagal in Liozne, near Vitebsk, in modern day Belarus, in 1887. He was a Russian-French-Jewish artist of international repute who, arguably, was one of the most influential modernist artists of the 20th Century, both as an early modernist, and as an important part of the Jewish artistic tradition. He distinguished himself in many arenas: as a painter, book illustrator, ceramicist, stained-glass painter, stage set designer and tapestry maker. Widely admired by both his contemporaries, and by later artists, he forged his creative path in spite of the many difficulties and injustices he faced in his long lifetime.
https://www.marcchagall.net/

Van Gogh and his paintings
Between November of 1881 and July of 1890, Vincent van Gogh painted almost 900 paintings. Since his death, he has become one of the most famous painters in the world. Van Gogh’s paintings have captured the minds and hearts of millions of art lovers and have made art lovers of those new to the world of art. The following excerpts are from letters that Van Gogh wrote expressing how he evolved as a painter. There are also links to pages describing some of Vincent van Gogh’s most famous paintings, Starry Night, Sunflowers, Irises, Poppies, The Bedroom, Blossoming Almond Tree, The Mulberry Tree, The Night Café, and The Potato Eaters, in great detail.
https://www.vangoghgallery.com/painting/

Canal with Women Washing, 1888

Pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy (when presented with pictures they had been trained on)

Discrimination still 85% successful for previously unseen paintings of the artists

Pigeons do not simply memorize the pictures
They can extract and recognize patterns (the ‘style’)
They generalize from the already seen to make predictions

This is what neural networks (biological and artificial) are good at (unlike conventional computers)

Biological inspiration

Dendrites – Input
Soma (cell body) – Processing Element
Axon – Output

Biological inspiration

synapses

axon
dendrites
Information transmission happens at the synapses (i.e., the weights).

Biological inspiration
The spikes travelling along the axon of the pre-synaptic neuron trigger the release of neurotransmitter substances at the synapse.
The neurotransmitters cause excitation or inhibition in the dendrite of the post-synaptic neuron.
The integration of the excitatory and inhibitory signals may produce spikes in the post-synaptic neuron.
The contribution of the signals depends on the strength of the synaptic connection (i.e. weights).

Architecture of a typical artificial neural network
(Figure: input signals flow through layers of nodes connected by weights, producing output signals.)

Analogy between biological and artificial neural networks
Biological neural network -> Artificial neural network
Soma -> Neuron
Dendrite -> Input
Axon -> Output
Synapse -> Weight

Non-Symbolic Representations
Decision trees can be easily read
A disjunction of conjunctions (logic)
We call this a symbolic representation
Non-symbolic representations
More numerical in nature, more difficult to read
Artificial Neural Networks (ANNs)
A Non-symbolic representation scheme
They embed a giant mathematical function
To take inputs and compute an output which is interpreted as a categorization
Often shortened to “Neural Networks”
Don’t confuse them with real neural networks (in heads)

ANNs – The basics
ANNs incorporate the two fundamental components of biological neural nets:

Neurons (nodes)
Synapses (weights)

Neuron vs. Node

Structure of a node in ANN
Squashing function limits the node's output (see figure).

Synapse vs. weight

Feed-forward nets
Information flow is unidirectional
Data is presented to Input layer
Passed on to Hidden Layer
Passed on to Output layer
Information is distributed
Information processing is parallel

Internal representation (interpretation) of data

Feeding data through the net:

(1 × 0.25) + (0.5 × (-1.5)) = 0.25 + (-0.75) = -0.5

Squashing using the sigmoid function: 1 / (1 + e^(0.5)) ≈ 0.3775
Feed-forward nets: Output
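To make the two steps above concrete, here is a minimal Python sketch of the weighted sum and the sigmoid squashing for the example node (inputs 1 and 0.5, weights 0.25 and -1.5 are taken from the slide; the code itself is illustrative and not part of the original material):

```python
import math

def sigmoid(s):
    """Squashing function: maps any weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

inputs = [1.0, 0.5]        # activations arriving at the node
weights = [0.25, -1.5]     # connection weights

# Weighted sum: (1 * 0.25) + (0.5 * -1.5) = -0.5
s = sum(x * w for x, w in zip(inputs, weights))
print(s)             # -0.5
print(sigmoid(s))    # ~0.3775
```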

Data is presented to the network in the form of activations in the input layer
Examples
Pixel intensity (for pictures)
Molecule concentrations (for artificial nose)
Share prices (for stock market prediction)
Data usually requires preprocessing
Analogous to senses in biology
How to represent more abstract data, e.g. a name?
Choose a pattern, e.g.
0-0-1 for “Chris”
0-1-0 for “Becky”

Feed-forward nets: Input

Weight settings determine the behaviour of a network

 How can we find the right weights?
Answer: Training
Feed-forward nets: Weights

Training the Network – Learning

Advantages of backpropagation training
It works!
Relatively fast.
Downsides
Requires a training set.
Training can be slow.
Probably not biologically realistic.
Alternatives to Backpropagation
Hebbian learning: Not successful in feed-forward nets.
Reinforcement learning: Only limited success.
Artificial evolution (Genetic Algorithm)
More general, but can be even slower than backpropagation.

Feed-forward nets

Example: Voice Recognition
Task: Learn to discriminate between two different voices saying “Hello”.
Data
Sources
Steve Simpson
David Raubenheimer
Format
Frequency distribution (60 bins)
Analogy: cochlea

Network architecture
Feed-forward network
60 input (one for each frequency bin)
6 hidden
2 output (0-1 for “Steve”, 1-0 for “David”)

Example: Voice Recognition
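As an illustration of the 60-6-2 architecture above, here is a hedged sketch of a single forward pass in plain Python. The weights are random (untrained) and the input spectrum is a stand-in for real frequency-bin data, so only the shapes match the slide:

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weight_rows):
    """Fully connected layer: one weighted sum plus squash per output node."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row))) for row in weight_rows]

random.seed(0)
n_in, n_hidden, n_out = 60, 6, 2   # one input per frequency bin, 6 hidden, 2 outputs

w_hidden = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
w_out = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]

spectrum = [random.random() for _ in range(n_in)]   # stand-in for a "Hello" spectrum
hidden = layer(spectrum, w_hidden)
output = layer(hidden, w_out)
print(output)   # after training, ~(0, 1) would mean "Steve" and ~(1, 0) "David"
```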

Results – Voice Recognition

Performance of trained network
Discrimination accuracy between known “Hello”s
100%
Discrimination accuracy between new “Hello”s
100%

Example: Voice Recognition

Results – Voice Recognition (cont'd)

Network has learnt to generalize from original data.
Networks with different weight settings can have same functionality.
Trained networks ‘concentrate’ on lower frequencies.
Network is robust against non-functioning nodes.

Example: Voice Recognition

Pattern recognition
Character recognition
Face Recognition
Sonar mine/rock recognition (Gorman & Sejnowski, 1988)
Navigation of a car (Pomerleau, 1989)
Stock-market prediction
Pronunciation (NETtalk)

(Sejnowski & Rosenberg, 1987)
Applications of Feed-forward nets

Function Learning of Neural Networks
Map categorization learning to numerical problem
Each category given a number.
Or a range of real valued numbers (e.g., 0.5 – 0.9).
Function learning examples
Input = 1, 2, 3, 4 Output = 1, 4, 9, 16
Here the concept to learn is squaring integers
Input = [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]
Output = 1, 5, 11, 19
Here the concept is: [a, b, c] -> a*c – b
The calculation is more complicated than in the first example.
Neural networks:
Calculation is much more complicated in general.
But it is still just a numerical calculation.
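To make the two example concepts above concrete, the target functions can be written directly; the network, of course, has to discover them from input/output pairs alone. This small check is illustrative only:

```python
def square(n):
    """First concept: squaring integers."""
    return n * n

def abc_concept(triple):
    """Second concept: [a, b, c] -> a*c - b."""
    a, b, c = triple
    return a * c - b

print([square(n) for n in [1, 2, 3, 4]])    # [1, 4, 9, 16]
print([abc_concept(t) for t in ([1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6])])
# [1, 5, 11, 19]
```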

Complicated Example:
Categorizing Vehicles
Input to function: pixel data from vehicle images
Output: 1 for a car; 2 for a bus; 3 for a tank

(Figure: four example vehicle images, with outputs 3, 2, 1 and 1.)

So, what functions can we use?
Biological motivation:
The brain does categorisation tasks like this easily.
The brain is made up of networks of neurons.
Naturally occurring neural networks
Each neuron is connected to many others.
Input to one neuron is the output from many others.
Neuron “fires” if a weighted sum S of inputs > threshold.
Artificial neural networks
Similar hierarchy with neurons firing.
Don’t take the analogy too far
Human brains: 100,000,000,000 neurons
ANNs: usually < 1,000 neurons
ANNs are a gross simplification of real neural networks

Recurrent Networks
Feed-forward networks:
Information only flows one way
One input pattern produces one output
No sense of time (or memory of previous state)
Recurrency:
Nodes connect back to other nodes or themselves
Information flow is multidirectional
Sense of time and memory of previous state(s)
Biological nervous systems show high levels of recurrency (but feed-forward structures exist too)

Elman Nets
Elman nets are feed-forward networks with partial recurrency.
Unlike feed-forward nets, Elman nets have a memory or sense of time.
Classic experiment on language acquisition and processing (Elman, 1990)
Task: Elman net to predict successive words in sentences.
Data: suite of sentences, e.g. "The boy catches the ball.", "The girl eats an apple."
Words are input one at a time
Representation: binary representation for each word, e.g. 0-1-0-0-0 for "girl"
Training method: backpropagation

Hopfield Networks
Sub-type of recurrent neural nets
Fully recurrent
Weights are symmetric
Nodes can only be on or off
Random updating
Learning: Hebb rule (cells that fire together wire together)
Biological equivalent to LTP and LTD
Can recall a memory, if presented with a corrupt or incomplete version
-> called auto-associative or content-addressable memory

Hopfield Networks
Task: store images with a resolution of 20x20 pixels
-> a Hopfield net with 400 nodes
Memorize:
1. Present image
2. Apply Hebb rule (cells that fire together, wire together): increase the weight between two nodes if both have the same activity, otherwise decrease it
3. Go to 1
Recall:
1. Present incomplete pattern
2. Pick a random node and update it
3. Go to 2 until settled
Memories are attractors in state space

Recap: Artificial neural networks
An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.

General idea
Numbers enter the input layer, propagate through the hidden layers, and emerge as numbers at the output layer, one value per category.
Each output value is calculated using all the input unit values.
Choose the category with the largest output value (Cat A in the slide's example).

Artificial Neural Network
Trying to mimic the brain? A mathematical model?
(Biological terms on the slide: axon, dendrite, cell body, synapse.)

Characterizing Artificial Neural Networks: Three Elements
Topology (architecture)
Transfer function (squashing function)
Training algorithm (learning algorithm)

What is an artificial Neuron?
Input to neuron -> Sum S -> Transfer function -> Output from neuron
The McCulloch-Pitts model
The transfer function is the same as the squashing function or activation function.

Artificial neurons
The McCulloch-Pitts model:
Spikes are interpreted as spike rates.
Synaptic strengths are translated as synaptic weights.
Excitation means a positive product between the incoming spike rate and the corresponding synaptic weight.
Inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.

Artificial neurons
Nonlinear generalization of the McCulloch-Pitts neuron: y = f(x, w)
y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights.
Examples:
Sigmoidal neuron: y = 1 / (1 + e^(-w^T x - a))
Gaussian neuron: y = e^(-||x - w||^2 / (2a^2))

Activation/Squashing functions
Please note that X is the Sum S (from the previous page).

Representation of Information
If ANNs can correctly identify vehicles
They then contain some notion of "car", "bus", etc.
The categorisation is produced by the nodes:
Exactly how the input reals are turned into outputs.
But, in practice:
Each unit does the same calculation
But it is based on the weighted sum of inputs to the unit
So, the weights in the weighted sum are where the information is really stored
We draw weights on to the ANN diagrams (see later)
"Black Box" representation:
Useful knowledge about the learned concept is difficult to extract

ANN learning problem
Given a categorisation to learn (expressed numerically)
And training examples/samples represented numerically
With the correct categorisation for each example
Learn a neural network using the examples which produces the correct output for unseen examples
Boils down to:
(a) Choosing the correct network architecture (topology): number of hidden layers, number of units, etc.
(b) Choosing (the same) function for each unit (transfer function)
(c) Training the weights between units to work correctly (training algorithm)

Special Cases
Generally, we can have many hidden layers
In practice, usually only one or two
Next lecture: ANNs with one hidden layer (multi-layer ANNs)
This lecture: ANNs with no hidden layer (two-layer ANNs)
Example: Perceptrons

Perceptrons
Multiple input nodes
Single output node
Takes a weighted sum of the inputs, call this S
Unit function calculates the output for the network
Useful to study because:
We can use perceptrons to build larger networks
Perceptrons have limited representational abilities
We will look at concepts they can't learn later

Squashing Functions
Linear functions
Simply output the weighted sum
Threshold functions
Output low values
Until the weighted sum gets over a threshold
Then output high values
Equivalent of "firing" of neurons
Step function:
Output is +1 if S > Threshold T.
Output is –1 otherwise.
Sigma function:
Similar to step function but differentiable (next lecture).

(Figure: step function and sigma function curves.)
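A minimal sketch of the two unit functions described above (the ±1 outputs and the threshold follow the slides; the sigma function shown is the standard logistic curve):

```python
import math

def step(s, threshold=0.0):
    """Step function: fires (+1) when the weighted sum exceeds the threshold."""
    return 1 if s > threshold else -1

def sigma(s):
    """Sigma function: like the step, but smooth and differentiable."""
    return 1.0 / (1.0 + math.exp(-s))

print(step(0.3), step(-0.3))                          # 1 -1
print(round(sigma(0.3), 3), round(sigma(-0.3), 3))    # 0.574 0.426
```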

Example: Perceptron
Categorisation of 2 x 2 pixel black & white images
Into “bright” and “dark”
Representation of this rule:
If it contains 2, 3 or 4 white pixels, it is “bright”
If it contains 0 or 1 white pixels, it is “dark”
Perceptron architecture:
Four input nodes, one for each pixel.
One output node: +1 for bright, -1 for dark.

Example: Perceptron
Example calculation: x1=-1, x2=1, x3=1, x4=-1
S = 0.25*(-1) + 0.25*(1) + 0.25*(1) + 0.25*(-1) = 0
0 > -0.1, so the output from the ANN is +1
So the image is categorised as “bright”
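The same calculation in code, using the weights (0.25 per pixel) and the threshold of -0.1 implied by the comparison above, and assuming white pixels are coded +1; this sketch is for illustration only and is not part of the original material:

```python
weights = [0.25, 0.25, 0.25, 0.25]   # one weight per pixel
threshold = -0.1

def brightness(pixels):
    """Categorise a 2x2 image (pixels are +1 for white, -1 for black)."""
    s = sum(w * x for w, x in zip(weights, pixels))
    return +1 if s > threshold else -1   # +1 = "bright", -1 = "dark"

print(brightness([-1, 1, 1, -1]))   # S = 0 > -0.1, so +1: "bright"
```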

Learning in Perceptrons
Need to learn
Both the weights between input and output units.
And the value for the threshold (What is this? Next slide)
Make calculations easier by
Thinking of the threshold as a weight from a special input unit where the output from the unit is always 1.
Exactly the same result
But we only have to worry about learning weights.

New Representation
for Perceptrons
Special Input Node
Always produces 1 (called Bias Neuron)
The threshold has become a weight from the bias node; the unit now fires when the full weighted sum (bias weight included) exceeds zero.

Bias Neuron
The bias neuron is a special neuron added to each layer in the neural network, which simply stores the value of 1. This makes it possible to move or “translate” the activation function left or right on the graph.
Without a bias neuron, each neuron takes the input and multiplies it by a weight, with nothing else added to the equation. For example, it is not possible to input a value of 0 and output 2. In many cases, it is necessary to move the entire activation function to the left or right to generate the required output values—this is made possible by the bias.
Although neural networks can work without bias neurons, in reality, they are almost always added, and their weights are estimated as part of the overall model.
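A small sketch of the reformulation described above: a perceptron with an explicit threshold, and the equivalent perceptron whose threshold has been folded into a weight from a bias input that is always 1. The values reuse the bright/dark example; the equivalence w0 = -threshold is the point being illustrated:

```python
def with_threshold(x, w, threshold):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > threshold else -1

def with_bias(x, w, w0):
    # The bias node always outputs 1, so its weight w0 plays the role of -threshold.
    s = w0 * 1 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1

x = [-1, 1, 1, -1]
w = [0.25, 0.25, 0.25, 0.25]
print(with_threshold(x, w, threshold=-0.1))   # +1
print(with_bias(x, w, w0=0.1))                # +1: same behaviour, only weights to learn
```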

Learning Algorithm
Weights are set randomly initially
For each training example E
Calculate the observed output from the ANN, o(E)
If the target output t(E) is different to o(E)
Then tweak all the weights so that o(E) gets closer to t(E)
Tweaking is done by perceptron training rule (next slide)
This routine is done for every input example E

Don’t necessarily stop when all examples used
Repeat the cycle again (called an ‘epoch’)
Until the ANN produces the correct output
For all the examples in the training set (or good enough)

Perceptron Training Rule
When t(E) is different to o(E)
Add on Δi to weight wi
Where Δi = η(t(E)-o(E))xi
Do this for every weight in the network
η is the learning rate (between 0.0 and 1.0) and xi is input.
Interpretation:
(t(E) – o(E)) will either be + or –
So we can think of the addition of Δi as the movement of the weight in a direction
Which will improve the network's performance with respect to E
Multiplication by xi
Moves it more if the input is bigger
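A minimal sketch of this training rule in Python, assuming the bias-node representation from the previous slides (the constant input 1 and its weight are included in x and weights):

```python
def perceptron_update(weights, x, target, eta=0.1):
    """Apply the perceptron training rule once: w_i += eta * (t(E) - o(E)) * x_i."""
    s = sum(w * xi for w, xi in zip(weights, x))
    observed = 1 if s > 0 else -1
    if observed != target:
        weights = [w + eta * (target - observed) * xi for w, xi in zip(weights, x)]
    return weights, observed
```

The worked example on the next slides applies exactly this update once to one training example.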

The Learning Rate
η is called the learning rate
Usually set to something small (e.g., 0.1) (can be negative)
Learning rate: to control the movement of the weights
Not to move too far for one example
Which may over-compensate for another example
If a large movement is actually necessary for the weights to correctly categorise E
This will occur over time with multiple epochs

Worked Example
Return to the “bright” and “dark” example
Use a learning rate of η = 0.1
Suppose we have set random weights: w0 = -0.5, w1 = 0.7, w2 = -0.2, w3 = 0.1, w4 = 0.9 (w0 is the weight from the bias node).

Worked Example
Use this training example, E, to update weights:

Here, x1 = -1, x2 = 1, x3 = 1, x4 = -1 as before
Propagate this information through the network:
S = (-0.5 * 1) + (0.7 * -1) + (-0.2 * +1) + (0.1 * +1) + (0.9 * -1) = -2.2
Hence the network outputs o(E) = -1
But this should have been “bright”=+1
So t(E) = +1

Calculating the Error Values
Δ0 = η(t(E)-o(E))x0

= 0.1 * (1 – (-1)) * (1) = 0.1 * (2) = 0.2
Δ1 = η(t(E)-o(E))x1

= 0.1 * (1 – (-1)) * (-1) = 0.1 * (-2) = -0.2
Δ2 = η(t(E)-o(E))x2

= 0.1 * (1 – (-1)) * (1) = 0.1 * (2) = 0.2
Δ3 = η(t(E)-o(E))x3

= 0.1 * (1 – (-1)) * (1) = 0.1 * (2) = 0.2
Δ4 = η(t(E)-o(E))x4

= 0.1 * (1 – (-1)) * (-1) = 0.1 * (-2) = -0.2

Calculating the New Weights
w’0 = -0.5 + Δ0 = -0.5 + 0.2 = -0.3

w’1 = 0.7 + Δ1 = 0.7 + -0.2 = 0.5

w’2 = -0.2 + Δ2 = -0.2 + 0.2 = 0

w’3= 0.1 + Δ3 = 0.1 + 0.2 = 0.3

w’4 = 0.9 + Δ4 = 0.9 – 0.2 = 0.7

New Look Perceptron
Calculate for the example, E, again:
S = (-0.3 * 1) + (0.5 * -1) + (0 * +1) + (0.3 * +1) + (0.7 * -1) = -1.2
Still gets the wrong categorisation
But the value is closer to zero (from -2.2 to -1.2)
In a few epochs time, this example will be correctly categorised
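The whole worked example can be replayed in a few lines of Python (weights and inputs exactly as above; rounding is only to keep the floating-point output readable):

```python
eta = 0.1
x = [1, -1, 1, 1, -1]                    # bias input 1, then x1..x4
w = [-0.5, 0.7, -0.2, 0.1, 0.9]          # initial (random) weights

s = sum(wi * xi for wi, xi in zip(w, x))
print(round(s, 1))                        # -2.2, so o(E) = -1 although t(E) = +1

t_E, o_E = 1, -1
w = [wi + eta * (t_E - o_E) * xi for wi, xi in zip(w, x)]
print([round(wi, 1) for wi in w])         # [-0.3, 0.5, 0.0, 0.3, 0.7]

print(round(sum(wi * xi for wi, xi in zip(w, x)), 1))   # -1.2: still wrong, but closer to zero
```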

Learning Abilities
of Perceptrons
Perceptrons are a very simple network.
Computational learning theory
Study of which concepts can and can’t be learned
By particular learning techniques (representation, method)
Minsky and Papert’s influential book
Showed the limitations of perceptrons.
Cannot learn some simple boolean functions
Caused a “winter” of research for ANNs in AI
People thought it represented a fundamental limitation
But perceptrons are the simplest network
ANNs were revived by neuroscientists, etc.

Boolean Functions
Take in two inputs (-1 or +1)
Produce one output (-1 or +1)
In other contexts, use 0 and 1

Example: AND function
Produces +1 only if both inputs are +1
Example: OR function
Produces +1 if either input is +1
Related to the logical connectives from first-order logic (FOL).
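For illustration, both connectives can be written as perceptrons directly; the weights and thresholds below are one possible choice and are not taken from the slides:

```python
def perceptron(x1, x2, w1, w2, threshold):
    return 1 if w1 * x1 + w2 * x2 > threshold else -1

def AND(a, b):
    return perceptron(a, b, 0.5, 0.5, threshold=0.5)    # fires only for (+1, +1)

def OR(a, b):
    return perceptron(a, b, 0.5, 0.5, threshold=-0.5)   # fires unless both inputs are -1

for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, AND(a, b), OR(a, b))
```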

Boolean Functions as Perceptrons
Problem: XOR boolean function
Produces +1 only if inputs are different
Cannot be represented as a perceptron
Because it is not linearly separable

Linearly Separable: Boolean Functions
Linearly separable:
Can use a line (dotted) to separate +1 and –1
Think of the line as representing the threshold
Angle of line determined by two weights in perceptron
Y-axis crossing determined by threshold

Remember: You are smart. Why?
XOR function: 2 – 3 neurons.
Fruit fly’s brain: around 200,000 neurons.
Firefly-inspired algorithms (some intelligence).
Human beings: Billions of neurons.

Therefore, you are potentially an “A” student.

XOR function
Implementing the XOR function needs only 2 neurons; however, this is not a perceptron.

XOR function with 2 neurons
XOR
Note: the values 1.5 and 0.5 inside the boxes are thresholds.
X Y XOR
0 0 0
0 1 1
1 0 1
1 1 0
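The figure for this slide is not reproduced here, so the sketch below shows one common two-neuron construction consistent with the thresholds 1.5 and 0.5 mentioned above; the -2 connection from the first neuron to the second is an assumed weight, not taken from the slides:

```python
def fires(s, threshold):
    return 1 if s > threshold else 0

def xor(x, y):
    h = fires(x + y, 1.5)              # neuron 1: fires only when both inputs are 1
    return fires(x + y - 2 * h, 0.5)   # neuron 2: fires when exactly one input is 1

for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor(x, y))   # reproduces the XOR truth table
```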

XOR function with 3 neurons
X Y XOR
0 0 0
0 1 1
1 0 1
1 1 0

Perceptron learning algorithm (The binary version)
Step 1: Apply input X and calculate output y, then apply threshold operation. (note that X is a vector)
Step 2:
a) If y is correct, change nothing and go to step 1.
b) If y is incorrect and is zero, add each input to its corresponding weight or
c) If y is incorrect and is one, subtract each input from its corresponding weight.
Step 3: go to step 1.
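A sketch of the binary version in Python. Two details are assumptions on my part: the output is taken to be 1 when the weighted sum reaches the threshold (>=), which matches the OR trace on the next slide, and the endless "go to step 1" loop is replaced by a fixed number of epochs:

```python
def binary_perceptron_learning(samples, weights, threshold, epochs=10):
    """samples: list of (inputs, target) pairs, all values 0 or 1."""
    for _ in range(epochs):
        for x, target in samples:
            y = 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0
            if y == target:
                continue                                      # correct: change nothing
            if y == 0:                                        # incorrect and zero: add inputs
                weights = [w + xi for w, xi in zip(weights, x)]
            else:                                             # incorrect and one: subtract inputs
                weights = [w - xi for w, xi in zip(weights, x)]
    return weights

# Learning OR from w1 = 1, w2 = 0, T = 1 (the example on the next slide):
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(binary_perceptron_learning(or_samples, [1, 0], threshold=1))   # [1, 1]
```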

Example for perceptron learning
Example: Learning OR function; representing logical OR with initial weights set randomly to: w1 = 1, w2 = 0, T (threshold) = 1
Steps 1 and 2: Apply input and calculate output
w1 | w2 | input 1 | input 2 | output | Correct (Y/N)
 1 |  0 |    0    |    0    |   0    | Y
 " |  " |    0    |    1    |   0    | N, so add 0 to w1 and 1 to w2
 1 |  1 |    1    |    0    |   1    | Y
 " |  " |    1    |    1    |   1    | Y
 " |  " |    0    |    0    |   0    | Y
 " |  " |    0    |    1    |   1    | Y
 " |  " |    1    |    0    |   1    | Y
 " |  " |    1    |    1    |   1    | Y

Perceptron learning algorithm (The continuous version)
Step 1: Apply input x and calculate output y.
Step 2: W(n + 1) = W (n) + Δi (Δi = η(t(x)-o(x))xi)
If t(x) - o(x) = 0: the output is correct, so the weights are unchanged.
If t(x) - o(x) > 0: each weight moves in the direction of its input xi.
If t(x) - o(x) < 0: each weight moves against the direction of its input xi.
Step 3: go to step 1.
Exercise: Use the continuous version for any Boolean function.

Recap – Neural Networks
Components – biological plausibility
Neurone / node
Synapse / weight
Feed-forward networks
Unidirectional flow of information
Good at extracting patterns, generalisation and prediction
Distributed representation of data
Parallel processing of data
Training: backpropagation
Not exact models, but good at demonstrating principles
Recurrent networks
Multidirectional flow of information
Memory / sense of time
Complex temporal dynamics (e.g. CPGs)
Various training methods (Hebbian, evolution)
Often better biological models than FFNs

Summary
Artificial neural networks are inspired by the learning processes that take place in biological systems.
Artificial neurons and neural networks try to imitate the working mechanisms of their biological counterparts.
Learning can be perceived as an optimisation process.
Biological neural learning happens by the modification of the synaptic strength. Artificial neural networks learn in the same way.
The synapse strength modification rules for artificial neural networks can be derived by applying mathematical optimisation methods.

Summary
Learning tasks of artificial neural networks can be reformulated as function approximation tasks.
Neural networks can be considered as nonlinear function approximation tools (i.e., linear combinations of nonlinear basis functions), where the parameters of the networks should be found by applying optimisation methods.
The optimisation is done with respect to the approximation error measure.
In general it is enough to have a single hidden-layer neural network (such as a multi-layer feed-forward back-propagation network) to learn the approximation of a nonlinear function. In such cases general optimisation can be applied to find the change rules for the synaptic weights.

Key Concepts
Neurons
Topology
Transfer functions (i.e. squashing functions)
Training algorithms (i.e. learning algorithms)
Perceptron
Implement simple Boolean functions using perceptrons

References
Simon Haykin, Neural Networks: A Comprehensive Foundation, IEEE Press, 1994.

Exercises and Examples
Design a perceptron which can perform the logic AND function using the perceptron learning algorithm (the binary version).
Epoch | Input (x, y) | Desired Output | Weights (w1, w2) | Actual Output | Error | Adjusted Weights
1 | | | | | |
2 | | | | | |
3 | | | | | |
4 | | | | | |

Exercises and Examples
Design a perceptron which can perform the logic OR function using the perceptron learning algorithm (the binary version).
Epoch | Input (x, y) | Desired Output | Weights (w1, w2) | Actual Output | Error | Adjusted Weights
1 | | | | | |
2 | | | | | |
3 | | | | | |
4 | | | | | |

Questions & Suggestions?
The End

Appendix
Linearly Separable Functions
Result extends to functions taking many inputs
And outputting +1 and –1
Also extends to higher dimensions for outputs

More on activation functions
Step function: Y = 1 if X >= 0, Y = 0 if X < 0
Sign function: Y = +1 if X >= 0, Y = –1 if X < 0
Sigmoid function: Y = 1 / (1 + e^(-X))
Linear function: Y = X