Autoencoders and their Applications
COMP90073 Security Analytics, CIS, Semester 2, 2021
• Introduction to Neural Networks
• Gradient Descent Learning
• Autoencoders and their architectures
• Denoising Autoencoder (DAE)
• Variational Autoencoder (VAE)
Artificial Neural Networks
• A collection of simple, trainable mathematical units that collectively learn complex functions
• Given sufficient training data, an artificial neural network can approximate very complex functions mapping raw data to output decisions
Types of Deep Neural Networks (DNNs)
1. DNN – all fully connected layers
2. CNNs (Convolutional Neural Networks) – some convolutional layers
3. RNNs (Recurrent Neural Networks) – e.g., LSTMs
Fundamentals of Neural Networks
• Receive signals from input neurons: $x_1, x_2, \dots, x_n$
• Weight signals according to the link strength between neurons: $w_1 x_1, w_2 x_2, \dots, w_n x_n$
• Add the weighted input signals and the bias: $w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = \sum_{i=1}^{n} w_i x_i + b$
• Emit an output signal via an activation function $f$:
$output = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$
Activation Functions
• Activation functions add non-linearity to our network's function
• Most real-world problems and data are non-linear
Common Activation Functions:
Fundamentals of Neural Networks
$output = \mathrm{sigmoid}(2 \times 0.1 + 3 \times 0.5 + (-1) \times 2.5 + 5 \times 0.2 + 1 \times 3) = 0.96$ (verified in the sketch below)
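As a check on this worked example, here is a minimal sketch in Python (NumPy) of a single neuron's forward computation; the inputs, weights, and bias are the values above, and the sigmoid is written out explicitly:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Inputs and weights from the worked example; the final 1 x 3 term
# acts as the bias (a constant input of 1 with weight 3)
x = np.array([2.0, 3.0, -1.0, 5.0, 1.0])
w = np.array([0.1, 0.5, 2.5, 0.2, 3.0])

output = sigmoid(np.dot(w, x))   # weighted sum is 3.2
print(round(output, 2))          # 0.96
```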
Functional Form of Neural Networks
Feed-Forward pass:
$H_1 = f_1(X_1 w_{1,1} + X_2 w_{1,3})$
$H_2 = f_1(X_1 w_{1,2} + X_2 w_{1,4})$
$O_1 = f_2(H_1 w_{2,1} + H_2 w_{2,3})$
$O_2 = f_2(H_1 w_{2,2} + H_2 w_{2,4})$
(a code sketch follows below)
Training a Neural Network
• Find a set of weights so that the network exhibits the desired behaviour
• Example: cat vs. dog

Output        Label
0.8, 0.7      1, 1    (Cat & dog)
0.08, 0.6     0, 1    (Dog)
0.92, 0.01    1, 0    (Cat)
0.02, 0.0     0, 0    (No cat or dog)
Error Function
• Measure the difference between the actual output and the expected output
• One popular measure: the sum of squared errors
$E(input, weight, label) = \sum (output - label)^2$
• Note: a neural network is a composite/nested function that maps the input to the output:
$output = f_{NN}(input, weights)$
Gradient Descent Learning
• A training example is a vector of inputs and the desired output(s), i.e., $(x_1, x_2, \dots, x_n)$, $(t_1, t_2, \dots, t_m)$, where $t_i = 1$ iff the data point $(x_1, x_2, \dots, x_n)$ belongs to the i-th class.
• Objective: find the weights $w$ that minimise the difference between $t$ and $o$ (the predicted output) for each of our training inputs.
• Define an error function $E$ to be the (halved) sum of squared errors:
$E = \frac{1}{2} \sum_{i=1}^{m} (o_i - t_i)^2$
• If we think of $E$ as height, it defines an error landscape on the weight space. The aim is to find a set of weights for which $E$ is very low.
• This is done by moving in the steepest downhill direction (gradient descent), i.e., along $-\frac{\partial E}{\partial w_i}$.
• Weight update rule ($\eta$: learning rate):
$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$
Example Error Landscape
• An example error landscape for gradient descent search in weight space
Intuition of Gradient Descent Learning
Negative gradient points towards a local minimum.
Weight update rule
To derive the weight update rule, we need to remember the chain rule from differential calculus:

if $y = y(u)$ and $u = u(x)$, then $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$

So, if $E = \frac{1}{2} \sum (o - t)^2$, $o = \sum_i w_i x_i + b$, and $t$ = true output:

$\frac{\partial E}{\partial w_i} = \frac{\partial \frac{1}{2}(o - t)^2}{\partial o} \cdot \frac{\partial o}{\partial w_i} = (o - t) \frac{\partial o}{\partial w_i} = (o - t)\, x_i$

Normally we include a learning rate parameter $\eta$ to control the update of weights in a stable manner:

$w_i \leftarrow w_i - \eta (o - t) x_i$

We repeatedly update the weights based on each example until the weights converge.
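A minimal sketch of this update rule for a single linear neuron in Python; the toy dataset, target function, and learning rate are made up for illustration:

```python
import numpy as np

# Toy data: the targets follow o = 2*x1 + 1*x2 (made-up target function)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [0.5, 1.5]])
t = X @ np.array([2.0, 1.0])

w = np.zeros(2)          # initialise weights
b = 0.0                  # bias
eta = 0.05               # learning rate

for epoch in range(200):
    for x_i, t_i in zip(X, t):
        o = np.dot(w, x_i) + b          # forward pass (linear neuron)
        w -= eta * (o - t_i) * x_i      # w_i <- w_i - eta*(o - t)*x_i
        b -= eta * (o - t_i)            # same rule for the constant bias input

print(w, b)  # should approach [2.0, 1.0] and 0.0
```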
Adjusting Learning Rate
• Learning rate parameter η is a small value (usually 0.0001 − 0.1) to control the update of weights in a stable manner.
Minimizing Error by Gradient Descent
• The iterative algorithm might converge to one of the many local minima
Backpropagation Algorithm
Backpropagation(network, training data D, labels T)
    Initialise the weights w to small random values
    Repeat
        For each d ∈ D, d = <x_1, ..., x_n, t_1, ..., t_m>
            Forward pass: calculate hidden values H_j and output values O_k
            Backward pass:
                Calculate the error between o_k and t_k at the output
                Update the weights w_{i,j} in each layer, in proportion to their
                effect on the error, using gradient descent: w_i ← w_i − η(o − t)x
    Until the network has converged
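A minimal sketch of the algorithm for the small 2-2-2 sigmoid network seen earlier, in Python; the single training example, learning rate, and squared-error loss are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(2, 2))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(2, 2))   # hidden -> output weights
eta = 0.5

# One made-up training example: inputs x, targets t
x = np.array([1.0, 0.0])
t = np.array([1.0, 0.0])

for step in range(1000):
    # Forward pass
    H = sigmoid(x @ W1)
    O = sigmoid(H @ W2)
    # Backward pass: propagate squared-error gradients layer by layer
    delta_O = (O - t) * O * (1 - O)            # output-layer error term
    delta_H = (delta_O @ W2.T) * H * (1 - H)   # hidden-layer error term
    W2 -= eta * np.outer(H, delta_O)           # gradient descent updates
    W1 -= eta * np.outer(x, delta_H)

print(O)  # moves toward the target [1, 0]
```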
Network Architectures/Parameters
• Number of input nodes = number of features
• Number of output nodes = number of classes
 – Neural networks can deal naturally with multi-class classification problems (compare to SVM?)
• Number of hidden layers
• Number of nodes in each hidden layer
• Learning rate
• Regularization parameters (similar to C in SVM): control the complexity of the model, preventing overfitting.
Live Demo: http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.13820&showTestData=false&discretize=false&percTrainData=50&x=true&y=true
Autoencoder [1]
• A neural network whose aim is to take an input $x \in \mathbb{R}^d$ and reproduce it as $\hat{x} \in \mathbb{R}^d$.
• To make this non-trivial, we need to add a bottleneck layer $h \in \mathbb{R}^k$ whose dimension is much smaller than the input's, $k \ll d$.
• Architecture: $x \xrightarrow{f_\theta} h \xrightarrow{g_\phi} \hat{x}$ (encoder $f_\theta$, decoder $g_\phi$)
 – Encoder: $h = f_\theta(x) = \mathrm{sigmoid}(Wx + b)$
 – Decoder: $\hat{x} = g_\phi(h) = \mathrm{sigmoid}(W'h + b')$
 – Often use tied weights, $W' = W^T$
• Parameters are obtained using backpropagation
• Minimize a loss function:
$L(\theta, \phi) = \frac{1}{2} \sum_{i=1}^{n} \left(x_i - g_\phi(f_\theta(x_i))\right)^2$
(a code sketch follows below)
• Demo: https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoencoder.html
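A minimal sketch of this tied-weight autoencoder in Python (NumPy); the dimensions, learning rate, and random training data are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, k = 8, 3                              # input dim and bottleneck dim (k << d)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(k, d))   # encoder weights; decoder uses W.T (tied)
b, b2 = np.zeros(k), np.zeros(d)
X = rng.random((100, d))                 # made-up training data
eta = 0.1

for epoch in range(500):
    for x in X:
        h = sigmoid(W @ x + b)           # encoder: h = f(Wx + b)
        x_hat = sigmoid(W.T @ h + b2)    # decoder: x_hat = g(W'h + b'), W' = W^T
        # Backprop of 0.5*||x_hat - x||^2 through both layers
        d_out = (x_hat - x) * x_hat * (1 - x_hat)
        d_hid = (W @ d_out) * h * (1 - h)
        # Tied weights: W gets gradients from both its encoder and decoder roles
        W -= eta * (np.outer(d_hid, x) + np.outer(d_out, h).T)
        b -= eta * d_hid
        b2 -= eta * d_out

recon = np.array([sigmoid(W.T @ sigmoid(W @ x + b) + b2) for x in X])
print(np.mean((recon - X) ** 2))         # reconstruction MSE after training
```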
Using AEs for Unsupervised Problems
Only input vectors $x_1, x_2, \dots, x_n$ are available, not corresponding labels.
• Anomaly detection
• Extracting interesting information from data
 – Data compression (dimensionality reduction)
 – Clustering
 – Visualisation
• Representation learning
Using Autoencoder for Anomaly Detection
• Train an autoencoder on normal data
• Learn the distribution of normal data:
• $\mu$: mean of the reconstruction error on the training data
• $\sigma$: standard deviation of the reconstruction error on the training data
• Identify anomalies with the 3σ rule: $\|x_{\text{anomaly}} - \hat{x}_{\text{anomaly}}\|_2 \geq \mu + 3\sigma$
 – Anomalies generate high reconstruction error.
Using Autoencoder for Anomaly Detection – Example
• Train samples

Sample   x            x̂
1        30, 12, 85   26, 14, 78
2        22, 18, 83   25, 13, 89
3        32, 21, 68   28, 18, 74

• Test samples

Sample   x            x̂
1        32, 16, 81   29, 12, 79
2        19, 28, 63   27, 16, 88
Using Autoencoder for Anomaly Detection – Example
• Train samples

Sample   x            x̂            Error
1        30, 12, 85   26, 14, 78   8.3
2        22, 18, 83   25, 13, 89   8.4
3        32, 21, 68   28, 18, 74   7.8

Threshold: $\mu + 3\sigma = 8.2 + 3 \times 0.26 = 8.98$

• Test samples

Sample   x            x̂            Error
1        32, 16, 81   29, 12, 79   5.3   (below threshold: normal)
2        19, 28, 63   27, 16, 88   28.5  (above threshold: anomaly)
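A minimal sketch in Python reproducing the numbers in this example; the Euclidean (L2) norm is assumed as the error measure, since it matches the tabulated training errors (the test-error values on the slide differ slightly, presumably computed from unrounded reconstructions):

```python
import numpy as np

x_train     = np.array([[30, 12, 85], [22, 18, 83], [32, 21, 68]])
x_train_hat = np.array([[26, 14, 78], [25, 13, 89], [28, 18, 74]])

errors = np.linalg.norm(x_train - x_train_hat, axis=1)
print(errors.round(1))            # [8.3 8.4 7.8]

mu, sigma = errors.mean(), errors.std()
threshold = mu + 3 * sigma        # ~8.9; the slide, using rounded mu and sigma, gets 8.98

x_test     = np.array([[32, 16, 81], [19, 28, 63]])
x_test_hat = np.array([[29, 12, 79], [27, 16, 88]])
test_errors = np.linalg.norm(x_test - x_test_hat, axis=1)
print(test_errors.round(1))       # [5.4 28.9]; the slide reports 5.3 and 28.5
print(test_errors >= threshold)   # [False  True] -> sample 2 is anomalous
```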
Data Compression
• Objective: find a way to appropriately compress our input into a useful "bottleneck" vector of smaller dimensionality (encoder).
 – Learns a lossy compression of the input data.
• Special case: $f$, $g$ linear and $L$ the mean squared error: reduces to Principal Component Analysis (PCA)
 – PCA: a data compression method that reduces the dimensionality of the data while maintaining its essence.
Unsupervised Representation Learning
• Force the representations to better model the input distribution
 – Not just extracting features for classification
 – Asking the model to be good at representing the data and not overfitting to a particular task
 – Potentially allowing for better generalization
Application: Hybrid Anomaly Detection Model [3]
• Feed the output of the bottleneck into a simple model (e.g., k-NN, one-class SVM, logistic regression, ...), as in the sketch below.
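A minimal sketch of such a hybrid in Python, using scikit-learn's OneClassSVM; the encoder here is a placeholder random projection standing in for a trained autoencoder's bottleneck:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Placeholder for a trained encoder: a fixed random projection
# standing in for the autoencoder bottleneck f_theta(x)
W_enc = rng.normal(size=(20, 4))

def encode(X):
    return np.tanh(X @ W_enc)

X_train = rng.random((200, 20))   # made-up "normal" training data
X_test = rng.random((10, 20))

# Fit a one-class SVM on the bottleneck features of the normal data
ocsvm = OneClassSVM(nu=0.05, kernel="rbf").fit(encode(X_train))
print(ocsvm.predict(encode(X_test)))   # +1 = normal, -1 = anomaly
```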
Application: Hybrid Anomaly Detection Model
• Example on the 2-dimensional Banana (two-moon) dataset
• iVAT images of the 100-dimensional Banana dataset (with 5% random anomalies)
(Figure panels: input data; output of 1st hidden layer; output of 2nd hidden layer)
Application: Multimodal Learning [4]
Undercomplete vs. Overcomplete Hidden Layer
Undercomplete Representation:
• Hidden layer is undercomplete if it is smaller than the input layer
• Hidden layer compresses the input
• Compresses well only for the training distribution
• Hidden units will be good features for the training distribution, but bad for other inputs
Overcomplete Representation:
• Hidden layer is overcomplete if it is greater than the input layer
• No compression in the hidden layer
• Each hidden unit could simply copy a different input component ($x = \hat{x}$)
• No guarantee that the hidden units will extract meaningful structure
Denoising Autoencoder (DAE)
• Idea: add noise to the input but learn to reconstruct the original
 – Randomly assign a subset of the inputs to 0, with probability $\rho$
 – Or add Gaussian noise
• Reconstruct $\hat{x}$ from the corrupted input $\tilde{x}$
• The loss function minimises the error between $\hat{x}$ and the original sample $x$:
$L(\theta, \phi) = \frac{1}{2} \sum_{i=1}^{n} \left(x_i - g_\phi(f_\theta(\tilde{x}_i))\right)^2$
• Prevents copying
• Improves the representations and robustness
• Note: different noise is added during each epoch (a sketch of the corruption step follows below)
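A minimal sketch of the two corruption schemes in Python; the corruption probability and noise scale are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_masking(x, rho=0.25):
    # Randomly set each input component to 0 with probability rho
    return x * (rng.random(x.shape) >= rho)

def corrupt_gaussian(x, scale=0.1):
    # Add zero-mean Gaussian noise to every component
    return x + rng.normal(scale=scale, size=x.shape)

x = rng.random(8)                   # one made-up input vector
x_tilde = corrupt_masking(x)        # corrupted input fed to the encoder
# The DAE is then trained so that g(f(x_tilde)) reconstructs the
# original x, i.e. the loss compares the reconstruction with x,
# not with x_tilde; fresh noise is drawn in every epoch.
print(x.round(2))
print(x_tilde.round(2))
```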
Example of Autoencoder
$L(\theta, \phi) = \sum_i (x_i - \hat{x}_i)^2$
Example of Denoising Autoencoder
(Figure: original data $x$; noisy input $\tilde{x}$; reconstruction $\hat{x}$)
$L(\theta, \phi) = \sum_i (x_i - \hat{x}_i)^2$
Example of Denoising Autoencoder
0% Noise 25% Noise 50% Noise
Intuition of Denoising Autoencoder
(figure sequence)
Application: Image Patching (Neural Inpainting) [9]
Problem with AEs
• Their latent space, and their encoded vectors, may not be continuous.
Variational Autoencoder (VAE)
• VAEs are in a class of models called generative models; they can be used to generate examples of input data by learning their statistics (e.g., mean and variance).
• Instead of learning $f_\theta(x)$ and $g_\phi(h)$, VAEs learn distributions of the features given the input, and the input given the activations, i.e., probabilistic versions of $f_\theta$ and $g_\phi$. The VAE will learn:
 – $q_\theta(h|x)$: the distribution of the features given the input.
 – $p_\phi(x|h)$: the distribution of the input given the features.
• Objective: find a distribution $q_\theta(h|x)$ of some latent variables $h$, which we can sample from, $h \sim q_\theta(h|x)$, to generate new samples $\hat{x} \sim p_\phi(x|h)$.
Encoding of AE vs. VAE
Why VAE Learns Data Distribution?
• Oftentimes the data is noisy, and a model of the distribution of the data is more useful for a given application.
• The relationship between the observed variables and the latent variables can be nonlinear, in which case the VAE provides a way to do inference.
• The VAE is a generative model; by learning $p_\phi(x|h)$, it is possible to sample $h$ and then sample $x$. This enables the generation of data that has similar statistics to the input.
VAE Architecture
VAE’s Loss Function
$L(\theta, \phi) = \mathbb{E}_{q_\theta(h|x)}[\log p_\phi(x|h)] - D_{KL}\left(q_\theta(h|x) \,\|\, p_\phi(h)\right)$
(first term: reconstruction loss; second term: regulariser, the KL divergence)
• Reconstruction loss: the expected log-likelihood measures how well samples from $q_\theta(h|x)$ are able to explain the data $x$.
• Regulariser: ensures that the explanation of the data $q_\theta(h|x)$ doesn't deviate too far from the prior distribution $p_\phi(h)$.
• Kullback–Leibler (KL) divergence: a measure of the difference between two distributions (here, the approximate posterior and the prior for $h$).
(a code sketch follows below)
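A minimal sketch of this loss in Python for a Gaussian encoder and a standard normal prior; the closed-form KL term is the standard Gaussian result, the reconstruction term uses one sample via the reparameterisation trick, and all shapes and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_loss(x, mu, log_var, decode):
    # Reparameterisation trick: h = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    h = mu + np.exp(0.5 * log_var) * eps

    # Reconstruction term: squared error stands in for -log p(x|h)
    # (exact up to a constant for a Gaussian decoder with unit variance)
    x_hat = decode(h)
    recon = 0.5 * np.sum((x - x_hat) ** 2)

    # KL( N(mu, sigma^2) || N(0, I) ), closed form for Gaussians
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))

    return recon + kl   # minimising recon + KL maximises the ELBO

# Toy usage with a made-up linear decoder
W_dec = rng.normal(size=(2, 5))
x = rng.random(5)
print(vae_loss(x, mu=np.zeros(2), log_var=np.zeros(2),
               decode=lambda h: h @ W_dec))
```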
Intuition of VAE’s Loss Function [10]
Only reconstruction loss Only KL divergence Combination
Example: Two-dimensional Latent Space for MNIST
Demo: https://www.siarez.com/projects/variational-autoencoder
Application: Generating celebrity-lookalike photos
Application: Forecasting [7]
Disentangled VAE (𝜷-VAE)
• Disentangled representation: each variable in the inferred latent representation $h$ is only sensitive to one single generative factor and relatively invariant to other factors.
• Extracts very useful features from a very high-dimensional space and uses them for the task it wants to learn.
• Those features generalise to domains outside the training data, and enhance interpretability.
Non examinable
Disentangled VAE (𝜷-VAE)
$L(\theta, \phi) = \mathbb{E}_{q_\theta(h|x)}[\log p_\phi(x|h)] - \beta \, D_{KL}\left(q_\theta(h|x) \,\|\, p_\phi(h)\right)$
• For $\beta > 1$, it applies a stronger constraint on the latent bottleneck and limits the representation capacity of $h$.
• Advantages: extremely flexible; even if each conditional is simple (e.g. a conditional Gaussian), the marginal likelihood can be arbitrarily complex.
Non examinable
Representation Learning by 𝜷-VAE and VAE
Non examinable
Manipulating Latent Variables
Azimuth (Rotation)
Emotion (Smile)
Non examinable
Using VAE for Anomaly Detection [11]
• Training the VAE: use only data of normal instances to learn $q_\theta(h|x)$ and $p_\phi(x|h)$
• For a test instance $z$:
 – Evaluate the mean and standard deviation vectors with the probabilistic encoder: $(\mu_z, \sigma_z) = q_\theta(h|z)$
 – Draw $L$ samples from $h_l \sim N(\mu_z, \sigma_z)$
 – Compute the reconstruction probability
   $P_{\text{recon}}(z) = \frac{1}{L} \sum_{l=1}^{L} p_\phi\left(z \mid \mu_{\hat{z},l}, \sigma_{\hat{z},l}\right)$,
   where $(\mu_{\hat{z},l}, \sigma_{\hat{z},l})$ are the mean and standard deviation produced by the probabilistic decoder for each sample $h_l$
 – $z$ is an anomaly if $P_{\text{recon}}(z) < \alpha$, i.e., its reconstruction probability falls below the threshold (a code sketch follows below)
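A minimal sketch of the reconstruction probability in Python, with placeholder encoder and decoder functions standing in for a trained VAE (both return Gaussian parameters, as in [11]); a diagonal Gaussian likelihood is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_logpdf(z, mu, sigma):
    # Log-density of z under a diagonal Gaussian N(mu, diag(sigma^2))
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (z - mu) ** 2 / (2 * sigma**2))

def reconstruction_probability(z, encoder, decoder, L=10):
    mu_z, sigma_z = encoder(z)                 # q(h|z): encoder output
    total = 0.0
    for _ in range(L):
        h = mu_z + sigma_z * rng.standard_normal(mu_z.shape)  # h ~ N(mu_z, sigma_z)
        mu_hat, sigma_hat = decoder(h)         # p(z|h): decoder output
        total += np.exp(gaussian_logpdf(z, mu_hat, sigma_hat))
    return total / L

# Made-up encoder/decoder standing in for a trained VAE
encoder = lambda z: (np.zeros(2), np.ones(2))
decoder = lambda h: (np.zeros(4), np.ones(4))

z = rng.random(4)
alpha = 1e-3                                   # illustrative threshold
p = reconstruction_probability(z, encoder, decoder)
print(p, p < alpha)   # flag z as anomalous when p falls below alpha
```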
Advantages of VAE for Anomaly Detection
• Latent variables are stochastic variables
 – The probabilistic encoder of the VAE models the distribution of the latent variables (rather than the latent variable itself).
 – It can capture normal points and anomalies which share the same mean, but different variance.
• Reconstructions are stochastic variables
 – The reconstruction probability considers the reconstruction error, as well as the variability of the reconstruction (by considering the variance parameter of the distribution function).
 – This property enables selective sensitivity to reconstruction according to variable variance.
Summary
• How to train a neural network?
• What is an autoencoder and what are its applications?
• How can we apply autoencoders to noisy data?
• What is a generative autoencoder and how can we use it for anomaly detection?
Next: Graph anomaly detection
References
1. Ian Goodfellow, Yoshua Bengio, Aaron Courville, "Deep Learning", MIT Press, Chapter 14.
2. Hugo Larochelle, Yoshua Bengio, Jérôme Louradour, Pascal Lamblin, "Exploring Strategies for Training Deep Neural Networks", JMLR, 2009.
3. Sarah M. Erfani, Sutharshan Rajasegarar, Shanika Karunasekera, Christopher Leckie, "High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning", Pattern Recognition, 2016.
4. Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, Andrew Y. Ng, "Multimodal deep learning", International Conference on Machine Learning, 2011.
5. Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
6. Diederik P. Kingma, Max Welling, "Auto-Encoding Variational Bayes", ICLR, 2014.
References
7. Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert, "An uncertain future: Forecasting from static images using variational autoencoders", European Conference on Computer Vision, 2016.
8. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner, "beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework", ICLR, 2017.
9. Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, Alexei A. Efros, "Context encoders: Feature learning by inpainting", IEEE Conference on Computer Vision and Pattern Recognition, 2016.
10. Irhum Shafkat, "Intuitively Understanding Variational Autoencoders", 2018. https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf
11. Jinwon An, Sungzoon Cho, "Variational autoencoder based anomaly detection using reconstruction probability", Special Lecture on IE 2, 2015.
Other References
• Raghavendra Chalapathy, Sanjay Chawla, "Deep Learning for Anomaly Detection: A Survey", arXiv:1901.03407, 2019.