What is Pattern Recognition?
● “The assignment of a physical object or event to one of several prespecified categories” – Duda and Hart
● “Given some examples of complex signals and the correct decisions for them, make decisions automatically for a stream of future examples” – Ripley
● “The science that concerns the description or classification (recognition) of measurements” – Schalkoff
● “The process of giving names ω to observations x” – Schürmann
● “Pattern Recognition is concerned with answering the question ‘What is this?’”
What is Pattern Recognition?
Aim is to build algorithms that can recognize useful patterns in data, extract useful information from data, or make decisions based on data.
e.g. classifying the sentiment of film reviews:
“the film was well directed and enjoyable”
“the story was confusing & the acting wooden”
● What is Pattern Recognition?
● Applications of Pattern Recognition
● An Example of Designing a Pattern Recognition System
● General Principles:
– Data and Features
– Models and Training
– Evaluation
[Diagram: pattern recognition maps data to information / decisions]
Related Disciplines
● Machine Learning (ML)
– Broader: also covers the sort of reasoning about data considered in artificial intelligence (AI).
● (Artificial) Neural Networks (NN)
– Narrower: a sub-set of techniques used for pattern recognition. Deep Learning (DL) is a sub-discipline of neural networks.
● Statistics
– Overlapping: a set of techniques used for pattern recognition (and other things).
Related Disciplines
Disciplines in which pattern recognition is applied to specific types of data:
● Signal Processing
– 1D signals
● Computer Vision
– 2D images / 3D videos
● Data Mining
– Numeric, categorical and linguistic data
What is a Pattern?
Completely regular, deterministic
Uninteresting: easily analysed using standard methods
Interesting: the space of “patterns” pattern recognition is concerned with
Completely random
Uninteresting: no patterns / information
Ingredients required for pattern recognition:
● A pattern exists
● We cannot pin it down mathematically
● Data exists
Applications of Pattern Recognition
● What is Pattern Recognition?
● Applications of Pattern Recognition
● An Example of Designing a Pattern Recognition System
● General Principles:
– Data and Features
– Models and Training
– Evaluation
Why is Pattern Recognition Important?
● Lots of data (“data is cheap”)
● Finding patterns in data can be useful / lucrative
(“knowledge is valuable”)
● Lots of applications/opportunities …
Applications: Financial
Financial Forecasting
● Using previous financial data (such as price fluctuations), try to find recurring patterns that can be used to inform investment decisions.
● For example, the discovery that sharp increases in the price of gold tend to be preceded by long periods of price stability might be the basis for an investment rule.
Applications: Financial
Detecting fraudulent credit-card transactions
● Keep records of transactions, merchants, and account holders.
● Record which previous transactions were fraudulent, and which were not.
● Find patterns in this data associated with fraudulent transactions.
● Look for similar patterns in new transactions to identify possible fraudulent use.
Applications: Retail
Detailed logs can be kept of individual purchasing habits (through online sales, loyalty cards).
Identifying patterns in the data can allow:
● Offers/marketing to be targeted at specific customers (Customer Relationship Management).
● Making recommendations:
– People who bought X also …
– You might also like …
Applications: The Internet
Spam Filters
● Keep lots of old emails, identified as both spam and non-spam.
● Find patterns (words used, attachments, etc.) that distinguish spam from non-spam.
● Identify patterns in new emails to filter spam.
Applications: Medical
Finding sequences in DNA
Identifying Proteins from Amino acids
Relating these to disorders/illnesses
Applications: Financial
Credit scoring
● For processing insurance applications, credit card applications, etc.
● Keep records of previous customers (e.g. # of accidents, make of car, year of model, income, # of dependents, mortgage amount, etc.).
● Record which previous customers were good/bad.
● Find patterns in this data associated with good/bad customers.
● Look for similar patterns in new applications to predict if new customer will be good or bad (i.e. to inform decision about offering insurance or credit).
Applications: Speech Recognition
Find patterns in audio data to allow phonemes or words to be recognized
Applications: Computer Vision
Find patterns in images that let us identify objects
or letters
Applications: Medical
Finding patterns in diagnostic tests
Finding patterns in medical records
Relating these to disorders/illnesses
– e.g. patterns in an ECG that indicate heart disease or dysfunction.
Case Study in Designing a Pattern Recognition System
● What is Pattern Recognition?
● Applications of Pattern Recognition
● An Example of Designing a Pattern Recognition System
● General Principles:
– Data and Features
– Models and Training
– Evaluation
Applications: Manufacturing
Industrial inspection
Condition Monitoring
Quality control of items being produced.
Analyse sensor data to predict need for maintenance, identify faults.
Sorting Fish: Preprocessing
● Set up a camera to take images.
● Isolate fish from one another and from the background
(segmentation).
● Extract features (from each isolated fish), e.g.:
– Lightness
– Number of fins, etc.
● Expertise in the problem domain may be required to select features that are useful for the task.
● Pass the features to a classifier.
Sorting Fish: Classification
● Suppose we choose the length of the fish as a possible feature for discrimination
– sea bass are typically longer than salmon
● A simple way of classifying fish might be to see whether
the length of the fish is above or below a critical value (l*)
– i.e. decide fish is sea bass if length>l*
– we call this our model
● The model:
– has a parameter(s) the value of which we need to define
– this will define a decision boundary (the point at which we decide between one category and another)
Sorting Fish
Industrial inspection
● Sort fish on a conveyor according to species
[Figure: histograms of length for sea bass and salmon; the decision boundary sits at the chosen parameter value l*]
Sorting Fish: Classification
● How do we determine the value of l*?
– Obtain many measurements of fish length (by taking images of fish, preprocessing them, and performing feature extraction).
– Determine the class of each fish (by asking an expert).
– Use this data to choose l*.
– For a simple problem like this one, we can do this by looking at the histogram of lengths …
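Choosing l* from labelled lengths amounts to searching the candidate thresholds for the one that misclassifies the fewest training fish. A minimal sketch in Python, where all length values are hypothetical, invented for illustration:

```python
import numpy as np

# Hypothetical length measurements (cm); real values would come from
# the imaging, preprocessing and feature-extraction stages.
salmon_lengths  = np.array([55, 60, 62, 65, 68, 70, 72, 75])
seabass_lengths = np.array([68, 72, 78, 80, 83, 85, 90, 95])

lengths = np.concatenate([salmon_lengths, seabass_lengths])
labels = np.concatenate([np.zeros(8), np.ones(8)])  # 0 = salmon, 1 = sea bass

def n_errors(l_star):
    # Decision rule: classify as sea bass if length > l*
    predictions = (lengths > l_star).astype(int)
    return int(np.sum(predictions != labels))

# Every observed length is a candidate threshold; pick the best one
candidates = np.unique(lengths)
best_l = min(candidates, key=n_errors)
```

Because the class histograms overlap, even the best threshold leaves some fish misclassified, which is exactly the problem the next slides address.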
Sorting Fish: Feature Selection
● length (alone) would appear to be a poor feature for
correct classification
– even the best choice of l* results in many fish being misclassified
● So try a different feature: lightness …
[Figure: histograms of lightness for salmon and sea bass, marking the regions correctly and incorrectly identified as each species]
Sorting Fish: Cost of Classification Errors
● So far, we have assumed the aim is to reduce the number of errors
– i.e. the number of misclassified fish
● However, different errors may have different cost
– erroneously classifying sea bass as salmon
● may be less desirable (more costly) than
– erroneously classifying salmon as sea bass
● The threshold (decision boundary) can be set to minimise cost rather than absolute number of errors
– in this example, we would move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)
● Minimizing cost is the focus of a sub-field of pattern recognition known as decision theory
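The cost-minimising threshold can be found the same way as the error-minimising one, just with a different objective. A minimal sketch, where the lightness values and the 10:1 cost ratio are both invented for illustration:

```python
import numpy as np

# Hypothetical lightness values; real ones would come from the
# feature-extraction stage. Sea bass are assumed lighter here.
salmon  = np.array([1.0, 2.0, 3.0, 6.2, 6.4, 7.2, 7.4, 8.2])
seabass = np.array([6.5, 7.5, 8.5, 9.5])

features = np.concatenate([salmon, seabass])
labels = np.concatenate([np.zeros(len(salmon)), np.ones(len(seabass))])  # 1 = sea bass

# Assumed cost matrix: a sea bass sold as salmon is 10x more costly
COST_BASS_AS_SALMON = 10.0
COST_SALMON_AS_BASS = 1.0

def counts(threshold):
    pred = (features > threshold).astype(int)   # decide sea bass if > threshold
    bass_as_salmon = int(np.sum((labels == 1) & (pred == 0)))
    salmon_as_bass = int(np.sum((labels == 0) & (pred == 1)))
    return bass_as_salmon, salmon_as_bass

def n_errors(t):
    b, s = counts(t)
    return b + s

def total_cost(t):
    b, s = counts(t)
    return COST_BASS_AS_SALMON * b + COST_SALMON_AS_BASS * s

candidates = np.unique(features)
t_err  = min(candidates, key=n_errors)    # minimises number of errors
t_cost = min(candidates, key=total_cost)  # minimises total cost
```

With these numbers the cost-minimising boundary sits at a smaller lightness value than the error-minimising one, sacrificing a few salmon to avoid the expensive sea-bass errors.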
[Figure: lightness histograms showing the decision boundary for minimising cost shifted below the decision boundary for minimising the number of errors]
Sorting Fish: Feature Selection
What if optimal decision boundary for lightness is still not good enough?
● We could try performing classification based on multiple features
● e.g. the lightness and width of the fish
– we are now working in a 2D rather than a 1D feature space
– goal is still to partition feature space into two regions: one containing sea bass and one containing salmon
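One minimal way to partition a 2D feature space is a nearest-centroid rule: assign each fish to the class whose mean feature vector is closest, giving a linear decision boundary (the perpendicular bisector between the two class means). All feature values below are hypothetical:

```python
import numpy as np

# Hypothetical (lightness, width) feature vectors for labelled fish
salmon  = np.array([[2.0, 10.0], [3.0, 11.0], [3.5,  9.5], [4.0, 10.5]])
seabass = np.array([[6.0, 14.0], [6.5, 15.0], [7.0, 13.5], [8.0, 14.5]])

# One centroid (mean feature vector) per class
centroids = {'salmon': salmon.mean(axis=0), 'seabass': seabass.mean(axis=0)}

def classify(x):
    # Assign x to the class with the nearest centroid
    x = np.asarray(x, dtype=float)
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
```

For example, `classify([3.0, 10.0])` falls in the salmon region and `classify([7.0, 14.0])` in the sea bass region.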
[Figure: 2D feature space (lightness vs. width); points represent feature values of samples, colour indicates class, and a line marks the decision boundary]
Sorting Fish: Data
To improve performance further, we might collect more data. i.e. increase the number of points in the feature space
● This would give a better estimate of the optimal decision boundary.
● However, collecting more data may be costly (in terms of time and effort).
● There will always be some limit to the amount of data we have with which to design our classifier.
● We might supplement the data with other knowledge of the task:
– e.g. we might know that in winter sea bass are more frequently caught, while in summer salmon are more common.
Sorting Fish: Decision Boundary
To improve performance further, we might use a more complex decision boundary.
● i.e. increase the number of parameters in the model
● if the aim is to minimize errors, then isn’t the optimal decision boundary like this…
Sorting Fish: Feature Selection
To improve performance further, we might add more features. i.e. increase the dimensionality of the feature space
● We should avoid adding “noisy features” that reduce the performance.
● We should avoid adding features that are correlated with the ones we already have.
Sorting Fish: Generalisation
● The aim is to design a classifier to correctly classify novel inputs
● i.e. to generalise to new data
● An overly complex decision boundary is unlikely to
have good performance on new data.
● It is over-fitted to the training data
[Figures: an overly complex decision boundary that fits the training data exactly vs. a simpler, smoother decision boundary]
Bias/Variance Trade-off
High bias: less expressive, might be unable to classify the data
High variance: more expressive, might overfit the data
Sorting Fish: Some Lessons Learnt
● A classifier consists of a model and the parameters of that model
● The classifier takes feature values as input
● Some features are more discriminating than others
● Multiple features can be used for classification
● Training data can be used to refine the model and optimise the parameters
● The classifier might be optimised to reduce the number of errors or the cost
● Simple models generalise better than overly complex models
Sorting Fish: Improving Performance
To improve performance, we might
● add more features
● collect more data
● use a model with more parameters (to define a more complex decision boundary)
These choices interact:
● Increasing the number of features → need more data to avoid overfitting
– known as the “curse of dimensionality”
● Increasing the number of parameters → need more data to avoid overfitting
– known as the “bias/variance trade-off”
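A quick illustration of why more features demand more data: if each feature is divided into m bins, the number of cells in the feature space to populate with examples grows as m to the power d. The bin count below is a made-up example:

```python
# With m bins per feature, a histogram of the feature space has m**d
# cells: the data needed to populate them grows exponentially with the
# number of features d (the "curse of dimensionality")
m = 10  # hypothetical number of bins per feature
cells = [m ** d for d in (1, 2, 5, 10)]
```

With 10 bins per feature, one feature needs tens of examples, but ten features would need billions to cover the space at the same density.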
● What is Pattern Recognition?
● Applications of Pattern Recognition
● An Example of Designing a Pattern Recognition System
● General Principles:
– Data and Features
– Models and Training
– Evaluation
The Classifier Design Cycle
Data Collection
Feature Selection
Model Selection
Parameter Tuning
Evaluation
Pattern Recognition System Design: Data and Features
The Classifier Design Cycle: Data
Design Considerations:
● What data to collect?
What data is it possible to obtain?
What data will be useful?
● How much data to collect?
– need representative set of examples for training and testing the classifier
The Classifier Design Cycle: Data
Data Sources
● One or more transducers (e.g. camera, microphone, etc.) extract information from the physical world.
● Data may also come from other sources (e.g. databases, web pages, surveys, documents, etc.).
The Classifier Design Cycle
Data Collection
Feature Selection
Model Selection
Parameter Tuning
Evaluation
modify if evaluation not good enough
The Classifier Design Cycle: Data
input data
This module will:
● Assume that we have the input data
● Concentrate on the subsequent stages
The Classifier Design Cycle: Data
input data
Input data can be any set of values, e.g.:
● Age and salary of customers.
● Number of iPads sold per week.
● A set of images of objects.
● Recordings of speech.
● EEG recordings.
● Medical records.
The Classifier Design Cycle: Data
input data
Data Cleansing
● Data may need to be preprocessed or cleaned (e.g. segmentation, noise reduction, outlier removal, dealing with missing values).
● This provides the “input data”.
The Classifier Design Cycle: Features
Design Considerations:
● What features are useful for classification?
Knowledge of the task may help decide
The Classifier Design Cycle: Features
feature vector
Select features of the input data:
● A feature can be any aspect, quality, or characteristic of the data.
● Features may be discrete (e.g., labels such as “blue”, ”large”) or continuous (e.g., numeric values representing height, lightness).
● Any number of features, d, can be used. The selected features form a “feature vector”
The Classifier Design Cycle: Data
input data
Data is classified as:
● discrete (integer/symbolic valued) or continuous (real valued).
● univariate (containing one variable) or multivariate (containing multiple variables).
Feature Vectors
● A collection of datapoints/feature vectors is called a dataset.
\[
\mathrm{sample}_1 = \begin{bmatrix} x_{11} \\ x_{21} \\ \vdots \\ x_{d1} \end{bmatrix},\quad
\mathrm{sample}_2 = \begin{bmatrix} x_{12} \\ x_{22} \\ \vdots \\ x_{d2} \end{bmatrix},\ \cdots,\quad
\mathrm{sample}_n = \begin{bmatrix} x_{1n} \\ x_{2n} \\ \vdots \\ x_{dn} \end{bmatrix}
\]
\[
\mathrm{dataset} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{d1} & x_{d2} & \cdots & x_{dn} \end{bmatrix}
\]
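The same layout can be built in code: a minimal NumPy sketch (the feature values are made up) stacking d-dimensional column vectors into a d x n dataset matrix:

```python
import numpy as np

# Each sample is a d-dimensional column vector of feature values (d = 3)
sample1 = np.array([[5.1], [3.5], [1.4]])   # hypothetical feature values
sample2 = np.array([[4.9], [3.0], [1.3]])
sample3 = np.array([[6.2], [3.4], [5.4]])

# Stacking n samples as columns gives the d x n dataset matrix
dataset = np.hstack([sample1, sample2, sample3])
```

Here `dataset.shape` is `(3, 3)`: d rows (features) by n columns (samples). Many libraries use the transposed n x d convention instead, so it is worth checking which one a given tool expects.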
Feature Vectors
How datapoints are distributed in feature space will influence how well the classifier works.
Ideally, the chosen features will help discriminate between classes:
– Examples from the same class should have similar feature values
– Examples from different classes should have different feature values
[Figures: scatter plots of “good” features (classes form separate clusters) and “bad” features (classes overlap)]
Feature Vectors
● Each datapoint/exemplar/sample is represented by the chosen set of features.
● The combination of d features is a d-dimensional column vector called a feature vector.
● The d-dimensional space defined by the feature vector is called the feature space.
● A collection of datapoints/feature vectors is called a dataset.
● Datapoints can be represented as points in feature space; the result is a scatter plot.
The Classifier Design Cycle: Features
feature vector
Ideally we would select features of the input data that:
● preserve within-class similarity
● discriminate between classes
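One simple way to check whether a candidate feature discriminates between classes is a Fisher-style separability score: the squared distance between the class means divided by the within-class variances. A sketch with invented feature values:

```python
import numpy as np

def fisher_ratio(a, b):
    # Between-class distance over within-class spread:
    # larger means the feature discriminates better
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

# A "good" feature: tight, well-separated class values
good = fisher_ratio([1.0, 1.2, 0.9, 1.1], [3.0, 3.2, 2.9, 3.1])

# A "bad" feature: heavily overlapping class values
bad = fisher_ratio([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

Scores like this can rank candidate features before any classifier is trained, though they only capture simple (roughly unimodal) class distributions.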
Pattern Recognition System Design: Models and Training
Feature Space Properties
Certain types of distribution have names:
● Linearly separable
● Non-linearly separable
● Multi-modal
The Classifier Design Cycle: Model
A model is the method used to perform the classification.
Design Considerations:
● What sort of model should be used?
– e.g. Neural Network, SVM, Random Forest, etc.
● Different models will give different results
– However, the only way to tell which model will work
best is to try them.
The Classifier Design Cycle: Training
The model has parameters which need to be defined.
Design Considerations:
● How to use the data to define the parameters?
– There are many different algorithms for training classifiers
(e.g. gradient descent, genetic algorithms)
● What parameters to use for the training algorithm?
– These are the hyper-parameters
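As a concrete sketch of training by gradient descent: logistic regression on synthetic 2D feature vectors, where the weights are the model parameters being tuned, and the learning rate and iteration count are hyper-parameters. The data and hyper-parameter values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training set: two well-separated classes in 2D feature space
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),   # class 0
               rng.normal(2.0, 0.5, (20, 2))])  # class 1
y = np.concatenate([np.zeros(20), np.ones(20)])

X_aug = np.hstack([X, np.ones((40, 1))])        # append a bias feature
w = np.zeros(3)                                 # model parameters

LEARNING_RATE = 0.1   # hyper-parameter
N_ITERATIONS = 500    # hyper-parameter

for _ in range(N_ITERATIONS):
    p = 1.0 / (1.0 + np.exp(-X_aug @ w))        # predicted probabilities
    w -= LEARNING_RATE * X_aug.T @ (p - y) / len(y)   # gradient step

accuracy = float(np.mean((p > 0.5) == y))       # training accuracy
```

Note that the hyper-parameters are set by the designer (or by a search over held-out data), while the parameters `w` are learnt from the training data.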
● What is Pattern Recognition?
● Applications of Pattern Recognition
● An Example of Designing a Pattern Recognition System
● General Principles:
– Data and Features
– Models and Training
– Evaluation
Tuning the parameters of the model
Need to find parameter values that give best performance (e.g. minimum classification error)
● For more complex tasks exhaustive search becomes impractical
The Classifier Design Cycle: Training
The procedure used to define the parameters of the classifier is called:
● “training”, or “parameter tuning”, or “optimisation”, or “learning”
– this is where the term “machine learning” comes from
– machine learning is analogous to, but simpler than, learning in biology
Tuning the parameters of the model
Need to find parameter values that give best performance (e.g. minimum classification error)
● For a simple task we could exhaustively try all values
What is Learning?
● Learning is acquiring and improving performance through experience.
● Pretty much all animals with a central nervous system are capable of learning (even the simplest ones).
● We want computers to learn when it is too difficult or too expensive to program them directly to perform a task.
● Get the computer to program itself by showing examples of inputs and outputs.
● We will write a “parameterized” program, and let the learning algorithm find the set of parameters that best approximates the desired function or behaviour.
Types of Learning
● Supervised learning
– There is a desired output defined for each feature vector in the training set
– i.e. the dataset contains {x, ω} pairs
– e.g.:
[Figure: example images labelled “dogs” and “cats”]
Types of Learning
Kinds of Supervised learning
Weakly-Supervised Learning
● supervised learning with inexact (or inaccurate) labels
● what constitutes an inexact label depends on the task
● inexact labels are cheaper to produce (the weaker the supervision, the cheaper the labelling)
Regression
● Predict a continuous number given a feature vector
e.g. how much does this person earn?
Classification
● Predict a discrete label (category) given a feature vector
e.g. does this person earn >£35,000?
This module concentrates on classification, but the same methods can usually be adapted to perform regression.
Types of Learning
● Kinds of Supervised learning
Regression
● Outputs are continuous variables (real numbers).
● Also known as “curve fitting” or “function approximation”.
● Allows us to perform interpolation.
Classification
● Outputs are discrete variables (category labels).
● Aim is to assign feature vectors to classes.
● e.g. learn a decision boundary that separates one class from the other.
Types of Learning
● Kinds of Unsupervised learning
Clustering
● Discover “clumps” or “natural groupings” of points in the feature space
● i.e. find groups of exemplars that are similar to each other
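A minimal sketch of clustering using the k-means algorithm on synthetic 2D data (the data, k, and iteration count are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic "clumps" in a 2D feature space (no class labels)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),
               rng.normal(3.0, 0.3, (30, 2))])

k = 2
# Initialise the cluster centres at k randomly chosen datapoints
centres = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(20):
    # Assign each point to its nearest centre ...
    dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    # ... then move each centre to the mean of its assigned points
    centres = np.array([X[assign == j].mean(axis=0) for j in range(k)])
```

After a few iterations the centres settle near the two clumps, even though the algorithm never saw any labels; the designer still has to choose k, a hyper-parameter.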
Dimensionality Reduction
● Discover a low-dimensional manifold or surface near which the data lives.
Factorisation
● Discover a set of components that can be combined to reconstruct the data.