COMP3308/3608, Lecture 5
ARTIFICIAL INTELLIGENCE
Introduction to Machine Learning. K-Nearest Neighbor. Rule-Based Algorithms: 1R
Reference: Russell and Norvig, p.693-697, 738-741; Witten, Frank, Hall and Pal, ch. 1-2, ch. 4: p.91-96, 135-141
, COMP3308/3608 AI, week 5, 2022 1
Assignment 1 – COMP3308
• The first three students who finished the assignment are:
• They will receive certificates. Congratulations! Very well done!
Assignment 1 – COMP3308 (2)
• How do we know?
• When you decode the secret message there are instructions
to post a specific phrase on the discussion board☺
• If you are not the first, second or third to finish, do not worry! You have done an amazing job and should be very proud of your search skills!
• Introduction to Machine Learning (ML)
• What is learning and ML?
• Classification of ML methods
• K-nearest neighbor
• Learning rules – 1R algorithm
Learning and Machine Learning
• Machine Learning (ML) is the area of AI that is concerned with writing computer programs that can learn from
• examples
• domain knowledge
• user feedback
• ML is the core of AI – without an ability to learn, a system cannot be considered intelligent
• What does it mean to learn? What do you understand by learning? When are you sure that you have learned something?
Definitions of learning
• Definitions of learning from dictionary:
• 1) To get knowledge of something by study, experience, or being taught
• 2) To become aware by information or from observations
• 3) To commit to memory
• 4) To be informed of, ascertain; to receive instruction
• Learning is making useful changes in our minds ( )
• But when talking about computers (i.e. ML) these definitions
have shortcomings!
Learning Definitions – Shortcomings
• 1) and 2) – impossible to test if learning has been achieved or not
• How do you know if a machine has got knowledge of…?
• Or if it has become aware…? Can computers be aware or
conscious – philosophical issue
• 3) and 4) – committing to memory, receiving instructions
• Sound too passive; trivial tasks for computers
• You can receive instructions and memorize things without being able to benefit from them, e.g. not able to apply new knowledge to new situations
Learning – Operational Definition
• Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time ( )
• Computers learn when they change their behavior in a way that makes them perform better in the future
• Ties learning to performance rather than knowledge
Types of ML
• Three main types
• Supervised
• Unsupervised
• Reinforcement
• Fourth type: associations learning – developed within the database community in the early 90s
Supervised ML Tasks – Examples
• Classification task: recognizing post codes (= recognizing digits)
Given: Handwritten digits and their corresponding label (class)
Task: Build a classifier that can recognise new handwritten digits
For the image:
– 5 people wrote the numbers from 0 to 9
– 1 example = 1 handwritten digit (50 in total)
– each example is labelled with the digit it represents (0,1,…or 9) => there are 10 classes
image ref: http://www-inst.eecs.berkeley.edu/~cs188/fa06/projects/classification/4/writeup/img2.gif
• Regression task: predicting the exchange rate of AUD
Given: data from previous years (economic indicators, political events), with their corresponding exchange rate
Task: Build a model to predict the exchange rate for future days
• The difference between classification and regression is in the type of the class (target)
variable – nominal vs numeric
Supervised Learning – Definition
• Given: a set of pre-classified (labelled) examples {x,y}, x – input vector, y – target output
• Task: learn a function (classifier, model) which encapsulates the information in these examples (i.e. maps x->y) and can be used predictively
• i.e. to predict the value of one variable (y) from the known values of other variables (x)
• Why is it called supervised?
• 2 types of supervised learning
• Classification: the variable to be predicted is categorical (i.e. its values belong to a pre-specified, finite set of possibilities)
• Regression: the variable to be predicted is numeric
• Examples of supervised algorithms: 1R, k-NN, DTs, NB, neural
networks (perceptron, backpropagation), SVM
Classification – Ex.1
Input data (each row is an input vector with 4 features, followed by the target class play):
outlook   temp.  humidity  windy  play
sunny     hot    high      false  no
sunny     hot    high      true   no
overcast  hot    high      false  yes
rainy     mild   high      false  yes
rainy     cool   normal    false  yes
rainy     cool   normal    true   no
overcast  cool   normal    true   yes
sunny     mild   high      false  no
sunny     cool   normal    false  yes
rainy     mild   normal    false  yes
sunny     mild   normal    true   yes
overcast  mild   high      true   yes
overcast  hot    normal    false  yes
rainy     mild   high      true   no
Model (classifier): we can learn different types of models
• model 1: decision tree
• model 2: rules
if outlook=sunny then play=no
elseif outlook=overcast then play=yes
elseif outlook=rainy then play=yes
• model 3: …
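The rule set of model 2 can be written directly as a function. This is a minimal Python sketch; the function name `predict_play` and the dictionary encoding of an example are illustrative assumptions, not part of the lecture.

```python
def predict_play(example):
    """Apply the model 2 rules: the prediction depends on outlook only."""
    outlook = example["outlook"]
    if outlook == "sunny":
        return "no"
    elif outlook == "overcast":
        return "yes"
    elif outlook == "rainy":
        return "yes"
    raise ValueError(f"unknown outlook value: {outlook!r}")


# One input vector with the 4 features (values chosen for illustration)
new_day = {"outlook": "sunny", "temp": "mild", "humidity": "normal", "windy": "false"}
print(predict_play(new_day))  # no
```

Note that these rules ignore temp., humidity and windy, so they may misclassify some of the training examples; a different type of model can use more of the features.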
Classification Ex.2 – Driving Motor Vehicles
• ALVINN, Pomerleau et al., 1993
• Driving a van along a highway
• Uses a neural network classifier
Input vectors: derived from the 30×32 image (black and white values)
Outputs (classes): 32 classes, corresponding to the turning directions – left, straight, right; different degrees
1 labelled example is: input vector + class label (turning direction)
• The machine that changed the world https://www.youtube.com/watch?v=oPpMp60vCMY
Early NN: minute 39-41, ALVINN and NetTalk: minute 41-46
Classification – More Examples
• Banking 1: Is a mortgage application a good or bad credit risk?
• Banking 2: Is a credit card transaction fraudulent or not?
• Medicine: Is a particular disease present or not?
• Law: Was a given will written by the real deceased person or by somebody else?
• Security: Is a given behavior a possible terrorist threat?
Regression – Example
Task: Predict computer performance
Data: CPU performance data
Model: linear regression
PRP = -56.1 + 0.049 MYCT + 0.015 MMIN + 0.006 MMAX + 0.630 CACH - 0.270 CHMIN + 1.46 CHMAX
Model: regression tree
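A linear regression model of this kind is just a weighted sum of the attribute values plus a constant. The sketch below evaluates the PRP formula in Python; the coefficients are copied from the slide, while the function name `predict_prp` and the attribute values of the example machine are made up for illustration.

```python
# Coefficients copied from the linear regression model on the slide
INTERCEPT = -56.1
WEIGHTS = {
    "MYCT": 0.049,
    "MMIN": 0.015,
    "MMAX": 0.006,
    "CACH": 0.630,
    "CHMIN": -0.270,
    "CHMAX": 1.46,
}


def predict_prp(machine):
    """Return the predicted performance: intercept + sum of weight * attribute value."""
    return INTERCEPT + sum(weight * machine[name] for name, weight in WEIGHTS.items())


# Hypothetical machine description (values invented for this example)
machine = {"MYCT": 100, "MMIN": 1000, "MMAX": 8000, "CACH": 32, "CHMIN": 2, "CHMAX": 16}
print(round(predict_prp(machine), 2))
```

The output is a number rather than a class label, which is what makes this a regression task.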
More Regression Examples
• Predict electricity demand in the next hour from previous demands
• Predict retirement savings from current savings and market indicators
• Predict the house prices in Sydney in 2030
• Predict the sales of a new product based on advertisement
expenditure
• Predict wind velocity based on temperature, humidity, pressure
Reinforcement Learning
• Each example has a score (grade) instead of correct output
• Much less common than supervised learning
• Most suited to control systems applications
Unsupervised Learning (Clustering)
• Given: a collection of input vectors x
• no target outputs y are given
• Task: group (cluster) the input examples into a finite number of clusters so that the examples
• from the same cluster are similar to each other
• from different clusters are dissimilar to each other
• Examples of clustering algorithms: k-means, nearest neighbor, hierarchical clustering
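To make the clustering task concrete, here is a minimal sketch of k-means in Python. It is an illustration under stated assumptions, not the lecture's implementation: the initial centroids are simply the first k points (real implementations usually pick them at random), similarity is measured with squared Euclidean distance, and the data points are invented.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Minimal k-means; returns (centroids, cluster index of each point)."""
    # Assumption: the first k points serve as initial centroids (deterministic, for illustration)
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignment


# Hypothetical input vectors (income in $1000s, age); note that no target outputs are given
points = [(30, 25), (90, 55), (32, 27), (95, 60), (28, 24), (88, 52)]
centroids, assignment = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1, 0, 1]
```

Points that end up with the same cluster index are close to each other, while the two clusters are far apart, which is the grouping the task description asks for.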
Clustering – Example
• Customer profiling
A department store wants to segment its customers into groups and create a special catalogue for each group. The attributes for the grouping included customers’ income, location and physical characteristics (age, height, weight, etc.).
• Clustering was used to find clusters of similar customers
• A catalogue was created for each cluster based on the cluster
characteristics and mailed to each customer
Associations Learning
• Find relationships in data
• market-basket analysis – find combinations of items that typically occur together
• sequential analysis – find frequent sequences in data
Market-Basket Analysis – Example
• Uses the information about what customers buy to give us insight into who they are and why they make certain purchases
• Ex.1. A grocery store owner is trying to decide whether to put bread on sale. He generates association rules and finds what other products are typically purchased with bread. A particular type of cheese is sold 60% of the time the bread is sold and a jam is sold 70% of the time. Based on these findings, he decides:
• 1) to place some cheese and jam at the end of the aisle where the bread is
• 2) not to put these 3 items on sale at the same time.
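Percentages such as "jam is sold 70% of the time the bread is sold" can be computed directly from transaction data; this ratio is usually called the confidence of the rule bread -> jam. The sketch below is a minimal illustration in Python; the function name and the baskets are invented, so the resulting numbers differ from those in the example above.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of the transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)


# Hypothetical shopping baskets, made up for illustration
baskets = [
    {"bread", "cheese", "jam"},
    {"bread", "jam"},
    {"bread", "cheese"},
    {"bread", "jam", "milk"},
    {"milk", "eggs"},
]
print(confidence(baskets, "bread", "jam"))     # 0.75
print(confidence(baskets, "bread", "cheese"))  # 0.5
```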
Frequent Sequences – Example
• Goal: Given a sequence of events, find frequent sub-sequences
• These patterns are similar to market-basket analysis but the relationship is based on time
• Ex. 1: The webmaster of company X periodically analyses the web page log data to determine how the users browse the web pages. He finds that in 70% of the cases the users of page A follow one of the following patterns:
• A->D->B->C
• A->E->B->C
• => A->C is a frequent pattern
• => he then decides to add a link from page A to page C
• Ex.2: Finding sub-sequences in DNA data for particular species
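The web log analysis in Ex. 1 amounts to counting how often one page is followed, directly or indirectly, by another. A minimal Python sketch, assuming each session is stored as an ordered list of page names; the helper name `visits_in_order` and the sessions are hypothetical.

```python
def visits_in_order(session, first, later):
    """True if `first` occurs in the session and `later` occurs somewhere after it."""
    if first not in session:
        return False
    return later in session[session.index(first) + 1:]


# Hypothetical browsing sessions, made up for illustration
sessions = [
    ["A", "D", "B", "C"],
    ["A", "E", "B", "C"],
    ["A", "D", "B", "C"],
    ["A", "F"],
    ["B", "C"],
]

# Among the sessions that visit page A, how many reach page C later on?
from_a = [s for s in sessions if "A" in s]
fraction = sum(visits_in_order(s, "A", "C") for s in from_a) / len(from_a)
print(fraction)  # 0.75
```

A high fraction suggests that A->C is a frequent pattern, which is the evidence the webmaster uses when adding a direct link.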
More ML Applications
• Fraud detection
• Health care – medical insurance fraud, inappropriate medical treatment
• Credit card services, phone card and retail fraud
• Data: historical transactions and other data
• Sport – analyzing game statistics (shots blocked, assists and fouls) to gain competitive advantage
• “When player X is on the floor, player Y’s shot accuracy decreases from 75% to 30%”
• Astronomy
• JPL and the Palomar Observatory discovered 22 quasars using ML
• Web applications
• Mining web logs to discover customer preferences and behavior, analyze
effectiveness of web marketing, improve web site organization
Why is ML Important? adapted from http://www.site.uottawa.ca/~nat/Courses/csi5387.html
• Some tasks cannot be defined well, except by examples
• e.g. recognizing people (men vs women), handwritten digits, etc.
• The amount of knowledge available about certain tasks is too large to encode explicitly as rules, or the knowledge is difficult to extract from experts
• e.g. medical diagnosis – easier to learn from cases (symptoms-> diagnosis)
• Need for adaptation
• Humans often produce machines that do not work as well as desired in
the environments in which they are used
• Environments change over time – e.g. a spam email filter; the characteristics of spam email change over time
• Relationships and correlations can be hidden within large amounts of data. ML and Data Mining may be able to find these relationships.
Machine Learning vs Data Mining
• Data Mining (DM): search for hidden patterns in large datasets
• these patterns should be meaningful, useful and actionable
• Most of the techniques used for DM have been developed in ML
• DM deals with large and multidimensional data; ML not necessarily
• DM is applied ML
• Motivation for DM
• Data explosion – huge databases
• due to automated data collection tools and mature database technology
• examples: supermarket transaction data, credit card usage data, mobile usage data, government statistics, molecular databases, medical records, Wikipedia and other large text collections, etc.
• We are drowning in data but starving for knowledge!
What jobs will disappear in the 21st century?
• mail carriers
• insurance and real estate agents, auto dealers
• prison guards
• stockbrokers
• teachers
• orthodontists
• CEOs
• truckers
• housekeepers
What will be the 10 hottest jobs of the 21st century?
• pharmers (vaccine carrying tomato)
• gene programmers
• tissue engineers
• hot-line handyman (remote diagnostics)
• Turing testers
• narrowcasters (personalised ads)
• data miners
Source: Time magazine, June 26, 2000
If you were born in 2012….you would work in Data Mining
• SMH, 6 April 2012 http://www.smh.com.au/lifestyle/life/whats-the-future-baby-20120405-1wfez.html
• “My schooling will become more interesting as I go, as today’s digital natives grow up to become teachers. They’ll know how to use all the gadgets at their disposal to make learning easier, fun and compatible with my short attention span. I’ll always be switched on. I’ll crowdsource my big decisions, taking votes among my closest 30 or so net friends. I’ll do a university degree of course – just about everyone will. I’ll probably work in a knowledge-based service industry which will depend on mining data from customer transactions in unimaginable volumes to determine which services to provide to whom, where and when”.
• Side question: Technology vs “chalk & talk” teaching – which one is better?
https://www.openlearning.com/educationist/ChalkAndTalkOrTechnologyDoIHaveAChoicePartOne
The 10 Toughest Jobs To Fill In 2016 – Data Scientist
• Forbes magazine, 24 September 2015 https://www.forbes.com/sites/susanadams/2015/09/24/the-10-toughest-jobs-to-fill-in-2016/#d665d4a6fcca
• “With the explosion of big data and the need to track it, employers keep on hiring data scientists. But qualified candidates are in short supply. The field is so new, the Bureau of Labor Statistics doesn’t even track it as a profession. Yet thousands of companies, from startups that analyze credit card data in order to target marketing and advertising campaigns, to giant corporations like Ford Motor and Price WaterhouseCoopers, are bringing on scores of people who can take gigantic data sets and wrestle them into usable information. As an April report from technology market research firm Forrester put it, “Businesses are drowning in data but starving for insights.””
The 10 Toughest Jobs To Fill In 2017 – Data Scientist (again)
• Forbes magazine, 8 February 2017 https://www.forbes.com/sites/karstenstrauss/2017/02/08/the-toughest-jobs-to-fill-in-2017/#44c245ee7f14
• “One job that made the list this year – as it did last year – is data scientist. Says Kensing: “Universities now are just starting to integrate specific majors for that field. It’s got a high growth outlook but right now it’s still a burgeoning field.” According to the numbers, the data scientist occupation has a 16% growth outlook over the next eight years, and right now the median annual salary for that position is more than $128,000.”
Top Emerging Jobs in 2020
• Forbes magazine, 5 January 2020 https://www.forbes.com/sites/louiscolumbus/2020/01/05/ai-specialist-is-the-top-emerging-job-in-2020-according-to-linkedin/?sh=1d32f6c37495
• “Artificial Intelligence Specialist – Artificial Intelligence and Machine Learning have both become synonymous with innovation, and LinkedIn data shows that’s more than just buzz. Hiring growth for this role has grown 74% annually in the past 4 years and encompasses a few different titles within the space that all have a very specific set of skills despite being spread across industries, including artificial intelligence and machine learning engineer. According to Indeed, Machine Learning Engineer job openings grew 344% between 2015 to 2018 and have an average base salary of $146,085 …
• Data Scientist – LinkedIn is seeing a 37% annual increase in demand for Data Scientists and related technical positions today. Data Science is another field that has topped the LinkedIn Emerging Jobs list for three years running. It’s a specialty that’s continuing to grow significantly across all industries. …”
Classification K-Nearest Neighbor Algorithm
Classification Setup Again
Given: a set of pre-labelled examples
• 14 examples
• 4 attributes (features, variables): outlook, temperature, humidity and windy
• the class is play (values: yes, no)
Task: Build a model (classifier) that can be used to predict the class of new (unseen) examples
• e.g. predict the class (yes or no) of outlook=sunny, temp=hot, humidity=low, windy=true
Examples used to build the model are called training data
Success is measured empirically on another set called test data
• Test data hasn’t been used to build the classifier; it is also labelled
• Performance measure: accuracy – proportion of correctly classified test examples
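The accuracy measure can be sketched in a few lines of Python. Everything below apart from the definition of accuracy is an assumption made for illustration: the classifier `outlook_rule` and the four labelled test examples are invented, and the test examples play no part in building the classifier.

```python
def accuracy(classifier, test_data):
    """Proportion of test examples whose predicted class equals the true label."""
    correct = sum(classifier(x) == y for x, y in test_data)
    return correct / len(test_data)


def outlook_rule(x):
    """Hypothetical classifier that predicts play from outlook only."""
    return "no" if x["outlook"] == "sunny" else "yes"


# Hypothetical labelled test set: (input vector, true class)
test_data = [
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "overcast", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
]
print(accuracy(outlook_rule, test_data))  # 0.75
```

Three of the four test examples are classified correctly, so the accuracy is 0.75.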
Nominal and Numeric Attributes
• 2 types of attributes (features):
• numeric (continuous) – their values are numbers
• nominal (categorical) – their values belong to a pre-specified, finite set of possibilities
outlook   temp.  humidity  windy  play
sunny     hot    high      false  no
sunny     hot    high      true   no
overcast  hot    high      false  yes
rainy     mild   high      false  yes
rainy     cool   normal    false  yes
rainy     cool   normal    true   no
overcast  cool   normal    true   yes
sunny     mild   high      false  no
sunny     cool   normal    false  yes
rainy     mild   normal    false  yes
sunny     mild   normal    true   yes
overcast  mild   high      true   yes
overcast  hot    normal    false  yes
rainy     mild   high      true   no