CS699 Lecture 5 Classification 2
Using IF-THEN Rules for Classification
• Represent the knowledge in the form of IF-THEN rules
  – IF antecedent/precondition/condition THEN consequent/conclusion
  – Example:
    R: IF age = youth AND student = yes THEN buys_computer = yes
• Assessment of a rule: coverage and accuracy
  – A tuple is covered by R if it satisfies the antecedent of R
  – n_covers = # of tuples covered by R
  – n_correct = # of tuples correctly classified by R
  – coverage(R) = n_covers / |D|   /* D: training data set */
  – accuracy(R) = n_correct / n_covers
  – Refer to Example 8.6, page 356
Using IF-THEN Rules for Classification
• Another example:
  R2: IF income = high AND student = no THEN buys_computer = no
• R2 covers three tuples, so coverage(R2) = 3/14 = 21.43%
• Among these three tuples, two are correctly classified by R2, so accuracy(R2) = 2/3 = 66.67% (see the sketch after the table)

Training data (buys_computer):

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
31..40   high     no        fair            yes
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
31..40   low      yes       excellent       yes
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
31..40   medium   no        excellent       yes
31..40   high     yes       fair            yes
>40      medium   no        excellent       no
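The coverage and accuracy computations above are easy to check in code. Below is a minimal Python sketch (mine, not from the lecture); the tuple encoding and variable names are illustrative assumptions.

```python
# Minimal sketch: evaluating rule coverage and accuracy on the
# buys_computer training set. Encoding and names are illustrative.

# Each tuple: (age, income, student, credit_rating, buys_computer)
D = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

# R2: IF income = high AND student = no THEN buys_computer = no
antecedent = lambda t: t[1] == "high" and t[2] == "no"
consequent = "no"

covered = [t for t in D if antecedent(t)]          # tuples satisfying the antecedent
correct = [t for t in covered if t[4] == consequent]

print(f"coverage(R2) = {len(covered)}/{len(D)} = {len(covered)/len(D):.2%}")
print(f"accuracy(R2) = {len(correct)}/{len(covered)} = {len(correct)/len(covered):.2%}")
# coverage(R2) = 3/14 = 21.43%, accuracy(R2) = 2/3 = 66.67%
```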
Using IF-THEN Rules for Classification
• If more than one rule is triggered, we need conflict resolution:
  – Size ordering: assign the highest priority to the triggering rule that has the "toughest" requirement (i.e., with the most attribute tests)
  – Class-based ordering: decreasing order of prevalence or misclassification cost per class
  – Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts (see the sketch below)
  – When a tuple is not covered by any rule, a default rule is used; e.g., the majority class is assigned.
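A decision list with a default rule takes only a few lines. The sketch below is my own illustration (rule representation and names are assumptions, not from the lecture): rules are tried in priority order, the first triggered rule fires, and a default majority-class rule handles uncovered tuples.

```python
# Minimal sketch of rule-based ordering (a decision list).

rules = [
    # (antecedent predicate over a tuple dict, predicted class)
    (lambda t: t["age"] == "31..40", "yes"),
    (lambda t: t["income"] == "high" and t["student"] == "no", "no"),
]
DEFAULT_CLASS = "yes"  # e.g., the majority class of the training data

def classify(t):
    for antecedent, label in rules:   # first triggered rule wins
        if antecedent(t):
            return label
    return DEFAULT_CLASS              # default rule for uncovered tuples

# Both rules trigger for this tuple; the first in the list wins.
print(classify({"age": "31..40", "income": "high", "student": "no"}))  # "yes"
```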
Rule Extraction from a Decision Tree
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunct (AND); the leaf holds the class prediction
Rule Extraction from a Decision Tree
• Example: rule extraction from our buys_computer decision tree (see the sketch below)

age?
├── <=30: student?
│         ├── no: no
│         └── yes: yes
├── 31..40: yes
└── >40: credit rating?
          ├── excellent: no
          └── fair: yes

IF age <= 30 AND student = no              THEN buys_computer = no
IF age <= 30 AND student = yes             THEN buys_computer = yes
IF age in 31..40                           THEN buys_computer = yes
IF age > 40 AND credit_rating = excellent  THEN buys_computer = no
IF age > 40 AND credit_rating = fair       THEN buys_computer = yes
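Enumerating root-to-leaf paths is a short recursion. Below is a minimal Python sketch (mine); the nested-dict tree encoding is an illustrative assumption.

```python
# Minimal sketch: one IF-THEN rule per root-to-leaf path.
# Internal nodes are {"test?": {branch_value: child}}; leaves are labels.

tree = {
    "age?": {
        "<=30": {"student?": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {"credit_rating?": {"excellent": "no", "fair": "yes"}},
    }
}

def extract_rules(node, conditions=()):
    if isinstance(node, str):              # leaf holds the class prediction
        print("IF", " AND ".join(conditions), "THEN buys_computer =", node)
        return
    (test, branches), = node.items()       # one attribute test per node
    for value, child in branches.items():
        extract_rules(child, conditions + (f"{test.rstrip('?')} = {value}",))

extract_rules(tree)
# Prints five rules, one per leaf, e.g.
# IF age = <=30 AND student = no THEN buys_computer = no
```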
Rule Extraction from the Training Data
• Sequential covering algorithm: extracts rules directly from training data
• Rules are learned sequentially
• Each rule for a given class will ideally cover many tuples of the class but none (or few) of the tuples of other classes
• There are many sequential covering algorithms
• General strategy (see the skeleton below):
  – Rules are learned one at a time
  – Each time a rule is learned, the tuples covered by the rule are removed
  – The process repeats on the remaining tuples until a termination condition holds, e.g., there are no more training examples or the quality of a returned rule falls below a user-specified threshold
• Compare with decision-tree induction, which learns a set of rules simultaneously
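The general strategy maps directly onto a loop. The skeleton below is my own sketch: `learn_one_rule(tuples, c)` is assumed to return a rule object with a numeric `.quality` and a `.covers(tuple)` predicate (or `None`); these interfaces are illustrative, not a standard API.

```python
# Skeleton of sequential covering, following the strategy above.

def sequential_covering(D, classes, learn_one_rule, min_quality):
    rule_set = []
    for c in classes:                       # learn rules one class at a time
        remaining = list(D)
        while remaining:                    # stop: no more training examples
            rule = learn_one_rule(remaining, c)
            if rule is None or rule.quality < min_quality:
                break                       # stop: rule quality below threshold
            rule_set.append(rule)
            # remove the tuples covered by the newly learned rule
            remaining = [t for t in remaining if not rule.covers(t)]
    return rule_set
```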
Simple Covering Algorithm (Example)
• Example: first, learn rules for the yes class
• IF ? THEN buys_computer = yes
• Accuracy (correct/covered) of each candidate condition on the 14 training tuples shown earlier:

  age <= 30: 2/5          student = no: 3/7
  age = 31..40: 4/4       student = yes: 6/7
  age > 40: 3/5           credit_rating = fair: 6/8
  income = low: 3/4       credit_rating = excellent: 3/6
  income = medium: 4/5
  income = high: 2/4
Simple Covering Algorithm (Example)
• age = 31..40 has the highest accuracy (4/4), so a new rule is created:
  IF age = 31..40 THEN buys_computer = yes
• Four tuples are covered by this rule; they are removed, and the same process is repeated (for yes tuples)
• Once all yes tuples are covered, the same is done for no tuples

Remaining tuples after removing those covered by the rule:

age      income   student   credit_rating   buys_computer
<=30     high     no        fair            no
<=30     high     no        excellent       no
>40      medium   no        fair            yes
>40      low      yes       fair            yes
>40      low      yes       excellent       no
<=30     medium   no        fair            no
<=30     low      yes       fair            yes
>40      medium   yes       fair            yes
<=30     medium   yes       excellent       yes
>40      medium   no        excellent       no
Simple Covering Algorithm (Example)
• IF age = 31..40 THEN buys_computer = yes
• Accuracy of each candidate condition, recomputed on the remaining tuples (the sketch below reproduces these counts):

  age <= 30: 2/5          student = no: 1/5
  age > 40: 3/5           student = yes: 4/5
  income = low: 2/3       credit_rating = fair: 4/6
  income = medium: 3/5    credit_rating = excellent: 1/4
  income = high: 0/2

• IF age = 31..40 AND student = yes THEN buys_computer = yes
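The counts on these slides can be reproduced with a few lines of code. Below is a minimal Python sketch (mine, not from the lecture) that tallies, for every attribute-value condition, the covered tuples and those of class yes; running it on the full table gives the first set of counts, and rerunning it after removing the four age = 31..40 tuples gives the second.

```python
# Minimal sketch of the greedy counting step in the example above.
# The 14-tuple buys_computer table is inlined; names are my own.

rows = """<=30 high no fair no
<=30 high no excellent no
31..40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31..40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31..40 medium no excellent yes
31..40 high yes fair yes
>40 medium no excellent no"""
D = [tuple(line.split()) for line in rows.splitlines()]
ATTRS = ("age", "income", "student", "credit_rating")

def condition_counts(tuples, target="yes"):
    """Map (attribute, value) -> (n_correct, n_covers)."""
    counts = {}
    for t in tuples:
        for i, attr in enumerate(ATTRS):
            correct, covers = counts.get((attr, t[i]), (0, 0))
            counts[(attr, t[i])] = (correct + (t[4] == target), covers + 1)
    return counts

for (attr, value), (correct, covers) in sorted(condition_counts(D).items()):
    print(f"{attr} = {value}: {correct}/{covers}")
# age = 31..40 scores 4/4, so IF age = 31..40 THEN buys_computer = yes is
# chosen first; removing its 4 tuples and rerunning yields the second set
# of counts (student = yes: 4/5, income = high: 0/2, ...).
```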
Numeric Prediction
• Prediction is different from classification
  – Classification refers to prediction of categorical class labels
  – Prediction models continuous-valued functions
• (Numerical) prediction is similar to classification
  – construct a model
  – use the model to predict continuous values
• Major method for prediction: regression
  – model the relationship between one or more independent or predictor variables and a dependent or response variable
• Regression analysis
  – Linear and multiple regression
  – Non-linear regression
  – Other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees
Linear Regression
• Linear regression: involves a response variable y and a single predictor variable x
  y = w0 + w1 x
  where w0 (y-intercept) and w1 (slope) are the regression coefficients
• Method of least squares: estimates the best-fitting straight line (see the sketch below)
• Multiple linear regression: involves more than one predictor variable
  – Training data is of the form (X1, y1), (X2, y2), ..., (X|D|, y|D|)
  – Ex.: for 2-D data, we may have y = w0 + w1 x1 + w2 x2
  – Many nonlinear functions can be transformed into the above
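As a concrete illustration of least squares (my own sketch, with made-up data), the closed-form estimates for the single-predictor case are w1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and w0 = ȳ − w1 x̄:

```python
import numpy as np

# Minimal least-squares sketch for y = w0 + w1*x (data values made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 4.1, 5.8, 8.2, 9.9])

w1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
w0 = y.mean() - w1 * x.mean()        # the fitted line passes through the means
print(f"y = {w0:.3f} + {w1:.3f} x")

# Multiple linear regression (y = w0 + w1*x1 + w2*x2) solves the same
# least-squares problem in matrix form, e.g. via np.linalg.lstsq.
```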
Other Regression
• Nonlinear regression: some nonlinear models can be modeled by a mathematical function
  – Example: a polynomial regression model y = w0 + w1 x + w2 x^2 + w3 x^3 (fit as shown in the sketch below)
• Other regression-based models
  – Logistic regression
  – Regression trees
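Polynomial regression is one of the nonlinear models that transforms into multiple linear regression, as the previous slide noted: treat x, x², and x³ as three separate predictors. A minimal sketch with made-up data:

```python
import numpy as np

# Sketch: fit y = w0 + w1*x + w2*x^2 + w3*x^3 by least squares,
# treating x, x^2, x^3 as predictor variables (data values made up).
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 20)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.3 * x**3 + rng.normal(0, 0.1, x.size)

X = np.column_stack([np.ones_like(x), x, x**2, x**3])   # design matrix
w, *_ = np.linalg.lstsq(X, y, rcond=None)               # least squares
print("w0..w3 =", np.round(w, 2))   # recovers roughly (1.0, 0.5, -2.0, 0.3)
```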
Classification Example
• Adult
• Car evaluation
• Vote
• Credit screening
• CPU (numeric prediction)
• All from the UCI Machine Learning Repository
Adult Dataset
• This dataset was extracted from a census bureau database.
• 32561 tuples, 15 attributes
• Class attribute: whether a person earns >50K or <=50K
• Attributes: age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, class
Car Evaluation Dataset
• The model evaluates cars according to the following concept structure:
  – CAR: car acceptability
    • PRICE: overall price
      – buying: buying price
      – maint: price of the maintenance
    • TECH: technical characteristics
      – COMFORT: comfort
        • doors: number of doors
        • persons: capacity in terms of persons to carry
        • lug_boot: the size of luggage boot
      – safety: estimated safety of the car
Car Evaluation Dataset
• 1728 tuples, 7 attributes
• Class attribute:
  – car acceptability
  – values: unacc, acc, good, vgood
• Attributes: buying, maint, doors, persons, lug_boot, safety
Vote Dataset
• 435 tuples, 17 attributes
• This data set includes the votes of each U.S. House of Representatives Congressman on the 16 key votes
• Class attribute: Democrat or Republican
• Attributes: handicapped-infants, water-project-cost-sharing, adoption-of-the-budget-resolution, physician-fee-freeze, el-salvador-aid, religious-groups-in-schools, anti-satellite-test-ban, aid-to-nicaraguan-contras, mx-missile, immigration, synfuels-corporation-cutback, education-spending, superfund-right-to-sue, crime, duty-free-exports, export-administration-act-south-africa
Credit Screening Dataset
• This file concerns credit card applications. All attribute names and values have been changed to protect confidentiality of the data.
• 690 tuples, 16 attributes
• Class attribute: approved or denied
CPU Dataset
• CPU performance
• 209 tuples, 7 attributes
• Class attribute: relative performance (numeric)
• Attributes:
  – MYCT: machine cycle time (ns)
  – MMIN: minimum main memory (KB)
  – MMAX: maximum main memory (KB)
  – CACH: cache (KB)
  – CHMIN: minimum channels
  – CHMAX: maximum channels
Hepatitis Dataset
• Downloaded from UCI Data Repository
• Includes information about 155 acute/chronic hepatitis patients, with 19 measured variables.
• 33 died and 122 survived.
• The goal of the original research was to understand the effect of the measured variables on the chance of patient survival.
• 155 tuples and 19 attributes
Hepatitis Dataset
• Initial dataset: hepatitis.csv
• Preprocessing: missing values, attribute types, attribute selection, etc.
• Attribute selection methods: CfsSubset, Correlation, GainRatio, InfoGain, ReliefF, SymmetricalUncert (see the InfoGain sketch below)
• Classifiers: Naïve Bayes, Logistic regression, SVM, MLP, J48, RandomForest, KNN, Bagging, Boosting
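One of the listed attribute selection methods, InfoGain, ranks attributes by how much they reduce the entropy of the class. The sketch below is my own illustration of that measure (not Weka's implementation), with a made-up toy attribute and survival label:

```python
import math
from collections import Counter

# Minimal InfoGain sketch: gain(A) = H(class) - H(class | A).

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    gain = entropy(labels)                 # H(class)
    n = len(labels)
    for v in set(values):                  # subtract weighted H(class | A = v)
        subset = [l for x, l in zip(values, labels) if x == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Toy data: one discrete attribute vs. a binary survival label.
attr = ["low", "low", "high", "high", "high"]
cls  = ["died", "died", "lived", "lived", "died"]
print(round(info_gain(attr, cls), 3))      # 0.420
```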
References
• Han, J., Kamber, M., and Pei, J., Data Mining: Concepts and Techniques, 3rd Ed., Morgan Kaufmann, 2012. http://www.cs.illinois.edu/~hanj/bk3/
• Witten, I. H., Frank, E., and Hall, M. A., Data Mining: Practical Machine Learning Tools and Techniques, 3rd Ed., Morgan Kaufmann, 2011.