Naïve Bayes Classification
AI lecture: Machine Learning
Naïve Bayes Classification
— Basic Machine Learning Model
Material borrowed (and modified) from Jonathan Huang, from I. H. Witten and E. Frank's "Data Mining", from Jeremy Wyatt, and from others; revised by C. C. Hung
*
Outline
Probability and Machine Learning
Bayesian Classification
Naïve Bayesian Classifier
Examples
Model parameters
Evaluating classification algorithms
*
Things We’d Like to Do
Spam Classification
Given an email, predict whether it is spam or not
Medical Diagnosis
Given a list of symptoms, predict whether a patient has disease X or not
Weather
Based on temperature, humidity, etc… predict if it will rain tomorrow
*
Recall: The machine learning framework
Apply a prediction function to a feature representation of the image to get the desired output:
f(image of an apple) = "apple"
f(image of a tomato) = "tomato"
f(image of a cow) = "cow"
Slide credit: L. Lazebnik
*
Recall: The machine learning framework
y = f(x)   (y: the output, f: the prediction function, x: the image feature)
Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Slide credit: L. Lazebnik
*
Bayesian Classification
Problem statement:
Given features X1, X2, …, Xn
Predict a label Y for a new sample
*
Another Application
Digit Recognition
Features: X1, …, Xn ∈ {0,1} (blue vs. red pixels)
Label: Y ∈ {5,6} (predict whether a digit is a 5 or a 6)
[Figure: an image of a handwritten digit is fed to the classifier, which outputs "5". Blue pixels: background; red pixels: digit.]
*
The Bayes Classifier
A good strategy is to predict the most probable label given the features, i.e. the y that maximizes P(Y = y | X1, …, Xn)
(for example: what is the probability that the image represents a 5, given its pixels?)
So … How do we compute that?
*
Bayes Theorem
Use Bayes rule!
P(Y | X1, …, Xn) = P(X1, …, Xn | Y) P(Y) / P(X1, …, Xn)
Here P(X1, …, Xn | Y) is the likelihood, P(Y) is the prior, and P(X1, …, Xn) is the normalization constant.
Why did this help? Well, we think that we might be able to specify how features are "generated" by the class label (i.e. the likelihood).
*
The Bayes Classifier
Let's expand this for our digit recognition task: compare
P(Y=5 | X1, …, Xn) = P(X1, …, Xn | Y=5) P(Y=5) / P(X1, …, Xn)   versus   P(Y=6 | X1, …, Xn) = P(X1, …, Xn | Y=6) P(Y=6) / P(X1, …, Xn)
To classify, we'll simply compute these two probabilities and predict based on which one is greater.
*
Model Parameters
For the Bayes classifier, we need to “learn” two functions, the likelihood and the prior.
How many parameters are required to specify the prior for our digit recognition example? (Just one: P(Y = 5), since P(Y = 6) = 1 − P(Y = 5).)
*
Model Parameters
How many parameters are required to specify the likelihood P(X1, …, Xn | Y)?
(Supposing that each image is 30×30 pixels, so n = 900 binary features — on the order of 2^900 parameters per class.)
*
Model Parameters
The problem with explicitly modeling P(X1,…,Xn|Y) is that there are usually way too many parameters:
We’ll run out of space
We’ll run out of time
And we’ll need tons of training data (which is usually not available)
*
How many parameters must we estimate?
Suppose X = (X1, X2, …, Xn), where each Xi is boolean.
To estimate P(Y | X1, X2, …, Xn), 2^n quantities need to be estimated!
If we have 30 boolean Xi's: P(Y | X1, X2, …, X30) requires 2^30 ≈ 1 billion!
How many parameters for P(X1, X2, … Xn|Y) ?
Hence, we need lots of data or a very small n.
*
How many parameters must we estimate?
Consider the number of parameters we must estimate when Y is boolean and X is a vector of n boolean attributes. In this case, we need to estimate a set of parameters θij = P(X = xi | Y = yj).
*
How many parameters must we estimate?
Reasoning: the index i takes on 2^n possible values (one for each possible vector value of X), and j takes on 2 possible values. Therefore, we will need to estimate approximately 2^(n+1) parameters.
To calculate the exact number of required parameters, note that for any fixed j, the sum over i of θij must be one. Therefore, for any particular value yj and the 2^n possible values xi, we need to compute 2^n − 1 independent parameters. Given the two possible values for Y, we must estimate a total of 2(2^n − 1) such θij parameters.
*
The Naïve Bayes Model
The Naïve Bayes assumption: assume that all features are independent given the class label Y. Then
P(X1, …, Xn | Y)
= P(X1 | X2, …, Xn, Y) P(X2 | X3, …, Xn, Y) ⋯ P(Xn | Y)
= P(X1 | Y) P(X2 | Y) ⋯ P(Xn | Y)
Note that the second line follows from a general property of probabilities (the chain rule) and the third line follows from the definition of conditional independence.
*
The Naïve Bayes Model
The Naïve Bayes Assumption: Assume that all features are independent given the class label Y
Equationally speaking: P(X1, …, Xn | Y) = P(X1 | Y) P(X2 | Y) ⋯ P(Xn | Y)
(We will discuss the validity of this assumption later)
*
Bayes’ Rule: An example
Our experiment consists of selecting a bowl with the prior probabilities given below and then drawing a ball at random from that bowl; call R the event that the ball drawn is red.
Bayes’ Rule: An example
Prior probability (given):
Bowl   Red balls   White balls   P(select bowl)
A      2           4             1/3
B      1           2             1/6
C      5           4             1/2
Bayes’ Rule: An example
The event R (a red ball is drawn) is the union of the mutually exclusive events A∩R, B∩R, and C∩R, i.e.
P(R) = P(A∩R) + P(B∩R) + P(C∩R)
     = 1/3 × 2/6 + 1/6 × 1/3 + 1/2 × 5/9 = 8/18
Bayes’ Rule: An example
Suppose that the outcome of event R is a red ball, but we do not know from which bowl it was drawn.
Accordingly, we compute the conditional probability that the red ball was drawn from bowl A, namely P(A|R), by using Bayes' rule.
Similarly for B and C, i.e. P(B|R) and P(C|R).
Bayes’ Rule: An example
P(A∩R) = 1/3 × 2/6 = 2/18
P(B∩R) = 1/6 × 1/3 = 1/18
P(C∩R) = 1/2 × 5/9 = 5/18
Dividing each by P(R) = 8/18 gives the posteriors P(A|R), P(B|R), and P(C|R).
Bayes’ Rule: An example
Bowl   Red balls   White balls   Prior P(bowl)   Posterior P(bowl|R)
A      2           4             1/3             2/8
B      1           2             1/6             1/8
C      5           4             1/2             5/8
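These numbers are easy to check with a few lines of Python (plain arithmetic only; the bowl contents and priors are taken from the table above):

# Bowls: (red balls, white balls, prior probability of selecting the bowl)
bowls = {"A": (2, 4, 1/3), "B": (1, 2, 1/6), "C": (5, 4, 1/2)}

# P(R) = sum over bowls of P(bowl) * P(red | bowl)
p_red = sum(prior * red / (red + white) for red, white, prior in bowls.values())
print(p_red)  # 8/18 ≈ 0.444

# Posterior P(bowl | R) = P(bowl) * P(red | bowl) / P(R)
for name, (red, white, prior) in bowls.items():
    print(name, prior * red / (red + white) / p_red)  # A: 2/8, B: 1/8, C: 5/8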
“Essentially, all models are wrong, but some are useful.”
— George E.P. Box (1919 – 2013)
University of Wisconsin
*
Why is this useful?
# of parameters for modeling P(X1, …, Xn | Y): too many — 2(2^n − 1)
# of parameters for modeling P(X1|Y), …, P(Xn|Y): much better — just 2n
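To make the gap concrete, here is a small Python check of the two parameter counts from this slide, for a few feature sizes:

# Number of parameters for the full joint likelihood vs. the Naive Bayes factorization
# (binary class Y, n binary features)
for n in (10, 30, 900):
    full_joint = 2 * (2**n - 1)   # P(X1,...,Xn | Y): 2^n - 1 free parameters per class
    naive_bayes = 2 * n           # P(Xi | Y): one parameter per feature per class
    print(f"n={n}: full joint {full_joint:.3g} vs. Naive Bayes {naive_bayes}")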
*
Naïve Bayes Training
Now that we’ve decided to use a Naïve Bayes classifier, we need to train it with some data:
MNIST Training Data
*
Naïve Bayes Training
Training in Naïve Bayes is easy:
Estimate the prior P(Y = v) as the fraction of records with Y = v
Estimate the likelihood P(Xi = u | Y = v) as the fraction of records with Y = v for which Xi = u
(This corresponds to maximum likelihood estimation of the model parameters; a counting sketch follows below.)
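A minimal counting sketch of this training procedure in plain Python, for binary features; the tiny dataset here is made up for illustration (it is not the MNIST data from the previous slide):

# Toy labeled data: each record is (features, label) with binary features
data = [([1, 0, 1], 5), ([1, 1, 1], 5), ([0, 0, 1], 6), ([0, 1, 0], 6), ([1, 0, 0], 5)]

labels = [y for _, y in data]
prior = {v: labels.count(v) / len(labels) for v in set(labels)}   # P(Y = v)

likelihood = {}                                                   # P(Xi = 1 | Y = v)
for v in set(labels):
    records = [x for x, y in data if y == v]
    likelihood[v] = [sum(col) / len(records) for col in zip(*records)]

print(prior)       # e.g. {5: 0.6, 6: 0.4}
print(likelihood)  # per-class fraction of records with Xi = 1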
*
Naïve Bayes Training: if zero
In practice, some of these counts can be zero.
Fix this by adding "virtual" counts to every possible value of each feature before computing the likelihood fractions (see the sketch below).
(This is like putting a prior on the parameters and doing MAP estimation instead of MLE.)
This is called smoothing.
MAP: maximum a posteriori
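One standard concrete form of the "virtual counts" idea is add-one (Laplace) smoothing; a minimal sketch, assuming binary-valued features:

def smoothed_likelihood(count_xi_and_y, count_y, n_values=2, virtual=1):
    # P(Xi = u | Y = v) with `virtual` pseudo-counts added to every possible value of Xi,
    # so the estimate is never exactly zero even when the raw count is zero
    return (count_xi_and_y + virtual) / (count_y + virtual * n_values)

print(smoothed_likelihood(0, 9))   # 1/11 ≈ 0.09 instead of 0
print(smoothed_likelihood(3, 9))   # 4/11 ≈ 0.36 instead of 3/9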
*
Naïve Bayes Training
For binary digits, training amounts to averaging all of the training fives together and all of the training sixes together.
*
Naïve Bayes Classification
To classify a new example, predict the label with the largest score: ŷ = argmax_v P(Y = v) ∏i P(Xi = xi | Y = v)
*
Another Example of the Naïve Bayes Classifier
The weather data, with counts and probabilities (yes / no):
Counts:
  outlook:      sunny 2/3,  overcast 4/0,  rainy 3/2
  temperature:  hot 2/2,  mild 4/2,  cool 3/1
  humidity:     high 3/4,  normal 6/1
  windy:        false 6/2,  true 3/3
  play:         yes 9,  no 5
Probabilities:
  outlook:      sunny 2/9, 3/5;  overcast 4/9, 0/5;  rainy 3/9, 2/5
  temperature:  hot 2/9, 2/5;  mild 4/9, 2/5;  cool 3/9, 1/5
  humidity:     high 3/9, 4/5;  normal 6/9, 1/5
  windy:        false 6/9, 2/5;  true 3/9, 3/5
  play:         9/14, 5/14
A new day: outlook = sunny, temperature = cool, humidity = high, windy = true, play = ?
Weather Example
Likelihood of yes = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
Likelihood of no = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206
Therefore, the prediction is No
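The same computation as a short Python sketch, with the probabilities hard-coded from the table above and then normalized so the two scores sum to one:

# P(attribute value | play) read off the weather table, for the new day:
# outlook=sunny, temperature=cool, humidity=high, windy=true
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # ≈ 0.0053
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # ≈ 0.0206

print(p_yes, p_no)
print("P(play=yes | new day) =", p_yes / (p_yes + p_no))   # ≈ 0.205
print("P(play=no  | new day) =", p_no  / (p_yes + p_no))   # ≈ 0.795 -> predict no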
The Naive Bayes Classifier for Data Sets with Numerical Attribute Values
One common practice to handle numerical attribute values is to assume normal distributions for numerical attributes.
The numeric weather data with summary statistics (yes / no):
  outlook (counts):        sunny 2/3,  overcast 4/0,  rainy 3/2
  temperature (values):    yes: 83, 70, 68, 64, 69, 75, 75, 72, 81;  no: 85, 80, 65, 72, 71
  humidity (values):       yes: 86, 96, 80, 65, 70, 80, 70, 90, 75;  no: 85, 90, 70, 95, 91
  windy (counts):          false 6/2,  true 3/3
  play (counts):           yes 9,  no 5
  outlook (probabilities): sunny 2/9, 3/5;  overcast 4/9, 0/5;  rainy 3/9, 2/5
  temperature:             mean 73 / 74.6,  std dev 6.2 / 7.9
  humidity:                mean 79.1 / 86.2,  std dev 10.2 / 9.7
  windy (probabilities):   false 6/9, 2/5;  true 3/9, 3/5
  play (probabilities):    9/14, 5/14
Weather Example with numerical data
Let x1, x2, …, xn be the values of a numerical attribute in the training data set. For each class, estimate
μ = (1/n) Σi xi
σ² = (1/(n − 1)) Σi (xi − μ)²
and model the attribute with the normal density f(x) = 1/(√(2π) σ) · e^(−(x − μ)² / (2σ²))
Weather Example with numerical data
For example, for a new day with outlook = sunny, temperature = 66, humidity = 90, windy = true:
f(temperature = 66 | yes) = 1/(√(2π) · 6.2) · e^(−(66 − 73)² / (2 · 6.2²)) = 0.0340
Likelihood of Yes = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
Likelihood of No = 3/5 × 0.0291 × 0.038 × 3/5 × 5/14 = 0.000136
Therefore, the prediction is again No.
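A small Python sketch of the same calculation, using the means and standard deviations from the summary table above:

import math

def normal_density(x, mean, std):
    # f(x) = 1 / (sqrt(2*pi)*sigma) * exp(-(x - mean)^2 / (2*sigma^2))
    return math.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

print(normal_density(66, 73, 6.2))     # ≈ 0.0340  (temperature = 66 given yes)

# New day: outlook=sunny, temperature=66, humidity=90, windy=true
like_yes = (2/9) * normal_density(66, 73, 6.2) * normal_density(90, 79.1, 10.2) * (3/9) * (9/14)
like_no  = (3/5) * normal_density(66, 74.6, 7.9) * normal_density(90, 86.2, 9.7) * (3/5) * (5/14)
print(like_yes, like_no)               # ≈ 3.6e-05 vs ≈ 1.4e-04 -> predict no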
Outputting Probabilities
What’s nice about Naïve Bayes (and generative models in general) is that it returns probabilities
These probabilities can tell us how confident the algorithm is
So… don’t throw away those probabilities!
*
Performance on a Test Set
Naïve Bayes is often a good choice if you don’t have much training data!
[Plot: classification accuracy as a function of the size of the training set.]
*
Naïve Bayes Assumption
Recall the Naïve Bayes assumption:
that all features are independent given the class label Y
Does this hold for the digit recognition problem?
*
Exclusive-OR Example
For an example where conditional independence fails:
Y=XOR(X1,X2)
X1   X2   P(Y=0 | X1,X2)   P(Y=1 | X1,X2)
0    0    1                0
0    1    0                1
1    0    0                1
1    1    1                0
*
Naïve Bayes assumption
Actually, the Naïve Bayes assumption is almost never true.
Still… Naïve Bayes often performs surprisingly well even when its assumptions do not hold.
*
Numerical Stability: zero issue
It is often the case that machine learning algorithms need to work with very small numbers
Imagine computing the probability of 2000 independent coin flips
MATLAB thinks that (0.5)^2000 = 0
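The same effect is easy to reproduce in Python, and working with logs avoids it:

import math

p = 0.5 ** 2000
print(p)                      # 0.0 -- the true value underflows to zero

log_p = 2000 * math.log(0.5)  # work in log space instead
print(log_p)                  # ≈ -1386.29, perfectly representable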
*
Underflow Prevention
Multiplying lots of probabilities can result in floating-point underflow.
Recall: log(xy) = log(x) + log(y), so it is better to sum logs of probabilities than to multiply the probabilities themselves.
*
Underflow Prevention
The class with the highest final un-normalized log probability score is still the most probable (take the log of Bayes rule):
c_NB = argmax_{c_j ∈ C} [ log P(c_j) + Σ_{i ∈ positions} log P(x_i | c_j) ]
*
Numerical Stability
Instead of comparing P(Y=5|X1,…,Xn) with P(Y=6|X1,…,Xn),
Compare their logarithms
*
Recap
We defined a Bayes classifier but saw that it’s intractable to compute P(X1,…,Xn|Y).
We then used the Naïve Bayes assumption – that everything is independent given the class label Y.
A natural question: is there some happy compromise where we only assume that some features are conditionally independent?
*
Pros and cons of Naïve Bayes
Advantages:
It is relatively simple to understand and build.
It is easily trained, even with a small dataset.
It is fast.
It is not sensitive to irrelevant features.
Disadvantages:
It assumes every feature is independent, which is not always the case.
*
An Example of Naïve Bayes
(http://blog.aylien.com/naive-bayes-for-dummies-a-simple-explanation/)
So, let's say we have data on 1,000 pieces of fruit. Each fruit is a Banana, an Orange, or some Other fruit, and we know three features of each fruit: whether it is Long, Sweet, and Yellow, as displayed in the table below:
Fruit    Long   Sweet   Yellow   Total
Banana   400    350     450      500
Orange     0    150     300      300
Other    100    150      50      200
Total    500    650     800     1000
*
An Example of Naïve Bayes
So from the table what do we already know?
50% of the fruits are bananas
30% are oranges
20% are other fruits
Prior
*
An Example of Naïve Bayes
Based on our training set we can also say the following:
From 500 bananas 400 (0.8) are Long, 350 (0.7) are Sweet and 450 (0.9) are Yellow
Out of 300 oranges 0 are Long, 150 (0.5) are Sweet and 300 (1) are Yellow
From the remaining 200 fruits, 100 (0.5) are Long, 150 (0.75) are Sweet and 50 (0.25) are Yellow
Which should provide enough evidence to predict the class of another fruit as it’s introduced.
Likelihood
*
An Example of Naïve Bayes
So let's say we're given the features of a piece of fruit and we need to predict its class.
If we're told that the additional fruit is Long, Sweet, and Yellow, we can classify it by plugging the values for each class (Banana, Orange, Other fruit) into Bayes' rule.
The class with the highest probability (score) is the winner.
*
An Example of Naïve Bayes
Banana:
P(Banana | Long, Sweet, Yellow) ∝ P(Long|Banana) × P(Sweet|Banana) × P(Yellow|Banana) × P(Banana) = 0.8 × 0.7 × 0.9 × 0.5 = 0.252
*
An Example of Naïve Bayes
Orange:
P(Orange | Long, Sweet, Yellow) ∝ 0.0 × 0.5 × 1.0 × 0.3 = 0
*
An Example of Naïve Bayes
Other Fruit:
P(Other | Long, Sweet, Yellow) ∝ 0.5 × 0.75 × 0.25 × 0.2 = 0.01875, so the fruit is classified as a Banana.
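The three scores can be reproduced with a few lines of Python (probabilities hard-coded from the fruit counts above; the scores are unnormalized, as in the slides):

# P(Long|class), P(Sweet|class), P(Yellow|class), P(class)
fruit = {
    "Banana": (0.8, 0.7, 0.9, 0.5),
    "Orange": (0.0, 0.5, 1.0, 0.3),
    "Other":  (0.5, 0.75, 0.25, 0.2),
}

scores = {name: long * sweet * yellow * prior
          for name, (long, sweet, yellow, prior) in fruit.items()}
print(scores)                       # Banana 0.252, Orange 0.0, Other 0.01875
print(max(scores, key=scores.get))  # -> Banana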
*
Evaluating classification algorithms
*
Evaluating classification algorithms
You have designed a new classifier.
You give it to me, and I try it on my image dataset.
*
Evaluating classification algorithms
I tell you that it achieved 95% accuracy on my data.
Is your technique a success?
*
Types of errors
But suppose that:
the 95% refers to correctly classified pixels,
only 5% of the pixels are actually edges,
and the classifier misses all of the edge pixels.
How do we count the effect of different types of error?
Evaluation for Classification
Evaluation Metrics
Confusion matrix: shows the performance of an algorithm, especially its predictive capability,
rather than how fast it classifies or builds models, or how well it scales.
Type I error (false positive): the truth is No, but you say Yes.
Type II error (false negative): the truth is Yes, but you say No.
Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Performance Evaluation
How to obtain reliable estimates?
Methods for Model Comparison
How to compare the relative performance among competing models?
Metrics for Performance Evaluation
Focus on the predictive capability of a model, rather than how fast it classifies or builds models, its scalability, etc.
Confusion matrix:
                          PREDICTED CLASS
                          Class=Yes    Class=No
ACTUAL CLASS  Class=Yes   a (TP)       b (FN)
              Class=No    c (FP)       d (TN)
a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
Metrics for Performance Evaluation…
Most widely-used metric:
Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
                          PREDICTED CLASS
                          Class=Yes    Class=No
ACTUAL CLASS  Class=Yes   a (TP)       b (FN)
              Class=No    c (FP)       d (TN)
Limitation of Accuracy
Consider a 2-class problem
Number of Class 1 examples = 9990
Number of Class 2 examples = 10
If model predicts everything to be class 1, accuracy is 9990/10000 = 99.9 %
Accuracy is misleading because model does not detect any class 2 example
Cost Matrix
C(i|j): cost of misclassifying a class j example as class i
                          PREDICTED CLASS
                          Class=Yes     Class=No
ACTUAL CLASS  Class=Yes   C(Yes|Yes)    C(No|Yes)
              Class=No    C(Yes|No)     C(No|No)
Computing Cost of Classification
Cost matrix:
                          PREDICTED CLASS
                          +       –
ACTUAL CLASS   +          -1      100
               –           1        0

Model M1:
                          PREDICTED CLASS
                          +       –
ACTUAL CLASS   +          150     40
               –           60    250
Accuracy = 80%, Cost = 3910

Model M2:
                          PREDICTED CLASS
                          +       –
ACTUAL CLASS   +          250     45
               –            5    200
Accuracy = 90%, Cost = 4255
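A short Python check of these accuracy and cost figures, with the confusion matrices and the cost matrix hard-coded from the tables above:

# Confusion matrices as (TP, FN, FP, TN) for the positive class "+"
models = {"M1": (150, 40, 60, 250), "M2": (250, 45, 5, 200)}
# Cost matrix: C(predicted | actual)
cost = {("+", "+"): -1, ("-", "+"): 100, ("+", "-"): 1, ("-", "-"): 0}

for name, (tp, fn, fp, tn) in models.items():
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    total_cost = (tp * cost[("+", "+")] + fn * cost[("-", "+")]
                  + fp * cost[("+", "-")] + tn * cost[("-", "-")])
    print(name, f"accuracy={accuracy:.0%}", f"cost={total_cost}")
# M1: accuracy=80%, cost=3910;  M2: accuracy=90%, cost=4255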
Cost vs Accuracy
Count:
                          PREDICTED CLASS
                          Class=Yes    Class=No
ACTUAL CLASS  Class=Yes   a            b
              Class=No    c            d

Cost:
                          PREDICTED CLASS
                          Class=Yes    Class=No
ACTUAL CLASS  Class=Yes   p            q
              Class=No    q            p

N = a + b + c + d
Accuracy = (a + d) / N
Cost = p(a + d) + q(b + c)
     = p(a + d) + q(N − a − d)
     = qN − (q − p)(a + d)
     = N [q − (q − p) Accuracy]
Accuracy is proportional to cost if
1. C(Yes|No)=C(No|Yes) = q
2. C(Yes|Yes)=C(No|No) = p
Cost-Sensitive Measures
Precision and recall are two widely used metrics in applications where successful detection of one of the classes is considered more important than detection of the other classes.
Precision (p) = a / (a + c) = TP / (TP + FP)
Recall (r) = a / (a + b) = TP / (TP + FN)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)
Precision is biased towards C(Yes|Yes) & C(Yes|No); Recall is biased towards C(Yes|Yes) & C(No|Yes); F-measure is biased towards all except C(No|No).
Evaluation Metrics
Sensitivity or True Positive Rate (TPR) = TP / (TP + FN)
A parameter describing the success in finding a particular type of target (also called hit rate).
Specificity or True Negative Rate (TNR) = TN / (FP + TN)
A term that is important in medicine: the proportion of healthy patients who are correctly told after the test that they are not ill.
                          PREDICTED CLASS
                          Class=Yes    Class=No
ACTUAL CLASS  Class=Yes   a (TP)       b (FN)
              Class=No    c (FP)       d (TN)
Evaluation Metrics
Recall (= sensitivity= TPR)
TP/(TP+FN)
A term used when describing the success in finding an item in a database (example: information retrieval).
Discriminability
TP/(TP+FP)
A term used when describing the success in differentiating a particular type of target from a similar type of target.
Precision or Positive Predictive Value (PPV)
TP/(TP+FP)
A term describing the accuracy in picking out a particular type of target from any distractors, including noise and clutter.
Evaluation Metrics
Sensitivity = Recall
TP/(TP+FN)
Discriminability = Precision
TP/(TP+FP)
Negative Predictive Value (NPV)
TN/(TN+FN)
Accuracy
(TP+TN)/(TP+FP+TN+FN)
                          PREDICTED CLASS
                          Class=Yes    Class=No
ACTUAL CLASS  Class=Yes   a (TP)       b (FN)
              Class=No    c (FP)       d (TN)
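All of these metrics are one-liners once the four counts are known; a minimal Python sketch (the example counts are the edge-detection numbers used later in these slides):

def metrics(tp, fn, fp, tn):
    # Standard confusion-matrix metrics from the slides
    return {
        "sensitivity/recall (TPR)": tp / (tp + fn),
        "specificity (TNR)":        tn / (tn + fp),
        "precision (PPV)":          tp / (tp + fp),
        "negative pred. value":     tn / (tn + fn),
        "accuracy":                 (tp + tn) / (tp + fn + fp + tn),
    }

print(metrics(tp=60, fn=30, fp=20, tn=80))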
ROC (Receiver Operating Characteristic)
A good classification model should be located as close as possible to the upper left corner of the diagram.
The performance of each classifier is represented as a point in ROC space;
changing the algorithm's threshold, the sample distribution, or the cost matrix changes the location of the point.
ROC curve
Receiver Operating Characteristic (ROC)
Graphical approach for displaying the tradeoff between the true positive rate (TPR) and the false positive rate (FPR) of a classifier
TPR = positives correctly classified/total positives
FPR = negatives incorrectly classified/total negatives
TPR on y-axis and FPR on x-axis
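A tiny pure-Python sketch of how the points of an ROC curve arise from sweeping the threshold over a classifier's scores; the scores and labels here are made-up illustration data:

# Made-up classifier scores with true labels (1 = positive, 0 = negative)
scored = [(0.95, 1), (0.85, 1), (0.80, 0), (0.70, 1), (0.55, 0), (0.40, 1), (0.30, 0), (0.10, 0)]

pos = sum(1 for _, y in scored if y == 1)
neg = len(scored) - pos

for t in sorted({s for s, _ in scored}, reverse=True):
    tp = sum(1 for s, y in scored if s >= t and y == 1)
    fp = sum(1 for s, y in scored if s >= t and y == 0)
    print(f"threshold={t:.2f}  TPR={tp / pos:.2f}  FPR={fp / neg:.2f}")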
ROC Curve
– A 1-dimensional data set containing 2 classes (positive and negative)
– Any point located at x > t is classified as positive
At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88
ROC curve
Points of interest, written as (TPR, FPR):
(0, 0): everything is classified as negative
(1, 1): everything is classified as positive
(1, 0): perfect (ideal)
Diagonal line: random guessing (50%)
Area Under the Curve (AUC): measures how good the model is on average; useful for comparing it with other methods.
ROC Curve
(TPR, FPR):
(0, 0): declare everything to be the negative class
(1, 1): declare everything to be the positive class
(1, 0): ideal
Diagonal line: random guessing
Below the diagonal line: the prediction is the opposite of the true class
ROC Curve
A model that is strictly better than another would have a larger area under the ROC curve.
If the model is perfect, then its area under the ROC curve would equal 1.
If the model simply performs random guessing, then its area under the ROC curve would equal 0.5.
Types of errors (TP, FP, FN and TN)
Each prediction for a pixel is labelled with two parts: what we said (positive = edge, negative = not edge) and whether we got it correct (true) or not (false):
                          Ground truth: edge     Ground truth: not edge
Prediction: edge          TP (true positive)     FP (false positive)
Prediction: not edge      FN (false negative)    TN (true negative)
For example, "True Positive" means we said "positive" (i.e. edge) and we did get it correct;
"False Negative" means we said "negative" (i.e. not edge) and we did not get it correct.
Sensitivity and Specificity
Count up the total number of each label (TP, FP, TN, FN) over a large dataset. In ROC analysis, we use two statistics:
Sensitivity = TP / (TP + FN)
Can be thought of as the likelihood of spotting a positive case when presented with one; or the proportion of edges we find.
Specificity = TN / (TN + FP)
Can be thought of as the likelihood of spotting a negative case when presented with one; or the proportion of non-edges that we find.
Worked example:
                          Ground truth: edge (1)    Ground truth: non-edge (0)
Prediction: edge (1)      TP = 60                   FP = 20
Prediction: non-edge (0)  FN = 30                   TN = 80
80 + 20 = 100 cases in the dataset were class 0 (non-edge)
60 + 30 = 90 cases in the dataset were class 1 (edge)
90 + 100 = 190 examples (pixels) in the data overall
Sensitivity = TP / (TP + FN) = 60 / 90 ≈ 0.67
Specificity = TN / (TN + FP) = 80 / 100 = 0.80
The ROC space
[Plot: ROC space, with sensitivity on the y-axis (0.0 to 1.0) and 1 − specificity on the x-axis (0.0 to 1.0); edge detectors A and B are shown as points.]
Note
ROC: Receiver Operating Characteristic
ROC
In statistics, a receiver operating characteristic curve, i.e. ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
Receiver operating characteristic – Wikipedia
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
The ROC Curve
Draw a "convex hull" around many points in ROC space (sensitivity vs. 1 − specificity).
[Plot: detector points and their convex hull; one point lies below the hull, i.e. not on the convex hull.]
ROC Analysis
[Plot: ROC space (sensitivity vs. 1 − specificity) showing the convex hull of many detector points.]
All the optimal detectors lie on the convex hull.
Which of these is best depends on the ratio of edges to non-edges, and on the different costs of misclassification.
Any detector below the diagonal can be turned into a better detector by flipping its output.
Take-home point : You should always quote sensitivity and specificity for your algorithm, if possible plotting an ROC graph. Remember also though, any statistic you quote should be an average over a suitable range of tests for your algorithm.
Holdout estimation
What to do if the amount of data is limited?
The holdout method reserves a certain amount for testing and uses the remainder for training.
Usually: one third for testing, the rest for training
Holdout estimation
Problem: the samples might not be representative
Example: class might be missing in the test data
Advanced version uses stratification
Ensures that each class is represented with approximately equal proportions in both subsets
Repeated holdout method
Repeat the process with different subsamples → more reliable.
In each iteration, a certain proportion is randomly selected for training (possibly with stratification).
The error rates on the different iterations are averaged to yield an overall error rate.
Repeated holdout method
Still not optimum: the different test sets overlap
Can we prevent overlapping?
Of course!
Holdout
Split dataset into two groups for training and test
Training dataset: used to train the model
Test dataset: used to estimate the error rate of the model
Drawback: when an "unfortunate split" happens, the holdout estimate of the error rate will be misleading.
[Diagram: the entire dataset is split into two parts, a training set and a test set.]
Random Subsampling
Split the data set into two groups
Randomly selects a number of samples without replacement
Usually, one third for testing, the rest for training
K-Fold Cross-validation
K-fold partition: partition the data into K equal-sized subsets (folds).
Use K − 1 folds for training and the remaining one for testing; repeat so that each fold serves as the test set once (a small index-splitting sketch follows below).
[Diagram: 5 experiments; in each experiment a different fold is held out as the test set.]
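A minimal sketch of the K-fold split in Python (index bookkeeping only, with no shuffling or stratification):

def k_fold_indices(n_samples, k=5):
    # Yield (train_indices, test_indices) for each of the k experiments
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        test = indices[i * fold_size : (i + 1) * fold_size] if i < k - 1 else indices[i * fold_size :]
        train = [j for j in indices if j not in test]
        yield train, test

for fold, (train, test) in enumerate(k_fold_indices(10, k=5), start=1):
    print(f"Experiment {fold}: train on {train}, test on {test}")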
*
Cross-validation
Cross-validation avoids overlapping test sets
First step: split data into k subsets of equal size
Second step: use each subset in turn for testing, the remainder for training
Called k-fold cross-validation
Cross-validation
Often the subsets are stratified before the cross-validation is performed
The error estimates are averaged to yield an overall error estimate
More on cross-validation
Standard method for evaluation: stratified ten-fold cross-validation
Why ten?
Empirical evidence supports this as a good choice to get an accurate estimate
There is also some theoretical evidence for this
Stratification reduces the estimate’s variance
Even better: repeated stratified cross-validation
E.g. ten-fold cross-validation is repeated ten times and results are averaged (reduces the variance)
Leave-One-Out cross-validation
Leave-One-Out:
a particular form of cross-validation:
Set number of folds to number of training instances
I.e., for n training instances, build classifier n times
Makes best use of the data
Involves no random subsampling
Very computationally expensive
(exception: NN)
Leave-One-Out-CV and stratification
Disadvantage of Leave-One-Out-CV: stratification is not possible
It guarantees a non-stratified sample because there is only one instance in the test set!
Conclusions
Naïve Bayes is:
Really easy to implement and often works well
Often a good first thing to try
Commonly used as a “punching bag” for smarter algorithms
Evaluate classification algorithms
TP, FP, FN, TN
ROC
Cross validation
*
Questions & Suggestions?
The End
*
Appendix
*
Recovering the Probabilities (skip)
What if we want the probabilities though??
Suppose that for some constant K, we only know the shifted log scores log[K · P(Y=5 | X1, …, Xn)] and log[K · P(Y=6 | X1, …, Xn)].
How would we recover the original probabilities?
*
Recovering the Probabilities (skip)
Given the unnormalized log scores bi = log(K · pi) for each class i:
Then for any constant C, pi = exp(bi + C) / Σj exp(bj + C).
One suggestion: set C so that the greatest bi is shifted to zero, i.e. C = −maxi bi (the log-sum-exp trick).
See https://stats.stackexchange.com/questions/105602/example-of-how-the-log-sum-exp-trick-works-in-naive-bayes?noredirect=1&lq=1
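A minimal Python sketch of this normalization, assuming the unnormalized log scores are collected in a list (this is the usual log-sum-exp trick):

import math

def normalize_log_scores(log_scores):
    # Shift so the largest score becomes 0, exponentiate, then normalize
    c = -max(log_scores)
    exps = [math.exp(s + c) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]

print(normalize_log_scores([-1005.2, -1001.7]))  # ≈ [0.029, 0.971]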
*
Detour: Model Parameters
In the context of a mathematical model, such as a probability distribution, the distinction between variables and parameters was described by Bard as follows: We refer to the relations which supposedly describe a certain physical situation, as a model. Typically, a model consists of one or more equations.
Parameter – Wikipedia
https://en.wikipedia.org/wiki/Parameter
*
Detour: Model Parameters
Mathematical functions have one or more arguments that are designated in the definition by variables. A function definition can also contain parameters, but unlike variables, parameters are not listed among the arguments that the function takes. When parameters are present, the definition actually defines a whole family of functions, one for every valid set of values of the parameters.
Parameter – Wikipedia
https://en.wikipedia.org/wiki/Parameter
*
Example: Model Parameters
For instance, one could define a general quadratic function by declaring
f(x) = a·x² + b·x + c
Here, the variable x designates the function's argument, but a, b, and c are parameters that determine which particular quadratic function is being considered.
Parameter – Wikipedia
https://en.wikipedia.org/wiki/Parameter
*