Learning Theory
COMP9417 Machine Learning & Data Mining
Term 1, 2022
Adapted from slides by Dr Michael
Copyright By PowCoder代写 加微信 powcoder
This lecture will introduce you to some foundational results that apply in machine learning irrespective of any particular algorithm and will enable you to define and reproduce some of the fundamental approaches and results from the computational and statistical theory. Following it you should be able to:
• describe a basic theoretical framework for sample complexity of learning
• describe the Probably Approximately Correct (PAC) learning framework
• describe the Vapnik-Chervonenkis (VC) dimension framework
COMP9417 T1, 2022 1
Introduction
Machine learning: Have a computer solve problems by learning from data
rather than being explicitly programmed.
o Regression
o Classification
o Clustering
o Reinforcement learning o…
COMP9417 T1, 2022 2
Computational Learning Theory
The goal of learning theory is to develop and analyse formal models that help us understand:
o what concepts we can hope to learn efficiently, and how much data is necessary to learn them
o what types of guarantees we might hope to achieve (error bounds, complexity bounds)
o why particular algorithms may or may not perform well under various conditions
COMP9417 T1, 2022 3
Leaning theory aims at a body of theory that captures all important aspects of the fundamentals of the learning process and any algorithm or class of algorithms designed to do learning — i.e., we desire theory to capture the algorithm-independent aspects of machine learning.
BUT: we’re not quite there yet …
COMP9417 T1, 2022 4
Computational Learning Theory
Inductive learning: learning from examples and all machine learning algorithms are a kind of inductive learning and there are some questions that we are interested to be able to answer in such framework:
o Probability of successful learning
o Number of training examples
o Complexity of hypothesis space
o Time complexity of learning algorithm
o Accuracy to which target concept is approximated o Manner in which training examples presented
COMP9417 T1, 2022 5
Computational Learning Theory
Some questions to ask, without focusing on any particular algorithm:
• Sample complexity
– How many training examples required for learner to converge (with high probability) to a successful hypothesis ?
• Computational complexity
– How much computational effort required for learner to converge (with high probability) to a successful hypothesis ?
• Hypothesis complexity
– How do we measure the complexity of a hypothesis ?
– How large is a hypothesis space?
COMP9417 T1, 2022 6
Computational Learning Theory
What do we consider to be a successful hypothesis:
– identical to target concept ?
– mostly agrees with target concept … ?
– … does this most of the time ?
COMP9417 T1, 2022 7
Computational Learning Theory
Instead of focusing on particular algorithms, learning theory aims to characterize classes of algorithms.
Probably Approximately Correct (PAC) is a framework for mathematical analysis of learning which was proposed by Valiant in 1984.
COMP9417 T1, 2022 8
Probably Approximately Correct (PAC)
The framework of Probably Approximately Correct (PAC) learning can be used to address questions such as:
– How many training examples?
– How much computational effort required? – How complex a hypothesis class needed? – How to quantify hypothesis complexity?
– How many mistakes will be made ?
COMP9417 T1, 2022 9
Sample Complexity
• set of instances 𝑋
• set of hypotheses 𝐻
• set of possible target concepts 𝐶
• training instances generated by a fixed, unknown probability distribution 𝒟 over 𝑋
Learnerobservesasequence𝐷oftrainingexamplesofform x,𝑐(x),for some target concept 𝑐 ∈ 𝐶
o Instances x are drawn from distribution 𝒟
o teacher provides target value 𝑐(x) for each x
Learner must output a hypothesis h estimating 𝑐
o h is evaluated by its performance on subsequent
instances drawn according to 𝒟 Note: randomly drawn instances
COMP9417 T1, 2022 10
True Error of a Hypothesis
COMP9417 T1, 2022 11
True Error of a Hypothesis
Definition: The true error (denoted 𝑒𝑟𝑟𝑜𝑟𝒟(h)) of hypothesis h with respect to target concept 𝑐 and distribution 𝒟 is the probability that h will misclassify an instance drawn at random according to 𝒟.
𝑒𝑟𝑟𝑜𝑟𝒟(h)≡ Pr[𝑐x≠h(x)] x∈𝒟
COMP9417 T1, 2022 12
Two Notions of Error
Training error of hypothesis h with respect to target concept 𝑐: o How often h x ≠ 𝑐(x) over training instances
True error of hypothesis h with respect to 𝑐:
o How often h x ≠ 𝑐(x) over future random instances
Our concern:
– Can we bound the true error of h given the training error of h? – First consider when training error of h is zero (i.e., h ∈ 𝑉 𝑆9,:)
COMP9417 T1, 2022 13
PAC Learning
Consider a class 𝐶 of possible target concepts defined over a set of instances 𝑋 of length 𝑛, and a learner 𝐿 using hypothesis space 𝐻.
Definition: 𝐶 is PAC-learnable by 𝐿 using 𝐻 if for all 𝑐 ∈ 𝐶, distributions
𝒟over𝑋,𝜖suchthat0 < 𝜖< 1/2,and𝛿suchthat0 <𝛿 < 1/2, learner 𝐿 will with probability at least (1 − 𝛿) output a hypothesis h ∈
𝐻 such that 𝑒𝑟𝑟𝑜𝑟𝒟(h) ≤ 𝜖 in time (for sample size) that is polynomial in 1⁄𝜖, 1⁄𝛿, 𝑛 and 𝑠𝑖𝑧𝑒(𝑐).
Probably Approximately Correct Learning L. Valiant, (1984; 2013).
COMP9417 T1, 2022 14
PAC Learning
– 𝐶: concept class
– 𝐿 : learner
– 𝐻: hypothesis space
– 𝑛: length of instances
– |𝐻|, size of hypothesis space
– 𝒟: distribution over inputs
– 𝜖:errorgoal(0 < 𝜖< 1⁄2)
– 𝛿:certaintygoal(0 <𝛿 < 1⁄2)
𝐶 is PAC-learnable by 𝐿 using 𝐻, if learner 𝐿, with probability (1 − 𝛿) , outputs a hypothesis h ∈ 𝐻, such that 𝑒𝑟𝑟𝑜𝑟𝒟(h) ≤ 𝜖 for a sample size polynomial in 1⁄𝜖, 1⁄𝛿, 𝑛 and 𝑠𝑖𝑧𝑒(𝑐).
We start to look at PAC learning using Concept Learning. COMP9417 T1, 2022 15
Prototypical Concept Learning Task
EnjoySport
• A set of example days, and each is described by six attributes.
• The task is to learn to predict the value of EnjoySport for arbitrary day, based on the values of its attribute values.
COMP9417 T1, 2022 16
Prototypical Concept Learning Task
Instances 𝑋: Possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
Target function c: EnjoySport : 𝑋 → 0,1
Hypotheses 𝐻: Conjunctions of literals.
Training examples 𝐷: Positive and negative examples of target function xN, 𝑐(xN) , ... , xP, 𝑐(xP)
Determine:
A hypothesis h in 𝐻 such that h(x) = 𝑐(x) for all x in 𝐷? A hypothesis h in 𝐻 such that h(x) = 𝑐(x) for all x in 𝑋?
COMP9417 T1, 2022 17
Concept Learning Concept learning: can be formulated as:
“problem of searching through a predefined space of potential hypothesis that best fits the training examples”
"the search for and listing of attributes that can be used to distinguish
exemplars from non exemplars of various categories"
Bruner, Goodnow, & Austin (1967)
Concept Learning: Acquiring the definition of a general category from given positive and negative training examples of the category.
COMP9417 T1, 2022 18
Concept Learning
Hypothesis representation for concept learning task:
– Each hypothesis consists of a conjunction of constraints on the instance attributes.
– Each hypothesis will be a vector of six constraints, specifying the values of the six attributes
– Each attribute will be:
o ? - indicating any value is acceptable for the attribute (don’t care)
o single value – specifying a single required value (ex. Warm) (specific) o ∅ - indicating no value is acceptable for the attribute (no value)
COMP9417 T1, 2022 19
Concept Learning
• EnjoySport concept learning task requires learning the sets of days for which 𝐸𝑛𝑗𝑜𝑦𝑆𝑝𝑜𝑟𝑡 = 𝑦𝑒𝑠, describing this set by a conjunction of constraints over the instance attributes.
• One example of hypothesis:
< 𝑆𝑘𝑦 = 𝑆𝑢𝑛𝑛𝑦, 𝐴𝑖𝑟𝑇𝑒𝑚𝑝 =? , 𝐻𝑢𝑚𝑖𝑑𝑖𝑡𝑦 =? , 𝑊𝑖𝑛𝑑 = 𝑆𝑡𝑟𝑜𝑛𝑔, 𝑊𝑎𝑡𝑒𝑟 =? , 𝐹𝑜𝑟𝑒𝑐𝑎𝑠𝑡 = 𝑠𝑎𝑚𝑒 >
• The most general hypothesis:
< 𝑆𝑘𝑦 =?,𝐴𝑖𝑟𝑇𝑒𝑚𝑝 =?,𝐻𝑢𝑚𝑖𝑑𝑖𝑡𝑦 =?,𝑊𝑖𝑛𝑑 =?,𝑊𝑎𝑡𝑒𝑟 =?,𝐹𝑜𝑟𝑒𝑐𝑎𝑠𝑡 =?>
• The most specific hypothesis:
< 𝑆𝑘𝑦 = ∅,𝐴𝑖𝑟𝑇𝑒𝑚𝑝 = ∅,𝐻𝑢𝑚𝑖𝑑𝑖𝑡𝑦 = ∅,𝑊𝑖𝑛𝑑 = ∅,𝑊𝑎𝑡𝑒𝑟 = ∅,𝐹𝑜𝑟𝑒𝑐𝑎𝑠𝑡 = ∅ >
COMP9417 T1, 2022 20
Concept Learning as Search Question: What can be learned?
Answer: (only) what is in the hypothesis space
• Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
• The goal of this search is to find the hypothesis that best fits the training examples.
COMP9417 T1, 2022 21
EnjoySport – Hypothesis Space
How big is the hypothesis space for EnjoySport ?
• Instance space
• Hypothesis space
𝑆𝑘𝑦×𝐴𝑖𝑟𝑇𝑒𝑚𝑝×⋯×𝐹𝑜𝑟𝑒𝑐𝑎𝑠𝑡 = 3×2×2×2×2×2 = 96 instances
𝑆𝑘𝑦×𝐴𝑖𝑟𝑇𝑒𝑚𝑝×⋯×𝐹𝑜𝑟𝑒𝑐𝑎𝑠𝑡 = 5×4×4×4×4×4
= 5120 synthetically distinct hypothesis
𝑆𝑘𝑦×𝐴𝑖𝑟𝑇𝑒𝑚𝑝×⋯×𝐹𝑜𝑟𝑒𝑐𝑎𝑠𝑡 =1+4×3×3×3×3×3
= 973 semantically distinct hypothesis
(Any hypothesis with an ∅ constraint covers no instances, hence all are semantically equivalent)
COMP9417 T1, 2022 22
Instances, Hypotheses, and More-General-Than
COMP9417 T1, 2022 23
A generality order on hypotheses
Definition: Let hm and hn be Boolean-valued functions defined over instances 𝑋. Then hm is more-general-than-or-equal-to hn (written hm ≥p hn) if and only if
(∀𝑥 ∈ 𝑋)[(hn(x) = 1) → (hm(x) = 1)]
Intuitively, hm is more-general-than-or-equal-to hn if any instance satisfying hn
also satisfies hm.
hm is (strictly) more general than hn (written hm >p hn) if and only if
(hm≥p hn)⋀(hn≱p hm).
hm is more specific than hn when hn is more general than hm.
COMP9417 T1, 2022 24
Version Space
A hypothesis h is consistent with a set of training examples 𝐷 of target concept 𝑐 if and only if h(x) = 𝑐(x) for each training example x, 𝑐(x) in 𝐷.
𝐶𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑡(h, 𝐷) ≡ (∀ x, 𝑐(x) ∈ 𝐷) h(x) = 𝑐(x)
The version space, 𝑉 𝑆9,:, with respect to hypothesis space 𝐻 and training examples 𝐷, is the subset of hypotheses from 𝐻 consistent with all training examples in 𝐷.
𝑉 𝑆9,: ≡ h ∈ 𝐻 𝐶𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑡(h, 𝐷)}
COMP9417 T1, 2022 25
Exhausting the Version Space
COMP9417 T1, 2022 26
Exhausting the Version Space
Note: in the diagram
(𝑟 = trainingerror,𝑒𝑟𝑟𝑜𝑟 = trueerror)
Definition: The version space 𝑉𝑆9,: is said to be 𝜖-exhausted with respect to 𝑐 and 𝒟, if every hypothesis h in 𝑉𝑆9,: has error less than 𝜖 with respect to 𝑐 and 𝒟.
(∀ h∈𝑉𝑆9,:)𝑒𝑟𝑟𝑜𝑟𝒟(h) < 𝜖
So 𝑉𝑆9,: is not 𝜖-exhausted if it contains at least one h with 𝑒𝑟𝑟𝑜𝑟𝒟(h) ≥ 𝜖
COMP9417 T1, 2022 27
How many examples will -exhaust the VS? Theorem [Haussler, 1988].
If the hypothesis space 𝐻 is finite, and 𝐷 is a sequence of 𝑚 ≥ 1 independent random examples of some target concept 𝑐, then for any
0 ≤ 𝜖 ≤ 1, the probability that the version space with respect to 𝐻 and 𝐷 is not 𝜖-exhausted (with respect to 𝑐) is less than:
Interesting! This bounds the probability that any consistent learner will output a hypothesis h with 𝑒𝑟𝑟𝑜𝑟(h) ≥ 𝜖.
COMP9417 T1, 2022 28
How many examples will 𝝐-exhaust the VS? If we want this probability to be below 𝛿
𝑚 ≥ 1𝜖 l n 𝐻 + l n 𝛿1
COMP9417 T1, 2022 29
How many examples will 𝝐-exhaust the VS? How many examples are sufficient to assure with probability at least
(1 − 𝛿) that every h in 𝑉 𝑆9,: satisfies 𝑒𝑟𝑟𝑜𝑟𝒟(h) ≤ 𝜖? Use our theorem:
𝑚 ≥ 1𝜖 l n 𝐻 + l n 𝛿1
COMP9417 T1, 2022 30
How About EnjoySport?
𝑚 ≥ 1𝜖 l n 𝐻 + l n 𝛿1
If 𝐻 is as given in EnjoySport then |𝐻| = 973, and 𝑚≥1𝜖 ln973+ln 𝛿1
COMP9417 T1, 2022 31
How About EnjoySport?
... if want to assure that with probability 95%, 𝑉𝑆 contains only hypotheses with 𝑒𝑟𝑟𝑜𝑟𝒟(h) ≤ 0.1, then it is sufficient to have m examples, where
𝑚≥1 ln973+ln 1 0.1 0.05
𝑚≥10 ln973+ln 20 𝑚≥10 6.88+3
COMP9417 T1, 2022 32
Agnostic PAC Learning
So far, assumed 𝑐 ∈ 𝐻 — consistent learners (noise-free)
Agnostic learning setting: don’t assume 𝑐 ∈ 𝐻
• What do we want then?
– The hypothesis h that makes fewest errors on training data
Hoeffding bounds:
For hypothesis space 𝐻:
𝑃𝑟 𝑒𝑟𝑟𝑜𝑟𝒟 h > 𝑒𝑟𝑟𝑜𝑟: h + 𝜖 ≤ 𝑒~áPà 𝑃𝑟[𝑒𝑟𝑟𝑜𝑟𝒟(hâäãå) > 𝑒𝑟𝑟𝑜𝑟:(hâäãå) + 𝜖] ≤ |𝐻|𝑒~áPà
• What is sample complexity in this case?
𝑚≥ 1 ln 𝐻 +ln 1 2𝜖á 𝛿
COMP9417 T1, 2022
PAC Learning
So far, we have only considered cases with finite hypothesis space, however in reality many of the hypothesis spaces that we use have infinite dimension.
…what can we do/say about such hypothesis spaces?
COMP9417 T1, 2022 34
Hypothesis Space Size
Which hypothesis spaces are infinite, and which are finite? q Decision trees (with discrete inputs)
q Neural networks
q Linear classifiers
q Decision trees (with continuous inputs)
COMP9417 T1, 2022 35
Learners and Complexity
• Different learners have different complexity
• Complexity relates to the “representational power”
• Usual trade-off:
– More power = represent more complex systems, may overfit
– Less power: won’t overfit, but may not find “best learner”
Question: How can we quantify complexity?
– Not easily…
– One solution is VC (Vapnik-Chervonenkis) dimension
COMP9417 T1, 2022 36
Learners and Complexity
• How complexity relates to PAC learning?
– We want to be able to say something about the true error of the hypothesis based on the training error of the hypothesis
– We know that:
o in underfitting domain: those two errors are pretty similar
o In overfitting domain: test error (which is a representation of true error) might be much worse!
COMP9417 T1, 2022 37
Shattering a Set of Instances
Definition: a dichotomy of a set 𝑆 is a partition of 𝑆 into two disjoint subsets.
Definition: a set of instances 𝑆 is shattered by hypothesis space 𝐻 if and only if for every dichotomy of 𝑆 there exists some hypothesis in 𝐻 consistent with this dichotomy.
COMP9417 T1, 2022 38
Shattering
We say a classifier 𝑓(x) can shatter points xN, …, xP iff for all possible 𝑦N, … , 𝑦P, 𝑓(x) can achieve zero error on training data (xN, 𝑦N),…, (xP, 𝑦P). (i.e., there exist some 𝜃 that gets zero error)
Can 𝑓 x = sign(𝜃ê + 𝜃N𝑥N + 𝜃á𝑥á) shatter these points?
COMP9417 T1, 2022 39
Shattering Example:
Can 𝑓 x = sign(xëx − 𝜃) shatter these points?
This particular classifier can not shatter these two points.
This configuration can not be classified by 𝑓 x
COMP9417 T1, 2022 40
The Vapnik-Chervonenkis Dimension
Definition: The Vapnik-Chervonenkis dimension,𝑉𝐶(𝐻), of hypothesis space 𝐻 defined over instance space 𝑋 is the size of the largest finite subset of 𝑋 shattered by 𝐻. If arbitrarily large finite sets of 𝑋 can be shattered by 𝐻, then 𝑉𝐶(𝐻) ≡ ∞.
𝑽𝑪 dimension: the largest set of inputs that the hypothesis class can shatter (label in all possible ways).
Note: the 𝑉𝐶 dimension can be defined for an infinite hypothesis space 𝐻 since it depends only on the size of finite subsets of 𝑋.
COMP9417 T1, 2022 41
The Vapnik-Chervonenkis Dimension
𝑋 = {1,…,10}
𝐻 = {h(x)=x≥𝜃}
For one input (e.g., 𝑆 = {6}), the hypothesis class can label it in all possible ways.
What about any pair of input (e.g., 𝑆 = {5,7})? • No. So, 𝑉𝐶 𝑑𝑖𝑚𝑒𝑛𝑠𝑖𝑜𝑛 = 1
Although 𝐻, here, is an infinite hypothesis space, but it is a weak hypothesis space and not very expressive!
COMP9417 T1, 2022 42
The Vapnik-Chervonenkis Dimension
𝐻 = {h(x)=x∈[𝑎,𝑏]}
• What is the size of parameter?
• What is the size of hypothesis space?
• What is the size of 𝑉𝐶 𝑑𝑖𝑚𝑒𝑛𝑠𝑖𝑜𝑛?
COMP9417 T1, 2022 43
The Vapnik-Chervonenkis Dimension
𝐻 = {h(x)=x∈[𝑎,𝑏]}
• What is the size of parameter? 2 parameters 𝑎 and 𝑏
• What is the size of hypothesis space? infinite
• What is the size of 𝑉𝐶 𝑑𝑖𝑚𝑒𝑛𝑠𝑖𝑜𝑛? 𝑉𝐶 𝑑𝑖𝑚𝑒𝑛𝑠𝑖𝑜𝑛 = 2
COMP9417 T1, 2022 44
The Vapnik-Chervonenkis Dimension
𝐻 = {h(x)=𝑤ëx≥𝜃}
• What is the size of parameter?
• What is the size of hypothesis space?
• What is the size of 𝑉𝐶 𝑑𝑖𝑚𝑒𝑛𝑠𝑖𝑜𝑛?
COMP9417 T1, 2022 45
The Vapnik-Chervonenkis Dimension
𝐻 = {h(x)=𝑤ëx≥𝜃}
• What is the size of parameter? 3 parameters
• What is the size of hypothesis space? infinite
• What is the size of 𝑉𝐶 𝑑𝑖𝑚𝑒𝑛𝑠𝑖𝑜𝑛? See next 3 slides
COMP9417 T1, 2022 46
VC Dimension of Linear Decision Surfaces
From the left, shown are three dichotomies of the same three instances. Can a linear classifier be found for the other five dichotomies? On the right, this set of
four instances clearly cannot be shattered by a hypothesis space of linear classifiers.
COMP9417 T1, 2022 47
VC Dimension of Linear Decision Surfaces
Clearly, for a subset of 2 instances we can find a linear classifier for all possible dichotomies.
The same argument as for 2 instances applies (see first three examples on previous slide, and case (a) on next slide), as long as the instances are not collinear (case (b) on next slide). So the 𝑉𝐶 dimension is at least 3.
However, in this setting, there is no set of 4 points that can be shattered. In general, for linear classifiers in 𝑑 dimensions (𝑑 is number of features)
the 𝑉𝐶 dimension is 𝑑 + 1.
COMP9417 T1, 2022 48
VC Dimension of Linear Decision Surfaces
COMP9417 T1, 2022 49
VC dimension and Number of parameters
Note that VC dimension does not necessarily equal the number of parameters!
Number of parameters does not necessarily equal complexity:
– Can define a classifier with lot of parameters but not much power!
– Can define a classifier with one parameter, but lots of power! (look at the example in next slide)
COMP9417 T1, 2022 50
How Complex is a Hypothesis ?
COMP9417 T1, 2022 51
How Complex is a Hypothesis ?
The solid curve is the function sin(50𝑥) for 𝑥 ∈ [0,1].
The blue (solid) and green (hollow) points illustrate how the associated indicator function 𝐼(sin(𝛼 𝑥) > 0) can shatter (separate) an arbitrarily large number of points by choosing an appropriately high frequency 𝛼.
Classes separated based on sin(𝛼 𝑥), for frequency 𝛼, a single parameter.
COMP9417 T1, 2022 52
Sample Complexity from VC Dimension
We can now generalize the PAC-learning result obtained earlier to answer the question: how many randomly drawn examples suffice to 𝜖-exhaust 𝑉𝑆9,: with
probability at least (1 − 𝛿)?
𝑚 ≥ 1𝜖 (4logá(2/𝛿) + 8𝑉𝐶(𝐻)logá(13/𝜖))
So we see that the concept of the 𝑉𝐶 dimension of a hypothesis class gives us a general framework for characterizing the complexity or capacity of hypotheses in terms of their ability to express all possible target concepts in a particular learning setting.
COMP9417 T1, 2022 53
Sample Complexity and 𝑽𝑪 dimension
• For finite hypothesis space:
𝑚 ≥ 1𝜖 l n 𝐻 + l n 𝛿1
• For infinite hypothesis space with finite 𝑉𝐶 𝑑𝑖𝑚𝑒𝑛𝑠𝑖𝑜𝑛:
𝑚 ≥ 1𝜖 (4logá(2/𝛿) + 8𝑉𝐶(𝐻)logá(13/𝜖))
COMP9417 T1, 2022 54
𝑽𝑪 dimension of finite 𝑯
If 𝑑 = 𝑉𝐶(𝐻), then in means that there are at least 2ú hypothesis in the hypothesis space that can shatter 𝑑 points in all possible ways (which is 2ú ), so:
2ú ≤ |H| 𝑑 ≤ logá |𝐻|
𝐻 is PAC-learnable if and only if 𝑉𝐶 dimension is finite.
COMP9417 T1, 2022 55
• PAC learning
• Sample complexity
• Version space
• VC dimension and hypothesis space complexity
• VC dimension capture PAC-learnability
COMP9417 T1, 2022 56
Acknowledgements
• Material derived from slides for the book “Elements of Statistical Learning (2nd Ed.)” by T. Hastie, R. Tibshirani & J. Friedman. Springer (2009) http://statweb.stanford.edu/~tibs/ElemStatLearn/
• Material derived from slides for the book “Machine Learning: A Probabilistic Perspective” by P. IT Press (2012) http://www.cs.ubc.ca/~murphyk/MLbook
• Material derived from slides for the book “Machine Learning” by P. Flach Cambridge University Press (2012) http://cs.bris.ac.uk/~flach/mlbook
• Material derived from slides for the book “Bayesian Reasoning and Machine Learning” by D. University Press (2012) http://www.cs.ucl.ac.uk/staff/d.barber/brml
• Material derived from slides for the book “Machine Learning” by T. Graw- Hill (1997) http://www-2.cs.cmu.edu
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com