CS代考计算机代写 algorithm AI The Vapnik-Chervonenkis Dimension

The Vapnik-Chervonenkis Dimension
Prof. Dan A. Simovici
UMB
1/96

Outline
1 Basic Definitions for Vapnik-Chervonenkis Dimension
2 Growth Functions
3 The Sauer-Shelah Theorem
4 The Link between VCD and PAC Learning
5 The VCD of Collections of Sets
2/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Trace of a Collection of Sets
Definition
Let C be a collection of sets and let U be a set. The trace of collection C on the set U is the collection
CU = {U ∩ C | C ∈ C}.
If the trace of C on U, CU equals P(U), then we say that U is shattered
by C.
U is shattered by C if C can carve any subset of U as an intersection with a set in C.
3/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Example
Let U = {u1, u2} and let C be the collection of sets
C = {{u3}, {u1, u3}, {u2, u3}, {u1, u2, u3}}.
C shatters U because we can write:
∅ = {u1} = {u2} = {u1,u2} =
U∩{u3} U∩{u1,u3} U∩{u2,u3} U∩{u1,u2,u3}
4/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Definition
The Vapnik-Chervonenkis dimension of the collection C (called the VC-dimension for brevity) is the largest size of a set K that is shattered by C.
This largest size is denoted by VCD(C).
5/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Example
Note that the previous collection C cannot shatter the set
U′ = {u1, u2, u3} because this set has 8 subsets and C has just four sets. Thus, if is impossible to express all subsets of U′ as intersections of U′ with some set of C.
The VCD dimension of the collection C is 2.
6/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Note that:
We have VCD(C) = 0 if and only if |C| = 1.
If VCD(C) = d, then there exists a set K of size d such that for each subset L of K there exists a set C ∈ C such that L = K ∩ C .
C shatters K if and only if the trace of C on K denoted by CK shatters K. This allows us to assume without loss of generality that both the sets of the collection C and a set K shattered by C are subsets of a set U.
7/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Collections of Sets as Sets of Hypotheses
Let U be a set, K a subset, and let C be a collection of sets.
Each C ∈ C defines a hypothesis hC : U −→ {−1, 1} that is a dichotomy,
where
􏰑1 if u ∈ C, hC(u)= −1 ifu̸∈C.
K is shattered by C if and only if for every subset L of K there exists a hypothesis hC such that the set of positive examples {u ∈ U | hC (u) = 1} equals L.
8/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Finite Collections have Finite VC-Dimension
Let C be a collection of sets with VCD(C) = d and let K be a set shattered by C with |K| = d.
Since there exist 2d subsets of K, there are at least 2d subsets of C, so
2d 􏰀 |C|.
Consequently, VCD(C) 􏰀 log2 |C|. This shows that if C is finite, then VCD(C) is finite.
The converse is false: there exist infinite collections C that have a finite VC -dimension.
9/96

Basic Definitions for Vapnik-Chervonenkis Dimension
A Tabular Representation of Collections
If U = {u1,…,un} is a finite set, then the trace of a collection
C = {C1,…,Cp} of subsets of U on a subset K of U can be presented in an intuitive, tabular form.
Let θ be a table containing the rows t1, . . . , tp and the binary attributes u1,…,un.
Each tuple tk corresponds to a set Ck of C and is defined by
􏰑1 ifui∈Ck, tk [ui ] = 0 otherwise,
for 1 􏰀 i 􏰀 n. Then, C shatters K if the content of the projection r[K] consists of 2|K| distinct rows.
10/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Example
Let U = {u1,u2,u3,u4} and let
C = {{u2, u3}, {u1, u3, u4}, {u2, u4}, {u1, u2}, {u2, u3, u4}} represented by:
TC
u1
u2
u3
u4
0 1 0 1 0
1 0 1 1 1
1 1 0 0 1
0 1 1 0 1
The set K = {u1, u3} is shattered by the collection C because the projection on K ((0, 1), (1, 1), (0, 0), (1, 0), (0, 1)). contains the all four necessary tuples (0, 1), (1, 1), (0, 0), and (1, 0).
No subset K of U that contains at least three elements can be shattered by C because this would require the projection r[K] to contain at least eight tuples. Thus, VCD(C) = 2.
11/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Observations:
Every collection of sets shatters the empty set.
If C shatters a set of size n, then it shatters a set of size p, where p 􏰀 n.
For a collection of sets C and for m ∈ N, let ΠC[m]=max{|CK| | |K|=m}
be the largest number of distinct subsets of a set having m elements that can be obtained as intersections of the set with members of C.
We have ΠC[m] 􏰀 2m;
if C shatters a set of size m, then ΠC[m] = 2m.
12/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Definition
A Vapnik-Chervonenkis class (or a VC class) is a collection C of sets such that VCD(C) is finite.
13/96

Basic Definitions for Vapnik-Chervonenkis Dimension
Example
Let R be the set of real numbers and let I be the collection of sets {(−∞,t) | t ∈ R}.
We claim that any singleton is shattered by I. Indeed, if S = {x} is a singleton, then P({x}) = {∅,{x}}. Thus, if t 􏰁 x, we have
(−∞, t) ∩ S = {x}; also, if t < x, we have (−∞, t) ∩ S = ∅, so IS = P(S). There is no set S with |S| = 2 that can be shattered by I. Indeed, suppose that S = {x,y}, where x < y. Then, any member of I that contains y includes the entire set S, so IS = {∅,{x},{x,y}} ≠ P(S). This shows that I is a VC class and VCD(I) = 1. 14/96 Basic Definitions for Vapnik-Chervonenkis Dimension Example Consider the collection I = {[a, b] | a, b ∈ R, a 􏰀 b} of closed intervals. We claim that VCD(I) = 2. To justify this claim, we need to show that there exists a set S = {x,y} such that IS = P(S) and no three-element set can be shattered by I. For the first part of the statement, consider the intersections [u,v]∩S=∅, wherev0},I2={i|1􏰀i􏰀d+2,βi <0} are non-empty sets and X1 ={xi | i ∈I1},X2 ={xi | i ∈I2}, form a partition of X. 29/96 Basic Definitions for Vapnik-Chervonenkis Dimension Proof (cont’d) Define β = 􏰍i∈I1 βi. Since 􏰍i∈I1 βi = −􏰍i∈I2 βi, we have 􏰆βixi =􏰆−βixi. Also, ββ i ∈I1 i ∈I2 􏰆βi =􏰆−βi =1, ββ i ∈I1 i ∈I2 βi 􏰁0fori∈I1and−βi 􏰁0fori∈I2.Thisimpliesthat ββ 􏰆 βi xi β i ∈I1 belongs both to the convex hulls of X1 and X2. 30/96 Basic Definitions for Vapnik-Chervonenkis Dimension Let X be a set of d +2 points in Rd. By Radon’s Theorem it can be partitioned into X1 and X2 such that the two convex hulls intersect. When two sets are separated by a hyperplane, their convex hulls are also separated by the hyperplane. Thus, X1 and X2 cannot be separated by a hyperplane and X is not shattered. 31/96 Basic Definitions for Vapnik-Chervonenkis Dimension Example Let R be the set of rectangles whose sides are parallel with the axes x and y. ThereisasetS with|S|=4thatisshatteredbyR. LetS beasetof four points in R2 that contains a unique “northernmost point” Pn, a unique “southernmost point” Ps, a unique “easternmost point” Pe, and a unique “westernmost point” Pw . If L ⊆ S and L ̸= ∅, let RL be the smallest rectangle that contains L. For example, we show the rectangle RL for the set {Pn,Ps,Pe}. q Pw Ps Pn q q Pe q 32/96 Basic Definitions for Vapnik-Chervonenkis Dimension Example (cont’d) This collection cannot shatter a set of points that contains at least five points. Indeed, let S be such that |S| 􏰁 5. If the set contains more than one “northernmost” point, then we select exactly one to be Pn. Then, the rectangle that contains the set K = {Pn , Pe , Ps , Pw } contains the entire set S, which shows the impossibility of separating S. 33/96 Basic Definitions for Vapnik-Chervonenkis Dimension The Class of All Convex Polygons Example Consider the system of all convex polygons in the plane. For any positive integer m, place m points on the unit circle. Any subset of the points are the vertices of a convex polygon. Clearly that polygon will not contain any of the points not in the subset. This shows that we can shatter arbitrarily large sets, so the VC-dimension of the class of all convex polygons is infinite. 34/96 Basic Definitions for Vapnik-Chervonenkis Dimension The Case of Convex Polygons with d Vertices Example Consider the class of convex polygons that have no more than d vertices in R2 and place 2d + 1 points on a circle. Label a subset of these points as positive, and the remaining points as negative. Since we have an odd number of points there exists a majority in one of the classes (positive or negative). If the negative point are in majority, there are at most d positive points; these are contained by the convex polygon formed by joining the positive points. If the positive are in majority, consider the polygon formed by the tangents of the negative points. 35/96 Basic Definitions for Vapnik-Chervonenkis Dimension Negative Points in the Majority positive examples negative examples 36/96 Basic Definitions for Vapnik-Chervonenkis Dimension Positive Points in the Majority positive examples negative examples 37/96 Basic Definitions for Vapnik-Chervonenkis Dimension Example cont’d Since a set with 2d + 1 points can be shattered, the VC dimension of the set of convex polygons with at most d vertices is at least 2d + 1. If all labeled points are located on a circle then it is impossible for a point to be in the convex closure of a subset of the remaining points. Thus, placing the points on a circle maximizes the number of sets required to shatter the set, so the VC-dimension is indeed 2d + 1. 38/96 Growth Functions Definition Let H be a set of hypotheses and let (x1,...,xm) be a sequence of examples of length m. A hypothesis h ∈ H induces a classification (h(x1), . . . , h(xm)) of the components of this sequence. The growth function of H is the function ΠH : N −→ N gives the number of ways a sequence of examples of length m can be classified by a hypothesis in H: ΠH(m) = max |{(h(x1),...,h(xm)) | h ∈ H}| (x1 ,...,xm )∈X m 39/96 Growth Functions Dichotomies Definition A dichotomy is a hypothesis h : X −→ {−1, 1}. If H consists of dichotomies, then (x1, . . . , xm) can be classified in at most 2m ways. 40/96 The Sauer-Shelah Theorem Theorem Let S = {s1,...,sn} be a set and let C be a collection of subsets of S, C ⊆ P(S). If SH(C) is the family of subsets of S that are shattered by C then |SH(C)| 􏰁 |C|. 41/96 The Sauer-Shelah Theorem Proof The argument is by induction on |C|. Consider the subfamilies C0 = {U∈C|s1̸∈U} C1 = {U∈C|s1∈U} By the inductive hypothesis, |SH(C0)| 􏰁 |C0|, that is, C0 shatters at least as many subsets of S′ = {s2,s3,...,sn} as |C0|. Next, consider the family C 1′ = { U − { s 1 } | U ∈ C 1 } . The families C0 and C1 of subsets of S are disjoint and |C| = |C0| + |C1|. C0 and C1′ are families of subsets of S′ and |C1′ | = |C1|. Note that: 42/96 The Sauer-Shelah Theorem Proof (cont’d) By induction, C1′ shatters at least as many subsets of S′ = {s2,s3,...,sn} as its cardinality, that is, |SH(C1′ )| 􏰁 |C1′ |. The number of subsets of S′ shattered by C0 and C1′ sum up to at least |C0| + |C1′ | = |C0| + |C1| = |C|, and every subset of S′ shattered by C1′ is shattered by C1 ⊆ C. Note that there may be subsets V of S′ shattered by both C0 and C1′ . In this case both V and V ∪ {s1} are shattered by C. 43/96 The Sauer-Shelah Theorem For n, k ∈ N and 0 􏰀 k 􏰀 n define the number 􏰉 n 􏰊 as 􏰀k 􏰂 n 􏰃 􏰆k 􏰂 n 􏰃 􏰀k=i. Clearly, 􏰉 n 􏰊 = 1 and 􏰉 n 􏰊 = 2n. 􏰀0 􏰀n Theorem i=0 (Sauer-Shelah Theorem) Let S be a set with |S| = n and let C be a collection of subsets of S such that 􏰂n􏰃 |C|> 􏰀k
Then, there exists a subset T of S having at least k + 1 elements such that C shatters T.
44/96

The Sauer-Shelah Theorem
Proof
Let |SH(C)| be the number of sets shattered by C. We have |SH(C)| 􏰁 |C| by the previous theorem.
Let Pk(S) be the collection of subsets of S that contain k or fewer elements.
The inequality of the theorem means that |C| > |Pk(S)|, hence
|SH(C)| > |Pk(S)|. Therefore, there exists a subset T of S with at least k + 1 elements that is shattered by C.
45/96

The Sauer-Shelah Theorem
Theorem
Let φ : N2 −→ N be the function defined by
φ(d,m) = We have
for d,m ∈ N.
􏰑1
φ(d,m−1)+φ(d −1,m−1),
􏰂m􏰃 φ(d,m)= 􏰀d
if m = 0 or d = 0 otherwise.
46/96

The Sauer-Shelah Theorem
Proof
The argument is by strong induction on s = d + m.
The base case, s = 0, implies m = 0 and d = 0, and the equality is immediate.
47/96

The Sauer-Shelah Theorem
Suppose that the equality holds for φ(d′, m′), where d′ + m′ < d + m. We have: φ(d,m−1)+φ(d −1,m−1) (by definition) φ(d,m) = = 􏰍d 􏰉m−1􏰊 + 􏰍d −1 􏰉m−1􏰊 i=0 i i=0 i (by inductive hypothesis) = 􏰍d 􏰉m−1􏰊 + 􏰍d 􏰉m−1􏰊 i=0 i i=1 i−1 (by changing the summation index in the second sum) = 􏰍d 􏰉m−1􏰊 + 􏰍d 􏰉m−1􏰊 i=0 i i=0 (because 􏰉m−1􏰊 = 0) i−1 −1 = 􏰍d 􏰋􏰉m−1􏰊 + 􏰉m−1􏰊􏰌 i=0 i i−1 = 􏰍d 􏰉m􏰊=􏰉m􏰊, i=0i 􏰀d which gives the desired conclusion. 48/96 The Sauer-Shelah Theorem Another Inequality Suppose that VCD(C) = d and |S| = n. Then SH(C) ⊆ Pd(S), hence 􏰆d 􏰂 n 􏰃 􏰂 n 􏰃 |C|􏰀|SH(C)|􏰀 Together with the previous inequality we obtain: i = 􏰀d . 2 􏰀|C|􏰀 􏰀d =φ(n,d). d 􏰂n􏰃 i=1 49/96 The Sauer-Shelah Theorem Lemma For d ∈ N and d 􏰁 2 we have d−1 dd 2 􏰀d!. Proof: The argument is by induction on d. In the basis step, d = 2 both members are equal to 2. 50/96 The Sauer-Shelah Theorem Suppose the inequality holds for d. We have (d + 1)d+1 (d + 1)! (d + 1)d dd = d! = d! · (d + 1)d dd 􏰂 1􏰃d = d!· 1+d 􏰁2d· 1+d 􏰁2d dd 􏰂 1􏰃d (by inductive hypothesis) because 􏰂1􏰃d 1 1 + d 􏰁 1 + d d = 2. This concludes the proof of the inequality. 51/96 The Sauer-Shelah Theorem Lemma Wehaveφ(d,m)􏰀2md foreverym􏰁dandd􏰁1. d! Proof: The argument is by induction on d and n. If d = 1, then φ(1, m) = m + 1 􏰀 2m for m 􏰁 1, so the inequality holds for every m 􏰁 1, when d = 1. 52/96 The Sauer-Shelah Theorem Proof (cont’d) If m = d 􏰁 2, then φ(d,m) = φ(d,d) = 2d and the desired inequality follows immediately from a previous Lemma. Suppose that the inequality holds for m > d 􏰁 1. We have
φ(d,m+1) =
φ(d,m)+φ(d −1,m) (by the definition of φ)
md md−1
􏰀 2d!+2(d−1)!
(by inductive hypothesis)
= 2 md−1 􏰋1+m􏰌. (d − 1)! d
53/96

The Sauer-Shelah Theorem
Proof (cont’d)
It is easy to see that the inequality
2 md−1 􏰋1+m􏰌􏰀2(m+1)d
is equivalent to
(d − 1)! d d! d 􏰂 1􏰃d
m+1􏰀 1+m
and, therefore, is valid. This yields immediately the inequality of the lemma.
54/96

The Sauer-Shelah Theorem
The Asymptotic Behavior of the Function φ
Theorem
The function φ satisfies the inequality: φ(d,m) < 􏰋em􏰌d d for every m 􏰁 d and d 􏰁 1. Proof: By a previous Lemma, φ(d,m) 􏰀 2md . Therefore, we need to show only that d! 􏰂d􏰃d 2 e ε.
67/96

The Sauer-Shelah Theorem
Example
Suppose that C is finite. For any fixed set C0 ⊕ C ∈ ∆C0,ε(C), the probability that we fail to hit C0 ⊕ C in m random examples is at most
(1 − ε)m. Thus, the probability that we fail to hit some C0 ⊕ C ∈ ∆C0,ε(C) is bounded above by |C|(1 − ε)m.
68/96

The Link between VCD and PAC Learning
The Double Sample Theorem
Theorem
Let C be a concept class with VCD(C) = d.
Let A be any algorithm that given a set S of m labeled examples {(xi,c(xi)) | 1 􏰀 i 􏰀 m} sampled iid according to some fixed but unknown distribution D over the instance space X produces as output a hypothesis h that is consistent with c. Then, A is a PAC algorithm and
􏰂1 1 d 1􏰃 m􏰁k0 εlogδ+εlogε .
for some positive constant k0.
69/96

The Link between VCD and PAC Learning
Proof
Draw a sample S1 of size m from D and let A be the event that the elements of S1 fail to form an ε-net for ∆C0,ε(C).
If A occurs, then S1 misses some region T, where T ∈ ∆C0,ε(C).
Fix this region T and draw an additional sample S2 of size m from D.
70/96

The Link between VCD and PAC Learning
Let V be a binomial random variable that gives the number of hits of T bythesampleS2. WehaveE(V)=mεandvar(V)=mε(1−ε)because the probability of an element of S2 hitting T is ε.
By Chebyshev’s Inequality applied to V we have
P(|V −mε|􏰁a)􏰀 mε(1−ε). a2
Taking a = εm it follows that 2
P(|V−mε|􏰁εm) 􏰀 4(1−ε) 2 εm
provided that m 􏰁 8 . ε
􏰀 4􏰀1, εm 2
71/96

The Link between VCD and PAC Learning
Thus, if m 􏰁 8 , ε
The inequality
P(|V −εm|􏰀 εm)􏰁 1. 22
|V − εm| 􏰀 εm 2
is equivalent to εm 􏰀 V 􏰀 3εm, which implies P(V 􏰁 εm) 􏰁 1. 2222
72/96

The Link between VCD and PAC Learning
Proof (cont’d)
To summarize: we have calculated the probability that S2 will hit T many times given that T was fixed using the previous sampling, that is, given that S1 does not form an ε-net.
Let B be the event that S1 does not form an ε-net and that S2 hits T at least εm times. Then, we have shown that for m = O(1/ε) we have
2
P(B|A) 􏰁 1. 2
73/96

The Link between VCD and PAC Learning
Proof (cont’d)
Since P(B|A) 􏰁 1 we have 2
P(B) = P(B|A)P(A) 􏰁 1P(A). 2
Our goal of bounding P(A) is equivalent to finding δ such that P(B) 􏰀 δ 2
because this would imply P(A) 􏰀 δ.
74/96

The Link between VCD and PAC Learning
Proof (cont’d)
Let S = S1 ∪ S2 be a random sample of 2m. Note that since the samples
are iid obtaining S is equivalent of sampling S1 and S2 separately and let
T beafixedsetsuchthat|T|􏰁εm. 2
Consider a random partition of S into S1 and S2 and consider the
probability that S1 ∩ T = ∅.
An Equivalent Problem: we have 2m balls each colored red or blue with
exactly l red balls, where l 􏰁 εm . Divide the 2m balls into groups of equal 2
size S1 and S2. Find an upper bound on the probability that all l balls fall in S2 (that is, the probability that S1 ∩ R = ∅).
75/96

The Link between VCD and PAC Learning
Proof (cont’d)
Yet Another Equivalent Problem: Divide 2m non-colored balls into S1 and
S2, choose l to be colored red, and compute the probability that all red
balls fall in S2. The probability of this taking place is: 􏰉m􏰊
l 􏰉2m􏰊 l
Note that
􏰉m􏰊 l−1 m−i l􏰇􏰇−
l−11 1 εm 􏰉2m􏰊= 2m−i􏰀 2=2l=2 2.
l i=0 i=0
This is the probability for a fixed S and T. The probability that this
occurs for some T ∈ ∆C0,ε(S) such that |T| 􏰁 εm can be computed by 2
summing over all T and applying the union bound:
P(B) 􏰀 􏰀
|Π (εm)|2−εm 􏰀 |Π (εm)|2−εm ∆C ,ε(S) 2 ∆C (S) 2
0202
􏰂2εm􏰃d εm δ 2−2 􏰀 .
d2
76/96

The Link between VCD and PAC Learning
Proof (cont’d)
The last inequality implies
m􏰁k0 εlogδ+εlogε .
for some positive constant k0.
􏰂1 1 d 1􏰃
77/96

The Link between VCD and PAC Learning
Optional Material
78/96

The VCD of Collections of Sets
Let u : B2k −→ B2 be a Boolean function of k arguments and let C1,…,Ck be k subsets of a set U. Define the set u(C1,…,Ck) as the subset C of U whose indicator function is IC = u(IC1 , . . . , ICk ).
Example
If u : B2 −→ B2 is the Boolean function u(a1, a2) = a1 ∨ a2, then
u(C1, C2) is C1 ∪ C2; similarly, if u(x1, x2) = x1 ⊕ x2, then u(C1, C2) is the symmetric difference C1 ⊕ C2 for every C1, C2 ∈ P(U).
79/96

The VCD of Collections of Sets
Let u : B2k −→ B2 and C1,…,Ck are k family of subsets of U, the family of sets u(C1,…,Ck) is
u(C1,…,Ck) = {u(C1,…,Ck) | C1 ∈ C1,…,Ck ∈ Ck}. Theorem
Let α(k) be the least integer a such that a > k. log(ea)
If C1, . . . , Ck are k collections of subsets of the set U such that
d = max{VCD(Ci) | 1 􏰀 i 􏰀 k} and u : B2 −→ B2 is a Boolean function, then
VCD(u(C1,…,Ck)) 􏰀 α(k)·d.
80/96

The VCD of Collections of Sets
Proof
Let S be a subset of U that consists of m elements. The collection (Ci )S is not larger than φ(d,m). For a set in the collection W ∈ u(C1,…,Ck)S wecanwriteW =S∩u(C1,…,Ck),or,equivalently,
1W =1S ·u(1C1,…,1Ck).
There exists a Boolean function gS such that
1S ·u(1C1,…,1Ck)=gS(1S ·1C1,…,1S ·1Ck)=gS(1S∩C1,…,1S∩Ck).
Since there are at most φ(d , m) distinct sets of the form S ∩ Ci for every i , 1 􏰀 i 􏰀 k, it follows that there are at most (φ(d,m))k distinct sets W, hence u(C1,…,Ck)[m] 􏰀 (φ(d,m))k.
81/96

The VCD of Collections of Sets
Proof (cont’d)
By a previous theorem,
u(C1,…,Ck)[m] 􏰀 d .
We observed that if ΠC[m] < 2m, then VCD(C) < m. Therefore, to limit the Vapnik-Chervonenkis dimension of the collection u(C1, . . . , Ck ) it suffices to require that 􏰉em􏰊kd < 2m. d 􏰋em􏰌kd Let a = m . The last inequality can be written as (ea)kd < 2ad ; d equivalently, we have (ea)k < 2a, which yields k < a . If α(k) is the log(ea) least integer a such that k < a , then m 􏰀 α(k)d, which gives our log(ea) conclusion. 82/96 The VCD of Collections of Sets Example If k = 2, the least integer a such that a > 2 is k = 10, as it can be log(ea)
seen by graphing this function; thus, if C1,C2 are two collection of concepts with VCD(C1) = VCD(C2) = d, the Vapnik-Chervonenkis dimension of the collections C1 ∨ C2 or C1 ∧ C2 is not larger than 10d .
83/96

The VCD of Collections of Sets
Lemma
Let S,T be two sets and let f : S −→ T be a function. If D is a collection of subsets of T, U is a finite subset of S and C = f −1(D) is the collection {f−1(D) | D∈D},then|CU|􏰀|Df(U)|.
Proof: Let V = f(U) and denote f`U by g. For D,D′ ∈ D we have
(U ∩f−1(D))⊕(U ∩f−1(D′))
= U∩(f−1(D)⊕f−1(D′))=U∩(f−1(D⊕D′))
= g−1(V ∩(D⊕D′))=g−1(V ∩D)⊕g−1(V ⊕D′).
Thus, C = U ∩ f −1(D) and C′ = U ∩ f −1(D′) are two distinct members of CU, then V ∩ D and V ∩ D′ are two distinct members of Df (U). This implies |CU| 􏰀 |Df (U)|.
84/96

The VCD of Collections of Sets
Theorem
Let S,T be two sets and let f : S −→ T be a function. If D is a collection of subsets of T and C = f −1(D) is the collection {f −1(D) | D ∈ D}, then VCD(C) 􏰀 VCD(D). Moreover, if f is a surjection, then
VCD(C) = VCD(D).
85/96

The VCD of Collections of Sets
Proof
Suppose that C shatters an n-element subset K = {x1, . . . , xn} of S, so |CK|=2n ByapreviousLemmawehave|CK|􏰀|Df(U)|,so|Df(U)|􏰁2n, which implies |f (U)| = n and |Df (U)| = 2n, because f (U) cannot have more than n elements. Thus, D shatters f (U), so VCD(C) 􏰀 VCD(C). Suppose now that f is surjective and H = {t1, . . . , tm} is an m element set that is shattered by D. Consider the set L = {u1, . . . , um} such that
ui ∈f−1(ti)for1􏰀i􏰀m.LetUbeasubsetofL.SinceHisshattered by D, there is a set D ∈ D such that f (U) = H ∩ D, which implies
U = L ∩ f −1(D). Thus, L is shattered by C and this means that
VCD(C) = VCD(D).
86/96

The VCD of Collections of Sets
Definition
The density of C is the number
denss(C)=inf{s∈R>0 | ΠC[m]􏰀c·ms foreverym∈N},
for some positive constant c.
87/96

The VCD of Collections of Sets
Theorem
Let S,T be two sets and let f : S −→ T be a function. If D is a collection of subsets of T and C = f −1(D) is the collection {f −1(D) | D ∈ D}, then denss(C) 􏰀 denss(D). Moreover, if f is a surjection, then
denss(C) = denss(D).
Proof: Let L be a subset of S such that |L| = m. Then, |CL| 􏰀 |Df (L)|. In general, we have |f (L)| 􏰀 m, so |Df (L)| 􏰀 D[m] 􏰀 cms . Therefore, we have |CL| 􏰀 |Df (L)| 􏰀 D[m] 􏰀 cms , which implies denss(C) 􏰀 denss(D).
If f is a surjection, then, for every finite subset M of T such that |M| = m there is a subset L of S such that |L| = |M| and f (L) = M. Therefore, D[m] 􏰀 ΠC[m] and this implies denss(C) = denss(D).
88/96

The VCD of Collections of Sets
If C, D are two collections of sets such that C ⊆ D, then VCD(C) 􏰀 VCD(D) and denss(C) 􏰀 denss(D).
Theorem
LetCbeacollectionofsubsetsofasetSandletC′={S−C |C∈C}. Then,foreveryK∈P(S)wehave|CK|=|CK′ |.
89/96

The VCD of Collections of Sets
Proof
We prove the statement by showing the existence of a bijection
f : CK −→ CK′ . If U ∈ CK , then U = K ∩ C , where C ∈ C. Then
S − C ∈ C′ and we define f (U) = K ∩ (S − C) = K − C ∈ CK′ . The function f is well-defined because if K ∩ C1 = K ∩ C2, then
K−C1 =K−(K∩C1)=K−(K∩C2)=K−C2. Itisclearthatiff(U)=f(V)forU,V ∈CK,U=K∩C1,and
V =K∩C2,thenK−C1 =K−C2,soK∩C1 =K∩C2 andthismeans thatU=V. Thus,f isinjective. IfW ∈CK′ ,thenW =K∩C′ forsome C′ ∈ C. Since C′ = S − C for some C ∈ C, it follows that W = K − C, so W = f (U), where U = K ∩ C.
90/96

The VCD of Collections of Sets
Corollary
LetCbeacollectionofsubsetsofasetSandletC′={S−C |C∈C}. We have denss(C) = denss(C′) and VCD(C) = VCD(C′).
91/96

The VCD of Collections of Sets
Theorem
For every collection of sets we have denss(C) 􏰀 VCD(C). Furthermore, if denss(C) is finite, then C is a VC-class.
Proof: If C is not a VC-class the inequality denss(C) 􏰀 VCD(C) is clearly satisfied. Suppose now that C is a VC-class and VCD(C) = d. By Sauer-Shelah Theorem we have ΠC[m] 􏰀 φ(d,m); then, we obtain
ΠC[m] 􏰀 􏰉em􏰊d, so denss(C) 􏰀 d.
d
Suppose now that denss(C) is finite. Since ΠC[m] 􏰀 cms 􏰀 2m for m
sufficiently large, it follows that VCD(C) is finite, so C is a VC-class.
92/96

The VCD of Collections of Sets
Let D be a finite collection of subsets of a set S. The partition πD was
defined as consisting of the nonempty sets of the form
{Da1 ∩Da2 ∩···∩Dar,where(a,a,…,a)∈{0,1}r. 12r12r
Definition
A collection D = {D1,…,Dr} of subsets of a set S is independent if the partition πD has the maximum numbers of blocks, that is, it consists of 2r blocks.
If D is independent, then the Boolean subalgebra generated by D in the Boolean algebra (P(S), {∩, ∪, ̄, ∅, S}) contains 22r sets, because this subalgebra has 2r atoms. Thus, if D shatters a subset T with |T| = p, then the collection DT contains 2p sets, which implies 2p 􏰀 22r , or p 􏰀 2r .
93/96

The VCD of Collections of Sets
Definition
Let C be a collection of subsets of a set S. The independence number of C I(C) is:
I(C) = sup{r | {C1,…,Cr}
is independent for some finite {C1, . . . , Cr } ⊆ C}.
94/96

The VCD of Collections of Sets
Theorem
Let S,T be two sets and let f : S −→ T be a function. If D is a collection of subsets of T and C = f −1(D) is the collection {f −1(D) | D ∈ D}, then I (C) 􏰀 I (D). Moreover, if f is a surjection, then I (C) = I (D).
Proof: Let E = {D1,…,Dp} be an independent finite subcollection of D. The partition πE contains 2r blocks. The number of atoms of the subalgebra generated by {f −1(D1), . . . , f −1(Dp)} is not greater than 2r . Therefore, I(C) 􏰀 I(D); from the same supplement it follows that if f is surjective, then I (C) = I (D).
95/96

The VCD of Collections of Sets
Theorem
If C is a collection of subsets of a set S such that VCD(C) 􏰁 2n, then I(C) 􏰁 n.
Proof: Suppose that VCD(C) 􏰁 2n, that is, there exists a subset T of S that is shattered by C and has at least 2n elements. Then, the collection Ht contains at least 22n sets, which means that the Boolean subalgebra of P(T) generated by TC contains at least 2n atoms. This implies that the subalgebra of P(S) generated by C contains at least this number of atoms, so I(C) 􏰁 n.
96/96