CS计算机代考程序代写 CSC263 – Week 5, Lecture 1

CSC263 – Week 5, Lecture 1
Cristyn Howard
Monday, February 5, 2018
Q: A webserver needs to reference a large list of blacklisted websites. The entire list
cannot be stored in main memory, because it is too large. How can we store it?
A: In a BLOOM FILTER
Bloom Filters
• probablistic (i.e. approximate) dictionary that stores FS : a summary/fingerprint of a dynamic set S
BF[0, …, m − 1] : array of m bits, initialized to 0.
{h1, …, ht} : set of t < m hash fuctions where hi : U → {0, ..., m − 1} m, t : parameters defining # of bits & # of hash functions, respectively • Operations: Θ(n) INSERT(FS , x) : for i from 1 to t, BF[hi(x)] = 1; ∗ element passed to all hash functions, produces a t-tuple of indices ∈ {0, ..., m − 1} that form fingerprint of x ∗ bits at all indices in fingerprint of x permanently set to 1 Θ(n) SEARCH(FS , x) : for i from 1 to t, if (BF[hi(x)] == 0) return FALSE, else return TRUE ∗ ∗ ∗ – cannot delete elements from S – SEARCH may return false positives - may say that x is likely in S when it is not. • Use when: space constraints exist, false positives are acceptable, and when deleting is not necessary. Returns: {FALSE ≡ x definitely not in FS }||{TRUE ≡ x probably in FS } If any index in the hash of x is 0, then x has definitely not been placed in S. However, if x is not in S, and the union of all indices in the hashes of all elements stored in S contains all indices in the hash of x, then SEARCH will return a false positive. very space and time efficient way to store and search large sets. • Limitations: • Benefits: 1 . False Positive Probability Analysis Assume n elements inserted. Compute probability that SEARCH(BF, x) returns a false positive, i.e. a positive when x has not been inserted. Recall that hash functions have uniform, independent probability distributions. 2