程序代写代做代考 data structure algorithm chain Indexed Sets

Indexed Sets

Direct Access Sets?

Have been assuming need to search for a key

In an array sorted by key

Better: in a tree sorted by key

Can data just be indexed by key?

Questions

How could such indexing work?

Want to use any type as a key

Assuming such indexing, how long would put and get take in a set
containing N objects?

Algorithms (580) Dynamic Data Structures January 2018 37 / 50

Indexed Sets

Hash Tables index data (indirectly) by key

A hash table T is (like) an array with m slots

The key is converted to an integer index by a hash function h

So, an object with key k is stored at T [h(k)]

The time taken by h depends only on k

New object added into N object set in Θ(1) time (theoretical only!)

Algorithms (580) Dynamic Data Structures January 2018 38 / 50

Indexed Sets Mapping Functions

Numerical Encoding

Map any key object to a natural number

Requirement: equal keys have same result

Requirement: unequal keys have different result

Exercise

Design a function to map every ASCII string to a different natural number

Algorithms (580) Dynamic Data Structures January 2018 39 / 50

Indexed Sets Mapping Functions

Encoding Function

One Possibility

The formula
k = s[0] + s[1] ∗ 128 + s[2] ∗ 1282 + …

converts every ASCII string s to a different natural number.

Treat each character as a digit

Same principle can be applied (recursively) to any type

Question

What is the problem with this as a practical solution?

Algorithms (580) Dynamic Data Structures January 2018 40 / 50

Indexed Sets Mapping Functions

Collisions

Impractical to store every key at a different index

Very space inefficient, even if it’s possible

Result: collisions

Will need a way to resolve collisions (store both objects)

Algorithms (580) Dynamic Data Structures January 2018 41 / 50

Indexed Sets Mapping Functions

A Hash Function Part 2

Map the numerical code k from Step 1 to a position in the table

Step 2

If the table has size m:
h(k) = k mod m

New requirement: minimise collisions

spread the keys as evenly as possible

Question

What happens to the ASCII string keys if m = 128?

All keys starting a.. hash to same slot

If all keys start a… only one slot used

Using a prime radix for k limits the problem

Algorithms (580) Dynamic Data Structures January 2018 42 / 50

Indexed Sets Mapping Functions

Uniform Hashing

Lots of ways to hash: universal, fingerprint, cryptographic, …

Best result is data dependent

More uniform, often slower

Definition (Simple Uniform Hashing Assumption)

Given a hash table T with m slots, using hash faunction h, the simple
uniform hashing assumption (SUHA) states that each new key k is equally
likely to hash into any of the m slots. So, the probability that h(k) = i , for
every slot 1 ≤ i ≤ m is 1/m.

SUHA is an assumption about both h and input data

Allows analysis to ignore details of both

Algorithms (580) Dynamic Data Structures January 2018 43 / 50

Indexed Sets Mapping Functions

Hash Table Memory

Recall: need a way to resolve collisions (store both objects)

Exercise

Design a way to resolve collisions

Table has to store both objects somewhere

What is the worst case time to add a new object?

Algorithms (580) Dynamic Data Structures January 2018 44 / 50

Indexed Sets Chaining

Chaining

With collision resolution by Chaining

All objects whose key hashes to h(k) are placed into a linked list

The table contains a pointer to the list

So, T [i ] contains a list of objects x where h(x .key) = i

Algorithms (580) Dynamic Data Structures January 2018 45 / 50

Indexed Sets Chaining

Performance of Chaining

Add object x to table T :

Insert as head of list at T [h(x .key)]

takes Θ(1) time

Search for an object with key k

Search list at T [h(k)] for an object where x .key == k

In a table containing N values

Worst case is N elements in one chain: O(N) search

Under SUHA, expected time is O(N/m)

N/m is called the load factor

Algorithms (580) Dynamic Data Structures January 2018 46 / 50

Indexed Sets Chaining

Expected Time To Search

The expected time for an unsuccessful search for key k , in a hash table
with m slots, containing N objects, assuming simple uniform hashing:

By SUHA, expected length of each chain is N/m

k equally likely to hash to all m positions

Probability of searching chain at T [i ] is 1/m

Expected number of comparisons is

m∑
i=1

m
×

m
=

If N is proportional to m, expected running time for Search is Θ(1)

The design of the table needs to ensure N/m is Θ(1)

Successful search reasoning is similar: O(N/m)

Algorithms (580) Dynamic Data Structures January 2018 47 / 50

Indexed Sets Probing

Probing

In an open address hash table objects are stored directly in the table

We use probing to resolve collisions

To insert an object we probe the table until we find a space

The hash function generates a sequence 〈h(k , 0), . . . , h(k,m − 1)〉

The simplest form (above) is linear probing

Consecutive slots are probed, beginning with h(k), up to h(k)− 1

Algorithms (580) Dynamic Data Structures January 2018 48 / 50

Indexed Sets Probing

Performance of Probing

Definition (Uniform Hashing)

Given a hash table with m slots, a hash function produces uniform hashing
if, for an unknown key k , the probability that the probe sequence of k is p,
where p is a permutation of 〈0, . . . ,m − 1〉 is the same for all such p.

Uniform hashing first implies that every permutation is possible

Linear probing does not produce uniform hashing

Assuming uniform hashing, the expected number of keys compared when
inserting an object depends on the load factor N/m

Each probe is to a random slot, with probability N/m it is occupied

If N is proportional to m, expected time for insert (and search) is Θ(1)

Algorithms (580) Dynamic Data Structures January 2018 49 / 50

Indexed Sets Limitations

Limitations

Hash tables do not support operations such as:

In order iteration

Next key / object

Minimum key

Maximum key

since objects are stored, by design, in random order.

Algorithms (580) Dynamic Data Structures January 2018 50 / 50

Related Posts