PowerPoint Presentation
Dictionaries and Their Implementations
Chapter 18
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Contents
The ADT Dictionary
Possible Implementations
Selecting an Implementation
Hashing
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
The ADT Dictionary
Recall concept of a sort key in Chapter 11
Often must search a collection of data for specific information
Use a search key
Applications that require value-oriented operations are frequent
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
The ADT Dictionary
FIGURE 18-1 A collection of data about certain cities
Consider the need for searches through this data based on other than the name of the city
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
ADT Dictionary Operations
Test whether dictionary is empty.
Get number of items in dictionary.
Insert new item into dictionary.
Remove item with given search key from dictionary.
Remove all items from dictionary.
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
ADT Dictionary Operations
Get item with a given search key from dictionary.
Test whether dictionary contains an item with given search key.
Traverse items in dictionary in sorted search-key order.
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
ADT Dictionary
View interface, Listing 18-1
FIGURE 18-2 UML diagram for a class of dictionaries
.htm code listing files must be in the same folder as the .ppt files for these links to work
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Possible Implementations
Sorted (by search key), array-based
Sorted (by search key), link-based
Unsorted, array-based
Unsorted, link-based
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Possible Implementations
FIGURE 18-3 A dictionary entry
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Possible Implementations
FIGURE 18-4 The data members for two sorted linear implementations of the ADT dictionary for the data in Figure 18-1 : (a) array based; (b) link based
View header file for class of dictionary entries, Listing 18-2
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Possible
Implementations
FIGURE 18-5 The data members for a binary search tree implementation of the ADT dictionary for the data in Figure 18-1
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Sorted Array-Based Implementation of ADT Dictionary
Consider header file for the class ArrayDictionary, Listing 18-3
Note definition of method add, Listing 18-A
Bears responsibility for keeping the array items sorted
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Binary Search Tree
Implementation of ADT Dictionary
Dictionary class will use composition
Will have a binary search tree as one of its data members
Reuses the class BinarySearchTree from Chapter 16
View header file, Listing 18-4
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Selecting an Implementation
Reasons for considering linear implementations
Perspective,
Efficiency
Motivation
Questions to ask
What operations are needed?
How often is each operation required?
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Selecting an Implementation
Three Scenarios
Insertion and traversal in no particular order
Retrieval – consider:
Is a binary search of a linked chain possible?
How much more efficient is a binary search of an array than a sequential search of a linked chain?
Insertion, removal, retrieval, and traversal in sorted order – add and remove must:
Find the appropriate position in the dictionary.
Insert into (or remove from) this position.
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Selecting an Implementation
FIGURE 18-6 Insertion for unsorted linear implementations: (a) array based; (b) link based
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Selecting an Implementation
FIGURE 18-7 Insertion for sorted linear implementations: (a) array based; (b) pointer based
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Selecting an Implementation
FIGURE 18-8 The average-case order of the ADT dictionary operations for various implementations
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Hashing
Binary search tree retrieval have order
O(log2n)
Need a different strategy to locate an item
Consider a “magic box” as an address calculator
Place/retrieve item from that address in an array
Ideally to a unique number for each key
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Hashing
FIGURE 18-9 Address calculator
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Hashing
Pseudocode for getItem
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Hashing
Pseudocode for remove
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Hash Functions
Possible algorithms
Selecting digits
Folding
Modulo arithmetic
Converting a character string to an integer
Use ASCII values
Factor the results, Horner’s rule
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Resolving Collisions
FIGURE 18-10 A collision
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Resolving Collisions
Approach 1: Open addressing
Probe for another available location
Can be done linearly, quadratically
Removal requires specify state of an item
Occupied, emptied, removed
Clustering is a problem
Double hashing can reduce clustering
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Resolving Collisions
FIGURE 18-11 Linear probing with h ( x ) = x mod 101
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Resolving Collisions
FIGURE 18-11 Linear probing with h ( x ) = x mod 101
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Resolving Collisions
FIGURE 18-12 Quadratic probing with h ( x ) = x mod 101
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Resolving Collisions
FIGURE 18-13 Double hashing during the
insertion of 58, 14, and 91
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Resolving Collisions
Approach 2: Restructuring the hash table
Each hash location can accommodate more than one item
Each location is a “bucket” or an array itself
Alternatively, design the hash table as an array of linked chains – called “separate chaining”
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Resolving Collisions
FIGURE 18-14 Separate chaining
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
The Efficiency of Hashing
Efficiency of hashing involves the load factor alpha (α)
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
The Efficiency of Hashing
Linear probing – average value for α
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
The Efficiency of Hashing
Quadratic probing and double hashing – efficiency for given α
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
The Efficiency of Hashing
Separate chaining – efficiency for given α
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
The Efficiency of Hashing
FIGURE 18-15 The relative efficiency of four
collision-resolution methods
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
The Efficiency of Hashing
FIGURE 18-15 The relative efficiency of four
collision-resolution methods
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Maintaining Hashing
Performance
Collisions and their resolution typically cause the load factor α to increase
To maintain efficiency, restrict the size of α
α 0.5 for open addressing
α 1.0 for separate chaining
If load factor exceeds these limits
Increase size of hash table
Rehash with new hashing function
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
What Constitutes a
Good Hash Function?
Easy and fast to compute?
Scatter data evenly throughout hash table?
How well does it scatter random data?
How well does it scatter non-random data?
Note: traversal in sorted order is inefficient when using hashing
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Hashing and Separate Chaining
for ADT Dictionary
FIGURE 18-16 A dictionary entry when separate chaining is used
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
Hashing and Separate Chaining
for ADT Dictionary
View Listing 18-5, The class HashedEntry
Note the definitions of the add and remove functions, Listing 18-B
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013
End
Chapter 18
Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013