
CMPSC-132: Programming and Computation II

Department of Computer Science & Engineering
The Pennsylvania State University

1. Other Data Structures
A data structure is a particular way of organizing data in a computer so that it can be used effectively.

The idea is to reduce the space and time complexities of different tasks. Below is an overview of some

popular non-linear data structures.

1.1 Tree

Unlike arrays, linked lists, stacks, and queues, which are linear data structures, trees are widely used hierarchical data structures: a set of linked nodes with a root value and subtrees of children, where every node other than the root has a parent. Unlike trees in nature, the tree data structure is upside down: the root of the tree is on top. A tree consists of nodes, and its connections are called edges. The bottom nodes are also named leaf nodes or external nodes. A tree does not have a cycle. Additionally:

• Nodes with the same parent are called siblings

• The depth of a node is the number of edges from the root to the node.

• The height of a node is the number of edges from the node to the deepest leaf.

• The height of a tree is the height of the root.

• The degree of a node is the number of children connected to the node.

The degree of a tree is the maximum degree of any node in the tree. For example, in a tree where node B has degree 3 and every other node has a smaller degree, the degree of the tree is 3 (a ternary tree).

1.1.1 Binary trees

A binary tree is a data structure in which every node has at most two children (the tree has degree 2). The root of the tree is on top. Since each element in a binary tree can have only 2 children, we typically name them the left and right child. Since Python does not have built-in support for trees, you need to define a tree node class with a left and a right attribute.


class Node:
    def __init__(self, key):
        self.left = None      # left child (a Node, or None)
        self.right = None     # right child (a Node, or None)
        self.value = key      # the key stored in this node

root = Node(10)
root.left = Node(4)
root.right = Node(6)
root.left.left = Node(8)
print(root.value)        # 10
print(root.left.value)   # 4

1.1.1.1 Types of binary trees

The most common types of binary trees are:

• Full binary tree: a binary tree in which each node is either a leaf or possesses exactly two child nodes.

• Complete binary tree: a binary tree which is completely filled, with the possible exception of the bottom level, which is filled from left to right.

• Perfect binary tree: a binary tree in which all internal nodes have two children and all leaves are at the same level.

• Degenerate binary tree: a binary tree where every internal node has exactly one child. Such trees are, performance-wise, the same as a linked list.

[Figure: a perfect binary tree and a degenerate binary tree.]

1.1.1.2 Properties of binary trees

Binary trees have the following properties:

• A binary tree of n nodes has n - 1 edges.

• For every k ≥ 0, there are no more than 2^k nodes in level k.

• A binary tree with k levels has at most 2^(k-1) leaves.

• The number of nodes on the last level is at most the sum of the number of nodes on all other levels plus 1.

• A binary tree with k levels has no more than 2^k - 1 nodes.

1.1.2 Traversal of binary trees

A traversal is a process that visits all the nodes in the tree. Since a tree is a nonlinear data structure,

there is no unique traversal. A tree is typically traversed in two ways:

• Breadth-First Traversal (Or Level Order Traversal)

• Depth-First Traversals:

✓ Inorder Traversal (Left-Root-Right)

✓ Preorder Traversal (Root-Left-Right)

✓ Postorder Traversal (Left-Right-Root)

Important terms

Below are important terms with respect to a tree:

• path: refers to the sequence of nodes along the edges of a tree

• root: the node at the top of the tree. There is only one root per tree and one path from the root node

to any node

• parent: any node except the root node has one edge upward to a node called parent

• child: the node below a given node connected by its edge downward

• leaf: the node which does not have any child node

• subtree: represents the descendants of a node

• visiting: refers to checking the value of a node when control is on the node

• traversing: passing through nodes in a specific order

• levels: level of a node represents the generation of a node

• key: represents a value of a node based on which a search operation is to be carried out for a node


1.1.2.1 Breadth-First Search

Breadth-first search (BFS) is a method for exploring a tree or graph. In a BFS, you first explore all the nodes one step away, then all the nodes two steps away, and so on. Starting with the root, we would visit all the immediate children (all the nodes that are one step away from our starting node), then move on to all those nodes' children (all the nodes that are two steps away from our starting node), and so on until we reach the end. For the example tree used in this section (root 1, whose children are 2 and 3, where 2 has children 4 and 5), the nodes are visited in the order 1 2 3 4 5.

Advantages:

▪ BFS involves searching through a tree one level at a time. In an unweighted graph, BFS will find the shortest path (fewest edges) between the starting point and any other reachable node.

Disadvantages:

▪ BFS on a binary tree generally requires more memory than a DFS.

There is only one kind of breadth-first traversal, known as the level order traversal. This traversal starts at the root node (the top of the tree) and visits the nodes level by level, from top to bottom and from left to right. A minimal sketch is shown below.
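A minimal sketch of level order traversal, using collections.deque as the queue; it assumes the Node class defined earlier (value, left, right attributes):

from collections import deque

def level_order(root):
    # visit nodes level by level, from top to bottom and left to right
    if root is None:
        return []
    result = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        result.append(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return result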


1.1.2.2 Depth-First Search

Depth-first search (DFS) is another method for exploring a tree or graph. In a DFS, you go as deep as possible down one path before backing up and trying a different one. Depth-first search is like walking through a corn maze: you explore one path, hit a dead end, then go back and try a different one. Starting with the root, a DFS goes down the first path it finds until it hits a dead end, backs up, goes down the next path until it hits another dead end, and repeats this until every node has been visited.

Advantages:

▪ DFS on a binary tree generally requires less memory than breadth-first.

▪ DFS can be easily implemented with recursion.

Disadvantages:

▪ DFS does not necessarily find the shortest path to a node

There are three commonly used depth-first traversals to visit all the nodes in a tree. The difference between these patterns is the order in which each node is visited. These three traversals are:

• Inorder Traversal (Left-Root-Right): we recursively do a Left-Root-Right traversal of the left subtree, visit the root node, and finally do a recursive Left-Root-Right traversal of the right subtree.

• Preorder Traversal (Root-Left-Right): we visit the root node first, then recursively do a preorder traversal of the left subtree, followed by a recursive preorder traversal of the right subtree.

• Postorder Traversal (Left-Right-Root): we recursively do a Left-Right-Root traversal of the left subtree and of the right subtree, followed by a visit to the root node.

For example, for a tree whose root is 1, whose left child 2 has children 4 and 5, and whose right child is 3:

Inorder traversal: 4 2 5 1 3
Preorder traversal: 1 2 4 5 3
Postorder traversal: 4 5 2 3 1
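As a minimal sketch (again assuming the Node class from earlier), the three traversals map directly onto three short recursive functions:

def inorder(node):      # Left - Root - Right
    if node is not None:
        inorder(node.left)
        print(node.value, end=' ')
        inorder(node.right)

def preorder(node):     # Root - Left - Right
    if node is not None:
        print(node.value, end=' ')
        preorder(node.left)
        preorder(node.right)

def postorder(node):    # Left - Right - Root
    if node is not None:
        postorder(node.left)
        postorder(node.right)
        print(node.value, end=' ')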

1.1.3 Insertion in a Binary Tree in level order

Given a binary tree T and a node with a key (data), we can insert the key into T at the first position available in level order. The main idea is to perform a level order traversal of the tree (using a queue is the most efficient way): when we find a node whose left child is empty, we make the new node the left child of that node, and if we find a node whose right child is empty, we make the new node the right child. The point is to keep traversing the tree until we find a node whose left or right child is empty, as in the sketch after the figure below.

[Figure: inserting node Z in level order into a tree rooted at A. Z is attached at the first empty child position found during the level order traversal.]
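A minimal sketch of this level order insertion, using collections.deque as the queue and the Node class from earlier:

from collections import deque

def insert_level_order(root, key):
    # insert key at the first empty child position found in level order
    new_node = Node(key)
    if root is None:
        return new_node
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.left is None:
            node.left = new_node
            break
        queue.append(node.left)
        if node.right is None:
            node.right = new_node
            break
        queue.append(node.right)
    return root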


1.1.4 Binary Search Trees
Binary trees are an excellent data structure for storing the items of a map (key-value pairs), assuming we have an order relation defined on the keys. In this context, a binary search tree (BST) is a binary tree T where each node contains one key (also known as data), with each position p storing a key-value pair (k,v) such that:

▪ Keys stored in the left subtree of p are less than k.

▪ Keys stored in the right subtree of p are greater than k.

▪ Duplicate keys are not allowed.

▪ The left and right subtrees must also be binary search trees.

The basic idea behind this data structure is to provide a repository that supports efficient sorting, searching, and retrieval of data. In the example tree used below (root 10, whose left child 6 has children 5 and 8, and whose right child 18 has a right child 25), all nodes in the left subtree of 10 have keys < 10 while all nodes in the right subtree have keys > 10.

1.1.4.1 Searching

Searching in a BST always starts at the root. We compare the data stored at the root with the key we are searching for. If the node does not contain the key, we proceed either to the left or to the right child, depending on the comparison: if the key is less than the node's key we go to the left child, otherwise we go to the right child. The recursive structure of a BST commonly yields a recursive algorithm. For example, searching for 25 in the tree above: 25 > 10, so keep searching in the right subtree; 25 > 18, so keep searching in the right subtree again; 25 == 25, done.
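A recursive search sketch (node is a Node as defined earlier; it returns whether key is present):

def bst_search(node, key):
    # follow the left or right child depending on the comparison
    if node is None:
        return False          # reached an empty subtree: key is not in the tree
    if key == node.value:
        return True
    elif key < node.value:
        return bst_search(node.left, key)
    else:
        return bst_search(node.right, key)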

1.1.4.2 Insertion
The insertion procedure is quite similar to searching. We start at the root and recursively go down the tree, searching for a location in the BST to insert a new node. If the element to be inserted is already in the tree, we are done (we do not insert duplicates). The new node will always replace an empty child. For example, inserting 2 into the tree above: 2 < 10, go left; 2 < 6, go left; 2 < 5 and 5 has no left child, so 2 becomes the left child of 5. Inserting 7: 7 < 10, go left; 7 > 6, go right; 7 < 8 and 8 has no left child, so 7 becomes the left child of 8.
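A matching insertion sketch; it returns the (possibly new) root so that the empty-child case is handled uniformly:

def bst_insert(node, key):
    if node is None:
        return Node(key)      # the new node always replaces an empty child
    if key < node.value:
        node.left = bst_insert(node.left, key)
    elif key > node.value:
        node.right = bst_insert(node.right, key)
    # if key == node.value, the key is already in the tree: do nothing
    return node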

[Figures: searching for 25, and then inserting 2 and 7, in the BST with root 10, left child 6 (children 5 and 8) and right child 18 (right child 25).]

1.1.4.3 Deletion
The remove operation on binary search trees is more complicated than insertion and searching. Basically, it can be divided into two stages: search for the node to remove and, if the node is found, perform the removal. We start the deletion at the root but, unlike searching, we should track the parent of the current node. After we have found the node to be deleted, there are three possible cases:

1. Node to be deleted has no children: the deletion process in this case is simple, we just set the corresponding link of the parent (edge) to None.

[Figure: removing the leaf 25 simply sets the right-child link of 18 to None.]

2. Node to be deleted has one child: in this case, the node is cut out of the tree, and we link its child directly to the parent of the removed node.

[Figure: removing 8, whose only child is 7; 7 is linked directly to 8's parent, 6.]

3. Node to be deleted has two children: in this case the strategy is to replace the node being deleted with the largest node in its left subtree and then delete that largest node. By symmetry, the node being deleted can instead be swapped with the smallest node in its right subtree. This means that two different trees can result from such a removal operation.

[Figure: removing 6 two ways. Option 1: replace 6 with 5, the largest node in its left subtree. Option 2: replace 6 with 7, the smallest node in its right subtree.]

1.1.5 Binary Heaps

A binary heap is a complete binary tree which satisfies the heap ordering property. The ordering can be one of two types:

• the min-heap property: the value of each node is greater than or equal to the value of its parent, with the minimum-value element at the root.

• the max-heap property: the value of each node is less than or equal to the value of its parent, with the maximum-value element at the root.

[Figure: a min heap and a max heap.]

In a heap, the highest (or lowest) priority element is always stored at the root. A heap is not a sorted structure and can be regarded as partially ordered. As the figures above show, there is no particular relationship among nodes on any given level, even among the siblings. Since a heap is a complete binary tree, it has the smallest possible height: a heap with N nodes always has height O(log N). A heap is a useful data structure when you need to remove the object with the highest (or lowest) priority. A common use of a heap is to implement a priority queue.

1.1.5.1 Insertion

New elements are initially added to the end of the heap. The heap property is repaired by comparing the added element with its parent and moving the added element up a level (swapping positions with the parent). This process is called "percolation up". The comparison is repeated until the heap property is restored; in a min-heap, until the parent is less than or equal to the percolating element.

[Figure: inserting 5 into a min heap with root 6. 5 is appended at the bottom; 5 < 18, so they swap; then 5 < 6, so they swap again, leaving 5 at the root.]

1.1.5.2 Deletion

The minimum (or maximum) element can be found at the root. We remove the root, replace it with the last element of the heap, and then restore the heap property by percolating down.

[Figure: removing the root 5 from the min heap. The last element, 25, replaces the root; 6 is the smallest child and 25 > 6, so they swap, and percolation continues down the tree.]
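Python's heapq module maintains the min-heap property on a plain list, so both operations can be sketched without writing the percolation by hand (the initial values here are illustrative, not the handout's exact figure):

import heapq

heap = [18, 7, 6, 25, 10, 8]   # illustrative values
heapq.heapify(heap)            # rearrange the list into a valid min-heap, O(n)
heapq.heappush(heap, 5)        # append 5, then percolate it up
print(heap[0])                 # 5: the minimum is always at the root
print(heapq.heappop(heap))     # 5: remove the root, move the last element up,
                               #    then percolate it down
print(heap[0])                 # 6: the new minimum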

1.2 Graph

A graph is another abstract data structure, with nodes (or vertices) that are connected by edges. Graphs can be directed or undirected. In directed graphs, edges point from the node at one end to the node at the other end. In undirected graphs, the edges simply connect the nodes at each end. The edges could represent a distance or weight.

[Figure: an undirected graph and a directed graph, with a node and an edge labeled.]

Graphs are useful in cases where you have things that connect to other things. Nodes and edges could, for example, respectively represent cities and highways, routers and Ethernet cables, or Facebook users and their friendships. Two nodes connected by an edge are adjacent, or neighbors. If a graph is weighted, each edge has a weight. The weight could, for example, represent the distance between two locations, or the cost or time it takes to travel between the locations.

A graph is cyclic if it has at least one cycle (an unbroken series of nodes with no repeating nodes or edges

that connects back to itself). Graphs without cycles are acyclic.

1.2.1 Graph Representation

There are two well-known implementations of a graph, the adjacency matrix and the adjacency list.

1.2.1.1 Adjacency Matrix

One of the easiest ways to implement a graph is to use a two-dimensional matrix of size V × V where

V is the number of vertices in a graph. In this matrix implementation, each of the rows and columns

represent a vertex in the graph. The value that is stored in the cell at the intersection of row v and

column w indicates if there is an edge from vertex v to vertex w. A value in a cell represents the weight of

the edge from vertex v to vertex w (for unweighted graphs we use 1).

     1   2   3   4   5
1                1
2                1   1
3                    1
4    1   1           1
5        1   1   1

The advantage of the adjacency matrix is that it is simple, and for small graphs it is easy to see which nodes are connected to other nodes. However, notice that most of the cells in the matrix are empty. Because most of the cells are empty we say that this matrix is "sparse". A matrix is not a very efficient way to store sparse data. The adjacency matrix is a good implementation for a graph when the number of edges is large (on the order of |V|^2). A matrix is full when every vertex is connected to every other vertex. There are few real problems that approach this sort of connectivity.

1.2.1.2 Adjacency List

A more space-efficient way to implement a sparsely connected graph is to use an adjacency list. In an

adjacency list implementation we keep a master list of all the vertices in the Graph object and then each

vertex object in the graph maintains a list of the other vertices that it is connected to.

1 → 4
2 → 4, 5
3 → 5
4 → 1, 2, 5
5 → 3, 2, 4

With a matrix (vertices 0-indexed):

graph = [
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0],
]

With list:

graph = [
    [3],
    [3, 4],
    [4],
    [0, 1, 4],
    [2, 1, 3],
]

With dictionary:

graph = {
    0: [3],
    1: [3, 4],
    2: [4],
    3: [0, 1, 4],
    4: [2, 1, 3],
}


The advantage of the adjacency list implementation is that it allows us to compactly represent a sparse

graph. The adjacency list also allows us to easily find all the links that are directly connected to a particular

vertex.

Python does not have a graph data type. To use graphs you can either use the networkx module (pip install networkx) or implement one yourself:

import networkx as nx

G = nx.Graph()
G.add_node("A")
G.add_node("B")
G.add_node("C")
G.add_edge("A", "B")
G.add_edge("B", "C")
G.add_edge("C", "A")
print("Nodes: " + str(G.nodes()))   # Nodes: ['A', 'C', 'B']
print("Edges: " + str(G.edges()))   # Edges: [('A', 'C'), ('A', 'B'), ('C', 'B')]

For more details on networkx, read https://networkx.github.io/documentation/stable/_downloads/networkx_reference.pdf

class Graph(object):
    def __init__(self, graph_dict=None):
        # initializes the graph object in adjacency list mode using a
        # dictionary; if no dictionary or None is given, an empty
        # dictionary will be used
        if graph_dict is None:
            graph_dict = {}
        self.__graph_dict = graph_dict

    def add_vertex(self, vertex):
        if vertex not in self.__graph_dict:
            self.__graph_dict[vertex] = []

    def add_edge(self, edge):
        # edges can be provided using sets, lists or tuples
        edge = set(edge)
        (vertex1, vertex2) = tuple(edge)
        if vertex1 in self.__graph_dict:
            self.__graph_dict[vertex1].append(vertex2)
        else:
            self.__graph_dict[vertex1] = [vertex2]

    def vertices(self):
        return list(self.__graph_dict.keys())

    def edges(self):
        return self.__generate_edges()

    def __generate_edges(self):
        # the handout references this helper without showing its body;
        # one plausible reconstruction: collect each edge once as a set
        # of its two endpoints
        edges = []
        for vertex in self.__graph_dict:
            for neighbour in self.__graph_dict[vertex]:
                if {neighbour, vertex} not in edges:
                    edges.append({vertex, neighbour})
        return edges

1.2.2 BFS and DFS for a graph

Given a graph G and a starting vertex s, a breadth first search proceeds by exploring edges in the graph

to find all the vertices in G for which there is a path from s. The remarkable thing about a breadth first

search is that it finds all the vertices that are a distance k from s before it finds any vertices that are a

distance k+1.



BFS for a graph is similar to breadth-first traversal of a tree. The only catch here is that, unlike trees, graphs may contain cycles, so we may come to the same node again. To avoid processing a node more than once, we use a boolean visited array (or a set of visited nodes).

DFS explores the graph depth-wise: we move down the graph from one node to another until we run out of unexplored edges at a node; once we reach that condition, we backtrack and move to the previous node, and so on. In this traversal, we start with a node s and mark it as visited. Now let s be our current node u:

• Move to v where there is an edge (u,v).

• If v is already visited, we move back to u.

• If v is not visited, mark v as visited, make v the current node, and repeat the above steps.

• At some point all the edges at u will lead to already visited nodes; then we drop u and move to the node that was visited before u.

This backtracking will eventually bring us back to the start node s, and when there is no edge left to be explored at s, the graph traversal is done. DFS for a graph is similar to preorder traversal of a tree. The only catch here is, again, that graphs may contain cycles, so to avoid processing a node more than once we also use a boolean visited array. Both traversals are sketched below.
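Minimal sketches of both traversals over an adjacency-list dictionary (here a set plays the role of the boolean visited array):

from collections import deque

def bfs(graph, start):
    # graph: dict mapping each vertex to a list of its neighbors
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            if v not in visited:      # skip nodes we have already seen
                visited.add(v)
                queue.append(v)
    return order

def dfs(graph, u, visited=None):
    # recursive DFS; backtracking happens as each recursive call returns
    if visited is None:
        visited = set()
    visited.add(u)
    order = [u]
    for v in graph[u]:
        if v not in visited:
            order.extend(dfs(graph, v, visited))
    return order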

1.2.3 Topological Sorting

A topological sort takes a directed acyclic graph (DAG) and produces a linear ordering of all its vertices

such that if the graph G contains an edge (v,w) then the vertex v comes before the vertex w in the ordering.

DAGs are used in many applications to indicate the precedence of events. For example, software project

schedules, precedence charts for optimizing database queries, and multiplying matrices. Any linear

ordering in which all the arrows go to the right is a valid solution. Topological sorting for a graph is not

possible if the graph is not a DAG.

Topological sorting can be accomplished in two different ways:

• Identifying incoming edges:

Step 1 - Identify a vertex that has no incoming edges (its in-degree is zero). If no such vertex exists, the graph is not a DAG. If there are several vertices of in-degree zero, select one.

Step 2 - Delete this vertex of in-degree zero and all its outgoing edges from the graph. Place it in the output.

Step 3 - Repeat Steps 1 and 2 until the graph is empty.

• Using DFS:
The topological sort is a simple but useful adaptation of a depth first search:

Step 1- Perform DFS for some graph G. (we call DFS to compute the finish times for each of the

vertices)

Step 2- Store the vertices in a list in decreasing order of finish time

[Figure: DFS starting from node A, with each vertex labeled by its discovery and finish times: A = 1, 10; B = 2, 9; C = 3, 8; D = 4, 7; E = 5, 6.]

Step 3 - Return the ordered list as the result of the topological sort:

list = ['A', 'B', 'C', 'D', 'E']
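A sketch of the DFS-based approach; the example DAG at the bottom is hypothetical, chosen so that A, B, C, D, E is the resulting order:

def topological_sort(graph):
    # graph: dict mapping each vertex to a list of its out-neighbors (must be a DAG)
    visited = set()
    order = []                       # vertices in increasing finish time

    def dfs(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs(v)
        order.append(u)              # u finishes after all of its descendants

    for u in graph:
        if u not in visited:
            dfs(u)
    return order[::-1]               # decreasing finish time

g = {'A': ['B', 'D'], 'B': ['C'], 'C': ['D'], 'D': ['E'], 'E': []}
print(topological_sort(g))           # ['A', 'B', 'C', 'D', 'E']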

1.3 Hash Tables

A hash table is a collection of items which are stored in such a way as to make it easy to find them later.

Each position of the hash table, often called a slot, can hold an item and is named by an integer value

starting at 0. For example, we will have a slot named 0, a slot named 1, a slot named 2, and so on. Initially,

the hash table contains no items so every slot is empty. We can implement a hash table by using a list with

each element initialized to the special Python value None.

Slot:  0     1     2     3     4     5     6     7     8     9     10
Item:  None  None  None  None  None  None  None  None  None  None  None

(a hash table of size m = 11)

The mapping between an item and the slot where that item belongs in the hash table is called the hash

function. In simple terms, a hash function maps a big number or string to a small integer that can be used as an index in the hash table. A good hash function must be efficiently computable and should uniformly distribute the keys (each table position equally likely for each key). Assume that we have the set of integer items 32, 53, 79, 15, 93, and 67. The modular method is the most common hash function: it simply takes an item, divides it by the table size, and returns the remainder as its hash value (hash(item) = item % 11).

Item Hash Value (item%11)

32 10

53 9

79 2

15 4

93 5

67 1

Once the hash values have been computed, we can insert each item into the hash table at the designated position. Note that 6 of the 11 slots are now occupied. The fraction of occupied slots is referred to as the load factor, commonly denoted by λ = (number of items) / (table size). For this example, λ = 6/11.

Slot:  0     1   2   3     4   5   6     7     8     9   10
Item:  None  67  79  None  15  93  None  None  None  53  32

Now when we want to search for an item, we simply use the hash function to compute the slot name for the item and then check the hash table to see if it is present. This searching operation takes constant time, since a constant amount of time is required to compute the hash value and then index the hash table at that location. However, this technique works only if each item maps to a unique location in the hash table. For example, if we want to add the item 42 to our collection, it would have a hash value of 9 (42 % 11 == 9). Since 53 also has a hash value of 9, we would have a problem: according to the hash function, two or more items would need to be in the same slot. This is referred to as a collision, and it creates problems for the hashing technique.


1.3.1 Hash Functions

Given a collection of items, a hash function that maps each item into a unique slot is referred to as

a perfect hash function. If we know the items and the collection will never change, then it is possible to

construct a perfect hash function. Unfortunately, given an arbitrary collection of items, there is no

systematic way to construct a perfect hash function. Thus, the goal is to create a hash function that

minimizes the number of collisions, is easy to compute, and evenly distributes the items in the hash table.

There are several ways to extend the simple modular method.

1.3.1.1 Folding method

This method for constructing hash functions begins by dividing the item into equal-size pieces (the last

piece may not be of equal size). These pieces are then added together to give the resulting hash value. For

example, if our item was the phone number 436-555-4601, we would take the digits and divide them into

groups of 2 (43,65,55,46,01). After the addition, 43+65+55+46+01, we get 210. If we assume our hash

table has 11 slots, then we need to perform the extra step of dividing by 11 and keeping the remainder. In

this case 210%11 is 1, so the phone number 436-555-4601 hashes to slot 1. Some folding methods go one

step further and reverse every other piece before the addition. For the above example, we

get 43+56+55+64+01=219 which gives 219%11=10.

1.3.1.2 Mid-square method

This method for constructing hash functions begins by squaring the item, and then extract some portion

of the resulting digits. For example, for the item 44, we would first compute 442=1,936. By extracting the

middle two digits, 93, and performing the remainder step, we get 93%11=5).

Item Hash Value (item%11) Mid-square

32 10 2

53 9 3

79 2 2

15 4 3

93 5 9

67 1 4

We can also create hash functions for character-based items such as strings. The word "home" can be thought of as a sequence of ordinal values. We can then take these four ordinal values, add them up, and use the modular method to get a hash value:

>>> ord('h')
104
>>> ord('o')
111
>>> ord('m')
109
>>> ord('e')
101

(ord(c): given a string representing one Unicode character, returns an integer representing the Unicode code point of that character.)

For "home": 104 + 111 + 109 + 101 = 425, and 425 % 11 = 7, so "home" hashes to slot 7.
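As a one-line sketch of this scheme (table_size is a parameter here, 11 in the running example):

def string_hash(s, table_size):
    # sum the Unicode code points of the characters, then take the remainder
    return sum(ord(c) for c in s) % table_size

print(string_hash('home', 11))   # (104 + 111 + 109 + 101) % 11 = 425 % 11 = 7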


You may be able to think of a number of additional ways to compute hash values for items in a collection.

The important thing to remember is that the hash function has to be efficient so that it does not become the

dominant part of the storage and search process.

1.3.2 Collision Resolution

When two items hash to the same slot, we must have a systematic method for placing the second item

in the hash table. This process is called collision resolution. As we stated earlier, if the hash function is

perfect, collisions will never occur. However, since this is often not possible, collision resolution becomes

a very important part of hashing.

1.3.2.1 Open Addressing

One method for resolving collisions looks into the hash table and tries to find another open slot to hold

the item that caused the collision. A simple way to do this is to start at the original hash value position and

then move in a sequential manner through the slots until we encounter the first slot that is empty (we may

need to go back to the first slot (circularly) to cover the entire hash table). This collision resolution process

is referred to as open addressing, where we try to find the next open slot or address in the hash table. By

systematically visiting each slot one at a time, we are performing an open addressing technique

called linear probing.

When we attempt to place 60 into slot 5, a collision occurs. Under linear probing, we look sequentially,

slot by slot, until we find an open position. In this case, we find slot 6. Again, 71 should go in slot 5 but

must be placed in slot 7 since it is the next open position. The final value of 20 hashes to slot 9. Since slot

9 is full, we begin to do linear probing. We visit slot 10 and finally find an empty slot at position 0.

Slot:  0   1   2   3     4   5   6   7   8     9   10
Item:  20  67  79  None  15  93  60  71  None  53  32

Once we have built a hash table using open addressing and linear probing, it is essential that we utilize the

same methods to search for items. Assume we want to look up the item 93. When we compute the hash

value, we get 5. Looking in slot 5 reveals 93, and we can return True. However, when we look for item 20,

the hash value is 9, and slot 9 is currently holding 53. We cannot simply return False since we know that

there could have been collisions. We are now forced to do a sequential search, starting at position 10,

looking until either we find the item 20 or we find an empty slot.

A disadvantage to linear probing is the tendency for clustering; items become clustered in the table.

This means that if many collisions occur at the same hash value, a number of surrounding slots will be

filled by the linear probing resolution. This will have an impact on other items that are being inserted, for

example, when we try to add the item 26, a cluster of values hashing to 5 had to be skipped to finally find

an open position.

Slot:  0   1   2   3     4   5   6   7   8     9   10
Item:  20  67  79  None  15  93  60  71  None  53  32

(slots 5, 6, and 7 form a cluster of values that hashed to 5)


One way to deal with clustering is to extend the linear probing technique so that instead of looking

sequentially for the next open slot, we skip slots, thereby more evenly distributing the items that have

caused collisions. This will potentially reduce the clustering that occurs. Below there is a hash table where

the collision resolution is done with a +3 probe. This means that once a collision occurs, we will look at

every third slot until we find one that is empty.

Slot:  0   1   2   3     4   5   6     7   8   9   10
Item:  71  67  79  None  15  93  None  20  60  53  32

The general name for this process of looking for another slot after a collision is rehashing. In

general, hash_index=(hash_index + skip) % hash_table_size. It is important to note that the size of the

“skip” must be such that all the slots in the table will eventually be visited. Otherwise, part of the table will

be unused. To ensure this, it is often suggested that the table size be a prime number.

A variation of the linear probing idea is called quadratic probing. This method is similar to linear probing; the only difference is that if you try to insert into a slot that is already filled, you first check 1^2 = 1 slot away, then 2^2 = 4 slots away, then 3^2 = 9 slots away, then 4^2 = 16 slots away, and so on.

Slot:  0     1   2   3   4   5   6   7   8     9   10
Item:  None  67  79  71  15  93  60  20  None  53  32

With linear probing we know that we will always find an open spot if one exists (it might be a long search, but we will find it). However, this is not the case with quadratic probing unless you take care in choosing the table size. To guarantee that the quadratic probes will eventually hit every available spot, the hash table size must be a prime number and the table must never be more than half full (even by one element).
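A small hypothetical helper makes the probe sequence easy to inspect (the name probe_sequence and the default of five probes are choices made here, not part of the handout):

def probe_sequence(key, table_size, probes=5):
    # the first few slots quadratic probing examines for this key
    h = key % table_size
    return [(h + i * i) % table_size for i in range(probes)]

print(probe_sequence(71, 11))   # [5, 6, 9, 3, 10]: 71 lands in slot 3 above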

1.3.2.2 Chaining

An alternative method for handling the collision problem is to allow each slot to hold a reference to a

collection (or chain) of items. Chaining allows many items to exist at the same location in the hash table.

When collisions happen, the item is still placed in the proper slot of the hash table. As more and more items

hash to the same location, the difficulty of searching for the item in the collection increases.

Slot 0: empty
Slot 1: 67
Slot 2: 79
Slot 3: empty
Slot 4: 15
Slot 5: 93 → 60 → 71
Slot 6: empty
Slot 7: empty
Slot 8: empty
Slot 9: 53 → 20
Slot 10: 32


When we want to search for an item, we use the hash function to generate the slot where it should reside.

Since each slot holds a collection, we use a searching technique to decide whether the item is present. The

advantage is that on the average there are likely to be many fewer items in each slot, so the search is perhaps

more efficient.

It is important to mention that if the load factor (λ) is small, then there is a lower chance of collisions,

meaning that items are more likely to be in the slots where they belong. If λ is large, meaning that the table

is filling up, then there are more and more collisions. This means that collision resolution is more difficult,

requiring more comparisons to find an empty slot.

1.3.3 Hashing Implementation

Python has useful data types that could help us to develop our own hashing algorithm. For example, a

dictionary that can store key-data pairs could be used to look up a data value based on a given key.

However, we are living in a digitally disrupted world, where sensitive information flows among users and

organizations. As a result, information security has become very important in most organizations. The main

reason for this is that access to information and the associated resources has become easier because of the

developments in distributed processing, for example the Internet and electronic commerce. The result is

that organizations need to ensure that their information is properly protected and that they maintain a high

level of information security.

Hash functions are used inside some cryptographic algorithms, in digital signatures, message

authentication codes, manipulation detection, fingerprints, checksums (message integrity check), hash

tables, password storage and much more. As a Python programmer you may need these functions to check

for duplicate data or files, to check data integrity when you transmit information over a network, to securely

store passwords in databases, or maybe for some work related to cryptography. The Federal Information Processing Standards, as well as the Internet Engineering Task Force, have defined and developed different hash algorithms that are widely used all over the Internet. Python's hashlib module implements a common interface to many different secure hash and message digest algorithms. Included are the FIPS secure hash algorithms SHA1, SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180-2) as well as RSA's MD5 algorithm (defined in Internet RFC 1321).

# Class structure to develop our own hashing techniques
class HashTable:
    def __init__(self):
        self.size = 11
        self.slots = [None] * self.size   # the keys
        self.data = [None] * self.size    # the values associated with the keys

    def hashfunction(self, key, size):
        return key % size                 # the simple modular method

    def rehash(self, oldhash, size):
        return (oldhash + 1) % size       # linear probing: try the next slot
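A hedged sketch of how put and get could be completed on top of this structure, using the linear-probing rehash above (the subclass name is a choice made here; the sketch assumes integer keys and does not handle a completely full table on put):

class ProbingHashTable(HashTable):
    def put(self, key, data):
        slot = self.hashfunction(key, self.size)
        # probe until we find an empty slot or the same key (replace its data)
        while self.slots[slot] is not None and self.slots[slot] != key:
            slot = self.rehash(slot, self.size)
        self.slots[slot] = key
        self.data[slot] = data

    def get(self, key):
        start = self.hashfunction(key, self.size)
        slot = start
        while self.slots[slot] is not None:
            if self.slots[slot] == key:
                return self.data[slot]
            slot = self.rehash(slot, self.size)
            if slot == start:             # wrapped around: key is not present
                return None
        return None

table = ProbingHashTable()
for item in [32, 53, 79, 15, 93, 67, 60, 71, 20]:
    table.put(item, str(item))
print(table.slots)   # [20, 67, 79, None, 15, 93, 60, 71, None, 53, 32]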


# hashing using Python's built-in hash() function; Python uses a different
# hash() implementation depending on the type of the data
list(map(hash, [0, 1, 2, 3]))
# Output: [0, 1, 2, 3]

list(map(hash, ['0', '1', '2', '3']))
# Output (string hashes are randomized per interpreter run, so yours will differ):
# [3512700405625046622, -2154559771013955726, 4558503884423229281, -5090714507869946363]

hash('0')
# Output: 3512700405625046622

# hashing using Python's hashlib module
import hashlib
print(hashlib.algorithms_available)
# Output (platform dependent): {'sha', 'sha224', 'sha3_256', 'DSA', 'shake_128', 'sha512',
# 'SHA224', 'RIPEMD160', 'dsaEncryption', 'ripemd160', 'sha3_512', 'MD5', 'SHA384', 'SHA256',
# 'md5', 'blake2b', 'sha3_384', 'sha384', 'sha256', 'sha3_224', 'ecdsa-with-SHA1', 'whirlpool',
# 'SHA', 'shake_256', 'sha1', 'md4', 'SHA1', 'DSA-SHA', 'SHA512', 'blake2s', 'MD4', 'dsaWithSHA'}

hash_object = hashlib.md5(b'Hello World')
print(hash_object.hexdigest())
# Output: b10a8db164e0754105b7a99be72e3fe5

value = input('Enter string to hash: ')
hash_object = hashlib.sha256(value.encode())
print(hash_object.hexdigest())
# Output:
# Enter string to hash: Hello World
# a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e


Appendix 1. Depth First Search Example

[Figure: step-by-step depth-first search walkthrough.]

Appendix 2. Breadth First Search Example

[Figure: step-by-step breadth-first search walkthrough.]

References

Data Structures and Algorithms in Python. Michael H. Goldwasser, Michael T. Goodrich, and Roberto Tamassia. Chapters 6, 7, 8, 10, 14.

https://www.tutorialspoint.com/data_structures_algorithms/stack_algorithm.htm
http://en.wikipedia.org/wiki/Priority_queue
https://www.tutorialspoint.com/data_structures_algorithms/tree_data_structure.htm
https://docs.python.org/3/library/hashlib.html
