CS 112 – Data Structures
Sesh Venugopal
Sequential Search Average Case
return false;
}
Sesh Venugopal
Sequential Search on a list: Array or Linked List
// on an array of integers
for (int i=0; i < arr.length; i++) {
if (target == arr[i]) { return true;
// on a linked list, Node has int data for (Node ptr=front; ptr != null;
ptr=ptr.next) { if (target == ptr.data) {
return true; return false;
Big O running time is the same for either version:
Success:
Worst case: O(n), Best case: O(1)
Failure: O(n)
CS 112: Sequential Search Average Case 2
} }}
Sequential Search on a list: Average Count for Success
Basic Operation: Comparing target against list (array or linked list) item
What is the average number of comparisons for success?
A quick calculation would be to average out the best case (1 comparison) and the worst case (n comparisons), so (n+1)/2
Sesh Venugopal CS 112: Sequential Search Average Case 3
Sequential Search on a list: Average Count for Success
A more general approach to compute average, like you would do to find average of a bunch of n numbers, is to add up the numbers and and divide by n (e.g. your average score on 3 exams)
In the case of sequential search, the numbers to be averaged are the comparisons for matching at various positions.
If the elements of the list have positions numbered 1 thru n, and it takes Ci comparisons to match at the i-th position
then the average number of comparisons over all items is:
Sesh Venugopal CS 112: Sequential Search Average Case 4
Sequential Search on a list: Average Count for Success
It takes 1 comparison to match against the 1st item (C1), 2 comparisons to match against the second item (C2), etc.
So the average number of comparisons for success is:
(1+2+3+...+n)/n = n*(n+1)/2/n = (n+1)/2
Sesh Venugopal CS 112: Sequential Search Average Case 5
Average Count for Success:
The Assumption of Equal Likelihood
This formula we are using for the average assumes that all matches are equally likely.
For instance, if you have a list of 10 items, and 100 matches are made on the list, then under this assumption, each item would be matched 10 times (i.e. same number of matches for all items)
So we can rethink the formula as a weighted average:
C1*1/n+C2*1/n+...+Cn*1/n=1/n*(C1 +C2 +...+Cn)
Each item’s comparison count for match is weighted with its likelihood or probability of being matched – which is 1/n since all items are equally likely to be matched
Sesh Venugopal CS 112: Sequential Search Average Case 6
Average Count for Success: General Formula, Random Likelihoods
In practice, it is absurd to believe that all items will be equally likely to be matched on. More generally, there will be biases toward a few items—they will be a lot more likely to be matched on than others. (Think of Google search trends, with its day-to-day fluctuations.)
So, in practice, we will need to assign a different probability of match for each item, say Pi for the i-th item . Which generalizes the weighted comparisons formula for average to:
C1*P1 + C2*P2 + ... + Cn*Pn
Each item’s comparison count for match is weighted with its likelihood or probability of being matched – but the probabilities are all different, in general (of course the sum of all probabilities equals 1)
Sesh Venugopal CS 112: Sequential Search Average Case 7
Average Count for Success: General Formula, Random Likelihoods
Here’s an example:
Comparisons
Probabilities for match
Using the weighted average formula gives us the following for average number of comparisons:
Note that the value 3 (1/7) is twice as likely to matched as 12 (1/14), and 25 (4/7) is 4 times as likely to be matched as 3.
Sesh Venugopal CS 112: Sequential Search Average Case 8
Rearranging items based on search pattern
Comparisons 1234567
3
12
86
17
15
25
6
Probabilities for match
1111141 -- -- -- -- -- -- -- 7 142114217 21
Since the value 25 is 4 times as likely to be matched on as value 3, shouldn’t 25 be all the way up front so that the search time (number of comparisons) is reduced on average?
And similarly for all other items: the more likely they are to be matched on, the more toward the front they should be, to reduce the number of comparisons to get to match on them
Rearranging the items in decreasing order of their match probabilities results in the minimum average number of comparisons.
Sesh Venugopal CS 112: Sequential Search Average Case 9
Rearranging items based on search pattern
Rearrange in descending probabilities
Comparisons 1234567 1234567
25
3
12
17
15
86
6
3
12
86
17
15
25
6
Probabilities for match
1111141 -- -- -- -- -- -- -- 7 142114217 21
4111111 -- -- -- -- -- -- -- 7 7 1414212121
Average number of comparisons for success AFTER rearrangement:
1*4/7 + 2*1/7 + 3*1/14 + 4*1/14 + 5*1/21 + 6*1/21 + 7*1/21 = 2.2
The average after rearrangement is less than half the original average of 4.7
Sesh Venugopal CS 112: Sequential Search Average Case 10
Adaptive Shuffling
Rearranging ALL items in one shot, knowing the spread of probabilities of matches, is only possible after the spread has attained long-term stability (will change very little over time). But this kind of long-term stability is not very likely to occur in practice – there is usually lots of fluctuation in short periods of time.
There is an alternative: shuffle ”adaptively” by moving an item, as soon as it is matched, toward the front (the expectation is that it is quite likely to be matched again very soon) – moving toward front will reduce comparisons for match
There are two possible kinds of adaptive moves.
One option is to move the just-matched item one spot toward the front.
Swap 15 with 17
3
12
86
17
15
25
6
3
12
86
15
17
25
6
Sesh Venugopal CS 112: Sequential Search Average Case 11
Adaptive Shuffling
Another, more aggressive option, is to move the just-matched item all the way to the first spot
Swap 15 with 3
But it would now push the previous 1st item (e.g. 3) backward, thereby undoing (perhaps drastically) the previous move that brought that item (3) to the 1st spot.
3
12
86
17
15
25
6
15
12
86
17
3
25
6
Sesh Venugopal CS 112: Sequential Search Average Case 12
Adaptive Shuffling
A better way to bring the just-matched item to the 1st spot is shuffle all preceding items one step back (like in insertion sort!)
But this would be much more expensive, O(n) time in the worst case
However, it would only take O(1) time in a linked list
15
3
12
86
17
25
6
3
12
86
17
15
25
6
Sesh Venugopal CS 112: Sequential Search Average Case 13
General Formula, Random Likelihoods: Widely Applicable
The general formula for weighted average given probabilities for match can be applied to ANY search algorithm on ANY data structure
C1*P1 + C2*P2 + ... + Cn*Pn
The actual number of comparisons (Ci’s) for match would depend on
the algorithm and the data structure
For example, this formula can be applied for binary search on a sorted array – you will need to compute the number of comparisons to find a match at each array position (the middle would be 1 comparison), then multiply by the associated probabilities for match.
Sesh Venugopal CS 112: Sequential Search Average Case 14