STL Set and Map
Data Structures and Abstractions
Sets
Lecture 19
*
Sets
By definition, Sets are unordered collections of data.
A = {2, 1}
B = {1, 2, 2, 1, 8/4}
C = {x: x2 – 3x + 2 = 0}
Note that A = B = C i.e. the sets above are equal to each other [1]
They are used in maths as well as in many other fields.
But in the computing domain, there are some variations in the way sets are dealt with.
Some sets can contain the actual data values, whilst others only keep a record of the presence or absence of data values. STL Bitsets
Elements of a set may not be repeated – a common variation [2]. If repeated elements are needed then a multiset or bag is used. STL multiset
A set is explicitly defined to be unordered but some implementations require ordering for efficiency reasons. The STL set is an associative container and in STL associative containers are ordered.
The last two variations break the Set abstraction but STL designers decided that is fine so sets and multisets are provided.
*
[1] Sets remain the same if the elements are repeated or rearranged.
For basic ideas about sets, see https://www.mathsisfun.com/sets/sets-introduction.html
[2] Also see unordered_set (C++11). Access is fast as key hashing is used. No repeats.
Think about the various operations that need to be efficient. Also how often certain operations are used in normal operation.
For example, you may have a data structure that is loaded with data once, and after that it is normally access or search that gets used most often.
Sets
There are some unique operations for sets:
subset
union
intersection
difference
element
Subset
Set B is a subset of Set A if all elements in B are also elements of A.
Example 1:
if A = {a, b, c, d, g} and B = {c, g}
then B is a subset of A.
Example 2:
if A = {a, b, c, d, g} and B = {c, g, u, w}
then B is not a subset of A.
A
B
A
B
*
Subset Animation [1]
1
4
6
9
11
78
12
24
1
4
6
9
13
subset =
true
false
END
*
1
It is an animation, so run the slide show.
In implementations where sets are ordered – STL
If the sets are not ordered (sorted), would this algorithm work?
What would you have to do if the sets are not sorted and which algorithm would be more efficient.
Subset Pseudo-code
IsSubsetOf (other)[1]
Boolean subset = true
WHILE more elements in this set AND
more elements in the other set AND
subset = true
IF this element = other element
Get next element from each set
ELSE IF this element < other element
subset = false
ELSE
Get next element from other set
ENDIF
ENDWHILE
IF more elements in this set
subset = false
ENDIF
END IsSubsetOf
*
[1] What are the pre-conditions of this algorithm?
Check the way the algorithms works. Examine the animation earlier to see what would happen to the algorithm
if this pre-condition is violated.
Union (or, ||)
The union of Set A and Set B is the collection containing all elements that are in either of them, removing double ups. [1]
For example:
if A = {a, b, c, d, g} and B = {c, g, u, w}
then C = A or B = {a, b, c, d, g, u, w}
C is shown in yellow:
A
B
*
1.
Using sets which do not allow duplicate elements
Union Animation
1
4
6
9
11
78
12
24
1
4
6
9
13
1
4
6
9
11
24
12
13
78
newSet
END
*
Union Pseudo-code
Union (other, newSet) [1]
WHILE more elements in this set AND
more elements in the other set
IF this element = other element
Add this element into newSet
Get next element from each set
ELSE IF this element < other element
Add this element to newSet
Get next element from this set
ELSE
Add other element to newSet
Get next element from other set
ENDIF
ENDWHILE
WHILE more elements in this set
Add this element to newSet
Get next element from this set
ENDWHILE
WHILE more elements in other set
Add other element to newSet
Get next element from other set
ENDWHILE
END Union [2]
*
[1] What is the pre-condition of this algorithm?
[2] What is the post-condition of this algorithm in terms of the property of the new set?
Intersection (and, &&)
The intersection of Set A and Set B is the collection containing all elements that appear in both of them.
For example:
if A = {a, b, c, d, g} and B = {c, g, u, w}
then C = A and B = {c, g}
C is shown in yellow:
A
B
*
Intersection Animation
1
4
6
9
11
78
12
24
1
4
6
9
13
1
4
6
9
newSet
END
*
Intersection Pseudo-code
Intersection (other, newSet) [1]
WHILE more elements in this set AND
more elements in the other set
IF this element = other element
Add this element into newSet
Get next element from each set
ELSE IF this element < other element
Get next element from this set
ELSE
Get next element from other set
ENDIF
ENDWHILE
END Intersecton
*
[1] As before can you see the pre and post conditions?
Difference (-)
The difference of Set B from Set A is the collection containing all elements that are in A but not in B.
For example:
if A = {a, b, c, d, g} and B = {c, g, u, w}
then C = A - B = {a, b, d}
C is shown in yellow:
B
A
*
Difference Animation
1
4
6
9
11
78
12
24
1
4
6
9
13
11
24
12
78
newSet
END
*
Difference Pseudo-code
Difference(other, newSet) [1]
WHILE more elements in this set AND
more elements in the other set
IF this element = other element
Get next element from each set
ELSE IF this element < other element [2]
Add this element to newSet
Get next element from this set
ELSE
Get next element from other set
ENDIF
ENDWHILE
WHILE more elements in this set
Add this element to newSet
Get next element from this container
ENDWHILE
END Difference
*
[1]
Pre and post conditions?
[2]
What would you do if
This element > other element?
Elements of the other set are to be removed from this set.
The STL Set
There is an STL set in C++.
It requires the
As for the others it is declared using:
typedef set
IntSet aset;
The best place to go for information is (again as before):
http://www.cppreference.com/cppset/index.html
*
STL Set Methods [1]
aset.clear() Empties the set
aset.empty() Returns true if the set is empty
aset.begin() Returns an iterator to the first element in the set
aset.end() Returns an iterator to past the end of the set
aset.erase() Erase elements from the set
aset.find() Find elements in the set
aset.insert() Insert elements into the set
aset.size() Returns the size of the set
aset.swap() Swaps the contents of two sets
*
See textbook chapter on Standard Template Library, subsection on “Member Functions Common to all Containers”
Set Algorithms
For reasons of general utility, routines that could have been placed in the STL set class were placed in the algorithm class:
set_difference(set1.begin(), set1.end(),
set2.begin(), set2.end());
set_intersection(set1.begin(), set1.end(),
set2.begin(), set2.end());
set_union(set1.begin(), set1.end(),
set2.begin(), set2.end());
These should have been operations on sets, because it would have been intuitive.
Or (preferred) as helper functions available when #include
As these routines have general utility, they can applied to other linear data structures like vectors. This approach can be argued to be good, as re-use of code is happening.
And there is no subset at all but a subset set helper function can be written using algorithm’s includes or set::find() function.
*
Helper functions are not members of the class.
They provide added functionality to the class.
Syntax:
#include
bool includes( iterator start1, iterator end1, iterator start2, iterator end2 );
The includes() algorithm returns true if every element in [start2,end2) is also in [start1,end1). Both of the given ranges must be sorted in ascending order and must not contain duplicate elements.
includes() runs in linear time.
Some might say that STL Algorithms is like snake oil
“good for Man or Beast”
Treats “rheumatism, neuralgia, sciatica, lame back, lumbago, contracted muscles, toothache, sprains, swellings, “ .. cures frost bites, chill blains, bruises, sore throat, [and] bites of animals, insects and reptiles “
http://www.csicop.org/sb/9812/snakeoil.html
// Do the set difference using the insert iterator
set_difference(set1.begin(), set1.end(),
set2.begin(), set2.end(), resultItr);
An abstract representation could have operator-(..) defined, so that:
resultSet = set1 – set2
For this reason—unless the task is trivial—the STL set needs to be encapsulated or a helper operator/function is provided.
Prefer the helper operator/function as this means the least amount to code for a given functionality.
The helper operator or function uses only the set’s public interface.
Readings
Textbook: Standard Template Library, section on Associative containers relating to set and multiset.
Library EReserve: Preiss, Data structures and algorithms with object-oriented design, Chapter 12
http://www.cplusplus.com/reference/stl/
http://en.cppreference.com/w/cpp/container/set
http://en.cppreference.com/w/Main_Page