DB Fundamentals
DB Fundamentals
Indexes
Indexes
Indexes are vital to efficiently accessing data
An Index is a set of attributes whose values are used to control the order of tuple storage
An index key may be or may not be the primary key
They are generally chosen based on data columns from a table
The data from these columns may be stored in a separate structure known as an index page
Indexes are also stored in the database
Indexes
Indexes are vital to efficiently accessing data
It is like the index of a book in that it is additional to the contents of the base tables
Example Base Table:
ID FirstName MiddleName LastName
1 Ken J Sanchez
2 Terri Lee Duffy
3 Roberto Tamburello
4 Rob Walters
5 Gail A Erickson
6 Jossef H Goldberg
If there is no index, the search for the tuple with LastName = Goldberg needs 6 reads
Every row must be scanned until the last row is reached and checked
6 Rows to scan before Golberg is reached
SELECT * WHERE
LastName = ‘Goldberg’
Indexes
Indexes are vital to efficiently accessing data
It is like the index of a book in that it is additional to the contents of the base tables
Example Index (LastName, FirstName)
ID FirstName MiddleName LastName
1 Ken J Sanchez
2 Terri Lee Duffy
3 Roberto Tamburello
4 Rob Walters
5 Gail A Erickson
6 Jossef H Goldberg
Now with the (LastName, FirstName) index the search needs only 3 reads
ID LastName FirstName
2 Duffy Terri
5 Erickson Gail
6 Goldberg Jossef
1 Sanchez Ken
3 Tamburello Roberto
4 Walters Rob
INDEX LastName ASC
Base Table
VS
INDEX
LastName ASC
Then by FirstName ASC
In this example, if the index contains the Last and First Names and these are the only attributes being selected, then the index only needs to be scanned from start to Goldberg = 3 records.
If the query included other attributes not in the index, once the index position was reached, further reads would occur by moving from the index position to the base table to get the remaining attribute values (ie, 4+ reads)
4
Indexes
Indexes are vital to efficiently accessing data
The size of each index is smaller than the base table
This makes it faster to search
Multiple indexes can be created for a single base table
Indexes are generally created for each type of complex query to speed it up
Each index speeds up the search involving the value of the index key only. It is not helpful for searching the values of other attributes
In the previous example, the index does not help search on Middle Name
Indexes are generally stored as binary trees
These are efficient for searching
5
Indexes – BTree
Indexes are vital to efficiently accessing data – they get balanced
In a balanced tree, the length of the path from the Root Node to every leaf node is the same
This means access times are constant and optimal
Not that the IDs point to the base table IDs and are used to retrieve the appropriate records
6
Indexes – BTree
Indexes are quite complex
These are covered next year
An index can be sparse/dense
Sparse
Only some records may appear in the index
The Index points to the actual leaf nodes that contain all the records
Dense
More attribute values in the index (more records
to search)
The Index
7
Indexes in MS-SQL
There are two main types of indexes
Clustered Indexes
The Leaf nodes of the B-Tree contains the actual data rows for the table
There can only be one clustered index per table
In MS-SQL this is generally the Primary Key
Non-Clustered Indexes
The Leaf nodes of the B-Tree contain only the index data with a pointer to the associated data page where the remaining data resides
In MS-SQL these can be alternate keys or just an alternate index to the PK index
There is also a 3rd UNIUQE index type (for implementing alternate keys)
Both of these can be over unique or non-unique data.
However, If the clustered index is not a unique index, SQL Server makes any duplicate keys unique by adding an internally generated value called a uniqueifier.
Adding a uniqueifier adds overhead in calculating and in storing it.
If this overhead will be noticeable depends on several factors.
How much data the table contains.
What is the rate of inserts.
How often is the CI used in a select (when no covering indexes exist, pretty much always).
Non-unique clustered indexes are quite common when range scans on particular (non-unique) column is the prevalent access pattern
8
Indexes
CREATING Indexes SQL
INDEX – SQL Creation
— Creating a PK automatically creates a Clustered Index on that key
ALTER TABLE Customers ADD CONSTRAINT PK_Customers Primary Key(CustomerID)
— Add a clustered index that is not the PK
CREATE CLUSTERED INDEX Index_Customers ON Customers(LastName, FirstName)
ALTER TABLE Customers ADD CONSTRAINT PK_Customers Primary Key(CustomerID)
INDEX – SQL Creation
— Put back the standard index/PK (note still not the same)
CREATE UNIQUE CLUSTERED INDEX Index_Customers ON Customers(CustomerID)
ALTER TABLE Customers ADD CONSTRAINT PK_Customers Primary Key(CustomerID)
— create non-clustered index
CREATE NONCLUSTERED INDEX LNameFName_Customers ON Customers(LastName, FirstName);
INDEX – SQL Removal
— Drop the Clustered Index
DROP INDEX Index_Customers ON Customers
/docProps/thumbnail.jpeg