Query optimization (Oracle) Practice Lab
This practice lab session presents the concrete application of the concepts, structures and algorithms presented before, on query execution and query optimization, with the ORACLE DBMS. This system is a good example of a sophisticated optimizer based on index structures and comprehensive evaluation algorithms. All the join algorithms described in this course are indeed implemented in ORACLE. In addition, the system offers simple and practical tools (EXPLAIN in particular) to analyze the execution plan chosen by the optimizer, as well as obtain performance statistics (I / O cost and CPU cost, among others).
Starting Oracle and connecting with SQLDeveloper (this should be already tested, skip read)
You can launch the Oracle server (its Docker image) by running the command
docker run –name oracle-ee -p 1521:1521 -p 5500:5500 -e DB_PASSWD=’YOURPASSWORDHERE’ store/oracle/database-enterprise:12.2.0.1
The -p option indicates that the container’s port 1521 (the one usually used by the MySQL server) is mapped to the same port 1521 of the Docker (the one we will use); same for 5500.
You have to replace YOURPASSWORDHERE by a not-too-simple password, at least 8 characters long, with at least one upper-case, at least one digit and at least one special character.
You now have your own Oracle server on your machine.
If you want your Oracle databases to persist (not to be deleted when you shut down the oracle image) you can use the option -v (for sharing a directory with your local file system)
-v /home/oracle/oradata:/opt/oracle/oradata
Connecting to the Oracle server is done via an Oracle client, such as the SQL Developer client. https://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html (once again, this requires a login, an Oracle Web account this time).
After installing SQL Developer, you can launch it and connect to the Oracle server by clicking on the + icon and filling the connection form as follows (using your chosen password):
If everything went well you can connect and run a test query for asking the time, such as SELECT SYSDATE FROM DUAL;
Database
Execute in SQLDeveloper, in order, one at a time, the following commands: @/YourPathHere/schema_stage1.sql
@/YourPathHere/data1.sql @/YourPathHere/data2.sql @/YourPathHere/schema_stage2.sql
Check at each execution that no error messages have been displayed (besides the initial DROP commands, when no tables exist in the database). Steps 2, 3, 4 may take a little while (be patient).
The schema of the database we will use is the following:
ARTIST (IDARTIST, LASTNAME, FIRSTNAME, DOB)
COUNTRY (CODE, NAME, LANGUAGE)
MOVIE (IDMOVIE, TITLE, YEAR, IDMES, GENRE, SUMMARY, CODECOUNTRY)
• IDMES is the identifier of the movie director
• We’ll have Moviei, for i=1,2,3,4
ROLE (IDMOVIE, DARTIST, ROLENAME)
MOVIERATER (EMAIL, LASTNAME, FIRSTNAME, PROVINCE)
RATING (IDMOVIE, EMAIL, RATE)
Oracle’s “Explain” tool
We can observe the execution plan Oracle choses for a given query by prefixing the query by the following syntax:
EXPLAIN PLAN
SET STATEMENT_ID=’CHOOSEANAMEFORIT’ FOR
YOURQUERYHERE;
The description of the plan is then stored in an Oracle system table, and we can see it under various forms, the most common encoding the tree-like shape of the plan by indented lines.
For that you need to query Oracle’s system table:
SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, ‘CHOOSEANAMEFORIT’,’BASIC’));
————————————————————————————- | Id | Operation | Name | Rows | Bytes | Cost | ——————————————————————————— |0|SELECTSTATEMENT| |14 |518 | 2 | |1|TABLEACCESSFULL|ARTIST |14 |518 | 2 |
• Id : Identifier of the operation
• Operation : type of operation used (see table below)
• Name : Name of the table used
• Cost : estimated cost (no cost unit though, so not very informative)
• Rows : The number of rows Oracle estimates will be processed. It is an estimation based on
the table statistics (hence their importance…)
• Bytes : Number of bytes Oracle estimates will be processed
Alternatively, in SQL Developer you can use the menu EXPLAIN(plan) + TRACE(statistics) by pressing F6
Here are some of the available statistics we can also see along with a plan (in the so called Trace). We will focus only on BUFFER READS (from buffer) and DISK READS (from disk).
STATISTICS : ——————————————————————– 541 recursive calls
0 db block gets
8940 consistent gets
5542 physical reads
332 redo size
2220518 bytes sent via SQL*Net to client 41729 bytes received via SQL*Net from client 3761 SQL*Net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
56386 rows processed
What is displayed by pressing F6 is configurable in SQL Developer / Preferences / Database / Autotrace Explain Plan. For instance, if you do not see the IO_Cost in your statistics list, you need to configure and mark it in the Autotrace list.
These statistics can be interpreted as follows (not all matter for the purposes of this lab, Skip Read):
• recursive calls : number of recursive calls generated at both the user and system level. Oracle maintains tables used for internal processing. When Oracle needs to make a change to these
tables, it internally generates an SQL statement, which in turn generates a recursive call.
• db block gets: number of times a block was requested. See also ”BUFFER READS”, below.
• consistent gets: number of times a consistent read was requested for a block.
• physical reads: total number of data blocks read from disk. This number equals the value of
“DISK READS direct” plus all reads into buffer cache.
• redo size: total amount of redo generated in bytes.
• sorts (disk): number of sort operations that required at least one disk write.
• sorts (memory): number of sort operations that were performed completely in memory and
did not require any disk writes. You cannot do much better than memory sorts, except maybe no sorts at all. Sorting is usually caused by selection criteria specifications within table join SQL
Here are some operations and options in Oracle query execution plans (list by no means exhaustive, skip read initially).
OPERATION
OPTION
MEANING
AGGREGATE
GROUP BY
Computing a one-line result by grouping and aggregating
AND-EQUAL
An operation that has as input a set of rowIds and returns their intersection (used in accesses based on an index)
COUNTING
Counting the number of lines
FILTER
Applies a filter on a set of rows.
INDEX
UNIQUE SCAN
Finding one rowId in an index.
INDEX
RANGE SCAN
Finding one or several rowIds in an index.
MERGE JOIN
Self explanatory
NESTED LOOPS
Self explanatory
SORT
UNIQUE
Sorting for duplicate elimination
SORT
GROUP BY
Sorting for grouping
SORT
JOIN
Sorting for joining (merge-join)
SORT
ORDER BY
Sorting for ORDER BY
TABLE ACCESS
FULL
Getting all the rows of a table
TABLE ACCESS
CLUSTER
Getting rows by a search key in a clustered index
TABLE ACCESS
BY ROW ID
Getting rows by rowId
Oracle implements 3 join algorithms: Nested Loops join (when there is at least one index), Sort-Merge join and Hash join when there is no index.
Environment – Cache and page size
We will work with a buffer cache (RAM) of limited size (on purpose, in order to make Oracle’s optimizer more “creative”), for that you need to execute the following command:
alter system set db_cache_size = 400M;
The page size is 8KB. You can obtain this value (8KB is the default) by show parameter db_block_size ;
Section 1
The goal of this first section is to discover and analyze the database that will be used for studying query optimization in Oracle. You will have to extract the necessary information for a good understanding of the execution plans and related statistics.
Question 1. Fill the following table :
Tables and indexes:
Table name
Nb of records
Nb of pages (SKIP initially)
Index (type, field, clustered / unclusterd)
ARTIST
MOVIE1
MOVIE2
MOVIE3
MOVIE4
MOVIERATER
RATING
COUNTRY
ROLE
Question 2. Fill the following tables:
Section 2 – Instructions
Attributs:
Attribut name
Cardinality
MIN
MAX
MOVIE1.IDMOVIE
MOVIE1.YEAR
MOVIE1.CODECOUNTRY
For each SQL query to be executed, here are the instructions that will be used in order to understand its optimization and execution :
• RULE Mode without statistics over the tables
ALTER SESSION SET OPTIMIZER_MODE=RULE;
• CHOOSE The mode with statistics, allows to obtain a more efficient execution plan ALTER SESSION SET OPTIMIZER_MODE=CHOOSE;
(by default, we will use CHOOSE)
• CACHE You flush (empty) the cache before each execution: ALTER SYSTEM FLUSH BUFFER_CACHE;
• MOVIEi A query on table MOVIE can be executed on one of the MOVIE copies (1-4). Each time, which MOVIE copy to use be indicated in text. Example:
SELECT title, genre FROM MOVIE1 f WHERE IDMOVIE=50273 ; Overall example:
ALTER SESSION SET OPTIMIZER_MODE=RULE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
SELECT title, genre FROM MOVIE1 m WHERE IDMOVIE=50273 ;
For each query, you will complete a form like the following :
Explain by one phrase also for each query what can be observed.
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE1
RULE
INDEX RANGE SCAN + ACCESS ROWIDS
14177
14167
MOVIE1
CHOOSE
TABLE ACCESS FULL
10165
10000
Example : in the form above, in RULE mode we use the index, since it is there. However, in CHOOSE mode the optimizer understands that using the index would be too expensive, thus a full scan is done instead.
Section 3 – Simple queries
Question 3.1 On MOVIE1 and MOVIE4 :
SELECT TITLE FROM MOVIEi WHERE IDMOVIE=50273 ;
Your explanation:
Question 3.2 On MOVIE1and MOVIE4 :
SELECT COUNT(*) FROM MOVIEi WHERE IDMOVIE BETWEEN 50273 AND 60000 ;
Your explanation:
Question 3.3 On MOVIE2, with RULE, CHOOSE : SELECT TITLE FROM MOVIEi WHERE YEAR=1999 ;
Your explanation:
Question 3.4 On MOVIE2, MOVIE3 :
SELECT COUNT(*) FROM MOVIEi WHERE YEAR=1999 ;
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE1
MOVIE4
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE1
MOVIE4
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE2
MOVIE2
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE2
MOVIE3
Your explanation:
Question 3.5 On MOVIE2, MOVIE3, with RULE, CHOOSE : SELECT TITLE FROM MOVIEi WHERE CODECOUNTRY=’aaej’ ;
Your explanation:
Section 4 – Join queries
Question 4.1 On MOVIE2 and MOVIE3, with RULE, CHOOSE :
SELECT TITLE, LASTNAME
FROM MOVIEi M, ARTIST A
WHERE CODECOUNTRY=’aaej’ AND M.IDMES=A.IDARTIST ;
Your explanation:
Question 4.2 On MOVIE3 :
SELECT LASTNAME, COUNT(*)
FROM MOVIEi M, ARTIST A
WHERE YEAR=1999 AND M.IDMES=A.IDARTIST;
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE2
MOVIE2
MOVIE3
MOVIE3
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE2
MOVIE2
MOVIE3
MOVIE3
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE3
Your explanation:
Question 4.3 On MOVIE3 :
SELECT LASTNAME, COUNT(*)
FROM ARTIST
WHERE IDARTIST IN (SELECT DISTINCT IDMES
GROUP BY LASTNAME;
Your explanation:
Question 4.4. On Movie3 :
FROM MOVIEi WHERE ANNEE=1999)
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE3
SELECT COUNT(*)
FROM MOVIEi M, ROLE R, ARTIST A
WHERE M.IDMOVIE=R.IDMOVIE AND R.IDACTOR=A.IDARTIST AND
M.IDMES=A.IDARTIST AND CODECOUNTRY=’aaej’ AND YEAR=1999;
Your explanation:
Question 4.5 On MOVIE2 and MOVIE4 :
SELECT TITLE
FROM MOVIEi M, RATING R WHERE M.IDMOVIE = R.IDMOVIE GROUP BY TITLE
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE3
HAVING AVG(RATE) > 15;
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE2
MOVIE4
Your explanation:
Question 4.6 On MOVIE2 and MOVIE4 ;
SELECT TITLE
FROM MOVIEi M, (SELECT IDMOVIE, AVG(RATE) as AVG_RATE
FROM RATING
GROUP BY IDMOVIE) N
WHERE M.IDMOVIE = N.IDMOVIE AND AVG_RATE > 15;
Your explanation:
Section 5 – No cache (OPTIONAL)
To go further, you can test the same queries but without flushing the cache.
Section 6 – Query optimization (OPTIONAL)
The following queries are not optimized. Analyze the plan shown by EXPLAIN (without cache) and suggest new equivalent queries that can have less I/Os (i.e., less consistent get/physical read).
SELECT TITLE
FROM MOVIE1
WHERE exists (SELECT IDMOVIE
FROM ROLE R
WHERE MOVIE1.IDMOVIE = R.IDMOVIE and ROLENAME LIKE ‘t%’
GROUP BY IDMOVIE HAVING count(*) > 1);
TABLE
MODE
OPERATORS
BUFFER READS
DISK READS
MOVIE2
MOVIE4
SELECT TITLE
FROM MOVIE2
WHERE IDMOVIE IN (SELECT IDMOVIE FROM RATING
GROUP BY IDMOVIE
HAVING AVG(RATE) > 10) AND CODECOUNTRY= ‘aaej’;