2021/4/28 Query Cost Estimation
Query Cost Estimation
Query Cost Estimation Estimating Projection Result Size Estimating Selection Result Size Estimating Join Result Size
Cost Estimation: Postscript
>>
COMP9315 21T1 ♢ Cost Estimation ♢ [0/9]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
1/11
2021/4/28 Query Cost Estimation
∧ >>
❖ Query Cost Estimation
Without executing a plan, cannot always know its
precise cost.
Thus, query optimisers estimate costs via:
costofperformingoperation (dealtwithinearlier lectures)
sizeofresult (whichaffectscostofperformingnext operation)
Result size estimated by statistical measures on relations, e.g.
rS
cardinality of relation S
RS
V(A,S) # distinct values of attribute A min(A,S) minvalueofattributeA max(A,S) max value of attribute A
COMP9315 21T1 ♢ Cost Estimation ♢ [1/9]
avg size of tuple in relation S
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
2/11
2021/4/28 Query Cost Estimation
<< ∧ >> ❖ Estimating Projection Result Size
Straightforward, since we know: number of tuples in output
rout=|πa,b,..(T)|=|T|=rT (inSQL,becauseofbag semantics)
size of tuples in output
Rout = sizeof(a) + sizeof(b) + … + tuple-overhead
AssumepagesizeB, bout=ceil(rT/cout), where cout = oor(B/Rout)
Ifusingselect distinct…
| πa,b,..(T) | depends on proportion of duplicates
produced
COMP9315 21T1 ♢ Cost Estimation ♢ [2/9]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
3/11
2021/4/28 Query Cost Estimation
<< ∧ >> ❖ Estimating Selection Result Size
Selectivity = fraction of tuples expected to satisfy a condition.
Common assumption: attribute values uniformly distributed.
Example: Consider the query
select * from Parts where colour=’Red’
If V(colour,Parts)=4, r=1000 ⇒ | σcolour=red(Parts)|=250
In general, | σA=c(R) | ≅ rR / V(A,R)
Heuristic used by PostgreSQL: | σA=c(R) | ≅ r/10 COMP9315 21T1 ♢ Cost Estimation ♢ [3/9]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
4/11
2021/4/28 Query Cost Estimation
<< ∧ >> ❖ Estimating Selection Result Size
(cont)
Estimating size of result for e.g.
select * from Enrolment where year > 2015;
Could estimate by using:
uniformdistributionassumption, r, min/max years
Assume: min(year)=2010, max(year)=2019, |Enrolment|=105
105 from 2010-2019 means approx 10000 enrolments/year
this suggests 40000 enrolments since 2016
Heuristicusedbysomesystems: |σA>c(R)|≅r/3 COMP9315 21T1 ♢ Cost Estimation ♢ [4/9]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
5/11
2021/4/28 Query Cost Estimation
<< ∧ >> ❖ Estimating Selection Result Size
(cont)
Estimating size of result for e.g.
select * from Enrolment where course <> ‘COMP9315’;
Could estimate by using: uniformdistributionassumption, r, domainsize
e.g.|V(course,Enrolment)|=2000, |σA<>c(E)|=r* 1999/2000
Heuristicusedbysomesystems: |σA<>c(R)|≅r COMP9315 21T1 ♢ Cost Estimation ♢ [5/9]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
6/11
2021/4/28 Query Cost Estimation
<< ∧ >> ❖ Estimating Selection Result Size
(cont)
How to handle non-uniform attribute value distributions?
collect statistics about the values stored in the attribute/relation
store these as e.g. a histogram in the meta-data for the relation
So, for part colour example, might have distribution like:
White:35% Red:30% Blue:25% Silver:10% Use histogram as basis for determining # selected
tuples.
Disadvantage: cost of storing/maintaining histograms.
COMP9315 21T1 ♢ Cost Estimation ♢ [6/9]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
7/11
2021/4/28 Query Cost Estimation
<< ∧ >> ❖ Estimating Selection Result Size
(cont)
Summary: analysis relies on operation and data distribution:
E.g.select * from R where a = k; Case1: uniq(R.a) ⇒ 0or1result
Case2: rR tuples&&size(dom(R.a))=n ⇒ rR/n results
E.g.select * from R where a < k;
Case1: k≤min(R.a) ⇒ 0results
Case2: k>max(R.a) ⇒ ≅rRresults
Case3: size(dom(R.a))=n ⇒ ?min(R.a)…k… max(R.a) ?
COMP9315 21T1 ♢ Cost Estimation ♢ [7/9]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
8/11
2021/4/28 Query Cost Estimation
<< ∧ >> ❖ Estimating Join Result Size
Analysis relies on semantic knowledge about data/relations.
Consider equijoin on common attr: R ⨝a S
Case1: values(R.a)∩values(S.a)={} ⇒ size(R⨝a
S)=0
Case 2: uniq(R.a) and uniq(S.a) ⇒ size(R ⨝a S) ≤ min(|R|, |S|)
Case 3: pkey(R.a) and fkey(S.a) ⇒ size(R ⨝a S) ≤ |S|
COMP9315 21T1 ♢ Cost Estimation ♢ [8/9]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
9/11
2021/4/28 Query Cost Estimation
❖ Cost Estimation: Postscript Inaccurate cost estimation can lead to poor
evaluation plans.
Above methods can (sometimes) give inaccurate estimates.
To get more accurate cost estimates:
more time … complex computation of selectivity
more space … storage for histograms of data values
Either way, optimisation process costs more (more than query?)
Trade-off between optimiser performance and query performance.
COMP9315 21T1 ♢ Cost Estimation ♢ [9/9]
<< ∧
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
10/11
2021/4/28 Query Cost Estimation
Produced: 5 Apr 2021
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/qry-cost-est/slides.html
11/11