BPC
1
Hybrid TAGE & Perceptron
Branch Predictor
Zhenyu Wu
TAGE Predictor
2
Prediction Computation
n Base predictor 𝑇”
q PC-indexed 3-bit saturating counter
q Giving default prediction
n Tagged predictor 𝑇#(1 ≤ 𝑖 ≤ 4)
q 𝑇# are indexed using a geometric series of history length {𝐿 𝑖 =
(𝑖𝑛𝑡)(𝛼#01 ∗ 𝐿 1 + 0.5)}
q 11-bit tag, 2-bit unsigned useful counter u, 3-bit signed counter pred
q Giving prediction on a tag match
q Provider component & altpred
3
Updating Policy
n Update the useful counter u
q u is updated when altpred is different from final pred
q Increment if pred is correct, decrement otherwise
q Reset in period of 256K branches
n Update the pred counter of the provider component on a
correct prediction
n The overall prediction is incorrect
q Update the pred counter of the provider component 𝑇#
q If 𝑖 < 𝑀, allocate an entry on a predictor component 𝑇:(𝑖 < 𝑘 <
𝑀)
q Read 𝑀 − 𝑖 − 1 𝑢> from 𝑇>(𝑖 < 𝑗 < 𝑀)
4
Updating Policy (Cont.)
n Rules for new entry allocation
q Priority for allocation
n If exits k, such that 𝑢: = 0, then 𝑇: is allocated
n Else the u counters from the components 𝑇> (𝑖 < 𝑗 < 𝑀) are all decremented
q Avoiding ping-phenomenon
n If 𝑇> & 𝑇: can be allocated, then 𝑇> is chosen with higher probability.
q Initializing the allocated entry
n pred counter set to weak correct
n u useful counter set to strongly not useful
5
Perceptron Predictor
6
Prediction Computation
n A perceptron is represented by a vector of signed integer
weights (𝑤”..A)
q 𝑤” serves as bias
n The input is the global history record (𝑥1..A)
q 𝑥” is always set to 1, providing a bias input
q 𝑥# is either -1 (NT) or +1 (T)
n The output y of the perceptron is computed as
q 𝑦 = 𝑤” + ∑ 𝑥#𝑤#A#E1
q Predict to take if 𝑦 ≥ 0, not to take if 𝑦 < 0
7
Updating Policy
n Using the following algorithm to train the perceptron
q 𝜃 is the threshold parameter to decide when enough training has
been done
q 𝜃 = 1.93ℎ + 14
8
Combining Branch Predictors
9
How to combine TAGE with
Perceptron to make better prediction?
n The combined predictor contains 2 predictors: TAGE &
Perceptron
n Using a 2-bit saturating counter to select better predictor
n Each counter keeps track of which predictor is more
accurate for the shared branches
10
Storage Computation
n Perceptron
q 512 perceptron
q 8-bit unsigned integer weight
q 64 weights (1 for bias) per perceptron
q 512×8×64
𝑏𝑖𝑡𝑠 = 32𝐾𝐵
n TAGE
q 𝑇":21T×3
𝑏𝑖𝑡𝑠 = 3KB
q 𝑇1:21W× 5+ 11
𝑏𝑖𝑡𝑠 = 8𝐾𝐵
q 𝑇W:21W× 5 + 10
𝑏𝑖𝑡𝑠 = 7.5𝐾𝐵
q 𝑇T:21W× 5 + 9
𝑏𝑖𝑡𝑠 = 7𝐾𝐵
q 𝑇T:21W× 5 + 8
𝑏𝑖𝑡𝑠 = 6.5𝐾𝐵
q 3 + 8 + 7.5 + 7 + 6.5 = 32KB
n Combining 32 + 32 = 64𝐾𝐵
11