Lecture 6:
Arithmetic 2/3
John Owens
Introduction to Computer Architecture UC Davis EEC 170, Winter 2021
Cascading Adders
▪ Cascade Full Adders to make multibit adder:
▪ A+B=S
A B
Cin
A B
Cin
A B
Cin
A B
Cin
Cout Sum
Cout Sum
Cout Sum
Cout Sum
A3
Full Adder
B3
A2
S3
Full Adder
B2
S2
A1
Full Adder
B1
S1
A0
Full Adder
B0
S0
40
UC Davis EEC 170, Winter 2021 / © John Owens
But What About Performance? ▪ Critical path of one bitslice is CP
▪ Critical path of n-bit rippled-carry adder is n*CP
– Thought experiment:
– @ 7 nm: FO4 is ~2.5 ps
– 2 gate delays * 64b = 320 ps
– 4 GHz clock period is 250 ps
▪ Design Trick:
– Throw hardware at it
A0 B0
A1 B1
A2 B2
A3 B3
CarryIn0
1-bit ALU
CarryOut0
Result0
Result1
Result2
Result3
CarryIn1
1-bit ALU
CarryOut1
CarryIn2
1-bit ALU
CarryOut2 CarryIn3
CarryOut3
41
UC Davis EEC 170, Winter 2021 / © John Owens
1-bit ALU
Truth Table for Adder Bit Slice ▪ 3 inputs (A, B, Cin); 2 outputs (Sum, Cout)
A
B
Cin
Sum
Cout
0
0
0
0
0
0
0
1
1
0
0
1
0
1
0=Cin
0
1
1
0
1=Cin
1
0
0
1
0=Cin
1
0
1
0
1=Cin
1
1
0
0
1
1
1
1
1
1
42 UC Davis EEC 170, Winter 2021 / © John Owens
A0 B0
A1 B1
A2 B2
A3 B3
Carry Look Ahead (Design trick: peek)
C0 = Cin
S
S
S
S
C4 = . . .
G
P
G = A and B P = A xor B WHY are these interesting?
43
UC Davis EEC 170, Winter 2021 / © John Owens
A B 0 0
0 1
1 0
1 1
Cout
0 “kill”
Cin “propagate” Cin “propagate” 1 “generate”
G P
G P
C2 = G1 + G0 · P1 + C0 · P0 · P1
C1 = G0 + C0 · P0
G P
C3 = G2 + G1 · P2 + G0 · P1 · P2 + C0 · P0 · P1 · P2
G P
CLA vs. Ripple
C0 = Cin
S
G = A and B P = A xor B
A0 B0
CarryIn0 A0
Result0
Result1
Result2
Result3
G P
B0 A1 A1
B1
B1
A2 B2
1-bit ALU
C1 = G0 + C0 · P0
C2 = G1 + G0 · P1 + C0 · P0 · P1
C3 = G2 + G1 · P2 + G0 · P1 · P2 + C0 · P0 · P1 · P2
G
P
CarryOut0 CarryIn1
S
1-bit ALU
G P
A2
A3 B3
CarryOut1 CarryIn2
S
1-bit ALU
G P
B2
A3 B3
CarryOut2 CarryIn3
1-bit ALU
CarryOut3
G P
S
C4 = . . .
44
UC Davis EEC 170, Winter 2021 / © John Owens
Cascaded Carry Look-ahead (16-bit)
C
L C0
▪ Abstraction!
▪ Good exercise to work
A
4-bit Adder
4-bit Adder
4-bit Adder
G0 P0
C1 = G0 + C0 · P0
out what G and P are for a 4-bit adder
C2 = G1 + G0 · P1 + C0 · P0 · P1
C3 = G2 + G1 · P2 + G0 · P1 · P2 + C0 · P0 · P1 · P2
G P
C4 = . . .
45
UC Davis EEC 170, Winter 2021 / © John Owens
Design Trick: Guess (or “Precompute”)
CP(2n) = 2*CP(n)
n-bit adder
n-bit adder
CP(2n) = CP(n) + CP(mux)
n-bit adder
1
n-bit adder
0
n-bit adder
Cout
Carry-select adder
46
UC Davis EEC 170, Winter 2021 / © John Owens
Carry Skip Adder: reduce worst case delay
B A4 B A0
P3 S P3 S
4-bit Ripple Adder
4-bit Ripple Adder
P2
P1
P0
P2
P1
P0
▪ Just speed up the slowest case for each block
▪ Exercise: optimal design uses variable block sizes (why?)
47 UC Davis EEC 170, Winter 2021 / © John Owens
Adder Lessons
▪ Reuse hardware if possible
– +/- reuse is compelling argument for 2’s complement
▪ For higher performance:
– Look for critical path, optimize for it Reorganize equations [propagate/generate / carry lookahead]
– Precompute [carry save]
– Reduce worst-case delay [carry skip]
48 UC Davis EEC 170, Winter 2021 / © John Owens
Multiply (unsigned)
▪ Paper and pencil example (unsigned):
Multiplicand 1000 Multiplier x1001 1000
0000 0000
1000 Product 01001000
▪ m bits x n bits = m+n bit product
▪ Binary makes it easy:
– 0 => place 0 ( 0 x multiplicand )
– 1 => place a copy ( 1 x multiplicand )
▪ 4 versions of multiply hardware & algorithm (see book):
– successive refinement
49 UC Davis EEC 170, Winter 2021 / © John Owens
§3.3 Multiplication
m bits x n bits = m+n bit product
How do we store a 128b result from a 64b x 64b multiply?
50 UC Davis EEC 170, Winter 2021 / © John Owens
Unsigned Combinational Multiplier
0000
A3 A2 A1 A0
A3 A2
A3 A2
A3 A2
P7 P6 P5 P4 P3 P2 P1 P0
B0 B1
B2 B3
A1
A0
A1
A0
A1
A0
51 UC Davis EEC 170, Winter 2021 / © John Owens
Unsigned Combinational Multiplier
▪ ▪
▪
At each stage shift A left (multiply it by 2)
Use next bit of B to determine whether to add in shifted multiplicand (Stage i A3 accumulates A * 2i if Bi == 1)
A3 A2
0000
A3 A2 A1 A0 A2
B0 B1
B2 B3
A1
A0
A1
A0
Accumulate 2n bit partial product at each stage
A3 A2
A1
A0
▪ Q: How much
hardware for 32 bit multiplier? Critical path?
P7 P6 P5 P4 P3 P2 P1 P0
52 UC Davis EEC 170, Winter 2021 / © John Owens
Unsigned shift-add multiplier (version 1)
▪ 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit multiplier reg
Multiplicand
64 bits
Shift Left
Multiplier
32 bits
Control
Shift Right
64-bit ALU
Product
64 bits
Write
Multiplier = datapath + control
53
UC Davis EEC 170, Winter 2021 / © John Owens
Multiply Algorithm V1
Product
0000 0000 0011
1:0000 0010 0011 2:0000 0010 0011 3:0000 0010 0001 1:0000 0110 0001 2:0000 0110 0001 3:0000 0110 0000
0000 0110 0000
Start
1. Test Multiplier0
Multiplier Multiplicand 0000 0010
Multiplier0 = 1
Multiplier0 = 0
0000 0010
0000 0100
0000 0100 0000 0100 0000 1000 0000 1000
0000 1000
1a. Add multiplicand to product & place the result in Product register
54
UC Davis EEC 170, Winter 2021 / © John Owens
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
No: < 32 repetitions
Yes: 32 repetitions Done
How can we make this more efficient? ▪ Remember original combinational multiplier:
0000
A3 A2 A1 A0
A3 A2
A3 A2
A3 A2
P7 P6 P5 P4 P3 P2 P1 P0
B0 B1
B2 B3
A1
A0
A1
A0
A1
A0
55 UC Davis EEC 170, Winter 2021 / © John Owens
Simply warp to let product move right...
0000
A3 A2 A1 A0
A3 A2 A1 A0
A3 A2 A1 A0
A3 A2 A1 A0
B0
B1
B2
B3
P7 P6 P5 P4 P3 P2 P1 P0
▪ Multiplicand stays still and product moves right
56 UC Davis EEC 170, Winter 2021 / © John Owens
Multiply Hardware Version 3
▪ 32-bit Multiplicand reg, 32 -bit ALU, 64-bit Product reg, (0-bit Multiplier reg)
- Your book has details on this
Multiplicand 32 bits
32-bit ALU
Product (Multiplier) 64 bits
Shift Right
Write
Control
57
UC Davis EEC 170, Winter 2021 / © John Owens
Faster Multiplier
▪
Uses multiple adders
- Cost/performance tradeoff
▪
Can be pipelined
- Several multiplications performed in parallel
58 UC Davis EEC 170, Winter 2021 / © John Owens
Motivation for Booth’s Algorithm
▪ ▪
Example 2 x 6 = 0010 x 0110:
0010 x 0110
+ + + +
0000 shift (0 in multiplier) 0010 add (1 in multiplier) 0010 add (1 in multiplier)
0000 shift (0 in multiplier) 0001100
59 UC Davis EEC 170, Winter 2021 / © John Owens
Motivation for Booth’s Algorithm
▪ ALU with add or subtract can get same result in more than one way:
▪ 6 = 4 + 2 = -2 + 8
0110 = -00010 + 01000 = 11110 + 01000
▪ For example:
0010 (2 [multiplier]) x 0110 (6 [multiplicand])
0000 - 0010
0000 + 0010
00001100
shift (0 in multiplier)
sub (first 1 in multpl.)
shift (mid string of 1s)
add (prior step had last 1)
60 UC Davis EEC 170, Winter 2021 / © John Owens
Booth’s Algorithm
end of run
▪ Current Bit
1 0 1 1 0 1 0 0
011110
Bit to the Right Explanation
Begins run of 1s
Middle of run of 1s
End of run of 1s
Example Op 0001111000 sub 0001111000 none 0001111000 add 0001111000 none
Middle of run of 0s
▪ Originally for speed (when shift was faster than add)
middle of run
▪ Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one
▪ Handles two’s complement!
beginning of run
61 UC Davis EEC 170, Winter 2021 / © John Owens
Booth’s Example (2 x 7)
Big picture: Treat 7 as 8 - 1
Operation
0. initial value
1a. P = P - m 1b.
2.
3.
4a. 4b.
Multiplicand 0010
Product
0000 0111 0 1110 0111 0 1111 0011 1 1111 1001 1 1111 1100 1 0001 1100 1 0000 1110 0
next?
10 -> sub
shift P (sign ext) 11 -> nop, shift 11 -> nop, shift 01 -> add
shift done
Blue + red = multiplier
(red is 2 bits to compare); Green = arithmetic ops; Black = product
62 UC Davis EEC 170, Winter 2021 / © John Owens
0010
0010
0010
0010
0010
0010
+ 1110->
+ 0010->
Booth’s Example (2 x -3)
Operation
0. initial value
1a. P = P -m 1b.
2a.
2b.
3a.
3b. 4a.
4b.
Blue + red = multiplier (red is 2 bits to compare); Green = arithmetic ops; Black = product
Multiplicand 0010
1110 + 1110 0010
0010
+ 1110
0010 0010
Product
0000 1101 0 1110 1101 0
01
next?
10 -> sub
shift P (sign ext) 01 -> add + 0010
1111 011
0001 011 0000 1011 0
01 shift P
10 -> sub shift
1110 1011 0
1111 0101 1 11 -> nop
0010
1111 0101 1 1111 1010 1
shift done
63
UC Davis EEC 170, Winter 2021 / © John Owens
Radix-4 Modified Booth’s Algorithm
Current Bits
00 01 10 11
00 01 10 11
Bit to the Right
0 0 0 0
1 1 1 1
Explanation Example Recode
Middle of zeros Single one Begins run of 1s Begins run of 1s
Ends run of 1s Ends run of 1s Isolated 0 Middle of run
00 00 00 00 00 00 00 00 01 00 00 01 11 10 00 00 01 11 11 00
00 00 11 11 00 00 01 11 11 00 00 11 10 11 00 00 11 11 11 00
0
1 -2 -1
1
2 -1 0
Same insight as one-bit
Allows multiplication 2 bits at a time.
Booth’s, simply adjust for alignment of 2 bits.
64
UC Davis EEC 170, Winter 2021 / © John Owens
RISC-V Multiplication Support
▪ Four multiply instructions: – mul: multiply
– Gives the lower 64 bits of the product – mulh: multiply high
– Gives the upper 64 bits of the product, assuming the operands are signed
– mulhu: multiply high unsigned
– Gives the upper 64 bits of the product, assuming the operands are unsigned
– mulhsu: multiply high signed/unsigned
– Gives the upper 64 bits of the product, assuming one operand is signed and the other unsigned
– Use mulh result to check for 64-bit overflow
65 UC Davis EEC 170, Winter 2021 / © John Owens
RISC-V Support for multiply
▪ “If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order andrdhcannot be the same asrs1or
rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.”
66 UC Davis EEC 170, Winter 2021 / © John Owens
Multiplication Summary
▪ Iterative algorithm ▪ Design techniques:
– Analyze hardware—what’s not in use?
– Spend more hardware to get higher performance
– Booth’s Algorithm—more general (2’s complement)
– Booth’s Algorithm—recoding is powerful technique to think about problem in a different way
– Booth’s Algorithm—more bits at once gives higher performance
67 UC Davis EEC 170, Winter 2021 / © John Owens
RISC-V Support for divide
▪ 4 instructions:
-{div, divu, rem, remu} rd, rs1, rs2
– – – –
▪ “If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr,
rs1, rs2 (rdq cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides.”
▪ Overflow and division-by-zero don’t produce errors
– Just return defined results
– Faster for the common case of no error
div: rs1 / rs2, treat as signed
divu: rs1 / rs2, treat as unsigned
rem: rs1 mod rs2, treat as signed
remu: rs1 mod rs2, treat as unsigned
72 UC Davis EEC 170, Winter 2021 / © John Owens
MIPS Support for multiply/divide
▪ Rather than target the general-purpose registers:
– –
hi
mul placed its output into two specialhiandloregisters div placed its divide output intoloand its rem output into
–
general-purpose register)
MIPS providedmfloandmfhiinstructions (destination:
73 UC Davis EEC 170, Winter 2021 / © John Owens
Divide: Paper & Pencil
1001 Quotient Divisor 1000 1001010 Dividend
–100100
101
1010
–1000
10 Remainder
(or Modulo result)
▪
▪ ▪
See how big a number can be subtracted, creating quotient bit on each step
– Binary => 1 * divisor or 0 * divisor Dividend = Quotient x Divisor + Remainder 3 versions of divide, successive refinement
74 UC Davis EEC 170, Winter 2021 / © John Owens
Divide Hardware Version 1
▪ 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
Divisor
64 bits
Shift Right
64-bit ALU
Remainder
64 bits
Write
Quotient
32 bits
Control
Shift Left
75
UC Davis EEC 170, Winter 2021 / © John Owens
Divide Algorithm V1
▪ Takes n+1 steps for n-bit Quotient & Rem.
Start: Place Dividend in Remainder
▪ Remainder Quotient 0000 0111 0000
Divisor 0010 0000
Remainder ≥ 0
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder
Remainder < 0
2a. Shift the Quotient register to the left setting the new rightmost
bit to 1.
2b. Restore the original value by adding the Divisor register to the Remainder register, & place the sum in the Remainder register. Also shift the Quotient register to the left, setting
the new least significant bit to 0.
“restoring” division
3. Shift the Divisor register right1 bit.
Yes: n+1 repetitions (n = 4 here)
Done
n+1 repetition?
No: < n+1 repetitions
76
UC Davis EEC 170, Winter 2021 / © John Owens
Divide Algorithm I example (7 / 2)
Remainder Quotient Divisor
0000 0111 1:1110 0111 00000
2:0000 0111 0000 3:0000 0111 0000 1:1111 0111 0000
2:0000 0111 000
000000010 0000 0010 0000
0010 0000
0001 0000
0001 0000
0001 0000
0000 1000
0000 1000
0000 1000
0000 0100
0000 0100
0000 0100
0000 0010
0000 0010
0000 0010
0000 0001
Answer: Quotient = 3 Remainder = 1
0
0 0 0
00
0
3:0000 0111 000
1:11111111 000
2:0000 0111 00
00
00
0
3:0000 0111 00
000
1:0000 0011 00
000 0001
2:0000 0011 0
3:0000 0011 0
0001
0001 2:0000 0001 00011
3:0000 0001 00011
1:0000 0001 0
77
UC Davis EEC 170, Winter 2021 / © John Owens
Divide Hardware Version 3
▪ 32-bit Divisor reg, 32 -bit ALU, 64-bit Remainder reg, (0-bit Quotient reg)
Divisor
32 bits
32-bit ALU
“HI” “LO”
Shift Left
Write
Remainder
(Quotient)
Control
64 bits
78
UC Davis EEC 170, Winter 2021 / © John Owens
Final Multiply / Divide Hardware
Multiplicand
32 bits 32-bit ALU
64 bits
32 bits
32-bit ALU
“HI”
Shift Right
Write
Control
Product
(Multiplier)
Divisor
“LO”
Shift Left
Write
Remainder
(Quotient)
Control
64 bits
79
UC Davis EEC 170, Winter 2021 / © John Owens
SRT Division
D. Sweeney of IBM, J.E. Robertson of the University of Illinois, and T.D. Tocher of Imperial College, London
Current
Remainder
P-D (Partial Divisor) Plot
9|9 43211111 8|8 42211110 7|7 32111100 6|6 32111000 5 | 5 2 1 1 1 0 00 0 4 | 4 2 1 1 0 0 00 0 3|3 11000000 2|2 10000000 1|1 00000000 0 +------------------
0 1 2 3 4 5 6 78 9
80 UC Davis EEC 170, Winter 2021 / © John Owens
SRT Division
▪ Intel Pentium divide implementation: SRT division with 2 bits/iteration (radix 4)
▪ Allows negative entries
▪ 1066 entries in lookup table
81
UC Davis EEC 170, Winter 2021 / © John Owens
[http://members.cox.net/srice1/ pentbug/introduction.html]
Faster Division
▪ Can’t use parallel hardware as in multiplier
- Subtraction is conditional on sign of remainder
▪ Faster dividers (e.g. SRT division) generate multiple quotient bits per step
- Still require multiple steps
82 UC Davis EEC 170, Winter 2021 / © John Owens
Division Lessons
▪ In practice, slower than multiplication
- Also less frequent
- But, in the simple case, can use same hardware!
▪ Generates quotient and remainder together
▪ Floating-point division faster than integer division (why?) ▪ Similar hardware lessons as multiplier:
- Look for unused hardware
- Can process multiple bits at once at cost of extra hardware
83 UC Davis EEC 170, Winter 2021 / © John Owens
RISC-V logical instructions
Instruction Meaning Pseudocode
XORI rd,rs1,imm
SRAI rd,rs1,imm
Exclusive Or Immediate
rd ← ux(rs1) ⊕ ux(imm)
ORI rd,rs1,imm
Or Immediate
rd ← ux(rs1) ∨ ux(imm)
ANDI rd,rs1,imm
SLLI rd,rs1,imm
And Immediate
rd ← ux(rs1) ∧ ux(imm)
Shift Left Logical Immediate
rd ← ux(rs1) « ux(imm)
SRLI rd,rs1,imm
Shift Right Logical Immediate
rd ← ux(rs1) » ux(imm)
SLL rd,rs1,rs2
SRL rd,rs1,rs2
Shift Right Arithmetic Immediate
rd ← sx(rs1) » ux(imm)
Shift Left Logical
rd ← ux(rs1) « rs2
XOR rd,rs1,rs2
Exclusive Or
rd ← ux(rs1) ⊕ ux(rs2)
Shift Right Logical
rd ← ux(rs1) » rs2
SRA rd,rs1,rs2
OR rd,rs1,rs2
Shift Right Arithmetic
rd ← sx(rs1) » rs2
Or
rd ← ux(rs1) ∨ ux(rs2)
AND rd,rs1,rs2
And
rd ← ux(rs1) ∧ ux(rs2)
84 UC Davis EEC 170, Winter 2021 / © John Owens
Manipulating Bits with Shift, Or, And
▪
▪ ▪ ▪ ▪
▪
Mask out exponent with and
S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM
& 0 11111111 00000000000000000000000
------------------------------------
0 EEEEEEEE 00000000000000000000000
Right shift 23 bits to get
000000000000000000000000 EEEEEEEE
Do arithmetic manipulation
000000000000000000000000 ENEWENEW
Left shift 23 bits to get
0 ENEWENEW 00000000000000000000000
Zero out old exponent
S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM
& 1 00000000 11111111111111111111111
------------------------------------
S 00000000 MMMMMMMMMMMMMMMMMMMMMMM
Or in new exponent
S 00000000 MMMMMMMMMMMMMMMMMMMMMMM
| 0 ENEWENEW 00000000000000000000000
------------------------------------
S ENEWENEW MMMMMMMMMMMMMMMMMMMMMMM
85 UC Davis EEC 170, Winter 2021 / © John Owens
Shift Operations
▪ Arithmetic operation:
- Example: 00011 << 2 [3 left shift 2]
- 00011 << 2 = 01100 = 12 = 2 * 4
- Each bit shifted left == multiply by two
- Example: 01010 >> 1 [10 right shift 1]
– 01010 >> 1 = 00101 = 5 = 10/2
– Each bit shifted right == divide by two
– Why?
– Compilers do this—“strength reduction”
86 UC Davis EEC 170, Winter 2021 / © John Owens
Shift Operations
▪
▪
With left shift, what do we shift in?
– 00011 << 2 = 01100 (arithmetic)
- 0000XXXX << 4 = XXXX0000 (logical)
- We shifted in zeroes
How about right shift?
- XXXX0000 >> 4 = 0000XXXX (logical) – Shifted in zero
– 00110 (= 6) >> 1 = 00011 (3) (arithmetic) – Shifted in zero
– 11110 (= -2) >> 1 = 11111 (-1) (arithmetic) – Shifted in one
87 UC Davis EEC 170, Winter 2021 / © John Owens
Shift Operations
▪ How about right shift?
– XXXX0000 >> 4 = 0000XXXX: Logical shift
– Shifted in zero
– 00110 (= 6) >> 1 = 00011 (3)
11110 (= -2) >> 1 = 11111 (-1): Arithmetic shift
– Shifted in sign bit
▪ RISC-V supports both logical and arithmetic:
– slli, srai, srli: Shift amount taken from within instruction (“imm”)
6 bits 6 bits 5 bits 3 bits 5 bits
7 bits 5 bits 5 bits 3 bits 5 bits
– sll, sra, srl: shift amount taken from register (“variable”)
– How far can we shift with slli/srai/slli? With sll/sra/srl?
7 bits
7 bits
funct6
shamt
rs1
funct3
rd
opcode
funct7
rs2
rs1
funct3
rd
opcode
88
UC Davis EEC 170, Winter 2021 / © John Owens
Combinational Shifter from MUXes
Basic Building Block
AB sel
D
8-bit right shifter
A7 A6 A5 A4 A3 A2 A1 A0
S2 S1 S0
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
▪ What comes in the MSBs?
▪ How many levels for 64-bit shifter? ▪ What if we use 4-1 Muxes ?
R7 R6 R5 R4 R3 R2 R1 R0
89 UC Davis EEC 170, Winter 2021 / © John Owens
10
General Shift Right Scheme using 16 bit example
S0 (0,1)
S1 (0,2)
S2 (0,4)
S3 (0,8)
▪ If we added right-to-left connections, we could supportROTATE (not in RISC-V but found in other ISAs)
90 UC Davis EEC 170, Winter 2021 / © John Owens
Funnel Shifter ▪ Shift A by i bits
▪ Problem: Set Y, X, sa ▪ Logical:
▪ Arithmetic: ▪ Rotate:
▪ Left shifts:
Extract 64 bits of 128 (sa selects which bits)
| sa |
Y
X
Y
64
X
64
R
Shift Right
91
UC Davis EEC 170, Winter 2021 / © John Owens
64 R
Barrel Shifter
▪ Technology-dependent solutions: transistor per switch
in6
in5
in4
out3
out2
out1
out0
SR3
SR2 SR1 SR0
in3
in2 in1 in0
92
UC Davis EEC 170, Winter 2021 / © John Owens
Shifter Summary
▪ Shifts common in logical ops, also in arithmetic ▪ RISC-V has:
– 2 flavors of shift: logical and arithmetic
– 2 directions of shift: right and left
– 2 sources for shift amount: immediate, variable
▪ Lots of cool shift algorithms, but …
– Barrel shifter prevalent in today’s hardware
93 UC Davis EEC 170, Winter 2021 / © John Owens
Floating Point
▪ Representation for non-integral numbers
– Including very small and very large numbers
▪ Like scientific notation
– –2.34 × 1056
– +0.002 × 10–4
– +987.02 × 109 ▪ In binary
– ±1.xxxxxxx2 × 2yyyy
▪ Typesfloatanddoublein C
normalized
not normalized
94
UC Davis EEC 170, Winter 2021 / © John Owens
§3.5 Floating Point
Floating Point Standard
▪ Defined by IEEE Std 754-1985
▪ Developed in response to divergence of representations
– Portability issues for scientific code ▪ Now almost universally adopted
▪ Two representations
– Single precision (32-bit)
– Double precision (64-bit)
95 UC Davis EEC 170, Winter 2021 / © John Owens
96 UC Davis EEC 170, Winter 2021 / © John Owens
Floating-point Formats
▪ Single-precision (32 bits)
▪ Double precision (64 bits)
97 UC Davis EEC 170, Winter 2021 / © John Owens
IEEE Floating-Point Format
x = (−1)S ×(1+Fraction)×2(Exponent−Bias)
▪ S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
▪ Normalize significand: 1.0 ≤ |significand| < 2.0
- Always has a leading pre-binary-point 1 bit, so no need to
represent it explicitly (hidden bit)
- Significand is Fraction with the “1.” restored
Fraction: single: 23 bits double: 52 bits
▪ Exponent: excess representation: actual exponent + Bias
- Ensures exponent is unsigned
- Single: Bias = 127; Double: Bias = 1023
98 UC Davis EEC 170, Winter 2021 / © John Owens
Exponent: single: 8 bits double: 11 bits
Single-Precision Range
▪ Exponents 00000000 and 11111111 reserved
▪ Smallest value
- Exponent: 00000001
⇒ actual exponent = 1 – 127 = –126
- Fraction: 000...00 ⇒ significand = 1.0
- ±1.0 × 2–126 ≈ ±1.2 × 10–38
▪ Largest value
- exponent: 11111110
⇒ actual exponent = 254 – 127 = +127
- Fraction: 111...11 ⇒ significand ≈ 2.0
- ±2.0 × 2+127 ≈ ±3.4 × 10+38
99
UC Davis EEC 170, Winter 2021 / © John Owens
Double-Precision Range
▪ Exponents 0000...00 and 1111...11 reserved
▪ Smallest value
- Exponent: 00000000001
⇒ actual exponent = 1 – 1023 = –1022
- Fraction: 000...00 ⇒ significand = 1.0
- ±1.0 × 2–1022 ≈ ±2.2 × 10–308
▪ Largest value
- Exponent: 11111111110
⇒ actual exponent = 2046 – 1023 = +1023
- Fraction: 111...11 ⇒ significand ≈ 2.0
- ±2.0 × 2+1023 ≈ ±1.8 × 10+308
100
UC Davis EEC 170, Winter 2021 / © John Owens
Floating-Point Precision
▪ Relative precision
- all fraction bits are significant
- Single: approx 2–23
- Equivalent to 23 × log102 ≈ 23 × 0.3 ≈ 6 decimal digits of
precision
- Double: approx 2–52
- Equivalent to 52 × log102 ≈ 52 × 0.3 ≈ 16 decimal digits of precision
101
UC Davis EEC 170, Winter 2021 / © John Owens
Floating-Point Example
▪ Represent –0.75 = –(0.5 + 0.25) = –(1.0 + 0.5) / 2
- -
–0.75 = (–1)1 × 1.12 × 2–1 - S =1
Fraction = 1000...002 - Exponent = –1 + Bias
- Single: –1 + 127 = 126 = 011111102
- Double: –1 + 1023 = 1022 = 011111111102
▪ Single: 1011111101000...00
▪ Double: 1011111111101000...00
102
UC Davis EEC 170, Winter 2021 / © John Owens
Floating-Point Example
▪ What number is represented by the single-precision float 11000000101000...00
- S =1
-
-
▪
Fraction = 01000...002
Fxponent = 100000012 = 129 x = (–1)1 × (1 + .012) × 2(129 – 127)
= (–1) × 1.25 × 22 = –5.0
103
UC Davis EEC 170, Winter 2021 / © John Owens
Denormal Numbers
▪ Exponent = 000...0 ⇒ hidden bit is 0
x=(−1)S ×(0+Fraction)×2−Bias
▪ Smaller than normal numbers
- allow for gradual underflow, with diminishing precision
▪ Denormal with fraction = 000...0
x=(−1)S ×(0+0)×2−Bias =±0.0
Two representations of 0.0!
104
UC Davis EEC 170, Winter 2021 / © John Owens
Infinities and NaNs
▪ Exponent = 111...1, Fraction = 000...0
- ±Infinity
- Can be used in subsequent calculations, avoiding need for overflow check
▪ Exponent = 111...1, Fraction ≠ 000...0
- Not-a-Number (NaN)
- Indicates illegal or undefined result
- e.g., 0.0 / 0.0
- Can be used in subsequent calculations
105
UC Davis EEC 170, Winter 2021 / © John Owens
Half-precision Floating Point
https://en.wikipedia.org/wiki/Bfloat16_floating-point_format 106 UC Davis EEC 170, Winter 2021 / © John Owens