CS代考计算机代写 computer architecture compiler scheme mips RISC-V algorithm Lecture 6:

Lecture 6:
Arithmetic 2/3
John Owens
Introduction to Computer Architecture UC Davis EEC 170, Winter 2021

Cascading Adders
▪ Cascade Full Adders to make multibit adder:
▪ A+B=S
A B
Cin
A B
Cin
A B
Cin
A B
Cin
Cout Sum
Cout Sum
Cout Sum
Cout Sum
A3
Full Adder
B3
A2
S3
Full Adder
B2
S2
A1
Full Adder
B1
S1
A0
Full Adder
B0
S0
40
UC Davis EEC 170, Winter 2021 / © John Owens

But What About Performance? ▪ Critical path of one bitslice is CP
▪ Critical path of n-bit rippled-carry adder is n*CP
– Thought experiment:
– @ 7 nm: FO4 is ~2.5 ps
– 2 gate delays * 64b = 320 ps
– 4 GHz clock period is 250 ps
▪ Design Trick:
– Throw hardware at it
A0 B0
A1 B1
A2 B2
A3 B3
CarryIn0
1-bit ALU
CarryOut0
Result0
Result1
Result2
Result3
CarryIn1
1-bit ALU
CarryOut1
CarryIn2
1-bit ALU
CarryOut2 CarryIn3
CarryOut3
41
UC Davis EEC 170, Winter 2021 / © John Owens
1-bit ALU

Truth Table for Adder Bit Slice ▪ 3 inputs (A, B, Cin); 2 outputs (Sum, Cout)
A
B
Cin
Sum
Cout
0
0
0
0
0
0
0
1
1
0
0
1
0
1
0=Cin
0
1
1
0
1=Cin
1
0
0
1
0=Cin
1
0
1
0
1=Cin
1
1
0
0
1
1
1
1
1
1
42 UC Davis EEC 170, Winter 2021 / © John Owens

A0 B0
A1 B1
A2 B2
A3 B3
Carry Look Ahead (Design trick: peek)
C0 = Cin
S
S
S
S
C4 = . . .
G
P
G = A and B P = A xor B WHY are these interesting?
43
UC Davis EEC 170, Winter 2021 / © John Owens
A B 0 0
0 1
1 0
1 1
Cout
0 “kill”
Cin “propagate” Cin “propagate” 1 “generate”
G P
G P
C2 = G1 + G0 · P1 + C0 · P0 · P1
C1 = G0 + C0 · P0
G P
C3 = G2 + G1 · P2 + G0 · P1 · P2 + C0 · P0 · P1 · P2
G P

CLA vs. Ripple
C0 = Cin
S
G = A and B P = A xor B
A0 B0
CarryIn0 A0
Result0
Result1
Result2
Result3
G P
B0 A1 A1
B1
B1
A2 B2
1-bit ALU
C1 = G0 + C0 · P0
C2 = G1 + G0 · P1 + C0 · P0 · P1
C3 = G2 + G1 · P2 + G0 · P1 · P2 + C0 · P0 · P1 · P2
G
P
CarryOut0 CarryIn1
S
1-bit ALU
G P
A2
A3 B3
CarryOut1 CarryIn2
S
1-bit ALU
G P
B2
A3 B3
CarryOut2 CarryIn3
1-bit ALU
CarryOut3
G P
S
C4 = . . .
44
UC Davis EEC 170, Winter 2021 / © John Owens

Cascaded Carry Look-ahead (16-bit)
C
L C0
▪ Abstraction!
▪ Good exercise to work
A
4-bit Adder
4-bit Adder
4-bit Adder
G0 P0
C1 = G0 + C0 · P0
out what G and P are for a 4-bit adder
C2 = G1 + G0 · P1 + C0 · P0 · P1
C3 = G2 + G1 · P2 + G0 · P1 · P2 + C0 · P0 · P1 · P2
G P
C4 = . . .
45
UC Davis EEC 170, Winter 2021 / © John Owens

Design Trick: Guess (or “Precompute”)
CP(2n) = 2*CP(n)
n-bit adder
n-bit adder
CP(2n) = CP(n) + CP(mux)
n-bit adder
1
n-bit adder
0
n-bit adder
Cout
Carry-select adder
46
UC Davis EEC 170, Winter 2021 / © John Owens

Carry Skip Adder: reduce worst case delay
B A4 B A0
P3 S P3 S
4-bit Ripple Adder
4-bit Ripple Adder
P2
P1
P0
P2
P1
P0
▪ Just speed up the slowest case for each block
▪ Exercise: optimal design uses variable block sizes (why?)
47 UC Davis EEC 170, Winter 2021 / © John Owens

Adder Lessons
▪ Reuse hardware if possible
– +/- reuse is compelling argument for 2’s complement
▪ For higher performance:
– Look for critical path, optimize for it Reorganize equations [propagate/generate / carry lookahead]
– Precompute [carry save]
– Reduce worst-case delay [carry skip]
48 UC Davis EEC 170, Winter 2021 / © John Owens

Multiply (unsigned)
▪ Paper and pencil example (unsigned):
Multiplicand 1000 Multiplier x1001 1000
0000 0000
1000 Product 01001000
▪ m bits x n bits = m+n bit product
▪ Binary makes it easy:
– 0 => place 0 ( 0 x multiplicand )
– 1 => place a copy ( 1 x multiplicand )
▪ 4 versions of multiply hardware & algorithm (see book):
– successive refinement
49 UC Davis EEC 170, Winter 2021 / © John Owens
§3.3 Multiplication

m bits x n bits = m+n bit product
How do we store a 128b result from a 64b x 64b multiply?
50 UC Davis EEC 170, Winter 2021 / © John Owens

Unsigned Combinational Multiplier
0000
A3 A2 A1 A0
A3 A2
A3 A2
A3 A2
P7 P6 P5 P4 P3 P2 P1 P0
B0 B1
B2 B3
A1
A0
A1
A0
A1
A0
51 UC Davis EEC 170, Winter 2021 / © John Owens

Unsigned Combinational Multiplier
▪ ▪

At each stage shift A left (multiply it by 2)
Use next bit of B to determine whether to add in shifted multiplicand (Stage i A3 accumulates A * 2i if Bi == 1)
A3 A2
0000
A3 A2 A1 A0 A2
B0 B1
B2 B3
A1
A0
A1
A0
Accumulate 2n bit partial product at each stage
A3 A2
A1
A0
▪ Q: How much
hardware for 32 bit multiplier? Critical path?
P7 P6 P5 P4 P3 P2 P1 P0
52 UC Davis EEC 170, Winter 2021 / © John Owens

Unsigned shift-add multiplier (version 1)
▪ 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit multiplier reg
Multiplicand
64 bits
Shift Left
Multiplier
32 bits
Control
Shift Right
64-bit ALU
Product
64 bits
Write
Multiplier = datapath + control
53
UC Davis EEC 170, Winter 2021 / © John Owens

Multiply Algorithm V1
Product
0000 0000 0011
1:0000 0010 0011 2:0000 0010 0011 3:0000 0010 0001 1:0000 0110 0001 2:0000 0110 0001 3:0000 0110 0000
0000 0110 0000
Start
1. Test Multiplier0
Multiplier Multiplicand 0000 0010
Multiplier0 = 1
Multiplier0 = 0
0000 0010
0000 0100
0000 0100 0000 0100 0000 1000 0000 1000
0000 1000
1a. Add multiplicand to product & place the result in Product register
54
UC Davis EEC 170, Winter 2021 / © John Owens
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
No: < 32 repetitions Yes: 32 repetitions Done How can we make this more efficient? ▪ Remember original combinational multiplier: 0000 A3 A2 A1 A0 A3 A2 A3 A2 A3 A2 P7 P6 P5 P4 P3 P2 P1 P0 B0 B1 B2 B3 A1 A0 A1 A0 A1 A0 55 UC Davis EEC 170, Winter 2021 / © John Owens Simply warp to let product move right... 0000 A3 A2 A1 A0 A3 A2 A1 A0 A3 A2 A1 A0 A3 A2 A1 A0 B0 B1 B2 B3 P7 P6 P5 P4 P3 P2 P1 P0 ▪ Multiplicand stays still and product moves right 56 UC Davis EEC 170, Winter 2021 / © John Owens Multiply Hardware Version 3 ▪ 32-bit Multiplicand reg, 32 -bit ALU, 64-bit Product reg, (0-bit Multiplier reg) - Your book has details on this Multiplicand 32 bits 32-bit ALU Product (Multiplier) 64 bits Shift Right Write Control 57 UC Davis EEC 170, Winter 2021 / © John Owens Faster Multiplier ▪ Uses multiple adders - Cost/performance tradeoff ▪ Can be pipelined - Several multiplications performed in parallel 58 UC Davis EEC 170, Winter 2021 / © John Owens Motivation for Booth’s Algorithm ▪ ▪ Example 2 x 6 = 0010 x 0110: 0010 x 0110 + + + + 0000 shift (0 in multiplier) 0010 add (1 in multiplier) 0010 add (1 in multiplier) 0000 shift (0 in multiplier) 0001100 59 UC Davis EEC 170, Winter 2021 / © John Owens Motivation for Booth’s Algorithm ▪ ALU with add or subtract can get same result in more than one way: ▪ 6 = 4 + 2 = -2 + 8 0110 = -00010 + 01000 = 11110 + 01000 ▪ For example: 0010 (2 [multiplier]) x 0110 (6 [multiplicand]) 0000 - 0010 0000 + 0010 00001100 shift (0 in multiplier) sub (first 1 in multpl.) shift (mid string of 1s) add (prior step had last 1) 60 UC Davis EEC 170, Winter 2021 / © John Owens Booth’s Algorithm end of run ▪ Current Bit 1 0 1 1 0 1 0 0 011110 Bit to the Right Explanation Begins run of 1s Middle of run of 1s End of run of 1s Example Op 0001111000 sub 0001111000 none 0001111000 add 0001111000 none Middle of run of 0s ▪ Originally for speed (when shift was faster than add) middle of run ▪ Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one ▪ Handles two’s complement! beginning of run 61 UC Davis EEC 170, Winter 2021 / © John Owens Booth’s Example (2 x 7) Big picture: Treat 7 as 8 - 1 Operation 0. initial value 1a. P = P - m 1b. 2. 3. 4a. 4b. Multiplicand 0010 Product 0000 0111 0 1110 0111 0 1111 0011 1 1111 1001 1 1111 1100 1 0001 1100 1 0000 1110 0 next? 10 -> sub
shift P (sign ext) 11 -> nop, shift 11 -> nop, shift 01 -> add
shift done
Blue + red = multiplier
(red is 2 bits to compare); Green = arithmetic ops; Black = product
62 UC Davis EEC 170, Winter 2021 / © John Owens
0010
0010
0010
0010
0010
0010
+ 1110->
+ 0010->

Booth’s Example (2 x -3)
Operation
0. initial value
1a. P = P -m 1b.
2a.
2b.
3a.
3b. 4a.
4b.
Blue + red = multiplier (red is 2 bits to compare); Green = arithmetic ops; Black = product
Multiplicand 0010
1110 + 1110 0010
0010
+ 1110
0010 0010
Product
0000 1101 0 1110 1101 0
01
next?
10 -> sub
shift P (sign ext) 01 -> add + 0010
1111 011
0001 011 0000 1011 0
01 shift P
10 -> sub shift
1110 1011 0
1111 0101 1 11 -> nop
0010
1111 0101 1 1111 1010 1
shift done
63
UC Davis EEC 170, Winter 2021 / © John Owens

Radix-4 Modified Booth’s Algorithm
Current Bits
00 01 10 11
00 01 10 11
Bit to the Right
0 0 0 0
1 1 1 1
Explanation Example Recode
Middle of zeros Single one Begins run of 1s Begins run of 1s
Ends run of 1s Ends run of 1s Isolated 0 Middle of run
00 00 00 00 00 00 00 00 01 00 00 01 11 10 00 00 01 11 11 00
00 00 11 11 00 00 01 11 11 00 00 11 10 11 00 00 11 11 11 00
0
1 -2 -1
1
2 -1 0
Same insight as one-bit
Allows multiplication 2 bits at a time.
Booth’s, simply adjust for alignment of 2 bits.
64
UC Davis EEC 170, Winter 2021 / © John Owens

RISC-V Multiplication Support
▪ Four multiply instructions: – mul: multiply
– Gives the lower 64 bits of the product – mulh: multiply high
– Gives the upper 64 bits of the product, assuming the operands are signed
– mulhu: multiply high unsigned
– Gives the upper 64 bits of the product, assuming the operands are unsigned
– mulhsu: multiply high signed/unsigned
– Gives the upper 64 bits of the product, assuming one operand is signed and the other unsigned
– Use mulh result to check for 64-bit overflow
65 UC Davis EEC 170, Winter 2021 / © John Owens

RISC-V Support for multiply
▪ “If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order andrdhcannot be the same asrs1or
rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies.”
66 UC Davis EEC 170, Winter 2021 / © John Owens

Multiplication Summary
▪ Iterative algorithm ▪ Design techniques:
– Analyze hardware—what’s not in use?
– Spend more hardware to get higher performance
– Booth’s Algorithm—more general (2’s complement)
– Booth’s Algorithm—recoding is powerful technique to think about problem in a different way
– Booth’s Algorithm—more bits at once gives higher performance
67 UC Davis EEC 170, Winter 2021 / © John Owens

RISC-V Support for divide
▪ 4 instructions:
-{div, divu, rem, remu} rd, rs1, rs2
– – – –
▪ “If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr,
rs1, rs2 (rdq cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single divide operation instead of performing two separate divides.”
▪ Overflow and division-by-zero don’t produce errors
– Just return defined results
– Faster for the common case of no error
div: rs1 / rs2, treat as signed
divu: rs1 / rs2, treat as unsigned
rem: rs1 mod rs2, treat as signed
remu: rs1 mod rs2, treat as unsigned
72 UC Davis EEC 170, Winter 2021 / © John Owens

MIPS Support for multiply/divide
▪ Rather than target the general-purpose registers:
– –
hi
mul placed its output into two specialhiandloregisters div placed its divide output intoloand its rem output into

general-purpose register)
MIPS providedmfloandmfhiinstructions (destination:
73 UC Davis EEC 170, Winter 2021 / © John Owens

Divide: Paper & Pencil
1001 Quotient Divisor 1000 1001010 Dividend
–100100
101
1010
–1000
10 Remainder
(or Modulo result)

▪ ▪
See how big a number can be subtracted, creating quotient bit on each step
– Binary => 1 * divisor or 0 * divisor Dividend = Quotient x Divisor + Remainder 3 versions of divide, successive refinement
74 UC Davis EEC 170, Winter 2021 / © John Owens

Divide Hardware Version 1
▪ 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
Divisor
64 bits
Shift Right
64-bit ALU
Remainder
64 bits
Write
Quotient
32 bits
Control
Shift Left
75
UC Davis EEC 170, Winter 2021 / © John Owens

Divide Algorithm V1
▪ Takes n+1 steps for n-bit Quotient & Rem.
Start: Place Dividend in Remainder
▪ Remainder Quotient 0000 0111 0000
Divisor 0010 0000
Remainder ≥ 0
1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.
Test Remainder
Remainder < 0 2a. Shift the Quotient register to the left setting the new rightmost bit to 1. 2b. Restore the original value by adding the Divisor register to the Remainder register, & place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0. “restoring” division 3. Shift the Divisor register right1 bit. Yes: n+1 repetitions (n = 4 here) Done n+1 repetition? No: < n+1 repetitions 76 UC Davis EEC 170, Winter 2021 / © John Owens Divide Algorithm I example (7 / 2) Remainder Quotient Divisor 0000 0111 1:1110 0111 00000 2:0000 0111 0000 3:0000 0111 0000 1:1111 0111 0000 2:0000 0111 000 000000010 0000 0010 0000 0010 0000 0001 0000 0001 0000 0001 0000 0000 1000 0000 1000 0000 1000 0000 0100 0000 0100 0000 0100 0000 0010 0000 0010 0000 0010 0000 0001 Answer: Quotient = 3 Remainder = 1 0 0 0 0 00 0 3:0000 0111 000 1:11111111 000 2:0000 0111 00 00 00 0 3:0000 0111 00 000 1:0000 0011 00 000 0001 2:0000 0011 0 3:0000 0011 0 0001 0001 2:0000 0001 00011 3:0000 0001 00011 1:0000 0001 0 77 UC Davis EEC 170, Winter 2021 / © John Owens Divide Hardware Version 3 ▪ 32-bit Divisor reg, 32 -bit ALU, 64-bit Remainder reg, (0-bit Quotient reg) Divisor 32 bits 32-bit ALU “HI” “LO” Shift Left Write Remainder (Quotient) Control 64 bits 78 UC Davis EEC 170, Winter 2021 / © John Owens Final Multiply / Divide Hardware Multiplicand 32 bits 32-bit ALU 64 bits 32 bits 32-bit ALU “HI” Shift Right Write Control Product (Multiplier) Divisor “LO” Shift Left Write Remainder (Quotient) Control 64 bits 79 UC Davis EEC 170, Winter 2021 / © John Owens SRT Division D. Sweeney of IBM, J.E. Robertson of the University of Illinois, and T.D. Tocher of Imperial College, London Current Remainder P-D (Partial Divisor) Plot 9|9 43211111 8|8 42211110 7|7 32111100 6|6 32111000 5 | 5 2 1 1 1 0 00 0 4 | 4 2 1 1 0 0 00 0 3|3 11000000 2|2 10000000 1|1 00000000 0 +------------------ 0 1 2 3 4 5 6 78 9 80 UC Davis EEC 170, Winter 2021 / © John Owens SRT Division ▪ Intel Pentium divide implementation: SRT division with 2 bits/iteration (radix 4) ▪ Allows negative entries ▪ 1066 entries in lookup table 81 UC Davis EEC 170, Winter 2021 / © John Owens [http://members.cox.net/srice1/ pentbug/introduction.html] Faster Division ▪ Can’t use parallel hardware as in multiplier - Subtraction is conditional on sign of remainder ▪ Faster dividers (e.g. SRT division) generate multiple quotient bits per step - Still require multiple steps 82 UC Davis EEC 170, Winter 2021 / © John Owens Division Lessons ▪ In practice, slower than multiplication - Also less frequent - But, in the simple case, can use same hardware! ▪ Generates quotient and remainder together ▪ Floating-point division faster than integer division (why?) ▪ Similar hardware lessons as multiplier: - Look for unused hardware - Can process multiple bits at once at cost of extra hardware 83 UC Davis EEC 170, Winter 2021 / © John Owens RISC-V logical instructions Instruction Meaning Pseudocode XORI rd,rs1,imm SRAI rd,rs1,imm Exclusive Or Immediate rd ← ux(rs1) ⊕ ux(imm) ORI rd,rs1,imm Or Immediate rd ← ux(rs1) ∨ ux(imm) ANDI rd,rs1,imm SLLI rd,rs1,imm And Immediate rd ← ux(rs1) ∧ ux(imm) Shift Left Logical Immediate rd ← ux(rs1) « ux(imm) SRLI rd,rs1,imm Shift Right Logical Immediate rd ← ux(rs1) » ux(imm) SLL rd,rs1,rs2 SRL rd,rs1,rs2 Shift Right Arithmetic Immediate rd ← sx(rs1) » ux(imm) Shift Left Logical rd ← ux(rs1) « rs2 XOR rd,rs1,rs2 Exclusive Or rd ← ux(rs1) ⊕ ux(rs2) Shift Right Logical rd ← ux(rs1) » rs2 SRA rd,rs1,rs2 OR rd,rs1,rs2 Shift Right Arithmetic rd ← sx(rs1) » rs2 Or rd ← ux(rs1) ∨ ux(rs2) AND rd,rs1,rs2 And rd ← ux(rs1) ∧ ux(rs2) 84 UC Davis EEC 170, Winter 2021 / © John Owens Manipulating Bits with Shift, Or, And ▪ ▪ ▪ ▪ ▪ ▪ Mask out exponent with and S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM & 0 11111111 00000000000000000000000 ------------------------------------ 0 EEEEEEEE 00000000000000000000000 Right shift 23 bits to get 000000000000000000000000 EEEEEEEE Do arithmetic manipulation 000000000000000000000000 ENEWENEW Left shift 23 bits to get 0 ENEWENEW 00000000000000000000000 Zero out old exponent S EEEEEEEE MMMMMMMMMMMMMMMMMMMMMMM & 1 00000000 11111111111111111111111 ------------------------------------ S 00000000 MMMMMMMMMMMMMMMMMMMMMMM Or in new exponent S 00000000 MMMMMMMMMMMMMMMMMMMMMMM | 0 ENEWENEW 00000000000000000000000 ------------------------------------ S ENEWENEW MMMMMMMMMMMMMMMMMMMMMMM 85 UC Davis EEC 170, Winter 2021 / © John Owens Shift Operations ▪ Arithmetic operation: - Example: 00011 << 2 [3 left shift 2] - 00011 << 2 = 01100 = 12 = 2 * 4 - Each bit shifted left == multiply by two - Example: 01010 >> 1 [10 right shift 1]
– 01010 >> 1 = 00101 = 5 = 10/2
– Each bit shifted right == divide by two
– Why?
– Compilers do this—“strength reduction”
86 UC Davis EEC 170, Winter 2021 / © John Owens

Shift Operations


With left shift, what do we shift in?
– 00011 << 2 = 01100 (arithmetic) - 0000XXXX << 4 = XXXX0000 (logical) - We shifted in zeroes How about right shift? - XXXX0000 >> 4 = 0000XXXX (logical) – Shifted in zero
– 00110 (= 6) >> 1 = 00011 (3) (arithmetic) – Shifted in zero
– 11110 (= -2) >> 1 = 11111 (-1) (arithmetic) – Shifted in one
87 UC Davis EEC 170, Winter 2021 / © John Owens

Shift Operations
▪ How about right shift?
– XXXX0000 >> 4 = 0000XXXX: Logical shift
– Shifted in zero
– 00110 (= 6) >> 1 = 00011 (3)
11110 (= -2) >> 1 = 11111 (-1): Arithmetic shift
– Shifted in sign bit
▪ RISC-V supports both logical and arithmetic:
– slli, srai, srli: Shift amount taken from within instruction (“imm”)
6 bits 6 bits 5 bits 3 bits 5 bits
7 bits 5 bits 5 bits 3 bits 5 bits
– sll, sra, srl: shift amount taken from register (“variable”)
– How far can we shift with slli/srai/slli? With sll/sra/srl?
7 bits
7 bits
funct6
shamt
rs1
funct3
rd
opcode
funct7
rs2
rs1
funct3
rd
opcode
88
UC Davis EEC 170, Winter 2021 / © John Owens

Combinational Shifter from MUXes
Basic Building Block
AB sel
D
8-bit right shifter
A7 A6 A5 A4 A3 A2 A1 A0
S2 S1 S0
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
▪ What comes in the MSBs?
▪ How many levels for 64-bit shifter? ▪ What if we use 4-1 Muxes ?
R7 R6 R5 R4 R3 R2 R1 R0
89 UC Davis EEC 170, Winter 2021 / © John Owens
10

General Shift Right Scheme using 16 bit example
S0 (0,1)
S1 (0,2)
S2 (0,4)
S3 (0,8)
▪ If we added right-to-left connections, we could supportROTATE (not in RISC-V but found in other ISAs)
90 UC Davis EEC 170, Winter 2021 / © John Owens

Funnel Shifter ▪ Shift A by i bits
▪ Problem: Set Y, X, sa ▪ Logical:
▪ Arithmetic: ▪ Rotate:
▪ Left shifts:
Extract 64 bits of 128 (sa selects which bits)
| sa |
Y
X
Y
64
X
64
R
Shift Right
91
UC Davis EEC 170, Winter 2021 / © John Owens
64 R

Barrel Shifter
▪ Technology-dependent solutions: transistor per switch
in6
in5
in4
out3
out2
out1
out0
SR3
SR2 SR1 SR0
in3
in2 in1 in0
92
UC Davis EEC 170, Winter 2021 / © John Owens

Shifter Summary
▪ Shifts common in logical ops, also in arithmetic ▪ RISC-V has:
– 2 flavors of shift: logical and arithmetic
– 2 directions of shift: right and left
– 2 sources for shift amount: immediate, variable
▪ Lots of cool shift algorithms, but …
– Barrel shifter prevalent in today’s hardware
93 UC Davis EEC 170, Winter 2021 / © John Owens

Floating Point
▪ Representation for non-integral numbers
– Including very small and very large numbers
▪ Like scientific notation
– –2.34 × 1056
– +0.002 × 10–4
– +987.02 × 109 ▪ In binary
– ±1.xxxxxxx2 × 2yyyy
▪ Typesfloatanddoublein C
normalized
not normalized
94
UC Davis EEC 170, Winter 2021 / © John Owens
§3.5 Floating Point

Floating Point Standard
▪ Defined by IEEE Std 754-1985
▪ Developed in response to divergence of representations
– Portability issues for scientific code ▪ Now almost universally adopted
▪ Two representations
– Single precision (32-bit)
– Double precision (64-bit)
95 UC Davis EEC 170, Winter 2021 / © John Owens

96 UC Davis EEC 170, Winter 2021 / © John Owens

Floating-point Formats
▪ Single-precision (32 bits)
▪ Double precision (64 bits)
97 UC Davis EEC 170, Winter 2021 / © John Owens

IEEE Floating-Point Format
x = (−1)S ×(1+Fraction)×2(Exponent−Bias)
▪ S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
▪ Normalize significand: 1.0 ≤ |significand| < 2.0 - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit) - Significand is Fraction with the “1.” restored Fraction: single: 23 bits double: 52 bits ▪ Exponent: excess representation: actual exponent + Bias - Ensures exponent is unsigned - Single: Bias = 127; Double: Bias = 1023 98 UC Davis EEC 170, Winter 2021 / © John Owens Exponent: single: 8 bits double: 11 bits Single-Precision Range ▪ Exponents 00000000 and 11111111 reserved ▪ Smallest value - Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126 - Fraction: 000...00 ⇒ significand = 1.0 - ±1.0 × 2–126 ≈ ±1.2 × 10–38 ▪ Largest value - exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127 - Fraction: 111...11 ⇒ significand ≈ 2.0 - ±2.0 × 2+127 ≈ ±3.4 × 10+38 99 UC Davis EEC 170, Winter 2021 / © John Owens Double-Precision Range ▪ Exponents 0000...00 and 1111...11 reserved ▪ Smallest value - Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022 - Fraction: 000...00 ⇒ significand = 1.0 - ±1.0 × 2–1022 ≈ ±2.2 × 10–308 ▪ Largest value - Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023 - Fraction: 111...11 ⇒ significand ≈ 2.0 - ±2.0 × 2+1023 ≈ ±1.8 × 10+308 100 UC Davis EEC 170, Winter 2021 / © John Owens Floating-Point Precision ▪ Relative precision - all fraction bits are significant - Single: approx 2–23 - Equivalent to 23 × log102 ≈ 23 × 0.3 ≈ 6 decimal digits of precision - Double: approx 2–52 - Equivalent to 52 × log102 ≈ 52 × 0.3 ≈ 16 decimal digits of precision 101 UC Davis EEC 170, Winter 2021 / © John Owens Floating-Point Example ▪ Represent –0.75 = –(0.5 + 0.25) = –(1.0 + 0.5) / 2 - - –0.75 = (–1)1 × 1.12 × 2–1 - S =1 Fraction = 1000...002 - Exponent = –1 + Bias - Single: –1 + 127 = 126 = 011111102 - Double: –1 + 1023 = 1022 = 011111111102 ▪ Single: 1011111101000...00 ▪ Double: 1011111111101000...00 102 UC Davis EEC 170, Winter 2021 / © John Owens Floating-Point Example ▪ What number is represented by the single-precision float 11000000101000...00 - S =1 - - ▪ Fraction = 01000...002 Fxponent = 100000012 = 129 x = (–1)1 × (1 + .012) × 2(129 – 127) = (–1) × 1.25 × 22 = –5.0 103 UC Davis EEC 170, Winter 2021 / © John Owens Denormal Numbers ▪ Exponent = 000...0 ⇒ hidden bit is 0 x=(−1)S ×(0+Fraction)×2−Bias ▪ Smaller than normal numbers - allow for gradual underflow, with diminishing precision ▪ Denormal with fraction = 000...0 x=(−1)S ×(0+0)×2−Bias =±0.0 Two representations of 0.0! 104 UC Davis EEC 170, Winter 2021 / © John Owens Infinities and NaNs ▪ Exponent = 111...1, Fraction = 000...0 - ±Infinity - Can be used in subsequent calculations, avoiding need for overflow check ▪ Exponent = 111...1, Fraction ≠ 000...0 - Not-a-Number (NaN) - Indicates illegal or undefined result - e.g., 0.0 / 0.0 - Can be used in subsequent calculations 105 UC Davis EEC 170, Winter 2021 / © John Owens Half-precision Floating Point https://en.wikipedia.org/wiki/Bfloat16_floating-point_format 106 UC Davis EEC 170, Winter 2021 / © John Owens