编程代写 COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
Arithmetic for Computers

Arithmetic for Computers
 Operations on integers
 Addition and subtraction  Multiplication and division  Dealing with overflow
 Floating-point real numbers
 Representation and operations
Chapter 3 — Arithmetic for Computers — 2
§3.1 Introduction

Integer Addition
 Example: 7 + 6
 Overflow if result out of range
 Adding +ve and –ve operands, no overflow
 Adding two +ve operands  Overflow if result sign is 1
 Adding two –ve operands  Overflow if result sign is 0
Chapter 3 — Arithmetic for Computers — 3
§3.2 Addition and Subtraction

Integer Subtraction
 Add negation of second operand  Example:7–6=7+(–6)
+7: 0000 0000 … 0000 0111 –6: 1111 1111 … 1111 1010 +1: 0000 0000 … 0000 0001
 Overflow if result out of range
 Subtracting two +ve or two –ve operands, no overflow
 Subtracting +ve from –ve operand  Overflow if result sign is 0
 Subtracting –ve from +ve operand  Overflow if result sign is 1
Chapter 3 — Arithmetic for Computers — 4

Multiplication
 Start with long-multiplication approach
multiplicand
multiplier
1000 1001000
Length of product is the sum of operand lengths (with carry)
Chapter 3 — Arithmetic for Computers — 5
§3.3 Multiplication

Multiplication Hardware
Initially 0
Chapter 3 — Arithmetic for Computers — 6

Optimized Multiplier
 Perform steps in parallel: add/shift
 One cycle per partial-product addition
 That’s ok, if frequency of multiplications is low
Chapter 3 — Arithmetic for Computers — 7

Faster Multiplier
 Uses multiple adders
 Cost/performance tradeoff
 Can be pipelined
 Several multiplications performed in parallel
Chapter 3 — Arithmetic for Computers — 8

RISC-V Multiplication
 Four multiply instructions:
 mul: multiply
 Gives the lower 64 bits of the product
 mulh: multiply high
 Gives the upper 64 bits of the product, assuming the
operands are signed
 mulhu: multiply high unsigned
 Gives the upper 64 bits of the product, assuming the operands are unsigned
 mulhsu: multiply high signed/unsigned
 Gives the upper 64 bits of the product, assuming one
operand is signed and the other unsigned
 Use mulh result to check for 64-bit overflow
Chapter 3 — Arithmetic for Computers — 9

quotient dividend
1000 1001010
n-bit operands yield n-bit quotient and remainder
 Check for 0 divisor
 Long division approach
 If divisor ≤ dividend bits
 1 bit in quotient, subtract
 Otherwise
 0 bit in quotient, bring down next
dividend bit
 Restoring division
 Do the subtract, and if remainder goes < 0, add divisor back  Signed division  Divide using absolute values  Adjust sign of quotient and remainder as required Chapter 3 — Arithmetic for Computers — 10 §3.4 Division Division Hardware Initially divisor in left half Initially dividend Chapter 3 — Arithmetic for Computers — 11 Optimized Divider  One cycle per partial-remainder subtraction  Looks a lot like a multiplier!  Same hardware can be used for both Chapter 3 — Arithmetic for Computers — 12 Faster Division  Can’t use parallel hardware as in multiplier  Subtraction is conditional on sign of remainder  Faster dividers generate multiple quotient bits per step  Still require multiple steps Chapter 3 — Arithmetic for Computers — 13 RISC-V Division  Four instructions:  div, rem: signed divide, remainder  divu, remu: unsigned divide, remainder  Overflow and division-by-zero don’t produce errors  Just return defined results  Faster for the common case of no error (Note that the RVS returns errors) Chapter 3 — Arithmetic for Computers — 14 Floating Point  Representation for non-integral numbers  Including very small and very large numbers  Like scientific notation  –2.34 × 1056  +0.002 × 10–4  +987.02 × 109  In binary  ±1.xxxxxxx2 × 2yyyy normalized not normalized  Types float and double in C Chapter 3 — Arithmetic for Computers — 15 §3.5 Floating Point Floating Point Standard  Defined by IEEE Std 754-1985  Developed in response to divergence of representations  Portability issues for scientific code  Now almost universally adopted  Two representations  Single precision (32-bit)  Double precision (64-bit) Chapter 3 — Arithmetic for Computers — 16 IEEE Floating-Point Format single: 8 bits single: 23 bits double: 11 bits double: 52 bits x = (−1)S ×(1+Fraction)×2(Exponent−Bias)  S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)  Normalize significand: 1.0 ≤ |significand| < 2.0  Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)  Significand is Fraction with the “1.” restored  Exponent: excess representation: actual exponent + Bias  Ensures exponent is unsigned  Single: Bias = 127; Double: Bias = 1023 Chapter 3 — Arithmetic for Computers — 17 Single-Precision Range  Exponents 00000000 and 11111111 reserved  Smallest value  Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126  Fraction: 000...00 ⇒ significand = 1.0  ±1.0 × 2–126 ≈ ±1.2 × 10–38  Largest value  exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127  Fraction: 111...11 ⇒ significand ≈ 2.0  ±2.0 × 2+127 ≈ ±3.4 × 10+38 Chapter 3 — Arithmetic for Computers — 18 Double-Precision Range  Exponents 0000...00 and 1111...11 reserved  Smallest value  Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022  Fraction: 000...00 ⇒ significand = 1.0  ±1.0 × 2–1022 ≈ ±2.2 × 10–308  Largest value  Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023  Fraction: 111...11 ⇒ significand ≈ 2.0  ±2.0 × 2+1023 ≈ ±1.8 × 10+308 Chapter 3 — Arithmetic for Computers — 19 Floating-Point Precision  Relative precision  all fraction bits are significant  Single: approx 2–23  Equivalent to 23 × log102 ≈ 23 × 0.3 ≈ 6 decimal digits of precision  Double: approx 2–52  Equivalent to 52 × log102 ≈ 52 × 0.3 ≈ 16 decimal digits of precision For a quick estimate: 210=1024, so 10 bits convert to approx. 3 decimal digits. Chapter 3 — Arithmetic for Computers — 20 Floating-Point Example  Represent –0.75  –0.75 = (–1)1 × 1.12 × 2–1 S=1  Fraction = 1000...002  Exponent = –1 + Bias  Single: –1 + 127 = 126 = 011111102  Double: –1 + 1023 = 1022 = 011111111102  Single: 1011111101000...00  Double: 1011111111101000...00 Chapter 3 — Arithmetic for Computers — 21 Floating-Point Example  What number is represented by the single- precision float 11000000101000...00 S=1  Fraction = 01000...002  Fxponent = 100000012 = 129 x=(–1)1 ×(1+012)×2(129–127) = (–1) × 1.25 × 22 Chapter 3 — Arithmetic for Computers — 22 Denormal Numbers  Exponent = 000...0 ⇒ hidden bit is 0 x=(−1)S ×(0+Fraction)×2−Bias  Smaller than normal numbers  allow for gradual underflow, with diminishing precision  Denormal with fraction = 000...0 x=(−1)S ×(0+0)×2−Bias =±0.0 Chapter 3 — Arithmetic for Computers — 23 Two representations of 0.0! Infinities and NaNs  Exponent = 111...1, Fraction = 000...0  ±Infinity  Can be used in subsequent calculations, avoiding need for overflow check  Exponent = 111...1, Fraction ≠ 000...0  Not-a-Number (NaN)  Indicates illegal or undefined result  e.g., 0.0 / 0.0  Can be used in subsequent calculations Chapter 3 — Arithmetic for Computers — 24 Floating-Point Addition  Consider a 4-digit decimal example  9.999 × 101 + 1.610 × 10–1  1. Align decimal points  Shift number with smaller exponent  9.999 × 101 + 0.016 × 101  2. Add significands  9.999×101 +0.016×101 =10.015×101  3. Normalize result & check for over/underflow  1.0015 × 102  4. Round and renormalize if necessary  1.002 × 102 Chapter 3 — Arithmetic for Computers — 25 Floating-Point Addition  Now consider a 4-digit binary example  1.0002 × 2–1 + –1.1102 × 2–2 (0.5 + –0.4375)  1. Align binary points  Shift number with smaller exponent  1.0002 × 2–1 + –0.1112 × 2–1  2. Add significands  1.0002 × 2–1 + –0.1112 × 2–1 = 0.0012 × 2–1  3. Normalize result & check for over/underflow  1.0002 × 2–4, with no over/underflow  4. Round and renormalize if necessary  1.0002 × 2–4 (no change) = 0.0625 Chapter 3 — Arithmetic for Computers — 26 FP Adder Hardware  Much more complex than integer adder  Doing it in one clock cycle would take too  Much longer than integer operations  Slower clock would penalize all instructions  FP adder usually takes several cycles  Can be pipelined Chapter 3 — Arithmetic for Computers — 27 FP Adder Hardware Step 3 Step 4 Chapter 3 — Arithmetic for Computers — 28 Floating-Point Multiplication  Consider a 4-digit decimal example  1.110 × 1010 × 9.200 × 10–5  1. Add exponents  For biased exponents, subtract bias from sum  Newexponent=10+–5=5  2. Multiply significands  1.110 × 9.200 = 10.212 ⇒ 10.212 × 105  3. Normalize result & check for over/underflow  1.0212 × 106  4. Round and renormalize if necessary  1.021 × 106  5. Determine sign of result from signs of operands  +1.021 × 106 Chapter 3 — Arithmetic for Computers — 29 Floating-Point Multiplication  Now consider a 4-digit binary example  1.0002 × 2–1 × –1.1102 × 2–2 (0.5 × –0.4375)  1. Add exponents  Unbiased:–1+–2=–3  Biased:(–1+127)+(–2+127)=–3+254–127=–3+127  2. Multiply significants  1.0002 × 1.1102 = 1.1102 ⇒ 1.1102 × 2–3  3. Normalize result & check for over/underflow  1.1102 × 2–3 (no change) with no over/underflow  4. Round and renormalize if necessary  1.1102 × 2–3 (no change)  5. Determine sign: +ve × –ve ⇒ –ve  –1.1102 × 2–3 = –0.21875 Chapter 3 — Arithmetic for Computers — 30 FP Multiplier Hardware Add exponents Step 3 Step 4 MULTIPLIER Determine sign Chapter 3 — Arithmetic for Computers — 31 FP Arithmetic Hardware  FP multiplier is of similar complexity to FP adder  But uses a multiplier for significands instead of an adder  FP arithmetic hardware usually does  Addition, subtraction, multiplication, division, reciprocal, square-root  FP ↔ integer conversion  Operations usually takes several cycles  Can be pipelined Chapter 3 — Arithmetic for Computers — 32 FP Instructions in RISC-V  Separate FP registers: f0, ..., f31  double-precision  single-precision values stored in the lower 32 bits  FP instructions operate only on FP registers  Programs generally don’t do integer ops on FP data, or vice versa  More registers with minimal code-size impact  FP load and store instructions (p.206)  flw, fld  fsw, fsd //p206 ;changed to double precision FP data: DF 2,1,0 addi x10, x0, data fld f0, 0(x10) fld f1, 8(x10) fadd.d f2, f0, f1 fsd f2, 16(x10) ; data[2]=data[0]+data[1] Chapter 3 — Arithmetic for Computers — 33 FP Instructions in RISC-V  Single-precision arithmetic  fadd.s, fsub.s, fmul.s, fdiv.s, fsqrt.s  e.g., fadds.s f2, f4, f6  Double-precision arithmetic  fadd.d, fsub.d, fmul.d, fdiv.d, fsqrt.d  e.g., fadd.d f2, f4, f6  Single- and double-precision comparison  feq.s, flt.s, fle.s  feq.d, flt.d, fle.d (beq slt/blt bge)  Result is 0 or 1 in integer destination register  Use beq, bne to branch on comparison result  Branch on FP condition code 0 or 1, e.g. for code in x5  beq x5, x0, label Chapter 3 — Arithmetic for Computers — 34 FP Example: °F to °C float f2c=((5.0/9.0)*(fahr - 32.0)); fahr in f10, result in f10, literals in memory space  RISC-V code: //p.208-209 ;changed to double precision FP const5: DF 5.0 const9: DF 9.0 const32: DF 32.0 fahr: DF 100.0 ;100.0F=37.7C fld f10, fahr(x0) f2c: fld f0, const5(x0) ;f0=5.0 fld f1, const9(x0) ;f1=9.0 fdiv.d f0, f0, f1 ;f0=5.0/9.0 fld f1, const32(x0);f1=32.0 fsub.d f10, f10, f1 ;f10=fahr-32.0 fmul.d f10, f0, f10 ;f2c result Chapter 3 — Arithmetic for Computers — 35 FP Example: Array Multiplication  C = C + A× B  All 32 × 32 matrices, 64-bit double-precision elements void mm (double c[][], double a[][], double b[][]) { size_t i, j, k; for (i = 0; i < 32; i = i + 1) for (j = 0; j < 32; j = j + 1) for (k = 0; k < 32; k = k + 1) c[i][j] = c[i][j] + a[i][k] * b[k][j];  Addresses of c, a, b in x10, x11, x12, and i,j,kin x5, x6, x7 Chapter 3 — Arithmetic for Computers — 36 FP Example: Array Multiplication  RISC-V code: //p.210-211 ;http://matrix.reshish.com/multCalculation.php ; C1 C2 C3 addi x10, c(x0) addi x11, a(x0) addi x12, b(x0) addi x28, x0, 3 ;x28=rcl (row&column length) addi x5, x0, 0 ;i=0 L1: addi x6, x0, 0 ;j=0 L2: addi x7, x0, 0 ;k=0 mul x30, x5, x28 ;x30=i*rcl add x30, x30, x6 ;x30=i*size(row)+j slli x30, x30, 3 ;x30=x30*8 dwords2bytes add x30, x10, x30 ;address of c[i][j] fld f0, 0(x30) ;f0=c[i][j] 81 96 126 150 1,2,3, 4,5,6, 7,8,9 1,2,3, 4,5,6, 7,8,9 Chapter 3 — Arithmetic for Computers — 37 FP Example: Array Multiplication mul x29, x7, x28 ;x29=k*rcl add x29, x29, x6 ;x29=k*size(row)+j slli x29, x29, 3 ;x29=x29*8 dwords2bytes add x29, x12, x29 ;addrees of b[k][j] fld f1, 0(x29) ;f1=b[k][j] mul x29, x5, x28 ;x29=i*rcl add x29, x29, x7 ;x29=i*size(row)+k slli x29, x29, 3 ;x29=x29*8 dwords2bytes add x29, x11, x29 ;addrees of a[i][k] fld f2, 0(x29) ;f2=a[i][k] fmul.d f1, f2, f1 ;f1=a[i][k]+b[k][j] fadd.d f0, f0, f1 ;f0=c[i][j]+a[i][k]+b[k][j] addi x7, x7, 1 ;k=k+1 bltu x7, x28, L3 ;if(k<32)goto L3 fsd f0, 0(x30) ;c[i][j]=f0 addi x6, x6, 1 ;j=j+1 bltu x6, x28, L2 ;if(j<32)goto L2 addi x5, x5, 1 ;i=i+1 bltu x5, x28, L1 ;if(k<32)goto L1 Chapter 3 — Arithmetic for Computers — 38 Accurate Arithmetic  IEEE Std 754 specifies additional rounding control  Extra bits of precision (guard, round, sticky)  Choice of rounding modes  Allows programmer to fine-tune numerical behavior of a computation  Not all FP units implement all options  Most programming languages and FP libraries just use defaults  Trade-off between hardware complexity, performance, and market requirements Chapter 3 — Arithmetic for Computers — 39  Graphics and audio applications can take advantage of performing simultaneous operations on short vectors  Example: 128-bit adder:  Sixteen 8-bit adds  Eight 16-bit adds  Four 32-bit adds  Also called data-level parallelism, vector parallelism, or Single Instruction, Multiple Data (SIMD) Chapter 3 — Arithmetic for Computers — 40 §3.6 Parallelism and Computer Arithmetic: Subword Parallelism Associativity  Parallel programs may interleave operations in unexpected orders  Assumptions of associativity may fail  Need to validate parallel programs under varying degrees of parallelism Chapter 3 — Arithmetic for Computers — 41 Who Cares About FP Accuracy?  Important for scientific code  But for everyday consumer use?  “My bank balance is out by 0.0002¢!”   The Intel Pentium FDIV bug  The market expects accuracy  See Colwell, The Pentium Chronicles Chapter 3 — Arithmetic for Computers — 42 Concluding Remarks  Bits have no inherent meaning  Interpretation depends on the instructions  ISAs supported arithmetic  Signed and unsigned integers  Floating-point approximation to reals  Computer representations of numbers  Bounded range and precision  Need to account for this in programs Chapter 3 — Arithmetic for Computers — 43 §3.10 Concluding Remarks 程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts