Floating Point
McGill COMP273 1

Outline
• Special “numbers” revisited
• Rounding
• FP add/sub
• FP on MIPS
• Integer multiplication & division

IEEE 754 Floating Point Review
[Figure: two bit-field layouts, each with fields S, E, and F]
Precision   Sign (S)   Exponent (E)   Fraction (F)   Bias
Float       1 bit      8 bits         23 bits        127
Double      1 bit      11 bits        52 bits        1023
$(-1)^S \times (1 + F) \times 2^{E - \text{bias}}$
• Numbers in normalized form, i.e., 1.xxxx…
• The standard also defines special symbols
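
As a rough illustration of the layout above, the following C sketch pulls S, E and F out of a single-precision value. The type `FloatFields` and the function `decode_float` are names invented for this example, and the code assumes that `float` is the 32-bit IEEE 754 format, which holds on most current machines but is not guaranteed by C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of a single-precision IEEE 754 value (hypothetical helper type). */
typedef struct {
    uint32_t sign;      /* S: 1 bit */
    uint32_t exponent;  /* E: 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* F: 23 bits */
} FloatFields;

/* Split a float into S, E and F by copying its bit pattern. */
FloatFields decode_float(float x) {
    uint32_t bits;
    assert(sizeof bits == sizeof x);   /* assumption: float is 32 bits */
    memcpy(&bits, &x, sizeof bits);

    FloatFields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For example, -6.5 is -1.625 x 2^2, so `decode_float(-6.5f)` yields S = 1, E = 129 (that is, 2 + 127) and F = 0x500000 (binary .101 followed by zeros).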

Special Numbers Reviewed
• Special symbols (single precision)
Exponent   Fraction   Object represented
0          0          0
0          Nonzero    ± denormalized number
1-254      Anything   ± floating point number
255        0          ± infinity
255        Nonzero    NaN (Not a Number)
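
The table can be read as a decision procedure on the exponent and fraction fields. The sketch below does exactly that in C; `float_from_bits` and `classify_float` are hypothetical helper names, and a 32-bit IEEE 754 `float` is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a float from a raw 32-bit pattern (assumes 32-bit IEEE 754 float). */
float float_from_bits(uint32_t bits) {
    float x;
    assert(sizeof x == sizeof bits);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Name the kind of object a single-precision value encodes, using only
   its exponent and fraction fields, row by row as in the table. */
const char *classify_float(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    if (exponent == 0)
        return fraction == 0 ? "zero" : "denormalized";
    if (exponent == 255)
        return fraction == 0 ? "infinity" : "NaN";
    return "normalized";   /* exponent in 1..254 */
}
```

The sign bit plays no role in the classification, which is why the table lists each nonzero object with a ± prefix.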

Representation for Not a Number
• What do I get if I calculate sqrt(-4.0) or 0/0?
– If infinity is not an error, these shouldn’t be either.
– Called Not a Number (NaN)
– Exponent = 255, fraction nonzero
• Why is this useful?
– Hope NaNs help with debugging?
– They contaminate: op(NaN,X) = NaN
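
A small C sketch of the contamination rule, assuming IEEE 754 arithmetic on the host and no aggressive floating point optimization flags. The function names `make_nan`, `is_nan` and `contaminate` are made up for this example.

```c
#include <assert.h>

/* Produce a NaN at run time by dividing zero by zero.
   volatile keeps the compiler from folding the division away. */
float make_nan(void) {
    volatile float zero = 0.0f;
    return zero / zero;
}

/* A NaN is the only value that compares unequal to itself. */
int is_nan(float x) {
    return x != x;
}

/* Once a NaN enters a computation, every later result is NaN too. */
float contaminate(float x) {
    return (make_nan() + x) * 2.0f - x;
}
```

Because every comparison with a NaN is false, `x != x` is a common way to detect one without any library support.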

Representation for Denorms (1/2)
• Problem: There’s a gap among representable FP numbers around 0
– Smallest representable positive normalized number:
$a = 1.00000000000000000000000_2 \times 2^{-126} = 2^{-126}$
– Second smallest representable positive normalized number:
$b = 1.00000000000000000000001_2 \times 2^{-126} = 2^{-126} + 2^{-149}$
– $a - 0 = 2^{-126}$, but $b - a = 2^{-149}$
[Figure: number line from − to + showing the points 0, a, and b, with the gaps marked]
– Normalization and the implicit 1 are to blame!

Representation for Denorms (2/2)
• Solution: special symbol in exponent field
– Use 0 in exponent field, nonzero for fraction
– Denormalized number
• Has no leading 1
• Has implicit exponent = -126 (i.e., don’t subtract bias)
– Smallest positive float: $2^{-149}$
– 2nd smallest positive float: $2^{-148}$
[Figure: number line from − through 0 to +]
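
The denormalized values can be checked on a host machine. The sketch below is only an illustration: `float_from_bits`, `pow2_neg` and `denorm_value` are hypothetical helpers, a 32-bit IEEE 754 `float` is assumed, and so is hardware that does not flush denormals to zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a float from a raw 32-bit pattern (assumes 32-bit IEEE 754 float). */
float float_from_bits(uint32_t bits) {
    float x;
    assert(sizeof x == sizeof bits);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* 2^(-n), computed exactly in double precision by repeated halving
   (exact here because 2^-149 is well inside the double range). */
double pow2_neg(int n) {
    double v = 1.0;
    while (n-- > 0)
        v /= 2.0;
    return v;
}

/* Value of a positive denormalized float with the given 23-bit fraction:
   no leading 1 and a fixed exponent of -126, so
   0.F x 2^-126 = F x 2^-23 x 2^-126 = F x 2^-149. */
double denorm_value(uint32_t fraction) {
    return (double)fraction * pow2_neg(149);
}
```

With exponent field 0, fraction 1 gives 2^-149 and fraction 2 gives 2^-148, matching the two bullets above; the first normalized value, bit pattern 0x00800000, is 2^-126.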

Small numbers and Denormalized
$1.00000000000000000000010_2 \times 2^{-126}$
$1.00000000000000000000001_2 \times 2^{-126}$
$1.00000000000000000000000_2 \times 2^{-126}$
Denormalized! (no leading 1 from here down)
$0.11111111111111111111111_2 \times 2^{-126}$
$0.11111111111111111111110_2 \times 2^{-126}$
$0.11111111111111111111101_2 \times 2^{-126}$
…
$0.00000000000000000000011_2 \times 2^{-126}$
$0.00000000000000000000010_2 \times 2^{-126}$
$0.00000000000000000000001_2 \times 2^{-126}$
Next smaller number is zero

Rounding
• When we perform math on real numbers, we must worry about rounding to fit the result in the significand field.
– The FP hardware carries two extra bits of precision, and then rounds to get the proper value
– Rounding also occurs when converting a double to a single precision value, or converting a floating point number to an integer

IEEE Has Four Rounding Modes
1. Round towards +infinity
– ALWAYS round “up”: 2.001 -> 3
– -2.001 -> -2
– ceiling(𝑥) or ⌈𝑥⌉
2. Round towards -infinity
– ALWAYS round “down”: 1.999 -> 1
– -1.999 -> -2
– floor(𝑥) or ⌊𝑥⌋
3. Truncate
– Just drop the last bits (round towards 0)
4. Round to (nearest) even
– Normal rounding, almost
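
The four modes can be mimicked in C on the slide's decimal examples. This is only a sketch: the function names are invented here, the code rounds to whole integers rather than to the last bit of a significand as FP hardware does, and it assumes the input fits in a `long`.

```c
#include <assert.h>

/* Mode 3, truncate: C's integer conversion already rounds toward zero. */
long round_truncate(double x) {
    return (long)x;
}

/* Mode 2, round toward -infinity (floor). */
long round_down(double x) {
    long t = (long)x;
    return ((double)t > x) ? t - 1 : t;
}

/* Mode 1, round toward +infinity (ceiling). */
long round_up(double x) {
    long t = (long)x;
    return ((double)t < x) ? t + 1 : t;
}

/* Mode 4, round to nearest, with ties going to the even neighbour. */
long round_nearest_even(double x) {
    long lo = round_down(x);
    double diff = x - (double)lo;          /* in [0, 1) */
    if (diff > 0.5) return lo + 1;
    if (diff < 0.5) return lo;
    return (lo % 2 == 0) ? lo : lo + 1;    /* exact tie: pick the even one */
}
```

Note that truncation agrees with round-down for positive inputs and with round-up for negative ones, which is why it is described as rounding towards 0.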

Round to Even
• Round like you learned in grade school
• Except if the value is right on the borderline, in which case we round to the nearest EVEN number
– 2.5 -> 2
– 3.5 -> 4
• Ensures fairness
– This way, on a tie we round up half the time and round down the other half
• This is the default rounding mode
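
Round to even can be observed in binary when a double is narrowed to a float. Between 2^24 and 2^25 single-precision values are spaced 2 apart, so every odd integer in that range is an exact tie. The helper name `narrow` is hypothetical, and the sketch assumes the host uses the default IEEE rounding mode.

```c
#include <assert.h>

/* Narrow a double to single precision. Under the default IEEE mode the
   conversion rounds to nearest, with ties going to the even significand. */
float narrow(double d) {
    return (float)d;
}
```

Here `narrow(16777217.0)` gives 16777216.0 (the tie is resolved downward, to the even significand), while `narrow(16777219.0)` gives 16777220.0 (the tie is resolved upward). An even input such as 16777218.0 is representable and passes through unchanged.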

FP Addition and Subtraction 1/2
• Much more difficult than with integers
• Cannot just add significands
• Recall how we do it:
1. De-normalize to match larger exponent
2. Add significands to get resulting one
3. Normalize and check for under/overflow
4. Round if needed (may need to go back to step 3)
• Note: If signs differ, perform a subtract instead
– Subtract is similar except for step 2
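
The four steps can be sketched in C for a deliberately narrow case. The function `fp_add_positive` is a toy written for this example, not how any particular FPU is built: it accepts only two positive, normalized single-precision inputs whose sum does not overflow, and it assumes round to nearest even with three extra low bits (guard, round, sticky). A 32-bit IEEE 754 `float` is also assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy adder: positive, normalized inputs only; no zeros, denormals,
   infinities, NaNs, signs, or overflow handling. */
float fp_add_positive(float a, float b) {
    uint32_t abits, bbits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&abits, &a, sizeof abits);
    memcpy(&bbits, &b, sizeof bbits);
    if (abits < bbits) {                  /* make a the larger operand */
        uint32_t t = abits; abits = bbits; bbits = t;
    }
    uint32_t ea = abits >> 23, eb = bbits >> 23;   /* sign bit is 0 */
    /* Restore the implicit 1 and append 3 extra low bits for rounding. */
    uint32_t ma = ((abits & 0x7FFFFFu) | 0x800000u) << 3;
    uint32_t mb = ((bbits & 0x7FFFFFu) | 0x800000u) << 3;

    /* Step 1: de-normalize b to match the larger exponent. */
    uint32_t shift = ea - eb;
    if (shift > 26) {
        mb = 1;                           /* only the sticky bit survives */
    } else if (shift > 0) {
        uint32_t lost = mb & ((1u << shift) - 1u);
        mb = (mb >> shift) | (lost != 0); /* fold lost bits into sticky */
    }

    /* Step 2: add significands. */
    uint32_t sum = ma + mb;

    /* Step 3: normalize (a carry out means shift right once). */
    if (sum >= (1u << 27)) {
        sum = (sum >> 1) | (sum & 1u);
        ea++;
    }

    /* Step 4: round to nearest even using the 3 extra bits. */
    uint32_t extra = sum & 7u;
    sum >>= 3;
    if (extra > 4u || (extra == 4u && (sum & 1u)))
        sum++;
    if (sum == (1u << 24)) {              /* rounding carried out: renormalize */
        sum >>= 1;
        ea++;
    }

    uint32_t rbits = (ea << 23) | (sum & 0x7FFFFFu);
    float r;
    memcpy(&r, &rbits, sizeof r);
    return r;
}
```

The final `if` is the "go back to step 3" case: rounding up a significand of all ones carries out of the top bit, so the result has to be normalized again.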

FP Addition and Subtraction 2/2
• Problems in implementing FP add/sub:
– If signs differ for add (or same for sub), what is the sign of the result?
• Question:
– How do we integrate this into the integer arithmetic unit?
– Answer: We don’t!

MIPS Floating Point Architecture (1/4)
• Separate floating point instructions:
– Single Precision:
add.s, sub.s, mul.s, div.s
– Double Precision:
add.d, sub.d, mul.d, div.d
• These instructions are far more complicated than their integer counterparts, so they can take much longer to execute.

MIPS Floating Point Architecture (2/4)
• Observations
– It’s inefficient to have different instructions take vastly differing amounts of time.
– Generally, a particular piece of data will not change from FP to int, or vice versa, within a program. So only one type of instruction will be used on it.
– Some programs do no floating point calculations
– It takes lots of hardware relative to integers to make Floating Point fast

MIPS Floating Point Architecture (3/4)
• Pre-1990 solution:
– separate chip to do floating point (FP)
• Coprocessor 1: FP chip
– Contains 32 32-bit registers: $f0, $f1, …
– Usually registers specified in FP instructions refer to this set
– Separate load and store: lwc1 and swc1
(“load word coprocessor 1”, “store …”)
– Double Precision: by convention, an even/odd pair contains one DP FP number: $f0/$f1, $f2/$f3, … , $f30/$f31, where the even register is the name

MIPS Floating Point Architecture (4/4)
• Pre-1990 computers contained multiple separate chips:
– Processor: handles all the normal stuff
– Coprocessor 1: handles FP and only FP
– more coprocessors? (yes, more on this later)
• Today, the FP coprocessor is integrated with the CPU, though specialized or inexpensive chips may leave out FP hardware
• Instructions to move data between main processor and coprocessors, e.g., mfc0, mtc0, mfc1, mtc1

Some More Example FP Instructions
abs.s  $f0, $f2      # f0 = abs( f2 );
neg.s  $f0, $f2      # f0 = -f2;
sqrt.s $f0, $f2      # f0 = sqrt( f2 );
c.lt.s $f0, $f2      # is $f0 < $f2 ?
bc1t   label         # branch on condition true
See 4th edition text 3.5 and App. B for a complete list of floating point instructions

Copying, Conversion, Rounding
mfc1      $t0, $f0   # copy $f0 to $t0
mtc1      $t0, $f0   # copy $t0 to $f0
cvt.d.s   $f0, $f2   # f0f1 gets float f2 converted to double
cvt.d.w   $f0, $f2   # f0f1 gets int f2 converted to double
cvt.s.d   $f0, $f2   # f0 gets double f2f3 converted to float
cvt.s.w   $f0, $f2   # f0 gets int f2 converted to float
ceil.w.s  $f0, $f2   # round to next higher integer
floor.w.s $f0, $f2   # round down to next lower integer
trunc.w.s $f0, $f2   # round towards zero
round.w.s $f0, $f2   # round to closest integer

Dealing with Constants
float a = 3.14;
• Option 1
– Declare constant 3.14 in data segment of memory
– Load the address label
– Load to coprocessor
.data
PI:     .float 3.14
.text
        la   $t0, PI
        lwc1 $f0, ($t0)
        l.s  $f0, PI           # pseudoinstruction that replaces the la/lwc1 pair
• Option 2
– Compute hexadecimal IEEE representation for 3.14 (it is 0x4048F5C3)
– Load immediate
– Move to coprocessor
        lui  $t0, 0x4048
        ori  $t0, $t0, 0xF5C3
        mtc1 $t0, $f0
• Option 3, pseudoinstruction not available in MARS:
        li.s $f0, 3.14         # easiest

Floating Point Register Conventions
• ($f0, $f1) and ($f2, $f3)
– Function return registers used to return float and double values from function calls.
• ($f12, $f13) and ($f14, $f15)
– Two pairs of registers used to pass float and double valued arguments to functions. Pairs of registers are parenthesized because they have to pass double values. To pass float values, only $f12 and $f14 are used.
• $f4, $f6, $f8, $f10, $f16, $f18
– Temporary registers
• $f20, $f22, $f24, $f26, $f28, $f30
– Save registers whose values are preserved across function calls
• Unfortunately, there are no nice names (e.g., $t#, $s#) as with the main registers
• With double precision instructions, the high-order 32 bits are in the implied odd register.
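
The encoding quoted in Option 2 of "Dealing with Constants" can be checked on a host machine before it is typed into a lui/ori pair. The sketch below assumes a 32-bit IEEE 754 `float`; `float_bits` and `lui_ori` are helper names invented for this example, and `lui_ori` only models what the two instructions leave in the register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float (assumes 32-bit IEEE 754 float). */
uint32_t float_bits(float x) {
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Model of the lui/ori pair: lui places the upper 16 bits and clears
   the lower half, ori then fills in the lower 16 bits. */
uint32_t lui_ori(uint16_t upper, uint16_t lower) {
    uint32_t reg = (uint32_t)upper << 16;   /* lui $t0, upper      */
    reg |= lower;                           /* ori $t0, $t0, lower */
    return reg;
}
```

`float_bits(3.14f)` returns 0x4048F5C3, the value split into 0x4048 and 0xF5C3 in Option 2. The stored value is the float nearest to 3.14, not 3.14 itself.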

Fahrenheit to Celsius
float f2c(float f) {
    return 5.0/9.0*(f-32.0);
}

.data
const5:  .float 5.0
const9:  .float 9.0
const32: .float 32.0
.text
f2c:
        la    $t0, const5
        lwc1  $f16, ($t0)
        la    $t0, const9
        lwc1  $f18, ($t0)
        div.s $f16, $f16, $f18    # f16 = 5.0/9.0
        la    $t0, const32
        lwc1  $f18, ($t0)
        sub.s $f18, $f12, $f18    # f18 = fahr-32.0
        mul.s $f0, $f16, $f18     # return f16*f18
        jr    $ra

Debugging FP Code in MARS
• MARS displays floating point registers in hexadecimal
• This makes debugging floating point code tricky...
– Can use MARS “Floating Point Representation” tool to examine single precision
– Alternatively syscall can be used to print to console
Service        Code in $v0   Arguments
Print float    2             $f12 = float to print
Print double   3             $f12 = double to print
Print string   4             $a0 = address of null-terminated string to print

# print( float vec[4] )
printFloatVector:
        addi $sp, $sp, -8         # reserve space for $ra and $s0
        sw   $ra, 0($sp)
        sw   $s0, 4($sp)
        move $s0, $a0             # $s0 = address of vec
        lwc1 $f12, 0($s0)
        jal  printFloat
        jal  printSpace
        lwc1 $f12, 4($s0)
        jal  printFloat
        jal  printSpace
        lwc1 $f12, 8($s0)
        jal  printFloat
        jal  printSpace
        lwc1 $f12, 12($s0)
        jal  printFloat
        jal  printNewLine
        lw   $ra, 0($sp)
        lw   $s0, 4($sp)
        addi $sp, $sp, 8          # release the 8 bytes reserved above
        jr   $ra
.data
spaceString:   .asciiz " "
newlineString: .asciiz "\n"
.text
printSpace:
        li   $v0, 4
        la   $a0, spaceString
        syscall
        jr   $ra
printNewLine:
        li   $v0, 4
        la   $a0, newlineString
        syscall
        jr   $ra
printFloat:                       # in $f12
        li   $v0, 2
        syscall
        jr   $ra

REMEMBER: Floating Point Fallacy
• FP add, subtract associative? FALSE!
$x = -1.5 \times 10^{38}$, $y = 1.5 \times 10^{38}$, $z = 1.0$
$x + (y + z) = -1.5 \times 10^{38} + (1.5 \times 10^{38} + 1.0) = -1.5 \times 10^{38} + 1.5 \times 10^{38} = 0.0$
$(x + y) + z = (-1.5 \times 10^{38} + 1.5 \times 10^{38}) + 1.0 = 0.0 + 1.0 = 1.0$
• Floating Point add, subtract are not associative!
– Floating point result approximates real result!
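
The fallacy is easy to reproduce in C with the same three values. The function names `sum_left` and `sum_right` are invented for this sketch, which assumes IEEE 754 single precision and a compiler that does not reassociate floating point expressions (the default without fast-math style flags).

```c
#include <assert.h>

/* (x + y) + z, evaluated in single precision. */
float sum_left(float x, float y, float z) {
    float t = x + y;
    return t + z;
}

/* x + (y + z), evaluated in single precision. */
float sum_right(float x, float y, float z) {
    float t = y + z;
    return x + t;
}
```

With x = -1.5e38f, y = 1.5e38f and z = 1.0f, `sum_left` returns 1.0 and `sum_right` returns 0.0: in the second ordering the 1.0 is far below the last bit of 1.5e38 and is lost in the alignment step before the large values cancel.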

Casting floats ↔ ints
• (int) floating point expression
– Converts it to an integer (C truncates toward zero)
i = (int) (3.14159 * f);
• (float) expression
– converts integer to nearest floating point
f = f + (float) i;

int → float → int
if ( i == (int)((float) i) ) {
    printf("true");
}
• Does this always print true?
– No, it will not always print “true”
– Large values of integers don’t have exact floating point representations
• What about double?

float → int → float
if ( f == (float)((int) f) ) {
    printf("true");
}
• Does this always print true?
– No, it will not always print “true”
– Small floating point numbers (<1) don’t have integer representations
– Same is true for large numbers
– For other non-integer numbers, the fractional part is lost

MIPS Integer Multiplication
• Syntax of Multiplication (signed): MULT reg1 reg2
• Result of multiplying 32 bit registers has 64 bits
• MIPS splits 64-bit result into 2 special registers
– upper half in hi, lower half in lo
– Registers hi and lo are separate from the 32 general purpose registers
– Use MFHI reg to move from hi to register
– Use MFLO reg to move from lo to another register
• Unusual syntax compared to other instructions!

MIPS Integer Multiplication Example
a = b * c;
Let b be $s2; let c be $s3; and let a be $s0 and $s1 (it may be up to 64 bits)
mult $s2, $s3        # b*c
mfhi $s0             # get upper half of product
mflo $s1             # get lower half of product
• We often only care about the low half of the product!
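
The hi/lo split can be modelled in C with a 64-bit product. This is a sketch under stated assumptions: `HiLo`, `mult`, `multu` and `fits_in_lo` are names chosen here to mirror the instructions, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the hi/lo register pair written by mult and multu. */
typedef struct {
    uint32_t hi;   /* upper 32 bits of the 64-bit product */
    uint32_t lo;   /* lower 32 bits of the 64-bit product */
} HiLo;

/* Signed multiply: operands are sign-extended to 64 bits first. */
HiLo mult(int32_t a, int32_t b) {
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    HiLo r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* Unsigned multiply: operands are zero-extended to 64 bits first. */
HiLo multu(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)a * (uint64_t)b;
    HiLo r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* A signed product fits in 32 bits exactly when hi is the sign
   extension of lo; in that case mflo alone recovers the result. */
int fits_in_lo(HiLo r) {
    uint32_t sign_ext = (r.lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return r.hi == sign_ext;
}
```

For example, `mult(-1, 1)` leaves 0xFFFFFFFF in both halves, while `multu(0xFFFFFFFF, 1)` leaves 0 in hi: the same bit patterns give different products depending on the signed or unsigned interpretation.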

MIPS Integer Division
• Syntax of Division (signed): DIV reg1 reg2
– Divides register 1 by register 2
– Puts remainder of division in hi
– Puts quotient of division in lo
• Notice that this can be used to implement both the division operator (/) and modulo operator (%) in a high level language

MIPS Integer Division Example
a = c / d;
b = c % d;
Variable   Register
a          $s0
b          $s1
c          $s2
d          $s3
div  $s2, $s3        # lo=c/d, hi=c%d
mflo $s0             # get quotient
mfhi $s1             # get remainder

Unsigned Instructions and Overflow
• MIPS has versions of mult and div for unsigned operands: multu, divu
– The choice determines whether the operands are treated as signed or unsigned, which can change the product and quotient.
• Typically signed instructions check for overflow and their unsigned counterparts do not (e.g., add vs addu)
• MIPS does not check overflow or division by zero on ANY signed/unsigned multiply, divide instruction
– Up to the software to check “hi” (multiply overflow) and the “divisor” (division by zero)

Things to Remember
• Integer multiplication and division:
– mult, div, mfhi, mflo
• New MIPS registers ($f0-$f31) and instructions in two flavours
– Single Precision .s
– Double Precision .d
• FP add and subtract are not associative...
• IEEE 754 NaN & Denorms (precision) review
• IEEE 754’s four different rounding modes

Review and More Information
• Textbook
– Section 3.5 Floating Point
• We saw the representation and addition and multiplication algorithm material earlier in the term
• And now we have seen the Floating-Point instructions
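
Returning to the integer division slides: since the hardware performs no checks, the software has to. The C sketch below models div with the two checks made explicit. `DivResult` and `checked_div` are hypothetical names, and the sketch assumes the usual convention that the quotient truncates toward zero and the remainder takes the sign of the dividend, which is what C guarantees for `/` and `%`.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the register pair after div: quotient in lo, remainder in hi. */
typedef struct {
    int32_t lo;   /* quotient  */
    int32_t hi;   /* remainder */
} DivResult;

/* Signed divide with the checks the hardware leaves to software.
   Returns 1 and fills *out on success, 0 if the result is undefined. */
int checked_div(int32_t c, int32_t d, DivResult *out) {
    if (d == 0)
        return 0;                    /* division by zero */
    if (c == INT32_MIN && d == -1)
        return 0;                    /* quotient does not fit in 32 bits */
    out->lo = c / d;                 /* truncates toward zero */
    out->hi = c % d;                 /* same sign as the dividend c */
    return 1;
}
```

In MIPS the same idea is a branch on the divisor before the div instruction, for example a `beq` against `$zero` that jumps to an error handler.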