Introduction to Computer Systems 15-213/18-243, spring 2009 1st Lecture, Jan. 12th

Floating Point II

http://xkcd.com/899/

CS295
L13: Floating Point II

Floating point topics
Fractional binary numbers
IEEE floating-point standard
Floating-point operations and rounding
Floating-point in C

There are many more details that we won’t cover
It’s a 58-page standard…

Start by talking about representing fractional numbers in binary in general
Then about the IEEE standard for floating point representation and operations
Will not cover:
All the details of the 58-page spec
Or expect you to perform operations
Want to show you how floating point values are represented
To give you a sense of the kinds of things that can go wrong with them

Tiny Floating Point Representation
We will use the following 8-bit floating point representation to illustrate some key points:

S (1 bit) | E (4 bits) | M (3 bits)

Assume that it has the same properties as IEEE floating point:
bias = 2^(4-1) - 1 = 7
encoding of -0 = 0b 1 0000 000
encoding of +∞ = 0b 0 1111 000
encoding of the largest (+) normalized # = 0b 0 1110 111
encoding of the smallest (+) normalized # = 0b 0 0001 000


Peer Instruction Question
Using our 8-bit representation, what value gets stored when we try to encode 2.625 = 2^1 + 2^-1 + 2^-3?

+ 2.5
+ 2.625
+ 2.75
+ 3.25
We’re lost…
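One way to work this question by hand is sketched below. It assumes bias 7 and the default round-to-nearest, ties-to-even mode; the slide itself does not state which rounding mode applies.

```latex
\begin{aligned}
2.625 &= 2^{1} + 2^{-1} + 2^{-3} = 10.101_2 = 1.0101_2 \times 2^{1} \\
E &= 1 + 7 = 8 = 1000_2 \\
\text{Man} &= 1.010\,1_2 \;\rightarrow\; M = 010_2
  \quad \text{(exact tie; } 010_2 \text{ is already even)} \\
\text{stored value} &= 1.010_2 \times 2^{1} = 2.5
\end{aligned}
```

Only three mantissa bits fit, so the trailing 2^-3 term is lost. Truncation would give the same stored value here.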

Peer Instruction Question
Using our 8-bit representation, what value gets stored when we try to encode 384 = 2^8 + 2^7?

+ 256
+ 384
+ ∞
NaN
We’re lost…

Distribution of Values
What ranges are NOT representable?
Between largest norm and infinity: overflow (Exp too large)
Between zero and smallest denorm: underflow (Exp too small)
Between norm numbers: rounding
Given a FP number, what’s the bit pattern of the next larger representable number?
What is this “step” when Exp = 0?
What is this “step” when Exp = 100?
Distribution of values is denser toward zero

Floating Point Rounding
The IEEE 754 standard actually specifies different rounding modes:
Round to nearest, ties to nearest even digit
Round toward +∞ (round up)
Round toward -∞ (round down)
Round toward 0 (truncation)

In our tiny example:
Man = 1.001 01 rounded to M = 0b001
Man = 1.001 11 rounded to M = 0b010
Man = 1.001 10 rounded to M = 0b010
This is extra (non-testable) material

Floating Point Operations: Basic Idea

x +f y = Round(x + y)
x *f y = Round(x * y)

Basic idea for floating point operations:
First, compute the exact result
Then round the result to make it fit into the specified precision (width of M)
Possibly over/underflow if exponent outside of range

Value = (-1)^S × Mantissa × 2^Exponent

We won’t be asking you to do these yourself, but seeing a bit about how it works will help you understand the ways in which it can go wrong, which we may ask you about

Mathematical Properties of FP Operations
Overflow yields ±∞ and underflow yields 0
Floats with value ±∞ and NaN can be used in operations
Result usually still ±∞ or NaN, but not always intuitive
Floating point operations do not work like real math, due to rounding
Not associative: (3.14+1e100)-1e100 != 3.14+(1e100-1e100)
(left side evaluates to 0, right side to 3.14)
Not distributive: 100*(0.1+0.2) != 100*0.1+100*0.2
(left side evaluates to 30.000000000000003553, right side to 30)
Not cumulative
Repeatedly adding a very small number to a large one may do nothing

NaNs or infinities can taint results and spread through a numerical computation
Floating point operations are not like real mathematical operations
Familiar algebraic properties such as associativity and distributivity no longer hold (addition and multiplication do remain commutative)


Floating Point in C
Two common levels of precision:
float 1.0f single precision (32-bit)
double 1.0 double precision (64-bit)

#include <math.h> to get INFINITY and NAN constants

Equality (==) comparisons between floating point numbers are tricky, and often return unexpected results, so just avoid them!

Floating Point Conversions in C
Casting between int, float, and double changes the bit representation
int → float
May be rounded (not enough bits in mantissa: 23)
Overflow impossible
int or float → double
Exact conversion (all 32-bit ints representable)
long → double
Depends on word size (32-bit is exact, 64-bit may be rounded)
double or float → int
Truncates fractional part (rounded toward zero)
“Not defined” when out of range or NaN: generally sets to Tmin
(even if the value is a very big positive)


Peer Instruction Question
We execute the following code in C. How many bytes are the same (value and position) between i and f?

int i = 384; // 2^8 + 2^7
float f = (float) i;

0 bytes
1 byte
2 bytes
3 bytes
We’re lost…

Purposely choose to skip this one. But do include solution in posted materials.

Floating Point and the Programmer
#include <stdio.h>

int main(int argc, char* argv[]) {
  float f1 = 1.0;
  float f2 = 0.0;
  int i;
  for (i = 0; i < 10; i++)
    f2 += 1.0/10.0;

  /* print the raw bit patterns of f1 and f2, then their values */
  printf("0x%08x 0x%08x\n", *(int*)&f1, *(int*)&f2);
  printf("f1 = %10.9f\n", f1);
  printf("f2 = %10.9f\n\n", f2);

  f1 = 1E30;
  f2 = 1E-30;
  float f3 = f1 + f2;
  printf("f1 == f3? %s\n", f1 == f3 ? "yes" : "no" );

  return 0;
}

$ ./a.out
0x3f800000 0x3f800001
f1 = 1.000000000
f2 = 1.000000119

f1 == f3? yes

Floating Point Summary
Floats also suffer from the fixed number of bits available to represent them
Can get overflow/underflow
“Gaps” produced in representable numbers means we can lose precision, unlike ints
Some “simple fractions” have no exact representation (e.g. 0.2)
“Every operation gets a slightly wrong result”
Floating point arithmetic not associative or distributive
Mathematically equivalent ways of writing an expression may compute different results
Never test floating point values for equality!
Careful when converting between ints and floats!

Number Representation Really Matters
1991: Patriot missile targeting error
clock skew due to conversion from integer to floating point
1996: Ariane 5 rocket exploded ($1 billion)
overflow converting 64-bit floating point to 16-bit integer
2000: Y2K problem
limited (decimal) representation: overflow, wrap-around
2038: Unix epoch rollover
Unix epoch = seconds since 12am, January 1, 1970
signed 32-bit integer representation rolls over to TMin in 2038
Other related bugs:
1982: Vancouver Stock Exchange 10% error in less than 2 years
1994: Intel Pentium FDIV (floating point division) HW bug ($475 million)
1997: USS Yorktown “smart” warship stranded: divide by zero
1998: Mars Climate Orbiter crashed: unit mismatch ($193 million)
This is the part where I try to scare you…

[Figure: distribution of representable values on a number line from -15 to 15; series: Denormalized, Normalized, Infinity]
