Carnegie Mellon
Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition
Floating Point
15-213/18-213/14-513/15-513/18-613: Introduction to Computer Systems
4th Lecture, Sept. 10, 2020
Announcements
Lab 0 due today 11:59 pm ET
Lab 1 went out on Tuesday, due 9/17
▪ Puzzles can be tricky, so start early
Written Assignment 1 released yesterday, due 9/16
▪ Available on Canvas; hand in via Canvas
Bootcamp 3 is Friday 7-9 pm ET
▪ Debugging & gdb
First recitations are Monday
▪ Students requesting in-person recitations will be assigned to an in-person recitation section
Today: Floating Point
Background: Fractional binary numbers
IEEE floating point standard: Definition
Example and properties
Rounding, addition, multiplication
Floating point in C
Summary
Fractional binary numbers
What is \(1011.101_2\)?
Fractional Binary Numbers
[Figure: bits \(b_i\, b_{i-1} \cdots b_2\, b_1\, b_0\, .\, b_{-1}\, b_{-2}\, b_{-3} \cdots b_{-j}\), weighted \(2^i, 2^{i-1}, \ldots, 4, 2, 1\) left of the binary point and \(1/2, 1/4, 1/8, \ldots, 2^{-j}\) right of it]
Representation
▪ Bits to right of “binary point” represent fractional powers of 2
▪ Represents rational number: \(\sum_{k=-j}^{i} b_k \times 2^k\)
Fractional Binary Numbers: Examples
▪ 5 3/4 = 23/4 = \(101.11_2 = 4 + 1 + 1/2 + 1/4\)
▪ 2 7/8 = 23/8 = \(010.111_2 = 2 + 1/2 + 1/4 + 1/8\)
▪ 1 7/16 = 23/16 = \(001.0111_2 = 1 + 1/4 + 1/8 + 1/16\)
Observations
▪ Divide by 2 by shifting right (unsigned)
▪ Multiply by 2 by shifting left
▪ Numbers of form \(0.111111\ldots_2\) are just below 1.0
▪ \(1/2 + 1/4 + 1/8 + \cdots + 1/2^i + \cdots \to 1.0\)
▪ Use notation \(1.0 - \varepsilon\)
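The positional rule (bit \(b_k\) is worth \(2^k\)) can be checked mechanically. The C sketch below is minimal and assumes the input is a string of '0'/'1' digits with at most one '.'. The function name binary_fraction_value is hypothetical and does not come from the lecture.

```c
/* Hypothetical helper: value of a binary string such as "101.11".
   Each bit b_k contributes b_k * 2^k. Returns -1.0 on malformed input. */
double binary_fraction_value(const char *s) {
    double value = 0.0;    /* running value */
    double weight = 0.5;   /* weight of the next bit right of the point */
    int seen_point = 0;

    for (const char *p = s; *p != '\0'; p++) {
        if (*p == '.') {
            if (seen_point) return -1.0;      /* second '.' is malformed */
            seen_point = 1;
        } else if (*p == '0' || *p == '1') {
            int bit = *p - '0';
            if (!seen_point) {
                value = value * 2.0 + bit;    /* integer part: shift left, add bit */
            } else {
                value += bit * weight;        /* fraction part: add b_{-j} * 2^{-j} */
                weight /= 2.0;
            }
        } else {
            return -1.0;                      /* not a binary digit */
        }
    }
    return value;
}
```

Applied to the examples, binary_fraction_value("101.11") gives 5.75, and "1011.101" gives 11.625, which answers the opening question.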
Representable Numbers
Limitation #1
▪ Can only exactly represent numbers of the form \(x/2^k\)
▪ Other rational numbers have repeating bit representations
▪ 1/3 = \(0.0101010101[01]\ldots_2\)
▪ 1/5 = \(0.001100110011[0011]\ldots_2\)
▪ 1/10 = \(0.0001100110011[0011]\ldots_2\)
Limitation #2
▪ Just one setting of binary point within the w bits
▪ Limited range of numbers (very small values? very large?)
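Long division in base 2 shows where the repeating patterns come from. Unless the denominator is a power of two, the remainder eventually recurs. The C sketch below is minimal and uses a hypothetical helper named binary_expansion. It assumes 0 <= p < q and that 2q fits in an unsigned.

```c
#include <string.h>

/* Hypothetical helper: write "0." followed by the first nbits bits of p/q
   into out (0 <= p < q, 2 * q must fit in unsigned).
   out needs room for nbits + 3 chars. */
void binary_expansion(unsigned p, unsigned q, int nbits, char *out) {
    unsigned r = p;                 /* current remainder */
    memcpy(out, "0.", 2);           /* integer part is 0 because p < q */
    for (int i = 0; i < nbits; i++) {
        r *= 2;                     /* doubling the remainder yields the next bit */
        if (r >= q) {
            out[2 + i] = '1';
            r -= q;
        } else {
            out[2 + i] = '0';
        }
    }
    out[2 + nbits] = '\0';
}
```

Calling binary_expansion(1, 10, 13, buf) reproduces the 1/10 row above. By contrast, 3/4 terminates: binary_expansion(3, 4, 4, buf) gives "0.1100" because the remainder reaches 0.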
IEEE Floating Point
IEEE Standard 754
▪ Established in 1985 as uniform standard for floating point arithmetic
▪ Before that, many idiosyncratic formats
▪ Supported by all major CPUs
▪ Some CPUs don’t implement IEEE 754 in full, e.g., early GPUs, Cell BE processor
Driven by numerical concerns
▪ Nice standards for rounding, overflow, underflow
▪ Hard to make fast in hardware
▪ Numerical analysts predominated over hardware designers in defining standard
This is important!
Ariane 5 explodes on maiden voyage (1996): 500 million dollars lost
▪ 64-bit floating point number assigned to 16-bit integer
▪ Legacy code from Ariane 4, which had a lower top speed
▪ Causes rocket to get incorrect value of horizontal velocity and crash
Patriot missile defense system misses Scud: 28 people die
▪ System tracks time in tenths of a second
▪ Converted from integer to floating point number
▪ Accumulated rounding error causes drift: 20% drift over 8 hours
▪ Eventually (on 2/25/1991, after the system had been on for 100 hours) causes a range misestimation large enough to miss incoming missiles
(Binary) Scientific Notation
What are the parts of a number in scientific notation?
\(1.1101101101101_2 \times 2^{13}\)
▪ Significand: \(1.1101101101101_2\)
▪ Exponent: 13
What value does the significand always begin with in scientific notation?
Floating Point Representation
Numerical Form:
\((-1)^s \, M \, 2^E\)
▪ Sign bit s determines whether number is negative or positive
▪ Significand M normally a fractional value in range [1.0, 2.0)
▪ Exponent E weights value by power of two
Encoding
▪ MSB s is sign bit s
▪ exp field encodes E (but is not equal to E)
▪ frac field encodes M (but is not equal to M)
▪ Layout: s | exp | frac
Example:
\(15213_{10} = (-1)^0 \times 1.1101101101101_2 \times 2^{13}\)
Precision options
Single precision: 32 bits
▪ ≈ 7 decimal digits, \(10^{\pm 38}\)
▪ Layout: s (1 bit) | exp (8 bits) | frac (23 bits)
Double precision: 64 bits
▪ ≈ 16 decimal digits, \(10^{\pm 308}\)
▪ Layout: s (1 bit) | exp (11 bits) | frac (52 bits)
Other formats: half precision, quad precision
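To see the three fields of an actual value, copy the bits into an integer and mask them. The C sketch below is minimal and assumes float and double are IEEE 754 single and double precision, as on mainstream platforms. The names struct fp_fields, float_fields, and double_fields are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: sign, exponent, and fraction fields of an IEEE 754 value. */
struct fp_fields {
    unsigned sign;   /* 1 bit */
    unsigned exp;    /* 8 bits (float) or 11 bits (double) */
    uint64_t frac;   /* 23 bits (float) or 52 bits (double) */
};

struct fp_fields float_fields(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);                 /* copy the bits; no aliasing UB */
    struct fp_fields r;
    r.sign = u >> 31;
    r.exp  = (u >> 23) & 0xFFu;
    r.frac = u & ((UINT32_C(1) << 23) - 1);
    return r;
}

struct fp_fields double_fields(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    struct fp_fields r;
    r.sign = (unsigned)(u >> 63);
    r.exp  = (unsigned)((u >> 52) & 0x7FFu);
    r.frac = u & ((UINT64_C(1) << 52) - 1);
    return r;
}
```

For example, float_fields(15213.0f) reports sign 0, exp 140, and frac 0x6DB400. The sketch uses memcpy rather than a pointer cast, which keeps it within C's aliasing rules.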
Three “kinds” of floating point numbers
Layout: s (1 bit) | exp (e bits) | frac (f bits)
▪ denormalized: exp = 00…00
▪ normalized: exp ≠ 0 and exp ≠ 11…11
▪ special: exp = 11…11
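The exp field alone determines the kind. The C sketch below is minimal and operates on a raw single-precision bit pattern. The names enum fp_kind and classify_float_bits are hypothetical. C99's fpclassify in <math.h> is the standard, finer-grained alternative.

```c
#include <stdint.h>

/* Hypothetical enum: the three kinds of encodings. */
enum fp_kind { KIND_DENORMALIZED, KIND_NORMALIZED, KIND_SPECIAL };

/* Hypothetical helper: classify a single-precision bit pattern by its 8-bit exp field. */
enum fp_kind classify_float_bits(uint32_t bits) {
    uint32_t exp_field = (bits >> 23) & 0xFFu;
    if (exp_field == 0)    return KIND_DENORMALIZED; /* exp = 00...00, includes +0 and -0 */
    if (exp_field == 0xFF) return KIND_SPECIAL;      /* exp = 11...11: infinity or NaN */
    return KIND_NORMALIZED;                          /* every other exp value */
}
```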
“Normalized” Values
\(v = (-1)^s \, M \, 2^E\)
When: exp ≠ 000…0 and exp ≠ 111…1
Exponent coded as a biased value: E = exp – Bias
▪ exp: unsigned value of exp field
▪ Bias = \(2^{k-1} - 1\), where k is number of exponent bits
▪ Single precision: 127 (exp: 1…254, E: -126…127)
▪ Double precision: 1023 (exp: 1…2046, E: -1022…1023)
Significand coded with implied leading 1: \(M = 1.xxx\ldots x_2\)
▪ xxx…x: bits of frac field
▪ Minimum when frac = 000…0 (M = 1.0)
▪ Maximum when frac = 111…1 (M = 2.0 – ε)
▪ Get extra leading bit for “free”
Normalized Encoding Example
\(v = (-1)^s \, M \, 2^E\), E = exp – Bias
Value: float F = 15213.0;
▪ \(15213_{10} = 11101101101101_2 = 1.1101101101101_2 \times 2^{13}\)
Significand
M = \(1.1101101101101_2\)
frac = \(11011011011010000000000_2\)
Exponent
E = 13
Bias = 127
exp = 140 = \(10001100_2\)
Result (s | exp | frac):
0 10001100 11011011011010000000000
Denormalized Values
\(v = (-1)^s \, M \, 2^E\), E = 1 – Bias
Condition: exp = 000…0
Exponent value: E = 1 – Bias (instead of exp – Bias) (why?)
Significand coded with implied leading 0: \(M = 0.xxx\ldots x_2\)
▪ xxx…x: bits of frac
Cases
▪ exp = 000…0, frac = 000…0
▪ Represents zero value
▪ Note distinct values: +0 and –0 (why?)
▪ exp = 000…0, frac ≠ 000…0
▪ Numbers closest to 0.0
▪ Equispaced
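One way to see why denormals use E = 1 – Bias: with that choice, the largest denormalized value sits exactly one denormal step below the smallest normalized value, so values continue smoothly across the boundary. The C sketch below is minimal and assumes IEEE 754 single precision with no flush-to-zero mode. The names float_from_bits, denorm_step, and denorm_to_norm_gap are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret 32 bits as a float. */
float float_from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical: denormal spacing. frac = 0...01 with E = 1 - 127
   gives 2^-23 * 2^-126 = 2^-149. */
float denorm_step(void) { return float_from_bits(0x00000001u); }

/* Hypothetical: smallest normalized (exp = 1, frac = 0) minus largest
   denormalized (exp = 0, frac = all ones). With E = 1 - Bias for denormals
   this is exactly one denormal step; E = -Bias would leave a much larger gap. */
float denorm_to_norm_gap(void) {
    return float_from_bits(0x00800000u) - float_from_bits(0x007FFFFFu);
}
```

Both zero encodings compare equal. Even so, dividing 1.0f by the −0 encoding gives −∞ rather than +∞, which is one reason the sign of zero is kept.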
Special Values
Condition: exp = 111…1
Case: exp = 111…1, frac = 000…0
▪ Represents value ∞ (infinity)
▪ Operation that overflows
▪ Both positive and negative
▪ E.g., 1.0/0.0 = −1.0/−0.0 = +∞, 1.0/−0.0 = −∞
Case: exp = 111…1, frac ≠ 000…0
▪ Not-a-Number (NaN)
▪ Represents case when no numeric value can be determined
▪ E.g., sqrt(–1), ∞ − ∞, ∞ × 0
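These cases can be produced directly. The C sketch below is minimal and assumes default compiler settings. It will not behave as shown under -ffast-math, which lets the compiler assume infinities and NaNs never occur. The function names are hypothetical. A volatile zero keeps the divisions from being folded away at compile time, and sqrt is avoided so that no libm link is needed.

```c
#include <math.h>   /* isinf, isnan for checking the results */

/* volatile stops the compiler from evaluating these at compile time. */
static volatile double zero = 0.0;

/* Hypothetical helpers producing the special values listed above. */
double pos_over_zero(void)  { return 1.0 / zero; }                          /* +inf */
double neg_over_zero(void)  { return 1.0 / -zero; }                         /* -inf: -0.0 keeps its sign */
double inf_minus_inf(void)  { double inf = 1.0 / zero; return inf - inf; }  /* NaN */
double inf_times_zero(void) { return (1.0 / zero) * zero; }                 /* NaN */
```

A NaN compares unequal even to itself, so x != x is a classic NaN test, although isnan states the intent more clearly.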
C float Decoding Example #1
float: 0xC0A00000
binary: 1100 0000 1010 0000 0000 0000 0000 0000
Fields (1 bit | 8 bits | 23 bits): s = 1, exp = 1000 0001, frac = 010 0000 0000 0000 0000 0000
Bias = \(2^{k-1} - 1\) = 127
E = exp – Bias = 129 – 127 = 2 (decimal)
S = 1 -> negative number
M = 1.010 0000 0000 0000 0000 0000
M = 1 + 1/4 = 1.25
\(v = (-1)^s M 2^E = (-1)^1 \times 1.25 \times 2^2 = -5\)
C float Decoding Example #2
float: 0x001C0000
binary: 0000 0000 0001 1100 0000 0000 0000 0000
Fields (1 bit | 8 bits | 23 bits): s = 0, exp = 0000 0000, frac = 001 1100 0000 0000 0000 0000
Bias = \(2^{k-1} - 1\) = 127
exp = 0, so the value is denormalized: \(v = (-1)^s M 2^E\) with E = 1 – Bias
E = 1 – Bias = 1 – 127 = –126 (decimal)
S = 0 -> positive number
M = 0.001 1100 0000 0000 0000 0000
M = 1/8 + 1/16 + 1/32 = 7/32 = \(7 \times 2^{-5}\)
\(v = (-1)^s M 2^E = (-1)^0 \times 7 \times 2^{-5} \times 2^{-126} = 7 \times 2^{-131}\)
\(v \approx 2.571393892 \times 10^{-39}\)
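Both decoding examples follow one recipe:

1. Split the bit pattern into its fields.
2. Use the exp field to choose E and the implied leading bit.
3. Evaluate \(v = (-1)^s M 2^E\).

The C sketch below is minimal and handles single precision only. The names decode_float_bits and pow2 are hypothetical. pow2 exists so that the sketch does not need ldexp from libm.

```c
#include <math.h>     /* INFINITY, NAN */
#include <stdint.h>

/* Hypothetical helper: 2^e by repeated doubling/halving; exact in double
   for every exponent a float can need. */
double pow2(int e) {
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r /= 2.0;
    return r;
}

/* Hypothetical helper: value of a single-precision bit pattern via v = (-1)^s * M * 2^E. */
double decode_float_bits(uint32_t bits) {
    const int bias = 127;                         /* 2^(8-1) - 1 */
    unsigned s         = bits >> 31;
    unsigned exp_field = (bits >> 23) & 0xFFu;
    uint32_t frac      = bits & 0x7FFFFFu;
    double sign = s ? -1.0 : 1.0;
    double f = frac / 8388608.0;                  /* frac / 2^23 = 0.xxx...x */

    if (exp_field == 0xFFu)                       /* special values */
        return frac == 0 ? sign * INFINITY : NAN;
    if (exp_field == 0)                           /* denormalized: E = 1 - Bias, M = 0.xxx */
        return sign * f * pow2(1 - bias);
    return sign * (1.0 + f) * pow2((int)exp_field - bias);  /* normalized: M = 1.xxx */
}
```

All products stay exact in double, because M has at most 24 significant bits and every power of two in the float range is representable.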
Visualization: Floating Point Encodings
[Figure: number line of encodings, left to right: NaN, −∞, −Normalized, −Denorm, −0, +0, +Denorm, +Normalized, +∞, NaN]
Tiny Floating Point Example
8-bit Floating Point Representation: s (1 bit) | exp (4 bits) | frac (3 bits)
▪ the sign bit is in the most significant bit
▪ the next four bits are the exp, with a bias of 7
▪ the last three bits are the frac
Same general form as IEEE Format
▪ normalized, denormalized
▪ representation of 0, NaN, infinity
Dynamic Range (s = 0 only)
\(v = (-1)^s M 2^E\); norm: E = exp – Bias; denorm: E = 1 – Bias
s exp  frac   E    Value
Denormalized numbers
0 0000 000   -6    0
0 0000 001   -6    1/8 × 1/64 = 1/512     closest to zero
0 0000 010   -6    2/8 × 1/64 = 2/512     = \((-1)^0 (0 + 1/4) \times 2^{-6}\)
…
0 0000 110   -6    6/8 × 1/64 = 6/512
0 0000 111   -6    7/8 × 1/64 = 7/512     largest denorm
Normalized numbers
0 0001 000   -6    8/8 × 1/64 = 8/512     smallest norm
0 0001 001   -6    9/8 × 1/64 = 9/512     = \((-1)^0 (1 + 1/8) \times 2^{-6}\)
…
0 0110 110   -1    14/8 × 1/2 = 14/16
0 0110 111   -1    15/8 × 1/2 = 15/16     closest to 1 below
0 0111 000    0    8/8 × 1 = 1
0 0111 001    0    9/8 × 1 = 9/8          closest to 1 above
0 0111 010    0    10/8 × 1 = 10/8
…
0 1110 110    7    14/8 × 128 = 224
0 1110 111    7    15/8 × 128 = 240       largest norm
0 1111 000   n/a   inf
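Tables like this one can be generated from the bit patterns. The C sketch below is a minimal decoder for small IEEE-like formats with 1 sign bit, e exponent bits, and f fraction bits. The name minifloat_value is hypothetical. With e = 4 and f = 3 it decodes the 8-bit format above, and with e = 3 and f = 2 it decodes the 6-bit format used next.

```c
#include <math.h>   /* INFINITY, NAN */

/* Hypothetical helper: value of a tiny IEEE-like format with 1 sign bit,
   e exponent bits and f fraction bits, held in the low 1 + e + f bits of `bits`. */
double minifloat_value(unsigned bits, int e, int f) {
    int bias = (1 << (e - 1)) - 1;                 /* 2^(e-1) - 1: 7 for e = 4, 3 for e = 3 */
    unsigned s         = (bits >> (e + f)) & 1u;
    unsigned exp_field = (bits >> f) & ((1u << e) - 1);
    unsigned frac      = bits & ((1u << f) - 1);
    double sign = s ? -1.0 : 1.0;
    double M;
    int E;

    if (exp_field == (1u << e) - 1)                /* exp = 11...1: infinity or NaN */
        return frac == 0 ? sign * INFINITY : NAN;
    if (exp_field == 0) {                          /* denormalized */
        E = 1 - bias;
        M = (double)frac / (1 << f);               /* 0.xxx */
    } else {                                       /* normalized */
        E = (int)exp_field - bias;
        M = 1.0 + (double)frac / (1 << f);         /* 1.xxx */
    }
    double v = sign * M;                           /* scale by 2^E exactly, without libm */
    for (; E > 0; E--) v *= 2.0;
    for (; E < 0; E++) v /= 2.0;
    return v;
}
```

Looping over all 256 patterns with e = 4 and f = 3 reproduces the rows above. For example, minifloat_value(0x77, 4, 3) is 240, the largest normalized value.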
Distribution of Values
6-bit IEEE-like format: s (1 bit) | exp (3 bits) | frac (2 bits)
▪ e = 3 exponent bits
▪ f = 2 fraction bits
▪ Bias is \(2^{3-1} - 1 = 3\)
Notice how the distribution gets denser toward zero.
[Figure: all values of the format on a number line from −15 to 15, marked as denormalized, normalized, or infinity; annotation: 8 values]
Distribution of Values (close-up view)
6-bit IEEE-like format: s (1 bit) | exp (3 bits) | frac (2 bits)
▪ e = 3 exponent bits
▪ f = 2 fraction bits
▪ Bias is 3
[Figure: close-up of the same number line from −1 to 1, values marked as denormalized, normalized, or infinity]
Special Properties of the IEEE Encoding
FP Zero Same as Integer Zero
▪ All bits = 0
Can (Almost) Use Unsigned Integer Comparison
▪ Must first compare sign bits
▪ Must consider −0 = 0
▪ NaNs problematic
▪ Will be greater than any other values
▪ What should comparison yield? The answer is complicated.
▪ Otherwise OK
▪ Denorm vs. normalized
▪ Normalized vs. infinity
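The recipe above can be sketched in C:

1. Compare sign bits.
2. Treat −0 as equal to 0.
3. Otherwise, compare the bit patterns as unsigned integers.

The sketch assumes IEEE 754 single precision and that neither argument is NaN. The name float_less_via_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: a < b using only integer operations on the bit
   patterns. Assumes neither argument is NaN. */
int float_less_via_bits(float a, float b) {
    uint32_t ua = bits_of(a), ub = bits_of(b);
    uint32_t sa = ua >> 31, sb = ub >> 31;
    uint32_t ma = ua & 0x7FFFFFFFu, mb = ub & 0x7FFFFFFFu;   /* magnitudes */

    if (ma == 0 && mb == 0) return 0;   /* -0 == +0: neither is less */
    if (sa != sb) return (int)sa;       /* signs differ: the negative one is smaller */
    if (sa == 0) return ma < mb;        /* both positive: unsigned order matches */
    return ma > mb;                     /* both negative: order is reversed */
}
```

Because exp sits above frac in the bit pattern, the unsigned order of magnitudes already handles denormalized vs. normalized and normalized vs. infinity.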
Quiz Time!
Check out:
https://canvas.cmu.edu/courses/17808
Floating Point Operations: Basic Idea
\(x +_f y = \mathrm{Round}(x + y)\)
\(x \times_f y = \mathrm{Round}(x \times y)\)
Basic idea
▪ First compute exact result
▪ Make it fit into desired precision
▪ Possibly overflow if exponent too large
▪ Possibly round to fit into frac
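A small experiment near \(2^{24} = 16777216\), where adjacent floats are 2 apart, shows the "compute exactly, then round" model at work. The C sketch below is minimal and assumes FLT_EVAL_METHOD == 0 (typical x86-64 and ARM64 builds) and the default round-to-nearest-even mode. Both function names are hypothetical.

```c
/* Hypothetical model of x +f y = Round(x + y): add exactly in a wider type,
   then round once to float. For the float inputs used here the double sum is
   exact, so the only rounding is the final conversion. */
float add_then_round(float x, float y) {
    double exact = (double)x + (double)y;
    return (float)exact;
}

/* Hypothetical wrapper: the compiler's ordinary single-precision addition, for comparison. */
float add_float(float x, float y) {
    return x + y;
}
```

Here 16777216.0f + 1.0f stays 16777216.0f, because the exact sum 16777217 is a tie and rounds to the even neighbor. For the same reason, 16777216.0f + 3.0f becomes 16777220.0f. The model and the hardware agree on both cases.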
Rounding
Rounding Modes (illustrate with $ rounding)
                           $1.40  $1.60  $1.50  $2.50  –$1.50
▪ Towards zero              $1     $1     $1     $2     –$1
▪ Round down (−∞)           $1     $1     $1     $2     –$2
▪ Round up (+∞)             $2     $2     $2     $3     –$1
▪ Nearest Even* (default)   $1     $2     $2     $2     –$2
*Round to nearest, but if half-way in-between then round to nearest even
Closer Look at Round-To-Even
Default Rounding Mode
▪ Hard to get any other kind without dropping into assembly
▪ C99 has support for rounding mode management (fegetround/fesetround in <fenv.h>)
▪ All others are statistically biased
▪ Sum of set of positive numbers will consistently be overestimated or underestimated
Applying to Other Decimal Places / Bit Positions
▪ When exactly halfway between two possible values
▪ Round so that least significant digit is even
▪ E.g., round to nearest hundredth
7.8949999 → 7.89 (Less than half way)
7.8950001 → 7.90 (Greater than half way)
7.8950000 → 7.90 (Half way, round up)
7.8850000 → 7.88 (Half way, round down)
Rounding Binary Numbers
Binary Fractional Numbers
▪ “Even” when least significant bit is 0
▪ “Half way” when bits to right of rounding position = \(100\ldots_2\)
Examples
▪ Round to nearest 1/4 (2 bits right of binary point)
▪ 2 3/32 = \(10.00011_2\) → \(10.00_2\) = 2 (< 1/2, down)
▪ 2 3/16 = \(10.00110_2\) → \(10.01_2\) = 2 1/4 (> 1/2, up)
▪ 2 7/8 = \(10.11100_2\) → \(11.00_2\) = 3 (= 1/2, up)
▪ 2 5/8 = \(10.10100_2\) → \(10.10_2\) = 2 1/2 (= 1/2, down)
Rounding
1.BBGRXXX
▪ Guard bit: LSB of result
▪ Round bit: 1st bit removed
▪ Sticky bit: OR of remaining bits
Round up conditions
▪ Round = 1, Sticky = 1 ➙ > 0.5
▪ Guard = 1, Round = 1, Sticky = 0 ➙ Round to even
Fraction     GRS   Incr?   Rounded
1.0000000    000   N       1.000
1.1010000    100   N       1.101
1.0001000    010   N       1.000
1.0011000    110   Y       1.010
1.0001010    011   Y       1.001
1.1111100    111   Y       10.000
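The guard/round/sticky rule fits in a small integer routine. The C sketch below is minimal. It uses a hypothetical function round_drop_bits that drops the low k bits of an unsigned value with round-to-nearest-even. It covers two sets of examples:

▪ The fractions above, read as 8-bit patterns 1.xxxxxxx, with k = 4
▪ The earlier round-to-nearest-1/4 examples, taken in units of 1/32, with k = 3

```c
#include <stdint.h>

/* Hypothetical helper: drop the low k bits of x (1 <= k <= 31), rounding to
   nearest, ties to even.
   Guard = lowest kept bit, Round = highest dropped bit,
   Sticky = OR of the remaining dropped bits. */
uint32_t round_drop_bits(uint32_t x, int k) {
    uint32_t kept       = x >> k;
    uint32_t guard_bit  = kept & 1u;
    uint32_t round_bit  = (x >> (k - 1)) & 1u;
    uint32_t sticky_bit = (x & ((UINT32_C(1) << (k - 1)) - 1)) != 0;

    /* More than half (R = 1, S = 1), or exactly half with an odd kept value
       (G = 1, R = 1, S = 0): increment. */
    if (round_bit && (sticky_bit || guard_bit))
        kept += 1;
    return kept;
}
```

The result can carry into a new leading bit, as in the last row (1.111 rounds up to 10.000). A caller would then postnormalize.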
FP Multiplication
\((-1)^{s_1} M_1 2^{E_1} \times (-1)^{s_2} M_2 2^{E_2}\)
Exact Result: \((-1)^s M 2^E\)
▪ Sign s: s1 ^ s2 (exclusive-or)
▪ Significand M: M1 × M2
▪ Exponent E: E1 + E2
Fixing
▪ If M ≥ 2, shift M right, increment E
▪ If E out of range, overflow
▪ Round M to fit frac precision
Implementation
▪ Biggest chore is multiplying significands
4 bit significand: \((1.010 \times 2^2) \times (1.110 \times 2^3) = 10.0011 \times 2^5 = 1.00011 \times 2^6 = 1.001 \times 2^6\)
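The multiply-then-fix-up steps can be modeled with integer significands. The C sketch below is minimal and assumes positive operands and the slide's 4-bit significands (3 fraction bits). The names struct toy, normalize_round, and toy_mul are hypothetical. A toy value stands for sig × 2^(exp − 3).

```c
#include <stdint.h>

enum { TOY_FRAC_BITS = 3 };   /* 4-bit significands: 1.xxx */

/* Hypothetical type: positive value sig * 2^(exp - TOY_FRAC_BITS), 8 <= sig <= 15. */
struct toy { uint32_t sig; int exp; };

/* Hypothetical helper: normalize v * 2^scale (v > 0) to 1.xxx form, round to
   TOY_FRAC_BITS fraction bits (nearest, ties to even), and renormalize if
   rounding carried out to 10.000. */
struct toy normalize_round(uint64_t v, int scale) {
    int h = 63;
    while (((v >> h) & 1u) == 0) h--;            /* position of the leading 1 */
    struct toy r;
    r.exp = scale + h;                           /* v * 2^scale = 1.xxx * 2^(scale + h) */
    if (h > TOY_FRAC_BITS) {
        int d = h - TOY_FRAC_BITS;               /* number of bits to drop */
        uint64_t q = v >> d;
        uint64_t rem = v & ((UINT64_C(1) << d) - 1);
        uint64_t half = UINT64_C(1) << (d - 1);
        if (rem > half || (rem == half && (q & 1u))) q++;
        r.sig = (uint32_t)q;
    } else {
        r.sig = (uint32_t)(v << (TOY_FRAC_BITS - h));  /* exact: pad with zeros */
    }
    if (r.sig == (1u << (TOY_FRAC_BITS + 1))) {  /* rounding produced 10.000 */
        r.sig >>= 1;
        r.exp++;
    }
    return r;
}

/* Hypothetical: multiply significands, add exponents, then fix up. */
struct toy toy_mul(struct toy a, struct toy b) {
    uint64_t prod = (uint64_t)a.sig * b.sig;     /* exact, with 2 * TOY_FRAC_BITS fraction bits */
    return normalize_round(prod, a.exp + b.exp - 2 * TOY_FRAC_BITS);
}
```

With a = {10, 2} (1.010 × 2^2) and b = {14, 3} (1.110 × 2^3), toy_mul returns {9, 6}. That is 1.001 × 2^6 = 72, matching the slide, whereas the exact product is 70.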
Floating Point Addition
\((-1)^{s_1} M_1 2^{E_1} + (-1)^{s_2} M_2 2^{E_2}\)
▪ Assume E1 > E2
Get binary points lined up: \((-1)^{s_2} M_2\) is shifted right by E1 – E2 positions before being added to \((-1)^{s_1} M_1\)
Exact Result: \((-1)^s M 2^E\)
▪ Sign s, significand M: Result of signed align & add
▪ Exponent E: E1
Fixing
▪ If M ≥ 2, shift M right, increment E
▪ If M < 1, shift M left k positions, decrement E by k
▪ Overflow if E out of range
▪ Round M to fit frac precision
4 bit significand: \(1.010 \times 2^2 + 1.110 \times 2^3 = (0.1010 + 1.1100) \times 2^3 = 10.0110 \times 2^3 = 1.00110 \times 2^4 = 1.010 \times 2^4\)
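Addition follows the same pattern: align, add exactly, then normalize and round. The C sketch below is minimal and restricted to positive operands with 4-bit significands. The names struct toy, normalize_round, and toy_add are hypothetical. It repeats the normalize-and-round helper so that it stands alone.

```c
#include <stdint.h>

enum { TOY_FRAC_BITS = 3 };   /* 4-bit significands: 1.xxx */

/* Hypothetical type: positive value sig * 2^(exp - TOY_FRAC_BITS), 8 <= sig <= 15. */
struct toy { uint32_t sig; int exp; };

/* Hypothetical helper: normalize v * 2^scale (v > 0), round to nearest even,
   renormalize after a carry. */
struct toy normalize_round(uint64_t v, int scale) {
    int h = 63;
    while (((v >> h) & 1u) == 0) h--;            /* position of the leading 1 */
    struct toy r;
    r.exp = scale + h;
    if (h > TOY_FRAC_BITS) {
        int d = h - TOY_FRAC_BITS;
        uint64_t q = v >> d;
        uint64_t rem = v & ((UINT64_C(1) << d) - 1);
        uint64_t half = UINT64_C(1) << (d - 1);
        if (rem > half || (rem == half && (q & 1u))) q++;
        r.sig = (uint32_t)q;
    } else {
        r.sig = (uint32_t)(v << (TOY_FRAC_BITS - h));
    }
    if (r.sig == (1u << (TOY_FRAC_BITS + 1))) {  /* 10.000 -> 1.000, E + 1 */
        r.sig >>= 1;
        r.exp++;
    }
    return r;
}

/* Hypothetical: positive operands only. Align binary points, add exactly, fix up. */
struct toy toy_add(struct toy a, struct toy b) {
    if (a.exp < b.exp) { struct toy t = a; a = b; b = t; }   /* make E1 >= E2 */
    int shift = a.exp - b.exp;                   /* assumed small (< 40) so nothing overflows */
    uint64_t sum = ((uint64_t)a.sig << shift) + b.sig;       /* exact, in units of 2^(E2 - 3) */
    return normalize_round(sum, b.exp - TOY_FRAC_BITS);
}
```

On the slide's operands, toy_add returns {10, 4}, i.e. 1.010 × 2^4 = 20 (the exact sum is 19, a tie that rounds to even). Adding 1.111 and 1.000 × 2^-4 exercises the carry case: 1.1111 rounds to 10.000 and is renormalized to 1.000 × 2^1.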
Mathematical Properties of FP Add
Compare to those of Abelian Group
▪ Closed under addition? Yes
▪ But may generate infinity or NaN
▪ Commutative? Yes
▪ Associative? No
▪ Overflow and inexactness of rounding
▪ (3.14+1e10)-1e10 = 0, 3.14+(1e10-1e10) = 3.14 (in single precision)
▪ 0 is additive identity? Yes
▪ Every element has additive inverse? Almost
▪ Yes, except for infinities & NaNs
Monotonicity
▪ a ≥ b ⇒ a+c ≥ b+c? Almost
▪ Except for infinities & NaNs
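The associativity counterexample can be reproduced with the evaluation order made explicit. The C sketch below is minimal and assumes FLT_EVAL_METHOD == 0, so that every float operation is rounded to single precision (typical on x86-64 and ARM64). Both function names are hypothetical.

```c
/* Hypothetical: (a + b) - c, each step rounded to float. */
float sum_left_first(float a, float b, float c) {
    float t = a + b;
    return t - c;
}

/* Hypothetical: a + (b - c), each step rounded to float. */
float sum_right_first(float a, float b, float c) {
    float t = b - c;
    return a + t;
}
```

Near 1e10, adjacent floats are 1024 apart, so adding 3.14f changes nothing and sum_left_first(3.14f, 1e10f, 1e10f) is 0. In double precision the spacing there is about 2^-19, so the same left-first expression does not come out to 0. This is why the example is stated for single precision.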
Mathematical Properties of FP Mult
Compare to Commutative Ring
▪ Closed under multiplication? Yes
▪ But may generate infinity or NaN
▪ Multiplication Commutative? Yes
▪ Multiplication is Associative? No
▪ Possibility of overflow, inexactness of rounding
▪ Ex: (1e20*1e20)*1e-20 = inf, 1e20*(1e20*1e-20) = 1e20 (in single precision)
▪ 1 is multiplicative identity? Yes
▪ Multiplication distributes over addition? No
▪ Possibility of overflow, inexactness of rounding
▪ 1e20*(1e20-1e20) = 0.0, 1e20*1e20 – 1e20*1e20 = NaN (in single precision)
Monotonicity
▪ a ≥ b & c ≥ 0 ⇒ a * c ≥ b * c? Almost
▪ Except for infinities & NaNs
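The multiplication counterexamples work the same way. FLT_MAX is about 3.4e38, so 1e20f * 1e20f overflows to infinity. The C sketch below is minimal and assumes FLT_EVAL_METHOD == 0. All four function names are hypothetical.

```c
#include <math.h>   /* isinf, isnan for checking the results */

/* Hypothetical helpers with the evaluation order made explicit (single precision). */
float mul_left_first(float a, float b, float c)  { float t = a * b; return t * c; }              /* (a*b)*c   */
float mul_right_first(float a, float b, float c) { float t = b * c; return a * t; }              /* a*(b*c)   */
float factored(float a, float b, float c)        { float t = b - c; return a * t; }              /* a*(b-c)   */
float expanded(float a, float b, float c)        { float t = a * b, u = a * c; return t - u; }   /* a*b - a*c */
```

With a = b = 1e20f, the expanded form computes inf − inf, which is NaN. The factored form multiplies by an exact 0 and returns 0.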
Floating Point in C
C Guarantees Two Levels
▪ float: single precision
▪ double: double precision
Conversions/Casting
▪ Casting between int, float, and double changes bit representation
▪ double/float → int
▪ Truncates fractional part
▪ Like rounding toward zero
▪ Not defined when out of range or NaN (undefined behavior in C): generally sets to TMin
▪ int → double
▪ Exact conversion, as long as int has ≤ 53 bit word size
▪ int → float
▪ Will round according to rounding mode
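The three conversion rules can be observed directly. The C sketch below is minimal and assumes a 32-bit int with IEEE 754 float and double. The function names are hypothetical. It deliberately avoids out-of-range and NaN inputs, since converting those to int is undefined behavior.

```c
/* Hypothetical helpers; assumes 32-bit int and IEEE 754 float/double. */

/* double -> int truncates toward zero; only defined when the truncated
   value fits in int. */
int double_to_int(double d) { return (int)d; }

/* int -> float rounds to a 24-bit significand (current rounding mode). */
float int_to_float(int x) { return (float)x; }

/* int -> double is exact for 32-bit int: double keeps 53 significand bits. */
double int_to_double(int x) { return (double)x; }
```

For example, 16777217 (2^24 + 1) does not survive a trip through float, because it rounds to 16777216. It does survive a trip through double.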
Floating Point Puzzles
For each of the following C expressions, either:
▪ Argue that it is true for all argument values
▪ Explain why not true
int x = …;
float f = …;
double d = …;
Assume neither d nor f is NaN
• x == (int)(float) x
• x == (int)(double) x
• f == (float)(double) f
• d == (double)(float) d
• f == -(-f);
• 2/3 == 2/3.0
• d < 0.0 ⇒ ((d*2) < 0.0)
• d > f ⇒ -f > -d
• d * d >= 0.0
• (d+f)-d == f
Summary
IEEE Floating Point has clear mathematical properties
Represents numbers of form \(M \times 2^E\)
One can reason about operations independent of implementation
▪ As if computed with perfect precision and then rounded
Not the same as real arithmetic
▪ Violates associativity/distributivity
▪ Makes life difficult for compilers & serious numerical applications programmers
Additional Slides
Creating Floating Point Number
Steps
▪ Normalize to have leading 1
▪ Round to fit within fraction
▪ Postnormalize to deal with effects of rounding
Case Study
▪ Convert 8-bit unsigned numbers to tiny floating point format: s (1 bit) | exp (4 bits) | frac (3 bits)
Example Numbers
128   10000000
13    00001101
17    00010001
19    00010011
138   10001010
63    00111111
Normalize
Requirement
▪ Set binary point so that numbers of form 1.xxxxx
▪ Adjust all to have leading one
▪ Decrement exponent as shift left
Value   Binary     Fraction    Exponent
128     10000000   1.0000000   7
13      00001101   1.1010000   3
17      00010001   1.0001000   4
19      00010011   1.0011000   4
138     10001010   1.0001010   7
63      00111111   1.1111100   5
Postnormalize
Issue
▪ Rounding may have caused overflow
▪ Handle by shifting right once & incrementing exponent
Value   Rounded   Exp   Adjusted   Numeric Result
128     1.000     7                128
13      1.101     3                13
17      1.000     4                16
19      1.010     4                20
138     1.001     7                144
63      10.000    5     1.000/6    64
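The three steps (normalize, round, postnormalize) combine into one encoder. The C sketch below is minimal and targets the tiny format (1 sign bit, 4 exp bits with Bias = 7, 3 frac bits). The name uint8_to_tiny is hypothetical. It assumes that values too large for the format should become +∞.

```c
#include <stdint.h>

/* Hypothetical helper: encode an unsigned 8-bit integer in the tiny format.
   Normalize, round to 3 fraction bits (ties to even), then postnormalize. */
uint8_t uint8_to_tiny(uint8_t v) {
    if (v == 0) return 0x00;                      /* all-zero bits encode +0 */

    int h = 7;
    while (((v >> h) & 1u) == 0) h--;             /* normalize: v = 1.xxx * 2^h */
    int E = h;
    unsigned sig;                                 /* 1.xxx scaled by 2^3, in [8, 16] */

    if (h > 3) {                                  /* round away h - 3 low bits */
        int d = h - 3;
        unsigned q = (unsigned)v >> d;
        unsigned rem = v & ((1u << d) - 1);
        unsigned half = 1u << (d - 1);
        if (rem > half || (rem == half && (q & 1u))) q++;
        sig = q;
    } else {
        sig = (unsigned)v << (3 - h);             /* exact: pad with zeros */
    }

    if (sig == 16) {                              /* postnormalize: 10.000 -> 1.000, E + 1 */
        sig = 8;
        E++;
    }

    unsigned exp_field = (unsigned)E + 7;         /* biased exponent */
    if (exp_field >= 15) return 0x78;             /* too large: 0 1111 000 = +infinity */
    return (uint8_t)((exp_field << 3) | (sig & 7u));  /* drop the implied leading 1 */
}
```

For instance, 63 rounds up to 10.000 × 2^5 and is postnormalized to 1.000 × 2^6, giving the pattern 0 1101 000 (0x68), whose value is 64.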
Interesting Numbers
Description                exp     frac    Numeric Value ({single, double})
Zero                       00…00   00…00   0.0
Smallest Pos. Denorm.      00…00   00…01   \(2^{-\{23,52\}} \times 2^{-\{126,1022\}}\)
▪ Single ≈ \(1.4 \times 10^{-45}\)
▪ Double ≈ \(4.9 \times 10^{-324}\)
Largest Denormalized       00…00   11…11   \((1.0 - \varepsilon) \times 2^{-\{126,1022\}}\)
▪ Single ≈ \(1.18 \times 10^{-38}\)
▪ Double ≈ \(2.2 \times 10^{-308}\)
Smallest Pos. Normalized   00…01   00…00   \(1.0 \times 2^{-\{126,1022\}}\)
▪ Just larger than largest denormalized
One                        01…11   00…00   1.0
Largest Normalized         11…10   11…11   \((2.0 - \varepsilon) \times 2^{\{127,1023\}}\)
▪ Single ≈ \(3.4 \times 10^{38}\)
▪ Double ≈ \(1.8 \times 10^{308}\)
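The single-precision rows can be built from their bit patterns and checked against the standard limits in <float.h>. FLT_MIN is the smallest normalized value and FLT_MAX the largest; C11 also provides FLT_TRUE_MIN for the smallest denormalized value. The C sketch below is minimal and assumes IEEE 754 single precision with no flush-to-zero mode. The function names are hypothetical.

```c
#include <float.h>    /* FLT_MIN, FLT_MAX for comparison */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret 32 bits as a float. */
float float_from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical: single-precision rows of the table, built from their bit patterns. */
float smallest_pos_denorm(void) { return float_from_bits(0x00000001u); } /* exp 00..00, frac 00..01 */
float largest_denorm(void)      { return float_from_bits(0x007FFFFFu); } /* exp 00..00, frac 11..11 */
float smallest_pos_norm(void)   { return float_from_bits(0x00800000u); } /* exp 00..01, frac 00..00 */
float one(void)                 { return float_from_bits(0x3F800000u); } /* exp 01..11, frac 00..00 */
float largest_norm(void)        { return float_from_bits(0x7F7FFFFFu); } /* exp 11..10, frac 11..11 */
```

Halving the smallest denormalized value gives an exact tie between 0 and that value, and round-to-even sends it to 0.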