PowerPoint Presentation
Introduction to Computer Systems
More on Numbers:
Representing Real Numbers
Slide #2 of 49
Lecture Objectives
To introduce how real numbers are represented in
computer systems.
To explain some of the limitations of floating point
representations
Slide #3 of 49
Lecture Outline
More on Numbers:
Binary Arithmetic using 2’s Complement
Fixed Point Decimal to Binary
Fixed Point Binary to Decimal
Fixed Point Arithmetic
Floating Point Numbers
Numerical Precision
Slide #4 of 49
Binary Arithmetic – Observations
Add a “sign” bit – assume on the left
Then +5 = 0101 and -5 = 1101
We know that +5 + -5 = 0, so this should hold in binary!
But:
0 1 0 1
1 1 0 1
1 0 0 1 0
We need to be a bit smarter about this …
We need to Fix It!
Slide #5 of 49
Binary Arithmetic – 2’s Complement
In 2’s Complement the convention is:
All positive numbers start with 0
All negative numbers start with 1
Negation is achieved by:
Flipping all the bits
Adding 1 to the least significant (right-most)
Lets see the same example again:
+5 = 0101 and -5 = 1011 (in 2C notation)
0 1 0 1
1 0 1 1
1 0 0 0 0
This carry will be
discarded
How about Binary
Arithmetic? Subtraction
etc.
Slide #6 of 49
How Real Numbers are Represented?
How do we represent real numbers in computer systems?
Two of the common ways to achieve this are:
Fixed Point: binary point is fixed e.g.
1101101.0001001
Floating Point: binary point floats to the right of the
most significant 1 and an exponent is used e.g.
1.1011010001001 x 26
IEEE 754 – https://en.wikipedia.org/wiki/IEEE_754
https://en.wikipedia.org/wiki/IEEE_754
Slide #7 of 49
Fixed Point Decimal-to-Binary
Integer part convert as before (repeated division by 2)
Non-integer part follows opposite process
Repeated multiplication by 2, keeping integer part:
0.537 x 2 = 1.074
0.074 x 2 = 0.148
0.148 x 2 = 0.296
0.296 x 2 = 0.592
0.592 x 2 = 1.184
0.184 x 2 = 0.368
So 0.537
10
= 0.100010
2
0.625?
0.512?
Slide #8 of 49
Fixed Point Binary-to-Decimal
Allocate subset of bits to integer part, and the remainder
to the non-integer part.
For example, 4+4 bits:
1101.0101
1×23 + 1×22 + 0x21 + 1×20 + 0x2-1 +1×2-2 + 0x2-3 + 1×2-4
= 8 + 4 + 0 + 1 + 0.0 + 0.25 + 0.0 + 0.0625
= 13.3125
101.110?
010.001?
Slide #9 of 49
Fixed Point Arithmetic
Everything is the same as for whole numbers
Example: 01001.010 – 00010.100
Take 2C and add:
0 1 0 0 1 0 1 0
+ 1 1 1 0 1 1 0 0
(1) 0 0 1 1 0 1 1 0
00110.101 – 10110.010?
Slide #10 of 49
Decimal Fractions (Base 10)
Slide #11 of 49
Infinite Decimal Fractions (Base 10)
Slide #12 of 49
Binary Fractions (Base 2)
Slide #13 of 49
Floating Point Representation (for fractions)
General principle
– like “scientific notation”, but in binary
Numbers represented as: ±m*2e
Sign (±) Mantissa (m) Exponent (e)
1 bit
0 for +
1 for –
Actual significant digits 2s complement signed binary
Shows where binary point goes
Slide #14 of 49
Choice of Representations
Slide #15 of 49
Floating Point Representation in Java
take bits of m
put “1.” at start, so normalized
move binary point right e places
(or left for negative e)
note – e is stored with an “offset” added to it
S Offset e Mantissa m
Sign
0, 1 for +, –
Exponent
2’s complement
signed binary
Mantissa
Leading mantissa bit (1.) left out
Slide #16 of 49
Java Types for Floating Point
No of Bits
Type Sign Mantissa Exponent Total Bytes
float 1 23 8 32 4
double 1 52 11 64 8
52 bit mantissa : 253 ≈ 8 x 1015
We get 15 significant decimal digits in double data type.
Slide #17 of 49
Java Types for Floating Point
0 1000 0100 010 1010 1000 0000 0000 0000
Sign
0 for +
5 + offset 127 = 132
e + 127
Mantissa
Without the leading 1.
42 5/
8
= 101010.101 = 1.01010101 x 25
0 0111 1011 100 1100 1100 1100 1100 1101
Sign
0 for +
-4 + offset 127 = 123
e + 127
Mantissa
Without the leading 1.
1/
10
= 0.000110011001100…
= 1.10011001100… x 2-4
Rounded Up
Rounding Error!
Slide #18 of 49
Money as Floating Point?
Floating Point value for amount in pounds with pence as
fraction?
Not a good practice!
Pence need infinite binary fractions
e.g. 10p is 0.0001100110011001100…
so we get rounding errors
Always use int or long (or BigInteger) for money.
Slide #19 of 49
Factorial with Double
/**
* Calculate factorial.
* requires: 0 <= n * @param n number whose factorial is to be calculated * @return factorial of n */ public static double dfact(int n){ double a = 1; for (int i = 1; i <= n; i++){ a = a * i; } return a; } n, n! 165, 5.423910666131586E295 166, 9.003691705778433E297 167, 1.5036165148649983E300 168, 2.526075744973197E302 169, 4.2690680090047027E304 170, 7.257415615307994E306 171, Infinity 172, Infinity 173, Infinity 174, Infinity Slide #20 of 49 Accuracy Issues – Even in Double Floating Point n = 170 n! = 7.257415615307994E306 = 7257415615307994E291 Windows calculator says: 7.2574156153079989673967282111293e+306 Floating point arithmetic loses accuracy in least significant digits. Most significant, and overall size, are OK. This digit is wrong! 291 more digits after it Slide #21 of 49 Why is 171! too big? Slide #22 of 49 Floating Point Overflow In Java: If result is too big for datatype, it's a special value POSITIVE_INFINITY Other special values: If too big but negative: NEGATIVE_INFINITY Indistinguishable from 0 but known to be negative: -0.0 Impossible number (e.g. Sqrt(-1)) NaN These special values allow you to check for overflow in a program - unlike the case for integer arithmetic More precisely: Float.POSITIVE_INFINITY or Double.POSITIVE_INFINITY "Not a Number" Slide #23 of 49 Summary – Java Floating Point It normalizes where possible Unnormalised for smallest numbers (offset e = o) Special representations for NaN etc. Details in API for: java.lang.Float.intBitsToFloat java.lang.Double.longBitsToDouble Slide #24 of 49 Numerical Precision Fixed point is convenient and intuitive but has two problems 1)Numerical precision Only values that are an integer multiple of the smallest power of two can be represented exactly 2)Numerical range Increased precision of non-integer part is at the expense of numerical range Floating point representation effectively addresses these issues. Slide #25 of 49 Summary – Numbers Representing numbers in the computer Whole numbers in binary and hexadecimal notation Positive real numbers in fixed-point binary Binary arithmetic is like decimal arithmetic Our lack of binary practice makes it hard! Negative numbers are tricky things But we can use a few of our own tricks – 2C Floating point is an alternative, but is very unnatural for us!