程序代写代做代考 scheme PowerPoint Presentation

PowerPoint Presentation

Number representation

Bit Pattern 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

Unsigned 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Sign &

Magnitude
+0 +1 +2 +3 +4 +5 +6 +7 −0 −1 −2 −3 −4 −5 −6 −7

1s

Complement
+0 +1 +2 +3 +4 +5 +6 +7 -7 -6 -5 -4 -3 -2 -1 -0

2s

Complement
+0 +1 +2 +3 +4 +5 +6 +7 -8 -7 -6 -5 -4 -3 -2 -1

Excess-8 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

BCD 0 1 2 3 4 5 6 7 8 9 – – – – – –

Number representation Excess-n
• Excess-n TO Decimal number: convert to decimal, substract the n from the decimal

• Decimal number TO Excess-n: add the n to the decimal and convert result to binary

-3 in Excess-8?

-3 + 8 = 5

5 in unsigned = 0101 = 5 in one-complement = 5 in two-complement = -3 in Excess 8

5 in Excess-8?

5 + 8 = 13

13 in unsigned: 1101 (beyond 2s complement range but positive (shift like a circular linked list
in 2s complement!). No further processing necessary)

-7 in excess-6?

-7 + 6 = -1 = -1 in 1s complement: 0001 -> (negative number bit inversion rule) -> 1110 =

-1 in 2s complement: 1110 + 1 = 1111 = -7 in excess-6

-8 in excess-6?

-8+6 = -2 ->

2 in unsigned: 0010 -> 1s complement = 0010 -> (negative number bit inversion rule) -> 1101

In 2s complement = 1110 = -8 in excess-6

Number representation

Bit Pattern 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

Unsigned 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Sign &

Magnitude
+0 +1 +2 +3 +4 +5 +6 +7 −0 −1 −2 −3 −4 −5 −6 −7

1s

Complement
+0 +1 +2 +3 +4 +5 +6 +7 -7 -6 -5 -4 -3 -2 -1 -0

2s

Complement
+0 +1 +2 +3 +4 +5 +6 +7 -8 -7 -6 -5 -4 -3 -2 -1

Excess-8 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

BCD 0 1 2 3 4 5 6 7 8 9 – – – – – –

Excess-6 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 -8 -7

FLOATING POINT

NUMBERS

Bernhard Kainz (with thanks to A. Gopalan, N. Dulay and E.

Edwards)

b.kainz@imperial.ac.uk

Introduction

mailto:b.kainz@imperial.ac.uk

Why do we need this: large, small and

fractional numbers
World population >7, 200, 000, 000 people

One light year 9, 130, 000, 000, 000 km

One solar mass 2, 000, 000, 000, 000, 000, 000, 000, 000, 000 kg

Electron diameter 0.000, 000, 000, 000, 000, 000, 01 m

Electron mass 0.000, 000, 000, 000, 000, 000, 000, 000, 000, 000, 9 kg

Smallest measurable 0.000, 000, 000, 000, 000, 000, 000, 000,
length of time 000, 000, 000, 000, 000, 000, 1 sec

Pi (to 14 decimal places) 3.14159 26535 8979…

Standard rate of VAT 20%

Googol 1 followed by a 100 zeros ☺

Large integers

Example: How can we represent integers up to 30 decimal

digits long?

• Binary: 2𝑋 = 1030 ⇒ 𝑋 = log2(10
30) ≈ 100 bits (1

decimal digit ≈ 3.32 bits)

• BCD: 30 × 4 = 120 bits

• ASCII: 30 × 8 = 240 bits

Floating point numbers

Recall scientific notation:

𝑀 × 10𝐸 Decimal

𝑀 × 2𝐸 Binary

This is the basis for most floating point representation schemes

• M is the coefficient (aka. significand, fraction or mantissa)

• E is the exponent (aka. characteristic)

• 10 (or for binary, 2) is the radix (aka. base)

• No. of bits in exponent determines the range (bigness/smallness)

• No. of bits in coefficient determines the precision (exactness)

Real vs. floating point numbers

Mathematical real Floating point number

Range −∞…+∞ Finite

No. of values (Uncountably) infinite Finite

Spacing ? Gap between numbers

varies

Errors ? Incorrect results are

possible

Some questions (assume signed 3-digit coefficient and a signed 2-

digit exponent as before):

• What are the closest floating point numbers to . 𝟎𝟎𝟏 × 𝟏𝟎−𝟗𝟗 ? What is
the gap between this number and them?

• What about . 𝟎𝟎𝟏 × 𝟏𝟎−𝟓𝟎 ?

Zones of expressibility

• Example: assume numbers are formed with a signed 3-

digit coefficient and a signed 2-digit exponent

• Zones of expressibility:

Normalised floating point numbers

• Depending on how you interpret the coefficient, floating

point numbers can have multiple forms, e.g.:

0.023 × 104 = 0.230 × 103

= 2.3 × 102

= 0.0023 × 105

• For hardware implementations it is desirable for each

number to have a unique floating point representation, a

normalised form

• We’ll normalise coefficients in the range [𝟏, …𝑹) where 𝑹
is the base, e.g.:

1,… , 10 for decimal

1,… , 2 for binary

Normalised forms (base 10)

Number Normalised form

23.2 × 104

Normalised forms (base 10)

Number Normalised form

23.2 × 104 2.32 × 105

Normalised forms (base 10)

Number Normalised form

23.2 × 104 2.32 × 105

−4.01 × 10−3

Normalised forms (base 10)

Number Normalised form

23.2 × 104 2.32 × 105

−4.01 × 10−3 −4.01 × 10−3

Normalised forms (base 10)

Number Normalised form

23.2 × 104 2.32 × 105

−4.01 × 10−3 −4.01 × 10−3

343 000 × 100 3.43 × 105

0.000 000 098 9 × 100

Normalised forms (base 10)

Number Normalised form

23.2 × 104 2.32 × 105

−4.01 × 10−3 −4.01 × 10−3

343 000 × 100 3.43 × 105

0.000 000 098 9 × 100 9.89 × 10−8

Normalised forms (base 2)

Number Normalised form

100.01 × 21 1.0001 × 23

1010.11 × 22 1.01011 × 25

0.00101 × 2−2 1.01 × 2−5

1100101 × 2−2 1.100101 × 24

Binary fractions

Binary Decimal

0.1

Binary fractions

Binary Decimal

0.1 0.5

Binary fractions

Binary Decimal

0.1 0.5

0.01

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001 0.125

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001 0.125

0.11

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001 0.125

0.11 0.75

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001 0.125

0.11 0.75

0.111

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001 0.125

0.11 0.75

0.111 0.875

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001 0.125

0.11 0.75

0.111 0.875

0.011

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001 0.125

0.11 0.75

0.111 0.875

0.011 0.375

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001 0.125

0.11 0.75

0.111 0.875

0.011 0.375

0.101

Binary fractions

Binary Decimal

0.1 0.5

0.01 0.25

0.001 0.125

0.11 0.75

0.111 0.875

0.011 0.375

0.101 0.625

Binary fraction to decimal fraction

• What is the binary value 0.01101 in decimal?


1

4
+

1

8
+

1

32
=

13

32
= 0.40625


8+4+1

25
=

13

32

• What about 0.000 110 011?

• Answer:
32+16+2+1

29
=

51

512
= 0.099 609 375

𝟐−𝟏 𝟐−𝟐 𝟐−𝟑 𝟐−𝟒 𝟐−𝟓

. 0 1 1 0 1

𝟑𝟐 𝟏𝟔 𝟖 𝟒 𝟐 𝟏

. 0 1 1 0 1

Decimal fraction to binary fraction

• What is the decimal value 0.6875 in binary?

0.6875 =
1.375

2
=
1

2
+
0.375

2
=
1

2
+
0.75

4
=
1

2
+
1.5

8

=
1

2
+
1

8
+
0.5

8
=
1

2
+
1

8
+

1

16

So the answer is 0.1011

• What is the decimal value 0.1 in binary?

0.1 =
1.6

16
=

1

16
+
0.6

16
=

1

16
+
1.2

32
=

1

16
+

1

32
+
0.2

32
=

1

16
+

1

32
+
1.6

256

Floating point multiplication

𝑁1 × 𝑁2 = 𝑀1 × 10
𝐸1 × 𝑀2 × 10

𝐸2

= 𝑀1 ×𝑀2 × 10
𝐸1 × 10𝐸2

= 𝑀1 ×𝑀2 × 10
𝐸1+𝐸2

• That is, we multiply the coefficients and add the

exponents

• Example:

2.6 × 106 × 5.4 × 10−3 = 2.6 × 5.4 × 103

= 14.04 × 103

• We must also normalise the result, so final answer is

1.404 × 104

Truncation and rounding

• For many computations, the result of a floating point

operation is too large to store in the coefficient

• Example (with a 2-digit coefficient):

2.3 × 101 × 2.3 × 101 = 5.29 × 102

• Truncation  5.2 × 102 (biased error)

• Rounding  5.3 × 102 (unbiased error)

Floating point addition

• A floating point addition such as 4.5 × 103 + 6.7 × 102 is not a
simple coefficient addition, unless the exponents are the same.
Otherwise, we need to align them first

𝑁1 +𝑁2 = 𝑀1 × 10
𝐸1 + 𝑀2 × 10

𝐸2

= 𝑀1 +𝑀2 × 10
𝐸2−𝐸1 × 10𝐸1

• To align, choose the number with the smaller exponent and
shift its coefficient the corresponding number of digits to the
right

4.5 × 103 + 6.7 × 102 = 4.5 × 103 + 0.67 × 103

= 5.17 × 103 = 5.2 × 103

(rounded)

Exponent overflow and underflow

• Exponent overflow occurs when the result is too large i.e. when the
result’s exponent > maximum exponent

• Example: if max exponent is 99 then 1099 × 1099 = 10198 (overflow)

To handle overflow, set value as infinity or raise an exception

• Exponent underflow occurs when the result is too small i.e. when
the result’s exponent < smallest exponent • Example: if min exponent is -99 then 10−99 × 10−99 = 10−198 (underflow) To handle underflow, set value as zero or raise an exception Comparing floating point values • Because of the potential for producing inexact results, comparing floating point values should account for close results • If we know the desired magnitude and precision of results, we can adjust for closeness (epsilon). For example: 𝑎 = 𝑏 𝑏 − 𝜖 < 𝑎 < (𝑏 + 𝜖) 𝑎 = 1 1 − 0.000005 < 𝑎 < 1 + 0.000005 0.999995 < 𝑎 < 1.000005 • A more general approach is to calculate closeness of two numbers based on the relative size of the two numbers being compared