4.1: Floating Point Numbers
CSU11022 – Introduction to Computing II
Dr / School of Computer Science and Statistics
D.A.Patterson, J.L.Hennessy, “Computer Organisation and Design: ARM Edition”, Morgan-Kaufmann, 2016.

(Section 3.5: Floating Point, available in the Library, doesn’t have to be the ARM Edition)

// some really small numbers and one large number
float[] vals = {
    3.7e-5f, 4.8e-5f, 1.7e-5f, 2.4e-5f,
    3.7e-5f, 4.8e-5f, 1.7e-5f, 2.4e-5f,
    3.7e-5f, 4.8e-5f, 1.7e-5f, 2.4e-5f,
    3.7e-5f, 4.8e-5f, 1.7e-5f, 2.4e-5f,
    12345.0f  // the large number; its value is inferred from the outputs shown below
};
float result;

// add the numbers first-to-last
result = 0;
for (int i = 0; i < vals.length; i++) {
    result += vals[i];
}
System.out.println("sum first-to-last: " + String.format("%.8f", result));
// output: sum first-to-last: 12345.00097656

// add the numbers last-to-first
result = 0;
for (int i = vals.length - 1; i >= 0; i--) {
    result += vals[i];
}
System.out.println("sum last-to-first: " + String.format("%.8f", result));
// output: sum last-to-first: 12345.00000000
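The two sums differ because a float has only 24 significant bits, so the gap between neighbouring float values grows with magnitude. The following is a minimal sketch of that effect using the standard Math.ulp method; the class name UlpDemo and the helper absorbed are hypothetical names chosen for illustration.

```java
class UlpDemo {
    // True when adding small to big leaves big unchanged,
    // i.e. small is less than half the gap between floats near big.
    static boolean absorbed(float big, float small) {
        return big + small == big;
    }

    public static void main(String[] args) {
        float big = 12345.0f;

        // Distance from 12345.0f to the next representable float (2^-10).
        float gap = Math.ulp(big);
        System.out.println("gap near 12345: " + String.format("%.10f", gap));
        // prints: gap near 12345: 0.0009765625

        // Each small value on its own is lost when added to the large one ...
        System.out.println(absorbed(big, 3.7e-5f));   // prints: true

        // ... but the small values summed first (about 5.04e-4) exceed half the gap.
        System.out.println(absorbed(big, 5.04e-4f));  // prints: false
    }
}
```

Adding the small values together first lets their total grow past half the gap before it meets the large value, which is why the first-to-last sum ends one step above 12345.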

Binary number representation
32 bits … 2^32 unique values that we can use to represent different things
e.g. unsigned integers
0 … 2^32 - 1 (or 0 … 4,294,967,295)
e.g. signed integers using 2’s complement
-2^31 … 0 … +2^31 - 1 (or -2,147,483,648 … 0 … +2,147,483,647)
How do we represent real numbers like 2½ or 3.14159265…?
Also, how do we represent values with really large or really small magnitudes?
e.g. 2.2 × 10^11
e.g. 1.3 × 10^-8
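The integer ranges above can be checked directly in Java. This is a small sketch, not part of the original slides: Java has no unsigned 32-bit type, so the standard Integer.toUnsignedLong method is used to reinterpret a bit pattern as unsigned. The class name IntRanges and the helper asUnsigned are hypothetical.

```java
class IntRanges {
    // Reinterpret a 32-bit pattern as an unsigned value (widened to long).
    static long asUnsigned(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    public static void main(String[] args) {
        int allOnes = 0xFFFFFFFF;

        // The same 32 bits, read under two different interpretations.
        System.out.println(allOnes);              // prints: -1
        System.out.println(asUnsigned(allOnes));  // prints: 4294967295

        // Limits of the 2's complement interpretation.
        System.out.println(Integer.MIN_VALUE);    // prints: -2147483648
        System.out.println(Integer.MAX_VALUE);    // prints: 2147483647
    }
}
```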
Trinity College Dublin, The University of Dublin © / Trinity College Dublin 2015-2023

Scientific notation – decimal
The values 2.2 × 10^11 and 1.3 × 10^-8 are examples of (normalized) scientific notation in decimal form
Values expressed in normalised scientific notation satisfy the condition:
1 ≤ |f| < 10
Normalized scientific notation gives us one canonical form in which to express a value using scientific notation and allows quick, visual comparison of magnitude
As computer scientists, we avoid expressing the same thing in different ways (a == b?)
372.98 = 37.298 × 10^1 = 3729.8 × 10^-1 = 3.7298 × 10^2 (only the last form is normalised)

Binary Floating-Point Numbers
Convert the following binary numbers to decimal numbers with fractions
10010101 = 1×2^7 + 1×2^4 + 1×2^2 + 1×2^0 = 149
1.1 = 1×2^0 + 1×2^-1 = 1½ = 1.5
101000.01 = 1×2^5 + 1×2^3 + 1×2^-2 = 40¼ = 40.25
Convert the following decimal numbers to binary floating point numbers
(multiply the fractional part by 2 repeatedly; the integer part of each result is the next bit after the radix point)
7.75 = 111.11
    0.75 × 2 = 1.5
    0.5 × 2 = 1.0
9.3125 = 1001.0101
    0.3125 × 2 = 0.625
    0.625 × 2 = 1.25
    0.25 × 2 = 0.5
    0.5 × 2 = 1.0
2.1 = 10.000110011001100 ... (the fraction never terminates)

Scientific notation – binary
Like decimal values, we can express binary values using scientific notation (again, in normalized form)
1010.1 = 1.0101 × 2^3
0.00101 = 1.01 × 2^-3
The general form is again f × 2^e, and in normalised form f satisfies the following condition:
1_2 ≤ |f| < 10_2
5.75_10 = 101.11_2 × 2^0 = 1.0111_2 × 2^2
The normalized form of a binary number expressed using scientific notation forms the basis for its representation in a computer

4.2: IEEE-754
https://www.h-schmidt.net/FloatConverter/IEEE754.html

IEEE 754 Floating-Point representation
Use a different interpretation of a 32-bit value to represent floating point numbers, e.g. IEEE 754
bit 31: sign (s) | bits 30–23: exponent (e) | bits 22–0: fraction (f)
How can we represent ...
... positive and negative values?
... values with positive and negative exponents?
Where is the binary (radix) point?
(-1)^s × f × 2^e

Sign of Exponent and Fraction
Sign bit s:
0 ⇒ positive floating-point number
1 ⇒ negative floating-point number
Positive and negative exponents?
Option 1: 2’s complement exponents
Option 2: Biased exponents
Subtract a constant bias (b = 127) from the stored exponent to obtain the signed exponent
bit 31: sign (s) | bits 30–23: exponent (e) | bits 22–0: fraction (f)
(-1)^s × f × 2^(e-b)

Storing the fraction
The following two representations are of the same value (3.5_10)
s = 0, e = 10000000 (128), f = 1.1100000000000000000000: +1.11_2 × 2^(128 - 127) = 11.1_2 = 3.5_10
s = 0, e = 10000001 (129), f = 0.11100000000000000000000: +0.111_2 × 2^(129 - 127) = 11.1_2 = 3.5_10 (same value!)
We don’t want multiple representations of the same value! if (a == b) ...
Storing floating-point numbers in normalized form avoids this problem:
1_2 ≤ |f| < 10_2, so f is in the form 1.ddddd...

Normalization and the Hidden Bit
With normalisation, 0.0101 × 2^-4 becomes 1.0100 × 2^-6
adjust the fraction so there is a single 1 to the left of the radix point
compensate by adjusting the exponent accordingly
If there is always going to be a 1 to the left of the radix point, we don’t need to store it!
Increases precision by one bit!
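The sign, biased exponent and fraction fields can be inspected in Java with the standard Float.floatToIntBits method. The sketch below assumes a normalised single-precision value (stored exponent between 1 and 254), so the hidden bit is always 1. The class name FloatFields and its helper methods are hypothetical names used for illustration.

```java
class FloatFields {
    static final int BIAS = 127;

    // Bit 31: the sign.
    static int sign(float x) {
        return Float.floatToIntBits(x) >>> 31;
    }

    // Bits 30-23: the exponent as stored (still biased).
    static int storedExponent(float x) {
        return (Float.floatToIntBits(x) >>> 23) & 0xFF;
    }

    // Bits 22-0: the fraction, without the hidden bit.
    static int fraction(float x) {
        return Float.floatToIntBits(x) & 0x7FFFFF;
    }

    // The fraction field as a 23-character binary string.
    static String fractionBits(float x) {
        String bits = Integer.toBinaryString(fraction(x));
        return String.format("%23s", bits).replace(' ', '0');
    }

    // Rebuild the value as (-1)^s * 1.f * 2^(e - b); normalised inputs only.
    static double rebuild(float x) {
        double significand = 1.0 + fraction(x) / (double) (1 << 23);
        double magnitude = significand * Math.pow(2, storedExponent(x) - BIAS);
        return sign(x) == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        float x = 3.5f;  // 11.1 in binary, i.e. 1.11 x 2^1
        System.out.println("s = " + sign(x));            // prints: s = 0
        System.out.println("e = " + storedExponent(x));  // prints: e = 128
        System.out.println("f = " + fractionBits(x));    // prints: f = 11000000000000000000000
        System.out.println(rebuild(x));                  // prints: 3.5
    }
}
```

Only the digits after the radix point (11 followed by zeros) appear in the stored fraction; the leading 1 is restored in rebuild by adding 1.0.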

Final IEEE 754 Floating-Point Representation
bit 31: sign (s) | bits 30–23: exponent (e) | bits 22–0: fraction (f)
(-1)^s × 1.f × 2^(e-b)
https://www.h-schmidt.net/FloatConverter/IEEE754.html

4.3: IEEE-754 Examples
https://www.h-schmidt.net/FloatConverter/IEEE754.html

Encode 1.25
1.25 = 1.01_2 (already normalised)
e = 0 + 127 = 127 = 01111111_2
f = 1.01_2 or .01_2 after removing the hidden bit
0 01111111 01000000000000000000000
0011 1111 1010 0000 0000 0000 0000 0000 = 0x3FA00000

Encode 10.75
10.75 = 1010.11_2 × 2^0 = 1.01011_2 × 2^3
e = 3 + 127 = 130 = 10000010_2
f = 1.01011_2 or .01011_2 after removing the hidden bit
0 10000010 01011000000000000000000
0100 0001 0010 1100 0000 0000 0000 0000 = 0x412C0000

Encode -0.125
-0.125 = -0.001_2 × 2^0 = -1.0_2 × 2^-3
e = -3 + 127 = 124 = 01111100_2
f = 1.0_2 or .0_2 after removing the hidden bit
1 01111100 00000000000000000000000
1011 1110 0000 0000 0000 0000 0000 0000 = 0xBE000000

Decode 0x414A0000
0 10000010 10010100000000000000000
s = 0 (positive)
e = 130 (2^(130-127) = 2^3)
f = 1.100101 (after adding the hidden bit)
+1.100101 × 2^3 = +1100.101 = +12.625

Special Values
Special bit patterns, e.g.
Value               sign  exponent  fraction
Zero (±)            0/1   00000000  00000000000000000000000
Infinity (±)        0/1   11111111  00000000000000000000000
Not a Number (NaN)  0/1   11111111  ??????????????????????? ( != 0 )

Single and Double Precision
32-Bit Single Precision (bias = 127)
bit 31: sign (s) | bits 30–23: exponent (e) | bits 22–0: fraction (f)
64-Bit Double Precision (bias = 1023)
bit 63: sign (s) | bits 62–52: exponent (e) | bits 51–0: fraction (f)

4.4: Floating Point addition
https://www.h-schmidt.net/FloatConverter/IEEE754.html

Floating Point Addition
We can add the fractions of two floating point values if their exponents are the same
If their exponents are not the same to begin with, shift the fraction of the value with the smaller exponent to compensate
1.01101 × 2^3 + 1.00110 × 2^-2
= 1.01101 × 2^3 + 0.0000100110 × 2^3
= 1.0111000110 × 2^3

  1.0110100000
+ 0.0000100110
  1.0111000110

Steps:
1. Compare the exponents of the two numbers; shift the fraction of the smaller number to the right until its exponent would match the larger exponent
2. Add the fractions
3. Normalize the result by either shifting right and incrementing the exponent or shifting left and decrementing the exponent
   Overflow / underflow? If so: exception!
4. Round the fraction to the appropriate number of bits
   Still normalised? If not, return to step 3

1.5 + 1.75 = 3.25
A = 0x3fc00000 (1.5): 0 01111111 10000000000000000000000 (s e f)
B = 0x3fe00000 (1.75): 0 01111111 11000000000000000000000 (s e f)
A: e = 01111111 (127-127 = 0, 2^0), f = 1.100000 (remember hidden bit!)
B: e = 01111111 (127-127 = 0, 2^0), f = 1.110000 (remember hidden bit!)
   1.100000 × 2^0   A
+  1.110000 × 2^0   B
  11.010000 × 2^0   Result (not normalised)
   1.1010000 × 2^1  Result (normalised)
0 10000000 10100000000000000000000 (encoding s e f)
0x40500000 (3.25)

1.5 + 10.5 = 12.0
A = 0x3fc00000 (1.5): 0 01111111 10000000000000000000000 (s e f)
B = 0x41280000 (10.5): 0 10000010 01010000000000000000000 (s e f)
A: e = 01111111 (127-127 = 0, 2^0), f = 1.1000000 (remember hidden bit!)
B: e = 10000010 (130-127 = 3, 2^3), f = 1.0101000 (remember hidden bit!)
   0.0011000 × 2^3  A (adjust fraction so exponents are equal)
+  1.0101000 × 2^3  B
   1.1000000 × 2^3  Result (already normalised)
0 10000010 10000000000000000000000 (encoding s e f)
0x41400000 (12.0)

Floating Point Addition
What about adding negative values (S==1)?
Proceed as before, but before adding, the fractions of values with S==1 should be converted to their 2’s complement
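The addition steps can be traced in code. The following is a simplified sketch, not a full IEEE 754 adder: it assumes both inputs are positive and normalised, it drops any bits shifted out during alignment instead of rounding them, and it does not detect overflow. The class name FloatAddSketch and its methods are hypothetical.

```java
class FloatAddSketch {
    // Shift right by n places; Java masks int shift counts to 5 bits,
    // so shifts of 32 or more are handled explicitly.
    static int shiftRight(int significand, int n) {
        return n > 31 ? 0 : significand >>> n;
    }

    // Adds two positive, normalised floats using integer operations only.
    static float add(float a, float b) {
        int bitsA = Float.floatToIntBits(a);
        int bitsB = Float.floatToIntBits(b);
        int expA = (bitsA >>> 23) & 0xFF;
        int expB = (bitsB >>> 23) & 0xFF;

        // Restore the hidden bit: 24-bit significands of the form 1.f
        int sigA = (bitsA & 0x7FFFFF) | 0x800000;
        int sigB = (bitsB & 0x7FFFFF) | 0x800000;

        // Step 1: align to the larger exponent by shifting the smaller value right.
        int exp = Math.max(expA, expB);
        sigA = shiftRight(sigA, exp - expA);
        sigB = shiftRight(sigB, exp - expB);

        // Step 2: add the fractions.
        int sum = sigA + sigB;

        // Step 3: normalise. Two positive values can only carry into bit 24,
        // so a single right shift is enough here.
        if ((sum & 0x1000000) != 0) {
            sum >>>= 1;   // the bit shifted out is dropped (no rounding)
            exp++;        // overflow of the exponent is not checked
        }

        // Remove the hidden bit again and reassemble s (0), e and f.
        return Float.intBitsToFloat((exp << 23) | (sum & 0x7FFFFF));
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(Float.floatToIntBits(add(1.5f, 1.75f))));
        // prints: 40500000
        System.out.println(Integer.toHexString(Float.floatToIntBits(add(1.5f, 10.5f))));
        // prints: 41400000
    }
}
```

Both results match the encodings worked out by hand in the two examples (3.25 and 12.0). Handling negative values would additionally need the 2’s complement step described above, which this sketch leaves out.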
