CSCI 2021: Binary Floating Point Numbers
Last Updated:
Wed Oct 6 01:22:12 PM CDT 2021
Logistics
Reading Bryant/O’Hallaron
▶ Ch 2.4-5: Floats, Wed/Fri
▶ 2021 Quick Guide to GDB
▶ Next week: Ch 3.1-7: Assembly Intro
Goals this Week
▶ Discuss Bitwise ops (Integer Slides)
▶ Floating Point layout
▶ gdb introduction
Feedback Survey
▶ Open on Canvas
▶ Anonymous: be honest!
▶ Due Fri 10/08 for 1 EP; 60% response rate so far
Labs/HW
▶ Lab05: Bit operations
▶ HW05: Bits, Floats, GDB
Projects
▶ P1: On time on Mon, Last Call on Wed
▶ P2: Release Thursday
Don’t Give Up, Stay Determined!
▶ If Project 1 / Exam 1 went awesome, count yourself lucky
▶ If things did not go well, Don’t Give Up
▶ Spend some time contemplating why things didn’t go well,
talk to course staff about it, learn from any mistakes
▶ There is a LOT of semester left and plenty of time to recover
from a bad start
Parts of a Fractional Number
The meaning of the “decimal point” is as follows:
123.406_10 = 1×10^2 + 2×10^1 + 3×10^0       (123 = 100 + 20 + 3)
           + 4×10^-1 + 0×10^-2 + 6×10^-3    (0.406 = 4/10 + 6/1000)
           = 123.406_10
Changing to base 2 induces a “binary point” with similar meaning:
110.101_2 = 1×2^2 + 1×2^1 + 0×2^0        (6 = 4 + 2)
          + 1×2^-1 + 0×2^-2 + 1×2^-3     (0.625 = 1/2 + 1/8)
          = 6.625_10
One could represent fractional numbers with a fixed point, e.g.
▶ 32 bit fractional number with
▶ 10 bits left of Binary Point (integer part)
▶ 22 bits right of Binary Point (fractional part)
BUT most applications require a more flexible scheme
Scientific Notation for Numbers
“Scientific” or “Engineering” notation for numbers with a fractional part is shown below
|----------+----------------+-------------------|
| Standard | Scientific     | printf("%.4e",x); |
|----------+----------------+-------------------|
| 123.456  | 1.23456 × 10^2 | 1.2346e+02        |
| 50.01    | 5.001 × 10^1   | 5.0010e+01        |
| 3.14159  | 3.14159 × 10^0 | 3.1416e+00        |
| 0.54321  | 5.4321 × 10^-1 | 5.4321e-01        |
| 0.00789  | 7.89 × 10^-3   | 7.8900e-03        |
|----------+----------------+-------------------|
▶ Always includes one non-zero digit left of decimal place
▶ Has some significant digits after the decimal place
▶ Multiplies by a power of 10 to get actual number
Binary Floating Point Layout Uses Scientific Convention
▶ Some bits for integer/fractional part
▶ Some bits for exponent part
▶ All in base 2: 1’s and 0’s, powers of 2
Conversion Example
The steps below convert a decimal number to its fractional binary equivalent, then adjust it to scientific representation.
float fl = -248.75;
            76543210 -1-2
-248.75 = -(128+64+32+16+8+0+0+0).(1/2+1/4)
        = -11111000.11 * 2^0
           76543210 12
        = -1111100.011 * 2^1
           6543210 123
        = -111110.0011 * 2^2
           543210 1234
        …
           MANTISSA      EXPONENT
        = -1.111100011 * 2^7
           0 123456789
Mantissa ≡ Significand ≡ Fractional Part
Principle and Practice of Binary Floating Point Numbers
▶ In early computing, computer manufacturers used similar principles for floating point numbers but varied specifics
▶ Example of Early float data/hardware
▶ Univac: 36 bits, 1-bit sign, 8-bit exponent, 27-bit significand [1]
▶ IBM: 32 bits, 1-bit sign, 7-bit exponent, 24-bit significand [2]
▶ Manufacturers implemented circuits with different rounding behavior, with/without infinity, and other inconsistencies
▶ Troublesome for reliability: code produced different results on different machines
▶ This was resolved with the adoption of the IEEE 754 Floating Point Standard which specifies
▶ Bit layout of 32-bit float and 64-bit double
▶ Rounding behavior, special values like Infinity
▶ Turing Award to William Kahan for his work on the standard
[1] Floating Point Arithmetic
[2] IBM Hexadecimal Floats
IEEE 754 Format: The Standard for Floating Point
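|-------------------------------------------+-------+--------|
| Property                                  | float | double |
|-------------------------------------------+-------+--------|
| Total bits                                | 32    | 64     |
| Bits for sign (1 neg / 0 pos)             | 1     | 1      |
| Bits for Exponent multiplier (power of 2) | 8     | 11     |
| Bits for Fractional part or mantissa      | 23    | 52     |
| Decimal digits of accuracy [3]            | 7.22  | 15.95  |
|-------------------------------------------+-------+--------|
▶ Most commonly implemented format for floating point numbers in hardware to do arithmetic: processor has physical circuits to add/mult/etc. for this bit layout of floats
▶ Numbers/Bit Patterns divided into three categories
|--------------+---------------------------------------+-----------|
| Category     | Description                           | Exponent  |
|--------------+---------------------------------------+-----------|
| Normalized   | most common like 1.0 and -9.56e37     | mixed 0/1 |
| Denormalized | very close to zero and 0.0            | all 0’s   |
| Special      | extreme/error values like Inf and NaN | all 1’s   |
|--------------+---------------------------------------+-----------|
[3] Wikipedia: IEEE 754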
Example float Layout of -248.75: float_examples.c
Source: IEEE-754 Tutorial, www.puntoflotante.net
Color: 8-bit blocks, Negative: highest bit, leading 1
Exponent: high 8 bits after the sign, 2^7 encoded with bias of -127
  1000_0110 - 0111_1111
= 128+4+2 - 127
= 134 - 127
= 7
Fractional/Mantissa portion is
1.111100011…
^ |||||||||
| explicit low 23 bits
|
implied leading 1
not in binary layout
Normalized Floating Point: General Case
▶ A “normalized” floating point number is in the standard range for float/double, bit layout follows previous slide
▶ Example: -248.75 = -1.111100011 * 2^7
Exponent is in Bias Form (not Two’s Complement)
▶ Unsigned positive integer minus constant bias number
▶ Consequence: exponent of 0 is not bitstring of 0’s
▶ Consequence: tiny exponents like -125 close to bitstring of 0’s; this makes resulting number close to 0
▶ 8-bit exponent 1000 0110 = 128+4+2 = 134 so exponent value is 134 – 127 = 7
Integer and Mantissa Parts
▶ The leading 1 before the binary point is implied so does not show up in the bit string
▶ Remaining fractional/mantissa portion shows up in the low-order bits
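Fixed Bit Standards for Floating Point
IEEE Standard Layouts
|--------+----------+-----------------+-------+----------------+----------------|
| Kind   | Sign Bit | Exponent Bits   | Bias  | Exp Range      | Mantissa Bits  |
|--------+----------+-----------------+-------+----------------+----------------|
| float  | 31 (1)   | 30-23 (8 bits)  | -127  | -126 to +127   | 22-0 (23 bits) |
| double | 63 (1)   | 62-52 (11 bits) | -1023 | -1022 to +1023 | 51-0 (52 bits) |
|--------+----------+-----------------+-------+----------------+----------------|
The standard allows hardware to be built that performs calculations on these numbers as efficiently as possible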
Consequences of Fixed Bits
▶ Since a fixed number of bits is used, some numbers cannot be exactly represented; this happens in any numbering system:
▶ Base 10 and Base 2 cannot represent 1/3 in finite digits
▶ Base 2 cannot represent 1/10 in finite digits
float f = 0.1;
printf("0.1 = %.20e\n",f);
0.1 = 1.00000001490116119385e-01
Try show_float.c to see this in action
Exercise: Quick Checks
1. What distinct parts are represented by bits in a floating point number (according to IEEE)
2. What is the “bias” of the exponent for 32-bit floats
3. Represent 7.125 in binary using “binary point” notation
4. Lay out 7.125 in IEEE-754 format
5. What does the number 1.0 look like as a float?
Source: IEEE-754 Tutorial, www.puntoflotante.net
The diagram above may help in recalling IEEE 754 layout
Special Cases: See float_examples.c
Denormalized values: Exponent bits all 0
▶ Fractional/Mantissa portion evaluates without implied leading one, still an unsigned integer though
▶ Exponent is Bias + 1: 2^-126 for float
▶ Result: very small numbers close to zero, smaller than any other representation, degrade uniformly to 0
▶ Zero: bit string of all 0s, optional leading 1 (negative zero)
Special Values
▶ Infinity: exponent bits all 1, fraction all 0, sign bit indicates +∞ or −∞
▶ Infinity results from overflow (magnitude too large, in either the positive or negative direction) or certain ops like float x = 1.0 / 0.0;
▶ #include
▶ NaN: not a number, exponent bits all 1, fraction has some 1s
▶ Errors in floating point like 0.0 / 0.0
Other Float Notes
Approximations and Roundings
▶ Approximate 2/3 with 4 digits, usually 0.6667 with standard rounding in base 10
▶ Similarly, some numbers cannot be exactly represented with a fixed number of bits: 1/10 is approximated
▶ IEEE 754 specifies various rounding modes to approximate numbers
▶ IEEE 754 allows floating point numbers to sort using signed integer sorting routines
▶ Bit patterns for float are ordered nearly the same as bit patterns for signed int
▶ Integer comparisons are usually fewer clock cycles than floating comparisons
Source: XKCD #217
Sidebar: The Weird and Wonderful Union
▶ Bitwise operations like & are not valid for float/double
▶ Can use pointers/casting to get around this OR…
▶ Use a union: somewhat unique construct to C
▶ Defined like a struct with several fields
▶ BUT fields occupy the same memory location (!?!)
▶ Allows one to treat a byte position as multiple different types, ex: int / float / char[]
▶ Memory size of the union is the max of its fields

// union.c
#include <stdio.h>

typedef union {   // shared memory
  float fl;       // a float
  int in;         // an int
  char ch[4];     // char array
} flint_t;        // 4 bytes total

int main(){
  flint_t flint;
  flint.in = 0xC378C000;
  printf("%.4f\n", flint.fl);
  printf("%08x %d\n",flint.in,flint.in);
  for(int i=0; i<4; i++){
    unsigned char c = flint.ch[i];
    printf("%d: %02x '%c'\n",i,c,c);
  }
  return 0;
}

|-------------------+-------+------|
| Symbol            | Mem   | Val  |
|-------------------+-------+------|
| flint.ch[3]       | #1027 | 0xC3 |
| flint.ch[2]       | #1026 | 0x78 |
| flint.ch[1]       | #1025 | 0xC0 |
| flint.in/fl/ch[0] | #1024 | 0x00 |
| i                 | #1020 | ?    |
|-------------------+-------+------|
Floating Point Operation Efficiencies
▶ Floating Point Operations per Second, FLOPS, is a major measure for numerical code/hardware efficiency
▶ Often used to benchmark and evaluate scientific computing resources (e.g. top supercomputers in the world)
▶ Tricky to evaluate because of
▶ A single FLOP (add/sub/mul/div) may take 3 clock cycles to finish: latency 3
▶ Another FLOP can start before the first one finishes: pipelined
▶ Enough FLOPs lined up can get average 1 FLOP per cycle
▶ FP Instructions may automatically operate on multiple FPs stored in memory to feed pipeline: vectorized ops
▶ Generally referred to as superscalar
▶ Processors schedule things out of order too
▶ All of this makes micro-evaluation error-prone and pointless
▶ Run a real application like an N-body simulation and compute
FLOPS = (number of floating ops done) / (time taken in seconds)
Top 5 Super Computers Worldwide, June 2021
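|------+---------------------------+-------------------------------------+------------+----------------+-----------------+------------|
| Rank | System                    | Processor / Platform                | #Cores     | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW) |
|------+---------------------------+-------------------------------------+------------+----------------+-----------------+------------|
| 1    | Fugaku, Japan / Fujitsu   | Fujitsu A64FX 2.2GHz (Arm)          | 7,630,848  | 442,010.0      | 537,212.0       | 29,899     |
| 2    | Summit, United States     | IBM POWER9 22C 3.07GHz (Power)      | 2,414,592  | 148,600.0      | 200,794.9       | 10,096     |
| 3    | Sierra, United States     | IBM POWER9 22C 3.1GHz (Power)       | 1,572,480  | 94,640.0       | 125,712.0       | 7,438      |
| 4    | Sunway TaihuLight, China  | Sunway SW26010 (custom RISC)        | 10,649,600 | 93,014.6       | 125,435.9       | 15,371     |
| 5    | Perlmutter, United States | AMD EPYC 2.45GHz, Cray (x86-64)     | 706,304    | 64,590.0       | 89,794.5        | 2,528      |
|------+---------------------------+-------------------------------------+------------+----------------+-----------------+------------|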
https://www.top500.org/lists/top500/2021/06/
Top 5 Super Computers Worldwide, Nov 2020
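|------+--------------------------+-------------------------------------+------------+----------------+-----------------+------------|
| Rank | System                   | Processor / Platform                | #Cores     | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW) |
|------+--------------------------+-------------------------------------+------------+----------------+-----------------+------------|
| 1    | Fugaku, Japan / Fujitsu  | Fujitsu A64FX 2.2GHz (Arm)          | 7,299,072  | 415,530.0      | 513,854.7       | 28,335     |
| 2    | Summit, United States    | IBM POWER9 22C 3.07GHz (Power)      | 2,397,824  | 143,500.0      | 200,794.9       | 10,096     |
| 3    | Sierra, United States    | IBM POWER9 22C 3.1GHz (Power)       | 1,572,480  | 94,640.0       | 125,712.0       | 7,438      |
| 4    | Sunway TaihuLight, China | Sunway SW26010 (custom RISC)        | 10,649,600 | 93,014.6       | 125,435.9       | 15,371     |
| 5    | Selene, USA, NVIDIA/AMD  | AMD EPYC 7742 64C 2.25GHz (x86-64)  | 555,520    | 63,460.0       | 79,215.0        | 2,646      |
|------+--------------------------+-------------------------------------+------------+----------------+-----------------+------------|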
https://www.top500.org/lists/top500/2020/11/
Top 5 Super Computers Worldwide, June 2020
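|------+--------------------------+--------------------------------+------------+----------------+-----------------+------------|
| Rank | System                   | Processor / Platform           | #Cores     | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW) |
|------+--------------------------+--------------------------------+------------+----------------+-----------------+------------|
| 1    | Fugaku, Japan / Fujitsu  | Fujitsu A64FX 2.2GHz (Arm)     | 7,299,072  | 415,530.0      | 513,854.7       | 28,335     |
| 2    | Summit, United States    | IBM POWER9 22C 3.07GHz (Power) | 2,397,824  | 143,500.0      | 200,794.9       | 10,096     |
| 3    | Sierra, United States    | IBM POWER9 22C 3.1GHz (Power)  | 1,572,480  | 94,640.0       | 125,712.0       | 7,438      |
| 4    | Sunway TaihuLight, China | Sunway SW26010 (custom RISC)   | 10,649,600 | 93,014.6       | 125,435.9       | 15,371     |
| 5    | Tianhe-2A, China         | Intel Xeon 2.2GHz (x86-64)     | 4,981,760  | 61,444.5       | 100,678.7       | 18,482     |
|------+--------------------------+--------------------------------+------------+----------------+-----------------+------------|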
https://www.top500.org/lists/top500/2020/06/
Top 5 Super Computers Worldwide, Nov 2019
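|------+--------------------------+--------------------------+------------+----------------+-----------------+------------|
| Rank | System                   | Processor / Platform     | #Cores     | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW) |
|------+--------------------------+--------------------------+------------+----------------+-----------------+------------|
| 1    | Summit, United States    | IBM POWER9 22C 3.07GHz   | 2,397,824  | 143,500.0      | 200,794.9       | 9,783      |
| 2    | Sierra, United States    | IBM POWER9 22C 3.1GHz    | 1,572,480  | 94,640.0       | 125,712.0       | 7,438      |
| 3    | Sunway TaihuLight, China | Sunway MPP               | 10,649,600 | 93,014.6       | 125,435.9       | 15,371     |
| 4    | Tianhe-2A, China         | Xeon 2.2GHz              | 4,981,760  | 61,444.5       | 100,678.7       | 18,482     |
| 5    | Frontera, United States  | Dell 6420, Xeons 2.7GHz  | 448,448    | 23,516.4       | 38,745.9        | ??         |
|------+--------------------------+--------------------------+------------+----------------+-----------------+------------|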
https://www.top500.org/list/2019/11/
Top 5 Super Computers Worldwide, Nov 2018
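|------+--------------------------+----------------------------+------------+----------------+-----------------+------------|
| Rank | System                   | Processor / Platform       | #Cores     | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW) |
|------+--------------------------+----------------------------+------------+----------------+-----------------+------------|
| 1    | Summit, United States    | IBM POWER9 22C 3.07GHz     | 2,397,824  | 143,500.0      | 200,794.9       | 9,783      |
| 2    | Sierra, United States    | IBM POWER9 22C 3.1GHz      | 1,572,480  | 94,640.0       | 125,712.0       | 7,438      |
| 3    | Sunway TaihuLight, China | Sunway MPP                 | 10,649,600 | 93,014.6       | 125,435.9       | 15,371     |
| 4    | Tianhe-2A, China         | TH-IVB-FEP Cluster         | 4,981,760  | 61,444.5       | 100,678.7       | 18,482     |
| 5    | Piz Daint, Switzerland   | Cray XC50, Xeon E5-2690v3  | 387,872    | 21,230.0       | 27,154.3        | 2,384      |
|------+--------------------------+----------------------------+------------+----------------+-----------------+------------|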
https://www.top500.org/list/2018/11/
Top 5 Super Computers Worldwide, Nov 2017
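|------+-------------------------------+----------------------------+------------+----------------+-----------------+------------|
| Rank | System                        | Processor / Platform       | #Cores     | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW) |
|------+-------------------------------+----------------------------+------------+----------------+-----------------+------------|
| 1    | Sunway TaihuLight, China      | Sunway MPP                 | 10,649,600 | 93,014.6       | 125,435.9       | 15,371     |
| 2    | Tianhe-2 (MilkyWay-2), China  | TH-IVB-FEP Cluster         | 3,120,000  | 33,862.7       | 54,902.4        | 17,808     |
| 3    | Piz Daint, Switzerland        | Cray XC50                  | 361,760    | 19,590.0       | 25,326.3        | 2,272      |
| 4    | Gyoukou, Japan                | ZettaScaler-2.2 HPC system | 19,860,000 | 19,135.8       | 28,192.0        | 1,350      |
| 5    | Titan, USA                    | Cray XK7                   | 560,640    | 17,590.0       | 27,112.5        | 8,209      |
|------+-------------------------------+----------------------------+------------+----------------+-----------------+------------|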
https://www.top500.org/lists/2017/11/