Computer Systems Organization
Sessions 2-4 Main Theme Bits, Bytes, and Integers Dr. Jean-Claude Franchitti
New York University
Computer Science Department Courant Institute of Mathematical Sciences
Presentation material partially based on textbook slides Ce Se: A Pae Peece
b Rada Ba ad Dad OHaa
Slides copyright © 2020
1
Agenda
1
2
3
Instructor and Course Introduction Bits, Bytes, and Integers
Summary and Conclusion
2
What is the class about?
Course description and syllabus:
» http://www.nyu.edu/classes/jcf/CSCI-UA.0201-005/
» http://www.cs.nyu.edu/courses/fall20/CSCI-UA.0205–001/ » Accessible via NYU Classes
Textbooks:
» Comper Ssems: A Programmers Perspecie
Randal Bryant and David OHallaron Pearson
ISBN-10: 013409266X, ISBN-13: 978-0134092669, 3rd Edition (03/2015) » Recommended:
» The C Programmning Language, Brian W Kernighan and Dennis Ritchie
Prentice Hall; ISBN-10: 0131103628; ISBN-13: 978-01311103627, 2nd Edition (04/88)
3
Icons / Metaphors
Information
Common Realization Knowledge/Competency Pattern Governance
Alignment
Solution Approach
4
4
Bits, Bytes, and Integer: Detailed Agenda
Representing information as bits
Bit-level manipulations
Integers
» Representation: unsigned and signed
» Conversion, casting
» Expanding, truncating
» Addition, negation, multiplication, shifting » Summary
Representations in memory, pointers, strings
5
Agenda
1
2
3
Instructor and Course Introduction Bits, Bytes, and Integers
Summary and Conclusion
6
Bits, Bytes, and Integers
Representing information as bits
Bit-level manipulations
Integers
» Representation: unsigned and signed
» Conversion, casting
» Expanding, truncating
» Addition, negation, multiplication, shifting » Summary
Representations in memory, pointers, strings
7
Data Representation
A computer is constructed using digital circuits with two states: on and off
You need to have the ability to translate between different representations to examine the content of the machine
Common number systems: binary, octal, decimal and hexadecimal
8
Bits and Bytes
Representing information as bits
How are bits manipulated?
Type of data: » Integers
» Floating points » others
9
O Fi Se H d e eee daa i a ce?
How do we represent data using electrical signals?
At the lowest level, a computer is an electronic machine
Easy to recognize two conditions:
» presence of a voltage we call this state 1 » absence of a voltage we call this state 0
10
Everything is Bits
Each bit is 0 or 1
By encoding/interpreting sets of bits in various ways
» Computers determine what to do (instructions)
» and represent and manipulate numbers, sets, strings, etc
Why bits? Electronic Implementation
» Easy to store with bistable elements
» Reliably transmitted on noisy and inaccurate wires
0
10
1.1V 0.9V
0.2V 0.0V
11
A Computer is a Binary Digital Machine
• Basic unit of information is the binary digit, or bit.
• Values with more than two states require multiple
bits.
A collection of two bits has four possible states:
00, 01, 10, 11
A collection of three bits has eight possible states: 000, 001, 010, 011, 100, 101, 110, 111
A collection of n bits has 2n possible states.
12
Everything is represented as Bits and Bytes
English: Display the sum of A times B plus C.
High-Level Language
Assembly Language
Operating System
Instruction Set Architecture
Microarchitecture
Digital Logic
Level 5
Level 4
Level 3
Level 2 Level 1 Level 0
C++: (level 5)
cout << (A * B + C);
Assembly Language: (level 4)
mov eax,A
mul B
add eax,C
call WriteInt
Intel Machine Language: (level 2)
A1 00000000
F7 25 00000004
03 05 00000008
E8 00500000
13
George Boole
(1815-1864)
English mathematician and
philosopher
Inventor of Boolean Algebra
Now we can use things like: AND, OR, NOT, XOR, XNOR, NAND, NOR, .
Source: http://history-computer.com/ModernComputer/thinkers/Boole.html
14
Claude Shannon
(19162001)
American mathematician and
electronic engineer
His work is the foundation for using switches (mainly transistors now), and hence binary numbers, to implement Boolean functions
Source: http://history-computer.com/ModernComputer/thinkers/Shannon.html
15
Sil eakig
So, we use transistors to implement logic gates.
Logic gates manipulate binary numbers to implement Boolean
functions.
Boolean functions solve problems.
. Simply Speaking
16
For example, can count in binary
Base 2 Number Representation
» Represent 1521310 as 111011011011012
» Represent 1.2010 as 1.0011001100110011[0011]2
» Represent 1.5213 X 104 as 1.11011011011012 X 213
17
Encoding Byte Values
Byte = 8 bits
» Binary 000000002 to 111111112 » Decimal: 010 to 25510
» Hexadecimal 0016 to FF16
• Base 16 number representation
• Use characters 0 to 9 and A to F
• Write FA1D37B16 in C as 0xFA1D37B
0xfa1d37b
0
0
0000
1
1
0001
2
2
0010
3
3
0011
4
4
0100
5
5
0101
6
6
0110
7
7
0111
8
8
1000
9
9
1001
A
10
1010
B
11
1011
C
12
1100
D
13
1101
E
14
1110
F
15
1111
18
Example Data Representations
C Data Type
char
short
int
long
float double long double pointer
Typical 32-bit 1
2
4
4
4
8
4
Typical 64-bit 1
2
4
8
4
8
x86-64 1
2
4
8
4
8 10/16 8
I
8
19
Bits, Bytes, and Integers
Representing information as bits
Bit-level manipulations
Integers
» Representation: unsigned and signed
» Conversion, casting
» Expanding, truncating
» Addition, negation, multiplication, shifting » Summary
Representations in memory, pointers, strings
20
Boolean Algebra (and Gates): How to Manipulate Bits
Developed by George Boole in 19th Century » Algebraic representation of logic
• Encode True as 1 and False as 0
And Or
A&B = 1 when both A=1 and B=1 A|B = 1 when either A=1 or B=1
Not Exclusive-Or (Xor)
~A = 1 when A=0 A^B = 1 when either A=1 or B=1, but not both
AND
OR
NOT
21
Boolean Algebra: How to Manipulate Bits (continued)
NAND
The reverse of AND
ABA&B
001 011 101 110
NOR
The reverse of OR
Exclusive-NOR (Xor)
The reverse of XOR
AB A^B
001 010 100 111
AB A|B
001 010 100 110
22
Many Possible Implementations for Gates & Switches
23
Many Possible Implementations for Gates & Switches (continued)
Fluid Switch (http://www.cs.princeton.edu/introcs/lectures/fluid-computer.swf)
24
General Boolean Algebras
Operate on Bit Vectors
» Operations applied bitwise on bit vectors (e.g.
an integer is a bit vector of 4 bytes = 32 bits)
01101001 01101001 01101001
& 01010101 | 01010101 ^ 01010101 ~ 01010101
01000001 01111101 00111100 10101010
01000001 01111101 00111100 10101010
All of the Properties of Boolean Algebra Apply
25
Example: Representing & Manipulating Sets
Representation
» Width w bit vector represents subsets of A {0, , w1} » aj = 1 if j ∈ A
• 01101001 { 0, 3, 5, 6 }
76543210
• 01010101 { 0, 2, 4, 6 }
76543210
Operations on the binary numbers above:
» &
» |
» ^
» ~
Intersection 01000001 Union 01111101 Symmetric difference 00111100 Complement 10101010
{ 0, 6 }
{ 0, 2, 3, 4, 5, 6 } { 2, 3, 4, 5 }
{ 1, 3, 5, 7 }
26
Bit-Level Operations in C
Operations &, |, ~, ^ Available in C » Apply to any integral data type
• long, int, short, char, unsigned
» View arguments as bit vectors » Arguments applied bit-wise
Examples (Char data type)
» ~0x41 = 0xBE
• ~010000012 = 101111102
» ~0x00 = 0xFF
• ~000000002 = 111111112
» 0x69 & 0x55 = 0x41
• 011010012 & 010101012 = 010000012
» 0x69 | 0x55 = 0x7D
• 011010012 | 010101012 = 011111012
27
Contrast: Logic Operations in C
Contrast to Logical Operators
» &&, ||, !
• View 0 as False
• Anything nonzero as True • Always return 0 or 1
• Early termination
Examples (char data type)
» !0x41 == 0x00 » !0x00 == 0x01 » !!0x41 == 0x01
» 0x69 && 0x55 == 0x01 » 0x69 || 0x55 == 0x01
T ANDT TAND T
» p && *p (avoids null pointer access)
28
Contrast: Logic Operations in C
Contrast to Logical Operators
» &&, ||, !
• View 0 as False
• Anything nonzero as True
Watch ot for && s. & (and s. )
• Always return 0 or 1
one of the more common oopsies in • Early termination C programming
Examples (char data type)
» !0x41 == 0x00 » !0x00 == 0x01 » !!0x41 == 0x01
» 0x69 && 0x55 == 0x01 » 0x69 || 0x55 == 0x01
» p && *p (avoids null pointer access)
29
Bits, Bytes, and Integers
Representing information as bits
Bit-level manipulations
Integers
» Representation: unsigned and signed » Conversion, casting
» Expanding, truncating
» Addition, negation, multiplication, shifting » Summary
Representations in memory, pointers, strings
Summary
30
Two Types of Integers
Unsigned
» positive numbers and 0
Signed numbers
» negative numbers as well as positive numbers and 0
31
Encoding Integers
Unsigned w1 B2U(X) xi2i
Two’s Complement w2 B2T(X) xw12w1+xi2i
i0
short int x = 15213;
short int y = -15213; C short 2 bytes long
i0
Sign Bit
Decimal Hex
Binary
x
y
Sign Bit
15213 3B 6D 00111011 01101101
-15213 C4 93 11000100 10010011 I
Inversethebitsand 1 I
» For 2s complement, most significant bit indicates sign • 0 for non-negative
• 1 for negative
32
Unsigned Binary Arithmetic
Base-2 addition just like base-10 » add from right to left, propagating carry
carry
10010 10010 +1001 +1011 11011 11101
10111 + 111
1111 + 1 10000
33
What About Negative Numbers?
People have tried several options:
Sign Magnitude:
000 = +0 Iftthas I
001 = +1 thennegative
010 = +2 011 = +3 100 = -0 101 = -1 110 = -2 111 = -3
One's Complement
000 = +0Compotament
001 = +1 arenegative 010 = +2
011 = +3 100 = -3 101 = -2 110 = -1 111 = -0
Two's Complement
000 = +0 001 = +1 010 = +2 011 = +3 100 = -4 101 = -3 110 = -2 111 = -1
Add 2number 0 • Issues: balance, number of zeros, ease of operations
• Which one is best? Why?
34
Signed Integers
With n bits, we have 2n distinct values.
» assign about half to positive integers and about
half to negative
Positive integers
» just like unsigned: zero in most significant (MS) bit
00101 = 5
Negative integers
» In twos complement form
In general:
» a 0 at the MS bit indicates positive » and a 1 indicates negative
35
T Clee
T cmlemen representation developed to make circuits easy for arithmetic.
» for each positive number (X), assign value to its negative (-X),
such that X + (-X) = 0 with normal addition, ignoring carry out
00101 (5) +11011 (-5)
00000 (0)
01001 (9) +10111 (-9)
00000 (0)
36
T Clee Siged Iege
MS bit is sign bit.
Range of an n-bit number: -2n-1 through 2n-1 1.
» The most negative number (-2n-1) has no positive counterpart.
-23 22 21 20 00000 00011 00102 00113 01004 01015 01106 01117
-23 22 21 20 1000-8 1001-7 1010-6 1011-5 1100-4 1101-3 1110-2 1111-1
37
Ceig Bia (2 C) Decial
1. If MS bit is one (i.e. number is negative), take twos complement to get a positive number.
2. Get the decimal as if the number is unsigned (using power of 2s).
3. If original number was negative, add a minus sign.
n 2n 01 12 24 38
4 16 5 32 6 64 7 128 8 256 9 512
10 1024
38
T Clee Ecdig Eale
positive543 2 I 0
X = 00100111
A
two
= 25+22+21+20 = 32+4+2+1
n 2n 01 12 24 38
4 16 5 32 6 64 7 128 8 256 9 512
10 1024
= 39ten
negative
X = 11100110
D 432 10two
-X = 00011010
= 24+23+21 = 16+8+2
=26
X = -26ten
ten
39
Two-Complement Encoding Example (continued)
x = 15213: 00111011 01101101 y = -15213: 11000100 10010011
Weight
15213
-15213
1 2 4 8
16 32 64
128 256 512
1024
2048
4096
8192
16384 -32768
11 00 14 18 00 1 32 1 64 00 1 256 1 512 00 1 2048 1 4096 1 8192 00 00
11 12 00 00 1 16 00 00 1 128 00 00 1 1024 00 00 00 1 16384 1 -32768
Sum 15213 -15213
40
Shift Operations
Binary
Decimal
Left Shift: x << y
» Shift x left by y positions
• Throw away extra bits on left
• Fill with 0s on right
Right Shift: x >> y
» Shift x right y positions
• Throw away extra bits on right
» type1: Logical shift • Fill with 0s on left
» type 2: Arithmetic shift (covered later)
• Replicate most significant bit on right
Undefined Behavior
» Shift amount < 0 or size of x
Argument x
01100010
<< 3
00010000
Log. >> 2
00011000
Arith. >> 2
00011000
Argument x
10100010
<< 3
00010000
Log. >> 2
00101000
Arith. >> 2
11101000
41
Numeric Ranges
Unsigned Values
»UMin =0 0000
»UMax = 2w1 1111
Twos Complement Values
Values for W = 16
» TMin = 1000
» TMax = 0111
Other Values » Minus 1
1111
2w1 2w1 1
Decimal
Hex
Binary
UMax
65535
FF FF
11111111 11111111
TMax
i 32767 -32768
7F FF
01111111 11111111
TMin
80 00
10000000 00000000
-1
-1
FF FF
11111111 11111111
0
0
00 00
00000000 00000000
42
Values for Different Word Sizes
W 81632 64
UMax 255 TMax 127 TMin -128
Observations » |TMin|
65,535
32,767 -32,768
4,294,967,295
2,147,483,647 -2,147,483,648
18,446,744,073,709,551,615 9,223,372,036,854,775,807 -9,223,372,036,854,775,808
=
• Asymmetric range
» UMax =
TMax+1
2 * TMax + 1
C Programming
#include
ULONG_MAX LONG_MAX LONG_MIN
Values platform specific
43
Range of Signed Integers
The highest bit is reserved for the sign, which limits the range
44
Unsigned & Signed Numeric Values
X B2U(X) B2T(X)
0000 0 0
0001 1 1
0010 2 2
0011 3 3
0100 4 4
0101 5 5
0110 6 6
0111 7 7
1000 8 –8
1001 9 –7
1010 10 –6
1011 11 –5
1100 12 –4
1101 13 –3
1110 14 –2
1111 15 –1
Equivalence
» Same encodings for
nonnegative values
Uniqueness
» Every bit pattern represents
unique integer value
» Each representable integer has unique bit encoding
Can Invert Mappings
» U2B(x) = B2U-1(x)
• Bit pattern for unsigned integer
» T2B(x) = B2T-1(x)
• Bit pattern for twos comp integer
45
Bits, Bytes, and Integers
Representing information as bits
Bit-level manipulations
Integers
» Representation: unsigned and signed
» Conversion, casting
» Expanding, truncating
» Addition, negation, multiplication, shifting » Summary
Representations in memory, pointers, strings
46
Mapping Between Signed & Unsigned
To Comlemen
Unsigned
T2U
x X ux
Maintain Same Bit Pattern
Unsigned U2T To Comlemen
ux X x
Maintain Same Bit Pattern
Mappings between unsigned and twos complement numbers:
Keep bit representations and reinterpret
T2B
B2U
U2B
B2T
47
Mapping Signed Unsigned
Bits
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Signed
0
1
2
3
4
5
6
7
-8
-7
-6
-5
-4
-3
-2
-1
Unsigned
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Two’sComplament T2U
U2T
48
Mapping Signed Unsigned
Bits
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Signed
0
1
2
3
4
5
6
7
-8
-7
-6
-5
-4
-3
-2
-1
Unsigned
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
=
+/- 16
49
Relation between Signed & Unsigned
To Comlemen
Unsigned
T2U
x X ux
Maintain Same Bit Pattern
T2B
B2U
w–1 0 ux
x
Large negative weight
becomes
Large positive weight
+
+
+
•••
+
+
+
–
+
+
•••
+
+
+
50
Conversion Visualized
2s Comp. Unsigned » Ordering Inversion
» Negative Big Positive
UMax UMax – 1
’s Complement Range
00 –1
–2
TMax + 1 TMax TMax
Unsigned Range
TMin
51
Signed vs. Unsigned in C
Constants
» By default are considered to be signed integers
» Unsigned if have U as suffix 0U, 4294967259U
Casting (i.e., changing the type of a variable)
» Explicit casting between signed & unsigned same as U2T and T2U int tx, ty;
unsigned ux, uy;
tx = (int) ux;
uy = (unsigned) ty;
» Implicit casting also occurs via assignments and procedure calls tx = ux;
uy = ty;
52
General Rule for Casting: Signed <-> Unsigned
Follow these two steps:
1. Keep the bit presentation 2. Re-interpret
Effect:
Numerical value may change. Bit pattern stays the same.
53
Casting Surprises
Expression Evaluation
» If there is a mix of unsigned and signed in single expression,
signed values implicitly cast to unsigned
» Including comparison operations <, >, ==, <=, >=
» Examples for W = 32: Constant1
00
-1 -1
-1 -1 21474836241747483647 2147483624174U7483647U -1 -1 (unsigned)-1
TMIN = -2,147,483,648 ,
Constant2 0U0U
00
0U0U
-214-72418437644873-6148
-21-427148734684376-418
-2 -2
-2
TMAX = 2,147,483,647
(unsigned) -1
-2
Relation ==
< >
> <
> >
< >
Evaluation
unsigned signed unsigned signed unsigned signed unsigned unsigned signed
2147483647
2147483647
2147483647
2147483647
2147483648U
2147483648U
(int) 2147483648U (int) 2147483648U
54
Casting Surprises (continued)
If there is an expression that has many types, the compiler follows these rules
#include
int i = -7; unsigned j = 5;
Condition is TRUE!
7 converttounsign
if( i > j )
printf(Surprise!\n”);
}
Unsign positive to sign negative
positive to
55
Sa: Caig Siged Uiged: Baic Rle
Bit pattern is maintained
But reinterpreted
Can have unexpected effects: adding or subtracting 2w
Expression containing signed and unsigned int
» int is cast to unsigned!!
56
Bits, Bytes, and Integers
Representing information as bits
Bit-level manipulations
Integers
» Representation: unsigned and signed
» Conversion, casting
» Expanding, truncating
» Addition, negation, multiplication, shifting » Summary
Representations in memory, pointers, strings
57
Sign Extension
Task:
» Given w-bit signed integer x
» Convert it to w+k-bit integer with same value
Rule:
» Convert signed: Make k copies of sign bit: » X= xw1,,xw1,xw1,xw2,,x0
k copies of MSB » Convert unsigned:
• pad k 0 bits in front
X
w
X
•••
•••
•••
kw
•••
58
Sign Extension Example
short int x = 15213
int ix = (int) x;
short int y = -15213;
int iy = (int) y;
Decimal
Hex
Binary
x
15213
3B 6D
positive to
00111011 01101101 00000000 00000000 00111011 01101101
ix
15213
00 00 3B 6D
y
-15213
C4 93
11000100 10010011
11111111 11111111 11000100 10010011
negative11
iy
-15213
FF FF C4 93
Converting from smaller to larger integer data type C automatically performs sign extension
59
Summary: Expanding, Truncating: Basic Rules
Expanding (e.g., short int to int) » Unsigned: zeros added
» Signed: sign extension
» Both yield expected result
Truncating (e.g., unsigned to unsigned short) » Unsigned/signed: bits are truncated
» Result reinterpreted
» Unsigned: mod operation
» Signed: similar to mod
» For small numbers yields expected behavior
» Example: from int to short (i.e. from 32-bit to 16-bit)
• High-order bits are truncated
• Value is alteredmust reinterpret
• This non-intuitive behavior can lead to buggy code!So dont do it!
60
Today: Bits, Bytes, and Integers
Representing information as bits
Bit-level manipulations
Integers
» Representation: unsigned and signed
» Conversion, casting
» Expanding, truncating
» Addition, negation, multiplication, shifting
Representations in memory, pointers, strings
Summary
61
Negation: Complement & Increment
The complement of x satisfies TwosComp(x) + x = 0
TwosComp(x) = ~x + 1
Proof sketch
» Observation: ~x + x == 1111111 == -1 ~x + x + 1 = 0
(~x + 1) + x = 0
TwosComp(x) + x = 0
x + ~x
-1
10011101 01100010
1111 11111111
62
Unsigned Addition
Operands: w bits True Sum: w+1 bits
Discard Carry: w bits
u ••• +v ••• u+v •••
UAddw(u , v) • • •
Standard Addition Function » Ignores carry output
Implements Modular Arithmetic
s = UAddw(u,v) = u+v mod2w
Hardware Rules for addition/subtraction
• The hardware must work with two operands of the same length.
• The hardware produces a result of the same length as the operands.
• The hardware does not differentiate between signed and unsigned.
63
Unsigned Subtraction
When subtracting A B, convert B to its two’s complement
Add A to (B)
0 1 0 10 0 1 0 10
0 1 0 1 1 1 0 1 00 1 1 1 11
Adanage for 2 complemen: No two 0s
Sign bit
Remove the need for separate circuits for add
and sub
64
Unsigned Subtraction Examples
1000101 0000101 —
0000101
_________
1000000
1000101
_________
0111010
???
65
Visualizing (Mathematical) Integer Addition
Integer Addition
»4-bit integers u, v
»Compute true sum Add4(u , v)
»Values increase linearly with u and v
»Forms planar surface
Add4(u , v)
Integer Addition
32 28 24
20 16 12
8 4 0
14
v
12
10 8
6 4
0246 2
u81012 0 14
66
Visualizing Unsigned Addition
Wraps Around »If true sum 2w » At most once
True Sum
2w+1
Overflow
Overflow
UAdd4(u , v)
16 14 12
10 8
2w 0
14 6 12
4
10
Modular Sum
28v 6
0
0246 2
u 81012 0 14
4
67
T Clee Addii
Operands: w bits u
•••
•••
•••
True Sum: w+1 bits Discard Carry: w bits
+v u+v
TAddw(u , v)
TAdd and UAdd have Identical Bit-Level Behavior
» Signed vs. unsigned addition in C: int s, t, u, v;
s = (int) ((unsigned) u + (unsigned) v); t=u+v
» Willgive s==t
If sum 2w1, becomes negative (positive overflow) If sum < 2w1, becomes positive (negative overflow)
•••
68
TAdd Overflow
Functionality
» True sum
requires w+1 bits
» Drop off MSB
» Treat remaining bits as 2s comp. integer
0 ... 0 ... 0 ... 1 ... 1 ...
True Sum
2w–1 2w –1–1
0 –2w –1
–2w
PosOver
TAdd Result
... ... ...
NegOver
69
Vialiig 2 Clee Addii
NegOver
Values
» 4-bit twos comp.
» Range from -8 to +7
Wraps Around
» If sum 2w1
• Becomes negative
• At most once
» If sum < 2w1
• Becomes positive
• At most once
TAdd4(u , v)
8 6 4
2 0
6 -2 4
-4 -6 -8
2 0
-2 -4
PosOver
-8 -6 -4 -2 -6
v
02 -8
u4
6
70
Multiplication
Goal: Computing Product of w-bit numbers x, y » Either signed or unsigned
But, exact results can be bigger than w bits
» Unsigned: up to 2w bits
• Result range: 0 x * y (2w 1) 2 = 22w 2w+1 + 1
» Twos complement min (negative): Up to 2w-1 bits
• Result range: x * y (2w1)*(2w11) = 22w2 + 2w1
» Twos complement max (positive): Up to 2w bits, but only for (TMinw)2
• Result range: x * y (2w1) 2 = 22w2 So, maintaining exact results
» would need to keep expanding word size with each product computed
» is done in software, if needed
• e.g., by arbitrary precision arithmetic packages
71
Unsigned Multiplication in C
Operands: w bits
True Product: 2*w bits
*
••• •••
u
v
•••
u · v
•••
•••
UMultw(u , v) Standard Multiplication Function
» Ignores high order w bits
Implements Modular Arithmetic UMultw(u,v) = u ·v mod2w
Discard w bits: w bits
72
Signed Multiplication in C
u
v
TMultw(u , v)
Standard Multiplication Function
» Ignores high order w bits
» Some of which are different for signed vs. unsigned multiplication
» Lower bits are the same
•••
•••
•••
Operands: w bits
True Product: 2*w bits
*
••• •••
u·v Discard w bits: w bits
73
Power-of-2 Multiply with Shift
Operation
» u << k gives u * 2k
» Both signed and unsigned
k
u •••
Operands: w bits TrueProduct:w+k bits
Discard k bits: w bits
Examples »u<<3
*
2k
0 ••• 010 ••• 00
u·2k
UMultw(u , 2k)
TMultw(u , 2k)
••• •••
0 ••• 0 0
== u*8 »(u<<5)(u<<3)== u*24
UX 23
0 •••
0 0
4 23 » Most machines shift and add faster than multiply
4 25
• Compiler generates this code automatically
74
Unsigned Power-of-2 Divide with Shift
Quotient of Unsigned by Power of 2 »u>>kgives u/ 2k
» Uses logical shift
Operands:
Division: Result:
u •••
Binary Point
•••
k
•••
0 ••• 010 ••• 00
/ 2k u / 2k
••• .
0 ••• 0 0
u/2k 0 ••• 0 0 •••
Division Computed Hex
Binary
x
x >> 1
x >> 4
x >> 8 59.4257813 59
3B 6D 00111011 01101101 1D B6 00011101 10110110 03 B6 00000011 10110110 00 3B 00000000 00111011
15213 15213 7606.5 7606 950.8125 950
75
Bits, Bytes, and Integers
Representing information as bits
Bit-level manipulations
Integers
» Representation: unsigned and signed
» Conversion, casting
» Expanding, truncating
» Addition, negation, multiplication, shifting » Summary
Representations in memory, pointers, strings
76
Arithmetic: Basic Rules
Addition:
» Unsigned/signed: Normal addition followed by truncate,
same operation on bit level
» Unsigned: addition mod 2w
• Mathematical addition + possible subtraction of 2w
» Signed: modified addition mod 2w (result in proper range)
• Mathematical addition + possible addition or subtraction of 2w
Multiplication:
» Unsigned/signed: Normal multiplication followed by truncate,
same operation on bit level
» Unsigned: multiplication mod 2w
» Signed: modified multiplication mod 2w (result in proper range)
77
Why Should I Use Unsigned?
Dn use without understanding implications
» Easy to make mistakes unsigned i;
for (i = cnt-2; i >= 0; i–)
a[i] += a[i+1];
» Can be very subtle
#define DELTA sizeof(int)
int i;
for (i = CNT; i-DELTA >= 0; i-= DELTA)
.. .
78
Counting Down with Unsigned
Proper way to use unsigned as loop index unsigned i;
for (i = cnt-2; i < cnt; i--)
a[i] += a[i+1];
See Robert Seacord, Secure Coding in C and C++
» C Standard guarantees that unsigned addition will behave like
modular arithmetic • 01UMax
Even better size_t i;
for (i = cnt-2; i < cnt; i--)
a[i] += a[i+1];
» Data type size_t defined as unsigned value with length = word size
» Code will work even if cnt = UMax
» What if cnt is signed and < 0?
79
Why Should I Use Unsigned? (cont.)
Do Use When Performing Modular Arithmetic » Multiprecision arithmetic
Do Use When Using Bits to Represent Sets » Logical right shift, no sign extension
80
Today: Bits, Bytes, and Integers
Representing information as bits
Bit-level manipulations
Integers
» Representation: unsigned and signed
» Conversion, casting
» Expanding, truncating
» Addition, negation, multiplication, shifting » Summary
Representations in memory, pointers, strings
81
Byte-Oriented Memory Organization
•••
Programs refer to data by address
» Conceptually, envision it as a very large array of bytes • In reality, its not, but can think of it that way
» An address is like an index into that array • and, a pointer variable stores an address
Note: system provides private address spaces to each process » Think of a process as a program being executed
» So, a program can clobber its own data, but not that of others
82
Machine Words
Any given computer has a Word Size » Nominal size of integer-valued data
• and of addresses
» Until recently, most machines used 32 bits (4 bytes) as word size • Limits addresses to 4GB (232 bytes)
» Increasingly, machines have 64-bit word size
• Potentially, could have 18 EB (exabytes) of addressable memory
• Thats 18.4 X 1018
» Machines still support multiple data formats
• Fractions or multiples of word size
• Always integral number of bytes
83
Word-Oriented Memory Organization
Addresses Specify Byte Locations
» Address of first byte in word
» Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)
32-bit 64-bit Words Words
Bytes Addr.
0000
0001
0002
0003
0004
0005
0006
0007
0008
0009
0010
0011
0012
0013
0014
0015
Addr =
0000 ??
Addr =
0008 ??
Addr =
0000 ??
Addr =
0004 ??
Addr =
0008 ??
Addr =
0012 ??
84
Example Data Representations
C Data Type
Typical 32-bit
Typical 64-bit
x86-64
char
1
1
1
short
2
2
2
int
4
4
4
long
O
4
O
8
8
float
4
4
4
double
8
8
8
long double
10/16
pointer
4
8
8
85
Byte Ordering
So, how are the bytes within a multi-byte word ordered in memory?
Conventions
» Big Endian: Sun, PPC Mac, Internet
• Least significant byte has highest address
» Little Endian: x86, ARM processors running Android, iOS, and Windows
• Least significant byte has lowest address
86
Byte Ordering Example
Example
» Variable x has 4-byte value of 0x01234567 » Address given by &x is 0x100
Big Endian Little Endian
0x100 0x101 0x102 0x103
0x100 0x101 0x102 0x103
01
23
45
67
67
45
23
01
87
Representing Integers
int A = 15213;
Littleendian BigEndian IA32, x86-64 Sun
long int C = 15213;
IA32 x86-64 Sun
00
00
3B
6D
6D
6D
3B
3B
00
00
00
00
00
00
00
00
6D
3B
00
00
00
00
3B
6D
int B = -15213;
IA32, x86-64 Sun
93
C4
FF
FF
FF
FF
C4
93
Twos complement representation
88
Decimal: 15213
Binary: 0011 1011 0110 1101 Hex: 3B6D
Examining Data Representations
Code to Print Byte Representation of Data
» Casting pointer to unsigned char * allows treatment as a byte array
Printf directives:
%p: Print pointer
%x: Print Hexadecimal
typedef unsigned char *pointer;
void show_bytes(pointer start, size_t len){
size_t i;
for (i = 0; i < len; i++)
printf(%p\t0x%.2x\n",start+i, start[i]); printf("\n");
}
89
show_bytes Execution Example
int a = 15213;
printf("int a = 15213;\n");
show_bytes((pointer) &a, sizeof(int));
Result (Linux x86-64):
int a = 15213; 0x7fffb7f71dbc 6d 0x7fffb7f71dbd 3b 0x7fffb7f71dbe 00 0x7fffb7f71dbf 00
90
Representing Pointers
int B = -15213;
int *P = &B;
Sun IA32 x86-64
EF
FF
FB
2C
AC
28
F5
FF
3C
1B
FE
82
FD
7F
00
00
Different compilers & machines assign different locations to objects Even get different results each time run program
91
Representing Strings
Strings in C
» Represented by array of characters
» Each character encoded in ASCII format
• Standard 7-bit encoding of character set
• Character 0 has code 0x30 digit0 has 0 30 Digit i has code 0x30+i HUH 0 has 0 00
» String should be null-terminated • Final character = 0
Compatibility
» Byte ordering not an issue
IA32 Sun
31
31
38
38
32
32
31
31
33
33
00
00
92
char S[6] = "18213";
Integer C Puzzles
• x<0
• ux>=0
• x&7==7
• ux>-1
• x > y
• x*x>=0
• x>0&&y>0 -> x+y>0
-> ((x*2)<0)
unsigned 70
Initialization
• x>=0
• x<=0
• (x|-x)>>31 == -1 • ux>>3==ux/8 • x>>3==x/8
• x&(x-1)!=0
-> -x<=0 -> -x>=0
-> (x<<30)<0 -> -x < -y
int x = foo();
int y = bar();
unsigned ux = x;
unsigned uy = y;
93
Application of Boolean Algebra
Applied to Digital Systems by Claude Shannon
» 1937 MIT Masters Thesis
» Reason about networks of relay switches • Encode closed switch as 1, open switch as 0
A&~B
A ~B
~A B
~A&B
Connection when A&~B | ~A&B
= A^B
94
Binary Number Property
Claim
w = 0: » 1 = 20
... w-1 = 2w
Assume true for w-1:
» 1 + 1 + 2 + 4 + 8 + + 2w-1 + 2w =
= 2w
2w + 2w
= 2w+1
w-1 1+ å2i
i=0
= 2w
95
Code Security Example
/* Kernel memory region holding user-accessible data */ #define KSIZE 1024
char kbuf[KSIZE];
/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
/* Byte count len is minimum of buffer size and maxlen */ int len = KSIZE < maxlen ? KSIZE : maxlen; memcpy(user_dest, kbuf, len);
return len;
}
Similar to code found in FreeBSDs implementation of getpeername
There are legions of smart people trying to find vulnerabilities in programs
96
Typical Usage
/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];
/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
/* Byte count len is minimum of buffer size and maxlen */
int len = KSIZE < maxlen ? KSIZE : maxlen;
memcpy(user_dest, kbuf, len);
return len;
}
#define MSIZE 528
void getstuff() {
char mybuf[MSIZE]; copy_from_kernel(mybuf, MSIZE); printf(%s\n, mbuf);
}
97
Malicious Usage
/* Declaration of library function memcpy */
void *memcpy(void *dest, void *src, size_t n);
/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];
/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
/* Byte count len is minimum of buffer size and maxlen */ int len = KSIZE < maxlen ? KSIZE : maxlen; memcpy(user_dest, kbuf, len);
return len;
}
#define MSIZE 528
void getstuff() {
char mybuf[MSIZE];
copy_from_kernel(mybuf, -MSIZE);
not0k
. .. }
run
98
Mathematical Properties
Modular Addition Forms an Abelian Group » Closed under addition
0 UAddw(u,v) 2w 1 » Commutative
UAddw(u , v) = UAddw(v , u) » Associative
UAddw(t, UAddw(u , v)) = UAddw(UAddw(t, u ), v) » 0 is additive identity
UAddw(u , 0) = u
» Every element has additive inverse
• Let UCompw(u) =2w u UAddw(u , UCompw (u )) = 0
99
Mathematical Properties of TAdd
Isomorphic Group to unsigneds with UAdd » TAddw(u , v) = U2T(UAddw(T2U(u ), T2U(v)))
• Since both have identical bit patterns
Twos Complement Under TAdd Forms a Group » Closed, Commutative, Associative, 0 is additive identity » Every element has additive inverse
TCompw (u) u u TMinw
TMin u TMin ww
100
Characterizing TAdd
Tf
Functionality
» True sum requires w+1
bits
» Drop off MSB
» Treat remaining bits as 2s comp. integer
Positive Overflow
TAdd(u , v) > 0
v
u + v + 2w 1 2w
TAddw(u,v) u+v
w1 u+v22w
u + v TMin
TMaxw u+v (PosOver)
Negative Overflow
<0
<0
> 0
u
w (NegOver) TMinw u+vTMaxw
101
Negation: Complement & Increment
Claim: Following Holds for 2s Complement ~x + 1 == -x NOTX 1I X
Complement
» Observation: ~x + x == 1111111 == -1
x + ~x
-1
1
0
0
1
1
1
0
1
0
1
1
0
0
0
1
0
1
1
1
1
1
1
1
1
Complete Proof?
102
Complement & Increment Examples
x = 15213
Decimal
Hex
Binary
x
15213
3B 6D
00111011 01101101
~x
-15214
C4 92
11000100 10010010
~x+1
-15213
C4 93
11000100 10010011
y
-15213
C4 93
11000100 10010011
x= 0
Decimal
Hex
Binary
0
0
00 00
00000000 00000000
~0
-1
FF FF
11111111 11111111
~0+1
0
00 00
00000000 00000000
103
Code Security Example #2
SUN XDR library
» Widely used library for transferring data between machines
void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);
ele_src
malloc(ele_cnt * ele_size)
104
XDR Code
void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
/*
* Allocate buffer for ele_cnt objects, each of ele_size bytes * and copy from locations designated by ele_src
*/
void *result = malloc(ele_cnt * ele_size);
if (result == NULL)
/* malloc failed */
return NULL;
void *next = result;
int i;
for (i = 0; i < ele_cnt; i++) {
/* Copy object i to destination */
memcpy(next, ele_src[i], ele_size);
/* Move pointer to next memory region */
next += ele_size;
}
return result;
}
105
XDR Vulnerability
malloc(ele_cnt * ele_size)
What if:
» ele_cnt
» ele_size » Allocation
= 220 + 1
= 4096 = 212 = ??
How can I make this function secure?
106
Compiled Multiplication Code
C Function
long mul12(long x)
{
return x*12;
}
Compiled Arithmetic Operations Explanation
leaq (%rax,%rax,2), %rax salq $2, %rax
C compiler automatically generates shift/add code when multiplying by constant
t <- x+x*2 return t << 2;
107
Compiled Unsigned Division Code
C Function
unsigned long udiv8
(unsigned long x)
{
return x/8;
}
Compiled Arithmetic Operations
shrq $3, %rax
Uses logical shift for unsigned
For Java Users
» Logical shift written as >>>
Explanation
# Logical shift
return x >> 3;
108
Signed Power-of-2 Divide with Shift
Quotient of Signed by Power of 2
» x>>kgives x/ 2k
» Uses arithmetic shift
» Rounds wrong direction when u < 0
Operands:
Division: Result:
y
y >> 1
y >> 4
y >> 8 -59.4257813
k
x •••
•••
Binary Point
•••
/ 2k x / 2k
RoundDown(x / 2k)
0 ••• 0 1 0 ••• 0 0
0 •••
0 •••
••• . •••
Binary
Division -15213
Computed -15213 -7607 -951 -60
Hex
-7606.5 -950.8125
C4 93 11000100 10010011 E2 49 11100010 01001001 FC 49 11111100 01001001 FF C4 11111111 11000100
109
Correct Power-of-2 Divide
Quotient of Negative Number by Power of 2 » Want x / 2k (Round Toward 0)
» Compute as (x+2k-1)/ 2k
• In C: (x + (1<
• Biases dividend toward 0
Case 1: No rounding
Dividend:
Divisor:
u +2k –1
/ 2k u / 2k
k
1 ••• 0•••00 0 ••• 001 ••• 11
1 ••• 1•••11 0 ••• 010 ••• 00
Binary Point
Biasing has no effect
10 ••• 111 ••• .1 ••• 11
110
Correct Power-of-2 Divide (Cont.)
Case 2: Rounding
Dividend: x
k
1 •••
0 ••• 001 ••• 11
1 ••• •••
Incremented by 1
0 ••• 010 ••• 00 01 • • • 1 1 1 • • • .
+2k –1
•••
Divisor:
/ 2k x / 2 k
Binary Point
• • •
Biasing adds 1 to final result Incremented by 1
111
Compiled Signed Division Code
C Function
long idiv8(long x)
{
return x/8; }
Compiled Arithmetic Operations
Explanation
testq %rax, %rax
js L4 L3:
sarq $3, %rax
ret L4:
addq $7, %rax jmp L3
if x < 0
x += 7;
# Arithmetic shift
return x >> 3;
Uses arithmetic shift for int
For Java Users
» Arith. shift written as >>
112
Arithmetic: Basic Rules
Unsigned ints, 2s complement ints are isomorphic rings: isomorphism = casting
Left shift
» Unsigned/signed: multiplication by 2k » Always logical shift
Right shift
» Unsigned: logical shift, div (division + round to zero) by 2k
» Signed: arithmetic shift
• Positive numbers: div (division + round to zero) by 2k
• Negative numbers: div (division + round away from zero) by 2k Use biasing to fix
113
Properties of Unsigned Arithmetic
Unsigned Multiplication with Addition Forms Commutative Ring
» Addition is commutative group
» Closed under multiplication 0 UMultw(u,v) 2w 1
» Multiplication Commutative UMultw(u , v) = UMultw(v , u)
» Multiplication is Associative
UMultw(t, UMultw(u , v)) = UMultw(UMultw(t, u ), v)
» 1 is multiplicative identity UMultw(u , 1) = u
» Multiplication distributes over addtion
UMultw(t, UAddw(u , v)) = UAddw(UMultw(t, u ), UMultw(t, v))
114
Peie f T C. Aiheic
Isomorphic Algebras
» Unsigned multiplication and addition • Truncating to w bits
» Twos complement multiplication and addition • Truncating to w bits
Both Form Rings
» Isomorphic to ring of integers mod 2w
Comparison to (Mathematical) Integer Arithmetic » Both are rings
» Integers obey ordering properties, e.g., u>0 u+v>v
u > 0, v > 0 u · v > 0
» These properties are not obeyed by twos comp. arithmetic TMax + 1 == TMin
15213 * 30426 == -10030 (16-bit words)
115
Reading Byte-Reversed Listings
Disassembly
» Text representation of binary machine code
» Generated by program that reads the machine code
Example Fragment
Address Instruction Code
8048365: 5b
8048366: 81 c3 ab 12 00 00
804836c: 83 bb 28 00 00 00 00
Assembly Rendition
add cmpl
pop %ebx
$0x12ab,%ebx
$0x0,0x28(%ebx)
Deciphering Numbers » Value:
» Pad to 32 bits:
» Split into bytes:
» Reverse:
0x12ab 0x000012ab 00 00 12 ab ab 12 00 00
116
Agenda
1
2
3
Instructor and Course Introduction Bits, Bytes, and Integers
Summary and Conclusion
117
Assignments & Readings
Readings
» Slides and Handouts posted on the course web site
» Syllabus, make sure you understand the rules associated with grading, due dates, lateness, etc.
» Textbook: Chapter 2.1-2.3 Assignment #1 / Lab #1
» TBD in later sessions
Development environment
» Setup as per session 2
» Create GitHub account
» See Announcements regarding invitation to join the course GitHub and related instructions to access the materials for the course and recitations (including private versions of them that you will work on during recitations)
118
Next Session: Bits, Bytes, and Floats
119
Questions, Comments, Discussions ?
120
Misc: Large Measurements
Kilobyte (KB), 210 bytes
Megabyte (MB), 220 bytes Gigabyte (GB), 230 bytes Terabyte (TB), 240 bytes Petabyte
Exabyte
Zettabyte
Yottabyte
121
Misc: Powers of 2 and 16
1
1
1
1
1
1
1
1
27 26 25 24 23 22 21 20
122
Misc: Translating Binary to Decimal
Weighted positional notation shows how to calculate the decimal value of each binary bit:
dec = (Dn-1 2n-1) + (Dn-2 2n-2) + … + (D1 21) + (D0 20)
D = binary digit
binary 00001001 = decimal 9: (1 23) + (1 20) = 9
123
Misc: Translating Unsigned Decimal to Binary
Repeatedly divide the decimal integer by 2. Each remainder is a binary digit in the translated value:
37 = 100101
124
Misc: Translating Binary to Hexadecimal
Each hexadecimal digit corresponds to 4 binary bits.
Example: Translate the binary integer 000101101010011110010100 to hexadecimal:
125
Misc: Converting Hexadecimal to Decimal
Multiply each digit by its corresponding power of 16:
dec = (D3 163) + (D2 162) + (D1 161) + (D0 160) Hex 1234 equals (1 163) + (2 162) + (3
161) + (4 160), or decimal 4,660
Hex 3BA4 equals (3 163) + (11 * 162) + (10 161) + (4 160), or decimal 15,268
126
Misc: Converting Hexadecimal to Decimal – Examples
decimal 422 = 1A6 hexadecimal
127
Misc: Binary and Hexadecimal Additions
carry: 1
(4) + (7)
0
0
0
0
0
1
0
0
0
0
0
0
0
1
1
1
0
0
0
0
1
0
1
1
bitposition: 7 6 5 4 3 2 1 0
11
36 28 28 6A 42 45 58 4B 78 6D 80 B5
1
C6 75 A2 47 24 2E
Divide the sum of two digits by the number base (16). The quotient becomes the carry value, and the remainder is the sum digit.
When a borrow is required from the digit to the left, add 10h to the current digit’s value:
(11)
Starting with the LSB, add each pair of digits, include the carry if present.
Important skill: Programmers frequently add and subtract the addresses of variables and instructions. So if the address of var1 is 00400020. The address of the next variable after var1 is 0040006A. How many bytes are used by var1?
128
Misc: More on Characters
Character sets
» Standard ASCII (0 127) » Extended ASCII (0 255) » ANSI (0 255)
»Unicode (065,535)
Null-terminated String
» Array of characters followed by a null byte
ASCII table are available in textbooks and on the Internet
129