Programming Meets Hardware
High-Level Language Program
Compiler
Assembly Language Program
Assembler
Machine Language Program
#include
int x, y, temp;
x=1; y=2;
temp =x; x=y; y=temp; printf(“%d %d %d\n”,x,y,temp);
}
How do you get performance?
7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 03 00 01
00 00 00 f0 82 04 08 34 00 00 00 c4 0c 00 00 00 00 00 00 34 00
movl $1, -8(%ebp)
movl $2, -12(%ebp)
movl -8(%ebp), %eax
movl %eax, -16(%ebp)
movl -12(%ebp), %eax
movl %eax, -8(%ebp)
movl -16(%ebp), %eax
movl %eax, -12(%ebp)
movl -16(%ebp), %eax ISA movl %eax, 12(%esp)
movl -12(%ebp), %eax movl %eax, 8(%esp) movl -8(%ebp), %eax movl %eax, 4(%esp)
Performance with Programs
(1) Program: Data structures + algorithms (2) Compiler translates code
(3) Instruction set architecture
(4) Hardware Implementation
Rutgers University David Menendez 3
Instruction Set Architecture
(1) Set of instructions that the CPU can execute
(1) What instructions are available?
(2) How the instructions are encoded? Eventually everything is binary.
(2) State of the system (Registers + memory state + program counter)
(1) What instruction is going to execute next
(2) How many registers? Width of each register?
(3) How do we specify memory addresses?
● Addressing modes
(3) Effect of instruction on the state of the system
Rutgers University David Menendez 4
IA32 (X86 ISA)
There are many different assembly languages because they are processor-specific
■
IA32 (x86)
● x86-64 for new 64-bit processors
● IA-64 radically different for Itanium processors
● Backward compatibility: instructions added with time
■ ■
We will focus on IA32/x86-64 because you can generate and run on iLab machines (as well as your own PC/laptop)
■
PowerPC MIPS
IA32 is also dominant in the market although smart phone, eBook readers, etc. are changing this
Rutgers University David Menendez 5
X86 Evolution
8086 – 1978 – 29K transistors – 5-10MHz
I386 – 1985 – 275K transistors – 16-33 MHz Pentium4 – 2005 – 230M transistors – 2800-3800 MHz Haswell – 2013 – > 2B transistors – 3200-3900 MHz
Added features
• Large caches
• Multiple cores
• Support for data parallelism (SIMD) eg AVX extensions
Rutgers University David Menendez 6
CISC vs RISC
CISC: complex instructions : eg X86
• Instructions such as strcpy/AES and others • Reduces code size
• Hardware implementation complex?
RISC: simple instructions: eg Alpha • Instructions are simple add/ld/st
• Increases code size
• Hardware implementation simple?
Rutgers University David Menendez 7
Aside About Implementation of x86
About 30 years ago, the instruction set actually reflected the processor hardware
■
As hardware advanced, industry faced with choice
■ ■
E.g., the set of registers in the instruction set is actually what was present in the processor
Change the instruction set: bad for backward compatibility
Keep the instruction set: harder to exploit hardware advances
● Example: many more registers but only small set introduced circa 1980
Starting with the P6 (PentiumPro), IA32 actually got implemented by Intel using an “interpreter” that translates IA32 instructions into a simpler “micro” instruction set
Rutgers University David Menendez 8
P6 Decoder/Interpreter
Assembly Programming
Brief tour through assembly language programming Why?
■ ■
Why not binary language?
■ ■
Machine interface: where software meets hardware
To understand how the hardware works, we have to understand the interface that it exports
Much easier for humans to read and reason about
Major differences:
● Human readable language instead of binary sequences ● Relative instead of absolute addresses
Rutgers University David Menendez 10
Assembly Programmer’s View
CPU
Memory
ALU
Control Logic
Condition Codes
PC
Registers
(OS code & data) Object Code Program Data
Addresses
Data
Instructions
CPU
Memory
Memory
Address
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
00000000
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Command: R/W
Data
Addresses
Memory Access: Read
CPU
Memory
03
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
00000000
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
R
01010101
Addresses
Memory Access: Write
CPU
Memory
04
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
00000000
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
W
01010101
Addresses
Memory Access: Write
CPU
Memory
04
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
W
01010101
Addresses
C
S
AB
Processor: ALU & Registers
C = FS(A, B)
F includes
Arithmetic: +, -, *, /, ~, etc. Logical: <, >, =, etc.
ALU
Registers
Name Storage
R0 R1 R2 R3
00101100
10001000
11111111
01010101
Multiple Ports
Name Command: R/W Data
Putting It All Together
CPU
Memory
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Condition Codes
Program Counter (PC)
Control Logic
ALU
Registers
Addresses
Putting It All Together
CPU
Memory
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Condition Codes
Program Counter (PC)
1
ALU
Control Logic
Registers
Addresses
Putting It All Together
CPU
Memory
Condition Codes
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Program Counter (PC)
1
ALU
Registers
Control Logic
10001000
1
Addresses
Putting It All Together
CPU
Memory
Condition Codes
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Program Counter (PC)
+
ALU
Control Logic
1
Registers
R0: x R1: y
R0 R1
10001000
1
Addresses
Putting It All Together
CPU
Memory
Condition Codes
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Program Counter (PC)
+
ALU
x
1
y
Registers
R0: x R1: y
R0 R1
Control Logic
10001000
1
Addresses
Putting It All Together
CPU
Memory
Condition Codes
Program Counter (PC)
z
+
ALU
x
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
2
y
Registers
R0: x R1: y
R2: z
R0 R1
Control Logic
R2
10001000
1
Addresses
Sample C Code
int accum;
int sum(int x, int y)
{
gcc -O1 -m32 –S code.c
Generated Assembly
sum:
push %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
addl 8(%ebp), %eax
addl %eax, accum
popl %ebp
ret
}
int t = x + y;
accum += t;
return t;
Rutgers University
David Menendez 23
C & Assembly Code
Sample C Code
int accum;
int sum(int x, int y){
int t = x + y;
accum += t;
return t;
objdump –d code.o
C & Machine Code
} 3:
6: 03 45 08
gcc -O1 -m32 –c code.c 9:
gdb code.o
(gdb) x/100xb sum
f: 5d 10: c3
pop %ebp ret
0x8
0x10
0000000
0: 55
1: 89 e5
push %ebp
mov %esp,%ebp
mov 0xc(%ebp),%eax add 0x8(%ebp),%eax
8b 45 0c
01 05 00 00 00 00 add %eax, accum
Rutgers University David Menendez 24
Assembly Characteristics
Sequence of simple instructions
Minimal Data Types
■
■ ■
No type checking
■ ■
“Integer” data of 1, 2, or 4 bytes
● Data values
● Addresses (untyped pointers)
Floating point data of 4, 8, or 10 bytes
No aggregate types such as arrays or structures
● Just contiguously allocated bytes in memory
Interpretation of data format depends on instruction No protection against misinterpretation of data
Rutgers University David Menendez 25
Assembly Characteristics
3 types of Primitive Operations
■ ■
■
Perform arithmetic function on register or memory data
Transfer data between memory and register
● Load data from memory into register ● Store register data into memory
Transfer control
● Unconditional jumps to/from procedures ● Conditional branches
Rutgers University David Menendez 26
x86 Characteristics
Variable length instructions: 1-15 bytes
Can address memory directly in most instructions
Uses Little-Endian format (Least significant byte in the lowest address)
Rutgers University David Menendez 27
General format:
opcode operands
Opcode:
■
● movb,addl, etc. Operands:
■ ■
Example:
■
Instruction Format
Short mnemonic for instruction’s purpose
Immediate, register, or memory
Number of operands command-dependent
movl %ebx, (%ecx)
Rutgers University David Menendez 28
Machine Representation
Remember, each assembly instruction translated to a sequence of 1-15 bytes
Opcode
addressing mode
other bytes
First, the binary representation of the opcode Second, instruction specifies the addressing mode
■ ■
Some instructions can be single-byte because operands and addressing mode are implicitly specified by the instruction
■
The type of operands (registers or register and memory) How to interpret the operands
E.g., pushl
Rutgers University David Menendez 29 29
x86 Registers
General purpose registers are 32 bit
■
Originally categorized into two groups with different functionality
■
■
Now, the registers are mostly interchangeable
Segment registers
■
Although operations can access 8-bits or 16-bits portions
Data registers (EAX, EBX, ECX, EDX)
● Holds operands
Pointer and Index registers (EBP, ESP, EIP,ESI,EDI)
● Holds references to addresses as well as indexes
Holds starting address of program segments
● CS, DS, SS, ES
Rutgers University David Menendez 30 30
x86 Registers
16 BITS
8 BITS
EAX AX
AH
AL
ECX CX
CH
CL
EDX DX
DH
DL
EBX BX
BH
BL
ESP –Stack Pointer
EBP — Base register of current stack frame
ESI — Source index for string operations
EDI — Destination index for string operations
32 BITS Rutgers University David Menendez
31
x86 Programming
• Mov instructions to move data from/to memory
•
• Addressing modes
• Understanding swap
• Arithmetic operations
• Condition codes
• Conditional and unconditional branches • Loops and switch statements
Operands and registers
Rutgers University David Menendez 32
Byte: 8 bits
■
Word: 16 bits (2 bytes)
■
E.g., char
E.g., short int
Data Format
Double Word: 32 bits ( 4 bytes)
■
Quad Word: 64 bits (8 bytes)
■
Instructions can operate on any data size
■ movl, movw, movb
● Move double word, word, byte, respectively
■
E.g., int, float
E.g., double
End character specifies what data size to be used
Rutgers University David Menendez 33
MOV instruction
Most common instruction is data transfer instruction
■ Mov SRC, DEST: Move source into destination
■ SRC and DEST are operands
■ DEST is a register or a location
■ SRC can be the contents of register, memory location, constant, or a label.
■ If you use gcc, you will see movl
■ Alltheinstructionsinx86are32-bit
Used to copy data:
■ Constant to register (immediate)
■ Memory to register
■ Register to memory
■ Register to register
Cannot copy memory to memory in a single instruction
Rutgers University
David Menendez
34 34
Immediate Addressing
Operand is immediate
■ ■ ■ ■
Operand value is found immediately following the instruction Encoded in 1, 2, or 4 bytes
$ in front of immediate operand E.g., movl $0x4040, %eax
ADDR
000f
00A1 00A2
memory
movl %eax
4040
Rutgers University David Menendez
35
Register Mode Addressing
Use % to denote register
■
Source operand: use value in specified register
Destination operand: use register as destination for value
Examples:
■
■
■
E.g., %eax
movl %eax, %ebx
● Copy content of %eax to %ebx
movl $0x4040, %eax ! immediate addressing ● Copy 0x4040 to %eax
movl %eax, 0x0000f !Absolute addressing
● Copy content of %eax to memory location 0x0000f
Rutgers University David Menendez 36 36
Indirect Mode Addressing
Content of operand is an address
■
Offset can be specified as immediate mode
Examples:
■
■
Designated as parenthesis around operand
movl (%ebp), %eax
● Copy value from memory location whose address is in ebp into eax
movl -4(%ebp), %eax
● Copy value from memory location whose address is -4 away from content of ebp into eax
Rutgers University David Menendez 37
Indexed Mode Addressing
Add content of two registers to get address of operand
■
■
Useful for dealing with arrays
■ ■
■
movl (%ebp, %esi), %eax
● Copy value at (address = ebp + esi) into eax
movl 8(%ebp, %esi),%eax
● Copy value at (address = 8 + ebp + esi) into eax
If you need to walk through the elements of an array Use one register to hold base address, one to hold index
● E.g., implement C array access in a for loop
Index cannot be ESP
Rutgers University David Menendez 38 38
Scaled Indexed Mode Addressing
Multiply the second operand by the scale (1, 2, 4 or 8)
■
movl 0x80 (%ebx, %esi, 4), %eax
● Copy value at (address = ebx + esi*4 + 0x80) into eax
Where is it useful?
Rutgers University David Menendez 39 39
Address Computation Examples
%edx
0xf000
%ecx
0x100
Expression
Computation
Address
0x8(%edx)
0xf000 + 0x8
0xf008
(%edx,%ecx)
0xf000 + 0x100
0xf100
(%edx,%ecx,4)
0xf000 + 4*0x100
0xf400
0x80(,%edx,2)
2*0xf000 + 0x80
0x1e080
40
movl Operand Combinations
Source
Imm
movl
Destination C Analog
Reg Mem
movl $0x4,%eax temp = 0x4; movl$-147,(%eax) *p=-147;
■
Cannot do memory-memory transfers with single instruction
Reg
Mem Reg
Reg Mem
movl %eax,%edx
movl %eax,(%edx)
movl (%eax),%edx
temp2 = temp1;
*p = temp;
temp = *p;
Rutgers University David Menendez 41
Stack Operations
By convention, %esp is used to maintain a stack in memory
■
%esp contains the address of top of stack
Instructions to push (pop) content onto (off of) the stack
■
■
Where does the stack start? We’ll discuss later
Used to support C function calls
pushl %eax
● esp=esp–4
● Memory[esp] = eax
popl %ebx
● ebx = Memory[esp] ● esp=esp+4
Rutgers University David Menendez 42
Using Simple Addressing Modes
swap:
pushl %ebp
movl %esp,%ebp Set
pushl %ebx
Up
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
Body
Finish
Understanding Swap
Offset
12 8 4 0 -4
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
Stack
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
%ebp
# ecx = yp
# edx = xp #eax=*yp(t1) #ebx=*xp(t0) #*xp=eax #*yp=ebx
Register Variable
%ecx yp %edx xp %eax t1 %ebx t0
Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
%edx
%ecx
%ebx
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx
Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
%edx
%ecx
0x120
%ebx
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx #
ecx = yp
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# # # # #
edx=xp
eax = *yp (t1) ebx = *xp (t0) *xp = eax
*yp = ebx
Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
%edx
0x124
%ecx
0x120
%ebx
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx #
ecx = yp
edx=xp
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
#
# # # #
eax = *yp (t1)
ebx = *xp (t0)
*xp = eax
*yp = ebx
Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
456
%edx
0x124
%ecx
0x120
%ebx
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx
Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
456
%edx
0x124
%ecx
0x120
%ebx
123
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx
Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
456
456
0x120
0x124
Rtn adr
%eax
456
%edx
0x124
%ecx
0x120
%ebx
123
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx
Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
456
123
0x120
0x124
Rtn adr
%eax
456
%edx
0x124
%ecx
0x120
%ebx
123
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx
Swap in x86-64: 64-bit Registers
rax
eax
rcx
ecx
rdx
edx
rbx
ebx
rsp
esp
rbp
ebp
rsi
esi
rdi
edi
r8
r9
r10
r11
r12
r13
r14
r15
Swap in x86-64 bit
swap:
movl (%rdi), %edx
movl (%rsi), %eax
movl %eax, (%rdi)
movl %edx, (%rsi)
retq
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
Arguments passed in registers
■ ■
No stack operations
What happens with long int?
First, xp in rdi and yp in rsi
64-bit pointers, data values are 32-bit ints, so uses eax/edx
IA32 Stack
■
■
■ Register%espindicates lowest stack address
● address of top element
Stack Pointer
%esp
Stack “Bottom”
Increasing Addresses
Region of memory managed with stack discipline
Grows toward lower addresses
Stack Grows Down
Rutgers University
David Menendez
54
Stack “Top”
IA32 Stack Pushing
Pushing
■ pushl Src
■
■ Decrement%espby4
■
Stack “Bottom”
Increasing Addresses
Fetch operand at Src Write operand at address
given by %esp
Stack Pointer
-4
Stack Grows Down
%esp
Rutgers University
David Menendez
55
Stack “Top”
IA32 Stack Popping
Popping
■ popl Dest
■
■ Increment%espby4
■
Stack “Bottom”
Increasing Addresses
Read operand at address given by %esp
Write to Dest
Stack Pointer
%esp
+4
Stack Grows Down
Rutgers University
David Menendez
56
Stack “Top”
Stack Operation Examples
pushl %eax
popl %edx
123
123
213
123
213
0x110
0x10c
0x108
0x110
0x10c
0x108
0x104
%eax
%edx
%esp
0x110
0x10c
0x108
0x104
%eax
%edx
%esp
213
555
0x108
213
555
0x1084
213
525153
0x1048
%eax
%edx
%esp
Procedure Control Flow
■ Use stack to support procedure call and return Procedure call:
call label Push return address on stack; Jump to label
Return address value
■ Address of instruction beyond call ■ Example from disassembly
804854e: e8 3d 06 00 00 8048553: 50
●Return address = 0x8048553 Procedure return:
call 8048b90
pushl %eax
■ ret Pop address from stack; Jump to address
Rutgers University David Menendez 58
0x110
0x10c
0x108
%esp %eip
0x108
0x804854e
Procedure Call Example
804854e: e8 3d 06 00 00 call 8048b90
8048553: 50
pushl %eax
call 8048b90
0x110
0x10c
0x108
0x104
%esp 0x1084 %eip 0x80485b49e0
123
123
0x8048553
%eip is program counter
8048591: c3
Procedure Return Example
ret
0x110
0x10c
0x108
0x104
%esp %eip
0x110
0x10c
0x108
ret
123
0x8048553
123
0x8048553
0x104
0x8048591
%esp 0x1048 %eip 0x804855931
%eip is program counter
Address Computation Instruction
leal: compute address using addressing mode without accessing memory
leal src, dest
■ ■
Use
■
Example:
■
src is address mode expression Set dest to address specified by src
Computing address without doing memory reference
● E.g., translation of p = &x[i];
leal 7(%edx, %edx, 4), %eax
● eax=4*edx+edx+7=5*edx+7
Rutgers University David Menendez 61
Some Arithmetic Operations
Instruction addl Src,Dest subl Src,Dest imull Src,Dest sall Src,Dest sarl Src,Dest xorl Src,Dest andl Src,Dest orl Src,Dest
Computation
Dest = Dest + Src
Dest = Dest – Src
Dest = Dest * Src
Dest = Dest << Src (left shift) Dest = Dest >> Src (right shift) Dest = Dest ^ Src
Dest = Dest & Src Dest = Dest | Src
Rutgers University
David Menendez 62
Some Arithmetic Operations
Instruction
incl Dest decl Dest negl Dest notl Dest
Computation
Dest = Dest + 1 Dest = Dest – 1 Dest = – Dest Dest = ~ Dest
Rutgers University
David Menendez 63
Using leal for Arithmetic Expressions
arith:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%eax
movl 12(%ebp),%edx
leal (%edx,%eax),%ecx
leal (%edx,%edx,2),%edx
sall $4,%edx
addl 16(%ebp),%ecx
leal 4(%edx,%eax),%eax
imull %ecx,%eax
movl %ebp,%esp
popl %ebp
ret
Set Up
int arith
(int x, int y, int z)
{
int t1 = x+y;
int t2 = z+t1;
int t3 = x+4;
int t4 = y * 48;
int t5 = t3 + t4;
int rval = t2 * t5;
return rval;
}
Body
Finish
Understanding arith Offset
16 12 8 4 0
Stack
• • •
z
y
x
Rtn adr
Old %ebp
int arith
(int x, int y, int z)
{
int t1 = x+y;
int t2 = z+t1;
int t3 = x+4;
int t4 = y * 48;
int t5 = t3 + t4;
int rval = t2 * t5;
return rval;
}
movl 8(%ebp),%eax
movl 12(%ebp),%edx
leal (%edx,%eax),%ecx
leal (%edx,%edx,2),%edx # edx = 3*y
sall $4,%edx
addl 16(%ebp),%ecx
leal 4(%edx,%eax),%eax
imull %ecx,%eax
# edx = 48*y (t4)
# ecx = z+t1 (t2)
# eax = 4+t4+x (t5)
# eax = t5*t2 (rval)
# eax = x
# edx = y
# ecx = x+y (t1)
%ebp
Understanding arith
# eax = x
movl 8(%ebp),%eax
# edx = y
movl 12(%ebp),%edx
# ecx = x+y (t1)
leal (%edx,%eax),%ecx
# edx = 3*y
leal (%edx,%edx,2),%edx
# edx = 48*y (t4)
sall $4,%edx
# ecx = z+t1 (t2)
addl 16(%ebp),%ecx
# eax = 4+t4+x (t5)
leal 4(%edx,%eax),%eax
# eax = t5*t2 (rval)
imull %ecx,%eax
int arith
(int x, int y, int z)
{
int t1 = x+y;
int t2 = z+t1;
int t3 = x+4;
int t4 = y * 48;
int t5 = t3 + t4;
int rval = t2 * t5;
return rval;
}
Another Example
logical:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%eax
xorl 12(%ebp),%eax
sarl $17,%eax
andl $8185,%eax
movl %ebp,%esp
popl %ebp
ret
eax = x
eax = x^y (t1) eax = t1>>17 (t2) eax = t2 & 8185
Set Up
int logical(int x, int y)
{
int t1 = x^y;
int t2 = t1 >> 17;
int mask = (1<<13) - 7;
int rval = t2 & mask;
return rval;
}
Body Finish
213 =8192,213 –7=8185
movl 8(%ebp),%eax
xorl 12(%ebp),%eax
sarl $17,%eax
andl $8185,%eax
Mystery Function
What does the following piece of code do?
A. Add two variables
B. Subtract two variables
C. Swap two variables
D. No idea
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
Rutgers University
David Menendez 68
What does this function do?
.globl foo
.type foo, @function
foo:
pushl %ebp
movl
movl
imull
addl
popl %ebp ret
%esp, %ebp
16(%ebp), %eax 12(%ebp), %eax 8(%ebp), %eax
Control Flow/Conditionals
How do we represent conditionals in assembly?
A conditional branch can implement all control flow constructs in higher level language
■ Examples: if/then, while, for
A unconditional branch for constructs like break/ continue
Condition Codes
Single Bit Registers
CF Carry Flag SF Sign Flag
ZF Zero Flag OF Overflow Flag
Can be set either implicitly or explicitly.
■ Implicitly by almost all logic and arithmetic operations ■ Explicitly by specific comparison operations
Not Set by leal instruction
■ Intended for use in address computation only
Rutgers University David Menendez 71
jX Instructions
■
Jumping
Jump to different part of code depending on condition codes
jX
Condition
Description
jmp
1
Unconditional
je
ZF
Equal / Zero
jne
~ZF
Not Equal / Not Zero
js
SF
Negative
jns
~SF
Nonnegative
jg
~(SF^OF)&~ZF
Greater (Signed)
jge
~(SF^OF)
Greater or Equal (Signed)
jl
(SF^OF)
Less (Signed)
jle
(SF^OF)|ZF
Less or Equal (Signed)
ja
~CF&~ZF
Above (unsigned)
jb
CF
Below (unsigned)
Rutgers University David Menendez 72
Condition Codes
Implicitly Set By Arithmetic Operations
addl Src,Dest
Canalog: t=a+b
■ CF set if carry out from most significant bit ●Used to detect unsigned overflow
■ ZFsetift==0
■ SFsetift<0
■ OF set if two’s complement overflow
(a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
Rutgers University David Menendez 73
Setting Condition Codes (cont.)
Explicit Setting by Compare Instruction
cmpl Src2,Src1
cmpl b,a like computing a-b without setting destination
NOTE: The operands are reversed. Source of confusion
■
■
■
■
■
■
CF set if carry out from most significant bit
● Used for unsigned comparisons
ZFsetifa==b
SFsetif(a-b)<0
OF set if two’s complement overflow
(a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-
b)>0)
Rutgers University David Menendez 74
Setting Condition Codes (cont.)
Explicit Setting by Test instruction
testl Src2,Src1
■ SetsconditioncodesbasedonvalueofSrc1&Src2
● Useful to have one of the operands be a mask
■ testl b,a like computing a&b without setting destination ■ ZF set whena&b == 0
■ SFsetwhena&b<0
Rutgers University David Menendez 75
Conditional Branch Example
_max:
pushl %ebp
movl %esp,%ebp
Set Up
Body
movl 8(%ebp),%edx
movl 12(%ebp),%eax
cmpl %eax,%edx
jle L9
movl %edx,%eax
L9:
movl %ebp,%esp
popl %ebp
ret
Finish
Conditional Branch Example
_max:
pushl %ebp
movl %esp,%ebp
Set Up
Body
int max(int x, int y)
{
if (x <= y)
return y;
else
return x; }
movl 8(%ebp),%edx
movl 12(%ebp),%eax
cmpl %eax,%edx
jle L9
movl %edx,%eax
L9:
movl %ebp,%esp
popl %ebp
ret
Finish
Conditional Branch Example (Cont.)
int max(int x, int y)
{
if (x <= y)
return y;
else
return x; }
int goto_max(int x, int y)
{
int rval = y;
int ok = (x <= y);
if (ok)
goto done;
rval = x;
done:
return rval;
}
movl 8(%ebp),%edx
movl 12(%ebp),%eax # eax =
x y
goto L9
x
cmpl %eax,%edx
jle L9
movl %edx,%eax
L9:
# x : y
# if <=
# eax =
# Done:
■
Generally considered bad coding style
# edx =
■
C allows “goto” as means of transferring control
● Closer to machine- level programming style
Rutgers University
David Menendez 78
Skipped when x ≤ y
.LC0:
.string "%d”
.text .globl foo
.type foo, @function foo:
pushl %ebp
movl
subl
leal
movl
movl
call scanf
cmpl $4, -12(%ebp) je .L3
call explode_bomb .L3:
leave .p2align 4,,3 ret
Rutgers University
David Menendez 79
%esp, %ebp $40, %esp
-12(%ebp), %eax %eax, 4(%esp) $.LC0, (%esp)
Mystery Function
C Code
“Do-While” Loop Example
int fact_do(int x)
{
int result = 1;
do {
result *= x;
x = x-1;
} while (x > 1);
return result;
}
Rutgers University David Menendez 80
“Do-While” Loop Example
C Code Goto Version
int fact_do(int x)
{
int result = 1;
do {
result *= x;
x = x-1;
} while (x > 1);
return result;
}
int fact_goto(int x)
{
int result = 1;
loop:
result *= x;
x = x-1;
if (x > 1)
goto loop; return result;
}
■ ■
Use backward branch to continue looping Only take branch when “while” condition holds
Rutgers University David Menendez 81
“Do-While” Loop Compilation
Goto Version
Assembly
int fact_goto(int x)
{
int result = 1;
loop:
result *= x;
x = x-1;
if (x > 1)
goto loop; return result;
}
Registers
%edx x
%eax result
_fact_goto:
pushl %ebp
movl %esp,%ebp
movl $1,%eax
movl 8(%ebp),%edx
L11:
imull %edx,%eax
decl %edx
cmpl $1,%edx
jg L11
movl %ebp,%esp
popl %ebp
ret
# Setup
# Setup
# eax = 1
# edx = x
# result *= x
# x–
# Compare x : 1
# if > goto loop
# Finish
# Finish
# Finish
Rutgers University
David Menendez
82
C Code
Goto Version
General “Do-While” Translation
do
Body
while (Test);
loop: Body
if (Test) goto loop
■ Body can be any C statement ●Typically compound statement:
{
Statement1; Statement2;
… Statementn;
}
■ Test is expression returning integer
= 0 interpreted as false ≠0 interpreted as true
Rutgers University David Menendez 83
C Code
“While” Loop Example #1
int fact_while(int x)
{
int result = 1;
while (x > 1) {
result *= x;
x = x-1; };
return result;
}
Rutgers University David Menendez 84
Actual “While” Loop Translation
C Code
Goto Version
int fact_while(int x)
{
int result = 1;
while (x > 1) {
result *= x;
x = x-1; };
return result;
}
int fact_while_goto2
(int x) {
int result = 1;
if (!(x > 1))
goto done; loop:
result *= x;
x = x-1;
if (x > 1)
goto loop; done:
return result;
}
■
■
Uses same inner loop as do-while version
Guards loop entry with extra test
Rutgers University
David Menendez
85
General “While” Translation
C Code
while (Test) Body
Do-While Version
Goto Version
if (!Test) goto done;
loop: Body
if (Test) goto loop;
done:
if (!Test)
goto done; do
Body
while(Test); done:
Switch Statements
Implementation Options
■ Series of conditionals ● Good if few cases ● Slow if many
■ Jump Table
● Lookup branch target
● Avoids conditionals
● Possible when cases are small integer constants
■ GCC
● Picks one based on case
structure
■ Bug in example code ● No default given
typedef enum
{ADD, MULT, MINUS, DIV, MOD, BAD}
op_type;
char unparse_symbol(op_type op)
{
switch (op) {
case ADD :
return ‘+’;
case MULT:
return ‘*’;
case MINUS:
return ‘-‘;
case DIV:
return ‘/’;
case MOD:
return ‘%’;
case BAD:
return ‘?’; }
}
Rutgers University
David Menendez
87
Switch Form
Jump Table
Jump Targets
Targ0:
Targ1:
Targ2:
Jump Table Structure
Targ0
Targ1
Targ2
• • •
Targn-1
switch(op) {
case val_0:
Block 0 case val_1:
Block 1
•••
case val_n-1:
Block n–1 }
Approx. Translation
• • •
target = JTab[op];
goto *target;
jtab:
Targn-1:
Code Block 0
Code Block 1
Code Block 2
Code Block n–1
Switch Statement Example
Branching Possibilities
Enumerated Values
ADD 0 MULT 1 MINUS 2 DIV 3 MOD 4 BAD 5
typedef enum
{ADD, MULT, MINUS, DIV, MOD, BAD}
op_type;
char unparse_symbol(op_type op)
{
switch (op) {
••• }
}
Setup:
unparse_symbol:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%eax
cmpl $5,%eax
ja .L49
jmp *.L57(,%eax,4)
# Setup
# Setup
# eax = op
# Compare op : 5
# If > goto done
# goto Table[op]
Rutgers University
David Menendez
89
Assembly Setup Explanation
Table Structure
■
■ Baseaddressat.L57
Each target requires 4 bytes
Jumping
jmp .L49
● Jump target is denoted by label .L49
jmp *.L57(,%eax,4)
● Start of jump table denoted by label .L57 ● Register %eax holds op
● Must scale by factor of 4 to get offset into table
● Fetch target from effective Address .L57 + op*4
Rutgers University David Menendez 90
Table Contents
Targets & Completion
Jump Table
.L51:
movl $43,%eax # ’+’ jmp .L49
.L52:
movl $42,%eax # ’*’ jmp .L49
.L53:
movl $45,%eax # ’-’ jmp .L49
.L54:
movl $47,%eax # ’/’ jmp .L49
.L55:
movl $37,%eax # ’%’ jmp .L49
.L56:
movl $63,%eax # ’?’
# Fall Through to .L49
.section .rodata
.align 4
.L57:
.long .L51 #Op = 0 .long .L52 #Op = 1 .long .L53 #Op = 2 .long .L54 #Op = 3 .long .L55 #Op = 4 .long .L56 #Op = 5
Enumerated Values
ADD 0 MULT 1 MINUS 2 DIV 3 MOD 4 BAD 5
Rutgers University
David Menendez 91
Switch Statement Completion
.L49:
movl %ebp,%esp
popl %ebp ret
# Done:
# Finish
# Finish
# Finish
Puzzle
■ Whatvaluereturnedwhenopisinvalid?
Answer
■ Register%eaxsettoopatbeginningofprocedure
■
Advantage of Jump Table
■
This becomes the returned value
Can do k-way branch in O(1) operations
Rutgers University David Menendez 92
Reading Condition Codes
SetX Instructions
■
Set single byte based on combinations of condition codes
SetX
Condition
Description
sete
ZF
Equal / Zero
setne
~ZF
Not Equal / Not Zero
sets
SF
Negative
setns
~SF
Nonnegative
setg
~(SF^OF)&~ZF
Greater (Signed)
setge
~(SF^OF)
Greater or Equal (Signed)
setl
(SF^OF)
Less (Signed)
setle
(SF^OF)|ZF
Less or Equal (Signed)
seta
~CF&~ZF
Above (unsigned)
setb
CF
Below (unsigned)
Rutgers University David Menendez 93
SetX Instructions
■
■
Reading Condition Codes (Cont.)
Set single byte based on combinations of condition codes
%eax
%ah
%al
%edx
%dh
%dl
%ecx
%ch
%cl
%ebx
%bh
%bl
%esi
%edi
%esp
%ebp
One of 8 addressable byte registers
● Embedded within first 4 integer registers
● Does not alter remaining 3 bytes ● Typically use movzbl to finish job
int gt (int x, int y) {
return x > y; }
Body
movl 12(%ebp),%eax # eax = y
cmpl %eax,8(%ebp)
setg %al
movzbl %al,%eax
# Compare x : y
# al = x > y
# Zero rest of %eax
Note inverted ordering!
Rutgers University David Menendez
94
Can you write the C code for this assembly?
.globl test
.type test, @function
test:
pushl %ebp
movl %esp, %ebp pushl %ebx
movl 8(%ebp), %edx movl 12(%ebp), %ecx movl $1, %eax
cmpl %ecx, %edx jge .L3
What does this function do? What is the C code?
.L6:
leal imull
addl cmpl
jg .L6 .L3:
popl %ebx popl %ebp ret
(%edx,%ecx), %ebx %ebx, %eax
$1, %edx %edx, %ecx
Stack-Based Languages
Languages that Support Recursion
■ ■
■
Stack Discipline
■
■
Stack Allocated in Frames (Activation records)
■
e.g., C, Pascal, Java Code must be “Reentrant”
● Multiple simultaneous instantiations of single procedure
Need some place to store state of each instantiation
● Arguments, local variables, return pointer
State for given procedure needed for limited time
● From when called to when return
Callee returns before caller does
state for single procedure instantiation
Rutgers University David Menendez 96
Call Chain Example
Code Structure
Call Chain
yoo(…)
{
•
• who(); •
•
}
yoo
who
amI amI
amI
amI
who(…)
{
•••
amI(); ••• amI(); •••
}
amI(…)
{
•
• amI(); •
•
}
■ Procedure amI recursive
Rutgers University David Menendez
97
Stack Frames
Contents
■ ■
Management
■
■
Pointers
Local variables, return value Temporary space
Space allocated when enter procedure
● “Set-up” code
Frame Pointer
Deallocated when return
● “Finish” code %ebp
■ Stack pointer %esp : stack top
■ Frame pointer %ebp : start of current
frame
Stack “Top”
Rutgers University David Menendez
98
yoo
Stack Pointer
%esp
who
amI
proc
• • •
yoo
Stack Operation
Call Chain
yoo
Frame Pointer
%ebp
Stack
Pointer
%esp
yoo(…)
{
•
• who(); •
•
}
Rutgers University
David Menendez
99
• • •
yoo
who
Stack Operation
Call Chain
yoo who
Frame Pointer
%ebp
Stack
Pointer
%esp
who(…)
{
•••
amI(); ••• amI(); •••
}
• • •
yoo
who
amI
Stack Operation
Call Chain
yoo who amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{
•
• amI(); •
•
}
• • •
yoo
who
amI
amI
Stack Operation
Call Chain
yoo who amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{
•
• amI(); •
•
}
• • •
yoo
who
amI
amI
amI
Stack Operation
Call Chain
yoo who amI
amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{
•
• amI(); •
•
}
• • •
yoo
who
amI
amI
Stack Operation
Call Chain
yoo who amI
amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{
•
• amI(); •
•
}
• • •
yoo
who
amI
Stack Operation
Call Chain
yoo who amI
amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{
•
• amI(); •
•
}
• • •
yoo
who
Stack Operation
Call Chain
yoo who amI
amI amI
Frame Pointer
%ebp
Stack
Pointer
%esp
who(…)
{
•••
amI(); ••• amI(); •••
}
• • •
yoo
who
amI
Stack Operation
Call Chain
yoo
who
amI amI
amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{
•
• • •
}
• • •
yoo
who
Stack Operation
Call Chain
yoo
who
amI amI
amI amI
Frame Pointer
%ebp
Stack
Pointer
%esp
who(…)
{
•••
amI(); ••• amI(); •••
}
• • •
yoo
Stack Operation
Call Chain
yoo
who
amI amI amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
yoo(…)
{
•
• who(); •
•
}
IA32/Linux Stack Frame
Current Stack Frame (“Top” to Bottom)
■
■
■ ■
Caller Stack Frame
■
■
Arguments
Return Addr
Old %ebp
Saved Registers + Local Variables
Argument Build
Parameters for function about to call
● “Argument build”
Caller Frame
Frame Pointer (%ebp)
Local variables
● If can’t keep in registers
Saved register context Old frame pointer
Return address
● Pushed by call instruction Arguments for this call
Stack Pointer (%esp)
Rutgers University David Menendez 110
Revisiting swap
Calling swap from call_swap
int zip1 = 15213;
int zip2 = 91125;
void call_swap()
{
swap(&zip1, &zip2);
}
call_swap: •••
pushl $zip2 pushl $zip1 call swap •••
# Global Var
# Global Var
• • •
&zip2
&zip1
Rtn adr
Resulting Stack
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
%esp
Revisiting swap
swap:
pushl %ebp
movl %esp,%ebp Set
pushl %ebx
Up
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
Body
Finish
swap Setup #1 Entering
Resulting Stack
Stack
• • •
&zip2
&zip1
Rtn adr
• • •
yp
xp
Rtn adr
Old %ebp
%ebp
%ebp
%esp
swap:
pushl %ebp
movl %esp,%ebp
pushl %ebx
%esp
swap Setup #2 Entering
Resulting Stack
Stack
• • •
&zip2
&zip1
Rtn adr
• • •
yp
xp
Rtn adr
Old %ebp
%ebp
%esp
%ebp %esp
swap:
pushl %ebp
movl %esp,%ebp
pushl %ebx
swap Setup #3 Entering
Resulting Stack
Stack
• • •
&zip2
&zip1
Rtn adr
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
%ebp
%esp
%ebp %esp
swap:
pushl %ebp
movl %esp,%ebp
pushl %ebx
Effect of swap Setup Entering
Resulting Stack
Stack
• • •
&zip2
12
• • •
yp
&zip1
8
xp
Rtn adr
%esp 4
Rtn adr
Old %ebp
Old %ebx
%ebp
Offset (relative to %ebp)
0
Body
%ebp %esp
movl 12(%ebp),%ecx # get yp movl 8(%ebp),%edx # get xp …
swap Finish #1 swap’s
Stack Offset
Offset
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
12 8 4 0
12 8 4 0 -4
%ebp -4 %esp
%ebp %esp
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
117
Observation
■ Saved&restoredregister%ebx Rutgers University David Menendez
12 8 4 0
swap Finish #2 swap’s
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
• • •
yp
xp
Rtn adr
Old %ebp
swap’s Stack
Offset
Stack Offset
12 8 4 0
%ebp -4 %esp
%ebp %esp
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
swap Finish #3 swap’s
%ebp
• • •
yp
xp
Rtn adr
• • •
yp
xp
Rtn adr
Old %ebp
swap’s Stack
Offset
12 8 4
0 %ebp %esp
Stack Offset
12 8 4
%esp
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
swap Finish #4 %ebp
%ebp
Exiting Stack
%esp
• • •
yp
xp
Rtn adr
• • •
&zip2
&zip1
swap’s Stack
Offset
12 8 4
Observation
%esp
■ Saved&restoredregister%ebx
■ Didn’tdosofor%eax,%ecx,or%edx
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
Rutgers University David Menendez
120
Register Saving Conventions
When procedure yoo calls who:
■ yooisthecaller,whoisthecallee
Can Register be Used for Temporary Storage?
yoo: •••
movl $15213, %edx call who
addl %edx, %eax •••
ret
who: •••
movl 8(%ebp), %edx addl $91125, %edx •••
ret
■ Contentsofregister%edxoverwrittenbywho
Rutgers University David Menendez 121
Register Saving Conventions
When procedure yoo calls who:
■ yooisthecaller,whoisthecallee
Can Register be Used for Temporary Storage?
Conventions
■
■
“Caller Save”
● Caller saves temporary in its frame before calling
“Callee Save”
● Callee saves temporary in its frame before using
Rutgers University David Menendez 122
IA32/Linux Register Usage
Two have special uses
■ %ebp,%esp
Three managed as callee-save
■ %ebx,%esi,%edi
■
Three managed as caller-save
■ %eax,%edx,%ecx
■
Caller-Save Temporaries
%eax
%edx
%ecx
%ebx
%esi
%edi
%esp
%ebp
Old values saved on stack prior to using
Do what you please, but expect any
Callee-Save Temporaries
callee to do so, as well
Special
Register %eax also stores returned value
Rutgers University David Menendez 123
.globl rfact
.type
rfact,@function
rfact:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),
cmpl $1,%ebx
jle .L78
leal -1(%ebx),%eax
pushl %eax
call rfact
imull %ebx,%eax
jmp .L79
.align 4
.L78:
movl $1,%eax
.L79:
movl %ebp,%esp
popl %ebp
ret
pushl %ebx
movl -4(%ebp),%ebx
%ebx
Recursive Function
Rutgers University David Menendez 124
.globl rfact
.type
rfact,@function
rfact:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),
cmpl $1,%ebx
jle .L78
leal -1(%ebx),%eax
pushl %eax
call rfact
imull %ebx,%eax
jmp .L79
.align 4
.L78:
movl $1,%eax
.L79:
movl %ebp,%esp
popl %ebp
ret
pushl %ebx
movl -4(%ebp),%ebx
%ebx
Recursive Factorial
int rfact(int x)
{
int rval;
if (x <= 1)
return 1;
rval = rfact(x-1);
return rval * x;
}
Registers
■ %eaxusedwithoutfirstsaving ■ %ebxused,butsaveat
beginning & restore at end
Rutgers University David Menendez 125
pre %ebp
pre %ebx
x
Rtn adr
Caller
Rfact Stack Setup
%ebp
Entering Stack
%esp
rfact:
pushl %ebp
movl %esp,%ebp pushl %ebx
pre %ebp
pre %ebx
x
Rtn adr
Old %ebp
Old %ebx
Caller
8 4
Callee 0 %ebp -4 %esp
Rfact Body
movl 8(%ebp),%ebx
cmpl $1,%ebx
jle .L78
leal -1(%ebx),%eax
pushl %eax
# ebx = x
# Compare x : 1
# If <= goto Term
# eax = x-1
# Push x-1
# rfact(x-1)
# rval * x
# Goto done
call rfact
imull %ebx,%eax
jmp .L79
.L78:
movl $1,%eax
.L79:
# Term:
# return val = 1
# Done:
Recursion
Registers
%ebx Stored value of x %eax
● Temporary value of x-1 ●Returned value from rfact(x-1) ●Returned value from this call
int rfact(int x)
{
int rval;
if (x <= 1)
return 1;
rval = rfact(x-1) ;
return rval * x;
}
Rutgers University
David Menendez 127
Rfact Recursion
leal -1(%ebx),%eax
x
Rtn adr
Old %ebp
Old %ebx
pushl %eax
x
Rtn adr
Old %ebp
Old %ebx
x-1
%ebp %esp
call rfact
%ebp %esp
%ebp
%esp
x
Rtn adr
Old %ebp
Old %ebx
x-1
Rtn adr
x-1
x
%eax %ebx
x-1
x
%eax %ebx
x-1
x
%eax %ebx
Return from Call
Rfact Result
x
Rtn adr
Old %ebp
Old %ebx
x-1
x
Rtn adr
Old %ebp
Old %ebx
x-1
%ebp %esp
%ebp %esp
(x-1)!
x
(xx-!1)!
x
%eax %ebx
%eax %ebx
Assume that rfact(x-1)
returns (x-1)! in register %eax
imull %ebx,%eax
movl -4(%ebp),%ebx movl %ebp,%esp popl %ebp
ret
Rfact Completion
pre %ebp
pre %ebx
x
Rtn adr
Old %ebp
Old %ebx
x-1
8
4
0 %ebp
-4
-8 %esp
8 4 0
%eax %ebx
%ebp
%esp
pre %ebp
pre %ebx
x
Rtn adr
Old %ebp
pre %ebp
pre %ebx
x
Rtn adr
x!
x Old %ebx
%eax %ebx
%ebp %esp
%eax %ebx
x!
Old %ebx
x!
Old %ebx
Integral
byte b word w double word l
Floating Point
1 [unsigned] char 2 [unsigned] short 4 [unsigned] int
Basic Data Types
■ Stored & operated on in general registers
■ Signed vs. unsigned depends on instructions used Intel GAS Bytes C
■ Stored & operated on in floating point registers Intel GAS Bytes C
Single s Double l Extended t
4 float
8 double 10/12 long double
Rutgers University
David Menendez 131
Array Allocation
Basic Principle
T A[L];
■ ArrayofdatatypeTandlengthL
■ ContiguouslyallocatedregionofL*sizeof(T)bytes
char string[12];
int val[5];
double a[4];
x x
x + 4 x + 8
x + 16
x + 12
x + 12 x + 16
x + 24
x + 20
x + 32
x
x + 8
char *p[3];
Rutgers University
x x+4 x+8 David Menendez
132
zip_dig cmu;
zip_dig mit;
zip_dig ucb;
Notes
16 20 24 28 32 36
36 40 44 48 52 56
56 60 64 68 72 76
Array Example
typedef int zip_dig[5];
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };
1
5
2
1
3
0
2
1
3
9
9
4
7
2
0
■ Declaration “zip_dig cmu” equivalent to “int cmu[5]”
■
Example arrays were allocated in successive 20 byte blocks
● Not guaranteed to happen in general
Rutgers University David Menendez 133
Array Accessing Example
Computatio
■ Register %edx contains starting address of array
■ Register %eax contains array index
■ Desired digit at 4*%eax + %edx
■ Use memory reference (%edx,
%eax,4)
Memory Reference Code
n
int get_digit
(zip_dig z, int dig)
{
return z[dig];
}
# %edx = z
# %eax = dig
movl (%edx,%eax,4),%eax # z[dig]
Rutgers University David Menendez 134
Referencing Examples
zip_dig cmu;
zip_dig mit;
zip_dig ucb;
16 20 24 28 32 36
36 40 44 48 52 56
56 60 64 68 72 76
1
5
2
1
3
0
2
1
3
9
9
4
7
2
0
Code Does Not Do Any Bounds Checking!
Reference
mit[3]
mit[5]
mit[-1]
cmu[15]
Address Value Guaranteed?
36+4*3=48 3 36+4*5=56 9 36 + 4*-1 = 32 3 16 + 4*15 = 76 ??
Yes No
No No
■ Out of range behavior implementation-dependent
● No guaranteed relative allocation of different arrays
Rutgers University David Menendez 135
Array Loop Example
int zd2int(zip_dig z)
{
int i;
int zi = 0;
for (i = 0; i < 5; i++) {
zi = 10 * zi + z[i];
}
return zi; }
Original Source
Transformed Versio
■ As generated by GCC
■ Eliminateloopvariablei
■ Convert array code to pointer code
■ Express in do-while form ● No need to test at entrance
n
int zd2int(zip_dig z)
{
int zi = 0;
int *zend = z + 4;
do {
zi = 10 * zi + *z;
z++;
} while(z <= zend);
return zi;
}
Rutgers University
David Menendez 136
Array Loop Implementation
Registers
%ecx z %eax zi %ebx zend
Computations
■ 10*zi+*zimplementedas
*z + 2*(zi+4*zi) z++ increments by 4
int zd2int(zip_dig z)
{
int zi = 0;
int *zend = z + 4; do {
zi = 10 * zi + *z;
z++;
} while(z <= zend); return zi;
}
■
# %ecx = z
xorl %eax,%eax # zi = 0
leal 16(%ecx),%ebx # zend = z+4
.L59:
leal (%eax,%eax,4),%edx # 5*zi
movl (%ecx),%eax # *z
addl $4,%ecx # z++
leal (%eax,%edx,2),%eax # zi = *z + 2*(5*zi) cmpl %ebx,%ecx # z : zend
jle .L59 # if <= goto loop
Rutgers University
David Menendez 137
Multi-Level Array Example
■ Variableunivdenotes array of 3 elements
■ Each element is a pointer
● 4 bytes
■ Each pointer points to array of int’s
univ 160
164 168
cmu
mit16 20 24 28 32 36
ucb36 40 44 48 52 56
56 60 64 68 72 76
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };
#define UCOUNT 3
int *univ[UCOUNT] = {mit, cmu, ucb};
1
5
2
1
3
36
16
56
0
2
1
3
9
9
4
7
2
0
Rutgers University
David Menendez 138
Element Access in Multi-Level Array
int get_univ_digit
(int index, int dig)
{
return univ[index][dig];
}
Computation
■
■
Element access
Mem[Mem[univ+4*index]+4*dig]
Must do two memory reads
● First get pointer to row array
● Then access element within array
# %ecx = index
# %eax = dig
leal 0(,%ecx,4),%edx # 4*index
movl univ(%edx),%edx # Mem[univ+4*index] movl (%edx,%eax,4),%eax # Mem[...+4*dig]
Rutgers University David Menendez 139
Concept
■ ■ ■
Structures
Contiguously-allocated region of memory Refer to members within structure by names Members may be of different types
struct rec {
int i;
int a[3];
int *p; };
Memory Layout
20
i
a
p
Accessing Structure Member
04 16
Assembly
void
set_i(struct rec *r,
int val)
{
r->i = val;
}
# %eax = val
# %edx = r
movl %eax,(%edx) # Mem[r] = val
Rutgers University David Menendez
140
Generating Pointer to Struct. Member
r
04 16
r + 4 + 4*idx
struct rec {
int i;
int a[3];
int *p; };
i
a
p
int * find_a
(struct rec *r, int idx)
{
return &r->a[idx];
}
Generating Pointer to Array Element
■
Offset of each structure member determined at compile time
# %ecx = idx
# %edx = r
leal 0(,%ecx,4),%eax # 4*idx
leal 4(%eax,%edx),%eax # r+4*idx+4
Rutgers University
David Menendez
141
C Code
Structure Referencing (Cont.)
i
a
p
04 16
04 16 Element i
struct rec {
int i;
int a[3];
int *p; };
i
a
void
set_p(struct rec *r)
{
r->p =
&r->a[r->i];
}
# %edx = r
movl (%edx),%ecx # r->i
leal 0(,%ecx,4),%eax # 4*(r->i)
leal 4(%edx,%eax),%eax # r+4+4*(r->i)
movl %eax,16(%edx) # Update r->p
Rutgers University David Menendez
142
Alignment
Aligned Data
■
■
■
Motivation for Aligning Data
■
Compiler
■
Primitive data type requires K bytes
Address must be multiple of K
Required on some machines; advised on IA32
● treated differently by Linux and Windows!
Memory accessed by (aligned) double or quad-words
● Inefficient to load or store datum that spans quad word boundaries
Inserts gaps in structure to ensure correct alignment of fields
Rutgers University David Menendez 143
Specific Cases of Alignment
Size of Primitive Data Type:
■ 1 byte (e.g., char)
● no restrictions on address
■ 2 bytes (e.g., short)
● lowest 1 bit of address must be 02
■ 4 bytes (e.g., int, float, char *, etc.) ● lowest 2 bits of address must be 002
■ 8 bytes (e.g., double)
● Windows (and most other OS’s & instruction sets):
» lowest 3 bits of address must be 0002
● Linux:
» lowest 2 bits of address must be 002
» i.e., treated the same as a 4-byte primitive data type
■ 12 bytes (long double) ● Linux:
» lowest 2 bits of address must be 002
» i.e., treated the same as a 4-byte primitive data type
Rutgers University David Menendez 144
Satisfying Alignment with Structures
Offsets Within Structure
■
Overall Structure Placement
■
■
Example (under Windows):
■ K = 8, due to double element
p+0 p+4 p+8 p+16
Multiple of 4 Multiple of 8
Multiple of 8
Must satisfy element’s alignment requirement
struct S1 {
char c;
int i[2];
double v;
} *p;
Each structure has alignment requirement K
● Largest alignment of any element
Initial address & structure length must be multiples of K
c
i[0]
i[1]
v
p+24
Multiple of 8
Rutgers University David Menendez
145
Linux vs. Windows
struct S1 {
char c;
int i[2];
double v;
} *p;
Windows (including Cygwin):
■ K=8,duetodoubleelement
p+0 p+4 p+8
Multiple of 4 Multiple of 8
Linux:
p+16
Multiple of 8
p+24
c
i[0]
i[1]
v
■ K=4;doubletreatedlikea4-bytedatatype
Multiple of 8
c
i[0]
i[1]
v
p+0 p+4
Multiple of 4
Multiple of 4
p+8 p+12
Multiple of 4
p+20
Rutgers University
David Menendez
146
Multiple of 4
Overall Alignment Requirement
struct S2 {
double x;
int i[2];
char c;
} *p;
p must be multiple of: 8 for Windows
4 for Linux
p+0 p+8 p+12 p+16
Windows: p+24 Linux: p+20
x
i[0]
i[1]
c
struct S3 {
float x[2];
int i[2];
char c;
} *p;
x[0]
x[1]
i[0]
i[1]
c
p+0 p+4
p must be multiple of 4 (in either OS)
p+8 p+12 p+16 p+20
Ordering Elements Within Structure
struct S4 {
char c1;
double v;
char c2;
int i;
} *p;
10 bytes wasted space in Windows
c1
v
c2
i
p+0 p+8 p+16 p+20 p+24
2 bytes wasted space
p+0 p+8 p+12 p+16
struct S5 {
double v;
char c1;
char c2;
int i;
} *p;
v
c1
c2
i
Arrays of Structures
Principle
■
■
struct S6 {
short i;
float v;
short j;
} a[10];
Allocated by repeating allocation for array type
In general, may nest arrays & structures to arbitrary depth
a[1].i
a[1].v
a[1].j
a+12
a+16 a+20
a+24
a[0]
a[1]
a[2]
Rutgers University
David Menendez
149
a+0
a+12 a+24
a+36
•••
Satisfying Alignment within Structure
Achieving Alignment
■
■
■
Starting address of structure array must be multiple of worst-case alignment for any element
● a must be multiple of 4
Offset of element within structure must be multiple of
element’s alignment requirement
● v’s offset of 4 is a multiple of 4
Overall size of structure must be multiple of worst-case
alignment for any element
● Structure padded with unused space to be 12 bytes
struct S6 {
short i;
float v;
short j;
} a[10];
a[0]
•••
a[i]
•••
a+0
a+12i
a+12i
a[1].i
a[1].v
a[1].j
Multiple of 4
a+12i+4
Rutgers University
David Menendez 150
Multiple of 4
Summary
Arrays in C
■ ■ ■
Compiler Optimizations
■ Compiler often turns array code into pointer code (zd2int)
■ ■
Structures
■ ■
Contiguous allocation of memory Pointer to first element
No bounds checking
Uses addressing modes to scale array indices Lots of tricks to improve array indexing in loops
Allocate bytes in order declared
Pad in middle and at end to satisfy alignment
Rutgers University David Menendez 151