CS计算机代考程序代写 assembly algorithm interpreter cache compiler Java chain assembler x86 data structure mips js Programming Meets Hardware

Programming Meets Hardware
High-Level Language Program
Compiler
Assembly Language Program
Assembler
Machine Language Program
#include int main() {
int x, y, temp;
x=1; y=2;
temp =x; x=y; y=temp; printf(“%d %d %d\n”,x,y,temp);
}
How do you get performance?
7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 02 00 03 00 01
00 00 00 f0 82 04 08 34 00 00 00 c4 0c 00 00 00 00 00 00 34 00
movl $1, -8(%ebp)
movl $2, -12(%ebp)
movl -8(%ebp), %eax
movl %eax, -16(%ebp)
movl -12(%ebp), %eax
movl %eax, -8(%ebp)
movl -16(%ebp), %eax
movl %eax, -12(%ebp)
movl -16(%ebp), %eax ISA movl %eax, 12(%esp)
movl -12(%ebp), %eax movl %eax, 8(%esp) movl -8(%ebp), %eax movl %eax, 4(%esp)

Performance with Programs
(1) Program: Data structures + algorithms (2) Compiler translates code
(3) Instruction set architecture
(4) Hardware Implementation
Rutgers University David Menendez 3

Instruction Set Architecture
(1) Set of instructions that the CPU can execute
(1) What instructions are available?
(2) How the instructions are encoded? Eventually everything is binary.
(2) State of the system (Registers + memory state + program counter)
(1) What instruction is going to execute next
(2) How many registers? Width of each register?
(3) How do we specify memory addresses?
● Addressing modes
(3) Effect of instruction on the state of the system
Rutgers University David Menendez 4

IA32 (X86 ISA)
There are many different assembly languages because they are processor-specific

IA32 (x86)
● x86-64 for new 64-bit processors
● IA-64 radically different for Itanium processors
● Backward compatibility: instructions added with time
■ ■
We will focus on IA32/x86-64 because you can generate and run on iLab machines (as well as your own PC/laptop)

PowerPC MIPS
IA32 is also dominant in the market although smart phone, eBook readers, etc. are changing this
Rutgers University David Menendez 5

X86 Evolution
8086 – 1978 – 29K transistors – 5-10MHz
I386 – 1985 – 275K transistors – 16-33 MHz Pentium4 – 2005 – 230M transistors – 2800-3800 MHz Haswell – 2013 – > 2B transistors – 3200-3900 MHz
Added features
• Large caches
• Multiple cores
• Support for data parallelism (SIMD) eg AVX extensions
Rutgers University David Menendez 6

CISC vs RISC
CISC: complex instructions : eg X86
• Instructions such as strcpy/AES and others • Reduces code size
• Hardware implementation complex?
RISC: simple instructions: eg Alpha • Instructions are simple add/ld/st
• Increases code size
• Hardware implementation simple?
Rutgers University David Menendez 7

Aside About Implementation of x86
About 30 years ago, the instruction set actually reflected the processor hardware

As hardware advanced, industry faced with choice
■ ■
E.g., the set of registers in the instruction set is actually what was present in the processor
Change the instruction set: bad for backward compatibility
Keep the instruction set: harder to exploit hardware advances
● Example: many more registers but only small set introduced circa 1980
Starting with the P6 (PentiumPro), IA32 actually got implemented by Intel using an “interpreter” that translates IA32 instructions into a simpler “micro” instruction set
Rutgers University David Menendez 8

P6 Decoder/Interpreter

Assembly Programming
Brief tour through assembly language programming Why?
■ ■
Why not binary language?
■ ■
Machine interface: where software meets hardware
To understand how the hardware works, we have to understand the interface that it exports
Much easier for humans to read and reason about
Major differences:
● Human readable language instead of binary sequences ● Relative instead of absolute addresses
Rutgers University David Menendez 10

Assembly Programmer’s View
CPU
Memory
ALU
Control Logic
Condition Codes
PC
Registers
(OS code & data) Object Code Program Data
Addresses
Data
Instructions

CPU
Memory
Memory
Address
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
00000000
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Command: R/W
Data
Addresses

Memory Access: Read
CPU
Memory
03
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
00000000
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
R
01010101
Addresses

Memory Access: Write
CPU
Memory
04
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
00000000
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
W
01010101
Addresses

Memory Access: Write
CPU
Memory
04
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
W
01010101
Addresses

C
S
AB
Processor: ALU & Registers
C = FS(A, B)
F includes
Arithmetic: +, -, *, /, ~, etc. Logical: <, >, =, etc.
ALU
Registers
Name Storage
R0 R1 R2 R3
00101100
10001000
11111111
01010101
Multiple Ports
Name Command: R/W Data

Putting It All Together
CPU
Memory
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Condition Codes
Program Counter (PC)
Control Logic
ALU
Registers
Addresses

Putting It All Together
CPU
Memory
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Condition Codes
Program Counter (PC)
1
ALU
Control Logic
Registers
Addresses

Putting It All Together
CPU
Memory
Condition Codes
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Program Counter (PC)
1
ALU
Registers
Control Logic
10001000
1
Addresses

Putting It All Together
CPU
Memory
Condition Codes
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Program Counter (PC)
+
ALU
Control Logic
1
Registers
R0: x R1: y
R0 R1
10001000
1
Addresses

Putting It All Together
CPU
Memory
Condition Codes
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
Program Counter (PC)
+
ALU
x
1
y
Registers
R0: x R1: y
R0 R1
Control Logic
10001000
1
Addresses

Putting It All Together
CPU
Memory
Condition Codes
Program Counter (PC)
z
+
ALU
x
Storage
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
00101100
10001000
11111111
01010101
01010101
11000001
00000000
11111001
11111000
00110000
00000000
00000000
00000000
11000011
00011001
00000000
2
y
Registers
R0: x R1: y
R2: z
R0 R1
Control Logic
R2
10001000
1
Addresses

Sample C Code
int accum;
int sum(int x, int y)
{
gcc -O1 -m32 –S code.c
Generated Assembly
sum:
push %ebp
movl %esp, %ebp
movl 12(%ebp), %eax
addl 8(%ebp), %eax
addl %eax, accum
popl %ebp
ret
}
int t = x + y;
accum += t;
return t;
Rutgers University
David Menendez 23
C & Assembly Code

Sample C Code
int accum;
int sum(int x, int y){
int t = x + y;
accum += t;
return t;
objdump –d code.o
C & Machine Code
} 3:
6: 03 45 08
gcc -O1 -m32 –c code.c 9:
gdb code.o
(gdb) x/100xb sum
f: 5d 10: c3
pop %ebp ret
: 0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45
0x8 : 0x08 0x01 0x05 0x00 0x00 0x00 0x00 0x5d
0x10 : 0xc3 Cannot access memory at address 0x11
0000000 :
0: 55
1: 89 e5
push %ebp
mov %esp,%ebp
mov 0xc(%ebp),%eax add 0x8(%ebp),%eax
8b 45 0c
01 05 00 00 00 00 add %eax, accum
Rutgers University David Menendez 24

Assembly Characteristics
Sequence of simple instructions
Minimal Data Types

■ ■
No type checking
■ ■
“Integer” data of 1, 2, or 4 bytes
● Data values
● Addresses (untyped pointers)
Floating point data of 4, 8, or 10 bytes
No aggregate types such as arrays or structures
● Just contiguously allocated bytes in memory
Interpretation of data format depends on instruction No protection against misinterpretation of data
Rutgers University David Menendez 25

Assembly Characteristics
3 types of Primitive Operations
■ ■

Perform arithmetic function on register or memory data
Transfer data between memory and register
● Load data from memory into register ● Store register data into memory
Transfer control
● Unconditional jumps to/from procedures ● Conditional branches
Rutgers University David Menendez 26

x86 Characteristics
Variable length instructions: 1-15 bytes
Can address memory directly in most instructions
Uses Little-Endian format (Least significant byte in the lowest address)
Rutgers University David Menendez 27

General format:
opcode operands
Opcode:

● movb,addl, etc. Operands:
■ ■
Example:

Instruction Format
Short mnemonic for instruction’s purpose
Immediate, register, or memory
Number of operands command-dependent
movl %ebx, (%ecx)
Rutgers University David Menendez 28

Machine Representation
Remember, each assembly instruction translated to a sequence of 1-15 bytes
Opcode
addressing mode
other bytes
First, the binary representation of the opcode Second, instruction specifies the addressing mode
■ ■
Some instructions can be single-byte because operands and addressing mode are implicitly specified by the instruction

The type of operands (registers or register and memory) How to interpret the operands
E.g., pushl
Rutgers University David Menendez 29 29

x86 Registers
General purpose registers are 32 bit

Originally categorized into two groups with different functionality


Now, the registers are mostly interchangeable
Segment registers

Although operations can access 8-bits or 16-bits portions
Data registers (EAX, EBX, ECX, EDX)
● Holds operands
Pointer and Index registers (EBP, ESP, EIP,ESI,EDI)
● Holds references to addresses as well as indexes
Holds starting address of program segments
● CS, DS, SS, ES
Rutgers University David Menendez 30 30

x86 Registers
16 BITS
8 BITS
EAX AX
AH
AL
ECX CX
CH
CL
EDX DX
DH
DL
EBX BX
BH
BL
ESP –Stack Pointer
EBP — Base register of current stack frame
ESI — Source index for string operations
EDI — Destination index for string operations
32 BITS Rutgers University David Menendez
31

x86 Programming
• Mov instructions to move data from/to memory

• Addressing modes
• Understanding swap
• Arithmetic operations
• Condition codes
• Conditional and unconditional branches • Loops and switch statements
Operands and registers
Rutgers University David Menendez 32

Byte: 8 bits

Word: 16 bits (2 bytes)

E.g., char
E.g., short int
Data Format
Double Word: 32 bits ( 4 bytes)

Quad Word: 64 bits (8 bytes)

Instructions can operate on any data size
■ movl, movw, movb
● Move double word, word, byte, respectively

E.g., int, float
E.g., double
End character specifies what data size to be used
Rutgers University David Menendez 33

MOV instruction
Most common instruction is data transfer instruction
■ Mov SRC, DEST: Move source into destination
■ SRC and DEST are operands
■ DEST is a register or a location
■ SRC can be the contents of register, memory location, constant, or a label.
■ If you use gcc, you will see movl ,
■ Alltheinstructionsinx86are32-bit
Used to copy data:
■ Constant to register (immediate)
■ Memory to register
■ Register to memory
■ Register to register
Cannot copy memory to memory in a single instruction
Rutgers University
David Menendez
34 34

Immediate Addressing
Operand is immediate
■ ■ ■ ■
Operand value is found immediately following the instruction Encoded in 1, 2, or 4 bytes
$ in front of immediate operand E.g., movl $0x4040, %eax
ADDR
000f
00A1 00A2
memory
movl %eax
4040
Rutgers University David Menendez
35

Register Mode Addressing
Use % to denote register

Source operand: use value in specified register
Destination operand: use register as destination for value
Examples:



E.g., %eax
movl %eax, %ebx
● Copy content of %eax to %ebx
movl $0x4040, %eax ! immediate addressing ● Copy 0x4040 to %eax
movl %eax, 0x0000f !Absolute addressing
● Copy content of %eax to memory location 0x0000f
Rutgers University David Menendez 36 36

Indirect Mode Addressing
Content of operand is an address

Offset can be specified as immediate mode
Examples:


Designated as parenthesis around operand
movl (%ebp), %eax
● Copy value from memory location whose address is in ebp into eax
movl -4(%ebp), %eax
● Copy value from memory location whose address is -4 away from content of ebp into eax
Rutgers University David Menendez 37

Indexed Mode Addressing
Add content of two registers to get address of operand


Useful for dealing with arrays
■ ■

movl (%ebp, %esi), %eax
● Copy value at (address = ebp + esi) into eax
movl 8(%ebp, %esi),%eax
● Copy value at (address = 8 + ebp + esi) into eax
If you need to walk through the elements of an array Use one register to hold base address, one to hold index
● E.g., implement C array access in a for loop
Index cannot be ESP
Rutgers University David Menendez 38 38

Scaled Indexed Mode Addressing
Multiply the second operand by the scale (1, 2, 4 or 8)

movl 0x80 (%ebx, %esi, 4), %eax
● Copy value at (address = ebx + esi*4 + 0x80) into eax
Where is it useful?
Rutgers University David Menendez 39 39

Address Computation Examples
%edx
0xf000
%ecx
0x100
Expression
Computation
Address
0x8(%edx)
0xf000 + 0x8
0xf008
(%edx,%ecx)
0xf000 + 0x100
0xf100
(%edx,%ecx,4)
0xf000 + 4*0x100
0xf400
0x80(,%edx,2)
2*0xf000 + 0x80
0x1e080
40

movl Operand Combinations
Source
Imm
movl
Destination C Analog
Reg Mem
movl $0x4,%eax temp = 0x4; movl$-147,(%eax) *p=-147;

Cannot do memory-memory transfers with single instruction
Reg
Mem Reg
Reg Mem
movl %eax,%edx
movl %eax,(%edx)
movl (%eax),%edx
temp2 = temp1;
*p = temp;
temp = *p;
Rutgers University David Menendez 41

Stack Operations
By convention, %esp is used to maintain a stack in memory

%esp contains the address of top of stack
Instructions to push (pop) content onto (off of) the stack


Where does the stack start? We’ll discuss later
Used to support C function calls
pushl %eax
● esp=esp–4
● Memory[esp] = eax
popl %ebx
● ebx = Memory[esp] ● esp=esp+4
Rutgers University David Menendez 42

Using Simple Addressing Modes
swap:
pushl %ebp
movl %esp,%ebp Set
pushl %ebx
Up
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
Body
Finish

Understanding Swap
Offset
12 8 4 0 -4
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
Stack
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
%ebp
# ecx = yp
# edx = xp #eax=*yp(t1) #ebx=*xp(t0) #*xp=eax #*yp=ebx
Register Variable
%ecx yp %edx xp %eax t1 %ebx t0

Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
%edx
%ecx
%ebx
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx

Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
%edx
%ecx
0x120
%ebx
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx #
ecx = yp
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# # # # #
edx=xp
eax = *yp (t1) ebx = *xp (t0) *xp = eax
*yp = ebx

Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
%edx
0x124
%ecx
0x120
%ebx
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx #
ecx = yp
edx=xp
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
#
# # # #
eax = *yp (t1)
ebx = *xp (t0)
*xp = eax
*yp = ebx

Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
456
%edx
0x124
%ecx
0x120
%ebx
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx

Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
123
456
0x120
0x124
Rtn adr
%eax
456
%edx
0x124
%ecx
0x120
%ebx
123
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx

Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
456
456
0x120
0x124
Rtn adr
%eax
456
%edx
0x124
%ecx
0x120
%ebx
123
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx

Understanding Swap
Offset
Address
0x124
0x120
0x11c
0x118
0x114
0x110
0x10c
0x108
0x104
0x100
456
123
0x120
0x124
Rtn adr
%eax
456
%edx
0x124
%ecx
0x120
%ebx
123
%esi
%edi
%esp
%ebp
0x104
yp 12 xp 8 4 0 -4
%ebp
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
# ecx = yp
# edx=xp
# eax = *yp (t1) # ebx = *xp (t0) # *xp = eax
# *yp = ebx

Swap in x86-64: 64-bit Registers
rax
eax
rcx
ecx
rdx
edx
rbx
ebx
rsp
esp
rbp
ebp
rsi
esi
rdi
edi
r8
r9
r10
r11
r12
r13
r14
r15

Swap in x86-64 bit
swap:
movl (%rdi), %edx
movl (%rsi), %eax
movl %eax, (%rdi)
movl %edx, (%rsi)
retq
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
Arguments passed in registers
■ ■
No stack operations
What happens with long int?
First, xp in rdi and yp in rsi
64-bit pointers, data values are 32-bit ints, so uses eax/edx

IA32 Stack


■ Register%espindicates lowest stack address
● address of top element
Stack Pointer
%esp
Stack “Bottom”
Increasing Addresses
Region of memory managed with stack discipline
Grows toward lower addresses
Stack Grows Down
Rutgers University
David Menendez
54
Stack “Top”

IA32 Stack Pushing
Pushing
■ pushl Src

■ Decrement%espby4

Stack “Bottom”
Increasing Addresses
Fetch operand at Src Write operand at address
given by %esp
Stack Pointer
-4
Stack Grows Down
%esp
Rutgers University
David Menendez
55
Stack “Top”

IA32 Stack Popping
Popping
■ popl Dest

■ Increment%espby4

Stack “Bottom”
Increasing Addresses
Read operand at address given by %esp
Write to Dest
Stack Pointer
%esp
+4
Stack Grows Down
Rutgers University
David Menendez
56
Stack “Top”

Stack Operation Examples
pushl %eax
popl %edx
123
123
213
123
213
0x110
0x10c
0x108
0x110
0x10c
0x108
0x104
%eax
%edx
%esp
0x110
0x10c
0x108
0x104
%eax
%edx
%esp
213
555
0x108
213
555
0x1084
213
525153
0x1048
%eax
%edx
%esp

Procedure Control Flow
■ Use stack to support procedure call and return Procedure call:
call label Push return address on stack; Jump to label
Return address value
■ Address of instruction beyond call ■ Example from disassembly
804854e: e8 3d 06 00 00 8048553: 50
●Return address = 0x8048553 Procedure return:
call 8048b90


pushl %eax
■ ret Pop address from stack; Jump to address
Rutgers University David Menendez 58

0x110
0x10c
0x108
%esp %eip
0x108
0x804854e
Procedure Call Example
804854e: e8 3d 06 00 00 call 8048b90


8048553: 50
pushl %eax
call 8048b90
0x110
0x10c
0x108
0x104
%esp 0x1084 %eip 0x80485b49e0
123
123
0x8048553
%eip is program counter

8048591: c3
Procedure Return Example
ret
0x110
0x10c
0x108
0x104
%esp %eip
0x110
0x10c
0x108
ret
123
0x8048553
123
0x8048553
0x104
0x8048591
%esp 0x1048 %eip 0x804855931
%eip is program counter

Address Computation Instruction
leal: compute address using addressing mode without accessing memory
leal src, dest
■ ■
Use

Example:

src is address mode expression Set dest to address specified by src
Computing address without doing memory reference
● E.g., translation of p = &x[i];
leal 7(%edx, %edx, 4), %eax
● eax=4*edx+edx+7=5*edx+7
Rutgers University David Menendez 61

Some Arithmetic Operations
Instruction addl Src,Dest subl Src,Dest imull Src,Dest sall Src,Dest sarl Src,Dest xorl Src,Dest andl Src,Dest orl Src,Dest
Computation
Dest = Dest + Src
Dest = Dest – Src
Dest = Dest * Src
Dest = Dest << Src (left shift) Dest = Dest >> Src (right shift) Dest = Dest ^ Src
Dest = Dest & Src Dest = Dest | Src
Rutgers University
David Menendez 62

Some Arithmetic Operations
Instruction
incl Dest decl Dest negl Dest notl Dest
Computation
Dest = Dest + 1 Dest = Dest – 1 Dest = – Dest Dest = ~ Dest
Rutgers University
David Menendez 63

Using leal for Arithmetic Expressions
arith:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%eax
movl 12(%ebp),%edx
leal (%edx,%eax),%ecx
leal (%edx,%edx,2),%edx
sall $4,%edx
addl 16(%ebp),%ecx
leal 4(%edx,%eax),%eax
imull %ecx,%eax
movl %ebp,%esp
popl %ebp
ret
Set Up
int arith
(int x, int y, int z)
{
int t1 = x+y;
int t2 = z+t1;
int t3 = x+4;
int t4 = y * 48;
int t5 = t3 + t4;
int rval = t2 * t5;
return rval;
}
Body
Finish

Understanding arith Offset
16 12 8 4 0
Stack
• • •
z
y
x
Rtn adr
Old %ebp
int arith
(int x, int y, int z)
{
int t1 = x+y;
int t2 = z+t1;
int t3 = x+4;
int t4 = y * 48;
int t5 = t3 + t4;
int rval = t2 * t5;
return rval;
}
movl 8(%ebp),%eax
movl 12(%ebp),%edx
leal (%edx,%eax),%ecx
leal (%edx,%edx,2),%edx # edx = 3*y
sall $4,%edx
addl 16(%ebp),%ecx
leal 4(%edx,%eax),%eax
imull %ecx,%eax
# edx = 48*y (t4)
# ecx = z+t1 (t2)
# eax = 4+t4+x (t5)
# eax = t5*t2 (rval)
# eax = x
# edx = y
# ecx = x+y (t1)
%ebp

Understanding arith
# eax = x
movl 8(%ebp),%eax
# edx = y
movl 12(%ebp),%edx
# ecx = x+y (t1)
leal (%edx,%eax),%ecx
# edx = 3*y
leal (%edx,%edx,2),%edx
# edx = 48*y (t4)
sall $4,%edx
# ecx = z+t1 (t2)
addl 16(%ebp),%ecx
# eax = 4+t4+x (t5)
leal 4(%edx,%eax),%eax
# eax = t5*t2 (rval)
imull %ecx,%eax
int arith
(int x, int y, int z)
{
int t1 = x+y;
int t2 = z+t1;
int t3 = x+4;
int t4 = y * 48;
int t5 = t3 + t4;
int rval = t2 * t5;
return rval;
}

Another Example
logical:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%eax
xorl 12(%ebp),%eax
sarl $17,%eax
andl $8185,%eax
movl %ebp,%esp
popl %ebp
ret
eax = x
eax = x^y (t1) eax = t1>>17 (t2) eax = t2 & 8185
Set Up
int logical(int x, int y)
{
int t1 = x^y;
int t2 = t1 >> 17;
int mask = (1<<13) - 7; int rval = t2 & mask; return rval; } Body Finish 213 =8192,213 –7=8185 movl 8(%ebp),%eax xorl 12(%ebp),%eax sarl $17,%eax andl $8185,%eax Mystery Function What does the following piece of code do? A. Add two variables B. Subtract two variables C. Swap two variables D. No idea movl 12(%ebp),%ecx movl 8(%ebp),%edx movl (%ecx),%eax movl (%edx),%ebx movl %eax,(%edx) movl %ebx,(%ecx) Rutgers University David Menendez 68 What does this function do? .globl foo .type foo, @function foo: pushl %ebp movl movl imull addl popl %ebp ret %esp, %ebp 16(%ebp), %eax 12(%ebp), %eax 8(%ebp), %eax Control Flow/Conditionals How do we represent conditionals in assembly? A conditional branch can implement all control flow constructs in higher level language ■ Examples: if/then, while, for A unconditional branch for constructs like break/ continue Condition Codes Single Bit Registers CF Carry Flag SF Sign Flag ZF Zero Flag OF Overflow Flag Can be set either implicitly or explicitly. ■ Implicitly by almost all logic and arithmetic operations ■ Explicitly by specific comparison operations Not Set by leal instruction ■ Intended for use in address computation only Rutgers University David Menendez 71 jX Instructions ■ Jumping Jump to different part of code depending on condition codes jX Condition Description jmp 1 Unconditional je ZF Equal / Zero jne ~ZF Not Equal / Not Zero js SF Negative jns ~SF Nonnegative jg ~(SF^OF)&~ZF Greater (Signed) jge ~(SF^OF) Greater or Equal (Signed) jl (SF^OF) Less (Signed) jle (SF^OF)|ZF Less or Equal (Signed) ja ~CF&~ZF Above (unsigned) jb CF Below (unsigned) Rutgers University David Menendez 72 Condition Codes Implicitly Set By Arithmetic Operations addl Src,Dest Canalog: t=a+b ■ CF set if carry out from most significant bit ●Used to detect unsigned overflow ■ ZFsetift==0 ■ SFsetift<0 ■ OF set if two’s complement overflow (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
Rutgers University David Menendez 73

Setting Condition Codes (cont.)
Explicit Setting by Compare Instruction
cmpl Src2,Src1
cmpl b,a like computing a-b without setting destination
NOTE: The operands are reversed. Source of confusion






CF set if carry out from most significant bit
● Used for unsigned comparisons
ZFsetifa==b
SFsetif(a-b)<0 OF set if two’s complement overflow (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-
b)>0)
Rutgers University David Menendez 74

Setting Condition Codes (cont.)
Explicit Setting by Test instruction
testl Src2,Src1
■ SetsconditioncodesbasedonvalueofSrc1&Src2
● Useful to have one of the operands be a mask
■ testl b,a like computing a&b without setting destination ■ ZF set whena&b == 0
■ SFsetwhena&b<0 Rutgers University David Menendez 75 Conditional Branch Example _max: pushl %ebp movl %esp,%ebp Set Up Body movl 8(%ebp),%edx movl 12(%ebp),%eax cmpl %eax,%edx jle L9 movl %edx,%eax L9: movl %ebp,%esp popl %ebp ret Finish Conditional Branch Example _max: pushl %ebp movl %esp,%ebp Set Up Body int max(int x, int y) { if (x <= y) return y; else return x; } movl 8(%ebp),%edx movl 12(%ebp),%eax cmpl %eax,%edx jle L9 movl %edx,%eax L9: movl %ebp,%esp popl %ebp ret Finish Conditional Branch Example (Cont.) int max(int x, int y) { if (x <= y) return y; else return x; } int goto_max(int x, int y) { int rval = y; int ok = (x <= y); if (ok) goto done; rval = x; done: return rval; } movl 8(%ebp),%edx movl 12(%ebp),%eax # eax = x y goto L9 x cmpl %eax,%edx jle L9 movl %edx,%eax L9: # x : y # if <= # eax = # Done: ■ Generally considered bad coding style # edx = ■ C allows “goto” as means of transferring control ● Closer to machine- level programming style Rutgers University David Menendez 78 Skipped when x ≤ y .LC0: .string "%d” .text .globl foo .type foo, @function foo: pushl %ebp movl subl leal movl movl call scanf cmpl $4, -12(%ebp) je .L3 call explode_bomb .L3: leave .p2align 4,,3 ret Rutgers University David Menendez 79 %esp, %ebp $40, %esp -12(%ebp), %eax %eax, 4(%esp) $.LC0, (%esp) Mystery Function C Code “Do-While” Loop Example int fact_do(int x) { int result = 1; do { result *= x; x = x-1; } while (x > 1);
return result;
}
Rutgers University David Menendez 80

“Do-While” Loop Example
C Code Goto Version
int fact_do(int x)
{
int result = 1;
do {
result *= x;
x = x-1;
} while (x > 1);
return result;
}
int fact_goto(int x)
{
int result = 1;
loop:
result *= x;
x = x-1;
if (x > 1)
goto loop; return result;
}
■ ■
Use backward branch to continue looping Only take branch when “while” condition holds
Rutgers University David Menendez 81

“Do-While” Loop Compilation
Goto Version
Assembly
int fact_goto(int x)
{
int result = 1;
loop:
result *= x;
x = x-1;
if (x > 1)
goto loop; return result;
}
Registers
%edx x
%eax result
_fact_goto:
pushl %ebp
movl %esp,%ebp
movl $1,%eax
movl 8(%ebp),%edx
L11:
imull %edx,%eax
decl %edx
cmpl $1,%edx
jg L11
movl %ebp,%esp
popl %ebp
ret
# Setup
# Setup
# eax = 1
# edx = x
# result *= x
# x–
# Compare x : 1
# if > goto loop
# Finish
# Finish
# Finish
Rutgers University
David Menendez
82

C Code
Goto Version
General “Do-While” Translation
do
Body
while (Test);
loop: Body
if (Test) goto loop
■ Body can be any C statement ●Typically compound statement:
{
Statement1; Statement2;
… Statementn;
}
■ Test is expression returning integer
= 0 interpreted as false ≠0 interpreted as true
Rutgers University David Menendez 83

C Code
“While” Loop Example #1
int fact_while(int x)
{
int result = 1;
while (x > 1) {
result *= x;
x = x-1; };
return result;
}
Rutgers University David Menendez 84

Actual “While” Loop Translation
C Code
Goto Version
int fact_while(int x)
{
int result = 1;
while (x > 1) {
result *= x;
x = x-1; };
return result;
}
int fact_while_goto2
(int x) {
int result = 1;
if (!(x > 1))
goto done; loop:
result *= x;
x = x-1;
if (x > 1)
goto loop; done:
return result;
}


Uses same inner loop as do-while version
Guards loop entry with extra test
Rutgers University
David Menendez
85

General “While” Translation
C Code
while (Test) Body
Do-While Version
Goto Version
if (!Test) goto done;
loop: Body
if (Test) goto loop;
done:
if (!Test)
goto done; do
Body
while(Test); done:

Switch Statements
Implementation Options
■ Series of conditionals ● Good if few cases ● Slow if many
■ Jump Table
● Lookup branch target
● Avoids conditionals
● Possible when cases are small integer constants
■ GCC
● Picks one based on case
structure
■ Bug in example code ● No default given
typedef enum
{ADD, MULT, MINUS, DIV, MOD, BAD}
op_type;
char unparse_symbol(op_type op)
{
switch (op) {
case ADD :
return ‘+’;
case MULT:
return ‘*’;
case MINUS:
return ‘-‘;
case DIV:
return ‘/’;
case MOD:
return ‘%’;
case BAD:
return ‘?’; }
}
Rutgers University
David Menendez
87

Switch Form
Jump Table
Jump Targets
Targ0:
Targ1:
Targ2:
Jump Table Structure
Targ0
Targ1
Targ2
• • •
Targn-1
switch(op) {
case val_0:
Block 0 case val_1:
Block 1
•••
case val_n-1:
Block n–1 }
Approx. Translation
• • •
target = JTab[op];
goto *target;
jtab:
Targn-1:
Code Block 0
Code Block 1
Code Block 2
Code Block n–1

Switch Statement Example
Branching Possibilities
Enumerated Values
ADD 0 MULT 1 MINUS 2 DIV 3 MOD 4 BAD 5
typedef enum
{ADD, MULT, MINUS, DIV, MOD, BAD}
op_type;
char unparse_symbol(op_type op)
{
switch (op) {
••• }
}
Setup:
unparse_symbol:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),%eax
cmpl $5,%eax
ja .L49
jmp *.L57(,%eax,4)
# Setup
# Setup
# eax = op
# Compare op : 5
# If > goto done
# goto Table[op]
Rutgers University
David Menendez
89

Assembly Setup Explanation
Table Structure

■ Baseaddressat.L57
Each target requires 4 bytes
Jumping
jmp .L49
● Jump target is denoted by label .L49
jmp *.L57(,%eax,4)
● Start of jump table denoted by label .L57 ● Register %eax holds op
● Must scale by factor of 4 to get offset into table
● Fetch target from effective Address .L57 + op*4
Rutgers University David Menendez 90

Table Contents
Targets & Completion
Jump Table
.L51:
movl $43,%eax # ’+’ jmp .L49
.L52:
movl $42,%eax # ’*’ jmp .L49
.L53:
movl $45,%eax # ’-’ jmp .L49
.L54:
movl $47,%eax # ’/’ jmp .L49
.L55:
movl $37,%eax # ’%’ jmp .L49
.L56:
movl $63,%eax # ’?’
# Fall Through to .L49
.section .rodata
.align 4
.L57:
.long .L51 #Op = 0 .long .L52 #Op = 1 .long .L53 #Op = 2 .long .L54 #Op = 3 .long .L55 #Op = 4 .long .L56 #Op = 5
Enumerated Values
ADD 0 MULT 1 MINUS 2 DIV 3 MOD 4 BAD 5
Rutgers University
David Menendez 91

Switch Statement Completion
.L49:
movl %ebp,%esp
popl %ebp ret
# Done:
# Finish
# Finish
# Finish
Puzzle
■ Whatvaluereturnedwhenopisinvalid?
Answer
■ Register%eaxsettoopatbeginningofprocedure

Advantage of Jump Table

This becomes the returned value
Can do k-way branch in O(1) operations
Rutgers University David Menendez 92

Reading Condition Codes
SetX Instructions

Set single byte based on combinations of condition codes
SetX
Condition
Description
sete
ZF
Equal / Zero
setne
~ZF
Not Equal / Not Zero
sets
SF
Negative
setns
~SF
Nonnegative
setg
~(SF^OF)&~ZF
Greater (Signed)
setge
~(SF^OF)
Greater or Equal (Signed)
setl
(SF^OF)
Less (Signed)
setle
(SF^OF)|ZF
Less or Equal (Signed)
seta
~CF&~ZF
Above (unsigned)
setb
CF
Below (unsigned)
Rutgers University David Menendez 93

SetX Instructions


Reading Condition Codes (Cont.)
Set single byte based on combinations of condition codes
%eax
%ah
%al
%edx
%dh
%dl
%ecx
%ch
%cl
%ebx
%bh
%bl
%esi
%edi
%esp
%ebp
One of 8 addressable byte registers
● Embedded within first 4 integer registers
● Does not alter remaining 3 bytes ● Typically use movzbl to finish job
int gt (int x, int y) {
return x > y; }
Body
movl 12(%ebp),%eax # eax = y
cmpl %eax,8(%ebp)
setg %al
movzbl %al,%eax
# Compare x : y
# al = x > y
# Zero rest of %eax
Note inverted ordering!
Rutgers University David Menendez
94

Can you write the C code for this assembly?
.globl test
.type test, @function
test:
pushl %ebp
movl %esp, %ebp pushl %ebx
movl 8(%ebp), %edx movl 12(%ebp), %ecx movl $1, %eax
cmpl %ecx, %edx jge .L3
What does this function do? What is the C code?
.L6:
leal imull
addl cmpl
jg .L6 .L3:
popl %ebx popl %ebp ret
(%edx,%ecx), %ebx %ebx, %eax
$1, %edx %edx, %ecx

Stack-Based Languages
Languages that Support Recursion
■ ■

Stack Discipline


Stack Allocated in Frames (Activation records)

e.g., C, Pascal, Java Code must be “Reentrant”
● Multiple simultaneous instantiations of single procedure
Need some place to store state of each instantiation
● Arguments, local variables, return pointer
State for given procedure needed for limited time
● From when called to when return
Callee returns before caller does
state for single procedure instantiation
Rutgers University David Menendez 96

Call Chain Example
Code Structure
Call Chain
yoo(…)
{

• who(); •

}
yoo
who
amI amI
amI
amI
who(…)
{
•••
amI(); ••• amI(); •••
}
amI(…)
{

• amI(); •

}
■ Procedure amI recursive
Rutgers University David Menendez
97

Stack Frames
Contents
■ ■
Management


Pointers
Local variables, return value Temporary space
Space allocated when enter procedure
● “Set-up” code
Frame Pointer
Deallocated when return
● “Finish” code %ebp
■ Stack pointer %esp : stack top
■ Frame pointer %ebp : start of current
frame
Stack “Top”
Rutgers University David Menendez
98
yoo
Stack Pointer
%esp
who
amI
proc

• • •
yoo
Stack Operation
Call Chain
yoo
Frame Pointer
%ebp
Stack
Pointer
%esp
yoo(…)
{

• who(); •

}
Rutgers University
David Menendez
99

• • •
yoo
who
Stack Operation
Call Chain
yoo who
Frame Pointer
%ebp
Stack
Pointer
%esp
who(…)
{
•••
amI(); ••• amI(); •••
}

• • •
yoo
who
amI
Stack Operation
Call Chain
yoo who amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{

• amI(); •

}

• • •
yoo
who
amI
amI
Stack Operation
Call Chain
yoo who amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{

• amI(); •

}

• • •
yoo
who
amI
amI
amI
Stack Operation
Call Chain
yoo who amI
amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{

• amI(); •

}

• • •
yoo
who
amI
amI
Stack Operation
Call Chain
yoo who amI
amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{

• amI(); •

}

• • •
yoo
who
amI
Stack Operation
Call Chain
yoo who amI
amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{

• amI(); •

}

• • •
yoo
who
Stack Operation
Call Chain
yoo who amI
amI amI
Frame Pointer
%ebp
Stack
Pointer
%esp
who(…)
{
•••
amI(); ••• amI(); •••
}

• • •
yoo
who
amI
Stack Operation
Call Chain
yoo
who
amI amI
amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
amI(…)
{

• • •
}

• • •
yoo
who
Stack Operation
Call Chain
yoo
who
amI amI
amI amI
Frame Pointer
%ebp
Stack
Pointer
%esp
who(…)
{
•••
amI(); ••• amI(); •••
}

• • •
yoo
Stack Operation
Call Chain
yoo
who
amI amI amI
amI
Frame Pointer
%ebp
Stack
Pointer
%esp
yoo(…)
{

• who(); •

}

IA32/Linux Stack Frame
Current Stack Frame (“Top” to Bottom)


■ ■
Caller Stack Frame


Arguments
Return Addr
Old %ebp
Saved Registers + Local Variables
Argument Build
Parameters for function about to call
● “Argument build”
Caller Frame
Frame Pointer (%ebp)
Local variables
● If can’t keep in registers
Saved register context Old frame pointer
Return address
● Pushed by call instruction Arguments for this call
Stack Pointer (%esp)
Rutgers University David Menendez 110

Revisiting swap
Calling swap from call_swap
int zip1 = 15213;
int zip2 = 91125;
void call_swap()
{
swap(&zip1, &zip2);
}
call_swap: •••
pushl $zip2 pushl $zip1 call swap •••
# Global Var
# Global Var
• • •
&zip2
&zip1
Rtn adr
Resulting Stack
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
%esp

Revisiting swap
swap:
pushl %ebp
movl %esp,%ebp Set
pushl %ebx
Up
void swap(int *xp, int *yp)
{
int t0 = *xp;
int t1 = *yp;
*xp = t1;
*yp = t0;
}
movl 12(%ebp),%ecx
movl 8(%ebp),%edx
movl (%ecx),%eax
movl (%edx),%ebx
movl %eax,(%edx)
movl %ebx,(%ecx)
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
Body
Finish

swap Setup #1 Entering
Resulting Stack
Stack
• • •
&zip2
&zip1
Rtn adr
• • •
yp
xp
Rtn adr
Old %ebp
%ebp
%ebp
%esp
swap:
pushl %ebp
movl %esp,%ebp
pushl %ebx
%esp

swap Setup #2 Entering
Resulting Stack
Stack
• • •
&zip2
&zip1
Rtn adr
• • •
yp
xp
Rtn adr
Old %ebp
%ebp
%esp
%ebp %esp
swap:
pushl %ebp
movl %esp,%ebp
pushl %ebx

swap Setup #3 Entering
Resulting Stack
Stack
• • •
&zip2
&zip1
Rtn adr
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
%ebp
%esp
%ebp %esp
swap:
pushl %ebp
movl %esp,%ebp
pushl %ebx

Effect of swap Setup Entering
Resulting Stack
Stack
• • •
&zip2
12
• • •
yp
&zip1
8
xp
Rtn adr
%esp 4
Rtn adr
Old %ebp
Old %ebx
%ebp
Offset (relative to %ebp)
0
Body
%ebp %esp
movl 12(%ebp),%ecx # get yp movl 8(%ebp),%edx # get xp …

swap Finish #1 swap’s
Stack Offset
Offset
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
12 8 4 0
12 8 4 0 -4
%ebp -4 %esp
%ebp %esp
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
117
Observation
■ Saved&restoredregister%ebx Rutgers University David Menendez

12 8 4 0
swap Finish #2 swap’s
• • •
yp
xp
Rtn adr
Old %ebp
Old %ebx
• • •
yp
xp
Rtn adr
Old %ebp
swap’s Stack
Offset
Stack Offset
12 8 4 0
%ebp -4 %esp
%ebp %esp
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret

swap Finish #3 swap’s
%ebp
• • •
yp
xp
Rtn adr
• • •
yp
xp
Rtn adr
Old %ebp
swap’s Stack
Offset
12 8 4
0 %ebp %esp
Stack Offset
12 8 4
%esp
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret

swap Finish #4 %ebp
%ebp
Exiting Stack
%esp
• • •
yp
xp
Rtn adr
• • •
&zip2
&zip1
swap’s Stack
Offset
12 8 4
Observation
%esp
■ Saved&restoredregister%ebx
■ Didn’tdosofor%eax,%ecx,or%edx
movl -4(%ebp),%ebx
movl %ebp,%esp
popl %ebp
ret
Rutgers University David Menendez
120

Register Saving Conventions
When procedure yoo calls who:
■ yooisthecaller,whoisthecallee
Can Register be Used for Temporary Storage?
yoo: •••
movl $15213, %edx call who
addl %edx, %eax •••
ret
who: •••
movl 8(%ebp), %edx addl $91125, %edx •••
ret
■ Contentsofregister%edxoverwrittenbywho
Rutgers University David Menendez 121

Register Saving Conventions
When procedure yoo calls who:
■ yooisthecaller,whoisthecallee
Can Register be Used for Temporary Storage?
Conventions


“Caller Save”
● Caller saves temporary in its frame before calling
“Callee Save”
● Callee saves temporary in its frame before using
Rutgers University David Menendez 122

IA32/Linux Register Usage
Two have special uses
■ %ebp,%esp
Three managed as callee-save
■ %ebx,%esi,%edi

Three managed as caller-save
■ %eax,%edx,%ecx

Caller-Save Temporaries
%eax
%edx
%ecx
%ebx
%esi
%edi
%esp
%ebp
Old values saved on stack prior to using
Do what you please, but expect any
Callee-Save Temporaries
callee to do so, as well
Special
Register %eax also stores returned value
Rutgers University David Menendez 123

.globl rfact
.type
rfact,@function
rfact:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),
cmpl $1,%ebx
jle .L78
leal -1(%ebx),%eax
pushl %eax
call rfact
imull %ebx,%eax
jmp .L79
.align 4
.L78:
movl $1,%eax
.L79:
movl %ebp,%esp
popl %ebp
ret
pushl %ebx
movl -4(%ebp),%ebx
%ebx
Recursive Function
Rutgers University David Menendez 124

.globl rfact
.type
rfact,@function
rfact:
pushl %ebp
movl %esp,%ebp
movl 8(%ebp),
cmpl $1,%ebx
jle .L78
leal -1(%ebx),%eax
pushl %eax
call rfact
imull %ebx,%eax
jmp .L79
.align 4
.L78:
movl $1,%eax
.L79:
movl %ebp,%esp
popl %ebp
ret
pushl %ebx
movl -4(%ebp),%ebx
%ebx
Recursive Factorial
int rfact(int x)
{
int rval;
if (x <= 1) return 1; rval = rfact(x-1); return rval * x; } Registers ■ %eaxusedwithoutfirstsaving ■ %ebxused,butsaveat beginning & restore at end Rutgers University David Menendez 125 pre %ebp pre %ebx x Rtn adr Caller Rfact Stack Setup %ebp Entering Stack %esp rfact: pushl %ebp movl %esp,%ebp pushl %ebx pre %ebp pre %ebx x Rtn adr Old %ebp Old %ebx Caller 8 4 Callee 0 %ebp -4 %esp Rfact Body movl 8(%ebp),%ebx cmpl $1,%ebx jle .L78 leal -1(%ebx),%eax pushl %eax # ebx = x # Compare x : 1 # If <= goto Term # eax = x-1 # Push x-1 # rfact(x-1) # rval * x # Goto done call rfact imull %ebx,%eax jmp .L79 .L78: movl $1,%eax .L79: # Term: # return val = 1 # Done: Recursion Registers %ebx Stored value of x %eax ● Temporary value of x-1 ●Returned value from rfact(x-1) ●Returned value from this call int rfact(int x) { int rval; if (x <= 1) return 1; rval = rfact(x-1) ; return rval * x; } Rutgers University David Menendez 127 Rfact Recursion leal -1(%ebx),%eax x Rtn adr Old %ebp Old %ebx pushl %eax x Rtn adr Old %ebp Old %ebx x-1 %ebp %esp call rfact %ebp %esp %ebp %esp x Rtn adr Old %ebp Old %ebx x-1 Rtn adr x-1 x %eax %ebx x-1 x %eax %ebx x-1 x %eax %ebx Return from Call Rfact Result x Rtn adr Old %ebp Old %ebx x-1 x Rtn adr Old %ebp Old %ebx x-1 %ebp %esp %ebp %esp (x-1)! x (xx-!1)! x %eax %ebx %eax %ebx Assume that rfact(x-1) returns (x-1)! in register %eax imull %ebx,%eax movl -4(%ebp),%ebx movl %ebp,%esp popl %ebp ret Rfact Completion pre %ebp pre %ebx x Rtn adr Old %ebp Old %ebx x-1 8 4 0 %ebp -4 -8 %esp 8 4 0 %eax %ebx %ebp %esp pre %ebp pre %ebx x Rtn adr Old %ebp pre %ebp pre %ebx x Rtn adr x! x Old %ebx %eax %ebx %ebp %esp %eax %ebx x! Old %ebx x! Old %ebx Integral byte b word w double word l Floating Point 1 [unsigned] char 2 [unsigned] short 4 [unsigned] int Basic Data Types ■ Stored & operated on in general registers ■ Signed vs. unsigned depends on instructions used Intel GAS Bytes C ■ Stored & operated on in floating point registers Intel GAS Bytes C Single s Double l Extended t 4 float 8 double 10/12 long double Rutgers University David Menendez 131 Array Allocation Basic Principle T A[L]; ■ ArrayofdatatypeTandlengthL ■ ContiguouslyallocatedregionofL*sizeof(T)bytes char string[12]; int val[5]; double a[4]; x x x + 4 x + 8 x + 16 x + 12 x + 12 x + 16 x + 24 x + 20 x + 32 x x + 8 char *p[3]; Rutgers University x x+4 x+8 David Menendez 132 zip_dig cmu; zip_dig mit; zip_dig ucb; Notes 16 20 24 28 32 36 36 40 44 48 52 56 56 60 64 68 72 76 Array Example typedef int zip_dig[5]; zip_dig cmu = { 1, 5, 2, 1, 3 }; zip_dig mit = { 0, 2, 1, 3, 9 }; zip_dig ucb = { 9, 4, 7, 2, 0 }; 1 5 2 1 3 0 2 1 3 9 9 4 7 2 0 ■ Declaration “zip_dig cmu” equivalent to “int cmu[5]” ■ Example arrays were allocated in successive 20 byte blocks ● Not guaranteed to happen in general Rutgers University David Menendez 133 Array Accessing Example Computatio ■ Register %edx contains starting address of array ■ Register %eax contains array index ■ Desired digit at 4*%eax + %edx ■ Use memory reference (%edx, %eax,4) Memory Reference Code n int get_digit (zip_dig z, int dig) { return z[dig]; } # %edx = z # %eax = dig movl (%edx,%eax,4),%eax # z[dig] Rutgers University David Menendez 134 Referencing Examples zip_dig cmu; zip_dig mit; zip_dig ucb; 16 20 24 28 32 36 36 40 44 48 52 56 56 60 64 68 72 76 1 5 2 1 3 0 2 1 3 9 9 4 7 2 0 Code Does Not Do Any Bounds Checking! Reference mit[3] mit[5] mit[-1] cmu[15] Address Value Guaranteed? 36+4*3=48 3 36+4*5=56 9 36 + 4*-1 = 32 3 16 + 4*15 = 76 ?? Yes No No No ■ Out of range behavior implementation-dependent ● No guaranteed relative allocation of different arrays Rutgers University David Menendez 135 Array Loop Example int zd2int(zip_dig z) { int i; int zi = 0; for (i = 0; i < 5; i++) { zi = 10 * zi + z[i]; } return zi; } Original Source Transformed Versio ■ As generated by GCC ■ Eliminateloopvariablei ■ Convert array code to pointer code ■ Express in do-while form ● No need to test at entrance n int zd2int(zip_dig z) { int zi = 0; int *zend = z + 4; do { zi = 10 * zi + *z; z++; } while(z <= zend); return zi; } Rutgers University David Menendez 136 Array Loop Implementation Registers %ecx z %eax zi %ebx zend Computations ■ 10*zi+*zimplementedas *z + 2*(zi+4*zi) z++ increments by 4 int zd2int(zip_dig z) { int zi = 0; int *zend = z + 4; do { zi = 10 * zi + *z; z++; } while(z <= zend); return zi; } ■ # %ecx = z xorl %eax,%eax # zi = 0 leal 16(%ecx),%ebx # zend = z+4 .L59: leal (%eax,%eax,4),%edx # 5*zi movl (%ecx),%eax # *z addl $4,%ecx # z++ leal (%eax,%edx,2),%eax # zi = *z + 2*(5*zi) cmpl %ebx,%ecx # z : zend jle .L59 # if <= goto loop Rutgers University David Menendez 137 Multi-Level Array Example ■ Variableunivdenotes array of 3 elements ■ Each element is a pointer ● 4 bytes ■ Each pointer points to array of int’s univ 160 164 168 cmu mit16 20 24 28 32 36 ucb36 40 44 48 52 56 56 60 64 68 72 76 zip_dig cmu = { 1, 5, 2, 1, 3 }; zip_dig mit = { 0, 2, 1, 3, 9 }; zip_dig ucb = { 9, 4, 7, 2, 0 }; #define UCOUNT 3 int *univ[UCOUNT] = {mit, cmu, ucb}; 1 5 2 1 3 36 16 56 0 2 1 3 9 9 4 7 2 0 Rutgers University David Menendez 138 Element Access in Multi-Level Array int get_univ_digit (int index, int dig) { return univ[index][dig]; } Computation ■ ■ Element access Mem[Mem[univ+4*index]+4*dig] Must do two memory reads ● First get pointer to row array ● Then access element within array # %ecx = index # %eax = dig leal 0(,%ecx,4),%edx # 4*index movl univ(%edx),%edx # Mem[univ+4*index] movl (%edx,%eax,4),%eax # Mem[...+4*dig] Rutgers University David Menendez 139 Concept ■ ■ ■ Structures Contiguously-allocated region of memory Refer to members within structure by names Members may be of different types struct rec { int i; int a[3]; int *p; }; Memory Layout 20 i a p Accessing Structure Member 04 16 Assembly void set_i(struct rec *r, int val) { r->i = val;
}
# %eax = val
# %edx = r
movl %eax,(%edx) # Mem[r] = val
Rutgers University David Menendez
140

Generating Pointer to Struct. Member
r
04 16
r + 4 + 4*idx
struct rec {
int i;
int a[3];
int *p; };
i
a
p
int * find_a
(struct rec *r, int idx)
{
return &r->a[idx];
}
Generating Pointer to Array Element

Offset of each structure member determined at compile time
# %ecx = idx
# %edx = r
leal 0(,%ecx,4),%eax # 4*idx
leal 4(%eax,%edx),%eax # r+4*idx+4
Rutgers University
David Menendez
141

C Code
Structure Referencing (Cont.)
i
a
p
04 16
04 16 Element i
struct rec {
int i;
int a[3];
int *p; };
i
a
void
set_p(struct rec *r)
{
r->p =
&r->a[r->i];
}
# %edx = r
movl (%edx),%ecx # r->i
leal 0(,%ecx,4),%eax # 4*(r->i)
leal 4(%edx,%eax),%eax # r+4+4*(r->i)
movl %eax,16(%edx) # Update r->p
Rutgers University David Menendez
142

Alignment
Aligned Data



Motivation for Aligning Data

Compiler

Primitive data type requires K bytes
Address must be multiple of K
Required on some machines; advised on IA32
● treated differently by Linux and Windows!
Memory accessed by (aligned) double or quad-words
● Inefficient to load or store datum that spans quad word boundaries
Inserts gaps in structure to ensure correct alignment of fields
Rutgers University David Menendez 143

Specific Cases of Alignment
Size of Primitive Data Type:
■ 1 byte (e.g., char)
● no restrictions on address
■ 2 bytes (e.g., short)
● lowest 1 bit of address must be 02
■ 4 bytes (e.g., int, float, char *, etc.) ● lowest 2 bits of address must be 002
■ 8 bytes (e.g., double)
● Windows (and most other OS’s & instruction sets):
» lowest 3 bits of address must be 0002
● Linux:
» lowest 2 bits of address must be 002
» i.e., treated the same as a 4-byte primitive data type
■ 12 bytes (long double) ● Linux:
» lowest 2 bits of address must be 002
» i.e., treated the same as a 4-byte primitive data type
Rutgers University David Menendez 144

Satisfying Alignment with Structures
Offsets Within Structure

Overall Structure Placement


Example (under Windows):
■ K = 8, due to double element
p+0 p+4 p+8 p+16
Multiple of 4 Multiple of 8
Multiple of 8
Must satisfy element’s alignment requirement
struct S1 {
char c;
int i[2];
double v;
} *p;
Each structure has alignment requirement K
● Largest alignment of any element
Initial address & structure length must be multiples of K
c
i[0]
i[1]
v
p+24
Multiple of 8
Rutgers University David Menendez
145

Linux vs. Windows
struct S1 {
char c;
int i[2];
double v;
} *p;
Windows (including Cygwin):
■ K=8,duetodoubleelement
p+0 p+4 p+8
Multiple of 4 Multiple of 8
Linux:
p+16
Multiple of 8
p+24
c
i[0]
i[1]
v
■ K=4;doubletreatedlikea4-bytedatatype
Multiple of 8
c
i[0]
i[1]
v
p+0 p+4
Multiple of 4
Multiple of 4
p+8 p+12
Multiple of 4
p+20
Rutgers University
David Menendez
146
Multiple of 4

Overall Alignment Requirement
struct S2 {
double x;
int i[2];
char c;
} *p;
p must be multiple of: 8 for Windows
4 for Linux
p+0 p+8 p+12 p+16
Windows: p+24 Linux: p+20
x
i[0]
i[1]
c
struct S3 {
float x[2];
int i[2];
char c;
} *p;
x[0]
x[1]
i[0]
i[1]
c
p+0 p+4
p must be multiple of 4 (in either OS)
p+8 p+12 p+16 p+20

Ordering Elements Within Structure
struct S4 {
char c1;
double v;
char c2;
int i;
} *p;
10 bytes wasted space in Windows
c1
v
c2
i
p+0 p+8 p+16 p+20 p+24
2 bytes wasted space
p+0 p+8 p+12 p+16
struct S5 {
double v;
char c1;
char c2;
int i;
} *p;
v
c1
c2
i

Arrays of Structures
Principle


struct S6 {
short i;
float v;
short j;
} a[10];
Allocated by repeating allocation for array type
In general, may nest arrays & structures to arbitrary depth
a[1].i
a[1].v
a[1].j
a+12
a+16 a+20
a+24
a[0]
a[1]
a[2]
Rutgers University
David Menendez
149
a+0
a+12 a+24
a+36
•••

Satisfying Alignment within Structure
Achieving Alignment



Starting address of structure array must be multiple of worst-case alignment for any element
● a must be multiple of 4
Offset of element within structure must be multiple of
element’s alignment requirement
● v’s offset of 4 is a multiple of 4
Overall size of structure must be multiple of worst-case
alignment for any element
● Structure padded with unused space to be 12 bytes
struct S6 {
short i;
float v;
short j;
} a[10];
a[0]
•••
a[i]
•••
a+0
a+12i
a+12i
a[1].i
a[1].v
a[1].j
Multiple of 4
a+12i+4
Rutgers University
David Menendez 150
Multiple of 4

Summary
Arrays in C
■ ■ ■
Compiler Optimizations
■ Compiler often turns array code into pointer code (zd2int)
■ ■
Structures
■ ■
Contiguous allocation of memory Pointer to first element
No bounds checking
Uses addressing modes to scale array indices Lots of tricks to improve array indexing in loops
Allocate bytes in order declared
Pad in middle and at end to satisfy alignment
Rutgers University David Menendez 151