程序代写代做代考 graph data structure compiler assembly arm assembler x86 C cache Computer Systems Organization

Computer Systems Organization
Session 10 􏰃 Main Theme Machine-Level Programming Introduction
Dr. Jean-Claude Franchitti
New York University
Computer Science Department Courant Institute of Mathematical Sciences
Presentation material partially based on textbook slides C􏰄m􏰅􏰆􏰇e􏰈 S􏰉􏰊􏰇em􏰊: A P􏰈􏰄g􏰈amme􏰈􏰋􏰊 Pe􏰈􏰊􏰅ec􏰇i􏰌e
by Randal Bryant and David O􏰋Halla􏰈􏰄n
Slides copyright © 2020
1

Agenda
1
2
3
Instructor and Course Introduction Machine-Level Programming I – Introduction Summary and Conclusion
2

What is the class about?
􏰍Course description and syllabus:
» http://www.nyu.edu/classes/jcf/CSCI-GA.3033-012/
» http://www.cs.nyu.edu/courses/spring20/CSCI-UA.0205–001/ » Accessible via NYU Classes
􏰍 Textbooks:
» Comp􏰆􏰇er S􏰉s􏰇ems: A Programmer􏰋s Perspec􏰇i􏰌e
Ra􏰎dal B􏰈􏰉a􏰎􏰇 a􏰎d Da􏰌id O􏰋Halla􏰈􏰄􏰎 Pearson
ISBN-10: 013409266X, ISBN-13: 978-0134092669, 3rd Edition (03/2015) » Recommended:
» The C Programmning Language, Brian W Kernighan and Dennis Ritchie
Prentice Hall; ISBN-10: 0131103628; ISBN-13: 978-01311103627, 2nd Edition (04/88)
3

Icons / Metaphors
Information
Common Realization Knowledge/Competency Pattern Governance
Alignment
Solution Approach
4
4

Machine Programming I 􏰃 Introduction: Detailed Agenda
􏰍 History of Intel Processors and Architectures 􏰍 C, Assembly, Machine Code
􏰍 Assembly Basics: Registers, Operands, Move 􏰍 Arithmetic Operations
􏰍 Comparison and Test Operations 􏰍 Procedure Call Operations
􏰍 Coding Practices
5

Agenda
1
2
3
Instructor and Course Introduction Machine-Level Programming I – Introduction Summary and Conclusion
6

Detailed Agenda: Machine Programming I: Introduction
􏰍 History of Intel processors and architectures 􏰍 C, assembly, machine code
􏰍 Assembly Basics: Registers, operands, move 􏰍 Arithmetic Operations
􏰍 Comparison and Test Operations 􏰍 Procedure Call Operations
􏰍 Coding Practices
7

Intel x86 Processors
􏰍 Dominate laptop/desktop/server market
􏰍 Evolutionary design
» Backwards compatible up until 8086, introduced in 1978 » Added more features as time goes on
􏰍 Complex instruction set computer (CISC)
» Many different instructions with many different formats 􏰏 But, only small subset encountered with Linux programs
» Hard to match performance of Reduced Instruction Set Computers (RISC)
􏰏 ARM architecture (in most cell phones and tablets) is RISC » But, Intel has done just that!
􏰏 In terms of speed. Less so for low power.
8

Intel x86 Evolution: Milestones
Name Date Transistors MHz
􏰍 8086 1978 29K 5-10 » First 16-bit Intel processor. Basis for IBM PC & DOS » 1MB address space
􏰍 386 1985 275K 16-33 » First 32 bit Intel processor , referred to as IA32
» Added 􏰐fla􏰇 add􏰈e􏰊􏰊i􏰎g􏰑, ca􏰅able 􏰄f 􏰈􏰆􏰎􏰎i􏰎g U􏰎i􏰒
Set 􏰍Pentium 4E 2004 125M 2800-3800 Instruction
» First 64-bit Intel x86 processor, referred to as x86-64
Architecture (ISA)
􏰍Core 2 2006 291M » First multi-core Intel processor
􏰍Core i7 2008 731M » Four cores (our shark machines)
1060-3500 1700-3900
We will cover x86-64 in the course
9

Intel x86 Processors (continued)
􏰍Machine Evolution
» 386 1985 0.3M » Pentium 1993 3.1M » Pentium/MMX 1997 4.5M » PentiumPro 1995 6.5M » Pentium III 1999 8.2M » Pentium 4 2001 42M » Core 2 Duo 2006 291M » Core i7 2008 731M
􏰍Added Features
» Instructions to support multimedia operations
» Instructions to enable more efficient conditional operations » Transition from 32 bits to 64 bits
» More cores
10

2015 State of the Art 􏰃 Core I7 Broadwell 2015
􏰍Desktop Model » 4 cores
» Integrated graphics » 3.3-3.8 GHz
» 65W
􏰍Server Model » 8 cores
» Integrated I/O » 2-2.6 GHz
» 45W
11

x86 Clones: Advanced Micro Devices (AMD)
􏰍 Historically
»AMD has followed just behind Intel »A little bit slower, a lot cheaper
􏰍 Then
»Recruited top circuit designers from Digital Equipment
Corp. and other downward trending companies »Built Opteron: tough competitor to Pentium 4 »Developed x86-64, their own extension to 64 bits
􏰍Recent Years
»Intel got its act together
􏰏 Leads the world in semiconductor technology
»AMD has fallen behind
􏰏 Relies on external semiconductor manufacturer
12

In􏰇el􏰓s 64-Bit History
􏰍 2001: Intel Attempts Radical Shift from IA32 to IA64 » Totally different architecture (Itanium)
» Executes IA32 code only as legacy
» Performance disappointing
􏰍 2003: AMD Steps in with Evolutionary Solution » x86-64 (􏰎􏰄􏰔 called 􏰐AMD64􏰑)
􏰍 Intel Felt Obligated to Focus on IA64
» Hard to admit mistake or that AMD is better
􏰍 2004: Intel Announces EM64T extension to IA32 » Extended Memory 64-bit Technology
» Almost identical to x86-64!
􏰍 All but low-end x86 processors support x86-64 » But, lots of code still runs in 32-bit mode
13

Course Coverage
􏰍 IA32
» The traditional x86
» For 15/18-213: RIP, Summer 2015
􏰍 x86-64
» The standard
» shark> gcc hello.c
» shark> gcc 􏰕m64 hello.c
􏰍 Presentation
» Course textbook covers x86-64 » Web aside on IA32
» We will only cover x86-64
14

Detailed Agenda: Machine Programming I: Basics
􏰍 History of Intel processors and architectures 􏰍 C, assembly, machine code
􏰍 Assembly Basics: Registers, operands, move 􏰍 Arithmetic Operations
􏰍 Comparison and Test Operations 􏰍 Procedure Call Operations
􏰍 Coding Practices
15

Definitions
􏰍 Architecture: (also ISA: instruction set architecture) The parts of a processor design that one needs to understand or write assembly/machine code.
» Examples: instruction set specification, registers.
􏰍 Microarchitecture: Implementation of the architecture.
» Examples: cache sizes and core frequency. 􏰍 Code Forms:
» Machine Code: The byte-level programs that a processor executes
» Assembly Code: A text representation of machine code
􏰍 Example ISAs:
» Intel: x86, IA32, Itanium, x86-64
» ARM: Used in almost all mobile phones
16

Assembly/Machine Code View
Addresses Data
Instructions
Programmer-Visible State
» PC: Program counter
􏰏 Address of next instruction 􏰏 Called 􏰐RIP􏰑 (􏰒86-64)
» Register file
􏰏 Heavily used program data
» Condition codes
􏰏 Store status information about most
recent arithmetic or logical operation
􏰏 Used for conditional branching
»Memory
􏰏 Byte addressable array
􏰏 Code and user data
􏰏 Stack to support procedures
17
CPU
PC
Registers
Condition Codes
Memory
Object Code Program Data OS Data Stack

Assembly Characteristics: Data Types
􏰍 􏰐I􏰎􏰇ege􏰈􏰑 da􏰇a 􏰄f 1, 2, 4, 􏰄􏰈 8 b􏰉􏰇e􏰊
» Can be a data values
» Can be an address (e.g., untyped pointer of size 8 bytes in 64-bit machines or 4 bytes in 32-bit machines)
􏰍 Floating point data of 4, 8, or 10 bytes
􏰍 Code: Byte sequences encoding series of
instructions
􏰍 No aggregate types such as arrays or structures or any complex data structure
» Just contiguously allocated bytes in memory
18

Assembly Characteristics: 3 Kinds of Assembly Operations
􏰍 Perform arithmetic function on register or memory data
» Add, subtract, multiply, etc.
􏰍 Transfer data between memory and register » Load data from memory into register
» Store register data into memory
􏰍 Transfer control
» Unconditional jumps to/from procedures » Conditional branches
19

Turning C into Object Code
Optimization level
-Og means do not Output file
» Code in files p1.c p2.c optimize is p » Compile with command: gcc –Og p1.c p2.c -o p
􏰏 Use basic optimizations (-Og) [New to recent versions of GCC] 􏰏 Put resulting binary in file p
text
text binary
binary
C program (p1.c p2.c)
Compiler (gcc 􏰕Og -S)
Asm program (p1.s p2.s)
Assembler (gcc or as)
Object program (p1.o p2.o)
Linker (gcc or ld)
Executable program (p)
Static libraries (.a)
20

Compiling Into Assembly
C Code (sum.c)
Generated x86-64 Assembly
long plus(long x, long y);
void sumstore(long x, long y,
long *dest)
{
long t = plus(x, y);
*dest = t;
}
Obtain (on shark machine) with command
gcc 􏰕Og 􏰕S sum.c Produces file sum.s
Warning: Will get very different results on non-Shark machines (Linux, Mac OS-X, …􏰖 due to different versions of gcc and different compiler settings.
sumstore: pushq %rbx
movq %rdx, %rbx
call plus
movq %rax, (%rbx)
popq %rbx
ret
21

Object Code
Code for sumstore
0x0400595:
0x53
0x48
0x89
0xd3
0xe8
0xf2
0xff
0xff
0xff
0x48
0x89
0x03
0x5b
0xc3
􏰍 Assembler
» Translates.sinto.o
» Binary encoding of each instruction
» Nearly-complete image of executable code
» Missing linkages between code in different files
􏰍 Linker
» Resolves references between files
» Combines with static run-time libraries 􏰏 E.g.,codeformalloc,printf
» Some libraries are dynamically linked
􏰏 Linking occurs when program begins execution
• Total of 14 bytes
• Each instruction 1, 3, or 5 bytes
• Starts at address
0x0400595
22

Machine Instruction Example
*dest = t;
movq %rax, (%rbx)
􏰍 C Code (look at sum.c two slides back) » Store value t where designated
by dest 􏰍 Assembly
» Move 8-byte value to memory 􏰏 Quad words in x86-64 parlance
» Operands:
t: Register %rax dest: Register %rbx *dest: Memory M[%rbx]
􏰍Object Code
» 3-byte instruction
» Stored at address 0x40059e
0x40059e: 48 89 03
23

Disassembling Object Code
Disassembled
0000000000400595 : 400595: 53
400596: 48 89 d3
400599: e8 f2 ff ff ff
40059e: 48 89 03
4005a1: 5b
4005a2: c3
􏰍 Disassembler objdump –d sum
» Useful tool for examining object code
» Analyzes bit pattern of series of instructions
» Produces approximate rendition of assembly code
» Can be run on either a.out (complete executable) or .o file
push %rbx
mov %rdx,%rbx
callq 400590 mov %rax,(%rbx)
pop %rbx
retq
24

Alternate Disassembly
Object
Disassembled
0x0400595: 0x53 0x48 0x89 0xd3 0xe8 0xf2 0xff 0xff 0xff 0x48 0x89 0x03 0x5b 0xc3
􏰍 Within gdb Debugger gdb sum
disassemble sumstore
» Disassemble procedure
x/14xb sumstore
» Examine the 14 bytes starting at sumstore
25
Dump of assembler code for function sumstore:
0x0000000000400595 <+0>: push
0x0000000000400596 <+1>: mov
0x0000000000400599 <+4>: callq
0x000000000040059e <+9>: mov
0x00000000004005a1 <+12>:pop
0x00000000004005a2 <+13>:retq
%rbx
%rdx,%rbx
0x400590 %rax,(%rbx)
%rbx

What Can be Disassembled?
% objdump -d WINWORD.EXE WINWORD.EXE: file format pei-i386
No symbols in “WINWORD.EXE”.
Disassembly of section .text:
30001000 <.text>:
30001000: 55 push %ebp
30001001: 8b ec mov %esp,%ebp
Reverse engineering forbidden by
30001003: 6a ff push $0xffffffff
Microsoft End User License Agreement
30001005: 68 90 10 00 30 push $0x30001090
3000100a: 68 91 dc 4c 30 push $0x304cdc91
􏰍 Anything that can be interpreted as executable code
􏰍 Disassembler examines bytes and reconstructs assembly source
26

Detailed Agenda: Machine Programming I: Basics
􏰍 History of Intel processors and architectures
􏰍 C, assembly, machine code
􏰍 Assembly Basics: Registers, operands,
move
􏰍 Arithmetic Operations
􏰍 Comparison and Test Operations
􏰍 Procedure Call Operations
􏰍 Coding Practices
27

X64 Registers
􏰍 x64 assembly code uses 16 64-bit registers
􏰍 The lower bytes of some of these registers may be accessed independently as 32-, 16-, or 8-bit registers
28

x86-64 Integer Registers
%rax
%eax
%rbx
%ebx
%rcx
%ecx
%rdx
%edx
%rsi
%esi
%rdi
%edi
%rsp
%esp
%rbp
%ebp
%r8
%r8d
%r9
%r9d
%r10
%r10d
%r11
%r11d
%r12
%r12d
%r13
%r13d
%r14
%r14d
%r15
%r15d
» Can reference low-order 4 bytes (also low-order 1 & 2 bytes)
29

Some History: IA32 Registers
%eax %ax
%ecx %cx
%edx
%dx
%ebx %bx
%ah
%ch
%al
%cl
%dh
%dl
%bh
%bl
%esi %si
%edi %di
%esp %sp
%ebp %bp
16-bit virtual registers (backwards compatibility)
Origin (mostly obsolete)
accumulate
counter
data
base
source index
destination
index
stack
pointer
base pointer
30
general purpose

Register Usage (1/3)
􏰍 There are sixteen 64-bit registers in x86-64: %rax, %rbx, %rcx, %rdx, %rdi, %rsi, %rbp,%rsp, and %r8-r15
􏰍 %rax, %rcx, %rdx, %rdi, %rsi, %rsp, and %r8- r11 are considered caller-save registers, meaning that they are not necessarily saved across function calls
􏰍 Registers %rbx,%rbp, and %r12-r15 are callee- save registers, meaning that they are saved across function calls
31

Register Usage (2/3)
􏰍 By convention, %rax i􏰊 􏰆􏰊ed 􏰇􏰄 􏰊􏰇􏰄􏰈e a f􏰆􏰎c􏰇i􏰄􏰎􏰋􏰊 􏰈e􏰇􏰆􏰈􏰎 value, if it exists and is no more than 64 bits long (larger return types like structs are returned using the stack)
􏰍 Register %rsp is used as the stack pointer, a pointer to the topmost element in the stack
􏰍 %rdi, %rsi, %rdx, %rcx, %r8, and %r9 are used to pass the first six integer or pointer parameters to called functions (additional parameters or large parameters such as structs passed by value are passed on the stack)
32

Register Usage (3/3)
􏰍 In 32-bit x86, the base pointer (formerly %ebp, now %rbp) was used to keep track of the base of the current stack frame, and a called function would save the base pointer of its caller prior to updating the base pointer to its own stack frame
􏰍 With the advent of the 64-bit architecture, this has been mostly eliminated, save for a few special cases when the compiler cannot determine ahead of time how much stack space needs to be allocated for a particular function (see dynamic stack allocation)
33

x64 Operand Specifiers
34

x64 Data Movement Instructions
35

Moving Data
%rax
􏰍 Moving Data
movq Source, Dest:
􏰍 Operand Types
» Immediate: Constant integer data
􏰏 Example: $0x400, $-533
􏰏 Like C constant, but prefixed with ‘$’
􏰏 Encoded with 1, 2, or 4 bytes
» Register: One of 16 integer registers
􏰏 Example: %rax, %r13
􏰏 But %rsp reserved for special use
􏰏 Others have special uses for particular instructions
» Memory: 8 consecutive bytes of memory at address given by register
􏰏 Simplest example: (%rax)
􏰏 Va􏰈i􏰄􏰆􏰊 􏰄􏰇he􏰈 􏰐address modes􏰑 la􏰇e􏰈
%rcx
%rdx
%rbx
%rsi
%rdi
%rsp
%rbp
%rN
36

movq Operand Combinations
Source Dest
Src,Dest
movq $0x4,%rax movq$-147,(%rax)
movq %rax,%rdx movq %rax,(%rdx)
movq (%rax),%rdx
C Analog
temp = 0x4; *p=-147;
temp2 = temp1; *p = temp;
temp = *p;
Imm movq Reg
Reg Mem
Reg Mem
Mem Reg
Cannot do memory-memory transfer with a single instruction (i.e., no memory-to-memory instruction)
37

movq
C Declaration Intel Data Type Assembly code Size (bytes) suffix
Char Byte b 1
Short Word w 2
Int Double Word l 4
Long Quad Word q 8
Pointer Quad Word q 8
38

Special Type of mov
􏰍 movz S, R 􏰗 R = ZeroExtend(S) » movzbw
» movzbl » movzbq » movzwl » movzwq
􏰍 movs S, R 􏰗 R = SignExtend(S) » movsbw
» movsbl » movsbq » movswl » movswq » movslq
􏰍 S: memory or register
R: register
39

Notes About Special mov Instructions
􏰍 Instructions that move or generate 32-bit register values also set the upper 32 bits of the register to zero.
» So: no need for an instruction movzlq.
􏰍 Similarly, movzbq has the exact same behavior as
movzbl when the destination is a register
» set the upper 56 bits of the destination register to zero.
􏰍 Instructions that generate 8 or 16-bit values, such as movb do not alter the other bits in the register.
40

Yet Another Special Case
􏰍 You can have movq $num, %register » num is an immediate number (signed) » register is any 64-bit register
» num cannot exceed 32 bits
» The rest is sign-extended
􏰍 movabsq $num, %register
» Means move num absolute to register » num is 64-bit
41

Simple Memory Addressing Modes
􏰍 Normal (R) Mem[Reg[R]] » Register R specifies memory address
» Aha! Pointer dereferencing in C
movq (%rcx),%rax
􏰍 Displacement D(R) Mem[Reg[R]+D] » Register R specifies start of memory region
» Constant displacement D specifies offset
movq 8(%rbp),%rdx
42

Example of Simple Addressing Modes
void swap
(long *xp, long *yp)
{
long t0 = *xp;
long t1 = *yp;
*xp = t1;
*yp = t0;
}
swap:
􏰘 􏰊ome 􏰊e􏰇􏰆p code
movq (%rdi), %rax movq (%rsi), %rdx movq %rdx, (%rdi) movq %rax, (%rsi) 􏰘 􏰔􏰈ap-up code
ret
43

Understanding Swap()
void swap
(long *xp, long *yp)
{
long t0 = *xp;
long t1 = *yp;
*xp = t1;
*yp = t0;
}
Registers
Memory
%rdi
%rsi
%rax
swap: movq
movq
movq
movq
ret
(%rdi), %rax # t0 = *xp
(%rsi), %rdx # t1 = *yp
%rdx, (%rdi) # *xp = t1
%rax, (%rsi) # *yp = t0
%rdx
Register Value
%rdi xp %rsi yp %rax t0 %rdx t1
44

Understanding Swap()
Registers
Memory
Address
0x120
0x118
0x110
0x108
0x100
123
456
%rdi
0x120
%rsi
0x100
%rax
%rdx
swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret
45

Understanding Swap()
Registers
Memory
Address
0x120
0x118
0x110
0x108
0x100
123
456
%rdi
0x120
%rsi
0x100
%rax
123
%rdx
swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret
46

Understanding Swap()
Registers
Memory
Address
0x120
0x118
0x110
0x108
0x100
123
456
%rdi
0x120
%rsi
0x100
%rax
123
%rdx
456
swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret
47

Understanding Swap()
Registers
Memory
Address
0x120
0x118
0x110
0x108
0x100
456
456
%rdi
0x120
%rsi
0x100
%rax
123
%rdx
456
swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret
48

Understanding Swap()
Registers
Memory
Address
0x120
0x118
0x110
0x108
0x100
456
123
%rdi
0x120
%rsi
0x100
%rax
123
%rdx
456
swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret
49

Complete Memory Addressing Modes
􏰍Most General Form
» D: » Rb: » Ri:
» S:
D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+ D] C􏰄􏰎􏰊􏰇a􏰎􏰇 􏰐di􏰊􏰅lace􏰙e􏰎􏰇􏰑 1, 2, 􏰄􏰈 4 b􏰉􏰇e􏰊 (ca􏰎􏰎􏰄􏰇 be bigge􏰈) Base register: Any of 16 integer registers
Index register: Any of the 16 registers except %rsp
Scale: 1, 2, 4, or 8 (why these numbers?)
􏰍Special Cases (Rb,Ri)
D(Rb,Ri) (Rb,Ri,S)
Mem[Reg[Rb]+Reg[Ri]] Mem[Reg[Rb]+Reg[Ri]+D] Mem[Reg[Rb]+S*Reg[Ri]]
50

Address Computation Examples
%rdx
0xf000
%rcx
0x0100
Expression
Address Computation
Address
0x8(%rdx)
0xf000 + 0x8
0xf008
(%rdx,%rcx)
0xf000 + 0x100
0xf100
(%rdx,%rcx,4)
0xf000 + 4*0x100
0xf400
0x80(,%rdx,2)
2*0xf000 + 0x80
0x1e080
51

Detailed Agenda: Machine Programming I – Introduction
􏰍 History of Intel processors and architectures
􏰍 C, assembly, machine code
􏰍 Assembly Basics: Registers, operands,
move
􏰍 Arithmetic Operations
􏰍 Comparison and Test Operations
􏰍 Procedure Call Operations
􏰍 Coding Practices
52

x64 Arithmetic Operations
􏰍 Unary Operations
􏰍 Binary Operations
53

x64 Arithmetic Operations (continued)
􏰍 Shift Operations
􏰍 Special Arithmetic Operations
54

Address Computation Instruction
􏰍 leaq Src, Dst
» Src is address mode expression
» Set Dst to address calculated for src
􏰍 Uses
» Computing addresses without a memory reference
􏰏 E.g., translation of p = &x[i];
» Computing arithmetic expressions of the form x + k*y
􏰏 k = 1, 2, 4, or 8 􏰍 Example
long m12(long x) {
return x*12;
}
Converted to ASM by compiler:
leaq (%rdi,%rdi,2), %rax # t <- x+x*2 salq $2, %rax # return t<<2 55 Some Arithmetic Operations 􏰍 Two Operand Instructions: Format Computation addq Src,Dest subq Src,Dest imulq Src,Dest salq Src,Dest sarq Src,Dest shrq Src,Dest xorq Src,Dest andq Src,Dest orq Src,Dest 􏰍 Watch out for 􏰍 No distinction Dest = Dest Dest = Dest Dest = Dest Dest = Dest Dest = Dest Dest = Dest Dest = Dest Dest = Dest Dest = Dest + Src 􏰚 Src * Src << Src >> Src >> Src ^ Src & Src | Src
Also called shlq Arithmetic Logical
argument order!
between signed and unsigned int (why?)
56

Some Arithmetic Operations
􏰍 One Operand Instructions
incq Dest decq Dest negq Dest notq Dest
Dest = Dest + 1 Dest = Dest 􏰚 1 Dest = 􏰚 Dest Dest = ~Dest
􏰍 See textbook for more instructions
57

Arithmetic Expression Example
arith:
leaq (%rdi,%rsi), %rax
addq %rdx, %rax
leaq (%rsi,%rsi,2), %rdx
salq $4, %rdx
leaq 4(%rdi,%rdx), %rcx
imulq %rcx, %rax
ret
Interesting Instructions
» leaq: address computation » salq: shift
» imulq: multiplication
􏰏 But, only used once
long arith
(long x, long y, long z)
{
long t1 = x+y;
long t2 = z+t1;
long t3 = x+4;
long t4 = y * 48;
long t5 = t3 + t4;
long rval = t2 * t5;
return rval;
}
58

Understanding Arithmetic Expression Example
arith:
leaq (%rdi,%rsi), %rax
addq %rdx, %rax
leaq (%rsi,%rsi,2), %rdx
salq $4, %rdx
# t1 # t2
long arith
(long x, long y, long z)
{
long t1 = x+y;
long t2 = z+t1;
long t3 = x+4;
long t4 = y * 48;
long t5 = t3 + t4;
long rval = t2 * t5;
return rval;
}
# t4
leaq 4(%rdi,%rdx), %rcx # t5
imulq %rcx, %rax # rval
ret
Register
Use(s)
%rdi
Argument x
%rsi
Argument y
%rdx
Argument z
%rax
t1, t2, rval
%rdx
t4
%rcx
t5
59

Understanding Arithmetic Expression Example
arith:
leaq (%rdi,%rsi), %rax
addq %rdx, %rax
leaq (%rsi,%rsi,2), %rdx
salq $4, %rdx
leaq 4(%rdi,%rdx), %rcx # t5
imulq %rcx, %rax # rval
ret
# t1
# t2
long arith
(long x, long y, long z)
{
long t1 = x+y;
long t2 = z+t1;
long t3 = x+4;
long t4 = y * 48;
long t5 = t3 + t4;
long rval = t2 * t5;
return rval;
}
# t4
􏰕 Instructions in different order from C code
􏰕 Some expressions require multiple instructions
􏰕 Some instructions cover multiple expressions
60

Multiplication
􏰍 Unsigned
» form 1: imulq s, d 􏰏 d = s* d
􏰏 multiply two 64-bit operands and put the result in 64-bit operand » form 2: mulq s
􏰏 one operand is rax
􏰏 The other operand given in the instruction
􏰏 product is stored in rdx (high-order part) and rax (low order part) 􏰗full 128-bit result
􏰍 Signed
» form 1: imulq s, d
􏰏 d = s* d
􏰏 multiply two 64-bit operands and put the result in 64-bit operand » form 2: imulq s
􏰏 one operand is rax
􏰏 The other operand given in the instruction
􏰏 product is stored in rdx (high-order part) and rax (low order part) 􏰗full 128-bit result
61

Division
􏰍 Unsigned
» divq s
􏰏 Dividend given in rdx (high order) and rax (low order) 􏰏 Divisor is s
􏰏 Quotient stored in rax
􏰏 Remainder stored in rdx
􏰍 Signed
» idivq s
􏰏 Dividend given in rdx (high order) and rax (low order) 􏰏 Divisor is s
􏰏 Quotient stored in rax
􏰏 Remainder stored in rdx
62

Useful Instruction for Division
cqto
􏰍 No operands
􏰍 Takes the sign bit from rax and replicates it in rdx
63

Detailed Agenda: Machine Programming I – Introduction
􏰍 History of Intel processors and architectures
􏰍 C, assembly, machine code
􏰍 Assembly Basics: Registers, operands,
move
􏰍 Arithmetic Operations
􏰍 Comparison and Test Operations
􏰍 Procedure Call Operations
􏰍 Coding Practices
64

x64 Comparison and Test Instructions
65

x64 Accessing Condition Codes 􏰃 Conditional Set Instructions
66

x64 Accessing Condition Codes 􏰃 Jump Instructions
67

x64 Accessing Condition Codes 􏰃 Conditional Move Instructions
68

Detailed Agenda: Machine Programming I – Introduction
􏰍 History of Intel processors and architectures
􏰍 C, assembly, machine code
􏰍 Assembly Basics: Registers, operands,
move
􏰍 Arithmetic Operations
􏰍 Comparison and Test Operations
􏰍 Procedure Call Operations
􏰍 Coding Practices
69

x64 Procedure Call Instructions
70

Detailed Agenda: Machine Programming I – Introduction
􏰍 History of Intel processors and architectures
􏰍 C, assembly, machine code
􏰍 Assembly Basics: Registers, operands,
move
􏰍 Arithmetic Operations
􏰍 Comparison and Test Operations
􏰍 Procedure Call Operations
􏰍 Coding Practices
71

Commenting
􏰍 Each function you write should have a comment at the beginning describing what the function does and any arguments it accepts
􏰍 It is recommended to put comments alongside your assembly code stating what each set of instructions does (in pseudocode or some higher level language)
􏰍 Line breaks are also helpful to group statements into logical blocks for improved readability
72

Arrays
􏰍 Arrays are stored in memory as contiguous blocks of data
» Typically an array variable acts as a pointer to the first element of the array in memory
»To access a given array element, the index value is multiplied by the element size and added to the array pointer
􏰏 For instance, if arr is an array of ints, the statement: can be expressed in x86-64 as follows (assuming the address of arr is stored in %rax and the index i is stored in %rcx):
73

Stack Organization and Function Calls
􏰍 Calling a Function
» To call a function, the program should place the first six integer or pointer parameters in the registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9
» Subsequent parameters (or parameters larger than 64 bits) should be pushed onto the stack, with the first argument topmost
» The program should then execute the call instruction, which will push the return address onto the stack and jump to the start of the specified function
􏰍 Example
􏰍 If the function has a return value, it will be stored in %rax after the function call
74

Stack Organization and Function Calls (1/4)
􏰍 Writing a Function
» An x64 program uses a region of memory called the stack to support
function calls
» As the name suggests, this region is organized as a stack data structure 􏰔i􏰇h 􏰇he 􏰐􏰇􏰄􏰅􏰑 􏰄f 􏰇he 􏰊􏰇ack g􏰈􏰄􏰔i􏰎g 􏰇􏰄􏰔a􏰈d􏰊 l􏰄􏰔e􏰈 􏰙e􏰙􏰄􏰈􏰉 add􏰈e􏰊􏰊e􏰊
» For each function call, new space is created on the stack to store local variables and other data; this is known as a stack frame (to accomplish this, you will need to write some code at the beginning and end of each function to create and destroy the stack frame)
» Setting up:
􏰏 When a call instruction is executed, the address of the following instruction is pushed
onto the stack as the return address and control passes to the specified function
􏰏 If the function is going to use any of the callee-save registers (%rbx, %rbp, or %r12- r15), the current value of each should be pushed onto the stack to be restored at the end; for example:
75

Stack Organization and Function Calls (2/4)
􏰍 Writing a Function (continued)
» Setting up (continued)
􏰏 Additional space may be allocated on the stack for local variables
􏰏 While it is possible to make space on the stack as needed in a function body, it is generally more efficient to allocate this space all at once at the beginning of the function; this can be accomplished using the call subq $N, %rsp where N is the size of the callee􏰋􏰊 stack frame
􏰏 For example:
􏰏 This set-up is called the function prologue
» Using the stack frame
􏰏 Once you have set up the stack frame, you can use it to store and access local variables:
􏰕 Arguments which cannot fit in registers (e.g. structs) will be pushed onto the stack before the call instruction, and can be accessed relative to %rsp; keep in mind that you will need to take the size of the stack frame into account when referencing arguments in this manner
􏰕 If the function has more than six integer or pointer arguments, these will be pushed onto the stack as well
􏰕 For any stack arguments, the lower-numbered arguments will be closer to the stack pointer; that is, arguments are pushed on in right-to-left order when applicable
􏰕 Local variables will be stored in the space allocated in the function prologue, when some amount is subtracted from %rsp. The organization of these is up to the programmer
76

Stack Organization and Function Calls (3/4)
􏰍 Writing a Function (continued) » Cleaning up
􏰏 After the body of the function is finished and the return value (if any) is placed in %rax, the function must return control to the caller, putting the stack back in the state in which it was called with
􏰏 First, the callee frees the stack space it allocated by adding the same amount to the stack pointer:
􏰏 Then, it pops off the registers it saved earlier
􏰏 Finally, the program should return to the call site, using the ret instruction:
77

Stack Organization and Function Calls (4/4)
􏰍 Writing a Function (continued)
» Summary
􏰏 Putting it together, the code for a function should look like this:
78

Dynamic Stack Allocation (1/2)
􏰍 You may find that having a static amount of stack space for your function does not quite cut it
» In this case, we will need to borrow a tradition from 32-bit x86 and save the base of the stack frame into the base pointer register
» Since %rbp is a callee-save register, it needs to be saved before you change it; therefore, the function prologue will now be prefixed with:
» Consequently, the epilogue will contain this right before the ret:
» This can also be done with a single instruction, called leave
􏰏 The epilogue makes sure that no matter what you do to the stack pointer in the function body, you will always return it to the right place when you return
􏰏 Note that this means you no longer need to add to the stack pointer in the epilogue
79

Dynamic Stack Allocation (2/2)
􏰍 Below is an example of a function which allocates between 8-248 bytes of random stack space during its execution:
􏰍 This sort of behavior can be accessed from C code by calling pseudo-f􏰆􏰎c􏰇i􏰄􏰎􏰊 like 􏰐alloca􏰑, 􏰔hich all􏰄ca􏰇e􏰊 􏰊􏰇ack 􏰊􏰅ace according to its argument.
􏰍 More information about the stack frame and function calls can be found in the textbook
80

Agenda
1
2
3
Instructor and Course Introduction Machine-Level Programming I – Introduction Summary and Conclusion
81

Summary: Machine Programming I – Introduction
􏰍 History of Intel processors and architectures
» Evolutionary design leads to many quirks and artifacts
􏰍 C, assembly, machine code
» New forms of visible state: program counter, registers, …
» Compiler must transform statements, expressions, procedures into low-level instruction sequences
􏰍 Assembly Basics: Registers, operands, move
» The x86-64 move instructions cover wide range of data
movement forms 􏰍 Arithmetic
» C compiler will figure out different instruction combinations to carry out computation
82

Assignments & Readings
􏰍 Readings
» Slides and Handouts posted on the course web site
» Recommended Textbook: The C Programming Language (K&R) » C Programming Resources
􏰍 Lab #1
» In progress, due on 03/09/20
􏰍 Development environment
» C development environment on CIMS cluster » GitHubaccount
83

Next Session: Machine-Level Programming II – Control
84

Questions, Comments, Discussions ?
85