Roadmap
C:
car *c = malloc(sizeof(car));
c->miles = 100;
c->gals = 17;
float mpg = get_mpg(c);
free(c);
Java:
Car c = new Car();
c.setMiles(100);
c.setGals(17);
float mpg = c.getMPG();
Assembly language:
Machine code:
0111010000011000
100011010000010000000010
1000100111000010
110000011111101000011111
Computer system:
OS:
Topics: Memory & data; Arrays and Structs; Integers & floats; RISC-V assembly; Procedures & stacks; Executables; Memory & caches; Processor Pipeline; Performance; Parallelism
CMPT 295
Assembler, Compiler and Linker
From Writing to Running
sum.c (C source files) → Compiler (gcc -S) → sum.s (assembly files) → Assembler (gcc -c) → sum.o (obj files) → Linker (gcc -o) → sum (executable program; exists on disk) → loader → process (executing in memory: “It’s alive!”)
When most people say “compile” they mean the entire process: compile + assemble + link
C compiler produces assembly files (contain RISC-V assembly, pseudo-instructions, directives, etc.)
RISC-V assembler produces object files (contain RISC-V machine code, missing symbols, some layout information, etc.)
RISC-V linker produces executable file (contains RISC-V machine code, no missing symbols, some layout information)
OS loader gets it into memory and jumps to first instruction (machine code)
Compiler output is assembly files
Assembler output is obj files
Linker joins object files into one executable
Loader brings it into memory and starts execution
Example: sum.c
#include <stdio.h>
int n = 100;
int main (int argc, char* argv[ ]) {
    int i;
    int m = n;
    int sum = 0;
    for (i = 1; i <= m; i++) { sum += i; }
    printf ("Sum 1 to %d is %d\n", n, sum);
}
Example: sum.c
# Compile
[VM] riscv32-unknown-elf-gcc -S sum.c
# Assemble
[VM] riscv32-unknown-elf-gcc -c sum.s
# Link
[VM] riscv32-unknown-elf-gcc -o sum sum.o
# Load
[VM] qemu-riscv32 sum
Sum 1 to 100 is 5050
RISC-V program exits with status 0 (approx. 2007 instructions in 143000 nsec at 14.14034 MHz)
Compiler
Input: Code File (.c)
Source code
#includes, function declarations & definitions, global variables, etc.
Output: Assembly File (RISC-V)
RISC-V assembly instructions (.s file)
for (i = 1; i <= m; i++) { sum += i; }
compiles to, for example:
li x2,1
lw x3,28(fp)
slt x2,x3,x2
sum.s (abridged)
        .globl n
        .data
        .type n, @object
n:      .word 100
        .rdata
$str0:  .string "Sum 1 to %d is %d\n"
        .text
        .globl main
        .type main, @function
main:   addiu $sp,$sp,-48     # prologue
        sw $ra,44($sp)
        sw $fp,40($sp)
        move $fp,$sp
        sw $a0,-36($fp)       # $a0
        sw $a1,-40($fp)       # $a1
        la $a5,n
        lw $a5,0($a5)         # n=100
        sw $a5,-28($fp)       # m=n=100
        sw $0,-24($fp)        # sum=0
        li $a5,1
        sw $a5,-20($fp)       # i=1
$L2:    lw $a4,-20($fp)       # i=1
        lw $a5,-28($fp)       # m=100
        blt $a5,$a4,$L3       # if(m < i) 100 < 1
        lw $a4,-24($fp)       # 0(sum)
        lw $a5,-20($fp)       # 1(i)
        addu $a5,$a4,$a5      # 1=(0+1)
        sw $a5,-24($fp)       # sum=1
        lw $a5,-20($fp)       # a5=i=1
        addi $a5,$a5,1        # i=2=(1+1)
        sw $a5,-20($fp)       # i=2
        j $L2
$L3:    la $4,$str0           # str
        lw $a1,-28($fp)       # m=100
        lw $a2,-24($fp)       # sum
        jal printf            # call printf ($a0, $a1, $a2)
        li $a0,0              # main returns 0
        mv $sp,$fp            # epilogue
        lw $ra,44($sp)
        lw $fp,40($sp)
        addiu $sp,$sp,48
        jr $ra
Note: funny label names; alignment directives; .data versus .rdata; frame pointer equals stack pointer; totally unoptimized; use of LA.
Actual compiled code below
        .file "sum.c"
        .option nopic
        .text
        .globl n
        .section .sdata,"aw"
        .align 2
        .type n, @object
        .size n, 4
n:
        .word 100
        .section .rodata
        .align 2
.LC0:
        .string "Sum 1 to %d is %d\n"
        .text
        .align 1
        .globl main
        .type main, @function
main:
        addi sp,sp,-48
        sw ra,44(sp)
        sw s0,40(sp)
        addi s0,sp,48
        sw a0,-36(s0)
        sw a1,-40(s0)
        lui a5,%hi(n)
        lw a5,%lo(n)(a5)
        sw a5,-28(s0)           # m
        sw zero,-24(s0)         # sum
        li a5,1
        sw a5,-20(s0)           # i
        j .L2
.L3:
        lw a4,-24(s0)           # sum
        lw a5,-20(s0)           # i
        add a5,a4,a5            # add sum + i
        sw a5,-24(s0)           # store sum
        lw a5,-20(s0)           # i
        addi a5,a5,1            # i++
        sw a5,-20(s0)           # store i
.L2:
        lw a4,-20(s0)           # i
        lw a5,-28(s0)           # m
        ble a4,a5,.L3           # if (i <= m) branch to .L3
        lui a5,%hi(n)
        lw a5,%lo(n)(a5)
        lw a2,-24(s0)           # sum
        mv a1,a5                # n
        lui a5,%hi(.LC0)
        addi a0,a5,%lo(.LC0)    # string
        call printf
        li a5,0
        mv a0,a5
        lw ra,44(sp)
        lw s0,40(sp)
        addi sp,sp,48
        jr ra
        .size main, .-main
        .ident "GCC: (GNU) 8.2.0"
Assembler
Input: Assembly File (.s)
assembly instructions, pseudo-instructions
program data (strings, variables), layout directives
Output: Object File in binary machine code
RISC-V instructions in executable form (.o file in Unix, .obj in Windows)
addi r5, r0, 10     00000000101000000000001010010011
muli r5, r5, 2      00000000001000101000001010000000
addi r5, r5, 15     00000000111100101000001010010011
RISC-V Assembly Instructions
Arithmetic/Logical
ADD, SUB, AND, OR, XOR, SLT, SLTU
ADDI, ANDI, ORI, XORI, LUI, SLL, SRL, SLTI, SLTIU
MUL, DIV
Memory Access
LW, LH, LB, LHU, LBU, SW, SH, SB
Control flow
BEQ, BNE, BLE, BLT, BGE
JAL, JALR
Special
LR, SC, SCALL, SBREAK
Pseudo-Instructions
Assembly shorthand: technically not machine instructions, but easily converted into one or more instructions that are.
Pseudo-Insns        Actual Insns             Functionality
NOP                 ADDI x0, x0, 0           # do nothing
MV rd, rs           ADDI rd, rs, 0           # copy between regs (ADDI x2, x1, 0 copies contents of x1 to x2)
LI reg, 0x45678     LUI reg, 0x45            # load immediate (up to 32 bits)
                    ADDI reg, reg, 0x678
LA reg, label                                # load address (32 bits)
B label             BEQ x0, x0, label        # unconditional branch
+ a few more…
Program Layout
Programs consist of segments used for different purposes
Text: holds instructions
Data: holds statically allocated program data such as variables, strings, etc.
text: add x1,x2,x3; ori x2, x4, 3; ...
data: “sfu cs”, 13, 25
Assembling Programs
Assembly files consist of a mix of
+ instructions
+ pseudo-instructions
+ assembler (data/layout) directives
(Assembler lays out binary values in memory based on directives)
Assembled to an Object File
Header
Text Segment
Data Segment
Relocation Information
Symbol Table
Debugging Information
        .text
        .ent main
main:   la $4, Larray
        li $5, 15
        ...
        li $4, 0
        jal exit
        .end main
        .data
Larray: .long 51, 491, 3991
A program is made up of code and data from several object files.
Each object file is generated independently.
The assembler starts at some PC address, e.g. 0, in each object file and generates code as if the program were laid out starting at location 0x0.
It also generates a symbol table and a relocation table, in case the segments need to be moved.
Symbols and References
Global labels: Externally visible “exported” symbols
Can be referenced from other object files
Exported functions, global variables
Examples: pi, e, usrid, printf, pick_prime, pick_random
Local labels: Internally visible only
Only used within this object file
static functions, static variables, loop labels, …
Examples: randomval, is_prime
math.c
int pi = 3;
int e = 2;
static int randomval = 7;
extern int usrid;
extern int printf(char *str, …);
int square(int x) { … }
static int is_prime(int x) { … }
int pick_prime() { … }
int get_n() { return usrid; }
(extern == defined in another file)
Producing Machine Language (1/3)
Simple Cases
Arithmetic and logical instructions, shifts, etc.
All necessary info contained in the instruction
What about Branches and Jumps?
Branches and Jumps require a relative address
Once pseudo-instructions are replaced by real ones, we know by how many instructions to branch, so no problem
Producing Machine Language (2/3)
“Forward Reference” problem
Branch instructions can refer to labels that are “forward” in the program:
        or s0, x0, x0
L1:     slt t0, x0, a1
        beq t0, x0, L2
        addi a1, a1, -1
        j L1
L2:     add t1, a0, a1
This is a problem because the program is read sequentially from top to bottom.
Solution: Make two passes over the program
Two Passes Overview
Pass 1:
Expands pseudo instructions encountered
Remember position of labels
Take out comments, empty lines, etc
Error checking
Pass 2:
Use label positions to generate relative addresses (for branches and jumps)
Outputs the object file, a collection of instructions in binary code
Handling forward references
Example:
        bne x1, x2, L         # looking for L
        sll x0, x0, x0
L:      addi x2, x3, 0x2      # found L
The assembler will change this to
        bne x1, x2, +8
        sll x0, x0, x0
        addi x2, x3, 0x2
Final machine code
0x00209463 = 0000000 00010 00001 001 01000 1100011    # bne x1, x2, +8
0x00001033 = 0000000 00000 00000 001 00000 0110011    # sll x0, x0, x0
0x00218113 = 000000000010 00011 000 00010 0010011     # addi x2, x3, 0x2
Object File
Header: Size and position of pieces of file
Text Segment: instructions
Data Segment: static data (local/global vars, strings, constants)
Debugging Information: line number code address map, etc.
Symbol Table: External (exported) references; Unresolved (imported) references
Object File Formats
Unix
a.out
COFF: Common Object File Format
ELF: Executable and Linkable Format
Windows
PE: Portable Executable
All support both executable and object files
> riscv32-unknown-elf-objdump --disassemble math.o
Disassembly of section .text:
00000000 <get_n>:
0: 27bdfff8 addi sp,sp,-8
4: afbe0000 sw fp,0(sp)
8: 03a0f021 mv fp,sp
c: 3c020000 lui a0,0x0
10: 8c420008 lw a0,8(a0)
14: 03c0e821 mv sp,fp
18: 8fbe0000 lw fp,0(sp)
1c: 27bd0008 addi sp,sp,8
20: 03e00008 jr ra
elsewhere in another file: int usrid = 41;
int get_n() {
return usrid;
}
Objdump disassembly
Annotations: prologue (0x0 to 0x8), body (0xc to 0x10), epilogue (0x14 to 0x20). The zero constant loaded at 0xc stands in for the unresolved symbol usrid (see symbol table next slide).
Something suspicious: a lot of zero constants in this code.
> riscv32-unknown-elf-objdump --syms math.o
SYMBOL TABLE:
00000000 l df *ABS* 00000000 math.c
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000008 l O .data 00000004 randomval
00000060 l F .text 00000028 is_prime
00000000 l d .rodata 00000000 .rodata
00000000 l d .comment 00000000 .comment
00000000 g O .data 00000004 pi
00000004 g O .data 00000004 e
00000000 g F .text 00000028 get_n
00000028 g F .text 00000038 square
00000088 g F .text 0000004c pick_prime
00000000 *UND* 00000000 usrid
00000000 *UND* 00000000 printf
Objdump symbols
Columns: address, flags ([l]ocal or [g]lobal; [F]unction or [O]bject), segment, size, name.
Example: is_prime is a static local fn @ addr 0x60, size = 0x28 bytes.
usrid and printf are external references (undefined).
Notice: pick_prime, square, other functions; pi, e, randomval; some undefined symbols.
Separate Compilation & Assembly
sum.c, math.c (source files) → Compiler (gcc -S) → sum.s, math.s (assembly files) → Assembler (gcc -c) → sum.o, math.o (obj files) → Linker (gcc -o) → sum (executable program; exists on disk) → loader → process (executing in memory)
Small change? Recompile one module only.
http://xkcd.com/303/
C compiler produces assembly files (contain RISC-V assembly, pseudo-instructions, directives, etc.)
RISC-V assembler produces object files (contain RISC-V machine code, missing symbols, some layout information, etc.)
RISC-V linker produces executable file (contains RISC-V machine code, no missing symbols, some layout information)
OS loader gets it into memory and jumps to first instruction (machine code)
Linker (1/3)
Input: Object Code files, information tables
(e.g. foo.o,lib.o for RISC-V)
Output: Executable Code (e.g. a.out for RISC-V)
Combines several object (.o) files into a single executable (“linking”)
Enables separate compilation of files
Changes to one file do not require recompilation of whole program
Old name “Link Editor” from editing the “links” in jump and link instructions
Linker (2/3)
object file 1 (text 1, data 1, info 1) + object file 2 (text 2, data 2, info 2) → Linker → a.out (Relocated text 1, Relocated text 2, Relocated data 1, Relocated data 2)
Linker (3/3)
Take text segment from each .o file and put them together
Take data segment from each .o file, put them together, and concatenate this onto end of text segments
Resolve References
Go through Relocation Table; handle each entry
i.e. fill in all absolute addresses
Resolving References (1/2)
Linker assumes the first word of the first text segment is at 0x10000 for RV32.
More later when we study “virtual memory”
Linker knows:
Length of each text and data segment
Ordering of text and data segments
Linker calculates:
Absolute address of each label to be jumped to (internal or external) and each piece of data being referenced
Resolving References (2/2)
To resolve references:
Search for reference (data or label) in all “user” symbol tables
If not found, search library files (e.g. printf)
Once absolute address is determined, fill in the machine code appropriately
Output of linker: executable file containing text and data (plus header)
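The layout and lookup steps above can be sketched in a few lines of C. This is a simplified model, not how a real linker is written: the `symbol` and `object` structs, the function names, and the text sizes used in the usage note are all hypothetical, only text symbols are handled, and the library search is left out. The base address 0x10000 is the RV32 value quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEXT_BASE 0x10000u  /* first word of the first text segment (RV32) */

/* Hypothetical, simplified symbol-table entry: offset is relative to the
 * start of the defining object file's text segment. */
struct symbol {
    const char *name;
    uint32_t offset;
    int defined;            /* 0 means *UND* in this object file */
};

struct object {
    const char *name;
    uint32_t text_size;     /* length of this file's text segment in bytes */
    const struct symbol *syms;
    size_t nsyms;
    uint32_t text_start;    /* absolute address, filled in by layout_text */
};

/* Place the text segments one after another, starting at TEXT_BASE. */
void layout_text(struct object *objs, size_t n)
{
    assert(objs != NULL || n == 0);
    uint32_t next = TEXT_BASE;
    for (size_t i = 0; i < n; i++) {
        objs[i].text_start = next;
        next += objs[i].text_size;
    }
}

/* Search every "user" symbol table for a definition of name.
 * Returns the absolute address, or 0 if no object file defines it
 * (a real linker would search the library files next). */
uint32_t resolve(const struct object *objs, size_t n, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < objs[i].nsyms; j++) {
            const struct symbol *s = &objs[i].syms[j];
            if (s->defined && strcmp(s->name, name) == 0)
                return objs[i].text_start + s->offset;
        }
    }
    return 0;
}
```

For instance, if main.o had 0x100 bytes of text and math.o defined get_n at offset 0x20, calling `layout_text` and then `resolve(objs, 2, "get_n")` would give 0x10000 + 0x100 + 0x20 = 0x10120, the value the linker would patch into the call.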
Three Types of Addresses
PC-Relative Addressing (beq, bne, jal)
never relocate
External Function Reference (usually jal)
always relocate
Static Data Reference (often auipc and addi)
always relocate
RISC-V often uses auipc rather than lui so that a big block of stuff can be further relocated as long as it is fixed relative to the pc
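As a hedged illustration of what "relocating" an auipc and addi pair involves: auipc adds a 20-bit immediate (shifted left by 12) to the pc, and addi then adds a sign-extended 12-bit immediate, so the linker has to split the pc-relative offset into two parts that sum back to it. The sketch below shows that arithmetic only; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The two immediates that together rebuild a 32-bit pc-relative offset. */
struct hi_lo {
    int32_t hi20;   /* goes into auipc (upper 20 bits) */
    int32_t lo12;   /* goes into addi, range -2048..2047 */
};

/* Split offset so that hi20 * 4096 + lo12 == offset.
 * Because addi sign-extends its immediate, a low part of 0x800 or more
 * becomes negative and the high part is bumped up by one to compensate. */
struct hi_lo split_offset(int32_t offset)
{
    assert(offset <= INT32_MAX - 2048);   /* keep offset - lo12 in range */
    struct hi_lo r;
    /* Sign-extend the low 12 bits without relying on signed shifts. */
    r.lo12 = (int32_t)((((uint32_t)offset) & 0xFFFu) ^ 0x800u) - 0x800;
    r.hi20 = (offset - r.lo12) / 4096;    /* exact: difference is a multiple of 4096 */
    return r;
}
```

For example, an offset of 0x1800 splits into hi20 = 2 and lo12 = -2048, since 2 * 4096 - 2048 = 0x1800. The same split applies to a lui and addi pair that builds an absolute address.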
Static Libraries
Static Library: Collection of object files
(think: like a zip archive)
Q: Every program contains the entire library?!?
A: No, Linker picks only object files needed to resolve undefined references at link time
e.g. libc.a contains many objects:
printf.o, fprintf.o, vprintf.o, sprintf.o, snprintf.o, …
read.o, write.o, open.o, close.o, mkdir.o, readdir.o, …
rand.o, exit.o, sleep.o, time.o, ….
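A rough sketch of how a linker might pick only the needed members out of a static library: start from an undefined symbol, pull in the member that defines it, then repeat for whatever that member itself leaves undefined. The `member` struct, the function names, and the dependencies between members in the usage note are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

/* Hypothetical model of one object file inside an archive. */
struct member {
    const char *name;               /* e.g. "printf.o" */
    const char *defines[MAX_SYMS];  /* symbols it exports, NULL-terminated */
    const char *needs[MAX_SYMS];    /* symbols it leaves undefined, NULL-terminated */
    int selected;                   /* set once the linker pulls it in */
};

/* Find the archive member that defines sym, or NULL if none does. */
static struct member *find_definer(struct member *lib, size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < MAX_SYMS && lib[i].defines[j] != NULL; j++)
            if (strcmp(lib[i].defines[j], sym) == 0)
                return &lib[i];
    return NULL;
}

/* Pull in the member defining sym, then everything that member needs.
 * Returns the number of symbols that no member defines. */
int pull(struct member *lib, size_t n, const char *sym)
{
    assert(sym != NULL);
    struct member *m = find_definer(lib, n, sym);
    if (m == NULL)
        return 1;               /* still unresolved */
    if (m->selected)
        return 0;               /* already linked in */
    m->selected = 1;            /* mark first so cycles terminate */
    int unresolved = 0;
    for (size_t j = 0; j < MAX_SYMS && m->needs[j] != NULL; j++)
        unresolved += pull(lib, n, m->needs[j]);
    return unresolved;
}
```

With a toy library in which printf.o needs vprintf, vprintf.o needs write, and rand.o stands alone, `pull(lib, 4, "printf")` selects printf.o, vprintf.o, and write.o and leaves rand.o out of the executable.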
Linker Example: Resolving an External Fn Call
main.o
.text
…
40: 000000EF    (JAL printf, assembled as JAL ???)
44: 21035000
48: 1b80050C
4C: 8C040000
50: 21047002
54: 000000EF    (JAL get_n, assembled as JAL ???)
…
Symbol table
00 T main
00 D usrid
*UND* printf
*UND* pi
*UND* get_n
Relocation info
40,JAL, printf
…
54,JAL, get_n
math.o
.text
…
24: 21032040
28: 000000EF
2C: 1b301402
30: 00000B37
34: 00028293
…
Symbol table
20 T get_n
00 D pi
*UND* printf
*UND* usrid
Relocation info
28,JAL, printf
Unresolved references to printf and get_n
main.o
0x000000EF = 00000000000000000000 0000 1110 1111 (JAL)
0x10000B37 = 00010000000000000000 00101 0110111 # LUI x5, 0x10000
0x00428293 = 0000 0000 0100 0010 1000 0010 1001 0011 # ADDI x5, x5, 0x4
0C000000 = 0000 1100 (JAL)
21035000 = 001000 01000 00011 0101 0000 0000 0000 (ADDI $3,$8, 0x5000)
1B80050C = 000110 11100 00000 0000 0101 0000 1100 (BLEZ $28, 0x050C)
3C040000 = 001111 00000 00100 0000 0000 0000 0000 (LUI, $4, 0)
21047002 = 001000 01000 00100 0111 0000 0000 0010 (ADDI $4, $8, 0x7002)
math.o
21032040 = 001000 01000 00011 0010 0000 0100 0000 (ADDI $3,$8, 0x2040)
0C000000 = 0000 1100 (JAL)
1B301402 = 000110 11001 10000 0001 0100 0000 0010 (BLEZ $23, 0x1402)
3C040000 = 001111 000000 0100 0000 0000 0000 0000 (LUI, $4, 0)
34040000 = 001101 000000 0100 0000 0000 0000 0000 (ORI, $4, 0)
1111 f
1110 e
1101 d
1100 c
1011 b
1010 a
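The field-by-field breakdowns in these notes can be reproduced with a small encoder. The sketch below is a minimal illustration assuming the standard RV32I I-type and J-type layouts; the function names are made up for this example and only two of the instruction formats are covered.

```c
#include <assert.h>
#include <stdint.h>

/* I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode */
uint32_t encode_itype(int32_t imm, uint32_t rs1, uint32_t funct3,
                      uint32_t rd, uint32_t opcode)
{
    assert(imm >= -2048 && imm <= 2047);    /* 12-bit signed immediate */
    assert(rs1 < 32 && rd < 32 && funct3 < 8 && opcode < 128);
    return (((uint32_t)imm & 0xFFFu) << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7) | opcode;
}

/* J-type layout: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode.
 * The opcode of JAL is 1101111 (0x6F). */
uint32_t encode_jal(uint32_t rd, int32_t offset)
{
    assert(rd < 32);
    assert((offset & 1) == 0);              /* bit 0 of the offset is not encoded */
    assert(offset >= -(1 << 20) && offset < (1 << 20));
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 20) & 0x1u) << 31) | (((imm >> 1) & 0x3FFu) << 21) |
           (((imm >> 11) & 0x1u) << 20) | (((imm >> 12) & 0xFFu) << 12) |
           (rd << 7) | 0x6Fu;
}
```

Calling `encode_itype(4, 5, 0, 5, 0x13)` (ADDI x5, x5, 0x4) yields 0x00428293, and `encode_jal(1, 0)` (a JAL whose target is still zero, waiting for the linker) yields 0x000000EF, matching the words decoded above.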
Linker example, continued: printf.o joins main.o and math.o.
printf.o symbol table
3C T printf
Question
Which symbols are undefined according to both main.o’s and math.o’s symbol tables?
printf
pi
get_n
usrid
printf & pi
sum.exe
.text
0040 0000 (math): … 21032040 40023CEF 1b301402 3C041000 34040004 …
0040 0100 (main): … 40023CEF 21035000 1b80050c 8C048004 21047002 400020EF …
0040 0200 (printf): … 10201000 21040330 22500102 …
.data
1000 0000
Linker example, continued: resolving the calls.
Entry: 0040 0100
text: 0040 0000
data: 1000 0000
Order of text segments in sum.exe: math, main, printf
Resolved references: JAL get_n and JAL printf (from main.o), JAL printf (from math.o)
Global variables go here (later): the .data segment at 1000 0000
Question 2
Symbol tables so far: main.o (00 T main, 00 D usrid), math.o (20 T get_n, 00 D pi), printf.o (3C T printf)
Which 2 symbols are currently assigned the same location?
main & printf
usrid & pi
get_n & printf
main & usrid
main & pi
sum.exe
.text
0040 0000 (math): … 21032040 40023CEF 1b301402 10000B37 00428293 …
0040 0100 (main): … 40023CEF 21035000 1b80050c 8C048004 21047002 400020EF …
0040 0200 (printf): … 10201000 21040330 22500102 …
.data
1000 0000
Linker example, continued: resolving a global variable reference.
LA = LUI/ADDI “usrid” ???
Unresolved reference to usrid in math.o: the linker needs the address of the global variable.
math.o relocation info
28,JAL, printf
30,LUI, usrid
34,LA, usrid
Data segment (1000 0000)
pi: 00000003
usrid: 0077616B
Notice: usrid gets relocated due to collision with pi (both sit at offset 00 of their own data segments).
LA expands to:
LUI 10000
ADDI 004
Question
Where does the assembler place the following symbols in the object file that it creates?
A. Text Segment
B. Data Segment
C. Exported reference in symbol table
D. Imported reference in symbol table
E. None of the above
#include <stdio.h>
#include "heaplib.h"
#define HEAP_SIZE 16
static int ARR_SIZE = 4;
int main() {
    char heap[HEAP_SIZE];
    hl_init(heap, HEAP_SIZE * sizeof(char));
    char* ptr = (char *) hl_alloc(heap, ARR_SIZE * sizeof(char));
    ptr[0] = 'h';
    ptr[1] = 'i';
    ptr[2] = '\0';
    printf("%s\n", ptr);
    return 0;
}
Q1: HEAP_SIZE
Q2: ARR_SIZE
Q3: hl_init
Executable files are stored on disk
Loader
Input: Executable Code (e.g. a.out for RISC-V)
Output: (program is run)
When one is run, loader’s job is to load it into memory and start it running
In reality, loader is the operating system (OS)
loading is one of the OS tasks
Reads executable file’s header to determine size of text and data segments
Creates new address space for program large enough to hold text and data segments, along with a stack segment
Copies instructions and data from executable file into the new address space
Copies arguments passed to the program onto the stack
Initializes machine registers
Most registers cleared, but stack pointer assigned address of 1st free stack location
Jumps to start-up routine that copies program’s arguments from stack to registers and sets the PC
If main routine returns, start-up routine terminates program with the exit system call
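The loader steps listed above can be mimicked on a toy memory array. The sketch below is a deliberately simplified model and not an OS loader: the header layout, the struct and function names, and the single-string argument are all assumptions made for illustration, and the start-up routine is reduced to setting the pc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, minimal executable header. */
struct exe_header {
    uint32_t text_size;   /* bytes of instructions */
    uint32_t data_size;   /* bytes of static data */
    uint32_t entry;       /* offset of the first instruction to run */
};

/* Toy process: a flat address space plus the two registers the loader sets. */
struct process {
    uint8_t *mem;
    uint32_t mem_size;
    uint32_t pc;
    uint32_t sp;
};

/* Returns 0 on success, -1 if the image does not fit or the entry is invalid. */
int load_program(struct process *p, const struct exe_header *h,
                 const uint8_t *text, const uint8_t *data, const char *arg)
{
    assert(p != NULL && h != NULL && text != NULL && data != NULL && arg != NULL);
    uint32_t arg_len = (uint32_t)strlen(arg) + 1;

    /* Read the header to determine how much space is needed. */
    uint64_t needed = (uint64_t)h->text_size + h->data_size + arg_len;
    if (needed > p->mem_size || h->entry >= h->text_size)
        return -1;

    /* Copy instructions and data into the new address space. */
    memcpy(p->mem, text, h->text_size);
    memcpy(p->mem + h->text_size, data, h->data_size);

    /* Copy the program argument onto the stack at the top of memory;
     * the stack grows down from the stack pointer. */
    p->sp = p->mem_size - arg_len;
    memcpy(p->mem + p->sp, arg, arg_len);

    /* Set the PC to the entry point. */
    p->pc = h->entry;
    return 0;
}
```

With a 64-byte memory, 8 bytes of text, 4 bytes of data, and the argument "sum", the call leaves the data at offset 8, the argument string at offset 60, sp = 60, and pc at the entry offset from the header.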
Shared Libraries
Q: Does every program contain parts of the same library?!?
A: No, they can use shared libraries
Executables all point to a single shared library on disk
Final linking (and relocations) done by the loader
Optimizations:
Library compiled at fixed non-zero address
Jump table in each program instead of relocations
Can even patch jumps on-the-fly
Fixed address: drawbacks? advantages? (Makes linking trivial: few relocations needed.)
Static and Dynamic Linking
Static linking
Big executable files (all/most of needed libraries inside)
Don’t benefit from updates to library
No load-time linking
Dynamic linking
Small executable files (just point to shared library)
Library update benefits all programs that use it
Load-time cost to do final linking
But dll code is probably already in memory
And can do the linking incrementally, on-demand
Notes: cost of loading big executables; dll hell, versioning problems
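Incremental, on-demand linking through a jump table can be imitated in plain C with function pointers. In the sketch below, which is only an analogy for what a dynamic linker does, the table entry starts out pointing at a stub; the first call looks up the real function, patches the entry, and later calls go straight through. All names here are hypothetical, and `lib_square` stands in for code that would live in a shared library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

/* Stands in for a function that lives in a shared library. */
static int lib_square(int x) { return x * x; }

static int resolve_count = 0;   /* how many times the lookup ran */

/* Stands in for the dynamic linker's symbol lookup. */
static fn_t lookup(const char *name)
{
    resolve_count++;
    return strcmp(name, "square") == 0 ? lib_square : NULL;
}

static int square_stub(int x);

/* The program's jump table: one slot per imported function,
 * initially pointing at the stub rather than the real code. */
static fn_t jump_table[1] = { square_stub };

/* The first call lands here: resolve, patch the table, then forward the call. */
static int square_stub(int x)
{
    fn_t real = lookup("square");
    assert(real != NULL);
    jump_table[0] = real;       /* patch the jump on the fly */
    return real(x);
}

/* Callers always go through the table, never to a fixed address. */
int call_square(int x) { return jump_table[0](x); }

int resolutions(void) { return resolve_count; }
```

Calling `call_square` twice performs the lookup only once: the first call pays the load-time style cost, and the second jumps directly to the library code.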
sum.c
sum.c, math.c (C source files) → Compiler → sum.s, math.s, io.s (assembly files) → Assembler → sum.o, math.o, io.o, libc.o, libm.o (obj files) → Linker → sum.exe (executable program; exists on disk) → loader → process (executing in memory)
Summary
Compiler produces assembly files (contain RISC-V assembly, pseudo-instructions, directives, etc.)
Assembler produces object files (contain RISC-V machine code, missing symbols, some layout information, etc.)
Linker joins object files into one executable file (contains RISC-V machine code, no missing symbols, some layout information)
Loader puts program into memory, jumps to 1st insn, and starts executing a process (machine code)