How to Start a Program
Compiler, Assembler, Linker, and Loader
McGill COMP273
Outline
• Compiler
• Assembler
• Linker
• Loader
Steps to Starting a Program
foo.c: C program source code
   ↓ COMPILER
foo.s: Assembly program
   ↓ ASSEMBLER
foo.o: Object (machine language module)
   ↓ LINKER  ← lib.o: Library (similar to object file)
a.out: Executable (machine language program)
   ↓ LOADER
Memory
Compiler
• Input: High-Level Language Code (e.g., C)
• Output: Assembly Language Code (e.g., MIPS)
– In MARS we use .asm as a file extension, but .s is a common file
extension for assembly in other scenarios
• Note: Output may contain pseudo-instructions
– Assembler understands these instructions, but not the machine
Compiler and Standards
• Compiler generates assembly code and directives that respect conventions
– For example, function call register conventions
– There are many more details concerning data representation and function linkage which are beyond the scope of this course
– Interested? Check COMP-520 Compiler Design
Assembler
• Reads and uses directives
• Replaces pseudo-instructions
• Produces machine code
• Creates object file
Assembler Directives
• Directives provide directions to the assembler, but do not produce machine instructions
– .text: subsequent items are put in the user text (instructions) segment
– .data: subsequent items are put in the user data segment
– .globl sym: declares sym global, allowing reference from other files
– .asciiz str: stores the string str in memory and null-terminates it
– .word w1…wn: stores n 32-bit words in successive memory locations
Pseudo-instruction Replacement
• Assembler converts MAL (MIPS Assembly Language) to TAL (True Assembly Language)
Pseudo (MAL):            Real (TAL):
subu $sp,$sp,32          addiu $sp,$sp,-32
sd $a0,32($sp)           sw $a0,32($sp)
                         sw $a1,36($sp)
addu $t0,$t6,1           addiu $t0,$t6,1
ble $t0,100,loop         slti $at,$t0,101
                         bne $at,$0,loop
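The ble replacement can be sketched in C. This is only an illustrative sketch, not how any particular assembler is written: expand_ble is a hypothetical helper, and it relies on the identity "rs <= imm" being the same test as "rs < imm + 1", which slti can perform using the assembler's reserved register $at.

```c
#include <assert.h>
#include <stdio.h>

/* Expand the pseudo-instruction `ble rs, imm, label` into two real
 * instructions, written as text into out.
 * "rs <= imm" is rewritten as "rs < imm + 1" so that slti can test it;
 * $at is the register reserved for the assembler.
 * Returns the number of real instructions emitted.
 * (Hypothetical helper, for illustration only.) */
int expand_ble(const char *rs, int imm, const char *label,
               char *out, size_t out_size)
{
    /* imm + 1 must still fit in slti's signed 16-bit immediate. */
    assert(imm >= -32768 && imm < 32767);

    snprintf(out, out_size, "slti $at,%s,%d\nbne $at,$0,%s\n",
             rs, imm + 1, label);
    return 2;
}
```

Called as expand_ble("$t0", 100, "loop", buf, sizeof buf), the buffer receives the slti/bne pair shown for ble $t0,100,loop.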
Producing Machine Language (1/2)
• Simple instructions for the assembler
– Arithmetic, logical, shifts, and so on
– All necessary information is already within the instruction
• What about branches?
– PC-relative
– Once pseudo-instructions are replaced by real ones, we know by how many instructions to branch
• So these two cases are handled easily
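As a rough illustration of why branches are easy, the offset can be computed from two addresses inside the same file. The sketch below assumes the standard MIPS rule that the offset counts instructions relative to the instruction after the branch (PC + 4); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset stored in a MIPS branch: counted in instructions (words),
 * relative to the instruction that follows the branch (PC + 4). */
int32_t branch_offset(uint32_t branch_addr, uint32_t target_addr)
{
    assert(branch_addr % 4 == 0 && target_addr % 4 == 0);
    int32_t byte_delta = (int32_t)(target_addr - (branch_addr + 4));
    return byte_delta / 4;
}

/* Assemble an I-type branch word: opcode | rs | rt | 16-bit offset. */
uint32_t encode_branch(uint32_t opcode, uint32_t rs, uint32_t rt,
                       int32_t offset)
{
    assert(offset >= -32768 && offset <= 32767);
    return (opcode << 26) | (rs << 21) | (rt << 16)
         | ((uint32_t)offset & 0xFFFFu);
}
```

For a bne at address 0x3c whose target label is at 0x18, branch_offset(0x3c, 0x18) gives -10, which matches the bne $1,$0,-10 in the worked example later in these slides. No information from other files is needed.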
Producing Machine Language (2/2)
• What about jumps (j and jal)?
– Jumps require an absolute address
• What about references to data?
– la gets broken up into lui and ori
– These require the full 32-bit address of the data
• These can’t be determined yet
– Must wait to see where this code will appear in the final program
• Two tables are used to help assembly and later resolution of addresses
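To make the la case concrete, here is a small sketch of how a 32-bit address is split across the lui and ori immediates once the address is finally known. The function name la_expand is hypothetical; the opcodes (0x0F for lui, 0x0D for ori) are the standard MIPS encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Produce the two machine words for `la rt, addr`:
 *   lui rt, upper 16 bits of addr
 *   ori rt, rt, lower 16 bits of addr
 * ori zero-extends its immediate, so no carry correction is needed. */
void la_expand(uint32_t rt, uint32_t addr, uint32_t words[2])
{
    assert(rt < 32);
    uint32_t hi = addr >> 16;
    uint32_t lo = addr & 0xFFFFu;

    words[0] = (0x0Fu << 26) | (rt << 16) | hi;              /* lui */
    words[1] = (0x0Du << 26) | (rt << 21) | (rt << 16) | lo; /* ori */
}
```

With rt = 4 ($a0) and addr = 0x10000430, the two immediates are 0x1000 and 0x0430. Until the address is known, the assembler can only leave these fields to be filled in later.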
1st Table: Symbol Table
• Symbol table: List of “items” in this file that may be used by this and other files
• What are they?
– Labels: function calling
– Data: anything in the .data section; variables which may be accessed across files
• First pass: record label-address pairs
• Second pass: produce machine code
– Result: can jump to a label later in code without first declaring it
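The two passes can be sketched as follows. This is a deliberately simplified model, not a real assembler: each source line is either a label definition or a 4-byte instruction that may name a target label, and all struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYMBOLS 16

/* One source line: a label definition, or an instruction that may
 * refer to a label (target is NULL if it refers to none). */
struct line {
    const char *label_def;
    const char *target;
};

struct symbol { const char *name; uint32_t addr; };

static struct symbol symtab[MAX_SYMBOLS];
static int nsyms;

/* Returns 1 and stores the address if the label is known, else 0. */
int lookup(const char *name, uint32_t *addr)
{
    for (int i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i].name, name) == 0) {
            *addr = symtab[i].addr;
            return 1;
        }
    }
    return 0;
}

/* Pass 1: give each instruction 4 bytes and record the address at
 * which each label is defined. */
void first_pass(const struct line *prog, int n)
{
    uint32_t addr = 0;
    nsyms = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].label_def) {
            assert(nsyms < MAX_SYMBOLS);
            symtab[nsyms].name = prog[i].label_def;
            symtab[nsyms].addr = addr;
            nsyms++;
        } else {
            addr += 4;
        }
    }
}

/* Pass 2: every local label is now known, so a reference resolves
 * even when the label is defined later in the file. Stores the
 * resolved target per instruction (0 if it has none) and returns the
 * number of instructions. */
int second_pass(const struct line *prog, int n, uint32_t *targets)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].label_def)
            continue;
        uint32_t addr = 0;
        if (prog[i].target) {
            int found = lookup(prog[i].target, &addr);
            assert(found); /* this sketch handles local labels only */
            (void)found;
        }
        targets[count++] = addr;
    }
    return count;
}
```

Because the symbol table is complete before any code is produced, the first instruction of a file can refer to a label defined on its last line.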
2nd Table: Relocation Table
• Relocation table: line numbers of “items” in this file which need the address filled in (or fixed up) later
• What are they?
– Any label jumped to: j or jal
  • Internal (i.e., label inside this file)
  • External (including library files)
– Any absolute address of a piece of data
  • Such as used by the load address (la) pseudo-instruction: la $destination,label
Object File Format
• object file header: size and position of the other pieces of the object file
• text segment: the machine code
• data segment: binary representation of the data in the source file
• relocation table: identifies lines of code that need to be “handled”
• symbol table: list of this file’s labels and data that can be referenced
• debugging information
Link Editor/Linker
• What does the link editor do?
– Combines several object (.o) files into a single executable
– Enables separate compilation of files
• Step 1: Combine text segment from each .o file
• Step 2: Combine data segment from each .o file, and concatenate this onto end of text segments
• Step 3: Resolve References
– Go through Relocation Table
– Handle each entry using the Symbol Table
  • That is, fill in all absolute addresses
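The three steps can be sketched in C under simplifying assumptions: text segments are placed back to back from a chosen start address, data segments follow immediately after, and a symbol's final address is its segment's base plus its offset within its own file. The struct and function names are hypothetical, and real executables usually place data at a separate base address.

```c
#include <assert.h>
#include <stdint.h>

/* Segment sizes of one object file, as read from its header. */
struct object {
    uint32_t text_size;
    uint32_t data_size;
};

/* Steps 1 and 2: place all text segments back to back starting at
 * text_start, then all data segments after the last text segment.
 * Fills text_base[i] and data_base[i] for each object file. */
void layout(const struct object *objs, int n, uint32_t text_start,
            uint32_t *text_base, uint32_t *data_base)
{
    uint32_t addr = text_start;
    for (int i = 0; i < n; i++) {
        assert(objs[i].text_size % 4 == 0); /* whole instructions */
        text_base[i] = addr;
        addr += objs[i].text_size;
    }
    for (int i = 0; i < n; i++) {
        data_base[i] = addr;
        addr += objs[i].data_size;
    }
}

/* Step 3 helper: absolute address of a symbol, given the base of its
 * segment in the output and its offset inside its own object file. */
uint32_t absolute_address(uint32_t segment_base, uint32_t offset_in_file)
{
    return segment_base + offset_in_file;
}
```

Once every base is known, each relocation table entry can be handled by looking up its symbol and writing the resulting absolute address into the instruction.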
Four Types of Addresses
• PC-relative addressing (beq, bne)
– never fix up (never “relocate”)
• Absolute address (j, jal)
– always relocate
• External reference (usually jal)
– always relocate
• Symbolic data reference (often lui and ori)
– always relocate
Resolving References (1/2)
• Linker assumes first word of first text segment is at address 0x00000000
• Linker knows:
– Length of each text and data segment
– Ordering of text and data segments
• Linker calculates:
– Absolute address of each label to be jumped to (internal or external) and each piece of data being referenced
Resolving References (2/2)
• To resolve references:
– Search for the reference (data or label) in all symbol tables
– If not found, search library files (for example, for printf)
– Once the absolute address is determined, fill in the machine code appropriately
• Output of linker:
– Executable file containing text and data (plus a file header)
  • May not have library object files resolved if dynamically loaded
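"Fill in the machine code appropriately" amounts to overwriting one field of an instruction word while leaving the rest intact. A minimal sketch, assuming the standard MIPS J-type and I-type layouts; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Fill the 26-bit target field of a j/jal word, keeping the opcode.
 * The field holds the word address (byte address divided by 4). */
uint32_t patch_jump(uint32_t instr, uint32_t target_addr)
{
    assert(target_addr % 4 == 0);
    return (instr & 0xFC000000u) | ((target_addr >> 2) & 0x03FFFFFFu);
}

/* Fill the 16-bit immediate of an I-type word such as lui or ori,
 * keeping the opcode and register fields. */
uint32_t patch_imm16(uint32_t instr, uint16_t imm)
{
    return (instr & 0xFFFF0000u) | imm;
}
```

For example, a jal word with an empty target field is 0x0C000000. If the linker were to determine that the callee lives at 0x004003B0 (an address chosen here for illustration), patch_jump would produce 0x0C1000EC.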
Loader (1/3)
• Executable files are stored on disk.
• When one is to be run, the loader’s job is to load it into memory and start it running.
• In reality, the loader is part of the operating system (OS)
– Loading is one of the OS tasks
Loader (2/3)
• So what does a loader do?
• Reads the executable file’s header to determine the size of the text and data segments
• Creates a new address space for the program, large enough to hold the text and data segments, along with a stack segment
• Copies instructions and data from the executable file into the new address space
Loader (3/3)
• Copies arguments passed to the program onto the stack
• Initializes machine registers
– Most registers are cleared, but the stack pointer must be initialized to the top of the stack memory space
• Jumps to a start-up routine that copies the program’s arguments from the stack to registers and sets the PC
– If the main routine returns, the start-up routine terminates the program with the exit system call
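The loader's core steps can be modeled in a few lines of C. This is a toy model under loud assumptions: the address space is a plain byte array, text is placed at offset 0 with data right after it, the stack starts at the top of the array, and argument copying is left out. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of an executable file: the header supplies the
 * segment sizes and the entry point. */
struct executable {
    uint32_t text_size;
    uint32_t data_size;
    const uint8_t *text;
    const uint8_t *data;
    uint32_t entry;          /* address of the start-up routine */
};

/* Hypothetical register state handed to the new program. */
struct cpu_state {
    uint32_t pc;
    uint32_t sp;
};

/* Copy text and data into a fresh address space (modeled as a byte
 * array) and initialize the registers. */
struct cpu_state load_program(const struct executable *exe,
                              uint8_t *memory, uint32_t memory_size)
{
    /* The space must hold text and data and leave room for a stack. */
    assert(exe->text_size + exe->data_size < memory_size);

    memset(memory, 0, memory_size);
    memcpy(memory, exe->text, exe->text_size);
    memcpy(memory + exe->text_size, exe->data, exe->data_size);

    struct cpu_state cpu;
    cpu.sp = memory_size;    /* top of the stack memory space */
    cpu.pc = exe->entry;     /* the start-up routine runs first */
    return cpu;
}
```

A real loader performs these steps inside the operating system, with a virtual address space rather than an array.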
Dynamic Linking
• Some operating systems allow “dynamic linking”
• Both the loader and the linker are part of the operating system, so modules can be linked and loaded at runtime
• If a module is needed and already loaded, there is no need to load it again
• Called DLLs in Windows (Dynamically Linked Library), .so in Unix (Shared Object)
C → Asm → Obj → Exe → Run
Compile C Source
Let us consider compilation of the following code…
#include <stdio.h>
int main (int argc, char *argv[]) {
    int i;
    int prod = 0;
    for (i = 0; i <= 100; i = i + 1) {
        prod = prod + i * i;
    }
    printf ("The sum of squares from 0 .. 100 is %d\n", prod);
}
C → Asm → Obj → Exe → Run
Identify Pseudo-instructions
.text
.align 2
.globl main
main:
    subu $sp,$sp,32
    sw $ra, 20($sp)
    sd $a0, 32($sp)
    sw $0, 24($sp)
    sw $0, 28($sp)
loop:
    lw $t6, 28($sp)
    mul $t7, $t6,$t6
    lw $t8, 24($sp)
    addu $t9,$t8,$t7
    sw $t9, 24($sp)
    addu $t0, $t6, 1
    sw $t0, 28($sp)
    ble $t0,100, loop
    la $a0, str
    lw $a1, 24($sp)
    jal printf
    move $v0, $0
    lw $ra, 20($sp)
    addiu $sp,$sp,32
    j $ra
.data
.align 0
str:
    .asciiz "The sum of squares from 0 .. 100 is %d\n"
FINE PRINT: The modification of the stack pointer may look strange, but this is ultimately from a real example of compilation... a number of the real details are being omitted here (ABI,etc.), some of which we will see later.
C → Asm → Obj → Exe → Run
Remove Pseudo-instructions, Assign Addresses
00 addiu $29,$29,-32
04 sw $31,20($29)
08 sw $4, 32($29)
0c sw $5, 36($29)
10 sw $0, 24($29)
14 sw $0, 28($29)
18 lw $14, 28($29)
1c multu $14, $14
20 mflo $15
24 lw $24, 24($29)
28 addu $25,$24,$15
2c sw $25, 24($29)
30 addiu $8,$14, 1
34 sw $8,28($29)
38 slti $1,$8, 101
3c bne $1,$0, loop
40 lui $4, l.str
44 ori $4,$4, r.str
48 lw $5,24($29)
4c jal printf
50 addu $2, $0, $0
54 lw $31,20($29)
58 addiu $29,$29,32
5c jr $31
C → Asm → Obj → Exe → Run
Symbol Table Entries
• Symbol Table
Label     Address
main:     0x00000000
loop:     0x00000018
str:      0x10000430
printf:   -
• Relocation Table
Address      Instruction/Type   Dependency
0x0000004c   jal                printf
C → Asm → Obj → Exe → Run
Edit Local Addresses
00 addiu $29,$29,-32
04 sw $31,20($29)
08 sw $4, 32($29)
0c sw $5, 36($29)
10 sw $0, 24($29)
14 sw $0, 28($29)
18 lw $14, 28($29)
1c multu $14, $14
20 mflo $15
24 lw $24, 24($29)
28 addu $25,$24,$15
2c sw $25, 24($29)
30 addiu $8,$14, 1
34 sw $8,28($29)
38 slti $1,$8, 101
3c bne $1,$0, -10
40 lui $4, 0x1000
44 ori $4,$4,0x0430
48 lw $5,24($29)
4c jal 0
50 addu $2, $0, $0
54 lw $31,20($29)
58 addiu $29,$29,32
5c jr $31
Can fix several of these labels now, while others (0x4c) are left for later
C → Asm → Obj → Exe → Run
0x000000 00100111101111011111111111100000
0x000004 10101111101111110000000000010100
0x000008 10101111101001000000000000100000
0x00000c 10101111101001010000000000100100
0x000010 10101111101000000000000000011000
0x000014 10101111101000000000000000011100
0x000018 10001111101011100000000000011100
0x00001c 00000001110011100000000000011001
0x000020 00000000000000000111100000010010
0x000024 10001111101110000000000000011000
0x000028 00000011000011111100100000100001
0x00002c 10101111101110010000000000011000
0x000030 00100101110010000000000000000001
0x000034 10101111101010000000000000011100
0x000038 00101001000000010000000001100101
0x00003c 00010100001000001111111111110110
0x000040 00111100000001000001000000000000
0x000044 00110100100001000000010000110000
0x000048 10001111101001010000000000011000
0x00004c 00001100000100000000000011101100
0x000050 00000000000000000001000000100001
0x000054 10001111101111110000000000010100
0x000058 00100111101111010000000000100000
0x00005c 00000011111000000000000000001000
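To read the listing, it can help to split a word back into its fields. The sketch below decodes an I-type word using the standard MIPS field layout (6-bit opcode, two 5-bit registers, 16-bit signed immediate); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a MIPS I-type instruction. */
struct itype {
    uint32_t opcode;
    uint32_t rs;
    uint32_t rt;
    int32_t imm;   /* sign-extended 16-bit immediate */
};

/* Split a 32-bit I-type word into its fields. */
struct itype decode_itype(uint32_t word)
{
    struct itype f;
    f.opcode = word >> 26;
    f.rs = (word >> 21) & 0x1Fu;
    f.rt = (word >> 16) & 0x1Fu;

    /* Sign-extend the low 16 bits without relying on casts. */
    f.imm = (int32_t)(word & 0xFFFFu);
    if (f.imm & 0x8000)
        f.imm -= 0x10000;
    return f;
}
```

The first word of the listing is 0x27BDFFE0 in hexadecimal. Decoding it gives opcode 9 (addiu), rs = 29, rt = 29, and immediate -32, that is, addiu $29,$29,-32.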
C → Asm → Obj → Exe → Run
• Combine with object file containing “printf”
• Edit absolute addresses
– In this case edit jal printf to contain the actual address of printf
Review
• Compiler
– converts a single HLL file into a single assembly language file
• Assembler
– removes pseudo-instructions
– converts TAL into machine language
– creates a checklist for the linker (relocation table)
– does 2 passes to resolve addresses, handling internal forward references
Review
• Linker
– combines several .o files and resolves absolute addresses
– enables separate compilation, libraries that need not be compiled
• Loader
– loads executable into memory and begins execution
• References
– Textbook 5th edition, Sections 2.12, A.2, and A.3