How to Start a Program
Compiler, Assembler, Linker, and Loader
McGill COMP273

Outline
• Compiler
• Assembler
• Linker
• Loader

Steps to Starting a Program
foo.c: C program source code
  ↓ COMPILER
foo.s: Assembly program
  ↓ ASSEMBLER
foo.o: Object (machine language module)
  ↓ LINKER (also takes lib.o: Library, similar to an object file)
a.out: Executable (machine language program)
  ↓ LOADER
Memory

Compiler
• Input: High-Level Language Code (e.g., C)
• Output: Assembly Language Code (e.g., MIPS)
– In MARS we use .asm as the file extension, but .s is a common extension for assembly files elsewhere
• Note: Output may contain pseudo-instructions
– The assembler understands these instructions, but the machine does not

Compiler and Standards
• Compiler generates assembly code and directives that respect conventions
– For example, function call register conventions
– There are many more details concerning data representation and function linkage which are beyond the scope of this course
– Interested? Check COMP-520 Compiler Design

Assembler
• Reads and uses directives
• Replaces pseudo-instructions
• Produces machine code
• Creates object file

Assembler Directives
• Directives provide directions to the assembler, but do not produce machine instructions
– .text: subsequent items are put in the user text (instructions) segment
– .data: subsequent items are put in the user data segment
– .globl sym: declares sym global, allowing reference from other files
– .asciiz str: stores string str in memory and null-terminates it
– .word w1…wn: stores n 32-bit words in successive memory locations
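To make the effect of these directives concrete, here is a minimal Python sketch of how an assembler might route items into segments. The function name place_items, the default segment, the big-endian byte order and the lack of alignment or escape-sequence handling are all assumptions made for this illustration, not how MARS or any real assembler is implemented.

```python
import struct

def place_items(lines):
    """Toy model: route source lines into text/data segments using directives."""
    text, data, global_syms = [], bytearray(), set()
    current = "text"                          # assumed default segment
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line == ".text":
            current = "text"                  # following items go to the text segment
        elif line == ".data":
            current = "data"                  # following items go to the data segment
        elif line.startswith(".globl"):
            global_syms.add(line.split()[1])  # symbol visible to other files
        elif line.startswith(".asciiz"):
            s = line.split(None, 1)[1].strip('"')
            data += s.encode("ascii") + b"\x00"          # null-terminated string
        elif line.startswith(".word"):
            for w in line.split(None, 1)[1].split(","):
                # each word is stored as 32 bits (byte order is an assumption)
                data += struct.pack(">I", int(w.strip(), 0) & 0xFFFFFFFF)
        elif current == "text":
            text.append(line)                 # an instruction, left unencoded here
    return text, bytes(data), global_syms
```

Note that none of the directive lines end up in the returned instruction list: they only steer where things go.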

Pseudo-instruction Replacement
• Assembler converts MAL (MIPS Assembly Language) to TAL (True Assembly Language)
Pseudo (MAL):
  subu $sp,$sp,32
  sd $a0,32($sp)
  addu $t0,$t6,1
  ble $t0,100,loop
Real (TAL):
  addiu $sp,$sp,-32
  sw $a0,32($sp)
  sw $a1,36($sp)
  addiu $t0,$t6,1
  slti $at,$t0,101
  bne $at,$0,loop
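The four replacements above can be mimicked with a small rewriting function. This is only a sketch in Python: the name expand and the naive operand parsing are assumptions for illustration, and a real assembler handles many more pseudo-instructions and operand forms.

```python
def expand(instr):
    """Expand a few pseudo-instructions (MAL) into real ones (TAL).

    Only the four cases listed above are handled; anything else is
    returned unchanged.
    """
    op, _, rest = instr.partition(" ")
    args = [a.strip() for a in rest.split(",")]
    if op == "subu" and not args[2].startswith("$"):
        # subtracting an immediate becomes adding its negation
        return [f"addiu {args[0]},{args[1]},{-int(args[2])}"]
    if op == "addu" and not args[2].startswith("$"):
        # addu with an immediate operand is really addiu
        return [f"addiu {args[0]},{args[1]},{args[2]}"]
    if op == "sd":
        # store double: two word stores of a register pair, 4 bytes apart
        reg, mem = args
        off, base = mem.rstrip(")").split("(")
        nxt = "$a" + str(int(reg[2:]) + 1)    # assumes an $aN register
        return [f"sw {reg},{off}({base})", f"sw {nxt},{int(off) + 4}({base})"]
    if op == "ble" and not args[1].startswith("$"):
        # x <= c is the same test as x < c+1; $at is the assembler temporary
        return [f"slti $at,{args[0]},{int(args[1]) + 1}",
                f"bne $at,$0,{args[2]}"]
    return [instr]
```

For example, expand("ble $t0,100,loop") yields the slti/bne pair shown in the TAL column.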

Producing Machine Language (1/2)
• Simple instructions for Assembler
– Arithmetic, Logical, Shifts, and so on
– All necessary info is within the instruction already
• What about Branches?
– PC-Relative
– Once pseudo-instructions are replaced by real ones, we know by how many instructions to branch
• So these 2 cases are handled easily
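Both easy cases can be worked through in a few lines of Python. The helper names encode_i and branch_offset are made up for this sketch, and the branch addresses in the usage note are sample values; the field layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) is the standard MIPS I-format.

```python
ADDIU, BNE = 0x09, 0x05       # standard MIPS opcodes for addiu and bne

def encode_i(opcode, rs, rt, imm):
    """Pack an I-format instruction; the immediate is truncated to 16 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def branch_offset(branch_addr, target_addr):
    """PC-relative offset, counted in instructions from the one after the branch."""
    return (target_addr - (branch_addr + 4)) // 4

# addiu $29,$29,-32 needs nothing beyond its own operands
word = encode_i(ADDIU, 29, 29, -32)
print(f"{word:032b}")

# a bne at address 0x3c that targets a label at 0x18
offset = branch_offset(0x3c, 0x18)
print(offset, f"{encode_i(BNE, 1, 0, offset):032b}")
```

The offset depends only on the distance between the branch and its target, which is why it stays valid wherever the code is finally placed in memory.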

Producing Machine Language (2/2)
• What about jumps (j and jal)?
– Jumps require absolute address
• What about references to data?
– la gets broken up into lui and ori
– These will require the full 32-bit address of the data
• These can’t be determined yet
– Must wait to see where this code will appear in final program
• Two tables are used to help assembly and later resolution of addresses
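Once the 32-bit address of the data is finally known, splitting it between lui and ori is simple arithmetic. A small Python sketch follows; the helper names and the sample address 0x10000430 are assumptions chosen for illustration.

```python
def split_address(addr):
    """Return (upper 16 bits, lower 16 bits) of a 32-bit address."""
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF

def expand_la(reg, addr):
    """Expand 'la reg, addr' into a lui/ori pair for a known address."""
    hi, lo = split_address(addr)
    # lui fills the upper half of the register, ori fills in the lower half
    return [f"lui {reg},0x{hi:04x}", f"ori {reg},{reg},0x{lo:04x}"]

print(expand_la("$4", 0x10000430))
```

Until the final address is known, both halves have to be left as placeholders, which is exactly why these instructions need later fix-up.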

1st Table: Symbol Table
• Symbol table: List of “items” in this file that may be used by this and other files
• What are they?
– Labels: function calling
– Data: anything in the .data section; variables which may be accessed across files
• First Pass: record label-address pairs
• Second Pass: produce machine code
– Result: can jump to a label later in code without first declaring it
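The two passes can be sketched as follows. This Python toy assumes the input is already TAL, that labels sit on their own lines ending in a colon, and that every instruction is 4 bytes; the function names are invented for the example.

```python
def first_pass(lines, base=0):
    """Pass 1: record label -> address pairs."""
    symbols, addr = {}, base
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            symbols[line[:-1]] = addr     # a label names the next instruction
        elif line:
            addr += 4                     # every instruction occupies one word
    return symbols

def second_pass(lines, symbols, base=0):
    """Pass 2: emit (address, instruction), replacing branch labels by offsets."""
    out, addr = [], base
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        op, _, rest = line.partition(" ")
        args = [a.strip() for a in rest.split(",")]
        if op in ("beq", "bne") and args[-1] in symbols:
            # PC-relative: distance in instructions from the next instruction
            args[-1] = str((symbols[args[-1]] - (addr + 4)) // 4)
        out.append((addr, f"{op} {','.join(args)}"))
        addr += 4
    return out

prog = ["start:", "beq $0,$0,done", "addiu $8,$8,1", "done:", "jr $31"]
print(second_pass(prog, first_pass(prog)))
```

The branch to done works even though done appears after the branch, because the first pass has already recorded its address.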

2nd Table: Relocation Table
• Relocation Table: line numbers of “items” in this file which need the address filled in (or fixed up) later.
• What are they?
– Any label jumped to: j or jal
  • Internal (i.e., label inside this file)
  • External (including lib files)
– Any absolute address of a piece of data
  • Such as used by the load address (la) pseudo-instruction: la $destination,label
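As a rough sketch of how such a table could be collected, the Python below scans TAL instructions and records the ones that still carry a symbolic absolute address. The entry layout (address, type, dependency), the function names and the l.msg / r.msg placeholder spelling are assumptions for illustration only.

```python
def is_symbol(operand):
    """True when the operand is a name rather than a number."""
    try:
        int(operand, 0)
        return False
    except ValueError:
        return True

def build_relocation_table(instructions):
    """Collect instructions whose absolute address must be filled in later.

    `instructions` is a list of (address, text) pairs.
    """
    table = []
    for addr, text in instructions:
        op, _, rest = text.partition(" ")
        target = rest.split(",")[-1].strip()
        if target.startswith("$") or not is_symbol(target):
            continue                      # register or plain number: nothing to fix
        if op in ("j", "jal", "lui", "ori"):
            # jumps and the two halves of an expanded la need absolute addresses
            table.append({"address": addr, "type": op, "dependency": target})
        # beq/bne are PC-relative, so they are never recorded
    return table

code = [(0x00, "beq $8,$0,skip"), (0x04, "lui $4,l.msg"),
        (0x08, "ori $4,$4,r.msg"), (0x0c, "jal helper")]
for entry in build_relocation_table(code):
    print(entry)
```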

Object File Format
• object file header: size and position of the other pieces of the object file
• text segment: the machine code
• data segment: binary representation of the data in the source file
• relocation table: identifies lines of code that need to be “handled”
• symbol table: list of this file’s labels and data that can be referenced
• debugging information
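One way to picture these pieces is as fields of a single record. The Python dataclass below is a simplified model invented for this illustration; real formats such as ELF are considerably more detailed, and the back-to-back layout in header() is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    text: bytes = b""                                 # the machine code
    data: bytes = b""                                 # binary form of the source file's data
    relocations: list = field(default_factory=list)   # places needing an address fix-up
    symbols: dict = field(default_factory=dict)       # labels/data others may reference
    debug: dict = field(default_factory=dict)         # debugging information

    def header(self):
        """Size and position of the text and data pieces, laid out back to back."""
        return {
            "text": {"offset": 0, "size": len(self.text)},
            "data": {"offset": len(self.text), "size": len(self.data)},
        }

obj = ObjectFile(text=bytes(8), data=b"hi\x00", symbols={"main": 0})
print(obj.header())
```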

Link Editor/Linker
• What does the link editor do?
– Combines several object (.o) files into a single executable
• Enables separate compilation of files

Link Editor/Linker
• Step 1: Combine text segment from each .o file
• Step 2: Combine data segment from each .o file, and concatenate this onto end of text segments
• Step 3: Resolve References
– Go through Relocation Table
– Handle each entry using the Symbol Table
  • That is, fill in all absolute addresses
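The three steps can be sketched on a toy object-file model. In the Python below, each object is a dictionary whose keys (text, data, symbols, relocs) are invented for this example, instructions stay as strings, and every item is assumed to occupy 4 bytes; it illustrates the idea and is not a real linker.

```python
def link(objects, text_base=0x00000000):
    """Combine toy object files and resolve their relocation entries.

    Each object: {"text": [instr, ...], "data": [word, ...],
                  "symbols": {name: (segment, index)},
                  "relocs": [(text index, symbol name), ...]}
    """
    # Steps 1 and 2: all text segments first, then all data segments.
    text_start, data_start, addr = [], [], text_base
    for obj in objects:
        text_start.append(addr)
        addr += 4 * len(obj["text"])
    for obj in objects:
        data_start.append(addr)
        addr += 4 * len(obj["data"])

    # Merge the symbol tables, now with absolute addresses.
    table = {}
    for i, obj in enumerate(objects):
        for name, (seg, idx) in obj["symbols"].items():
            start = text_start[i] if seg == "text" else data_start[i]
            table[name] = start + 4 * idx

    # Step 3: go through each relocation table and fill in addresses.
    image = []
    for obj in objects:
        text = list(obj["text"])
        for idx, name in obj["relocs"]:
            # an undefined symbol raises KeyError here
            text[idx] = text[idx].replace(name, f"0x{table[name]:08x}")
        image.extend(text)
    for obj in objects:
        image.extend(obj["data"])
    return image, table

main_o = {"text": ["jal helper", "jr $31"], "data": [7],
          "symbols": {"main": ("text", 0)}, "relocs": [(0, "helper")]}
lib_o = {"text": ["jr $31"], "data": [],
         "symbols": {"helper": ("text", 0)}, "relocs": []}
print(link([main_o, lib_o]))
```

Here helper lives in a different file from its caller, so its address only becomes known once both text segments have been laid out.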

Four Types of Addresses
• PC-Relative Addressing (beq, bne)
– never fix up (never “relocate”)
• Absolute Address (j, jal)
– always relocate
• External Reference (usually jal)
– always relocate
• Symbolic Data Reference (often lui and ori)
– always relocate

Resolving References (1/2)
• Linker assumes first word of first text segment is at address 0x00000000
• Linker knows:
– Length of each text and data segment
– Ordering of text and data segments
• Linker calculates:
– Absolute address of each label to be jumped to (internal or external) and each piece of data being referenced

Resolving References (2/2)
• To resolve references:
– Search for the reference (data or label) in all symbol tables
– If not found, search library files (for example, for printf)
– Once the absolute address is determined, fill in the machine code appropriately
• Output of linker:
– Executable file containing text and data (plus a file header)
– May not have library object files resolved if dynamically loaded
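"Filling in the machine code" for a jump means writing the target into the 26-bit address field of the j or jal word. The Python sketch below shows that step; the helper name patch_jal and the sample target address 0x004003B0 are assumptions chosen for illustration.

```python
JAL_PLACEHOLDER = 0x0C000000      # jal with opcode 000011 and an empty target field

def patch_jal(word, target_addr):
    """Fill the 26-bit target field of a j/jal instruction word.

    The field stores the target address divided by 4, since instructions
    are word-aligned; the top 4 address bits come from the PC at run time.
    """
    return (word & 0xFC000000) | ((target_addr >> 2) & 0x03FFFFFF)

patched = patch_jal(JAL_PLACEHOLDER, 0x004003B0)
print(f"{patched:032b}")
```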

Loader (1/3)
• Executable files are stored on disk.
• When one is to be run, the loader’s job is to load it into memory and start it running.
• In practice, the loader is part of the operating system (OS)
– Loading is one of the OS tasks

Loader (2/3)
• So what does a loader do?
• Reads the executable file’s header to determine the size of the text and data segments
• Creates a new address space for the program, large enough to hold the text and data segments, along with a stack segment
• Copies instructions and data from the executable file into the new address space

Loader (3/3)
• Copies arguments passed to the program onto the stack
• Initializes machine registers
– Most registers are cleared, but the stack pointer must be initialized to the top of the stack memory space
• Jumps to a start-up routine that copies the program’s arguments from the stack to registers and sets the PC
– If the main routine returns, the start-up routine terminates the program with the exit system call
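The loader's steps can be simulated on a toy machine. In the Python sketch below, the executable layout (a dictionary with header, text and data), the memory size and the placement of text at address 0 with data right after it are all assumptions for illustration; a real OS loader works with virtual memory and a defined executable format.

```python
def load(executable, args, mem_size=1 << 16):
    """Toy loader: build an address space, copy segments, set up stack and registers."""
    hdr = executable["header"]                          # read the header
    memory = bytearray(mem_size)                        # new address space
    memory[0:hdr["text_size"]] = executable["text"]     # copy instructions
    data_at = hdr["text_size"]
    memory[data_at:data_at + hdr["data_size"]] = executable["data"]  # copy data

    sp = mem_size                                       # stack grows down from the top
    for arg in reversed(args):                          # copy arguments onto the stack
        raw = arg.encode("ascii") + b"\x00"
        sp -= len(raw)
        memory[sp:sp + len(raw)] = raw
    sp &= ~3                                            # keep the stack pointer word-aligned

    registers = {f"${i}": 0 for i in range(32)}         # clear the registers
    registers["$29"] = sp                               # except the stack pointer ($sp)
    pc = hdr["entry"]                                   # where the start-up routine begins
    return memory, registers, pc

exe = {"header": {"text_size": 4, "data_size": 2, "entry": 0},
       "text": b"\x03\xe0\x00\x08",                     # jr $31
       "data": b"hi"}
memory, registers, pc = load(exe, ["prog", "x"])
print(registers["$29"], pc)
```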

Dynamic Linking
• Some operating systems allow “dynamic linking”
• Both the loader and the linker are part of the operating system – so modules can be linked and loaded at runtime
• If a module is needed and already loaded, there is no need to load it again
• Called DLLs (Dynamically Linked Libraries) in Windows, .so (Shared Objects) in Unix

C → Asm → Obj → Exe → Run: Compile C Source
Let us consider compilation of the following code…
#include <stdio.h>
int main (int argc, char *argv[]) {
    int i;
    int prod = 0;
    for (i = 0; i <= 100; i = i + 1) { prod = prod + i * i; }
    printf ("The sum of squares from 0 .. 100 is %d\n", prod);
}

C → Asm → Obj → Exe → Run: Identify Pseudo-instructions
      .text
      .align 2
      .globl main
main:
      subu $sp,$sp,32
      sw $ra, 20($sp)
      sd $a0, 32($sp)
      sw $0, 24($sp)
      sw $0, 28($sp)
loop:
      lw $t6, 28($sp)
      mul $t7, $t6,$t6
      lw $t8, 24($sp)
      addu $t9,$t8,$t7
      sw $t9, 24($sp)
      addu $t0, $t6, 1
      sw $t0, 28($sp)
      ble $t0,100, loop
      la $a0, str
      lw $a1, 24($sp)
      jal printf
      move $v0, $0
      lw $ra, 20($sp)
      addiu $sp,$sp,32
      j $ra
      .data
      .align 0
str:
      .asciiz "The sum of squares from 0 .. 100 is %d\n"
FINE PRINT: The modification of the stack pointer may look strange, but this is ultimately from a real example of compilation. A number of the real details are being omitted here (ABI, etc.), some of which we will see later.

C → Asm → Obj → Exe → Run: Remove Pseudoinstructions, Assign Addresses
00 addiu $29,$29,-32
04 sw $31,20($29)
08 sw $4, 32($29)
0c sw $5, 36($29)
10 sw $0, 24($29)
14 sw $0, 28($29)
18 lw $14, 28($29)
1c multu $14, $14
20 mflo $15
24 lw $24, 24($29)
28 addu $25,$24,$15
2c sw $25, 24($29)
30 addiu $8,$14, 1
34 sw $8,28($29)
38 slti $1,$8, 101
3c bne $1,$0, loop
40 lui $4, l.str
44 ori $4,$4, r.str
48 lw $5,24($29)
4c jal printf
50 addu $2, $0, $0
54 lw $31,20($29)
58 addiu $29,$29,32
5c jr $31

C → Asm → Obj → Exe → Run: Symbol Table Entries
• Symbol Table
  Label     Address
  main:     0x00000000
  loop:     0x00000018
  str:      0x10000430
  printf:   -
• Relocation Table
  Address      Instruction/Type   Dependency
  0x0000004c   jal                printf

C → Asm → Obj → Exe → Run: Edit Local Addresses
00 addiu $29,$29,-32
04 sw $31,20($29)
08 sw $4, 32($29)
0c sw $5, 36($29)
10 sw $0, 24($29)
14 sw $0, 28($29)
18 lw $14, 28($29)
1c multu $14, $14
20 mflo $15
24 lw $24, 24($29)
28 addu $25,$24,$15
2c sw $25, 24($29)
30 addiu $8,$14, 1
34 sw $8,28($29)
38 slti $1,$8, 101
3c bne $1,$0, -10
40 lui $4, 0x1000
44 ori $4,$4,0x0430
48 lw $5,24($29)
4c jal 0
50 addu $2, $0, $0
54 lw $31,20($29)
58 addiu $29,$29,32
5c jr $31
Can fix several of these labels now, while others (0x4c) are left for later

C → Asm → Obj → Exe → Run
0x000000 00100111101111011111111111100000
0x000004 10101111101111110000000000010100
0x000008 10101111101001000000000000100000
0x00000c 10101111101001010000000000100100
0x000010 10101111101000000000000000011000
0x000014 10101111101000000000000000011100
0x000018 10001111101011100000000000011100
0x00001c 00000001110011100000000000011001
0x000020 00000000000000000111100000010010
0x000024 10001111101110000000000000011000
0x000028 00000011000011111100100000100001
0x00002c 10101111101110010000000000011000
0x000030 00100101110010000000000000000001
0x000034 10101111101010000000000000011100
0x000038 00101001000000010000000001100101
0x00003c 00010100001000001111111111110110
0x000040 00111100000001000001000000000000
0x000044 00110100100001000000010000110000
0x000048 10001111101001010000000000011000
0x00004c 00001100000100000000000011101100
0x000050 00000000000000000001000000100001
0x000054 10001111101111110000000000010100
0x000058 00100111101111010000000000100000
0x00005c 00000011111000000000000000001000

C → Asm → Obj → Exe → Run
• Combine with object file containing “printf”
• Edit absolute addresses
– In this case edit jal printf to contain actual address of printf

Review
• Compiler
– converts a single HLL file into a single assembly language file
• Assembler
– removes pseudo-instructions
– converts TAL into machine language
– creates a checklist for the linker (relocation table)
– does 2 passes to resolve addresses, handling internal forward references
• Linker
– combines several .o files and resolves absolute addresses
– enables separate compilation, libraries that need not be compiled
• Loader
– loads executable into memory and begins execution
• References
– Textbook 5th edition, Section 2.12, A.2 and A.3