Introduction to Computer Systems 15-213/18-243, spring 2009

CSE 2421
Linking and Relocation

Required Reading: Computer Systems: A Programmer’s Perspective, 3rd Edition

Chapter 7 through 7.6.3 (inclusive)

Reminder – C Compilation Workflow
Option 1: Complete all stages of compilation
% gcc -o hello hello.c

Option 2: Complete 1st three phases first:
– Preprocessor: .c to .i
– Compiler: .i to .s
– Assembler phase: .s to .o
.o is a “relocatable” object file
% gcc -c hello.c
Produces a .o file with unresolved references to symbols

Then, complete the linker phase afterwards:
% gcc hello.o -o hello
Produces an executable by resolving references to any symbols


What are linking and relocation?
Linking is the process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed (that is, an executable).

Relocation is the process of adjusting addresses in object modules when the modules are linked with other modules to create an executable.

Why should I care?
Will help you build large programs
Will help with missing modules/linker error resolution
Will help you avoid “dangerous programming errors”
Should you choose to use global variables
Will help you understand language scoping
Will help you understand important system concepts you will see next semester
Virtual memory/paging/memory mapping (Systems II)
Will help you exploit shared libraries

Linking can be done:
At compile time
gcc command
At load time
When an executable loads into main memory
At run time
While an executable is running from main memory
How do we (actually, the operating system) decide?
What makes the most sense with respect to what is being linked and how it’s being used?
Is the code a library function? Individual program?

Related OS concepts
When a process is running, it enhances security if the address space of the process is divided into parts that are only known to the OS:
Read only space:
Read only data (e.g. format strings used with printf or scanf in C) and
Code (i.e., instructions)
Read-write space: data which can be both read and written.

Therefore, when the linker (part of gcc) does linking and relocation, it divides the address space of the executable into these parts.

Example C Program
main.c:

int sum(int *a, int n);

int array[2] = {1, 2};

int main()
{
    int val = sum(array, 2);
    return val;
}

sum.c:

extern int array[];

int sum(int *a, int n)
{
    int i, s = 0;

    for (i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

In main.c, array is a global symbol. In sum.c, the extern declaration refers to it as an external symbol.
Linker knows nothing of the local variables (val, i, and s)

Static Linking
Programs are translated and linked using a compiler driver:
linux> gcc -Og -o prog main.c sum.c
linux> ./prog

[Figure: the source files main.c and sum.c each pass through the translators (cpp, cc1, as) to produce the separately compiled relocatable object files main.o and sum.o. The linker (ld) combines them into prog, a fully linked executable object file that contains code and data for all functions defined in main.c and sum.c.]

Why Linkers?
Reason 1: Modularity

Program can be written as a collection of smaller source files, rather than one monolithic mass.

Can build libraries of common functions (more on this later)
e.g., Math library, standard C library

Why Linkers? (cont)
Reason 2: Efficiency
Time: Separate compilation
Change one source file, compile, and then relink.
No need to recompile other source files.
Consider the function of makefiles…

Space: Libraries
Common functions can be aggregated into a single file…
Yet executable files and running memory images contain only code for the functions they actually use.

What Do Linkers Do?
Step 1: Symbol resolution

Programs define and reference symbols (global variables and functions):
void swap() {…} /* define symbol swap */
swap(); /* reference symbol swap */
int *xp = &x; /* define symbol xp, reference x */

Symbol definitions are stored in object file (by assembler) in a symbol table.
Symbol table is an array of structs
Each entry includes name, size, and location of symbol among other things.

During symbol resolution step, the linker associates each symbol reference with exactly one symbol definition.

What Do Linkers Do? (cont)
Step 2: Relocation

Merges separate code and data sections into single sections

Relocates symbols from their relative locations in the .o files to their final absolute memory locations in the executable.

Updates all references to these symbols to reflect their new positions.

Let’s look at these two steps in more detail….

Three Kinds of Object Files (Modules)
Relocatable object file (.o file)
Contains code and data in a form that can be combined with other relocatable object files to form executable object file.
Each .o file is produced from exactly one source (.c) file

Executable object file (a.out file)
Contains code and data in a form that can be copied directly into memory and then executed.

Shared object file (.so file)
Special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run-time.
Called Dynamic Link Libraries (DLLs) by Windows

Executable and Linkable Format (ELF)
Standard binary format for object files

One unified format for
Relocatable object files (.o),
Executable object files (a.out)
Shared object files (.so)

Generic name: ELF binaries

Object File Format/Organization –
ELF Object File Format (used in Unix/Linux)
The object file formats provide parallel views of a file’s contents, reflecting the differing needs of the linker and the loader

ELF header (Executable and Linkable Format)
-Resides at the beginning and holds a “road map” describing the file’s organization.

Program (or Segment) header table
-Tells the system how to create a process image
-Object files used to build a process image (that is, executables, which the loader reads) must have a program header table; relocatable files do not need one.
-Object files used for linking must have a section header table (because it has location and size information for each section); executable object files do not need one.

http://www.sco.com/developers/gabi/2000-07-17/ch4.intro.html
http://docs.oracle.com/cd/E19455-01/806-3773/6jct9o0bs/index.html
http://docs.oracle.com/cd/E19082-01/819-0690/chapter6-46512/index.html

Object File Format/Organization (cont)
Section header table
-Contains information describing the file’s sections
-Every section has an entry in the table
-Each entry gives information such as the section name, the section size (needed to compute address information), and so on.

Sections
-Hold the bulk of object file information for the linking view: instructions, data, symbol table, relocation information, etc.
-Object files used during linking must have a section header table; other object files may or may not have one.

ELF Object File Format
Elf header
Word size, byte ordering, file type (.o, exec, .so), machine type, etc.
Segment header table
Page size, virtual addresses of memory segments (sections), segment sizes.
.text section
Code
.rodata section
Read only data: jump tables, …
.data section
Initialized global variables
.bss section
Uninitialized global variables
“Block Started by Symbol”
Better Save Space
Has section header but occupies no space

[Figure: layout of an ELF object file, starting at offset 0: ELF header; segment header table (required for executables); .text section; .rodata section; .data section; .bss section; .symtab section; .rel.text section; .rel.data section; .debug section; section header table.]

ELF Object File Format (cont.)
.symtab section
Symbol table
Procedure and static variable names
Section names and locations
.rel.text section
Relocation info for .text section
Addresses of instructions that will need to be modified in the executable during relocation step
.rel.data section
Relocation info for .data section
Addresses of pointer data that will need to be modified in the merged executable
.debug section
Info for symbolic debugging (gcc -g)
Section header table
Offsets and sizes of each section

Linker Symbols
Global symbols
Symbols defined by module m that can be referenced by other modules.
E.g.: non-static C functions and non-static global variables. (external linkage)

External symbols
Global symbols that are referenced by module m but defined by some other module.

Local symbols
Symbols that are defined and referenced exclusively by module m.
E.g.: C functions and global variables defined with the static attribute. (internal linkage)
Local linker symbols are not local program variables

Step 1: Symbol Resolution
main.c:

int sum(int *a, int n);

int array[2] = {1, 2};

int main()
{
    int val = sum(array, 2);
    return val;
}

sum.c:

int sum(int *a, int n)
{
    int i, s = 0;

    for (i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

main.c references the global sum, which is defined in sum.c
main.c defines the global array and references it in the call sum(array, 2)
Linker knows nothing of val
Linker knows nothing of i or s

Local Symbols
Local non-static C variables vs. local static C variables
local non-static C variables: stored on the stack
local static C variables: stored in either .bss or .data

int f()
{
    static int x = 0;
    return x;
}

int g()
{
    static int x = 1;
    return x;
}

Compiler allocates space in .data or .bss for each definition of x
Variables in .bss take up no space in the object file; their space is allocated at execution time
Creates local symbols in the symbol table with unique names, e.g., x.1 and x.2 or, perhaps, x.f and x.g

How Linker Resolves Duplicate Symbol Definitions
Program symbols are either strong or weak
Strong: procedures and initialized globals
Weak: uninitialized globals

p1.c:
int foo=5;    (strong)
p1() { }      (strong)

p2.c:
int foo;      (weak)
p2() { }      (strong)

Linker’s Symbol Rules
Rule 1: Multiple strong symbols are not allowed
Each item can be defined only once
Otherwise: Linker error

Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol
References to the weak symbol resolve to the strong symbol

Rule 3: If there are multiple weak symbols, pick an arbitrary one.

Linker Puzzles
Module 1: int x; p1() {}
Module 2: p1() {}
Link time error: two strong symbols (p1)

Module 1: int x; p1() {}
Module 2: int x; p2() {}
References to x will refer to the same uninitialized int. Is this what you really want?

Module 1: int x; int y; p1() {}
Module 2: double x; p2() {}
Writes to x in p2 might overwrite y!

Module 1: int x=7; int y=5; p1() {}
Module 2: double x; p2() {}
Writes to x in p2 will overwrite y!

Module 1: int x=7; p1() {}
Module 2: int x; p2() {}
References to x will refer to the same initialized variable.

Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules.
Global Variables
Avoid if you can

Otherwise
Use static if you can
Initialize if you define a global variable
Use extern if you reference an external global variable

Step 2: Relocation
Relocation merges the input modules and assigns run-time addresses to each symbol
When an assembler generates an object module, it does not know where the code and data will ultimately be stored in main memory or the locations of any externally defined functions or global variables referenced by the module
A “relocation entry” is generated when the assembler encounters a reference to a data object, function, or jump label whose ultimate location is unknown
2 basic types (names as used on x86-64)
R_X86_64_PC32
PC-relative relocation (for labels in jump instructions; on x86-64, call targets are normally PC-relative as well)
R_X86_64_32 / R_X86_64_64
Absolute relocation (for example, for references to data in the .data section), using a 32-bit or 64-bit address
A PC relative “address” is not an address at all! It is a displacement which is added to the current PC (the address of the next instruction) to get the address of the target. Jump instructions use PC relative addressing.
Absolute relocation, which is used to relocate addresses for data in the .data section, stores the actual address of the symbol.

Static linking – What do linkers do?
Step 2. Relocation
-Merges separate code and data sections into single sections
-Take the code section from each of the relocatable object files, main.o and sum.o, and merge them into a single code section.
-Take the .rodata sections from each of the relocatable object files, and merge them into a single .rodata section.
-Take the .data sections from each of the relocatable object files, and merge them into a single .data section.
-Take the .bss (uninitialized file scope variables) sections from individual relocatable object files, and merge them into a single .bss section
-Relocates symbols from their relative locations in the .o files to their final absolute memory locations in the executable.
-Updates all references to these symbols (i.e., any encoded instructions which have the addresses of these symbols) to reflect their new positions.

Relocation
[Figure: the .text and .data sections of the relocatable object files (system code and system data; main() and int array[2]={1,2} from main.o; sum() from sum.o) are merged into the executable object file. Starting at offset 0, the executable holds the headers, a combined .text section (system code, main(), sum(), more system code), a combined .data section (system data, int array[2]={1,2}), and then .symtab and .debug.]

Packaging Commonly Used Functions
How to package functions commonly used by programmers?
Math, I/O, memory management, string manipulation, etc.

Awkward, given the linker framework so far:
Option 1: Put all functions into a single source file
Programmers link big object file into their programs
Space and time inefficient
Option 2: Put each function in a separate source file
Programmers explicitly link appropriate binaries into their programs
More efficient, but burdensome on the programmer

Old-fashioned Solution: Static Libraries
(You may still have to work with these)
Static libraries (.a archive files)
Concatenate related relocatable object files into a single file with an index (called an archive).
Enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives.
If an archive member file resolves reference, link it into the executable.

Creating Static Libraries
[Figure: atoi.c, printf.c, ..., random.c each pass through a translator to produce atoi.o, printf.o, ..., random.o. The archiver (ar) combines these object files into libc.a, the C standard library.]

unix> ar rs libc.a \
atoi.o printf.o … random.o
Archiver allows incremental updates
Recompile function that changes and replace .o file in archive.

Commonly Used Libraries
libc.a (the C standard library)
4.6 MB archive of 1496 object files.
I/O, memory allocation, signal handling, string handling, date and time, random numbers, integer math
libm.a (the C math library)
2 MB archive of 444 object files.
floating point math (sin, cos, tan, log, exp, sqrt, …)

% ar -t libc.a | sort
…
fork.o
…
fprintf.o
fpu_control.o
fputc.o
freopen.o
fscanf.o
fseek.o
fstab.o
…
% ar -t libm.a | sort
…
e_acos.o
e_acosf.o
e_acosh.o
e_acoshf.o
e_acoshl.o
e_acosl.o
e_asin.o
e_asinf.o
e_asinl.o
…

Linking with Static Libraries
main2.c:

#include <stdio.h>
#include "vector.h"

int x[2] = {1, 2};
int y[2] = {3, 4};
int z[2];

int main()
{
    addvec(x, y, z, 2);
    printf("z = [%d %d]\n",
           z[0], z[1]);
    return 0;
}
addvec.c:

void addvec(int *x, int *y,
            int *z, int n)
{
    int i;

    for (i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

multvec.c:

void multvec(int *x, int *y,
             int *z, int n)
{
    int i;

    for (i = 0; i < n; i++)
        z[i] = x[i] * y[i];
}

addvec.c and multvec.c are packaged into libvector.a

Linking with Static Libraries
[Figure: main2.c and vector.h pass through the translators (cpp, cc1, as) to produce the relocatable object file main2.o. The archiver (ar) builds the static library libvector.a from addvec.o and multvec.o. The linker (ld) combines main2.o with addvec.o from libvector.a and with printf.o (and any other modules called by printf.o) from libc.a to produce prog2c, a fully linked executable object file. The “c” stands for “compile-time”.]

Using Static Libraries
Linker’s algorithm for resolving external references:
Scan .o files and .a files in the command line order.
During the scan, keep a list of the current unresolved references.
As each new .o or .a file, obj, is encountered, try to resolve each unresolved reference in the list against the symbols defined in obj.
If any entries in the unresolved list at end of scan, then error.

Problem:
Command line order matters!
Moral: put libraries at the end of the command line.
A real pain in the backside if there is a circular dependency

unix> gcc -L. libtest.o -lmine
unix> gcc -L. -lmine libtest.o
libtest.o: In function `main':
libtest.o(.text+0x4): undefined reference to `libfun'
Modern Solution: Shared Libraries
Static libraries have the following disadvantages:
Duplication in the stored executables (nearly every program needs libc)
Duplication in the running executables
Minor bug fixes of system libraries require each application to explicitly relink (and sometimes restart)

Modern solution: Shared Libraries
Object files that contain code and data that are loaded and linked into an application dynamically, at either load-time or run-time
Also called: dynamic link libraries, DLLs, .so files

Shared Libraries (cont.)
Dynamic linking can occur when executable is first loaded and run (load-time linking).
Common case for Linux, handled automatically by the dynamic linker (ld-linux.so).
Standard C library (libc.so) usually dynamically linked.
Dynamic linking can also occur after program has begun
(run-time linking).
In Linux, this is done by calls to the dlopen() interface.
Distributing software.
High-performance web servers.
Runtime library interpositioning.
No explicit requirement to recompile/relink after a library function update
Shared library routines can be shared by multiple processes.
Think of all running processes using the same spot in memory for the printf() code
More on this when you learn about virtual memory in Systems II

Dynamic Linking at Load-time

[Figure: main2.c and vector.h pass through the translators (cpp, cc1, as) to produce the relocatable object file main2.o. The linker (ld) combines main2.o with relocation and symbol table info from libc.so and libvector.so to produce prog2l, a partially linked executable object file. At load time the loader (execve) starts the dynamic linker (ld-linux.so), which brings in the code and data of libc.so and libvector.so to form the fully linked executable in memory.]

unix> gcc -shared -o libvector.so addvec.c multvec.c

Linking Summary
Linking is a technique that allows programs to be constructed from multiple object files.

Linking can happen at different times in a program’s lifetime:
Compile time (when a program is compiled)
Load time (when a program is loaded into memory)
Run time (while a program is executing)

Understanding linking can help you avoid nasty errors and make you a better programmer.
