CS代考 PERFORMANCE OPTIMISATION

PERFORMANCE OPTIMISATION
Overview of compilers

What is a compiler

Copyright By PowCoder代写 加微信 powcoder

• A compiler is a translator
• It translates a program from one language to another
• High Level Language •C
• C++ • F90
• Interpreted Language
• Various types of Byte-code
• Executable Language (machine code) • X86 instructions
• ARM instructions • Etc.

Compiler optimisation
• Compilation often proceeds via several stages
• Each stage generates a new representation of the program
• May be text or a binary data-structure.
• Each representation must accurately represent the
original program
• However there are frequently multiple different choices the compiler can take
• Order of instructions • Choice of instructions
• At each stage compiler may have an opportunity to optimise (make choices appropriate to the target environment)

Traditional compilers • HLL -> machine code.
processor.
• Stored as bit-patterns in memory/files.
• Do not need to run on the target machine
• May be a cross-compiler running on an entirely different type of
processor.
• Using an built-in model of the target environment
• The more information the compiler has about its target environment the better job it can do.
• Also need to tell the compiler your requirements • Performance
• Portability
• Speedofcompilation.
• Usuallybyseveraltransformations.
• Machine code is the language understood natively by

Interpreted Languages
• Many modern languages are designed to be interpreted.
• Final representation of the program are inputs for a program:
• virtual machine • Interpreter
• Various good features:
• Interpreted languages are easier to implement.
• Interpreted languages are easier to port to new environments.
• Often possible to analyse (reflection) or even modify the code representation while the program runs.
• Interpreted languages usually slower than compiled • However often fast enough

Interpreters
• Hardware cpus are also “interpreters” except:
• They are implemented as electronic circuits
• Interpreted language only contains low-level operations like
• Read memory address into register
• Add two registers
• Jump to memory location
• Software interpreters:
• Can implement high level operations
• String manipulation • Lists,Sets Maps etc.
• Programs that mostly consist of these operations perform just as well when interpreted.

Just-In-Time compilers
• JIT compilers generate executable machine code when the program runs.
• Often part of a virtual machine
• Sometimes used in combination with interpreter (JVM) • Sometimes always compiles before execution (.NET)
• JIT compilers potentially have better knowledge of the target environment (they are running in it)
• More constrained about the time and resources they can consume.

Components of a compiler
Object Code
Code Generator

• Break the source code into a stream of tokens (words)
• Language Keywords • Strings
• Comments stripped out.
• Whitespace usually ignored.
• Basic constants like numbers may be translated to binary representation.

• Parsers analyse a stream of tokens according to the
syntax rules of the input language (grammer).
• For example
• expr : number
• | expression operator expression
• | ‘(‘ expression ‘)’
37 x (3 + 2)
• Used to build an Abstract Syntax Tree representation • Usually in memory tree structure.
• Simplified to an Intermediate Representation
• Captures required behaviour dropping information only
relevant to source language.
• Interpreted languages pass IR to the interpreter

Optimisers
• Optimisation passes work on the IR
• IR is re-written
• May use same IR
• May transform to lower level IR closer to machine code.
• Optimisations are always optional transformations. • First job of the compiler is to produce a solution by a
direct transliteration of the IR.
• This is what you get if the compiler can’t perform an optimisation for any reason.
• More on this in later lectures

Code generators
• Code generators translate IR into executable machine code.
• Different code generators for different processors/ISAs
• Multiple possible translations
• Possibility for optimisation at this stage as well.
• Output is object code • Machineinstructions
• Compile time constant data
• Memoryaddressesofroutinesarenotfinalisedyet.
• Object-file – binary file containing object code/data and symbol table of uncommitted addresses.
• Assembly language – text representation of object code, can generate an object file from this using “assember”

• Application Binary Interface
• Set of conventions on how one subroutine or function is to call another
• How arguments are to be passed
• How results are to be returned
• Which registers (if any) are preserved across a call.
• Compiler only needs to know the interface signature
• Called routine does not even need to exist
• Call address is uncommitted entry in symbol table

Subroutines and functions
• By default the compiler will independently translate each subroutine and function in the program.
• Each time a subroutine or function is called there is an overhead
• Registers must be saved
• Argument registers setup according to API • Jump/call to location of subroutine
• Where the compiler knows the contents of a subroutine they may optimise by in-lining the contents to reduce this overhead.
• Each sub-routine is also always generated as stand-alone code.

• Linkers combine object files into an executable
• Resolves dependencies
• All referenced routines/data must exist in one of the object files
• Addresses are allocated to each.
• These are virtual addresses. Location will be the same for each
running instance.
• Missing addresses are patched into code based on symbol table.
• When program runs executable is mapped into process memory and processor jumps to start location.

Dynamic and Static linking
• Previous slide describes static linking.
• There is also dynamic linking.
• Dynamic libraries are loaded when the program runs • JIT linking
• Dynamic libraries are created from object files using linker
• But loaded at run-time by link-loader.
• Library is mapped into program memory
• Single set of virtual memory pages shared by all running
processes that use the library.
• Routines usually accessed via a jump-table.
• Location of the jump table statically allocated by static linker • Location of library filled in when dynamically loaded

• Simple C example
#include
double add(double a, double b){
double result = a+b; printf(“result=%g\n”,result); return result;
• Defines one symbol uses another
-bash-4.1$ cc -c example.c -bash-4.1$ nm example.o 0000000000000000 T add

Result of linking
• Linker assigns addresses
-bash-4.1$ cc -o hello hello.c example.o -bash-4.1$ nm hello 0000000000600748 d _DYNAMIC
00000000004004f0 T add
00000000004004c4 T main

printf is from dynamic library
-bash-4.1$ ldd hello
linux-vdso.so.1 => (0x00007fff32e63000)
libc.so.6 => /lib64/libc.so.6 (0x0000003d33800000) /lib64/ld-linux-x86-64.so.2 (0x0000003d33400000)

Assembly language
-bash-4.1$ cc -S example.c -bash-4.1$ cat example.s
– .section .rodata
.file “example.c”
movl $.LC0, %eax movsd -8(%rbp), %xmm0 movq %rax, %rdi
movl $1, %eax
call printf
movq -8(%rbp), %rax movq %rax, -40(%rbp) movsd -40(%rbp), %xmm0 leave
.cfi_def_cfa 7, 8
.cfi_endproc
.size add, .-add
.ident “GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-17)”
.string “result=%g\n”
.globl add
.type add, @function
add: .LFB0:
.cfi_startproc
pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16
movq %rsp, %rbp .cfi_def_cfa_register 6
%xmm0, -24(%rbp)
%xmm1, -32(%rbp)
-24(%rbp), %xmm0
-32(%rbp), %xmm0
%xmm0, -8(%rbp)

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com