OPTIMISING MEMORY STRUCTURES
• Memory Introduction
• Motivation
• Hardware view
Motivation
• Why is memory structure important?
• On current hardware, memory access has become the most significant factor limiting program performance.
• Changing memory structures can have a big impact on code performance.
• Memory structures are frequently global to the program
• Different code sections communicate via memory structures.
• The programming cost of changing a memory structure can be very high.
Programmer’s perspective:
• Memory structures are the programmer’s responsibility
• At best the compiler can add small amounts of padding in limited circumstances.
• Compilers can (and hopefully will) try to make best use of the memory structures that you specify (e.g. unimodular transformations)
• Changing the memory structures you specify may allow the compiler to generate better code.
Hardware view
• Memory structures are a feature of the programming language.
• Hardware only deals with memory addresses and registers.
• Memory is viewed as a linear sequence of bytes
• The numerical position of a byte in the sequence is its “address”
• In most modern architectures these are virtual addresses (each process has its own address space)
• The compiler has to map high level language data types onto hardware addresses.
• Most languages provide some guidance on how this should be done to allow inter-operation between different languages and compilers.
• Registers are storage locations embedded within the processor itself
• Fastest form of storage but very limited capacity
• Most modern cpus use a load/store architecture
• Separate sets of instructions that
• Transfer data between memory and registers
• Perform operations on data in registers
• Registers do not have addresses
• In assembly language they are given names
• In binary machine code they are identified by bit-fields within instructions
• Instruction set can only support limited number of registers without making instructions longer
• Typically 32 registers
• Though different classes of instruction can refer to different sets
• One important use of registers is to store memory addresses
• A typical load/store instruction specifies (see the sketch after this list):
• A source/destination register
• A base register containing the memory address
• A constant offset (limited number of bits)
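A minimal C sketch of how this surfaces in practice (the struct and function names are made up for illustration): a field access through a pointer is typically compiled to a single load whose base register holds the pointer and whose constant offset is the field offset. Exact code generation depends on compiler and ISA.

#include <stdio.h>

struct particle {
    double x, y, z;        /* fields at byte offsets 0, 8 and 16 */
};

/* Typically compiles to one load: the address of *p is held in a base
   register and the field offset (16 for z here) becomes the constant
   offset encoded in the load instruction. */
double get_z(const struct particle *p)
{
    return p->z;
}

int main(void)
{
    struct particle a = {1.0, 2.0, 3.0};
    printf("z = %f\n", get_z(&a));
    return 0;
}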
Address sizes
• Length of the address registers sets a limit on maximum amount of memory a process can reference
• 32-bit addresses ➔ 2^32 = 4 × 1024^3 bytes = 4 GiB
• 64-bit addresses ➔ 2^64 = 16 × 1024^6 bytes = 16 EiB
• This may not be the ultimate limit.
• Downside is stored addresses take up more space.
• Many ISAs (Instruction Set Architectures) come in different flavours with different address sizes.
• Sometimes the same CPU can operate in multiple modes.
• Some OSs support per-process selection.
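A small, illustrative C sketch: the size of a pointer reflects the address size of the mode the compiler is targeting.

#include <stdio.h>

int main(void)
{
    /* 4 bytes when compiled for a 32-bit target, 8 bytes for a 64-bit one */
    printf("pointer size: %zu bytes\n", sizeof(void *));
    return 0;
}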
Intrinsic types
• The CPU supports different intrinsic types:
• 16-bit integer
• 32-bit integer
• 64-bit integer
• 32-bit floating-point number
• 64-bit floating-point number
• The same registers may be used for multiple types
• Floating-point values are usually stored in a different register file to integer/address values
• Different instruction variants for the different intrinsic types
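A short C sketch showing the fixed-width types from <stdint.h> that map directly onto these intrinsic types (the float/double sizes shown in the comments are the common case, not guaranteed by the C standard):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    printf("int16_t : %zu bytes\n", sizeof(int16_t));
    printf("int32_t : %zu bytes\n", sizeof(int32_t));
    printf("int64_t : %zu bytes\n", sizeof(int64_t));
    printf("float   : %zu bytes\n", sizeof(float));   /* 32-bit FP on most platforms */
    printf("double  : %zu bytes\n", sizeof(double));  /* 64-bit FP on most platforms */
    return 0;
}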
Address alignment
• Also different load/store instructions for different sized types
• Common restriction is that the address of an intrinsic type must be a multiple of the size of the type (in bytes)
• Use the sizeof operator in C to query the size of a type (see the sketch below).
• This is the natural address alignment of the type
• Compiler will ensure this automatically
• Some CPUs will support mis-aligned loads but at a performance cost.
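A minimal sketch of querying the size and checking natural alignment in C (the variable name is arbitrary; the compiler is expected to place d on an 8-byte boundary):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    double d = 0.0;
    uintptr_t addr = (uintptr_t)&d;        /* numeric value of the address */

    printf("sizeof(double) = %zu\n", sizeof(double));
    /* Natural alignment: the address is a multiple of the type size,
       so the remainder below is expected to be 0. */
    printf("&d %% sizeof(double) = %zu\n", (size_t)(addr % sizeof(double)));
    return 0;
}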
Vector registers
• Most modern CPUs have vector extensions
• SIMD instructions that perform multiple instances of the same operation in parallel
• This includes vector load/store operations
• Load multiple words of data
• Destination/source are extra-long vector registers that hold multiple values
• Sometimes these correspond to a set of scalar registers, or they may be a distinct set of registers.
• Vector load/store instructions usually store a contiguous block of memory
• Also may only work (or work faster) if addresses are aligned with size of the vector
• This is a problem as HLL programs are written in terms of scalar types (see the aligned-allocation sketch below)
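A hedged C sketch of helping the compiler here: aligned_alloc (C11, where available) requests 32-byte alignment, matching a 256-bit vector register, which is an assumption about the target. A simple loop over contiguous, aligned data is the kind the compiler can turn into vector loads/stores.

#include <stdio.h>
#include <stdlib.h>

#define N 1024

int main(void)
{
    /* Request 32-byte aligned blocks; the size must be a multiple of the
       alignment for aligned_alloc. */
    double *a = aligned_alloc(32, N * sizeof(double));
    double *b = aligned_alloc(32, N * sizeof(double));
    if (!a || !b) return 1;

    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    /* Contiguous, aligned, unit-stride access: the easiest case for the
       compiler to vectorise into SIMD load/store and multiply instructions. */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i];

    printf("a[10] = %f\n", a[10]);
    free(a);
    free(b);
    return 0;
}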
Pointers and addresses
• Some languages (like C) expose addresses to the programmer in “pointer” types.
• In C converting a pointer to a sufficiently large integer type usually gives the address.
• Addresses of larger types (int, float, long etc) are the address of the first byte in the value. This usually has to be a multiple of the size of the type.
• Incrementing a C pointer actually adds the size of the type to the address.
long *p = &l; // pointer to long
p++; // move to next long value
// address increases by sizeof(long)
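Extending the fragment above into a complete sketch: uintptr_t (from <stdint.h>) is an integer type large enough to hold an address, so converting the pointer before and after the increment shows the address moving by sizeof(long).

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    long l = 42;
    long *p = &l;                     /* pointer to long */

    uintptr_t before = (uintptr_t)p;  /* numeric address before */
    p++;                              /* move to next long value */
    uintptr_t after = (uintptr_t)p;   /* numeric address after */

    printf("address increased by %zu bytes (sizeof(long) = %zu)\n",
           (size_t)(after - before), sizeof(long));
    return 0;
}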
Function pointers
• Machine instructions are also stored in memory.
• Every instruction also has an address
• The CPU keeps track of the currently issuing instruction in a register (the program counter)
• In C a bare function name is a constant with the value of the entry point of the function
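A small C sketch (the function name add_one is made up for illustration): the bare name yields the entry-point address, which can be stored in a function pointer and called through it.

#include <stdio.h>

int add_one(int x) { return x + 1; }

int main(void)
{
    int (*fp)(int) = add_one;   /* bare name = address of the entry point */

    /* Printing a function address via void* is common practice, though not
       strictly guaranteed by the C standard. */
    printf("add_one entry point: %p\n", (void *)fp);
    printf("fp(41) = %d\n", fp(41));
    return 0;
}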
Virtual addressing
• Each process sees its own address space.
• Virtual → physical mapping
• Mapped in blocks called “pages”
• Mappings are stored in page tables controlled by the OS
• A cache of mappings is held in the processor TLB
• Not all addresses have a defined mapping (page fault if accessed)
• OS can “swap” memory pages out to disk.
• Access to missing pages cause OS to reload page
• Performance implications
• TLB cache misses cause a large performance impact.
• A page fault that goes to disk is a very large performance hit.
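A tiny POSIX-specific sketch (assumes <unistd.h> is available) that queries the page size used for these mappings:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);   /* page size of the virtual mapping */
    printf("page size: %ld bytes\n", page);
    return 0;
}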
Stack
• A simple Last-In-First-Out data allocation mechanism is usually used by compilers to allocate variables declared local within a subroutine.
• Contiguous block of memory with register (the stack pointer) pointing to the last allocated position
• Allocate data by moving pointer down
• De-allocate by moving pointer back up.
• Data is accessed relative to stack pointer.
• Stack can be made larger by OS mapping additional memory pages onto the end.
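A hedged C sketch: comparing the address of a local variable at two call depths usually shows the deeper call at a lower address, because the stack pointer moves down as frames are allocated (the direction and exact layout are implementation-dependent).

#include <stdio.h>

static void inner(void)
{
    int local = 0;
    printf("inner local at %p\n", (void *)&local);   /* usually a lower address */
}

int main(void)
{
    int local = 0;
    printf("main  local at %p\n", (void *)&local);
    inner();
    return 0;
}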
Heap
• A more advanced memory allocator
• Usually implemented via a library
• E.g. malloc/free
• Keeps linked list of unused memory regions
• Allocation
• Search the list for a sufficiently large block
• Remove the block
• Return part of the block as allocated memory, returning any unused portion to the list
• Most heap libraries return addresses aligned for largest intrinsic type.
• De-allocation
• Return memory to the free list
• Heap grown by OS mapping new pages and adding these to free list
• Heap allocators are software.
• Behaviour can change easily
• Usually hard to predict location returned by heap allocator
• Fragmentation is a common problem
• Frequent allocation/freeing of memory blocks results in a free-list consisting of large numbers of small memory blocks
• This makes it slow (or impossible) to allocate large blocks
• Some heap allocators pre-allocate pools of commonly requested block sizes to work round this.
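A minimal C sketch of the heap interface described above: malloc hands back a block whose exact location is hard to predict, but whose address is aligned for the largest intrinsic type (checked here against 8-byte alignment, an assumption about the platform); free returns it to the free list.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    double *block = malloc(1000 * sizeof(double));   /* allocate from the heap */
    if (!block) return 1;

    printf("block at %p, 8-byte aligned: %s\n",
           (void *)block,
           ((uintptr_t)block % 8 == 0) ? "yes" : "no");

    free(block);                                     /* back to the free list */
    return 0;
}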
Code segment
• Virtual memory is mapped in pages each with a set of flags
• Execute – may contain executable code
• Read/Write
• Pages may be private to the process or shared.
• Code-segment
• region of memory where program code is mapped
• Mapping address fixed by the linker
• Pages execute enabled, usually read-only, shared between all running instances of the program
• Shared libraries mapped independently and shared between all processes that use that library
• Data-segment
• Statically allocated variables for the process
• Read/Write, private
• Mapped to fixed address similar to code.
• Heap/Stack traditionally mapped to regions above/below statically allocated regions.
• Threading makes this more complicated.
• You can map a binary file into process address space using the mmap system call.
• OS swaps to/from the file (useful for random access to large files)
• Can also be used to create anonymous memory blocks (swap to regions of the swap disk).
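A short sketch of an anonymous mapping on a POSIX/Linux system (MAP_ANONYMOUS is widely available, though not part of the oldest POSIX standards): the OS supplies zero-filled pages backed by swap rather than a file.

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;   /* one typical page */

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    ((char *)p)[0] = 'x';                  /* touch the mapping */
    printf("mapped %zu bytes at %p\n", len, p);

    munmap(p, len);                        /* remove the mapping */
    return 0;
}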