10. Memory and the OS
The Memory Hierarchy
A larger memory (more bytes) is inherently slower than a smaller one:
– All else being equal, it is physically larger, so signals must traverse longer wires;
– When addresses are bigger, address decoding needs larger circuits (more delay).
– Note: address decoding selects a memory word (location) given the value on the address bus.
We can identify a hierarchy of memory levels in a computer. Each level is much faster, and much smaller, than the next.
1. CPU registers are based on fast bi-stable electronic components called flip-flops. Each of these uses 6 transistors to store a bit. This technology, called static RAM (SRAM), is volatile.
2. Cache RAM (see later) is also static RAM and volatile.
3. Primary memory typically uses a different technology called dynamic RAM (DRAM), which needs just 1 transistor to store a bit. This allows bigger memories on the same chip area, but DRAM is perhaps 10x slower than SRAM. Also volatile.
4. Secondary memory. Two technologies are in common use: spinning magnetic hard disks and solid-state disks (SSDs), which are faster but smaller. Non-volatile.
5. Tape (magnetic surface on a reeled tape). Found only in server-type installations; used for archiving.
Cache Memory
• Modern CPUs invariably have a small amount of fast static RAM called a cache memory, usually placed physically next to the CPU itself (often on the same chip).
• The cache is much faster (~10x) but much smaller than primary memory.
– E.g. typical desktop might have 16GB of primary memory but only 12MB of cache.
• Cache can be instruction only or instruction and data.
• A data or instruction word fetched from primary memory for the first time is also automatically stored in the cache.
• Often instructions and data that have been used once are used again soon after.
– This is called locality of reference.
– If items are in the cache they can be retrieved faster than from primary memory.
• Hardware accesses the cache automatically when primary memory is referenced.
– If the item is in the cache (a cache hit), the primary memory access never takes place (fast).
– If the item is not in the cache (a cache miss), primary memory is accessed anyway (slower).
• The cache has limited space, so something must be thrown out when it is full. Various replacement strategies exist: e.g. the least recently used (LRU) item is evicted (see the sketch below).
• Note: writes (stores) do not benefit so obviously from caching and many caches are designed only to speed up reads.
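As a minimal sketch (not from the slides), the effect of a cache on average read time and the LRU idea can be illustrated in Python; the hit rate and the 10 ns / 1 ns timings are illustrative assumptions, chosen to match the exercise at the end of this section.

# Minimal sketch (illustrative assumptions): average read time with a cache, plus a tiny LRU cache.
from collections import OrderedDict

CACHE_NS, MEMORY_NS = 1, 10

def average_read_time(hit_rate):
    # On a hit only the cache is accessed; on a miss primary memory is accessed as well.
    return hit_rate * CACHE_NS + (1 - hit_rate) * (CACHE_NS + MEMORY_NS)

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()            # address -> word, ordered by recency of use

    def read(self, address, fetch_from_memory):
        if address in self.entries:             # cache hit
            self.entries.move_to_end(address)
            return self.entries[address]
        word = fetch_from_memory(address)       # cache miss: primary memory accessed anyway
        self.entries[address] = word
        if len(self.entries) > self.capacity:   # full: evict the least recently used item
            self.entries.popitem(last=False)
        return word

print(average_read_time(0.95))                  # 95% hit rate -> 1.5 ns on average
cache = LRUCache(capacity=2)
print(cache.read(0x100, lambda a: "word@%x" % a))   # miss: fetched from primary memory
print(cache.read(0x100, lambda a: "word@%x" % a))   # hit: served from the cache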
The Memory Management Unit
The MMU intercepts the address bus as it leaves the CPU (the MMU is usually on the same chip as the CPU).
It examines the addresses output by the CPU (i.e. by the program) and translates them.
– The addresses sent out by the CPU are called logical addresses (what the programmer wants).
– The addresses sent out by the MMU are physical addresses (real addresses).
The MMU is informed by the CPU which process is currently running.
– Each process has its own translation table set up by the OS and loaded into the MMU.
– E.g. logical address 0 ($0000) maps onto a different physical address for Process 1 than Process 2
The MMU's logical-to-physical translations keep the physical addresses used by different processes apart (sketched below).
The MMU is connected to the data bus so that the CPU can write to it to reconfigure it (e.g. to change process information).
– This can only be done when the OS is running.
– Only the OS can alter the behaviour of the MMU.
Only the logical addresses actually used by a process are translated; the OS works out which these are for each process.
If a process tries to access a logical address which cannot be translated, this is a memory fault.
– This is signalled to the CPU on a dedicated wire (H/L).
[Figure: the CPU sends logical addresses over the address bus to the MMU, which sends physical addresses on to physical memory; the MMU is also attached to the data bus, so the CPU can write new configuration data to it, and to a memory fault line back to the CPU.]
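A minimal sketch (invented table contents and process names, not from the slides) of the per-process translation idea: the same logical address maps to different physical addresses depending on which process is running.

# Minimal sketch (invented values): per-process address translation, one table per process.
translation_tables = {
    "process1": {0x0000: 0x4000},                # logical address -> physical address
    "process2": {0x0000: 0x9000},
}

def mmu_translate(current_process, logical_address):
    table = translation_tables[current_process]  # table loaded into the MMU by the OS
    if logical_address not in table:
        raise MemoryError("memory fault: address cannot be translated")
    return table[logical_address]

print(hex(mmu_translate("process1", 0x0000)))    # 0x4000
print(hex(mmu_translate("process2", 0x0000)))    # 0x9000: same logical address, different physical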
The Working Set
• What if several concurrent processes have images that will not all fit in physical primary memory at the same time?
• Most (though not all) programs spend most of their time using a small portion of the address space available to them.
– Programs have lots of loops and often operate repeatedly on a small group of variables and data structures. Often <1% of the address space accounts for ~99% of a program’s recent accesses. This is called the working set.
– The working set changes over time, but usually not quickly (see the sketch below).
• Idea: keep the working-set items in primary (physical) memory for all running processes, and keep inactive code and data on disk.
– Primary memory content is stored on disk in blocks called pages (~2–4KB). Pages are retrieved from disk as needed.
• In a virtual memory system, when a process needs a code or data item not in primary memory, MMU detects this.
– Signals memory fault to CPU via memory fault line.
– The process is suspended while the OS organises the transfer of the whole page containing the missing item from disk to memory using disk I/O.
– If memory is full, the least-used page(s) will be written to disk to free space.
– The transfer is transparent to the process (though there is a performance hit).
– A process can seem to have a full address space even when there is not enough physical memory for this.
[Figure: Process 1's address space (1 GByte), Process 2's address space (1.5 GBytes) and the OS's address space (256 MBytes) all mapped into 2 GBytes of physical memory.]
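A minimal sketch (invented page-reference stream) of the working-set idea: the set of distinct pages a process has touched in its last N references is typically a small fraction of its address space and changes slowly.

# Minimal sketch (invented reference stream): the working set is the set of distinct
# pages touched in the last N memory references.
WINDOW = 8

def working_set(page_references, window=WINDOW):
    return set(page_references[-window:])

refs = [3, 3, 4, 3, 4, 4, 3, 5, 3, 4, 3, 4]      # a loop touching only a handful of pages
print(working_set(refs))                          # {3, 4, 5}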
Exceptions
• Interrupts and memory faults have much in common. The idea can be generalised as follows.
• An exception is a hardware or software event which causes the CPU to stop executing the current process and jump to an exception handler routine, usually in the OS.
• In addition to interrupts and memory faults, other familiar exceptions include system reset and trap instructions.
• Every exception has its own handler. Since this is in the OS, exceptions force a control change to the OS. Traps are used to request OS services (including I/O).
• When the handler is finished, the suspended process may resume if the CPU state has been saved.
– However, other processes may be allowed to run first
– Some exceptions (e.g. Reset, some memory faults) do not allow a resume and abort any running process.
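A minimal sketch (invented handler names, not from the slides) of the general pattern: each exception type has its own OS handler routine, and only some exceptions allow the suspended process to resume.

# Minimal sketch (invented names): dispatching exceptions to OS handler routines.
def handle_interrupt(saved_state):      # a device needs attention; process may resume
    return "resume"

def handle_memory_fault(saved_state):   # OS fetches the missing page, then the process resumes
    return "resume"

def handle_trap(saved_state):           # the process is requesting an OS service (e.g. I/O)
    return "resume"

def handle_reset(saved_state):          # no resume is possible
    return "abort"

HANDLERS = {"interrupt": handle_interrupt, "memory fault": handle_memory_fault,
            "trap": handle_trap, "reset": handle_reset}

def raise_exception(kind, saved_state):
    # Control passes to the OS: the handler runs, then the process resumes or is aborted.
    return HANDLERS[kind](saved_state)

print(raise_exception("trap", saved_state={}))    # "resume"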
User and System Mode
• Hardware support is needed to control access to the MMU
– At all times, the processor is either in user mode (U-mode) or in system mode (S-mode)
– The instructions that access the MMU are called privileged instructions (only run in S-mode)
• To implement this, the CPU contains a flag bit (called the CPU mode bit) which indicates the current mode. This is usually in the status register.
• The hardware executes a privileged instruction as follows:
– If system mode, perform the instruction
– If user mode, cause a privilege exception which transfers control immediately to the OS
• It is essential to ensure that the OS can execute its key portions in S-mode.
• User processes cannot switch the CPU to S-mode without passing control to the OS.
• An exception automatically flips the CPU to S-mode; the OS controls all handler routines.
• A privileged instruction is used to set U-mode before a user process is run.
• Every operation that might be dangerous must be executed in S-mode (includes all I/O).
• To perform I/O, a program builds a request: a data structure that describes the operation wanted.
– Then the program executes a TRAP, which switches to S-mode.
– The OS takes control from your program and examines the request you made.
– If the request is legitimate, the OS executes privileged instructions to perform the action (see the sketch below).
– If not, the OS refuses the action!
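A minimal sketch (invented request format and device names) of the TRAP-based protocol described above: the user program only describes the I/O it wants; the OS, now in S-mode, checks the request and either performs or refuses it.

# Minimal sketch (invented request format): requesting I/O via a trap.
ALLOWED_DEVICES = {"console", "disk0"}            # assumption: devices this program may use

def perform_privileged_io(request):
    # Stand-in for the privileged device access the OS performs on the program's behalf.
    print(f"I/O on {request['device']}: {request['operation']} {request['nbytes']} bytes")

def os_trap_handler(request):
    # The CPU is now in S-mode; only code here may execute privileged instructions.
    if request.get("device") not in ALLOWED_DEVICES:
        return "refused"                          # illegitimate request: the OS refuses the action
    perform_privileged_io(request)
    return "done"

# User program (U-mode): build a request, then 'trap' into the OS.
request = {"device": "console", "operation": "write", "nbytes": 12}
print(os_trap_handler(request))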
Files
• Secondary memory is much larger but slower than primary.
• An SSD in a modern desktop might be 1TB and a magnetic disk 10TB, but the SSD is much faster (100x faster to access data randomly, 10x faster to transfer it).
• Data in secondary memory is stored in files.
• The MMU and system/user modes give protection to information in memory while a program is running.
• Data that is stored permanently in files needs file protection, which is enforced by the operating system
• File access is essentially a specialised form of I/O
– User programs are absolutely prohibited from performing any I/O.
– To do I/O, a user program sends a request to the operating system.
– The OS checks the request to determine whether the user is the owner of the file; if the request is OK, the OS performs the actual machine-language instructions to do the I/O (see the sketch below).
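A minimal sketch (invented file names and owners) of the ownership check the OS might make before carrying out a file I/O request.

# Minimal sketch (invented names): OS-side file protection check before doing the I/O.
FILE_OWNERS = {"/home/ann/notes.txt": "ann", "/home/bob/data.csv": "bob"}

def os_file_request(user, path, operation):
    if FILE_OWNERS.get(path) != user:
        return "refused: not the owner"           # file protection enforced by the OS
    # ...the OS would now run the privileged instructions that actually do the I/O...
    return f"{operation} on {path} performed for {user}"

print(os_file_request("ann", "/home/ann/notes.txt", "read"))
print(os_file_request("ann", "/home/bob/data.csv", "read"))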
Hard Disks
• Hard disks are so-called secondary memory and are not directly addressable by the CPU.
– Instead, disks are treated as I/O devices and accessed via a special I/O interface called a disk controller.
• Electromechanical: data is written on circular tracks on a rotating disk. Each track is divided into sectors (usually 512 bytes).
• To access data on a disk, the read head must first be moved to the correct track. This is called seeking; the time taken is the seek time and, because of the mechanical nature, it is slow.
– Then wait for the rotation of the disk to bring the requested sector under the head (rotational latency). Modern desktop disks spin at up to 7200 rpm.
– At this point data can be streamed rapidly at a much faster data transfer rate. In a modern disk, rates of > 200MBytes/sec are possible.
• A large modern disk is 16TB (16 × 10^12 bytes) in size, with an access time < 5ms.
• That is about 1,000 times bigger than RAM, 1,000,000 times slower on access (perhaps 100 times slower on sustained transfer).
• Solid-state disks (SSDs) are faster but lower capacity (4TB is big) than mechanical drives.
– Access times < 50 µs
– Data transfer rate (DTR) about 10x that of a mechanical disk (still much slower than RAM)
[Figure: a disk platter. Tracks are the concentric circles, each divided into sectors.]
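As a rough worked example using the figures quoted above (approximate, for illustration only): the mechanical delays are measured in milliseconds, while transferring one page of data takes only microseconds.

# Rough worked example using the figures quoted above (approximate, for illustration).
rpm         = 7200
latency_ms  = (60_000 / rpm) / 2        # average rotational latency: half a revolution ≈ 4.2 ms
transfer_ms = 4096 / 200e6 * 1000       # moving one 4 KB page at 200 MBytes/sec ≈ 0.02 ms

# Mechanical delays (milliseconds) dwarf the actual data transfer (microseconds), which is
# why random access is ~1,000,000x slower than RAM but sustained transfer only ~100x slower.
print(f"rotational latency ≈ {latency_ms:.2f} ms, 4 KB transfer ≈ {transfer_ms*1000:.0f} µs")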
Appendix: Paging I
• The virtual memory system is organised and controlled by the MMU.
– The logical address space of a process is also known as its virtual address space.
– An address sent out by a process (program) is a virtual address.
– Every process has its own virtual address space.
• The virtual space of each process is divided into fixed-size contiguous chunks called pages (~2 Kbytes or 4 Kbytes each).
– Each virtual address divides into two fields: the most significant bits are a page number, the least significant bits an offset into the page (see the sketch below).
• Physical address space (real primary memory of the computer) is divided into blocks called page frames each able to hold one page.
– Some pages for a process need not be in primary memory even when that process is running
– When page is loaded into memory it is placed in a free frame.
• OS (and MMU) must keep track of which page is in which frame (or where on disk)
– Maintain page table of mappings to page frames or disk storage.
– Each process needs a page table of its own.
[Figure: a virtual address split into a page number (most significant bits) and an offset (least significant bits); the MMU maps the page into a page frame.]
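A minimal sketch of how a virtual address splits into page number and offset, assuming 4 KByte pages (i.e. a 12-bit offset, an assumption rather than a value fixed by the slides).

# Minimal sketch: splitting a virtual address, assuming 4 KByte (2**12 byte) pages.
OFFSET_BITS = 12
PAGE_SIZE   = 1 << OFFSET_BITS                   # 4096 bytes

def split(virtual_address):
    page_number = virtual_address >> OFFSET_BITS          # most significant bits
    offset      = virtual_address & (PAGE_SIZE - 1)       # least significant bits
    return page_number, offset

print(split(0x3A7F2))                            # (58, 2034): page 0x3A, offset 0x7F2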
The Page Table
[Figure: a page table with one entry per page, indicating for each page whether it is currently held in a memory page frame or out on disk; the page table gives the location of each page.]
Paging II
• The MMU examines the page number in every virtual address and consults the page table. There are two possibilities:
1. The page is in primary memory. The MMU translates the virtual address to a physical one by replacing its page number with the number of the page frame containing it. This dynamic address translation is done by the MMU hardware and is so fast that the memory cycle is not usually slowed at all.
2. The page is not in primary memory and must be copied from disk before it can be accessed. This is a page fault.
• On a page fault, the MMU sends a hardware memory fault signal to the CPU, which aborts the memory cycle and jumps to a memory fault handler routine in the OS (see the sketch at the end of this section).
– The handler routine transfers the required data from disk to memory and then allows the process to resume.
– This is very like an interrupt, but memory faults occur in the middle of instructions; the CPU must save all of its state (not just the programmer's model) if the process is to resume.
• By keeping virtual address spaces separate, the MMU protects one process from another.
– However, if desired a page can be shared between two processes.
– The OS can manipulate and bypass the MMU. It can therefore access anything.
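A minimal sketch (invented page-table contents, 4 KByte pages assumed as before) of dynamic address translation with a page-fault path; a real MMU performs the in-memory case entirely in hardware, and the OS routine named here is hypothetical.

# Minimal sketch (invented contents): page-table lookup with a page-fault path.
OFFSET_BITS = 12
PAGE_SIZE   = 1 << OFFSET_BITS

page_table = {                  # page number -> ("memory", frame number) or ("disk", disk location)
    0: ("memory", 5),
    1: ("disk", 1234),
}

def os_load_page_from_disk(page, disk_location):
    # Hypothetical OS handler: copy the page into a free frame (evicting a least-used
    # page if memory is full) and return the frame number chosen.
    return 9

def translate(virtual_address):
    page, offset = virtual_address >> OFFSET_BITS, virtual_address & (PAGE_SIZE - 1)
    where, location = page_table[page]
    if where == "disk":                                    # page fault
        location = os_load_page_from_disk(page, location)
        page_table[page] = ("memory", location)
    return (location << OFFSET_BITS) | offset              # physical address

print(hex(translate(0x0123)))   # page 0 is in frame 5 -> 0x5123
print(hex(translate(0x1456)))   # page 1 faults, is loaded into frame 9 -> 0x9456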
Unassessed Exercise
• Consider the instruction sequence:
lea R15,10[R0]
lea R1,1[R0]
lea R2,0[R0]
loop load R4,array[R2]
add R4,R4,R4
store R4,array[R2]
add R2,R1,R2
cmplt R10,R2,R15
jumpt R10,loop[R0]
trap R0,R0,R0
– What does this loop do?
– How many memory cycles would this loop involve?
– If a primary memory read or write takes 10ns, how long will this program take to execute with no cache? (You may assume that an instruction takes exactly as long as the time to conduct all its memory cycles).
– If a cache is introduced with an access time of 1ns, how many cache hits are there?
– If the cache read or write time is 1ns, how long will the program take to execute?