KIT308/408 (Advanced) Multicore Architecture and Programming
Basic CPU Architecture
Dr. Ian Lewis
Discipline of ICT, School of TED
University of Tasmania, Australia
Purpose of Lecture
Introduce the topics that will be covered in the general computer architecture part of this course
Show the big picture: how all of the topics relate and fit together
Give some idea of the type of concepts you will be expected to understand
We need to understand these architectures before we can understand why multicore architectures are (currently) essential, and the different designs of GPUs
Introduce (or refresh memories of) low-level concepts and terminology
Core Components
Central Processing Unit (CPU)
Processes data
Executes a list of instructions (programs)
Major focus of performance improvements
Memory (RAM)
Holds the information that the CPU processes
Programs and data are both stored here
Connections
For transfer of data between CPU and RAM
[Diagram: CPU connected to RAM via the Memory Bus]
Cache
As we’ve seen, main memory technology has improved at a slower rate than CPU technology
The CPU runs at a higher speed than main memory
Another component is added between memory and the CPU to speed up transfers of data between the two
Cache stores recently used memory locations, in the hope they’ll be accessed again, along with nearby memory locations that may be accessed soon
[Diagram: CPU, Cache, Memory Bus and RAM, with the cache sitting between the CPU and RAM]
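To make "recently used" and "nearby" locations concrete, here is a toy cache model. It is a sketch under made-up assumptions and not how any particular CPU organises its cache: a fully associative cache of 4 lines, 8 consecutive addresses per line, and least-recently-used replacement. The names `LINE_SIZE`, `NUM_LINES` and `count_hits` are hypothetical.

```python
from collections import OrderedDict

LINE_SIZE = 8   # hypothetical: consecutive addresses held by one cache line
NUM_LINES = 4   # hypothetical: number of lines the cache can hold


def count_hits(addresses, line_size=LINE_SIZE, num_lines=NUM_LINES):
    """Return (hits, misses) for a sequence of memory addresses."""
    cache = OrderedDict()  # line number -> True, ordered least to most recent
    hits = misses = 0
    for addr in addresses:
        line = addr // line_size           # which line the address falls in
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1                    # whole line fetched from RAM
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits, misses


print(count_hits(list(range(64))))                   # (56, 8): nearby addresses
print(count_hits([i * LINE_SIZE for i in range(64)]))  # (0, 64): new line each time
print(count_hits(list(range(16)) * 2))               # (30, 2): same data used again
```

The first run benefits from fetching whole lines (spatial locality), the third from reusing recently fetched lines (temporal locality); the strided run gets no help from the cache at all.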
Cache
Something we are going to revisit over and over and over
Multiple levels of cache
Level 1 (L1) and Level 2 (L2) cache (and Level 3 (L3))
Split cache for pipelined / superscalar architectures
Data cache and instruction cache
Shared cache for multi-core architectures
See the importance of cache coherency
(We’ll likely take a brief diversion to see how the Cell multicore design (mostly) does away with caches)
Finally we’ll see the approach to caches used by GPUs (pretty similar to multicore CPUs)
PC Organisation (circa 2010)
Multiple cores on the CPU
Each with its own L1 caches
Shared L2 cache still integrated into the CPU
Memory controller integrated into the CPU
PCIe (with enough lanes) is fast enough to handle video
[Diagram: CPU (Cores …, shared L2 Cache, Memory Controller) connected via a Memory Switch to RAM, and via an I/O Controller and the Input/Output Bus to PCIe (Video, Network, …), SATA (Harddrive, CD drive, …) and USB (Keyboard, Mouse, Printer)]
Our Focus
The computer architecture part of this unit focuses on CPU design
Simple CPUs, then pipelining, then superscalar (and hyperthreading), then multicore, then GPUs
Requires a look at RAM/Cache, MMUs, virtual memory, etc.
Everything “below” the memory bus is not our concern
If you want to learn more about I/O buses, storage devices, etc.
Er, there’s probably an Engineering unit for this 😉
CPU Major Internal Components
Control unit
Runs user programs
Has an interface to memory to obtain the next instructions to execute
Status Registers
Control unit both writes these and reads them
Used to make decisions for branching
Registers
Current working data and addresses
ALU
The “workhorse”
Responsible for all calculations
ALU and Registers may both access main store directly
Not all architectures allow the ALU to read/write from main store
[Diagram: Central Processing Unit containing the Control Unit, Status Registers, Registers and Arithmetic Logic Unit, connected to memory through a Memory interface]
Basic CPU Function
Treats numbers in memory as instructions
Executes these instructions, performing whatever actions they specify
CPU runs instructions in sequence (1, 2, 3, 4, …) unless explicitly told otherwise
Some early architectures didn’t do this
CPU must have a sufficiently varied set of instructions for programs to be constructed
Normal instruction execution is like following a recipe:
1. Get flour
2. Get eggs
3. Break eggs into bowl
4. Sift flour into bowl
5. Mix for one minute
6. If lumpy, go back to step 5
7. Put in oven
8. Watch TV
9. If burnt, go back to step 1
10. Eat
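The recipe can be sketched as a loop around a program counter: after each step the counter moves on to the next step, unless a conditional step decides to jump back. This is a minimal sketch that assumes the outcomes of the "lumpy" and "burnt" checks are supplied up front; `run_recipe` and its arguments are hypothetical.

```python
def run_recipe(program, outcomes):
    """Return the order (1-based step numbers) in which steps execute.

    program:  list of (text, target) pairs; target is None for an ordinary
              step, or the step to jump back to when the check is true.
    outcomes: hypothetical check results keyed by step number, consumed in
              order each time that conditional step is reached.
    """
    trace = []
    pc = 1                      # program counter: next step to perform
    while pc <= len(program):
        _text, target = program[pc - 1]
        trace.append(pc)
        if target is not None and outcomes[pc].pop(0):
            pc = target         # branch taken: jump back
        else:
            pc += 1             # default: continue in sequence
    return trace


recipe = [
    ("Get flour", None),
    ("Get eggs", None),
    ("Break eggs into bowl", None),
    ("Sift flour into bowl", None),
    ("Mix for one minute", None),
    ("If lumpy, go back to step 5", 5),
    ("Put in oven", None),
    ("Watch TV", None),
    ("If burnt, go back to step 1", 1),
    ("Eat", None),
]

# Suppose the mixture is lumpy the first time, and the cake is not burnt.
print(run_recipe(recipe, {6: [True, False], 9: [False]}))
# [1, 2, 3, 4, 5, 6, 5, 6, 7, 8, 9, 10]
```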
Some CPU Instruction Types
Arithmetic
Add, subtract, multiply, etc.
Boolean logic
And, or, xor, left-shift, right-shift, rotate, etc.
Load / store
Read or write values from memory
Comparison / testing
Check for properties of values
Is a number zero? Is A > B? etc.
Branching
Go to instruction X
If A is zero, go to X
The recipe that the CPU follows is more like:
1. Put item 76 from pantry in bowl A
2. Put item 5 from fridge in bowl B
3. Break contents of B
4. Sift contents of A into B
5. Mix B for one minute
6. If lumpy, go back to step 5
7. Put contents of bowl B in oven
8. Watch TV
9. Put contents of oven in bowl B
10. If burnt, go back to step 1
11. Eat contents of B
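The instruction categories can be illustrated with a toy register machine. The instruction names (`load`, `store`, `add`, `subi`, `xor`, `test`, `jump`, `jz`), the three registers and the single zero flag are all hypothetical, chosen only to show instructions from each category; real instruction sets differ. The example program multiplies two numbers from memory by repeated addition.

```python
def execute(program, memory):
    """Run a list of instruction tuples against memory; return the registers."""
    reg = {"A": 0, "B": 0, "C": 0}
    zero_flag = False          # status register, written by "test"
    pc = 0                     # index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        pc += 1                # by default, run the next instruction
        if op == "load":       # load / store
            r, addr = args
            reg[r] = memory[addr]
        elif op == "store":
            r, addr = args
            memory[addr] = reg[r]
        elif op == "add":      # arithmetic
            dst, src = args
            reg[dst] += reg[src]
        elif op == "subi":     # arithmetic with an immediate value
            dst, imm = args
            reg[dst] -= imm
        elif op == "xor":      # boolean logic
            dst, src = args
            reg[dst] ^= reg[src]
        elif op == "test":     # comparison / testing
            (r,) = args
            zero_flag = reg[r] == 0
        elif op == "jump":     # branching: unconditional
            (pc,) = args
        elif op == "jz":       # branching: only if the zero flag is set
            if zero_flag:
                (pc,) = args
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return reg


multiply = [
    ("load", "A", 0),   # 0: A <- memory[0]
    ("load", "B", 1),   # 1: B <- memory[1], used as a loop counter
    ("xor", "C", "C"),  # 2: C <- 0 (x xor x is always 0)
    ("test", "B"),      # 3: is the counter zero?
    ("jz", 8),          # 4: if so, go to the store
    ("add", "C", "A"),  # 5: C <- C + A
    ("subi", "B", 1),   # 6: B <- B - 1
    ("jump", 3),        # 7: go back to the test
    ("store", "C", 2),  # 8: memory[2] <- C
]
memory = [6, 7, 0]
execute(multiply, memory)
print(memory[2])  # 42
```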
Fetch / Execute Cycle
The CPU actually has a bunch of little steps to execute one instruction
Any memory accesses required during an instruction’s execution may need an address calculation
The execute cycle has two distinct stages
Operand fetching: all values (operands) needed to perform an instruction must be obtained
Operand store: all results of the instruction must be stored somewhere
Not all instructions need or produce operands
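A sketch of the fetch / execute cycle for a hypothetical accumulator machine whose instructions are just numbers stored in the same memory as the data. The encoding (instruction word = opcode × 100 + address) and the four opcodes are assumptions for illustration only. LOAD and ADD need an operand fetch, STORE needs an operand store, and HALT needs neither.

```python
# Hypothetical opcodes; an instruction word is opcode * 100 + address.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def fetch_execute(memory):
    """Run the program starting at address 0; return the accumulator at HALT."""
    acc = 0  # accumulator: the only working register in this toy machine
    pc = 0   # program counter: address of the next instruction
    while True:
        # Fetch: read the instruction word from memory, then advance the PC.
        instruction = memory[pc]
        pc += 1
        # Decode: split the word into an opcode and an address field.
        opcode, address = divmod(instruction, 100)
        # Address calculation: direct addressing, so the effective address
        # is simply the address field (other modes would compute it here).
        effective = address
        if opcode == HALT:       # needs no operands and produces none
            return acc
        elif opcode == LOAD:     # operand fetch
            acc = memory[effective]
        elif opcode == ADD:      # operand fetch, then the ALU adds
            acc += memory[effective]
        elif opcode == STORE:    # operand store
            memory[effective] = acc
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 1}")


# Program and data share one memory.
memory = [
    108,         # 0: LOAD 8
    209,         # 1: ADD 9
    210,         # 2: ADD 10
    311,         # 3: STORE 11
    0,           # 4: HALT
    0, 0, 0,     # 5-7: unused
    20, 15, 7,   # 8-10: data
    0,           # 11: result goes here
]
print(fetch_execute(memory), memory[11])  # 42 42
```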
Other CPU facilities
Modern CPUs have many facilities beyond this basic operation
All of these are discussed next week
Speed optimisations
Cache
Specialised ALUs
Memory management unit (MMU)
For program security and virtual memory
Interrupts and interrupt handling
Some events need to be taken care of as soon as possible
RAM
Skipping how cache is organised for now
Forever really
Quick look at how RAM is organised
So we can see some of what makes it slow
Example RAM Organisation
3 address lines
Therefore $2^3 = 8$ locations
2 bits per location
Say we wanted to access memory location 5 ($101_2$)
[Diagram: a Selector and a Multiplexor; 3 address lines (set to 1 0 1); $2^3 = 8$ selection lines; $2^3 = 8$ memory locations (0..7), each storing 2 bits; $2^3 \times 2 = 16$ sense lines; 2 output lines]
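The read of location 5 can be modelled logically: the selector turns the 3 address bits into 8 selection lines (exactly one of them set), and the multiplexor lets through only the 2 bits of the selected location. This is a logical model, not a circuit-level one; the names `decode`, `read` and `cells`, and the stored bit values, are hypothetical.

```python
WORD_BITS = 2  # bits stored at each memory location


def decode(address_bits):
    """Selector: turn n address bits (most significant first) into 2**n
    selection lines, exactly one of which is 1."""
    index = 0
    for bit in address_bits:
        index = index * 2 + bit
    return [1 if line == index else 0 for line in range(2 ** len(address_bits))]


def read(cells, address_bits):
    """Multiplexor: each location's bits arrive on its sense lines; AND each
    with that location's selection line and OR the results per output line."""
    select = decode(address_bits)
    output = [0] * WORD_BITS
    for selected, cell in zip(select, cells):
        for b in range(WORD_BITS):
            output[b] |= selected & cell[b]
    return output


# 8 locations x 2 bits = 16 sense lines (contents are made up).
cells = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 1], [1, 0], [1, 1], [0, 0]]
print(decode([1, 0, 1]))       # [0, 0, 0, 0, 0, 1, 0, 0]
print(read(cells, [1, 0, 1]))  # [1, 0]
```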
RAM Chip Organisation
Memory locations arranged in a grid
Row address and column address specify the location
Each address part is generally passed into the chip separately
Reduces external pins
Reduces cost
Entire row selected first
Then column selected for read/write
Multiple reads from the same row are faster than random reads
[Diagram: memory locations ($2^n \times 2^n \times m$) in a grid; a Row Selector and a Column Selector each take $n$ address lines and drive $2^n$ selection lines; $m$ output lines]
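A rough model of why reads from the same row are faster: split the address into its row and column halves, and count how often a new row has to be selected, assuming the most recently selected row stays open. The grid size (n = 4, so 16 × 16) and the function names are hypothetical, and real DRAM timing involves much more than counting row selections.

```python
N = 4             # hypothetical: n address lines for each half of the address
COLUMNS = 1 << N  # 2**n columns per row


def split(address, n=N):
    """Row address = high n bits, column address = low n bits."""
    return address >> n, address & ((1 << n) - 1)


def count_row_selections(addresses, n=N):
    """Count how many times a new row must be selected, assuming the most
    recently selected row stays open for further column accesses."""
    open_row = None
    selections = 0
    for address in addresses:
        row, _column = split(address, n)
        if row != open_row:
            selections += 1
            open_row = row
    return selections


sequential = list(range(64))  # stays in each row for 16 accesses
row_hopping = [row * COLUMNS + col for col in range(COLUMNS) for row in range(4)]
print(split(0b10110110))                  # (11, 6)
print(count_row_selections(sequential))   # 4
print(count_row_selections(row_hopping))  # 64
```

Both access patterns touch the same 64 locations, but the sequential one needs only 4 row selections.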
Grouping Memory Chips
Memory chips are usually grouped
Typically 8 or 16 chips per memory stick
Each chip stores a smaller number of bits per cell
Typically 4 or 8 bits per cell
Simultaneous access to ALL chips
Results of cells combined to produce output
Output is usually one memory word, but perhaps many
Cost reduction
Simpler design
Chips have wider applicability
[Diagram: the Address is sent to each RAMChip in parallel; a Combiner merges their outputs into the Data]
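The combiner can be sketched as follows, assuming (hypothetically) 8 chips that each supply 8 bits per access, with chip i providing bits 8i to 8i + 7 of a 64-bit word. Every chip receives the same address; real memory modules may order or group the bits differently, and the function names are illustrative only.

```python
CHIPS = 8                             # hypothetical: chips on the memory stick
BITS_PER_CELL = 8                     # hypothetical: bits each chip returns
CELL_MASK = (1 << BITS_PER_CELL) - 1


def write_word(chips, address, word):
    """Split a word across the chips: chip i stores bits 8i..8i+7."""
    for i, chip in enumerate(chips):
        chip[address] = (word >> (i * BITS_PER_CELL)) & CELL_MASK


def read_word(chips, address):
    """Send the same address to every chip and combine their cells."""
    word = 0
    for i, chip in enumerate(chips):
        word |= chip[address] << (i * BITS_PER_CELL)
    return word


chips = [[0] * 16 for _ in range(CHIPS)]  # each chip: 16 cells (made up)
write_word(chips, 3, 0x0123456789ABCDEF)
print(hex(chips[0][3]), hex(chips[7][3]))  # 0xef 0x1
print(hex(read_word(chips, 3)))            # 0x123456789abcdef
```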
Further Reading
Computer Basics (Of Limited Value):
computer.howstuffworks.com/channel.htm?ch=computer&sub=sub-hardware
computer.howstuffworks.com/microprocessor.htm
www.howstuffworks.com/pc.htm
Number Systems
Stallings, Section 9.2 and Appendix B: William Stallings, Computer Organization and Architecture (Sixth Edition), Prentice-Hall, ISBN 0-13-049307-4
Use the scientific mode of the Windows Calculator accessory; it has modes for binary, octal, and hex
Binary conversion game: www.jwart.com/games/binaryfun.php
Animations: scholar.hw.ac.uk/site/computing/subindex_f1ncomp5topic1.html
www.golgotha.org.uk/useful/bases.html
Units (and binary)
computer.howstuffworks.com/bytes.htm
General storage
Stallings, Section 4.1
RAM
Stallings, Chapter 5
computer.howstuffworks.com/ram.htm
Bus
Stallings, Section 3.4
Glossary
CPU: (AKA chip, processor) the most important component of a computer, the part that performs all the calculations
RAM: (AKA memory, main store) the component that stores values
Bus: the shared connections between components; a medium along which components communicate
Switch: a more sophisticated connection scheme than a bus that allows many components to communicate simultaneously
Instructions: commands performed by the CPU
Instruction Cycle: the set of steps taken by a CPU in order to execute (perform) a command
Operand: a value used in calculation
Control Unit: the component of the CPU that controls the timing of the actions taken in the execution of an instruction
Registers: the CPU’s short-term memory
ALU: (Arithmetic Logic Unit) the component within the CPU in which calculations actually occur
Microcode: essentially the program that is executed by the control unit in order to perform the instruction cycle
Address lines: the part of the bus on which the address to be read from (or written to) is placed
Data lines: the part of the bus on which the data read (or written) is placed
Control lines: the part of the bus used for determining the current function of the bus
Width: the number of lines used for data or addresses
Clock frequency: the number of times per second a value can be read from RAM (or transferred via the bus)
Bandwidth: the theoretical number of bits (or bytes) per second that can be transferred through the bus
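The width, clock frequency and bandwidth entries fit together as: theoretical bandwidth = data width × clock frequency, taking clock frequency as transfers per second, as defined above. A small sketch; the 64-line data bus at 100 MHz is a hypothetical example, not a figure from any particular system.

```python
def bandwidth(data_lines, transfers_per_second):
    """Theoretical bandwidth as (bits per second, bytes per second):
    each transfer moves one bit on every data line."""
    bits_per_second = data_lines * transfers_per_second
    return bits_per_second, bits_per_second / 8


print(bandwidth(64, 100_000_000))  # (6400000000, 800000000.0)
```

So a 64-bit-wide bus at 100 million transfers per second could, at best, move 800 million bytes per second.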