COMP8551 Hardware design

COMP 8551: Advanced Games Programming Techniques

Hardware and Assembly Language

Borna Noureddin, Ph.D.
British Columbia Institute of Technology

Overview
Hardware concepts

• Definitions
• Design and architectures

© Borna Noureddin, COMP 8551

Hardware Concepts
CPU, GPU, GPGPU, FPGA
• All are integrated circuits
• FPGA (field-programmable gate array): arrays of logic that can be programmed after manufacture
• CPU/GPU/GPGPU/FPU: general-purpose or specialized microprocessors
• CPU: most general; FPU/GPU: most specific
• Traditionally: processing (physics, AI, etc.) on the CPU, rendering (geometry, shading, lighting, etc.) on the GPU

Hardware Concepts
CPU, GPU, GPGPU, FPGA
• Game engines used to process some graphics (e.g., lighting) on the CPU, but are now moving away from that: a very efficient but empty refrigerator is useless (an idle GPU is wasted capacity)
• The CPU is still important: concurrent tasks (firewall, networking, I/O, etc.), physics, AI. A very efficient refrigerator that is full may still mean a bigger refrigerator is needed (a fully loaded CPU may still need more capacity)

Hardware Concepts
CPU, GPU, GPGPU, FPGA
• Over the past decade, graphics and gaming drove Moore’s Law
• Demand for better GPUs increased dramatically: GPUs became arguably more powerful than CPUs
• GPGPU (general-purpose computing on GPUs) attempts to combine both, but requires a radical change in how one approaches programming

RISC, CISC, SIMD
CISC
• Complex instruction set computer
• CPU design strategy
• Single instructions execute several low-level
operations

http://en.wikipedia.org/wiki/Complex_instruction_set_computer

RISC
• Reduced instruction set computing
• CPU design strategy
• Small set of simple instructions that can run very fast
http://en.wikipedia.org/wiki/Reduced_instruction_set_computer

RISC, CISC, SIMD
SIMD
• Single instruction, multiple data: multiple processing elements perform the same operation on multiple data items simultaneously
• Example: changing image brightness
  • Scalar approach: each pixel (RGB) is read from memory, a value is added or subtracted, and the result is written back to memory
  • SIMD approach: multiple pixels are processed simultaneously, with a single instruction fetching multiple memory locations
  • The value is added to all locations referenced by the instruction at the same time (high level of parallelism)
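
The brightness example can be sketched in plain Python. This is only a model of the idea, not real SIMD: Python still visits the lanes one at a time, whereas hardware would apply the add to every lane in one instruction. The lane width of 4 and all function names are assumptions made for illustration.

```python
LANES = 4  # assumed vector width: values handled per "instruction"


def add_saturate(value, delta):
    """Add delta to one 8-bit channel, clamping the result to 0..255."""
    return max(0, min(255, value + delta))


def brighten_scalar(pixels, delta):
    """Scalar version: one value is read, adjusted, and written per step."""
    return [add_saturate(p, delta) for p in pixels]


def brighten_simd_model(pixels, delta):
    """Model of SIMD: fetch LANES values at once and apply the same add
    to all of them. Returns the result and the number of vector
    "instructions" issued."""
    out = []
    instructions = 0
    for i in range(0, len(pixels), LANES):
        vector = pixels[i:i + LANES]                         # one wide fetch
        out.extend(add_saturate(v, delta) for v in vector)   # same op, all lanes
        instructions += 1
    return out, instructions
```

Both functions produce the same pixels; the difference the model exposes is the instruction count, which drops by roughly a factor of the lane width.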

Multiprocessor vs. Multicore
• Traditionally, true parallelism required multiple processors (CPUs)
• This allows each CPU to work independently
• Data synchronization was always challenging
• Bottlenecks included bus speeds and a lack of parallel code (or of good compilers that balance loads)
• But more CPUs generally improve speed if they are utilized at all (even if not in the most efficient way): this is why multi-threading is so important!
• The main obstacle was cost

Multiprocessor vs. Multicore
• Multicore means multiple processing units on a single chip
• In practice, clock speeds are lowered to reduce power consumption and heat, but the cost is much lower, making it affordable for consumer-level products to have multiple “processors”
• It turns out multicore is often more efficient (e.g., cores share a data bus)
• Newer game consoles (Xbox 360, PS3) are multicore

Note: sometimes the bottleneck is not processing power, but bus speed/architecture and/or I/O

Hardware Definitions
Instruction set
• Instructions: commands to the CPU, encoded as opcodes (machine language) and written as mnemonics in assembly language, that carry out operations using transistors
• Native data types: depend on system architecture, bus width, etc.
• Registers: on-chip memory reserved for operands of instructions
• Addressing modes: methods of accessing instructions and other memory (e.g., direct, indirect, offset, etc.)
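
The three addressing modes named above can be illustrated with a toy model. This is a minimal sketch, assuming memory is a Python list and a hypothetical `base` argument stands in for a base register; real instruction sets define these modes in their own ways and offer more of them.

```python
def load(mode, operand, memory, base=0):
    """Return the value a load instruction would fetch under each mode.

    memory is a list standing in for RAM; base stands in for a base
    register (both are assumptions of this toy model).
    """
    if mode == "direct":      # operand is the address of the data
        return memory[operand]
    if mode == "indirect":    # operand is the address of a pointer to the data
        return memory[memory[operand]]
    if mode == "offset":      # effective address = base register + operand
        return memory[base + operand]
    raise ValueError(f"unknown addressing mode: {mode}")


memory = [3, 42, 7, 5, 99, 11]
direct_value = load("direct", 1, memory)          # reads memory[1]
indirect_value = load("indirect", 0, memory)      # memory[0] holds address 3
offset_value = load("offset", 2, memory, base=2)  # reads memory[2 + 2]
```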

Hardware Definitions
Instruction set (cont’d)
• Interrupts and exceptions: mechanisms that allow the regular sequential flow of operations to be changed
• All of this is paced by the system clock and hardware logic
• Micro-architecture: microprocessor design techniques used to implement an instruction set (different micro-architectures can share a common instruction set)
• Micro-code: instructions broken down into sub-operations that can be pipelined

Hardware Definitions
Pipelining
• Pipeline: a set of data processing elements connected in series, whose stages often execute in parallel (which requires some buffering of shared data)
• Instruction pipeline: overlapping execution of multiple instructions with shared circuitry (e.g., the decoding units work on one instruction while the arithmetic or register-fetch units work on another)
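
The benefit of overlapping instructions can be estimated with a small model. This is a sketch under idealized assumptions: a hypothetical four-stage pipeline, one cycle per stage, and no stalls or hazards. The stage names are illustrative and do not describe any particular CPU.

```python
STAGES = ["fetch", "decode", "execute", "write"]  # assumed 4-stage pipeline


def cycles_unpipelined(n, stages=len(STAGES)):
    """Each instruction finishes all stages before the next one starts."""
    return n * stages


def cycles_pipelined(n, stages=len(STAGES)):
    """The first instruction fills the pipeline; each later instruction
    completes one cycle after the previous one."""
    return stages + (n - 1) if n > 0 else 0


def schedule(n):
    """Map cycle number -> {stage name: instruction index} for an ideal pipeline."""
    table = {}
    for instr in range(n):
        for s, name in enumerate(STAGES):
            table.setdefault(instr + s, {})[name] = instr
    return table
```

In this model, `schedule(3)[1]` shows instruction 0 in the decode stage while instruction 1 is being fetched, which is the overlap described above.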

Hardware Definitions
Microcode
• Hardware instructions or data structures used to
implement higher level machine code instructions
• Resides in specialized high-speed memory (not
necessarily cache)
• Provides layer of abstraction so instructions can be
designed independently from underlying electronics
• Related terms: microprogramming, microprogram
• Can also be used for hardware emulation or support for
legacy hardware without having to include old circuitry
• Sometimes used as a synonym for firmware

Hardware Definitions
Instruction set extensions
• MMX, SSE (SSE1 through SSE4), etc.
• With SIMD came various attempts to extend existing instruction sets
• They added
  • wider registers (e.g., 128-bit on 32-bit systems)
  • instructions to perform a single operation on multiple memory locations simultaneously (vector operations)
  • more complex math operations at the machine code level
  • DSP and thread management instructions
  • geometry instructions
  • complex integer arithmetic operations

Hardware Definitions
CPU cache
• Cache: special, more expensive but much faster access
memory
• Used by CPU to reduce average time to access memory
(which can be very expensive, depending on the type of
memory and bus architecture)
• Stores copies of data from most frequently used
memory locations
• Various sophisticated heuristics and algorithms used to
optimize cache performance (trade-off between keeping
things around too long or not long enough)
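
The trade-off in keeping entries around can be seen in a small simulation. This is a minimal sketch that assumes a least-recently-used (LRU) eviction policy, chosen here only because it is short to write; real CPU caches use hardware-specific policies, and the class and method names below are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache holding at most `capacity` entries, evicting the least
    recently used entry when full (an assumed policy, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> cached copy of the data
        self.hits = 0
        self.misses = 0

    def read(self, address, memory):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)      # mark as most recently used
            return self.lines[address]
        self.misses += 1
        value = memory[address]                  # slow path: go to main memory
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)       # evict least recently used
        return value
```

With a capacity of 2 and the access pattern 0, 1, 0, 2, 1, only the second read of address 0 hits: address 1 is evicted just before it is needed again, which is the "not long enough" side of the trade-off.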

Hardware Definitions
CPU cache (cont’d)
• Instruction cache: speeds up fetching of instructions stored in main memory
• Data cache: speeds up transfer of data between main memory and internal registers
• Translation lookaside buffer (TLB): caches virtual-to-physical address translations
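
The role of the translation lookaside buffer can be sketched as a small lookup table placed in front of the page table. This is a toy model under stated assumptions: a 4096-byte page size, a flat dictionary as the page table, and an unbounded TLB with no eviction. None of these details come from a specific architecture.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(virtual_address, page_table, tlb):
    """Translate a virtual address to a physical one.

    page_table maps virtual page number -> physical frame number.
    tlb is a dict caching recent translations (unbounded in this sketch).
    Returns (physical_address, tlb_hit).
    """
    page, offset = divmod(virtual_address, PAGE_SIZE)
    hit = page in tlb
    if not hit:
        tlb[page] = page_table[page]   # slow path: consult the page table
    return tlb[page] * PAGE_SIZE + offset, hit
```

The first access to a page misses and fills the TLB; later accesses to the same page skip the page-table lookup.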

Review

Hardware concepts

• Definitions
• Design and architectures

E N D