
Parallel Memory Models
CMPSC 450

Taxonomy of Parallel Computing Paradigms
• SIMD – Single Instruction Multiple Data – A single instruction pipeline applied to multiple compute elements. Ex: Vector Processors, GPU Processing, MMX, SSE, AVX instruction sets.
• MIMD – Multiple Instruction Multiple Data – Multiple instruction pipelines are working on multiple data streams concurrently. Ex: Shared memory and distributed memory multi-core paradigms.
• SISD – Single Instruction Single Data – A single instruction pipeline applied to a single data set. Ex: Single core embedded processors.
• MISD – Multiple Instruction Single Data – Multiple instruction pipelines work on the same data. Few practical examples exist; redundant fault-tolerant designs that run the same data through multiple pipelines are sometimes cited.

What is the difference between concurrency and parallelism?
• Concurrency: execution of multiple computational tasks/units/components that overlap in time (not necessarily simultaneously).
• Parallelism: using multiple compute elements efficiently to solve a single task.
• Concurrency is a more general concept than parallelism.
• Concurrency issues can arise in distributed systems as well.
• In this class, we will use concurrency and parallelism synonymously in the context of tightly coupled compute systems.

Shared-memory computers
UMA – Uniform Memory Access
• A single interface to memory is shared among all processors.
• Memory latency and bandwidth are the same for all processors and all memory locations.

Shared-memory computers
ccNUMA – cache coherent Nonuniform Memory Access
• Memory is physically distributed but logically shared.
• Memory latency depends on which CPU accesses which part of memory.

ccNUMA
Cache coherence now requires additional communication links between memory controllers, which adds overhead.
Example: HyperTransport (HT), used by AMD.

Cache coherence example: MESI
Modified: The cache line has been modified in this cache and resides in no other cache.
Exclusive: The cache line has been read from memory but not yet modified. However, it resides in no other cache.
Shared: The cache line has been read from memory but not yet modified. There may be other copies in other caches.
Invalid: The cache line does not reflect any sensible data. Under normal circumstances, this happens if the cache line was in the shared state and another processor has requested exclusive ownership.

Cache coherence

Distributed Memory
• Each processing module has its own dedicated memory.
• Parallelism is maintained through a message-passing network.
• Common programming paradigm: each process has its own memory.
• Hardware example: Beowulf cluster.
• The communication network can be Ethernet, InfiniBand, or something more complex.

Hybrid memory model
• Most common HPC architecture.
• Each node is itself a ccNUMA system.
• Collection of nodes resembles distributed memory model.

Source material
• Introduction to High Performance Computing for Scientists and Engineers, Chapter 4
• We will cover distributed network topologies in a few weeks.