Slides are adapted from the computer architecture (CA) courses of Wisconsin, Princeton, MIT, Berkeley, etc.
The slides of this course are for educational purposes only and should be
used only in conjunction with the textbook. Derivatives of the slides must
acknowledge the copyright notices of this and the originals.
The Computer Revolution
• Progress in computer technology
– Underpinned by Moore’s Law
• Makes novel applications feasible
– Computers in automobiles
– Cell phones
– Human genome project
– World Wide Web
– Search Engines
• Computers are pervasive
Classes of Computers
• Personal Mobile Device (PMD)
– e.g. smartphones, tablet computers
– Emphasis on energy efficiency and real-time
• Desktop Computer
– Emphasis on price-performance
• Servers
– Emphasis on availability, scalability, throughput
• Clusters / Warehouse Scale Computers
– Used for “Software as a Service (SaaS)”
– Emphasis on availability and price-performance
• Embedded Computers
– Emphasis: price
The PostPC Era
• Personal Mobile Device (PMD)
– Battery operated
– Connects to the Internet
– Hundreds of dollars
– Smart phones, tablets, electronic glasses
• Cloud computing
– Warehouse Scale Computers (WSC)
– Software as a Service (SaaS)
– Portion of software run on a PMD and a portion run in the Cloud
– Amazon and Google
Computers in History…
EDSAC, University of Cambridge, UK, 1949
Computers in History…
IAS Machine. Design directed by John von Neumann.
First booted in Princeton NJ in 1952
Smithsonian Institution Archives (Smithsonian Image 95-06151)
Modern Computers…
Sensor nets, cameras, audio assistants, set-top boxes, media players, games, servers, routers, supercomputers, laptops, smartphones, robots, automobiles
What is Computer Architecture?
Applications
Operating System
Compiler
Firmware
Instruction Set Architecture
Instr. Set Proc.
I/O system
Datapath & Control
Digital Design
Circuit Design
Layout & fab
Semiconductor Materials
What is Computer Architecture?
Application
Physics
The gap between the two is too large to bridge in a single step.
Are there special cases where physics can be turned directly into an application?
In its broadest definition,
computer architecture is the
design of the abstraction/
implementation layers that
allow us to execute
information processing
applications efficiently using manufacturing technologies
Abstractions in Modern Computing Systems
Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture
Microarchitecture
Register-Transfer Level
Gates
Circuits
Devices
Physics
These layers are what computer architecture is concerned with.
Computer Architecture is Constantly Changing
Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture
Microarchitecture
Register-Transfer Level
Gates
Circuits
Devices
Physics
Application requirements:
• Suggest how to improve architecture
• Provide revenue to fund development
Pull vs. Push
Technology constraints and drivers:
• Restrict what can be done efficiently
• New technologies make new architectures possible
Architecture provides feedback to guide application and technology research directions
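Course Information
Textbook:
Reference textbook, strongly recommended.
Instructor:
Assessment:
• 4 homework assignments
• 3 lab assignments
• 1 in-class test
• 1 course report, 10%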
Great Ideas in Computer Architectures
1. Design for Moore’s Law
2. Use abstraction to simplify design
3. Make the common case fast
4. Performance via parallelism
5. Performance via pipelining
6. Performance via prediction
7. Hierarchy of memories
8. Dependability via redundancy
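Idea 3, "make the common case fast", can be made quantitative. The slide does not say how, so the sketch below is an illustration using Amdahl's Law. The function name `overall_speedup` and the two sets of numbers are assumptions chosen only for this example.

```python
def overall_speedup(fraction_enhanced: float, local_speedup: float) -> float:
    """Amdahl's Law: whole-program speedup when only a fraction of the
    original execution time is accelerated by `local_speedup`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / local_speedup)


# Hypothetical numbers, for illustration only.
common_case = overall_speedup(0.9, 2.0)   # modest 2x gain on 90% of the time
rare_case = overall_speedup(0.1, 10.0)    # large 10x gain on 10% of the time

print(f"common case: {common_case:.2f}x, rare case: {rare_case:.2f}x")
```

Under these assumed numbers, the modest improvement to the common case beats the large improvement to the rare case.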
Major Technology Generations
Electromechanical, relays, vacuum tubes, bipolar, pMOS, nMOS, CMOS [from Kurzweil]
Sequential Processor Performance
From Hennessy and Patterson 6e Image Copyright © 2019, Elsevier Inc. All rights Reserved.
Course Content: Computer Organization
• Basic Pipelined Processor
~50,000 Transistors
Photo of Berkeley RISC I, © University of California (Berkeley)
Components of a Computer
The BIG Picture
• Same components for all kinds of computer
– Desktop, server, embedded
• Input/output includes
– User-interface devices
• Display, keyboard, mouse
– Storage devices
• Hard disk, CD/DVD, flash
– Network adapters
• For communicating with other computers
Course Content: Computer Architecture
~700,000,000 Transistors
Intel Nehalem Processor, Original Core i7, Image Credit Intel: http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg
• Instruction Level Parallelism
– Superscalar
– Very Long Instruction Word (VLIW)
• Long Pipelines (Pipeline Parallelism)
• Advanced Memory and Caches
• Data Level Parallelism
– Vector
– GPU
• Thread Level Parallelism
– Multithreading
– Multiprocessor
– Multicore
– Manycore
Architecture vs. Microarchitecture
“Architecture”/Instruction Set Architecture:
• Programmer visible state (Memory & Register)
• Operations (Instructions and how they work)
• Execution Semantics (interrupts)
• Input/Output
• Data Types/Sizes
Instruction Set Architecture
software
instruction set (a contract between software and hardware)
hardware
• Properties of a good abstraction
– Lasts through many generations (portability)
– Used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels
Architecture vs. Microarchitecture
“Architecture”/Instruction Set Architecture:
• Programmer visible state (Memory & Register)
• Operations (Instructions and how they work)
• Execution Semantics (interrupts)
• Input/Output
• Data Types/Sizes
Microarchitecture/Organization:
• Tradeoffs on how to implement the ISA for some metric (speed, energy, cost)
• Examples: pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths
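The split between architecture and microarchitecture can be sketched in a few lines. This is a toy model, not any real ISA: the three opcodes, the `Instr` type, the 5-stage default, and both timing functions are assumptions invented for the illustration. `execute` plays the role of the architecture (programmer-visible state), while the two cycle-count functions stand in for two implementations of it.

```python
from dataclasses import dataclass


# Hypothetical toy ISA with three opcodes, invented for this illustration.
@dataclass(frozen=True)
class Instr:
    op: str      # "li" (load immediate), "add", or "sub"
    rd: int      # destination register
    a: int       # immediate for "li", otherwise first source register
    b: int = 0   # second source register (unused by "li")


def execute(program, num_regs=4):
    """Architecture: defines only the programmer-visible register state."""
    regs = [0] * num_regs
    for ins in program:
        if ins.op == "li":
            regs[ins.rd] = ins.a
        elif ins.op == "add":
            regs[ins.rd] = regs[ins.a] + regs[ins.b]
        elif ins.op == "sub":
            regs[ins.rd] = regs[ins.a] - regs[ins.b]
        else:
            raise ValueError(f"unknown opcode {ins.op!r}")
    return regs


def cycles_unpipelined(n_instrs, stages=5):
    """Microarchitecture A: each instruction holds the datapath for `stages` cycles."""
    return n_instrs * stages


def cycles_ideal_pipeline(n_instrs, stages=5):
    """Microarchitecture B: idealized pipeline with no stalls (fill once, then 1 per cycle)."""
    return stages + n_instrs - 1 if n_instrs else 0


program = [
    Instr("li", 0, 7),
    Instr("li", 1, 5),
    Instr("add", 2, 0, 1),
    Instr("sub", 3, 2, 1),
]
final_regs = execute(program)  # identical no matter which timing model is used
print(final_regs, cycles_unpipelined(len(program)), cycles_ideal_pipeline(len(program)))
```

Both implementations yield the same register contents and differ only in cycle count, which is the sense in which the ISA hides microarchitectural choices. The pipeline model ignores hazards, so its count is a lower bound.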
Software Developments
up to 1955: Libraries of numerical routines
– Floating point operations
– Transcendental functions
– Matrix manipulation, equation solvers, . . .
1955-60: High level Languages – Fortran 1956
Operating Systems
– Assemblers, Loaders, Linkers, Compilers
– Accounting programs to keep track of usage and charges
Machines required experienced operators
• Most users could not be expected to understand
these programs, much less write them
• Machines had to be sold with a lot of resident software
IBM's Compatibility Problem
By the early 1960s, IBM had 4 incompatible lines of computers!
701 ⇒ 7094
650 ⇒ 7074
702 ⇒ 7080
1401 ⇒ 7010
Each system had its own
• Instruction set
• I/O system and secondary storage: magnetic tapes, drums and disks
• Assemblers, compilers, libraries, …
• Market niche: business, scientific, real time, …
What problems does this cause?
⇒ IBM 360
IBM 360: A General-Purpose Register (GPR) Machine
• Processor State
– 16 General-Purpose 32-bit Registers
• may be used as index and base register
• Register 0 has some special properties
– 4 Floating Point 64-bit Registers
– A Program Status Word (PSW)
• PC, Condition codes, Control flags
• A 32-bit machine with 24-bit addresses
– But no instruction contains a 24-bit address!
• Data Formats
– 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words
The IBM 360 is why bytes are 8 bits long today!
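The remark that no instruction contains a 24-bit address follows from base-plus-displacement addressing: an instruction names registers and a short displacement, and the hardware sums them. The sketch below models address generation for an indexed (RX-style) operand. It assumes a 12-bit displacement and the rule that a register field of 0 contributes zero rather than the contents of register 0, which is one of the "special properties" of register 0. The function and constant names are made up for this example.

```python
ADDRESS_MASK = (1 << 24) - 1       # addresses are 24 bits wide
DISPLACEMENT_MAX = (1 << 12) - 1   # assumed 12-bit displacement field


def effective_address(regs, base, index, displacement):
    """Sum base register, index register and displacement, truncated to 24 bits.

    A base or index field of 0 means "no register", so register 0's
    contents are never used for addressing.
    """
    if len(regs) != 16:
        raise ValueError("expected 16 general-purpose registers")
    if not (0 <= base < 16 and 0 <= index < 16):
        raise ValueError("register fields are 4 bits wide")
    if not 0 <= displacement <= DISPLACEMENT_MAX:
        raise ValueError("displacement does not fit in 12 bits")
    base_value = regs[base] if base != 0 else 0
    index_value = regs[index] if index != 0 else 0
    return (base_value + index_value + displacement) & ADDRESS_MASK


regs = [0] * 16
regs[0] = 0xDEADBEEF    # ignored when used as a base or index field
regs[12] = 0x00123000   # hypothetical base address
regs[3] = 0x10          # hypothetical index value

addr = effective_address(regs, base=12, index=3, displacement=0x0FF)
print(hex(addr))  # 0x12310f
```

The instruction itself carries only two 4-bit register fields and the displacement. The full 24-bit address exists only after the addition.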
IBM 360: Initial Implementations
                 Model 30            . . .   Model 70
Storage          8K – 64 KB                  256K – 512 KB
Datapath         8-bit                       64-bit
Circuit Delay    30 nsec/level               5 nsec/level
Local Store      Main Store                  Transistor Registers
Control Store    Read only 1 μsec            Conventional circuits
IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models.
Milestone: The first true ISA designed as portable hardware-software interface!
With minor modifications it still survives today!
IBM 360: Over 50 years later…
The zSeries z14 Microprocessor
Image Credit: IBM
Courtesy of International Business Machines Corporation, © International Business Machines Corporation.
• 5.2 GHz in IBM 14nm SOI CMOS technology
• 6.1 billion transistors in 696 mm²
• 64-bit virtual addressing
– original S/360 was 24-bit, and S/370 was 31-bit extension
• 10-core design
• 6-fetch/cycle
• 10-issue/cycle out-of-order superscalar pipeline
• Out-of-order memory accesses
• Redundant datapaths
– every instruction performed in two parallel datapaths and
results compared
• 128KB L1 I-cache, 128KB L1 D-cache on-chip
• 2MB private I-cache L2 per core
• 4MB private D-cache L2 per core
• On-Chip 128MB eDRAM L3 cache
• Up to 672MB eDRAM L4
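The redundant-datapath bullet above (every instruction performed twice and the results compared) is an instance of dependability via redundancy. A minimal duplicate-and-compare sketch follows. It is not a model of the z14 itself: the function names, the 32-bit adder and the injected fault are all assumptions for illustration, and what real hardware does after a mismatch (for example, retrying) is not modeled.

```python
from typing import Callable


class DatapathMismatch(Exception):
    """Raised when the two redundant copies disagree."""


def run_redundant(op: Callable[[int, int], int],
                  shadow_op: Callable[[int, int], int],
                  a: int, b: int) -> int:
    """Run the same operation on two datapaths and accept the result only if both agree."""
    primary = op(a, b)
    shadow = shadow_op(a, b)
    if primary != shadow:
        raise DatapathMismatch(f"primary={primary:#x} shadow={shadow:#x}")
    return primary


def add32(a: int, b: int) -> int:
    """Fault-free 32-bit adder."""
    return (a + b) & 0xFFFFFFFF


def faulty_add32(a: int, b: int) -> int:
    """Hypothetical faulty copy: one result bit is flipped."""
    return add32(a, b) ^ 0x4


good = run_redundant(add32, add32, 2, 3)

try:
    run_redundant(add32, faulty_add32, 2, 3)
    fault_detected = False
except DatapathMismatch:
    fault_detected = True

print(good, fault_detected)
```

Comparison detects a fault in either copy but cannot tell which copy is wrong, so it has to be paired with some recovery action.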
Same Architecture Different Microarchitecture
AMD Phenom X4
• X86 Instruction Set
• Quad Core
• 125W
• Decode 3 Instructions/Cycle/Core
• 64KB L1 I Cache, 64KB L1 D Cache
• 512KB L2 Cache
• Out-of-order
• 2.6GHz
Intel Atom
• X86 Instruction Set
• Single Core
• 2W
• Decode 2 Instructions/Cycle/Core
• 32KB L1 I Cache, 24KB L1 D Cache
• 512KB L2 Cache
• In-order
• 1.6GHz
Image Credit: AMD
Image Credit: Intel
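The two x86 parts above can be compared with simple arithmetic on the listed figures. The sketch computes a peak decode rate (cores × decode width × clock) and divides it by the listed power. This is only an upper bound derived from the slide's numbers, not measured performance, and the `Chip` class and its method names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int
    decode_width: int   # instructions decoded per cycle per core
    clock_ghz: float
    power_w: float

    def peak_decode_gips(self) -> float:
        """Upper bound on decoded instructions per second, in billions."""
        return self.cores * self.decode_width * self.clock_ghz

    def peak_decode_gips_per_watt(self) -> float:
        return self.peak_decode_gips() / self.power_w


# Figures taken from the slide above.
phenom = Chip("AMD Phenom X4", cores=4, decode_width=3, clock_ghz=2.6, power_w=125.0)
atom = Chip("Intel Atom", cores=1, decode_width=2, clock_ghz=1.6, power_w=2.0)

for chip in (phenom, atom):
    print(f"{chip.name}: {chip.peak_decode_gips():.1f} G instr/s peak, "
          f"{chip.peak_decode_gips_per_watt():.2f} G instr/s per watt")
```

On these figures the Phenom has roughly ten times the peak decode rate, while the Atom has the higher peak rate per watt: the same ISA implemented for two different metrics.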
Different Architecture Different Microarchitecture
AMD Phenom X4
• X86 Instruction Set
• Quad Core
• 125W
• Decode 3 Instructions/Cycle/Core
• 64KB L1 I Cache, 64KB L1 D Cache
• 512KB L2 Cache
• Out-of-order
• 2.6GHz
IBM POWER7
• Power Instruction Set
• Eight Core
• 200W
• Decode 6 Instructions/Cycle/Core
• 32KB L1 I Cache, 32KB L1 D Cache
• 256KB L2 Cache
• Out-of-order
• 4.25GHz
Image Credit: IBM
Courtesy of International Business Machines
Corporation, © International Business Machines Corporation.
Image Credit: AMD
bonus slides
Architectural Challenges
• Massive (ca. 4X) increase in concurrency
– Multicore (4 – <100) → Manycores (100s – 1ks)
• Heterogeneity
– System-level (accelerators) vs chip level (embedded)
• Compute power and memory speed challenges (two walls)
– 500x compute power and 30x memory of 2PF HW
– Memory access time lags further behind
Three Eras of Processor Performance
• Single-Core Era (single-thread performance vs. time)
– Enabled by: voltage scaling, microarchitecture
– Constrained by: power, complexity
• Multi-Core Era (throughput performance vs. time, # of processors)
– Enabled by: desire for throughput, 20 years of SMP arch
– Constrained by: power, parallel SW availability, scalability
• Heterogeneous Systems Era (targeted application performance vs. time, data-parallel exploitation)
– Enabled by: abundant data parallelism, power-efficient GPUs
– Currently constrained by: programming models, communication overheads
Each panel marks "we are here" on its curve.
Source: Chuck Moore, Data Processing in ExaScale-Class Computer Systems, Salishan, April 2011
bonus slides
MOOC resource: Zhang Chenxi's course "Computer System Architecture" (《计算机系统结构》)
Course URL: https://coursehome.zhihuishu.com/courseHome/2038508#onlineCourse
Account: your student ID
Initial password: 123456
The Golden Age of Computer Architecture: https://www.bilibili.com/video/av46710093/?redirectFrom=h5
Course cloud drive: https://pan.baidu.com/s/1BBOW25lScILpfBHJyN0QpQ
• Access code: s7a7