CS152: Computer Architecture and Engineering
Computer Architecture
Unit 1: Introduction
Slides developed by , , C.J. Taylor, &
at the University of Pennsylvania
with sources that included University of Wisconsin slides
by , , , and .
Today’s Agenda
Course overview and administrivia
What is computer architecture anyway?
…and the forces that drive it
Course Overview
CIS 371/501 vs CIS 240
In CIS 240 you learned how a processor works; in CIS 371/501 we will tell you how to make it work well.
CIS 371/501
CIS 240 is a hard pre-req. If you’ve taken CIS 371, don’t take this class.
CIS 240 vs. CIS 371/501
Focus on one toy ISA: LC4
Focus on functionality: “just get something that works”
Instructive, learn to crawl before you can walk
Not representative of real machines: 240 hardware is circa 1975
CIS 371/501
Less emphasis on any particular ISA during lectures
Focus on quantitative aspects: performance, cost, power, etc.
Representative of ~1980s hardware
also modern low-power processors, e.g., inside a Fitbit
Pervasive Idea: Abstraction and Layering
Abstraction: only way of dealing with complex systems
Divide world into objects, each with an…
Interface: knobs, behaviors, knobs → behaviors
Implementation: “black box” (ignorance+apathy)
Only specialists deal with implementation, rest of us with interface
Example: only mechanics know how cars work
Layering: abstraction discipline makes life even simpler
Divide objects in system into layers, layer n objects…
Implemented using interfaces of layer n – 1
(mostly) Don’t need to know interfaces of layer n – 2
Inertia: a dark side of layering
Layer interfaces become entrenched over time (“standards”)
Very difficult to change even if benefit is clear
Opacity: hard to reason about performance across layers
Example: the U.S. digital TV conversion, set for Feb 17, 2009, was postponed to June 12, even after significant government subsidy
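The layering discipline above can be sketched in a few lines of Python. This is only an illustration under assumed names: the classes Gates, Adder, and Machine are hypothetical and are not part of the course materials. Each layer is written purely against the interface of the layer directly below it, so Machine never touches Gates.

```python
class Gates:
    """Layer 0: primitive logic operations on single bits."""
    def and_(self, a, b):
        return a & b

    def or_(self, a, b):
        return a | b

    def xor(self, a, b):
        return a ^ b


class Adder:
    """Layer 1: a ripple-carry adder that uses only the Gates interface."""
    def __init__(self, gates, width=16):
        self.g = gates
        self.width = width

    def add(self, x, y):
        carry, result = 0, 0
        for i in range(self.width):
            a, b = (x >> i) & 1, (y >> i) & 1
            a_xor_b = self.g.xor(a, b)
            s = self.g.xor(a_xor_b, carry)                 # sum bit
            carry = self.g.or_(self.g.and_(a, b),
                               self.g.and_(carry, a_xor_b))  # carry out
            result |= s << i
        return result  # carry out of the top bit is dropped (wraps around)


class Machine:
    """Layer 2: a register-to-register ADD that uses only the Adder interface."""
    def __init__(self, adder):
        self.adder = adder
        self.regs = [0] * 8

    def add(self, rd, rs, rt):
        self.regs[rd] = self.adder.add(self.regs[rs], self.regs[rt])


machine = Machine(Adder(Gates(), width=16))
machine.regs[1], machine.regs[2] = 40000, 30000
machine.add(0, 1, 2)
print(machine.regs[0])
```

Swapping in a different Adder (say, a faster carry-lookahead design) would not require any change to Machine, which is the benefit of layering. The cost is the opacity noted above: nothing in Machine reveals how long an add takes.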
Abstraction, Layering, and Computers
Computers are complex, built in layers
Several software layers: assembler, compiler, OS, applications
Instruction set architecture (ISA)
Several hardware layers: transistors, gates, CPU/Memory/IO
99% of users don’t know the implementation of the hardware layers
90% of users don’t know implementation of any layer
That’s okay, world still works just fine
But sometimes it is helpful to understand what’s “under the hood”
System software
Transistors
Beyond CIS 371/501
CIS 380: Operating Systems
A closer look at system level software
CIS 371/501: Computer Organization and Design
A closer look at hardware layers
ESE 370: Circuit-Level Modeling, Design, and Optimization for Digital Systems
Diving into gate-level abstractions
ESE 532: System-on-Chip Design
HW+SW: design an application-specific hardware accelerator
System software
most CIS courses
CIS 371/501
Why Study Hardware?
Understand where computers are going
Future capabilities drive the (computing) world
Real-world impact: no computer architecture, no computers!
Understand high-level design concepts
The best system designers understand all the levels
Hardware, compiler, operating system, applications
Understand computer performance
Writing well-tuned (fast) software requires knowledge of hardware
Write better software
The best software designers also understand hardware
Understand the underlying hardware and its limitations
Design hardware
Intel, AMD, IBM, ARM, Qualcomm, Apple, Oracle, NVIDIA, Samsung, …
Some of you will actually be designing chips and systems and working with things at this level
All of you will be writing code, and if you know how the system works, you will be able to take better advantage of what the system does well and avoid what it does poorly.
This understanding can have profound effects on code performance.
Penn Legacy
ENIAC: Electronic Numerical Integrator and Computer
First operational general-purpose stored-program computer
Designed and built here by Eckert and Mauchly
See it in Moore 100!
First seminars on computer design
Moore School Lectures, 1946
“Theory and Techniques
for Design of Electronic
Digital Computers”
Administrivia
Course Staff
Instructor
Levine 572
Alexander Do
Aliza Gindi
Eric Giovannini
Brandon Park
Shreyas Shivakumar
Important Dates
(see Canvas)
PhD students: WPE-1 exam
starting this semester, CIS 501 is a “course work” WPE-1 course
must obtain a sufficiently high course grade
you must declare your WPE-1 status with Britton by next Friday
This year only, CIS 501 also offers the classic exam-only WPE-1 option
The Verilog Labs
“Build your own processor” (pipelined 16-bit CPU for LC4)
Use Verilog HDL (hardware description language)
Programming language that compiles to gates/wires, not instructions
Implement and test on real hardware
FPGA (field-programmable gate array)
Instructive: learn by doing
Satisfying: “look, I built my own processor”
Lab 5 Demo
Lab Logistics
Xilinx Vivado hardware compiler
Run it from biglab.seas.upenn.edu
ZedBoard FPGA boards
Live in Towne lockers, details coming
Most labs have a demo component that runs on the ZedBoard
Development and simulation can be done before final testing on the board
Coursework (1 of 2)
Labs – Labs 2-5 done in groups of two
Lab 1: Verilog debugging
Lab 2: arithmetic unit
Lab 3: single-cycle LC4 & register file
Lab 4: pipelined LC4
Lab 5: pipelined + superscalar LC4
Labs are cumulative and increasingly complex
Each lab broken down into “milestone” deadlines
Roughly one per week
Coursework (2 of 2)
In-class midterm (see Canvas)
Cumulative final exam (time & date set by registrar)
Class participation
A good way to earn some extra calories
We will not use clickers
See the participation section of the policies page
Course Resources
Course web site
Everything is at http://www.cis.upenn.edu/~cis501 or on Canvas (syllabus, lectures, homework, submission, grades, etc.)
“Campuswire”: the (new?) sign-up link is on the course web site
The way to ask questions/clarifications
Can post to just me & TAs or anonymous to class
As a general rule, don’t email us directly
Sign-up required!
P+H, Computer Organization and Design, 4th or 5th edition
Reese & Thornton, Intro to Logic Synthesis using Verilog HDL
Both available free online! See course homepage for links
In many ways this is a class about debugging
Tentative grade contributions:
Midterm: 20%
Final: 25%
Historical grade distribution: median grade is B+
No guarantee this semester will be similar, but the distribution seems reasonable
Homework and Late Days
Assignments usually due on Mondays at 11:59pm. Deadline is enforced by Canvas.
Submit as often as you like; your last submission is what counts.
Any assignment can be submitted up to 48 hours late, for 75% credit
No need to give an excuse, just turn it in late
Assignments are cumulative – you have to get things to work!
Academic Misconduct
Cheating will not be tolerated
General rule:
Anything with your name on it must be YOUR OWN work
You MUST scrupulously credit all sources of help
Example: individual work on homework assignments
See the course policies
Penn’s Code of Conduct
http://www.vpul.upenn.edu/osl/acadint.html
What is Computer Architecture?
Computer Architecture
Computer architecture
Definition of ISA to facilitate implementation of software layers
The hardware/software interface
Computer micro-architecture
Design processor, memory, I/O to implement ISA
Efficiently implementing the interface
CIS 371/501 is mostly about processor micro-architecture
“architecture” is also a vacuous term for “the design of things”
software architect, network architecture, …
Application Specific Designs
This class is about general-purpose CPUs
Processor that can do anything, run a full OS, etc.
E.g., Intel Atom/Core/Xeon, AMD Ryzen/EPYC, ARM M/A series
In contrast to application-specific chips
Or ASICs (Application specific integrated circuits)
Also application-domain specific processors
Implement critical domain-specific functionality in hardware
Examples: video encoding, 3D graphics, machine learning
General rules
Hardware is less flexible than software
Hardware more effective (speed, power, cost) than software
Domain specific more “parallel” than general purpose
But mainstream processors are quite parallel as well
Technology Trends
“Technology”
Basic element
Solid-state transistor (i.e., electrical switch)
Building block of integrated circuits (ICs)
What’s so great about ICs? Everything
High performance, high reliability, low cost, low power
Lever of mass production
Several kinds of integrated circuit families
SRAM/logic: optimized for speed (used for processors)
DRAM: optimized for density, cost, power (used for memory)
Flash: optimized for density, cost (used for storage)
Increasing opportunities for integrating multiple technologies
Non-transistor storage and inter-connection technologies
Magnetic disks, optical storage, ethernet, fiber optics, wireless
Moore’s Law – 1965
233 transistors
Moore’s Law today
data c/o WikiChip
gray line is Moore’s Law, doubling density every ~2.5 years
TSMC 7nm was used for the Apple A12 processor in the iPhone XS/XR
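To get a feel for what "doubling density every ~2.5 years" implies, here is a small sketch. The function name density_growth is made up for illustration, and the 2.5-year period is simply the approximate figure quoted for the gray line, not a precise constant.

```python
def density_growth(years, doubling_period=2.5):
    """Factor by which density grows after `years` of steady doubling."""
    return 2.0 ** (years / doubling_period)


for years in (2.5, 5, 10, 20):
    print(f"{years:>4} years -> {density_growth(years):.0f}x density")
```

Under this assumption, a decade yields four doublings (16x) and two decades yield eight (256x), which is why even a modest slowdown in the doubling period matters so much over time.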
Revolution I: The Microprocessor
Microprocessor revolution
One significant technology threshold was crossed in 1970s
Enough transistors (~25K) to put a 16-bit processor on one chip
Huge performance advantages: fewer slow chip-crossings
Even bigger cost advantages: one “stamped-out” component
Microprocessors have allowed new market segments
Desktops, CD/DVD players, laptops, game consoles, set-top boxes, mobile phones, digital cameras, mp3 players, GPS, automotive
And replaced incumbents in existing segments
Microprocessor-based systems replaced supercomputers, “mainframes”, “minicomputers”, “desktops”, etc.
First Microprocessor
Intel 4004 (1971)
Application: calculators
Technology: 10,000 nm
2300 transistors
4-bit data
Single-cycle datapath
Revolution II: Implicit Parallelism
Then to extract implicit instruction-level parallelism
Hardware provides parallel resources, figures out how to use them
Software is oblivious
Initially using pipelining …
Which also enabled increased clock frequency
… caches …
Which became necessary as processor clock frequency increased
… and integrated floating-point
Then deeper pipelines and branch speculation
Then multiple instructions per cycle (superscalar)
Then dynamic scheduling (out-of-order execution)
We will talk about these things
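As a rough preview of why pipelining helps, the sketch below compares an idealized single-cycle datapath against an idealized pipeline. It rests on strong assumptions that real designs do not meet: the work divides evenly into stages, there are no stalls or hazards, and pipeline registers add no delay. The function names are invented for this example.

```python
def single_cycle_time(n_insns, latency_ns):
    """Each instruction takes one long clock cycle of `latency_ns`."""
    return n_insns * latency_ns


def pipelined_time(n_insns, latency_ns, stages):
    """Ideal pipeline: clock period shrinks to latency/stages, and one
    instruction finishes per cycle once the pipeline has filled."""
    cycle_ns = latency_ns / stages
    return (stages + n_insns - 1) * cycle_ns


n, latency, stages = 1000, 10.0, 5
t_single = single_cycle_time(n, latency)
t_pipe = pipelined_time(n, latency, stages)
print(f"single-cycle: {t_single:.0f} ns")
print(f"pipelined:    {t_pipe:.0f} ns")
print(f"speedup:      {t_single / t_pipe:.2f}x (upper bound: {stages}x)")
```

The speedup approaches the number of stages for long instruction sequences. Each individual instruction still takes about as long as before; the gain comes from overlapping instructions and from the shorter clock period.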
Pinnacle of Single-Core Microprocessors
Intel Pentium 4 (2004)
Application: desktop/server
Technology: 90nm
125M transistors
32/64-bit data (16x)
31-stage pipelined datapath
3 instructions per cycle (superscalar)
Two levels of on-chip cache
data-parallel vector (SIMD) instructions, hyperthreading
Revolution III: Explicit Parallelism
Then to support explicit data & thread level parallelism
Hardware provides parallel resources, software specifies usage
Why? diminishing returns on instruction-level-parallelism
First using (subword) vector instructions…, Intel’s SSE
One instruction does four parallel multiplies
… and general support for multi-threaded programs
Coherent caches, hardware synchronization primitives
Then using support for multiple concurrent threads on chip
First with single-core multi-threading, now with multi-core
Graphics processing units (GPUs) are highly parallel
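The idea of "one instruction does four parallel multiplies" can be modeled in software. The sketch below packs four 16-bit lanes into one 64-bit word and multiplies lane by lane. It is a toy model with assumed lane count and width, not Intel's actual SSE encoding or register size, and the helper names are invented for this example.

```python
LANES, LANE_BITS = 4, 16
MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack LANES small integers into one wide word, lane 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a wide word back into its LANES separate lane values."""
    return [(word >> (i * LANE_BITS)) & MASK for i in range(LANES)]


def simd_mul(a, b):
    """Lane-wise multiply, keeping the low LANE_BITS bits of each product.
    Lanes are independent: overflow in one lane never spills into the next."""
    return pack([(x * y) & MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 3, 4])
b = pack([5, 6, 7, 8])
print(unpack(simd_mul(a, b)))
```

In hardware, the four lane multiplies happen at the same time in separate multipliers; the Python loop only models the result. The software has to arrange its data into lanes explicitly, which is what makes this parallelism "explicit".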
CIS 501: Computer Architecture | Prof. | Introduction
Modern Multicore Processor
AMD EPYC 7H12
Application: server
Technology: 7nm
39.5B transistors
2.6 to 3.3 GHz
256-bit data (2x)
19-stage pipelined datapath
4 instructions per cycle
292MB of on-chip cache
data-parallel vector (SIMD) instructions, simultaneous multithreading (SMT)
64-core multicore
image from https://www.servethehome.com/amd-epyc-2-rome-what-we-know-will-change-the-game/
Historical Microprocessor Evolution
Feature           Intel 4004   Intel Pentium 4   AMD EPYC Rome
release date      1971         2004              2019
transistor size   10,000 nm    90 nm             7 nm, 14 nm
transistor count  2,300        125M              39.5B
area              13 mm²       112 mm²           1008 mm²
frequency         740 kHz      3.8 GHz           2.6-3.3 GHz
data width        4-bit        64-bit            256-bit
pipeline stages   n/a          31                19
pipeline width    n/a          3                 4
core count        1            1                 64
on-chip cache     n/a          1MB               292MB
4004: https://en.wikipedia.org/wiki/Intel_4004
Prescott: https://en.wikipedia.org/wiki/Pentium_4#Prescott, https://techreport.com/review/6213/intels-pentium-4-prescott-processor/
EPYC: https://wccftech.com/amd-2nd-gen-epyc-rome-iod-ccd-chipshots-39-billion-transistors/
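As a back-of-the-envelope check, the transistor counts in the table above imply an average doubling period. The sketch below uses only the 4004 and EPYC Rome rows; the function name is invented for illustration. Note that it measures transistors per product, not density, and EPYC Rome spreads its transistors over several dies, so this is a loose comparison at best.

```python
import math


def doubling_period(count_start, count_end, years):
    """Average years per doubling, assuming steady exponential growth."""
    doublings = math.log2(count_end / count_start)
    return years / doublings


period = doubling_period(2_300, 39.5e9, 2019 - 1971)
print(f"about {period:.1f} years per doubling of transistor count")
```

The result comes out near two years per doubling across the 48-year span, in the same range as the usual statements of Moore's Law.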
Revolution IV: Accelerators
Combining multiple kinds of compute engines in one die
not just a homogeneous collection of cores
System-on-Chip (SoC) is one common example in mobile space
Lots of stuff on the chip beyond just CPUs
Graphics Processing Units (GPUs)
throughput-oriented specialized multicore processors
good for gaming, machine learning, computer vision, …
Special-purpose logic
media codecs, radios, encryption, compression, machine learning
Excellent energy efficiency and performance
extremely complicated to program!
c/o Qualcomm
Our ZedBoard SoC
Cerebras Wafer-Scale Engine
giant 8.5” square chip!
full of deep learning accelerators
18GB on-chip memory
9 PB/sec on-chip memory bandwidth
TSMC 16nm transistors
size of a mousepad
Technology Disruptions
Classic examples:
transistor
microprocessor
More recent examples:
flash-based solid-state storage
shift to accelerators
Nascent disruptive technologies:
non-volatile memory (“disks” as fast as DRAM)
Chip stacking (also called “3D die stacking”)
The end of Moore’s Law
“If something can’t go on forever, it must stop eventually”
Transistor speed/energy efficiency not improving like before
“Golden Age of Computer Architecture”
Hennessy & Patterson, 2018 Turing Laureates
the end of Dennard scaling & Moore’s Law means no more free performance
“The next decade will see a Cambrian explosion of novel computer architectures”
Parallelism
enhance system performance by doing multiple things at once
instruction-level parallelism, multicore, GPUs, accelerators
exploiting locality of reference: storage hierarchies
Try to provide the illusion of a single large, fast memory
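Locality of reference is what makes a small, fast cache in front of a large, slow memory worthwhile. The sketch below simulates a tiny direct-mapped cache and compares two access patterns. The cache parameters (64 lines of 64 bytes) and the function name are assumptions chosen for illustration, not a model of any particular processor.

```python
def hit_rate(addresses, num_lines=64, block_bytes=64):
    """Fraction of accesses that hit in a direct-mapped cache."""
    resident = [None] * num_lines  # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_lines
        if resident[index] == block:
            hits += 1
        else:
            resident[index] = block  # miss: fetch the block, evict the old one
    return hits / len(addresses)


sequential = [4 * i for i in range(10_000)]   # consecutive 4-byte elements
strided = [4096 * i for i in range(10_000)]   # one element every 4 KiB

print(f"sequential hit rate: {hit_rate(sequential):.4f}")
print(f"strided hit rate:    {hit_rate(strided):.4f}")
```

The sequential walk misses once per 64-byte block and then hits on the next 15 elements, while the strided walk touches a new block on every access and never hits. The memory and the amount of data are the same; only the access pattern differs.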
[Figure: density over time, 2010 to 2022, comparing the Moore's Law trend line with Intel, TSMC, and Samsung]