
CIS 371 Computer Architecture
Unit 1: Introduction
Slides developed by , , C.J. Taylor, & at the University of Pennsylvania
with sources that included University of Wisconsin slides



Today’s Agenda
• Course overview and administrivia
• What is computer architecture anyway?
• …and the forces that drive it

Course Overview

CIS 371/501 vs CIS 240
• In CIS 240 you learned how a processor works; in CIS 371/501 we will show you how to make it work well.

CIS 240 vs. CIS 371/501
• CIS 240
• Focus on one toy ISA: LC4
• Focus on functionality: “just get something that works”
• Instructive, learn to crawl before you can walk
• Not representative of real machines: 240 hardware is circa 1975
• CIS 371/501
• Less emphasis on any particular ISA during lectures
• Focus on quantitative aspects: performance, cost, power, etc.
• Representative of ~1980s hardware
• also modern low-power processors, e.g., inside a Fitbit

Pervasive Idea: Abstraction and Layering
• Abstraction: only way of dealing with complex systems
• Divide world into objects, each with an…
• Interface: knobs, behaviors, knobs → behaviors
• Implementation: “black box” (ignorance+apathy)
• Only specialists deal with implementation, rest of us with interface
• Example: only mechanics know how cars work
• Layering: abstraction discipline makes life even simpler
• Divide objects in system into layers, layer n objects…
• Implemented using interfaces of layer n – 1
• (mostly) Don’t need to know interfaces of layer n – 2
• Inertia: a dark side of layering
• Layer interfaces become entrenched over time (“standards”)
– Very difficult to change even if benefit is clear
• Opacity: hard to reason about performance across layers
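To make "layer n is implemented using the interfaces of layer n – 1" concrete, here is a toy Python sketch (all class and function names are hypothetical, not from the course). A multiplier is written only against an abstract Adder interface, so it works unchanged with either a gate-style ripple-carry adder or a behavioral one. The 16-bit width is an assumption chosen to echo LC4.

```python
from abc import ABC, abstractmethod


class Adder(ABC):
    """Interface for "layer n-1": what an adder does, not how it does it."""

    @abstractmethod
    def add(self, a: int, b: int) -> int:
        ...


class RippleCarryAdder(Adder):
    """Implementation 1: bit by bit, like a chain of full-adder gates."""

    def __init__(self, width: int = 16) -> None:
        self.width = width

    def add(self, a: int, b: int) -> int:
        result, carry = 0, 0
        for i in range(self.width):
            x, y = (a >> i) & 1, (b >> i) & 1
            result |= (x ^ y ^ carry) << i          # sum bit
            carry = (x & y) | (carry & (x ^ y))     # carry out
        return result  # only the low `width` bits: wraps like fixed-width hardware


class BehavioralAdder(Adder):
    """Implementation 2: same interface, completely different internals."""

    def __init__(self, width: int = 16) -> None:
        self.mask = (1 << width) - 1

    def add(self, a: int, b: int) -> int:
        return (a + b) & self.mask


def multiply(adder: Adder, a: int, b: int) -> int:
    """'Layer n': shift-and-add multiply written only against the Adder interface."""
    product = 0
    while b:
        if b & 1:
            product = adder.add(product, a)
        a <<= 1  # bits above the adder's width never affect the bits add() returns
        b >>= 1
    return product


if __name__ == "__main__":
    for impl in (RippleCarryAdder(), BehavioralAdder()):
        print(type(impl).__name__, multiply(impl, 123, 45))
```

Swapping the implementation requires no change to `multiply`, which is the payoff of the interface; the flip side ("opacity") is that `multiply` cannot tell that one adder is far slower than the other.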

Abstraction, Layering, and Computers
• Computers are complex, built in layers
• Several software layers: assembler, compiler, OS, applications
• Instruction set architecture (ISA)
• Several hardware layers: transistors, gates, CPU/Memory/IO
• 99% of users don’t know how the hardware layers are implemented
• 90% of users don’t know implementation of any layer
• That’s okay, world still works just fine
• But sometimes it is helpful to understand what’s “under the hood”
[Figure: the layers of a computer system, from system software down to transistors]

Beyond CIS 371/501
[Figure: system layers mapped to courses: most CIS courses and CIS 380 (system software), CIS 371/501 (Mem, CPU, I/O), ESE 532, ESE 370]
• CIS 380: Operating Systems
• A closer look at system level software
• CIS 371/501: Computer Organization and Design
• A closer look at hardware layers
• ESE 370: Circuit-Level Modeling, Design, and Optimization for Digital Systems
• Diving into gate-level abstractions
• ESE 532: System-on-Chip Design
• HW+SW: design an application-specific hardware accelerator

Why Study Hardware?
• Understand where computers are going
• Future capabilities drive the (computing) world
• Real-world impact: no computer architecture → no computers!
• Understand high-level design concepts
• The best system designers understand all the levels
• Hardware, compiler, operating system, applications
• Understand computer performance
• Writing well-tuned (fast) software requires knowledge of hardware
• Write better software
• The best software designers also understand hardware
• Understand the underlying hardware and its limitations
• Design hardware
• Intel, AMD, IBM, ARM, Qualcomm, Apple, Oracle, NVIDIA, Samsung, …

Penn Legacy
• ENIAC: Electronic Numerical Integrator and Computer
• First operational general-purpose electronic computer
• Designed and built here by Eckert and Mauchly
• See it in Moore 100!
• First seminars on computer design
• Moore School Lectures, 1946
• “Theory and Techniques for Design of Electronic Digital Computers”

Administrivia

Course Staff
• Instructor
• Sehyeok Park

Important Dates
• (see Canvas)

PhD students: WPE-1 exam
• starting this semester, CIS 501 is a “course work” WPE-1 course
• must obtain a sufficiently high course grade
• you must declare your WPE1 status with Britton by next Friday
• This year only, CIS 501 also offers the classic exam-only WPE1 option

The Verilog Labs
• “Build your own processor” (pipelined 16-bit CPU for LC4)
• Use Verilog HDL (hardware description language)
• Programming language compiles to gates/wires, not insns
• Implement and test on real hardware
• FPGA (field-programmable gate array)
+ Instructive: learn by doing
+ Satisfying: “look, I built my own processor”

Lab 5 Demo

Lab Logistics
• hardware compiler
• Run it from biglab.seas.upenn.edu
• ZedBoard FPGA boards
• Live in Towne lockers, details coming
• Logistics
• Most labs have a demo component that runs on the ZedBoard
• Development and simulation can be done before final testing on the board

Coursework (1 of 2)
• Labs – Labs 2-5 done in groups of two
• Lab 1: Verilog debugging
• Lab 2: arithmetic unit
• Lab 3: single-cycle LC4 & register file
• Lab 4: pipelined LC4
• Lab 5: pipelined superscalar LC4
• Labs are cumulative and increasingly complex
• Each lab broken down into “milestone” deadlines
• Roughly one per week

Coursework (2 of 2)
• In-class midterm (see Canvas)
• Cumulative final exam (time & date set by registrar)
• Class participation
• A good way to earn some extra calories
• We will not use clickers
• See the participation section of the policies page

Course Resources
• Course web site
• Everything is at http://www.cis.upenn.edu/~cis501 or on Canvas (syllabus, lectures, homework, submission, grades, etc.)
• “Campuswire”: the (new?) Piazza
• Sign-up link on the course web site
• The way to ask questions/clarifications
• Can post to just me & TAs or anonymously to class
• As a general rule, don’t email us directly
• Sign-up required!
• Textbook
• P+H, Computer Organization and Design, 4th or 5th edition
• Reese & Thornton, Intro to Logic Synthesis using Verilog HDL
• Both available free online! See course home page for links

In many ways this is a class about debugging

Debugging Rules!

Grading
• Tentative grade contributions:
• Midterm: 20%
• Final: 25%
• Labs: 55%
• Historical grade distribution: median grade is B+
• No guarantee this semester will be similar, but the distribution seems reasonable

Homework and Late Days
• Assignments usually due on Mondays at 11:59pm. Deadline is enforced by Canvas.
• Submit as often as you like; your last submission is what counts.
• Any assignment can be submitted up to 48 hours late, for 75% credit
• No need to give an excuse, just turn it in late
• Assignments are cumulative – you have to get things to work!

Academic Misconduct
• Cheating will not be tolerated
• General rule:
• Anything with your name on it must be YOUR OWN work
• You MUST scrupulously credit all sources of help
• Example: individual work on homework assignments
• See the course policies
• Penn’s Code of Conduct
• http://www.vpul.upenn.edu/osl/acadint.html

What is Computer Architecture?

Computer Architecture
• Computer architecture
• Definition of ISA to facilitate implementation of software layers
• The hardware/software interface
• Computer micro-architecture
• Design processor, memory, I/O to implement ISA
• Efficiently implementing the interface
• CIS 371/501 is mostly about processor micro-architecture
• “architecture” is also a vacuous term for “the design of things”
• software architect, network architecture, …

Application Specific Designs
• This class is about general-purpose CPUs
• Processor that can do anything, run a full OS, etc.
• E.g., Intel Atom/Core/Xeon, AMD Ryzen/EPYC, ARM M/A series
• In contrast to application-specific chips
• Or ASICs (application-specific integrated circuits)
• Also application-domain-specific processors
• Implement critical domain-specific functionality in hardware
• Examples: video encoding, 3D graphics, machine learning
• General rules
– Hardware is less flexible than software
+ Hardware more effective (speed, power, cost) than software
+ Domain specific more “parallel” than general purpose
• But mainstream processors are quite parallel as well

Technology Trends

“Technology”
• Basic element
• Solid-state transistor (i.e., electrical switch)
• Building block of integrated circuits (ICs)
• What’s so great about ICs? Everything
+ High performance, high reliability, low cost, low power
+ Lever of mass production
• Several kinds of integrated circuit families
• SRAM/logic: optimized for speed (used for processors)
• DRAM: optimized for density, cost, power (used for memory)
• Flash: optimized for density, cost (used for storage)
• Increasing opportunities for integrating multiple technologies
• Non-transistor storage and inter-connection technologies
• Magnetic disks, optical storage, ethernet, fiber optics, wireless
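To show how "transistor as a switch" becomes logic, here is a minimal switch-level sketch in Python of a static CMOS 2-input NAND gate (PMOS switches in parallel to the supply, NMOS switches in series to ground), followed by other gates built only from NAND. It is an idealization that ignores timing, voltages, and sizing; the function names are hypothetical.

```python
def nand(a: int, b: int) -> int:
    """Switch-level model of a 2-input static CMOS NAND gate (inputs are 0 or 1)."""
    # Pull-up network: two PMOS switches in parallel; a PMOS conducts when its gate is 0.
    pull_up = (a == 0) or (b == 0)
    # Pull-down network: two NMOS switches in series; an NMOS conducts when its gate is 1.
    pull_down = (a == 1) and (b == 1)
    # In static CMOS exactly one network conducts, so the output is never floating or shorted.
    assert pull_up != pull_down
    return 1 if pull_up else 0


# Every other gate can be composed from NAND alone.
def inv(a: int) -> int:
    return nand(a, a)


def and_gate(a: int, b: int) -> int:
    return inv(nand(a, b))


def or_gate(a: int, b: int) -> int:
    return nand(inv(a), inv(b))


def xor_gate(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


if __name__ == "__main__":
    print("a b | nand and or xor")
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "|", nand(a, b), and_gate(a, b), or_gate(a, b), xor_gate(a, b))
```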

Moore’s Law – 1965
233 transistors

Moore’s Law today
[Figure: transistor density (million transistors per mm^2) by year, 2010 onward, for Intel, Samsung, and TSMC process nodes, labeled 16nm/14nm, 14nm++, 10nm, and 7nm; data c/o WikiChip]

Revolution I: The Microprocessor
• Microprocessor revolution
• One significant technology threshold was crossed in the 1970s
• Enough transistors (~25K) to put a 16-bit processor on one chip
• Huge performance advantages: fewer slow chip-crossings
• Even bigger cost advantages: one “stamped-out” component
• Microprocessors have allowed new market segments
• Desktops, CD/DVD players, laptops, game consoles, set-top boxes, mobile phones, digital cameras, mp3 players, GPS, automotive
• And replaced incumbents in existing segments
• Microprocessor-based system replaced supercomputers, “mainframes”, “minicomputers”, “desktops”, etc.

First Microprocessor
• Intel 4004 (1971)
• Application: calculators
• Technology: 10,000 nm
• 2300 transistors
• 13 mm^2
• 12 Volts
• 4-bit data
• Single-cycle datapath

Revolution II: Implicit Parallelism
• Then to extract implicit instruction-level parallelism
• Hardware provides parallel resources, figures out how to use them
• Software is oblivious
• Initially using pipelining …
• Which also enabled increased clock frequency
• …caches…
• Which became necessary as processor clock frequency increased
• … and integrated floating-point
• Then deeper pipelines and branch speculation
• Then multiple instructions per cycle (superscalar)
• Then dynamic scheduling (out-of-order execution)
• We will talk about these things
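Why pipelining "also enabled increased clock frequency" can be seen with ideal-pipeline arithmetic: the clock period shrinks to one stage's delay, and after the pipeline fills, one instruction completes per cycle. The Python sketch below is a minimal model assuming perfectly balanced stages, no pipeline-register overhead, and no hazards or stalls; the function names are hypothetical.

```python
def unpipelined_time(n_insns: int, n_stages: int, stage_delay: float = 1.0) -> float:
    """Single-cycle style: each instruction takes the full datapath delay
    (n_stages * stage_delay) before the next one starts."""
    return n_insns * n_stages * stage_delay


def pipelined_time(n_insns: int, n_stages: int, stage_delay: float = 1.0) -> float:
    """Ideal pipeline: clock period = one stage delay. The first instruction needs
    n_stages cycles to fill the pipeline, then one instruction completes per cycle."""
    return (n_stages + n_insns - 1) * stage_delay


def speedup(n_insns: int, n_stages: int) -> float:
    return unpipelined_time(n_insns, n_stages) / pipelined_time(n_insns, n_stages)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(f"{n} insns, 5 stages: speedup {speedup(n, 5):.3f}")
```

As the instruction count grows, the speedup approaches the number of stages; real pipelines fall short of that because of hazards, stalls, and latch overhead, which later units address.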

Pinnacle of Single-Core Microprocessors
• Intel Pentium4 (2003)
• Application: desktop/server
• Technology: 90nm
• 55M transistors
• 101 mm^2
• 1.2 Volts
• 32/64-bit data (16x)
• 22-stage pipelined datapath
• 3 instructions per cycle (superscalar)
• Two levels of on-chip cache
• data-parallel vector (SIMD) instructions, hyperthreading

Revolution III: Explicit Parallelism
• Then to support explicit data- and thread-level parallelism
• Hardware provides parallel resources, software specifies usage
• Why? Diminishing returns on instruction-level parallelism
• First using (subword) vector instructions…, Intel’s SSE
• One instruction does four parallel multiplies
• … and general support for multi-threaded programs
• Coherent caches, hardware synchronization primitives
• Then using support for multiple concurrent threads on chip
• First with single-core multi-threading, now with multi-core
• Graphics processing units (GPUs) are highly parallel
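"One instruction does four parallel multiplies" refers to subword (SIMD) parallelism: one wide register holds several narrow lanes, and one instruction operates on every lane. The Python sketch below emulates that idea with four hypothetical 16-bit lanes packed into one integer and a lane-wise multiply that keeps the low 16 bits of each product. It is only an illustration: real SSE instructions such as MULPS multiply four 32-bit floats held in a 128-bit register, and the lane multipliers in hardware work simultaneously rather than in a loop.

```python
LANES, LANE_BITS = 4, 16
MASK = (1 << LANE_BITS) - 1


def pack(values: list[int]) -> int:
    """Pack four 16-bit values into one 64-bit 'register' (lane 0 in the low bits)."""
    assert len(values) == LANES
    reg = 0
    for lane, v in enumerate(values):
        reg |= (v & MASK) << (lane * LANE_BITS)
    return reg


def unpack(reg: int) -> list[int]:
    return [(reg >> (lane * LANE_BITS)) & MASK for lane in range(LANES)]


def simd_mul_lo16(ra: int, rb: int) -> int:
    """One 'instruction': four independent 16-bit multiplies, one per lane.
    Each product is truncated to its lane, so no carry spills into a neighbor."""
    result = 0
    for lane in range(LANES):  # sequential here; parallel lanes in hardware
        shift = lane * LANE_BITS
        a = (ra >> shift) & MASK
        b = (rb >> shift) & MASK
        result |= ((a * b) & MASK) << shift
    return result


if __name__ == "__main__":
    print(unpack(simd_mul_lo16(pack([1, 2, 3, 4]), pack([5, 6, 7, 8]))))
```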

Modern Multicore Processor
• AMD EPYC 7H12
• Application: server
• Technology: 7nm
• 39.5B transistors
• 1008 mm^2
• 2.6 to 3.3 GHz
• 256-bit data (2x)
• 19-stage pipelined datapath
• 4 instructions per cycle
• 292MB of on-chip cache
• data-parallel vector (SIMD) instructions, hyperthreading
• 64-core multicore
CIS 501: Computer Architecture | Prof. | Introduction 38

Historical Microprocessor Evolution
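                    Intel 4004      Intel Pentium 4     AMD EPYC Rome (7H12)
release date        1971            2003                2019
transistor size     10,000 nm       90 nm               7 nm, 14 nm
transistor count    2,300           55M                 39.5B
clock frequency     -               -                   2.6-3.3 GHz
data width          4-bit           32/64-bit           256-bit
pipeline stages     single-cycle    22                  19
pipeline width      1               3                   4
core count          1               1                   64
on-chip cache       none            two levels          292 MB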
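Moore’s Law is exponential arithmetic, so it can be checked against the transistor counts in this unit. The Python sketch below assumes the commonly quoted two-year doubling period (Moore’s original 1965 projection used one year) and projects forward from the Intel 4004’s 2,300 transistors in 1971; it also computes the doubling period implied by two data points. The function names are hypothetical.

```python
import math


def projected_count(base_count: float, base_year: int, year: int,
                    doubling_years: float = 2.0) -> float:
    """Transistor count if it doubles every `doubling_years` years."""
    return base_count * 2 ** ((year - base_year) / doubling_years)


def implied_doubling_period(count1: float, year1: int,
                            count2: float, year2: int) -> float:
    """Doubling period (in years) implied by two (year, count) data points."""
    return (year2 - year1) / math.log2(count2 / count1)


if __name__ == "__main__":
    # Counts from this unit: Intel 4004 (1971), Pentium 4 (2003), AMD EPYC 7H12 (2019).
    for year, actual in ((2003, 55e6), (2019, 39.5e9)):
        projected = projected_count(2300, 1971, year)
        print(f"{year}: projected {projected:.3g}, actual {actual:.3g}")
    period = implied_doubling_period(2300, 1971, 39.5e9, 2019)
    print(f"implied doubling period 1971-2019: {period:.2f} years")
```

The projection overshoots the 2003 single-core chip and lands close to the 2019 multicore one, so over the whole span the data fit a doubling period of roughly two years.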

Revolution IV: Accelerators
• Combining multiple kinds of compute engines in one die
• Not just a homogeneous collection of cores
• System-on-Chip (SoC) is one common example in mobile space
• Lots of stuff on the chip beyond just CPUs
• Graphics Processing Units (GPUs)
• throughput-oriented specialized multicore processors
• good for gaming, machine learning, computer vision, …
• Special-purpose logic
• media codecs, radios, encryption, compression, machine learning
• Excellent energy efficiency and performance
• Extremely complicated to program!

c/o Qualcomm

Our Zedboard SoC

Cerebras Wafer-Scale Engine
• giant 8.5” square chip!
• full of deep learning accelerators
• 18GB on-chip memory
• 9 PB/sec on-chip memory bandwidth
• TSMC 16nm transistors

Technology Disruptions
• Classic examples:
• transistor
• microprocessor
• More recent examples:
• flash-based solid-state storage
• shift to accelerators
• Nascent disruptive technologies:
• non-volatile memory (“disks” as fast as DRAM)
• Chip stacking (also called “3D die stacking”)
• The end of Moore’s Law
• “If something can’t go on forever, it must stop eventually”
• Transistor speed/energy efficiency not improving like before

“Golden Age of Computer Architecture”
• Hennessy & Patterson, recipients of the 2017 ACM Turing Award (announced in 2018)
• the end of Dennard scaling & Moore’s Law means no more free performance
• “The next decade will see a Cambrian explosion of novel computer architectures”
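Why the end of Dennard scaling ends “free” performance follows from the first-order dynamic power model. In the sketch below, α (activity factor), C (switched capacitance), V (supply voltage), f (clock frequency), A (area per transistor), and the per-generation scaling factor κ are symbols introduced here, and static (leakage) power is ignored:

```latex
\begin{aligned}
P &= \alpha\, C\, V^2 f \\
\text{Dennard scaling } (\kappa > 1):\quad & C \to \tfrac{C}{\kappa},\quad V \to \tfrac{V}{\kappa},\quad f \to \kappa f,\quad A \to \tfrac{A}{\kappa^2} \\
P' &= \alpha \,\tfrac{C}{\kappa} \left(\tfrac{V}{\kappa}\right)^{2} (\kappa f) = \tfrac{P}{\kappa^2},
\qquad \tfrac{P'}{A'} = \tfrac{P/\kappa^2}{A/\kappa^2} = \tfrac{P}{A} \\
\text{Post-Dennard } (V \text{ fixed}):\quad
P' &= \alpha \,\tfrac{C}{\kappa}\, V^2 (\kappa f) = P,
\qquad \tfrac{P'}{A'} = \kappa^2\, \tfrac{P}{A}
\end{aligned}
```

Under Dennard scaling, each generation gave κ² more transistors per area and a κ faster clock at constant power density. Once V stops scaling, the same move would raise power density by κ², so clock rates stall and extra transistors must instead go to parallelism and specialization.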

• Parallelism
• enhance system performance by doing multiple things at once
• instruction-level parallelism, multicore, GPUs, accelerators
• exploiting locality of reference: storage hierarchies
• Try to provide the illusion of a single large, fast memory
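Storage hierarchies can provide the illusion of a large, fast memory only because programs exhibit locality. The Python sketch below models a toy direct-mapped cache that tracks tags only, then converts hit rate into average memory access time (hit time + miss rate × miss penalty). The cache geometry (64 sets of 64-byte blocks, so 4 KiB), the 1-cycle hit time, the 100-cycle miss penalty, and the function names are all hypothetical.

```python
from typing import Optional, Sequence


def direct_mapped_hit_rate(addresses: Sequence[int], num_sets: int = 64,
                           block_bytes: int = 64) -> float:
    """Hit rate of a toy direct-mapped cache that tracks tags only (no data)."""
    tags: list[Optional[int]] = [None] * num_sets
    hits = 0
    for addr in addresses:
        block = addr // block_bytes              # which memory block holds this byte
        index, tag = block % num_sets, block // num_sets
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag                    # miss: bring the block in, evicting the old one
    return hits / len(addresses)


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    array_bytes = 1 << 20                        # 1 MiB array, much larger than the 4 KiB cache
    traces = {
        "sequential": range(0, array_bytes, 4),  # walk 4-byte words in order (spatial locality)
        "strided": range(0, array_bytes, 64),    # touch one word per block (no reuse)
        "small loop": list(range(0, 1024, 4)) * 10,  # re-walk 1 KiB ten times (temporal locality)
    }
    for name, trace in traces.items():
        rate = direct_mapped_hit_rate(trace)
        print(f"{name}: hit rate {rate:.4f}, AMAT {amat(1, 1 - rate, 100):.2f} cycles")
```

The sequential walk misses once per 64-byte block, so 15 of every 16 word accesses hit, while the strided walk never reuses a block and pays the full miss penalty on every access.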
