程序代写代做代考 algorithm Fortran mips C clock c++ compiler cache assembly Lecture 1 &2

Lecture 1 &2
Computer Abstractions and Technology
Chapter 1

• Syllabus
Chapter 1 — Computer Abstractions and Technology — 2

The Computer Revolution
• Progress in computer technology • UnderpinnedbyMoore’sLaw
• Makes novel applications feasible • Computersinautomobiles
• Cellphones
• Humangenomeproject
• WorldWideWeb • SearchEngines
• Computers are pervasive
Chapter 1 — Computer Abstractions and Technology — 3

Classes of Computers
• Desktop computers
• Generalpurpose,varietyofsoftware • Subjecttocost/performancetradeoff
• Server computers
• Networkbased
• Highcapacity,performance,reliability
• Rangefromsmallserverstobuildingsized
• Embedded computers
• Hiddenascomponentsofsystems
• Stringentpower/performance/costconstraints
Chapter 1 — Computer Abstractions and Technology — 4

What You Will Learn
• How programs are translated into the machine language
• Andhowthehardwareexecutesthem
• The hardware/software interface
• What determines program performance • Andhowitcanbeimproved
• How hardware designers improve performance
• What is parallel processing
Chapter 1 — Computer Abstractions and Technology — 5

Understanding Performance
• Algorithm
• Determines number of operations executed
• Programminglanguage,compiler, architecture
• Determine number of machine instructions executed per operation
• Processorandmemorysystem
• Determine how fast instructions are executed
• I/Osystem(includingOS)
• Determines how fast I/O operations are executed
Chapter 1 — Computer Abstractions and Technology — 6

Below Your Program
• Applicationsoftware
• Written in high-level language (HLL)
• Systemsoftware
• Compiler: translates HLL code to machine code
• Operating System:
• Handlinginput/output
• Managing memory and storage
• Scheduling tasks & sharing resources
• Hardware
• Processor, memory, I/O controllers
Chapter 1 — Computer Abstractions and Technology — 7

Levels of Program Code
• High-levellanguage
• Level of abstraction closer to problem domain
• Provides for productivity and portability
• Assemblylanguage
• Textual representation of instructions
• Hardwarerepresentation
• Binary digits (bits)
• Encoded instructions and data
Chapter 1 — Computer Abstractions and Technology — 8

The BIG Picture
• Samecomponentsfor all kinds of computer
• Desktop, server, embedded
• Input/outputincludes
• User-interface devices
• Display, keyboard, mouse
• Storage devices
• Hard disk, CD/DVD, flash
• Network adapters
• For communicating with other computers
Components of a Computer
Chapter 1 — Computer Abstractions and Technology — 9

Opening the Box
Chapter 1 — Computer Abstractions and Technology — 10

Inside the Processor (CPU)
• Datapath: performs operations on data
• Control: sequences datapath, memory, …
• Cache memory
• SmallfastSRAMmemoryforimmediateaccessto data
Chapter 1 — Computer Abstractions and Technology — 11

Inside the Processor
• AMD: 4 processor cores
Chapter 1 — Computer Abstractions and Technology — 12

A Safe Place for Data
• Volatile main memory
• Loses instructions and data when power off
• Non-volatile secondary memory
• Magnetic disk
• Flash memory
• Optical disk (CDROM, DVD)
Chapter 1 — Computer Abstractions and Technology — 13

Networks
• Communication and resource sharing
• Local area network (LAN): Ethernet • Withinabuilding
• Wide area network (WAN: the Internet
• Wireless network: WiFi, Bluetooth
Chapter 1 — Computer Abstractions and Technology — 14

Technology Trends
• Electronics technology continues to evolve
• Increased capacity and performance
• Reduced cost
DRAM capacity
Year
Technology
Relative performance/cost
1951
Vacuum tube
1
1965
Transistor
35
1975
Integrated circuit (IC)
900
1995
Very large scale IC (VLSI)
2,400,000
2005
Ultra large scale IC
6,200,000,000
Chapter 1 — Computer Abstractions and Technology — 15

Response Time and Throughput
• Responsetime
• Howlongittakestodoatask
• Throughput
• Total work done per unit time
• e.g., tasks/transactions/… per hour
• Howareresponsetimeandthroughput affected by
• Replacing the processor with a faster version? • Adding more processors?
• We’llfocusonresponsetimefornow… Chapter 1 — Computer Abstractions and Technology — 16

Relative Performance
• Define Performance = 1/Execution Time
• “X is n time faster than Y”
PerformanceX PerformanceY
= Execution timeY Execution timeX = n
n Example: time taken to run a program
n 10s on A, 15s on B
n Execution TimeB / Execution TimeA = 15s / 10s = 1.5
n So A is 1.5 times faster than B Chapter 1 — Computer Abstractions and Technology — 17

Measuring Execution Time
• Elapsed time
• Total response time, including all aspects • Processing, I/O, OS overhead, idle time
• Determinessystemperformance • CPU time
• Timespentprocessingagivenjob
• Discounts I/O time, other jobs’ shares
• DifferentprogramsareaffecteddifferentlybyCPUand system performance
Chapter 1 — Computer Abstractions and Technology — 18

CPU Clocking
• Operationofdigitalhardwaregovernedbya constant-rate clock
Clock period
Clock (cycles)
Data transfer and computation
Update state
n Clock period: duration of a clock cycle
n e.g., 250ps = 0.25ns = 250×10–12s
n Clock frequency (rate): cycles per second
n e.g., 4.0GHz = 4000MHz = 4.0×109Hz Chapter 1 — Computer Abstractions and Technology — 19

CPU Time
CPU Time = CPU Clock Cycles´Clock Cycle Time
= CPU Clock Cycles Clock Rate
• Performance improved by
• Reducingnumberofclockcycles
• Increasingclockrate
• Hardwaredesignermustoftentradeoffclockrateagainst cycle count
Chapter 1 — Computer Abstractions and Technology — 20

CPU Time Example
• Computer A: 2GHz clock, 10s CPU time
• Designing Computer B
• Aim for 6s CPU time
• Can do faster clock, but causes 1.2 × clock cycles
• How fast must Computer B clock be?
Clock RateB = Clock CyclesB = 1.2´Clock CyclesA CPU TimeB 6s
ClockCyclesA =CPUTimeA´ClockRateA =10s´2GHz = 20´109
Clock RateB = 1.2´20´109 = 24´109 = 4GHz 6s 6s
Chapter 1 — Computer Abstractions and Technology — 21

Instruction Count and CPI
Clock Cycles = Instruction Count ´ Cycles per Instruction CPU Time = Instruction Count ´CPI´Clock Cycle Time
= Instruction Count ´ CPI Clock Rate
• InstructionCountforaprogram
• Determined by program, ISA and compiler
• Averagecyclesperinstruction
• Determined by CPU hardware
• If different instructions have different CPI • Average CPI affected by instruction mix
Chapter 1 — Computer Abstractions and Technology — 22

CPI Example
• ComputerA:CycleTime=250ps,CPI=2.0 • ComputerB:CycleTime=500ps,CPI=1.2 • Same ISA
• Whichisfaster,andbyhowmuch?
CPUTimeA =InstructionCount´CPIA ´CycleTimeA =I´2.0´250ps=I´500ps
CPU Time = Instruction Count ´ CPI ´ Cycle Time BBB
=I´1.2´500ps=I´600ps
CPUTimeB =I´600ps=1.2 CPUTimeA I´500ps
A is faster…
…by this much
Chapter 1 — Computer Abstractions and Technology — 23

CPI in More Detail
• If different instruction classes take different numbers of cycles
n
i=1
Clock Cycles = å(CPI ´Instruction Count )
ii
n Weighted average CPI
Clock Cycles n æ Instruction Count ö CPI= =åçCPIi´ i÷
Instruction Count i=1 è Instruction Count ø
Chapter 1 — Computer Abstractions and Technology — 24
Relative frequency

CPI Example
• Alternativecompiledcodesequencesusing instructions in classes A, B, C
Class
A
B
C
CPI for class
1
2
3
IC in sequence 1
2
1
2
IC in sequence 2
4
1
1
n Sequence1:IC=5
n Clock Cycles
= 2×1 + 1×2 + 2×3 = 10
n Avg.CPI=10/5=2.0 Chapter 1 — Computer Abstractions and Technology — 25
n Sequence2:IC=6
n Clock Cycles
= 4×1 + 1×2 + 1×3 = 9
n Avg.CPI=9/6=1.5

The BIG Picture
Performance Summary
CPU Time = Instructions ´ Clock cycles ´ Seconds Program Instruction Clock cycle
• Performance depends on
• Algorithm:affectsIC,possiblyCPI
• Programminglanguage:affectsIC,CPI
• Compiler:affectsIC,CPI
• Instructionsetarchitecture:affectsIC,CPI,Tc
Chapter 1 — Computer Abstractions and Technology — 26

Structure
COMPUTER
I/O
Main memory
System Bus
CPU
CONTROL UNIT
Sequencing Logic
Control Unit Registers and Decoders
Control Memory
CPU Registers
Internal Bus
Control Unit
ALU
Figure 1.1 A Top-Down View of a Computer
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

There are four main structural components
of the computer:
ª CPU – controls the operation of the computer and performs its data processing functions
ªMain Memory – stores data
ªI/O – moves data between the computer and its external environment
ªSystem Interconnection – some mechanism that provides for communication among CPU, main memory, and I/O
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

CPU
Major structural components:


Control Unit
• Controls the operation of the CPU and hence the computer
Arithmetic and Logic Unit (ALU)
• Performs the computer’s data processing function
• Registers
• Provide storage internal to the
CPU
• CPU Interconnection
• Some mechanism that provides for communication among the control unit, ALU, and registers
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Cache Memory
• Multiplelayersofmemorybetweenthe processor and main memory
• Issmallerandfasterthanmainmemory
• Usedtospeedupmemoryaccessbyplacing in the cache data from main memory that is likely to be used in the near future
• Agreaterperformanceimprovementmaybe obtained by using multiple levels of cache, with level 1 (L1) closest to the core and additional levels (L2, L3, etc.) progressively farther from the core
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

MOTHERBOARD
Main memory chips
I/O chips
Processor chip
PROCESSOR CHIP
Core
Core
Core
Core
L3 cache
L3 cache
Core
Core
Core
Core
CORE
Instruction logic
Arithmetic and logic unit (ALU)
Load/ store logic
L1 I-cache
L1 data cache
L2 instruction cache
L2 data cache
Figure 1.2 Simplified View of Major Elements of a Multicore Computer
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Moore’s Law
1965;; Gordon Moore
Observed number of transistors that could be put on a single chip was doubling every year

co

founder of Intel
The pace slowed to a doubling every 18 months in the 1970’s but has sustained that rate ever since
The cost of computer logic and memory circuitry has fallen at a dramatic rate
The electrical path length is shortened, increasing operating speed
Computer becomes smaller and is more convenient to use in a variety of environments
Reduction in power and cooling requirements
Fewer interchip connections
Consequences of Moore’s law:
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

194750 55
60 65
70 75
80 85 90
100 bn 10 bn
1 bn 100 m 10 m 100,000 10.000 1,000 100
10
1 95 2000 05 11
Figure 1.12 Growth in Transistor Count on Integrated Circuits (DRAM memory)
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
First working transistor
Invention of integrated circuit
Moore’s law promulgated

Power Trends
• In CMOS IC technology
Power = Capacitive load´ Voltage2 ´Frequency
×30 5V → 1V ×1000 Chapter 1 — Computer Abstractions and Technology — 34
§1.5 The Power Wall

Problems with Clock Speed and Login Density
• Power
• Power density increases with density of logic and clock speed
• Dissipating heat
• RC delay
• Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them
• Delay increases as the RC product increases
• As components on the chip decrease in size, the wire interconnects become thinner, increasing resistance
• Also, the wires are closer together, increasing capacitance
• Memory latency
• Memory speeds lag processor speeds
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Reducing Power
• Suppose a new CPU has
• 85%ofcapacitiveloadofoldCPU
• 15%voltageand15%frequencyreduction
P new
P old
= C
old
´0.85´(V ´0.85)2 ´F ´0.85 = 0.854 = 0.52 old old
C ´V 2´F old old old
n The power wall
n We can’t reduce voltage further n We can’t remove more heat
n How else can we improve performance? Chapter 1 — Computer Abstractions and Technology — 36

Multiprocessors
• Multicore microprocessors
• Morethanoneprocessorperchip
• Requires explicitly parallel programming
• Comparewithinstructionlevelparallelism
• Hardware executes multiple instructions at once • Hidden from the programmer
• Hardtodo
• Programming for performance
• Load balancing
• Optimizing communication and synchronization
Chapter 1 — Computer Abstractions and Technology — 37

Manufacturing ICs
• Yield: proportion of working dies per wafer Chapter 1 — Computer Abstractions and Technology — 38
§1.7 Real Stuff: The AMD Opteron X4

Wafer
• A thin wafer of silicon is divided into a matrix of small areas, each a few millimeters square.
• The identical circuit pattern is fabricated in each area, and the wafer is broken up into chips.
• Each chip consists of many gates and/or memory cells plus a number of input and output attachment points.
• This chip is then packaged in housing that protects it and provides pins for attachment to devices beyond the chip.
Wafer
Chip
• A number of these packages can then
be interconnected on a printed circuit
board to produce larger and more chip complex circuits.
Gate
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Packaged
Figure 1.11 Relationship Among Wafer, Chip, and Gate

Benchmark Principles
n The same set of programs can be run on different machines and the execution times compared.
n Benchmarks provide guidance to customers trying to decide which system to buy and can be useful to vendors and designers in determining how to design systems to meet benchmark goals.
n Desirable characteristics of a benchmark program:
1. It is written in a high-level language, making it portable across
different machines
2. It is representative of a particular kind of programming domain
or paradigm, such as systems programming, numerical
programming, or commercial programming
3. It can be measured easily
4. It has wide distribution
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Standard Performance Evaluation Corporation (SPEC)
n Programs used to measure performance nSupposedly typical of actual workload nDevelops benchmarks for CPU, I/O, Web, …
n Performance measurements are widely used for comparison and research purposes
n SPEC CPU2006
nElapsed time to execute a selection of programs
nNegligible I/O, so focuses on CPU performance
nNormalize relative to reference machine
nSummarize as geometric mean of performance ratios nCINT2006 (integer) and CFP2006 (floating-point)
n
n ÕExecutiontimeratio
i=1
i
Chapter 1 — Computer Abstractions and Technology — 41

Standard Performance Evaluation Corporation
• Best known SPEC benchmark suite
SPEC CPU2006
• Industry standard suite for processor intensive applications
• Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
• Consists of 17 floating point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
• Contains over 3 million lines of code
• Fifth generation of processor intensive suites from SPEC
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Amdahl’s Law
n Gene Amdahl
n Deals with the potential speedup of a program using
multiple processors compared to a single processor
n Illustrates the problems facing industry in the development of multi-core machines
nSoftware must be adapted to a highly parallel execution environment to exploit the power of parallel processing
n Can be generalized to evaluate and design technical improvement in a computer system
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.

Amdahl’s Law
nImproving an aspect of a computer and expecting a proportional improvement in overall performance
T=T+T improved improvement factor unaffected
affected
n Example: multiply accounts for 80s/100s
n How much improvement in multiply performance to
get 5× overall?
20 = 80 + 20 n
n Can’t be done! n Corollary: make the common case fast
Chapter 1 — Computer Abstractions and Technology — 44
§1.8 Fallacies and Pitfalls


Pitfall: MIPS as a Performance Metric
MIPS: Millions of Instructions Per Second
• Doesn’taccountfor
• Differences in ISAs between computers
• Differences in complexity between instructions
MIPS = Instruction count Execution time ´106
= Instruction count = Clock rate Instructioncount´CPI´106 CPI´106
Clock rate
n CPI varies between programs on a given CPU Chapter 1 — Computer Abstractions and Technology — 45