CO101
Principle of Computer Organization
Lecture 01: Introduction

Liang Yanyan

澳門科技大學
Macau University of Science and Technology

General References
• Instructor: Dr. Liang Yanyan (梁延研)
  • Email: yyliang@must.edu.mo
  • Tel: 88971997
  • Office: A212
• TA: Mr. Lin Chi (林馳)
  • Email: linantares@gmail.com
• You are encouraged to ask questions during or after the lecture, or to stop by my office.
• Classroom
  • D1: Monday@C505, Friday@C408
  • D2: Wednesday@C505, Friday@C408

General References
• Textbook: Computer Organization and Design: The Hardware/Software Interface, 4th Edition, by David Patterson and John Hennessy.
• Some resources on FTP
  • Username: yyliang_stu
  • Password: Ja7Hr3lW
  • Link: ftp://ftp.must.edu.mo/

Grading Information
• Grade determinants
  • Attendance 5%
  • Assignments 10%
  • Quizzes 15%
  • Labs 15%
  • Project 15%
  • Final Exam 40%
• Late submissions are penalized 10% per day.
• A student must gain at least 40% of the full marks in each part in order to pass the course.

Why Learn This Stuff?
• You want to call yourself a “computer scientist/engineer”.
• You want to build HW/SW systems that people use.
• You need to make a purchasing decision or offer “expert” advice.
• So you need to know the relationship between performance and power.
  • Both hardware and software affect performance and power, because:
    • The algorithm determines the number of source-level statements.
    • The language, compiler, and architecture determine the number of machine-level instructions.
    • The processor and memory determine how quickly, and at what power cost, machine-level instructions are executed.

Course Contents
• Introduction to the major components of a computer system and how they function together in executing a program.
• Introduction to CPU datapath and control unit design.
• Introduction to techniques to improve performance and energy efficiency of computer systems.
• Introduction to multiprocessor architecture.
• This course is to learn what determines the capabilities and performance of computer systems,
• and to understand the interactions between the computer’s architecture and its software,
• so that future software designers (compiler writers, operating system designers, database programmers, application programmers, …) can achieve the best cost-performance trade-offs,
• and so that future architects understand the effects of their design choices on software.

What You Will Learn
• How programs are translated into machine language,
  • and how the hardware executes them.
• What determines program performance,
  • and how it can be improved.
• How hardware designers improve performance.

What You Should Already Know
• Electronic circuits and digital logic.
• Knowledge of structured programming languages
  • Create, compile, and run C (C++, Java) programs

Computer Organization
• This course is all about how computers work.
• But what do we mean by a computer?
  • Different types: embedded, laptop, desktop, server.
  • Different uses: automobiles, graphics, finance, genomics…
  • Different manufacturers: Lenovo, Apple, IBM, HP, Sony…
• Analogy: Consider a course on “automotive vehicles”.
  • Many similarities from vehicle to vehicle (e.g., wheels).
  • Huge differences from vehicle to vehicle (e.g., gas vs. electric).
• Best way to learn:
  • Focus on a specific instance and learn how it works,
  • while learning general principles and historical perspectives.

A Computer

Are there other kinds of computers?

Classes of Computers
• Desktop computers
  • Designed to deliver good performance to a single user at low cost, usually executing third-party software and usually incorporating a graphics display, a keyboard, and a mouse.
• Servers
  • Used to run larger programs for multiple, simultaneous users, typically accessed only via a network, with a greater emphasis on dependability and (often) security.

Classes of Computers
• Supercomputers
  • A high-performance, high-cost class of servers with hundreds to thousands of processors, terabytes of memory, and petabytes of storage, used for high-end scientific and engineering applications.
• Embedded computers (processors)
  • A computer inside another device, used for running one predetermined application.

Supercomputers
• Tianhe-2 (天河-2)
  • Over 3 million cores
  • Power: 17.6 MW (24 MW with cooling)
  • Speed: 33.86 PFLOPS (peta = $10^{15}$)
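As a rough illustration of the performance/power relationship, the speed and power figures for Tianhe-2 can be combined into an energy-efficiency estimate. This is a back-of-the-envelope calculation using only the numbers on this slide (cooling excluded), not a published benchmark result:

$$\frac{33.86 \times 10^{15}\ \text{FLOPS}}{17.6 \times 10^{6}\ \text{W}} \approx 1.9 \times 10^{9}\ \text{FLOPS/W} \approx 1.9\ \text{GFLOPS/W}$$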

Embedded Computers in Your Car

PostPC Era
• Personal Mobile Devices (PMDs) and wearable devices.
• Where else are embedded processors found?

Growth in Cell Phone Sales (Embedded)

The Evolution of Computer Hardware
• When was the first transistor invented?
  • 1947: the bipolar transistor, by Bardeen et al. at Bell Laboratories.
• UNIVAC I (Universal Automatic Computer): the first commercial computer in the USA.

The Evolution of Computer Hardware
• When was the first IC (integrated circuit) invented?
  • 1958, by Jack Kilby at Texas Instruments: built by hand, with several transistors, resistors, and capacitors on a single substrate.
• IBM System/360: 2 MHz, 128 KB to 256 KB.

The Evolution of Computer Hardware
• When was the first microprocessor introduced?
  • 1971: the Intel 4004.

The Chip Manufacturing Process

[Figure: the chip manufacturing process; label: a die]

AMD Opteron X2 Wafer
300 mm wafer, 117 chips, with 90 nm technology

Integrated Circuit Cost

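Ideal case:

$$\text{Die cost} = \frac{\text{Cost of wafer}}{\text{Dies per wafer}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}} - \text{Test dies per wafer}$$

• The second term estimates the number of dies at the edge: ≈ circumference / diagonal of die.
• Test dies per wafer: the number of dies reserved for characteristics testing.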

Integrated Circuit Cost (Die yield)

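$$\text{Die yield} = \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$

• α corresponds to the number of critical processing steps in the manufacturing process.

$$\text{Die cost} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Defects per unit area} = \frac{\alpha \left(\text{Die yield}^{-1/\alpha} - 1\right)}{\text{Die area}}$$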
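The expression for defects per unit area follows from the die yield formula by solving for the defect density. A short derivation, writing $Y$ for die yield, $D$ for defects per unit area, and $A$ for die area (these single-letter symbols are shorthand introduced here, not taken from the slide):

$$Y = \left(1 + \frac{D A}{\alpha}\right)^{-\alpha} \;\Rightarrow\; Y^{-1/\alpha} = 1 + \frac{D A}{\alpha} \;\Rightarrow\; D = \frac{\alpha \left(Y^{-1/\alpha} - 1\right)}{A}$$

This form is handy when the yield is measured and the defect density has to be estimated from it.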

Integrated Circuit Cost (Example)
• What is the approximate cost of a die in the wafer?
  • An 8-inch wafer costs $1000
  • Defect density is 1 per cm²
  • Die area is 91 mm²
  • Assume α = 2; test dies per wafer is 10

$$\text{Die yield} = \left(1 + \frac{1 \times 0.91}{2}\right)^{-2} \approx 0.47$$

$$\text{Dies per wafer} = \frac{\pi \times (8 \times 2.54 / 2)^2}{0.91} - \frac{\pi \times 8 \times 2.54}{\sqrt{2 \times 0.91}} - 10$$

$$\text{Die cost} = \frac{1000}{\text{Dies per wafer} \times 0.47}$$
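Carrying the arithmetic of this example through, with all lengths in centimetres (8 in = 8 × 2.54 = 20.32 cm, and 91 mm² = 0.91 cm²). The intermediate values are rounded, so the result should be read as approximate:

$$\text{Die yield} = (1 + 0.455)^{-2} = \frac{1}{1.455^2} \approx 0.47$$

$$\text{Dies per wafer} \approx 356.4 - 47.3 - 10 \approx 299$$

$$\text{Die cost} \approx \frac{1000}{299 \times 0.47} \approx \$7.1$$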

Real World Examples
• Nonlinear relation to area and defect rate
  • Wafer cost and area are fixed
  • Defect rate determined by manufacturing process
  • Die area determined by architecture and circuit design

Chip          Metal   Line width  Wafer   Defects  Area   Dies/  Yield  Die
              layers  (µm)        cost    /cm²     (mm²)  wafer         cost
386DX         2       0.90        $900    1.0      43     360    71%    $4
486DX2        3       0.80        $1200   1.0      81     181    54%    $12
PowerPC 601   4       0.80        $1700   1.3      121    115    28%    $53
HP PA 7100    3       0.80        $1300   1.0      196    66     27%    $73
DEC Alpha     3       0.70        $1500   1.2      234    53     19%    $149
SuperSPARC    3       0.70        $1700   1.6      256    48     13%    $272
Pentium       3       0.80        $1500   1.5      296    40     9%     $417

From “Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15

Die cost goes up with the die area.
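The die cost column can be cross-checked against the die cost formula (wafer cost divided by dies per wafer times yield). The following is a minimal sketch in C; the function names `die_cost` and `print_die_cost_check` are made up for illustration, and the inputs are the rounded values printed in the table, so small deviations from the listed cost are expected.

```c
#include <stdio.h>

/* Die cost = wafer cost / (dies per wafer * die yield).
 * yield is a fraction in (0, 1], e.g. 0.71 for 71%. */
double die_cost(double wafer_cost, double dies_per_wafer, double yield)
{
    return wafer_cost / (dies_per_wafer * yield);
}

/* Print the recomputed cost next to the value listed in the table. */
void print_die_cost_check(const char *chip, double wafer_cost,
                          double dies_per_wafer, double yield,
                          double table_cost)
{
    printf("%-12s computed $%.2f, table $%.0f\n",
           chip, die_cost(wafer_cost, dies_per_wafer, yield), table_cost);
}
```

For instance, `die_cost(900, 360, 0.71)` comes to roughly 3.5 (table: $4) and `die_cost(1500, 40, 0.09)` to roughly 417 (table: $417), which also shows how quickly cost climbs as the die grows and the yield falls.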

Impacts of Advancing Technology
• Processor
  • logic capacity: increases about 30% per year
  • performance: increases 2x every 1.5 years
• Memory
  • DRAM capacity: increases 4x every 3 years, about 60% per year
  • memory speed: increases 1.5x every 10 years
  • cost per bit: decreases about 25% per year
• Disk
  • capacity: increases about 60% per year
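The per-year and multi-year figures on this slide are two ways of stating the same compound growth. A quick consistency check, using plain arithmetic with rounded results:

$$1.6^{3} = 4.096 \approx 4 \quad \text{(60\% per year compounds to about 4x in 3 years)}$$

$$2^{1/1.5} = 2^{2/3} \approx 1.59 \quad \text{(2x every 1.5 years is roughly 59\% per year)}$$

$$1.5^{1/10} \approx 1.04 \quad \text{(1.5x every 10 years is only about 4\% per year)}$$

The last line is the reason memory speed falls steadily behind processor performance.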

Moore’s Law
• In 1965, Gordon Moore (later a co-founder of Intel) predicted that the number of transistors that can be integrated on a single chip would double at a regular interval, now usually quoted as about every two years.

[Figure: Dual Core Itanium with 1.7B transistors; annotations: feature size and die size. Courtesy of Intel®]
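To get a feel for what doubling every two years means, the rule can be written as a small projection function. This is a sketch under the simplifying assumption of one exact doubling per period; `project_transistors` is a name made up for illustration, and the output is an extrapolation rather than a measured figure.

```c
/* Project a transistor count forward, assuming one doubling every
 * `years_per_doubling` years (2 in the usual statement of Moore's Law).
 * Only whole doubling periods are counted; partial periods are ignored.
 * years_per_doubling must be greater than zero. */
double project_transistors(double start_count, int years, int years_per_doubling)
{
    int doublings = years / years_per_doubling;  /* integer division */
    double count = start_count;
    for (int i = 0; i < doublings; i++) {
        count *= 2.0;
    }
    return count;
}
```

Starting from the 1.7 billion transistors of the Dual Core Itanium, `project_transistors(1.7e9, 10, 2)` applies five doublings, a factor of 32, giving about 54 billion transistors.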

Moore’s Law for CPUs and DRAMs

From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.

Main driver: device scaling …

Highest Clock Rate of Intel Processors

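• In CMOS (Complementary Metal-Oxide-Semiconductor) IC technology:

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

• Clock rate grew about ×1000 while power grew about ×30, as the supply voltage dropped from 5 V to 1 V.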
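The effect of voltage scaling in the CMOS power equation can be tried out numerically. A minimal sketch, working with ratios rather than absolute values; `relative_dynamic_power` is a name made up for illustration, and reading the ×1000 and ×30 annotations as clock-rate and power growth is an interpretation of the slide.

```c
/* Relative dynamic power after scaling, from
 *   Power = Capacitive load x Voltage^2 x Frequency.
 * Each argument is a ratio new/old; the result is new power / old power. */
double relative_dynamic_power(double cap_ratio, double volt_ratio, double freq_ratio)
{
    return cap_ratio * volt_ratio * volt_ratio * freq_ratio;
}
```

Dropping the voltage from 5 V to 1 V alone, `relative_dynamic_power(1.0, 0.2, 1.0)`, cuts power to about 1/25. Combined with a ×1000 clock rate at unchanged capacitive load, `relative_dynamic_power(1.0, 0.2, 1000.0)` gives about ×40, the same order as the ×30 growth noted on the slide.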

A Sea Change is at Hand
• The power challenge has forced a change in the design of microprocessors.
  • Since 2002, the rate of improvement in the response time of programs on desktop computers has slowed from a factor of 1.5 per year to less than a factor of 1.2 per year.
• As of 2006, all desktop and server companies are shipping microprocessors with multiple processors per chip.
• Plan of record is to add two cores per chip per generation (about every two years).
  • Pentium 4, 2 cores, 2002-2005
  • Core 2 Duo, 2-4 cores, 2006-2009
  • Core i7, 4-8 cores, 2010-now
  • Xeon, 1-15 cores, 1998-now

Intel Core i7 Processor

45 nm technology, 18.9 mm × 13.6 mm, 0.73 billion transistors, 2008

What is a Computer?
• Components:
  • processor (datapath, control)
  • input (mouse, keyboard)
  • output (display, printer)
  • memory (cache (SRAM), main memory (DRAM), disk drive, CD/DVD)

Four Issues about Machine Organization
• Capabilities and performance characteristics of the principal Functional Units (FUs).
  • Functional Unit: a hardware component that can perform specific operations (functions), for example adders, registers, ALUs, shifters, and logic units.
• The ways in which these FUs are interconnected.
  • e.g., buses.
• Information flows between components.
  • e.g., data is fetched from memory and transferred to the processor.
• Logic and means by which such information flow is controlled.

Our Primary Focus
• Our primary focus: the processor (datapath and control) and its interaction with memory systems.
  • Implemented using tens/hundreds of millions of transistors.
  • Impossible to understand by looking at each transistor.
  • We need abstraction!

Processor Organization
• Control unit needs to have circuitry to
  • Decide which instruction is next and fetch it from memory.
  • Decode the instruction.
  • Issue signals that control the way information flows between datapath components.
  • Control what operations the datapath’s functional units perform.
• Datapath needs to have circuitry to
  • Execute instructions: functional units (e.g., adder) and storage locations (e.g., register file).
  • Interconnect the functional units so that the instructions can be executed as required.
  • Load data from and store data to memory.

Below the Program

• Application software
  • Written in a high-level language, e.g., C, C++, Java…
• System software
  • Operating system: supervising program that interfaces the user’s program with the hardware (e.g., Linux, iOS, Windows).
    • Handles basic input and output operations.
    • Allocates storage and memory.
    • Provides for protected sharing among multiple applications.
  • Compiler: translates programs written in a high-level language (e.g., C, Java) into instructions that the hardware can execute.

Why Do We Use Higher-Level Languages?
• Higher-level languages
  • Allow the programmer to think in a more natural language suited to the intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, …).
  • Improve programmer productivity: more understandable code that is easier to debug and validate.
  • Improve program maintainability.
  • Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine).
• Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine.
  • As a result, very little programming is done today at the assembler level.

You can become programmers programming programs that program programs, using AI!

Below the Program
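• High-level language program (in C)

void swap(int v[], int k)
{
    int temp;
    temp = v[k];       /* save v[k] */
    v[k] = v[k+1];     /* copy v[k+1] into v[k] */
    v[k+1] = temp;     /* store the saved value in v[k+1] */
}

• Assembly language program (for MIPS)

swap: sll $2, $5, 2       # $2 = k * 4 (byte offset of v[k])
      add $2, $4, $2      # $2 = address of v[k]
      lw  $15, 0($2)      # $15 = v[k]
      lw  $16, 4($2)      # $16 = v[k+1]
      sw  $16, 0($2)      # v[k] = $16
      sw  $15, 4($2)      # v[k+1] = $15
      jr  $31             # return to the caller

• Machine (object) code (for MIPS)

000000 00000 00101 0001000010000000
000000 00100 00010 0001000000100000
. . .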

Input Device Inputs Object Code

000000 00000 00101 0001000010000000
000000 00100 00010 0001000000100000
100011 00010 01111 0000000000000000
100011 00010 10000 0000000000000100
101011 00010 10000 0000000000000000
101011 00010 01111 0000000000000100
000000 11111 00000 0000000000001000

Object Code Stored in Memory

Processor Fetches an Instruction
• Processor fetches an instruction from memory.

Control Decodes the Instruction
• Control decodes the instruction to determine what to execute.
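In hardware, decoding is done by wiring bit fields of the instruction word to the control logic. The same field extraction can be sketched in C for the MIPS I-format used by `lw` and `sw` in the swap object code. The struct and function names here are made up for illustration, and only the I-format is handled.

```c
#include <stdint.h>

/* Fields of a 32-bit MIPS I-format instruction (used by lw and sw):
 *   opcode[31:26]  rs[25:21]  rt[20:16]  immediate[15:0] */
struct itype_fields {
    unsigned opcode;
    unsigned rs;
    unsigned rt;
    unsigned imm;
};

/* Split an instruction word into its I-format fields with shifts and masks. */
struct itype_fields decode_itype(uint32_t word)
{
    struct itype_fields f;
    f.opcode = (word >> 26) & 0x3Fu;  /* top 6 bits select the operation */
    f.rs     = (word >> 21) & 0x1Fu;  /* base register number */
    f.rt     = (word >> 16) & 0x1Fu;  /* register to load into or store from */
    f.imm    = word & 0xFFFFu;        /* 16-bit offset, not sign-extended here */
    return f;
}
```

The third word of the object code, 100011 00010 01111 0000000000000000 (0x8C4F0000), decodes to opcode 35, rs 2, rt 15, and immediate 0, which corresponds to `lw $15, 0($2)`.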

Datapath Executes the Instruction
• Datapath executes the instruction as directed by control.

Processor Fetches the Next Instruction
• Processor fetches the next instruction from memory.

• How does it know which location in memory to fetch from next?
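One common answer is a program counter (PC): a register that holds the address of the next instruction and is advanced by 4 bytes after each fetch when instructions are a fixed 32 bits wide, as in MIPS. A minimal sketch, assuming a toy 7-word memory that holds the swap object code starting at address 0; `memory` and `fetch` are names made up for illustration, and branches and jumps (which overwrite the PC) are not modeled.

```c
#include <stdint.h>

#define NUM_WORDS 7

/* The seven instruction words of the swap routine, one per 4-byte slot. */
static const uint32_t memory[NUM_WORDS] = {
    0x00051080u,  /* sll $2, $5, 2  */
    0x00821020u,  /* add $2, $4, $2 */
    0x8C4F0000u,  /* lw  $15, 0($2) */
    0x8C500004u,  /* lw  $16, 4($2) */
    0xAC500000u,  /* sw  $16, 0($2) */
    0xAC4F0004u,  /* sw  $15, 4($2) */
    0x03E00008u   /* jr  $31        */
};

/* Fetch the word at byte address *pc, then advance *pc to the next word.
 * Returns 1 on success, 0 if *pc is misaligned or outside the toy memory. */
int fetch(uint32_t *pc, uint32_t *instruction)
{
    uint32_t index = *pc / 4u;
    if (*pc % 4u != 0u || index >= NUM_WORDS) {
        return 0;
    }
    *instruction = memory[index];
    *pc += 4u;  /* sequential execution: next instruction is 4 bytes on */
    return 1;
}
```

Calling `fetch` repeatedly from `pc = 0` walks through the routine in order; a real processor would additionally let instructions such as `jr $31` replace the PC with a new address.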

Output Data Stored in Memory
• At program completion the data to be output resides in memory.

Output Device Outputs Data

00000100010100000000000000000000
00000000010011110000000000000100
00000011111000000000000000001000

The Instruction Set Architecture (ISA)
• The instruction set is a critical interface.
• The interface description separates the software from the hardware.

Instruction Set Architecture (ISA)
• ISA: the abstract interface between the hardware and the lowest-level software. It includes all the information necessary to write a machine language program, including instructions, registers, memory access, I/O, …
  • Enables implementations of varying cost and performance to run identical software.
• The combination of the basic instruction set (the ISA) and the operating system interface is called the application binary interface (ABI).
  • ABI: the user portion of the instruction set plus the operating system interfaces used by application programmers. Defines a standard for binary portability across computers.

How Do the Pieces Fit Together?

• Key idea: levels of abstraction

[Figure: levels of abstraction, from software down to hardware.
Software: Application (Netscape); Operating System (Unix; Windows 9x); Compiler; Assembler.
Interface: Instruction Set Architecture.
Hardware: Processor, Memory, I/O system; Datapath & Control; Digital Design; Circuit Design; transistors, IC layout.
Label: CS 101]

Abstractions
• Abstraction helps us deal with the complexity of real systems, as it hides unnecessary lower-level implementation details.
  • Both hardware and software consist of hierarchical layers, with each lower layer hiding details from the level above.
• One key interface between the levels of abstraction is the instruction set architecture: the interface between the hardware and low-level software.
  • This abstract interface enables many implementations of varying cost and performance to run identical software.
• An instruction set architecture allows computer designers to talk about functions independently from the hardware that performs them.
  • Computer designers distinguish architecture from an implementation of an architecture along the same lines: an implementation is hardware that obeys the architecture abstraction.
