CS计算机代考程序代写 compiler Java assembly CPSC 213 Introduction to Computer Systems

CPSC 213 Introduction to Computer Systems
Unit 1b
Static Scalars and Arrays
1
Reading for Next 3 Lectures
‣ Companion • 2.4.1-2.4.3
‣ Textbook
• Array Allocation and Access • 3.8
2

The Big Picture
‣ Build machine model of execution
• for Java and C programs
• by examining language features
• and deciding how they are implemented by the machine
‣ What is required
• design an ISA into which programs can be compiled • implement the ISA in the hardware simulator
‣ Our approach
• examine code snippets that exemplify each language feature in turn
• look at Java and C, pausing to dig deeper when C is different from Java • design and implement ISA as needed
‣ The simulator is an important tool
• machine execution is hard to visualize without it • this visualization is really our WHOLE POINT here
3
Design Plan
4

Examine Java and C Bit by Bit
‣ Reading writing and arithmetic on Variables • static base types (e.g., int, char)
• static and dynamic arrays of base types
• dynamically allocated objects and object references
• object instance variables
• procedure locals and arguments
‣ Control flow
• static intra-procedure control flow (e.g., if, for, while) • static procedure calls
• dynamic control flow and polymorphic dispatch
5
Design Tasks
‣ Design Instructions for SM213 ISA
• design instructions necessary to implement the languages
• keep hardware simple/fast by adding as few/simple instructions possible
‣ Develop Compilation Strategy
• determine how compiler will compile each language feature it sees • which instructions will it use?
• in what order?
• what can compiler compute statically?
‣ Consider Static and Dynamic Phases of Computation • the static phase of computation (compilation) happens just once
• the dynamic phase (running the program) happens many times
• thus anything the compiler computes, saves execution time later
6

The Simple Machine (SM213) ISA
‣ Architecture • Register File • CPU
• Main Memory
8, 32-bit general purpose registers
one cycle per instruction (fetch + execute) byte addressed, Big Endian integers
‣ Instruction Format
• 2 or 6 byte instructions (each character is a hexits) – x-01, xx01, x0vv or x-01 vvvvvvvv
• where
– x is opcode (unique identifier for this instruction) – – means unused
– 0 and 1 are operands
– vv vvvvvvvv are immediate / constant values
7
Machine and Assembly Syntax
‣ Machine code
•[ addr: ] x-01 [ vvvvvvvv ]
– addr: sets starting address for subsequent instructions
– x-01 hex value of instruction with opcode x and operands 0 and 1 – vvvvvvvv hex value of optional extended value part instruction
‣ Assembly code
•( [label:] [instruction | directive] [# comment] | )* -directive :: (.pos number) | (.long number)
-instruction :: opcode operand+
-operand!! -reg!! ! -literal !! – offset ! ! -number!!
::$literal|reg|offset(reg)|(reg,reg,4) ::r0..7
:: number
:: number
::decimal|0xhex
8

Register Transfer Language (RTL)
‣ Goal
• a simple, convenient pseudo language to describe instruction semantics • easy to read and write, directly translated to machine steps
‣ Syntax
• each line is of the form LHS ← RHS
• LHS is memory or register specification
• RHS is constant, memory, or arithmetic expression on two registers
‣ Register and Memory are treated as arrays • m[a] is memory location at address a
• r[i] is register number i
‣ For example •r[0] ← 10
•r[1] ← m[r[0]]
• r[2] ← r[0] + r[1]
9
Static Variables of Built-In Types
10

Static Variables, Built-In Types (S1-global-static)
‣ Java
• static data members are allocated to a class, not an object
• they can store built-in scalar types or references to arrays or objects (references later)
public class Foo {
static int a;
static int[] b; // array is not static, so skip for now
public void foo () {
a = 0; }}
‣C
• global variables and any other variable declared static
• they can be static scalars, arrays or structs or pointers (pointers later)
int a;
int b[10];
void foo () { a =0;
b[a] = a; }
Static Variable Allocation
11
int a;
int b[10];
void foo () { a =0;
b[a] = a; }
int a;
int b[10];
Static Memory Layout
0x1000: value of a
0x2000: value of b[0]
0x2004: value of b[1]

0x2020: value of b[9]
‣ Allocation is
• assigning a memory location to store variable’s value
• assigning the variable an address (its name for reading and writing)
‣ Key observation
• global/static variable’s can exist before program starts and live until after it finishes
‣ Static vs dynamic computation
• compiler allocates variables, giving them a constant address
• no dynamic computation required to allocate the variables, they just exist
12

Static Variable Access (scalars)
int a;
int b[10];
void foo () { a =0;
b[a] = a; }
a = 0; b[a] = a;
Static Memory Layout
0x1000: value of a
0x2000: value of b[0]
0x2004: value of b[1]

0x2020: value of b[9]
‣ Key Observation
• address of a, b[0], b[1], b[2], … are constants known to the compiler
‣ Use RTL to specify instructions needed for a = 0
Generalizing
* What if its a = a + 2? or a = b? or a = foo ()? * What about reading the value of a?
13
Question (scalars)
int a;
int b[10];
void foo () { a =0;
b[a] = a; }
a = 0; b[a] = a;
Static Memory Layout
0x1000: value of a
0x2000: value of b[0]
0x2004: value of b[1]

0x2020: value of b[9]
‣ When is space for a allocated (when is its address determined)? • [A] The program locates available space for a when program starts
• [B] The compiler assigns the address when it compiles the program
• [C] The compiler calls the memory to allocate a when it compiles the program
• [D] The compiler generates code to allocate a before the program starts running • [E] The program locates available space for a when the program starts running • [F] The program locates available space for a just before calling foo()
14

Static Variable Access (static arrays)
int a;
int b[10];
void foo () { a =0;
b[a] = a; }
a = 0; b[a] = a;
Static Memory Layout
0x1000: value of a
0x2000: value of b[0]
0x2004: value of b[1]

0x2020: value of b[9]
‣ Key Observation
• compiler does not know address of b[a]
– unless it can knows the value of a statically, which it could here by looking at a=0, but not in general ‣ Array access is computed from base and index
• address of element is base plus offset; offset is index times element size
• the base address (0x2000) and element size (4) are static, the index is dynamic
‣ Use RTL to specify instructions for b[a] = a, not knowing a?
15
Designing ISA for Static Variables
‣ Requirements for scalars a = 0; • load constant into register
– r[x]←v
• store value in register into memory at constant address
– m[0x1000] ← r[x]
• load value in memory at constant address into a register
– r[x] ← m[0x1000]
‣ Additional requirements for arrays b[a] = a;
• store value in register into memory at address in register*4 plus constant – m[0x2000+r[x]*4] ← r[y]
• load value in memory at address in register*4 plus constant into register – r[y] ← m[0x2000+r[x]*4]
‣ Generalizing and simplifying we get • r[x] ← constant
• m[r[x]] ← r[y] and r[y] ← m[r[x]]
• m[r[x] + r[y]*4] ← r[z] and r[z] ← m[r[x] + r[y]*4]
16

‣ The compiler’s semantic translation
• it uses these instructions to compile the program snippet
int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
r[0] ←0
r[1] ← 0x1000
m[r[1]] ← r[0]
r[2] ← m[r[1]]
r[3] ← 0x2000
m[r[3]+r[2]*4] ← r[2]
‣ ISA Specification for these 5 instructions
Name
Semantics
Assembly
Machine
load immediate
r[d] ← v
ld $v, rd
0d– vvvvvvvv
load base+offset
r[d] ← m[r[s]]
ld ?(rs), rd
1?sd
load indexed
r[d] ← m[r[s]+4*r[i]]
ld (rs,ri,4), rd
2sid
store base+offset
m[r[d]] ← r[s]
st rs, ?(rd)
3s?d
store indexed
m[r[d]+4*r[i]] ← r[s]
st rs, (rd,ri,4)
4sdi
17
‣ The compiler’s assembly translation
int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
r[0] ←0
r[1] ← 0x1000
m[r[1]] ← r[0]
r[2] ← m[r[1]]
r[3] ← 0x2000
m[r[3]+r[2]*4] ← r[2]
ld $0, r0
ld $0x1000, r1
st r0, (r1)
ld (r1), r2
ld $0x2000, r3
st r2, (r3,r2,4)
18

‣ If a human wrote this assembly
• list static allocations, use labels for addresses, add comments
int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
ld $0, r0
ld $a_data, r1
st r0, (r1)
# r0 = 0 #r1=addressofa # a = 0
ld (r1), r2
ld $b_data, r3
st r2, (r3,r2,4) # b[a] = a
# r2 = a #r3=addressofb
.pos 0x1000
a_data:
.long 0
.pos 0x2000
b_data:
.long 0
.long 0
… .long 0
# the variable a
# the variable b[0]
# the variable b[1]
# the variable b[9]
19
Addressing Modes
‣ In these instructions
Name
Semantics
Assembly
Machine
load immediate
r[d] ← v
ld $v, rd
0d– vvvvvvvv
load base+offset
r[d] ← m[r[s]]
ld ?(rs), rd
1?sd
load indexed
r[d] ← m[r[s]+4*r[i]]
ld (rs,ri,4), rd
2sid
store base+offset
m[r[d]] ← r[s]
st rs, ?(rd)
3s?d
store indexed
m[r[d]+4*r[i]] ← r[s]
st rs, (rd,ri,4)
4sdi
‣ We have specified 4 addressing modes for operands
• immediate • register
• base+offset
• indexed
constant value stored in instruction
operand is register number, register stores value
operand in register number
register stores memory address of value
two register-number operands
store base memory address and index of value
20

Basic Arithmetic, Shifting NOP and Halt
‣ Arithmentic
Name
Semantics
Assembly
Machine
register move
r[d] ← r[s]
mov rs, rd
60sd
add
r[d] ← r[d] + r[s]
add rs, rd
61sd
and
r[d] ← r[d] & r[s]
and rs, rd
62sd
inc
r[d] ← r[d] + 1
inc rd
63-d
inc address
r[d] ← r[d] + 4
inca rd
64-d
dec
r[d] ← r[d] – 1
dec rd
65-d
dec address
r[d] ← r[d] – 4
deca rd
66-d
not
r[d] ← ~ r[d]
not rd
67-d
‣ Shifting NOP and Halt
Name
Semantics
Assembly
Machine
shift left
r[d] ← r[d] << S = s shl rd, s 71SS shift right r[d] ← r[d] << S = -s shr rd, s halt halt machine halt f0-- nop do nothing nop ff-- 21 Global Dynamic Array 22 Global Dynamic Array ‣ Java • array variable stores reference to array allocated dynamically with new statement public class Foo { static int a; static int b[] = new int[10]; void foo () { b[a]=a; }} ‣C • array variables can store static arrays or pointers to arrays allocated dynamically with call to malloc library procedure int a; int* b; malloc does not assign a type void foo () { b = (int*) malloc (10*sizeof(int)); b[a] = a; } How C Arrays are Different from Java ‣ Terminology • use the term pointer instead of reference; they mean the same thing ‣ Declaration • the type is a pointer to the type of its elements, indicated with a * ‣ Allocation • malloc allocates a block of bytes; no type; no constructor ‣ Type Safety • any pointer can be type cast to any pointer type ‣ Bounds checking • C performs no array bounds checking • out-of-bounds access manipulates memory that is not part of array • this is the major source of virus vulnerabilities in the world today # of bytes to allocate 23 Question: Can array bounds checking be perform statically? * what does this say about a tradeoff that Java and C take differently? 24 Static vs Dynamic Arrays ‣ Declared and allocated differently, but accessed the same int a; int* b; void foo () { b = (int*) malloc (10*sizeof(int)); b[a] = a; inta; int b[10]; void foo () { b[a] = a; } } ‣ Static allocation • for static arrays, the compiler allocates the array • for dynamic arrays, the compiler allocates a pointer 0x2000: value of b[0] 0x2000: value of b 0x2004: value of b[1] ... 0x2024: value of b[9] ‣ Then when the program runs • the dynamic array is allocated by a call to malloc, say at address 0x3000 • the value of variable b is set to the memory address of this array 0x2000: value of b[0] 0x2004: value of b[1] ... 0x2024: value of b[9] 0x2000: 0x3000 0x3000: value of b[0] 0x3004: value of b[1] ... 0x3024: value of b[9] ‣ Generating code to access the array • for the dynamic array, the compiler generates an additional load for b r[0] ← 0x1000 r[1] ← m[r[0]] r[2] ← 0x2000 m[r[2]+r[1]*4] ← r[1] r[0] ← 0x1000 r[1] ← m[r[0]] load a r[2] ← 0x2000 r[3] ← m[r[1]] load b m[r[3]+r[2]*4] ← r[2] b[a]=a 25 26 ‣ In assembly language Static Array Dynamic Array ld $a_data, r0 ld (r0), r1 ld $b_data, r2 st r1, (r2,r1,4) .pos 0x1000 a_data: .long 0 .pos 0x2000 b_data: .long 0 .long 0 ... .long 0 #r1=addressofa # r2 = a #r2=addressofb # b[a] = a # the variable a # the variable b[0] # the variable b[1] # the variable b[9] ld $a_data, r0 ld (r0), r1 ld $b_data, r2 ld (r2), r3 # r1 = address of a # r2 = a # r2 = address of b # r3 = b ‣ Comparing static and dynamic arrays • what is the benefit of static arrays? • what is the benefit of dynamic arrays? st r1, (r3,r1,4) # b[a] = a .pos 0x1000 a_data: .long 0 .pos 0x2000 b_data: .long 0 # the variable a # the b 27 Implementing the ISA 28 The CPU Implementation ‣ Internal state • pc • instruction - insOpCode - insOp0 - insOp1 - insOp2 - insOpImm - insOpExt ‣ Operation • fetch address of next instruction to fetch the value of the current instruction - read instruction at pc from memory, determine its size and read all of it - separate the components of the instruction into sub-registers - set pc to store address of next instruction, sequentially • execute - use insOpCode to select operation to perform - read internal state, memory, and/or register file - update memory, register file and/or pc Pointers in C 29 30 C and Java Arrays and Pointers ‣ In both languages • an array is a list of items of the same type • array elements are named by non-negative integers start with 0 • syntax for accessing element i of array b is b[i] ‣ In Java • variable a stores a pointer to the array • b[x] = 0 means m[m[b] + x * sizeof(array-element)] ← 0 ‣In C • variable a can store a pointer to the array or the array itself • b[x] = 0 means m[b + x * sizeof(array-element)] ← 0 or m[m[b] + x * sizeof(array-element)] ← 0 • dynamic arrays are just like all other pointers - stored in TYPE* - access with either a[x] or *(a+x) 31 Example ‣ The following two C programs are identical int *a; int *a; a[4] = 5; *(a+4) = 5; ‣ For array access, the compiler would generate this code r[0] ← a r[1] ← 4 r[2] ← 5 m[r[0]+4*r[1]] ← r[2] ld $a, r0 ld $4, r1 ld $5, r2 st r2, (r0,r1,4) • multiplying the index 4 by 4 (size of integer) to compute the array offset ‣ So, what does this tell you about pointer arithmetic in C? Adding X to a pointer of type Y*, adds X * sizeof(Y) to the pointer’s memory-address value. 32 Pointer Arithmetic in C ‣ Its purpose • an alternative way to access dynamic arrays to the a[i] ‣ Adding or subtracting an integer index to a pointer • results in a new pointer of the same type • value of the pointer is offset by index times size of pointer’s referent • for example - adding 3 to an int* yields a pointer value 12 larger than the original ‣ Subtracting two pointers of the same type • results in an integer • gives number of referent-type elements between the two pointers • for example -(& a[7]) - (& a[2])) == 5 == (a+7) - (a+2) ‣ other operators •& X the address of X • * X the value X points to Question (from S3-C-pointer-math.c) int *c; void foo () { // ... c = (int *) malloc (10*sizeof(int)); // ... c = &c[3]; *c = *&c[3]; // ... } ‣ What is the equivalent Java statement to • [A] c[0] = c[3]; • [B] c[3] = c[6]; • [C] there is no typesafe equivalent 33 • [D] not valid, because you can’t take the address of a static in Java 34 Looking more closely c = &c[3]; r[0] #r[0]=&c #r[1]=c # r[2] = 3 * sizeof(int) #r[2]=c+3 #c =c+3 *c = *&c[3]; 0x2000: 0x3000 0x3000: 0 0x3004: 1 0x3008: 2 0x300c: 3 0x3010: 4 0x3014: 5 0x3018: 6 0x301c: 7 0x3020: 8 0x3024: 9 0x2000: 0x300c 0x3000: 0 0x3004: 1 0x3008: 2 0x300c: 6 0x3010: 4 0x3014: 5 0x3018: 6 0x301c: 7 0x3020: 8 0x3024: 9 Before After ‣ And in assembly language r[0] ← 0x2000 r[1] ← m[r[0]] r[2] ←12 r[3] ← r[2]+r[1] m[r[0]]←r[2] r[3] ← 3 r[4] ← m[r[2]+4*r[3]] m[r[2]] ← r[4] ld $0x2000, r0 ld (r0), r1 ld $12, r2 add r1, r2 st r2, (r0) ld $3, r3 ld (r2,r3,4), r4 st r4, (r2) # r[0] = &c # r[1] = c # r[2] = 3 * sizeof(int) # r[2] = c + 3 #c =c+3 # r[3] = 3 # r[4] = c[3] # c[0] = c[3] #r0=&c #r1=c # r2 = 3*sizeof(int) #r2=c+3 #c=c+3 #r3=3 # r4 = c[3] # c[0] = c[3] ← 0x2000 r[1] ← m[r[0]] r[2] ←12 r[3] ← r[2]+r[1] m[r[0]] ← r[2] r[3] ←3 r[4] ← m[r[2]+4*r[3]] m[r[2]] ← r[4] #r[3]=3 # r[4] = c[3] # c[0] = c[3] c[0] = c[3] 35 36 Summary: Static Scalar and Array Variables ‣ Static variables • the compiler knows the address (memory location) of variable ‣ Static scalars and arrays • the compiler knows the address of the scalar value or array ‣ Dynamic arrays • the compiler does not know the address the array ‣ What C does that Java doesn’t • static arrays • arrays can be accessed using pointer dereferencing operator • arithmetic on pointers ‣ What Java does that C doesn’t • typesafe dynamic allocation • automatic array-bounds checking 37