CS计算机代考程序代写 mips assembly Java flex compiler 2020 Winter term 2

2020 Winter term 2
Unit 1b – Static Scalars and Arrays
1
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder

▪ Reading
▪ Companion: 1, 2-1-2.3, 2.4.1-2.4.3
▪ Textbook: 3.1-3.2.1
▪ Reference (textbook, as needed): 3.1-3.5, 3.8, 3.9.3
▪ Learning Objectives
▪ list the basic components of a simple computer and describe their function
▪ describe ALU functionality in terms of its inputs and outputs
▪ describe the exchange of data between ALU, registers, and memory
▪ identify and describe the basic components of a machine instruction
▪ outline the steps a RISC machine performs to do arithmetic on numbers stored in memory ▪ outline the steps a simple CPU performs when it processes an instruction
▪ translate between array-element offsets and indices
▪ distinguish static and dynamic computation for access to global scalars and arrays in C
▪ translate between C and assembly language for access to global scalars and arrays
▪ translate between C and assembly for code that performs simple arithmetic
▪ explain the difference between arrays in C and Java
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 2

▪Develop a model of computation
▪that is rooted in what the machine actually does
▪by examining C, bit-by-bit (comparing to Java as we go)
▪The processor
▪we will design (and you will implement) a simple instruction set ▪based on what we need to compute C programs
▪similar to a real instruction set (MIPS)
▪The language
▪we will act as compiler to translate C into machine language
▪bit by bit, then putting the bits together to do interesting things ▪edit, debug and run using simulated processor to visualize execution
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 3

▪Java:
public class Foo {
static int a;
static int[] b;
// array not static, skip for now public void foo () {
a = 0;
b[a] = a;
} }
▪C:
int a;
int b[10]; void foo () {
a = 0;
b[a] = a; }
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
4

▪CPUs execute instructions, not C or Java code
▪Execution proceeds in three stages
▪Fetch: load the next instruction from memory
▪Decode: figure out from instruction what needs to be done ▪Execute: do what the instruction asks
▪These stages are looped over and over again forever
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 5

▪CPU instructions are very simple ▪Read a value from memory
▪Write a value to memory ▪Add/subtract/AND/OR two numbers ▪Shift a number
▪Control flow (we’ll talk about these in unit 1d)
▪Operations are carried out by an ALU (Arithmetic & Logic Unit)
▪ALU is used in the Execute stage
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 6

▪Human creation: design program and describe it in high- level language
▪Compilation: convert high-level human description into machine-executable text
▪Execution: a physical machine executes the code text
Human
Creation Compilation
Execution
Idea
Source Code
Object Dynamic Code State
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
7

▪Execution
▪Parameterized by input values unknown at compilation ▪Producing output values that are unknowable at compilation
▪Anything the compiler can compute is called static ▪Anything that can only be discovered when executing is
called dynamic Human
Creation
Compilation
Execution
Idea
Source Code
Object Dynamic Code State
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
8

Consider the code below:
int a; a = 5;
The value assigned to variable a above is: A. static
B. dynamic
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 9

Assume that a variable of type int is assigned a value that is read from the user through an input dialog. This value is:
A. static
B. dynamic
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 10

▪Implements a set of instructions
▪Each instruction is implemented using logic gates
▪Built from transistors: fundamental mechanism of computation
▪The fewer and simpler instructions the better (RISC philosophy)
▪Opposing view (CISC): more complex instructions allow for shorter assembly code
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 11

▪Say we propose an instruction that does: 𝐴 ← 𝐵 + 𝐶 ▪A, B and C are stored in memory
▪Instruction parameters: addresses of A, B, C ▪Each address is 32-bits (modern computers: 64-bits)
▪Instruction is encoded as set of bits (stored in memory): ▪Operation name (e.g., add)
▪Addresses for A, B, C
01
00001024
00001028
0000102c
Operation
ABC
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
12

▪Accessing memory is slow
▪~100 cycles for every memory access
▪fast programs avoid accessing memory when possible
▪Big instructions are costly
▪Memory addresses are big (so instructions are big)
▪Big instructions lead to big programs
▪Reading instructions from memory is slow (less caching options) ▪Large instructions use more CPU resources (transfer, storage)
01
00001024
00001028
0000102c
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 13

▪Register file
▪Small, fast memory stored in CPU itself ▪Roughly single cycle access
▪Registers
▪Each register named by a number (e.g., 0-7) ▪Size: architecture’s common integer (32 or 64 bits)
01
00001024
00001028
0000102c
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 14
Regs

▪Memory instructions handle only memory ▪Load data from memory into register (slow) ▪Store data from register into memory (slow)
▪Other instructions access data in registers ▪Small and fast
▪To improve instruction size: share register for destination and one source
01
00001024
00001028
0000102c
01
0
1
2
01
0
1
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 15
Regs

▪A special-purpose register can only be used for certain purposes
▪May not be accessible by all instructions (e.g., can’t be used as argument for an add instruction)
▪May have special meaning or be treated specially by CPU
▪Examples:
▪PC (Program Counter): address of the next instruction to execute
▪IR (Instruction Register): instruction that has just been loaded from memory
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 16

▪ISA is formal interface to a processor implementation ▪defines the instructions the processor implements
▪defines the format of each instruction
▪Types of instructions:
▪math and logic
▪memory access
▪control transfer: “gotos” and conditional “gotos”
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 17

▪Design alternatives:
▪CISC: simplify compiler design (e.g., Intel x86, x86-64) ▪RISC: simplify processor implementation
▪Instruction format
▪sequence of bits: opcode and operand values ▪all represented as numbers
▪Assembly language:
▪Symbolic (textual) representation of machine code
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 18

▪RTL: simple, convenient pseudo language to describe semantics
▪easy to read/write, describes machine steps
▪Syntax:
▪each line is of the form:LHS ← RHS
▪LHS is memory or register that receives a value
▪RHS is constant, memory, register or expression on two registers ▪m[a] is memory in address a
▪r[i] is register with number i
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 19

▪Register 0 receives 10: ▪r[0] ← 10
▪Register 1 receives memory whose address is in register 0: ▪r[1] ← m[r[0]]
▪Register 2 is incremented by value of register 1: ▪r[2] ← r[2] + r[1]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 20

▪Assume the value 7 is stored at address 0x1234.
▪What is the result of the following sequence of instructions? r[0] ← 10
r[1] ← 0x1234
r[2] ← m[r[1]]
r[2] ← r[0] + r[2] m[r[1]] ← r[2]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 21

▪Variables are named storage locations for values
▪Features:
▪ Name
▪ Type/size
▪ Scope
▪Lifetime
▪Memory location (address) ▪Value
▪Which of these are static? Which are dynamic? ▪Which are determined or can change while the program is
running?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 22

▪In Java & C
▪Every variable has a type
▪ Compiler enforces type safety: restricts what you can do with a variable
▪In Java
▪ Strong type safety: references can’t be cast to incompatible types (apparent
type)
▪ Runtime type information: every object carries its actual type in memory
▪In C
▪A type defines the size and interpretation of a bunch of bytes
▪ sizeof operator can determine the size at compile-time
▪ No runtime type information: all that matters is how code accesses bytes
▪In Assembly
▪No types! Everything is a number! Instructions decide how to interpret them.
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 23

int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a

0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
int a;
int b[10];
▪Allocation is
▪assigning a memory location to store a specified value (e.g., variable)
▪associating the variable to the address of that memory location (its name for reading and writing)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 24

int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a

0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
int a;
int b[10];
▪Static vs. dynamic computation
▪Global/static variables can exist before program starts
▪Compiler allocates static variables, giving them a constant address
▪No dynamic computation required to allocate the variables, they just exist
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 25

Assume a is a global variable in C. When is space for a allocated? In other words, when is its address determined?
A. The compiler assigns the address when it compiles the program
B. The compiler calls the memory to allocate a when it compiles the
program
C. The compiler generates code to allocate a before the program starts running
D. The program locates available space for a before it starts running
E. The program locates available space as soon as the variable is used
for the first time.
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 26

int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a

0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
a = 0;
▪Key Observation:
▪ Address of a, b[0], b[1], … are constants known to the compiler
▪Exercise: use RTL to specify instructions needed for a = 0
▪ What if a is assigned a dynamic value? What if it’s assigned another variable?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 27

What is the proper RTL instruction for:
a = 0;
A. r[0x1000] ← 0
B. m[0x1000] ← 0
C. 0x1000 ← r[0]
D. m[r[0x1000]] ← 0 E. 0x1000 ← m[0]
Static Memory Layout
0x1000: value of a

0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder
28

int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a

0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
b[a] = a;
▪Key Observation:
▪ Value of a is usually unknown at compilation time
▪Even though address of b[0], b[1]… are known statically, which one is used here is dynamic
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 29

▪Compiler doesn’t know address of b[a]
▪Unless it knows the value of a statically (which it might here, but not
in general)
▪Array access is computed from base and index
▪Address of element is base plus offset
▪Offset is index times size of each element
▪The base address (0x2000) and element size (4, as it is an integer) are static, but the index value is dynamic (a’s value can change)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 30

int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a

0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
b[a] = a;
▪Exercise: use RTL to specify the instructions for b[a] = a ▪Assume that the compiler does not know the value of a
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 31

What is the proper RTL instruction for:
b[a] = a;
A. m[0x2000+m[0x1000]] ← m[0x1000] B. m[0x2000+4*m[0x1000]] ← m[0x1000] C. m[0x2000+m[0x1000]] ← 4*m[0x1000] D. m[0x2000]+4*m[0x1000] ← m[0x1000] E. m[0x2000]+m[0x1000] ← m[0x1000]
Static Memory Layout
0x1000: value of a

0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 32

What instructions can we use for a = 0? ▪Option 1: static address and value
m[0x1000] ← 0x0
▪9 bytes (1-byte instruction code plus two 4-byte integers)
▪Option 2: static address, dynamic value r[0] ← 0x0
m[0x1000] ← r[0]
▪5 bytes for immediate value instruction
▪5 bytes for memory instruction (instruction code plus one integer)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 33

▪Option 3: dynamic address and value r[0] ← 0x0
r[1] ← 0x1000
m[r[1]] ← r[0]
▪2 bytes for memory instruction
▪More flexibility (address and value can be dynamic)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 34

What instructions can we use for b[a] = a? ▪Option 1: static base address, index and value
m[0x2000+m[0x1000]*4] ← m[0x1000]
▪ Inputs:
▪Base address (0x2000) ▪Index address (0x1000) ▪Input value address (0x1000)
▪13 bytes (instruction code plus three integers)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 35

▪Option 2: calculate address explicitly r[0] ← 0x1000
r[1] ← m[r[0]]
r[3] ← r[1]*4 (or r[3] ← r[1]<<2 ) r[2] ← 0x2000 r[2] ← r[2]+r[3] m[r[2]] ← r[1] ▪Uses same instruction as before ▪Issue: array accesses are common CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 36 ▪Option 3: dynamic base address, index and value r[0] ← 0x1000 r[1] ← m[r[0]] r[2] ← 0x2000 m[r[2]+r[1]*4] ← r[1] ▪2 bytes for memory instruction (opcode plus three registers) CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 37 01 00001024 00001028 0000102c 01 0 1 13 bytes ▪Minimize the number of memory instructions in ISA ▪At most 1 memory access per instruction ▪No other operation in the same instruction ▪Minimize the total number of instructions in ISA ▪Minimize the size of each instruction 2 bytes CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 38 Regs ▪Requirements for scalars (e.g., assign value to global variable): ▪Load constant into a register r[0] ← v (where v is a constant integer) ▪Store a value in a register into memory at a constant address m[r[y]] ← r[x] (assume r[y] stores address of value) ▪Load value in memory into a register r[x] ← m[r[y]] (assume r[y] stores address of value) CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 39 ▪Additional requirements for arrays ▪Store value in a register into memory at address in register * 4 plus base address ▪m[r[z] + r[x]*4] ← r[y] ▪Load value in memory at address in register * 4 plus base address into register ▪r[y] ← m[r[z] + r[x]*4] CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 40 ▪Generalizing and simplifying, we get: ▪r[x] ← constant ▪m[r[x]] ← r[y] ▪r[y] ← m[r[x]] ▪m[r[x] + r[y]*4] ← r[z] ▪r[z] ← m[r[x] + r[y]*4] CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 41 ▪The compiler’s semantic translation ▪It uses these instructions to compile the program snippets Name Semantics Assembly Machine load immediate r[d] ← v ld $v, rd 0d-- vvvvvvvv load base+offset r[d] ← m[r[s]] ld (rs), rd 1?sd load indexed r[d] ← m[r[s]+4*r[i]] ld (rs,ri,4), rd 2sid store base+offset m[r[d]] ← r[s] st rs, (rd) 3s?d store indexed m[r[d]+4*r[i]] ← r[s] st rs, (rd,ri,4) 4sdi CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 42 r[0] ←0 r[1] ← 0x1000 m[r[1]] ← r[0] r[2] ← m[r[1]] r[3] ← 0x2000 m[r[3]+r[2]*4] ← r[2] int a; int b[10]; void foo () { a = 0; b[a] = a; } Name Semantics Assembly Machine load immediate r[d] ← v ld $v, rd 0d-- vvvvvvvv load base+offset r[d] ← m[r[s]] ld (rs), rd 10sd load indexed r[d] ← m[r[s]+4*r[i]] ld (rs,ri,4), rd 2sid store base+offset m[r[d]] ← r[s] st rs, (rd) 3s0d store indexed m[r[d]+4*r[i]] ← r[s] st rs, (rd,ri,4) 4sdi CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 43 ld $0, r0 ld $0x1000, r1 st r0, (r1) ld (r1), r2 ld $0x2000, r3 st r2, (r3, r2, 4) r[0] ←0 r[1] ← 0x1000 m[r[1]] ← r[0] r[2] ← m[r[1]] r[3] ← 0x2000 m[r[3]+r[2]*4] ← r[2] int a; int b[10]; void foo () { a = 0; b[a] = a; } Name Semantics Assembly Machine load immediate r[d] ← v ld $v, rd 0d-- vvvvvvvv load base+offset r[d] ← m[r[s]] ld (rs), rd 10sd load indexed r[d] ← m[r[s]+4*r[i]] ld (rs,ri,4), rd 2sid store base+offset m[r[d]] ← r[s] st rs, (rd) 3s0d store indexed m[r[d]+4*r[i]] ← r[s] st rs, (rd,ri,4) 4sdi CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 44 int a; int b[10]; void foo () { a = 0; b[a] = a; } ld $0, r0 ld $0x1000, st r0, (r1) ld (r1), r2 ld $0x2000, st r2, (r3, r1 r3 r2, 4) # r0 = 0 # r1 = address of a #a = 0 # r2 = a # r3 = address of b # b[a] = a # the variable a # the variable b[0] # the variable b[1] # the variable b[9] .pos 0x1000 a: .long 0 .pos 0x2000 b_data .long 0 .long 0 ... .long 0 00-- 00000000 01-- 00001000 3001 1102 03-- 00002000 4232 CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 45 ▪Architecture ▪Register file: 8 general purpose registers, each 32-bits long ▪CPU: One cycle per instruction (fetch and execute) ▪Main memory: byte-addressed, big endian ▪Instruction format ▪2-byte or 6-byte instructions CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 46 ▪Binary format represented as (each character is a hex digit): x-01, xxsd, x0vv, x-sd vvvvvvvv ▪Where: ▪x is an opcode (instruction unique identification) ▪- means unused portion (ignored by instruction) ▪s and d are register numbers ▪vv and vvvvvvvv are immediate/constant values CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 47 ▪Set of lines containing: [label:] [instruction|directive] [# comment] ▪where: directive :: (.pos number) | (.long number) instruction :: opcode operand+ operand reg literal offset number :: $literal|reg|offset(reg)|(reg,reg,4) :: r 0..7 :: number :: number :: decimal | 0x hex CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 48 ▪Operands have 4 possible addressing modes: ▪Immediate: constant value obtained from instruction itself ▪Register: operand is register number; register stores value ▪Base+Offset: operands are register number and optional offset; register stores memory address (via offset) of value; offset o=p×4 ▪Indexed: operands are two register numbers; registers store based address and index of value Name Semantics Assembly Machine load immediate r[d] ← v ld $v, rd 0d-- vvvvvvvv load base+offset r[d] ← m[r[s]+4*p] ld o(rs), rd 1psd load indexed r[d] ← m[r[s]+4*r[i]] ld (rs,ri,4), rd 2sid store base+offset m[r[d]+4*p] ← r[s] st rs, o(rd) 3spd store indexed m[r[d]+4*r[i]] ← r[s] st rs, (rd,ri,4) 4sdi CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 49 Name Semantics Assembly Machine register move r[d] ← r[s] mov rs, rd 60sd add r[d] ← r[d] + r[s] add rs, rd 61sd and r[d] ← r[d] & r[s] and rs, rd 62sd inc r[d] ← r[d] + 1 inc rd 63-d inc address r[d] ← r[d] + 4 inca rd 64-d dec r[d] ← r[d] – 1 dec rd 65-d dec address r[d] ← r[d] – 4 deca rd 66-d not r[d] ← ~ r[d] not rd 67-d shift left r[d] ← r[d] << s shl $s, rd 7dss shift right r[d] ← r[d] >> -s
shr $s, rd
7dss
halt
Halt the machine
halt
F0–
nop
Do nothing
Nop
FF–
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 50

What instruction is represented by the bytes: 0x46 0x24?
A. ld (r4,r4,4), r6 B. st r4, (r6,r2,4) C. st r6, (r2,r4,4) D. st r6, 2(r4)
E. st r6, 8(r4)
Name
Semantics
Assembly
Machine
load immediate
r[d] ← v
ld $v, rd
0d– vvvvvvvv
load base+offset
r[d] ← m[r[s]+4*p]
ld o(rs), rd
1psd
load indexed
r[d] ← m[r[s]+4*r[i]]
ld (rs,ri,4), rd
2sid
store base+offset
m[r[d]+4*p] ← r[s]
st rs, o(rd)
3spd
store indexed
m[r[d]+4*r[i]] ← r[s]
st rs, (rd,ri,4)
4sdi
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder
51

What instruction is represented by the bytes: 0x75 0xF8?
A. shl $5, r8
B. shl $8, r5
C. shr $5, r8
D. shr $8, r5
E. shl $0xf8, r5
Name
Semantics
Assembly
Machine
shift left
r[d] ← r[d] << s shl $s, rd 7dss shift right r[d] ← r[d] >> -s
shr $s, rd
7dss
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder
52

▪PC (program counter): address of next instruction to fetch ▪Instruction: value of the current instruction
▪separated into components:
Ins Op Code Ins Op 0
Ins Op 1
Ins Op 2
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
53
Ins Op Imm
Ins Op Ext

▪Fetch stage:
▪Read instruction at PC ▪Determine size ▪Separate components ▪Update PC
▪Execute stage:
▪Read internal state and/or memory ▪Perform specified computation ▪Update internal state and/or memory
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 54

▪Internal registers:
▪instruction, insOpCode, insOp0, insOp1, insOp2, insOpImm,
insOpExt, pc
▪Read with get(), change with set(value)
▪General purpose registers: ▪Read with reg.get(regNumber) ▪Change reg.set(regNumber, value)
▪Main memory:
▪Read with mem.readInteger(address)
▪Change with mem.writeInteger(address, value)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 55

▪ Java:
public class Foo {
static int a;
static int[] b = new int[10]; public void foo () {
b[a] = a; }
}
▪C:
int a;
int* b;
void foo () {
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder }
56
b = malloc(10*sizeof(int)); b[a] = a;

▪Arrays in Java
▪ Store reference to array allocated dynamically with new statement int[] b = new int[10];
▪Arrays in C
▪Can store static arrays
int bst[10];
▪Can store pointers to other arrays
int *bptr = &bst[0]; // or int *bptr = bst;
▪ Can store pointers to arrays allocated dynamically with call to malloc library function
int *bdyn = malloc(10 * sizeof(int));
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 57

▪Java references are equivalent to C pointers ▪We will use term pointers to refer to either
▪Variable type: pointer to element type
▪Example: pointer to int (represented as int *)
▪Stored internally as an address, regardless of element type
▪Any pointer can be cast to any pointer type
▪C performs no array bounds checking
▪Out-of-bounds access manipulates memory that is not part of array ▪Performance is faster, but source of vulnerability
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 58

▪Dynamic allocation with call to malloc
▪Returns block of bytes with no associated type or initialization
▪Element type and dereferencing determines how the data is interpreted
▪Dynamic allocation will be discussed in more details in future units
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 59

▪Arrays are accessed in C the same way, but declared and allocated differently
▪Static allocation: int b[10];
▪Dynamic allocation:
int *b = malloc(10 * sizeof(int));
0x2000: value of b
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
60

▪When the program runs: ▪Static allocation:
int b[10];
▪Dynamic allocation:
int *b = malloc(10 * sizeof(int));
0x2000: b = 0x3000
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
0x3000: value of b[0] 0x3004: value of b[1] …
0x3024: value of b[9]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
61

▪Assembly/machine code requires an additional load b[a] = a;
▪Static allocation: r[0]
← 0x1000 ← m[r[0]] ← 0x2000
▪Dynamic allocation:
r[0] ← r[1] ←
r[2] ← r[3] ← m[r[3]+r[1]*4] ← r[1]
r[1]
r[2]
m[r[2]+r[1]*4] ← r[1]
0x1000 m[r[0]] 0x2000 m[r[2]]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
62

▪Static allocation:
ld $a, r0 # r0 = address of a ld (r0), r1 # r1 = a
ld $b, r2 # r2 = address of b st r1, (r2, r1, 4) # b[a] = a
.pos 0x1000
a: .long 0 # the variable a .pos 0x2000
b: .long 0 # the variable b[0]
.long 0 # the variable b[1] …
.long 0 # the variable b[9]
▪Dynamic allocation:
ld $a, r0 ld (r0), r1 ld $b, r2
# r0 = address of a
# r1 = a
# r2 = address of b
For simulator (since it has no malloc):
ld (r2), r3 # r3 = b = &b[0]
s…t r1, (r3, r1, 4) # b[a] = a .pos 0x2000
.bp:os.l0oxn1g00b0_data # malloc result a.:po.slo0nxg3000# the variable a
.bp_odsat0ax:20.0l0ong 0 # b[0]
b: .long ??? # the variable b .long 0 # b[1]

.long 0 # b[9]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
63

How many memory load operations are required to handle this instruction?
b[a[0]] = a[b[0]];
A. 2
B. 3
C. 4
D. 5
E. I don’t know
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 64

▪In Java and C:
▪An array is a list of items of the same type
▪Array elements are named by non-negative integers starting at 0 ▪Syntax for accessing element i of array b is b[i]
▪In Java:
▪Variable stores a pointer to the array
▪In C
▪Variable can store a pointer to the array or the array itself
▪Distinguished by variable type
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 65

▪Pointers are declared as: type *varname;
▪Pointers may be accessed as: varname[0] or *varname varname[i] or *(varname+i)
▪The address of a variable can be obtained with: type a;
type *ptr = &a;
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 66

▪Adding an integer to a pointer:
▪integer is multiplied by type size before adding
▪Example: if a is a pointer of int, then (a+i) is equivalent to 4i bytes ahead of a
▪Works with subtraction as well
▪Subtracting two pointers of the same type
▪returns the number of elements of that type between the addresses ▪equivalent to dividing address difference (in bytes) by type size ▪Example:&a[7] – &a[2] == 5
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 67

int *c;
void foo () {
c = malloc(10*sizeof(int)); c = &c[3];
*c = *&c[3];
}
Which element of the original 10-element array is modified with the instructions on the left?
A. Element with index 3
B. Element with index 6
C. Elementwithindex0
D. No element is modified
E. More than one element is modified
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder
68

ld $0x2000, r0 ld (r0), r1
ld $12, r2
add r1, r2
st r2, (r0)
ld $3, r3
ld (r2,r3,4), r4 # r4 = c[3] st r4, (r2) # *c = c[3]
.pos 0x2000
c: .long 0 # or some data used in simulator to emulate malloc
# r0 = address of c
# r1 = c (malloc result) # r2 = 3*sizeof(int) #r2=c+3
#c=c+3
#r3=3
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 69

▪It’s possible to declare variables as pointers to any types, even pointers!
int **q;
▪ q is a pointer to an int pointer (int **), or a “double-pointer” to int
▪When declaring multiple variables in a line, pointers apply individually:
int *p, i, **q;
▪p is a pointer to an int (int *)
▪i is an int
▪ q is a pointer to an int pointer (int **)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 70

▪In C, Strings are implemented as arrays of characters ▪Each element is a one-byte integer (type char), representing the
character as an entry in the ASCII table
▪E.g., ‘A’ is represented as 65, ‘a’ as 97, ‘0’ as 48, ‘ ’ (space) as 32 ▪The value 0 (‘\0’) indicates the end of the string
▪So the string “CPSC 213” is represented as:
▪Special characters:
▪‘\n’: line break
▪‘\r’: carriage return (back to start of line, but don’t move to next line)
‘C’
‘P’
‘S’
‘C’
‘’
‘2’
‘1’
‘3’
‘\0’
67
80
83
67
32
50
49
51
0
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 71

▪The main function in C typically receives two arguments:
int main(int argc, char *argv[]) { …
▪argc: the number of arguments
▪argv: an array of strings, each representing an argument
▪argv[0] representes the program name itself, so argc is always >=1
▪So if you call your program:
./myprogram -x 7
▪Your program will receive:
▪argc = 3
▪argv[0] = “./myprogram”, argv[1] = “-x”, argv[2] = “7”
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 72

▪To output text to the console, use printf
▪First argument: format
▪Remaining arguments replace placeholders in format, in same order ▪%s: string
▪%d or %i: signed integer in base 10
▪%u: unsigned integer in base 10
▪%x: hexadecimal integer
▪E.g.,printf(“My grade in %s was %d.\n”, “CPSC 213”, g);
▪To read text from the console, use scanf
▪Similar arguments as printf
▪Arguments must be pointers to destination variables
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 73

▪Static scalars and arrays:
▪Compiler knows the address (memory location) of variable
▪Dynamic arrays:
▪Compiler does not know address of array directly
▪What C does that Java doesn’t
▪Static arrays
▪Array access using pointer dereferencing ▪Pointer arithmetic
▪What Java does that C doesn’t ▪Type-safe dynamic allocation ▪Automatic array-bounds checking
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 74