2020 Winter term 2
Unit 1b – Static Scalars and Arrays
1
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
▪ Reading
▪ Companion: 1, 2-1-2.3, 2.4.1-2.4.3
▪ Textbook: 3.1-3.2.1
▪ Reference (textbook, as needed): 3.1-3.5, 3.8, 3.9.3
▪ Learning Objectives
▪ list the basic components of a simple computer and describe their function
▪ describe ALU functionality in terms of its inputs and outputs
▪ describe the exchange of data between ALU, registers, and memory
▪ identify and describe the basic components of a machine instruction
▪ outline the steps a RISC machine performs to do arithmetic on numbers stored in memory ▪ outline the steps a simple CPU performs when it processes an instruction
▪ translate between array-element offsets and indices
▪ distinguish static and dynamic computation for access to global scalars and arrays in C
▪ translate between C and assembly language for access to global scalars and arrays
▪ translate between C and assembly for code that performs simple arithmetic
▪ explain the difference between arrays in C and Java
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 2
▪Develop a model of computation
▪that is rooted in what the machine actually does
▪by examining C, bit-by-bit (comparing to Java as we go)
▪The processor
▪we will design (and you will implement) a simple instruction set ▪based on what we need to compute C programs
▪similar to a real instruction set (MIPS)
▪The language
▪we will act as compiler to translate C into machine language
▪bit by bit, then putting the bits together to do interesting things ▪edit, debug and run using simulated processor to visualize execution
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 3
▪Java:
public class Foo {
static int a;
static int[] b;
// array not static, skip for now public void foo () {
a = 0;
b[a] = a;
} }
▪C:
int a;
int b[10]; void foo () {
a = 0;
b[a] = a; }
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
4
▪CPUs execute instructions, not C or Java code
▪Execution proceeds in three stages
▪Fetch: load the next instruction from memory
▪Decode: figure out from instruction what needs to be done ▪Execute: do what the instruction asks
▪These stages are looped over and over again forever
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 5
▪CPU instructions are very simple ▪Read a value from memory
▪Write a value to memory ▪Add/subtract/AND/OR two numbers ▪Shift a number
▪Control flow (we’ll talk about these in unit 1d)
▪Operations are carried out by an ALU (Arithmetic & Logic Unit)
▪ALU is used in the Execute stage
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 6
▪Human creation: design program and describe it in high- level language
▪Compilation: convert high-level human description into machine-executable text
▪Execution: a physical machine executes the code text
Human
Creation Compilation
Execution
Idea
Source Code
Object Dynamic Code State
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
7
▪Execution
▪Parameterized by input values unknown at compilation ▪Producing output values that are unknowable at compilation
▪Anything the compiler can compute is called static ▪Anything that can only be discovered when executing is
called dynamic Human
Creation
Compilation
Execution
Idea
Source Code
Object Dynamic Code State
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
8
Consider the code below:
int a; a = 5;
The value assigned to variable a above is: A. static
B. dynamic
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 9
Assume that a variable of type int is assigned a value that is read from the user through an input dialog. This value is:
A. static
B. dynamic
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 10
▪Implements a set of instructions
▪Each instruction is implemented using logic gates
▪Built from transistors: fundamental mechanism of computation
▪The fewer and simpler instructions the better (RISC philosophy)
▪Opposing view (CISC): more complex instructions allow for shorter assembly code
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 11
▪Say we propose an instruction that does: 𝐴 ← 𝐵 + 𝐶 ▪A, B and C are stored in memory
▪Instruction parameters: addresses of A, B, C ▪Each address is 32-bits (modern computers: 64-bits)
▪Instruction is encoded as set of bits (stored in memory): ▪Operation name (e.g., add)
▪Addresses for A, B, C
01
00001024
00001028
0000102c
Operation
ABC
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
12
▪Accessing memory is slow
▪~100 cycles for every memory access
▪fast programs avoid accessing memory when possible
▪Big instructions are costly
▪Memory addresses are big (so instructions are big)
▪Big instructions lead to big programs
▪Reading instructions from memory is slow (less caching options) ▪Large instructions use more CPU resources (transfer, storage)
01
00001024
00001028
0000102c
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 13
▪Register file
▪Small, fast memory stored in CPU itself ▪Roughly single cycle access
▪Registers
▪Each register named by a number (e.g., 0-7) ▪Size: architecture’s common integer (32 or 64 bits)
01
00001024
00001028
0000102c
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 14
Regs
▪Memory instructions handle only memory ▪Load data from memory into register (slow) ▪Store data from register into memory (slow)
▪Other instructions access data in registers ▪Small and fast
▪To improve instruction size: share register for destination and one source
01
00001024
00001028
0000102c
01
0
1
2
01
0
1
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 15
Regs
▪A special-purpose register can only be used for certain purposes
▪May not be accessible by all instructions (e.g., can’t be used as argument for an add instruction)
▪May have special meaning or be treated specially by CPU
▪Examples:
▪PC (Program Counter): address of the next instruction to execute
▪IR (Instruction Register): instruction that has just been loaded from memory
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 16
▪ISA is formal interface to a processor implementation ▪defines the instructions the processor implements
▪defines the format of each instruction
▪Types of instructions:
▪math and logic
▪memory access
▪control transfer: “gotos” and conditional “gotos”
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 17
▪Design alternatives:
▪CISC: simplify compiler design (e.g., Intel x86, x86-64) ▪RISC: simplify processor implementation
▪Instruction format
▪sequence of bits: opcode and operand values ▪all represented as numbers
▪Assembly language:
▪Symbolic (textual) representation of machine code
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 18
▪RTL: simple, convenient pseudo language to describe semantics
▪easy to read/write, describes machine steps
▪Syntax:
▪each line is of the form:LHS ← RHS
▪LHS is memory or register that receives a value
▪RHS is constant, memory, register or expression on two registers ▪m[a] is memory in address a
▪r[i] is register with number i
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 19
▪Register 0 receives 10: ▪r[0] ← 10
▪Register 1 receives memory whose address is in register 0: ▪r[1] ← m[r[0]]
▪Register 2 is incremented by value of register 1: ▪r[2] ← r[2] + r[1]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 20
▪Assume the value 7 is stored at address 0x1234.
▪What is the result of the following sequence of instructions? r[0] ← 10
r[1] ← 0x1234
r[2] ← m[r[1]]
r[2] ← r[0] + r[2] m[r[1]] ← r[2]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 21
▪Variables are named storage locations for values
▪Features:
▪ Name
▪ Type/size
▪ Scope
▪Lifetime
▪Memory location (address) ▪Value
▪Which of these are static? Which are dynamic? ▪Which are determined or can change while the program is
running?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 22
▪In Java & C
▪Every variable has a type
▪ Compiler enforces type safety: restricts what you can do with a variable
▪In Java
▪ Strong type safety: references can’t be cast to incompatible types (apparent
type)
▪ Runtime type information: every object carries its actual type in memory
▪In C
▪A type defines the size and interpretation of a bunch of bytes
▪ sizeof operator can determine the size at compile-time
▪ No runtime type information: all that matters is how code accesses bytes
▪In Assembly
▪No types! Everything is a number! Instructions decide how to interpret them.
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 23
int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a
…
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
int a;
int b[10];
▪Allocation is
▪assigning a memory location to store a specified value (e.g., variable)
▪associating the variable to the address of that memory location (its name for reading and writing)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 24
int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a
…
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
int a;
int b[10];
▪Static vs. dynamic computation
▪Global/static variables can exist before program starts
▪Compiler allocates static variables, giving them a constant address
▪No dynamic computation required to allocate the variables, they just exist
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 25
Assume a is a global variable in C. When is space for a allocated? In other words, when is its address determined?
A. The compiler assigns the address when it compiles the program
B. The compiler calls the memory to allocate a when it compiles the
program
C. The compiler generates code to allocate a before the program starts running
D. The program locates available space for a before it starts running
E. The program locates available space as soon as the variable is used
for the first time.
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 26
int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a
…
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
a = 0;
▪Key Observation:
▪ Address of a, b[0], b[1], … are constants known to the compiler
▪Exercise: use RTL to specify instructions needed for a = 0
▪ What if a is assigned a dynamic value? What if it’s assigned another variable?
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 27
What is the proper RTL instruction for:
a = 0;
A. r[0x1000] ← 0
B. m[0x1000] ← 0
C. 0x1000 ← r[0]
D. m[r[0x1000]] ← 0 E. 0x1000 ← m[0]
Static Memory Layout
0x1000: value of a
…
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder
28
int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a
…
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
b[a] = a;
▪Key Observation:
▪ Value of a is usually unknown at compilation time
▪Even though address of b[0], b[1]… are known statically, which one is used here is dynamic
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 29
▪Compiler doesn’t know address of b[a]
▪Unless it knows the value of a statically (which it might here, but not
in general)
▪Array access is computed from base and index
▪Address of element is base plus offset
▪Offset is index times size of each element
▪The base address (0x2000) and element size (4, as it is an integer) are static, but the index value is dynamic (a’s value can change)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 30
int a;
int b[10];
void foo () {
a = 0;
b[a] = a; }
Static Memory Layout
0x1000: value of a
…
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
b[a] = a;
▪Exercise: use RTL to specify the instructions for b[a] = a ▪Assume that the compiler does not know the value of a
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 31
What is the proper RTL instruction for:
b[a] = a;
A. m[0x2000+m[0x1000]] ← m[0x1000] B. m[0x2000+4*m[0x1000]] ← m[0x1000] C. m[0x2000+m[0x1000]] ← 4*m[0x1000] D. m[0x2000]+4*m[0x1000] ← m[0x1000] E. m[0x2000]+m[0x1000] ← m[0x1000]
Static Memory Layout
0x1000: value of a
…
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 32
What instructions can we use for a = 0? ▪Option 1: static address and value
m[0x1000] ← 0x0
▪9 bytes (1-byte instruction code plus two 4-byte integers)
▪Option 2: static address, dynamic value r[0] ← 0x0
m[0x1000] ← r[0]
▪5 bytes for immediate value instruction
▪5 bytes for memory instruction (instruction code plus one integer)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 33
▪Option 3: dynamic address and value r[0] ← 0x0
r[1] ← 0x1000
m[r[1]] ← r[0]
▪2 bytes for memory instruction
▪More flexibility (address and value can be dynamic)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 34
What instructions can we use for b[a] = a? ▪Option 1: static base address, index and value
m[0x2000+m[0x1000]*4] ← m[0x1000]
▪ Inputs:
▪Base address (0x2000) ▪Index address (0x1000) ▪Input value address (0x1000)
▪13 bytes (instruction code plus three integers)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 35
▪Option 2: calculate address explicitly r[0] ← 0x1000
r[1] ← m[r[0]]
r[3] ← r[1]*4 (or r[3] ← r[1]<<2 ) r[2] ← 0x2000
r[2] ← r[2]+r[3]
m[r[2]] ← r[1]
▪Uses same instruction as before ▪Issue: array accesses are common
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 36
▪Option 3: dynamic base address, index and value r[0] ← 0x1000
r[1] ← m[r[0]]
r[2] ← 0x2000 m[r[2]+r[1]*4] ← r[1]
▪2 bytes for memory instruction (opcode plus three registers)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 37
01
00001024
00001028
0000102c
01
0
1
13 bytes
▪Minimize the number of memory instructions in ISA ▪At most 1 memory access per instruction
▪No other operation in the same instruction
▪Minimize the total number of instructions in ISA ▪Minimize the size of each instruction
2 bytes
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 38
Regs
▪Requirements for scalars (e.g., assign value to global variable):
▪Load constant into a register
r[0] ← v (where v is a constant integer)
▪Store a value in a register into memory at a constant address m[r[y]] ← r[x] (assume r[y] stores address of value)
▪Load value in memory into a register
r[x] ← m[r[y]] (assume r[y] stores address of value)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 39
▪Additional requirements for arrays
▪Store value in a register into memory at address in register * 4 plus
base address
▪m[r[z] + r[x]*4] ← r[y]
▪Load value in memory at address in register * 4 plus base address
into register
▪r[y] ← m[r[z] + r[x]*4]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 40
▪Generalizing and simplifying, we get:
▪r[x] ← constant
▪m[r[x]] ← r[y] ▪r[y] ← m[r[x]]
▪m[r[x] + r[y]*4] ← r[z] ▪r[z] ← m[r[x] + r[y]*4]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 41
▪The compiler’s semantic translation
▪It uses these instructions to compile the program snippets
Name
Semantics
Assembly
Machine
load immediate
r[d] ← v
ld $v, rd
0d-- vvvvvvvv
load base+offset
r[d] ← m[r[s]]
ld (rs), rd
1?sd
load indexed
r[d] ← m[r[s]+4*r[i]]
ld (rs,ri,4), rd
2sid
store base+offset
m[r[d]] ← r[s]
st rs, (rd)
3s?d
store indexed
m[r[d]+4*r[i]] ← r[s]
st rs, (rd,ri,4)
4sdi
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 42
r[0] ←0
r[1] ← 0x1000
m[r[1]] ← r[0]
r[2] ← m[r[1]]
r[3] ← 0x2000
m[r[3]+r[2]*4] ← r[2]
int a;
int b[10];
void foo () { a = 0;
b[a] = a; }
Name
Semantics
Assembly
Machine
load immediate
r[d] ← v
ld $v, rd
0d-- vvvvvvvv
load base+offset
r[d] ← m[r[s]]
ld (rs), rd
10sd
load indexed
r[d] ← m[r[s]+4*r[i]]
ld (rs,ri,4), rd
2sid
store base+offset
m[r[d]] ← r[s]
st rs, (rd)
3s0d
store indexed
m[r[d]+4*r[i]] ← r[s]
st rs, (rd,ri,4)
4sdi
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 43
ld $0, r0
ld $0x1000, r1 st r0, (r1)
ld (r1), r2
ld $0x2000, r3
st r2, (r3, r2, 4)
r[0] ←0
r[1] ← 0x1000
m[r[1]] ← r[0]
r[2] ← m[r[1]]
r[3] ← 0x2000
m[r[3]+r[2]*4] ← r[2]
int a;
int b[10];
void foo () { a = 0;
b[a] = a; }
Name
Semantics
Assembly
Machine
load immediate
r[d] ← v
ld $v, rd
0d-- vvvvvvvv
load base+offset
r[d] ← m[r[s]]
ld (rs), rd
10sd
load indexed
r[d] ← m[r[s]+4*r[i]]
ld (rs,ri,4), rd
2sid
store base+offset
m[r[d]] ← r[s]
st rs, (rd)
3s0d
store indexed
m[r[d]+4*r[i]] ← r[s]
st rs, (rd,ri,4)
4sdi
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 44
int a;
int b[10];
void foo () { a = 0;
b[a] = a; }
ld $0, r0
ld $0x1000,
st r0, (r1)
ld (r1), r2
ld $0x2000,
st r2, (r3,
r1
r3
r2, 4)
# r0 = 0
# r1 = address of a #a = 0
# r2 = a
# r3 = address of b
# b[a] = a
# the variable a
# the variable b[0]
# the variable b[1]
# the variable b[9]
.pos 0x1000
a: .long 0
.pos 0x2000 b_data .long 0
.long 0 ...
.long 0
00-- 00000000 01-- 00001000 3001
1102
03-- 00002000
4232
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 45
▪Architecture
▪Register file: 8 general purpose registers, each 32-bits long ▪CPU: One cycle per instruction (fetch and execute)
▪Main memory: byte-addressed, big endian
▪Instruction format
▪2-byte or 6-byte instructions
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 46
▪Binary format represented as (each character is a hex digit): x-01, xxsd, x0vv, x-sd vvvvvvvv
▪Where:
▪x is an opcode (instruction unique identification) ▪- means unused portion (ignored by instruction) ▪s and d are register numbers
▪vv and vvvvvvvv are immediate/constant values
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 47
▪Set of lines containing:
[label:] [instruction|directive] [# comment]
▪where:
directive :: (.pos number) | (.long number) instruction :: opcode operand+
operand reg literal offset number
:: $literal|reg|offset(reg)|(reg,reg,4) :: r 0..7
:: number
:: number
:: decimal | 0x hex
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 48
▪Operands have 4 possible addressing modes:
▪Immediate: constant value obtained from instruction itself
▪Register: operand is register number; register stores value
▪Base+Offset: operands are register number and optional offset; register stores memory address (via offset) of value; offset o=p×4
▪Indexed: operands are two register numbers; registers store based address and index of value
Name
Semantics
Assembly
Machine
load immediate
r[d] ← v
ld $v, rd
0d-- vvvvvvvv
load base+offset
r[d] ← m[r[s]+4*p]
ld o(rs), rd
1psd
load indexed
r[d] ← m[r[s]+4*r[i]]
ld (rs,ri,4), rd
2sid
store base+offset
m[r[d]+4*p] ← r[s]
st rs, o(rd)
3spd
store indexed
m[r[d]+4*r[i]] ← r[s]
st rs, (rd,ri,4)
4sdi
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 49
Name
Semantics
Assembly
Machine
register move
r[d] ← r[s]
mov rs, rd
60sd
add
r[d] ← r[d] + r[s]
add rs, rd
61sd
and
r[d] ← r[d] & r[s]
and rs, rd
62sd
inc
r[d] ← r[d] + 1
inc rd
63-d
inc address
r[d] ← r[d] + 4
inca rd
64-d
dec
r[d] ← r[d] – 1
dec rd
65-d
dec address
r[d] ← r[d] – 4
deca rd
66-d
not
r[d] ← ~ r[d]
not rd
67-d
shift left
r[d] ← r[d] << s
shl $s, rd
7dss
shift right
r[d] ← r[d] >> -s
shr $s, rd
7dss
halt
Halt the machine
halt
F0–
nop
Do nothing
Nop
FF–
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 50
What instruction is represented by the bytes: 0x46 0x24?
A. ld (r4,r4,4), r6 B. st r4, (r6,r2,4) C. st r6, (r2,r4,4) D. st r6, 2(r4)
E. st r6, 8(r4)
Name
Semantics
Assembly
Machine
load immediate
r[d] ← v
ld $v, rd
0d– vvvvvvvv
load base+offset
r[d] ← m[r[s]+4*p]
ld o(rs), rd
1psd
load indexed
r[d] ← m[r[s]+4*r[i]]
ld (rs,ri,4), rd
2sid
store base+offset
m[r[d]+4*p] ← r[s]
st rs, o(rd)
3spd
store indexed
m[r[d]+4*r[i]] ← r[s]
st rs, (rd,ri,4)
4sdi
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder
51
What instruction is represented by the bytes: 0x75 0xF8?
A. shl $5, r8
B. shl $8, r5
C. shr $5, r8
D. shr $8, r5
E. shl $0xf8, r5
Name
Semantics
Assembly
Machine
shift left
r[d] ← r[d] << s
shl $s, rd
7dss
shift right
r[d] ← r[d] >> -s
shr $s, rd
7dss
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder
52
▪PC (program counter): address of next instruction to fetch ▪Instruction: value of the current instruction
▪separated into components:
Ins Op Code Ins Op 0
Ins Op 1
Ins Op 2
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
53
Ins Op Imm
Ins Op Ext
▪Fetch stage:
▪Read instruction at PC ▪Determine size ▪Separate components ▪Update PC
▪Execute stage:
▪Read internal state and/or memory ▪Perform specified computation ▪Update internal state and/or memory
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 54
▪Internal registers:
▪instruction, insOpCode, insOp0, insOp1, insOp2, insOpImm,
insOpExt, pc
▪Read with get(), change with set(value)
▪General purpose registers: ▪Read with reg.get(regNumber) ▪Change reg.set(regNumber, value)
▪Main memory:
▪Read with mem.readInteger(address)
▪Change with mem.writeInteger(address, value)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 55
▪ Java:
public class Foo {
static int a;
static int[] b = new int[10]; public void foo () {
b[a] = a; }
}
▪C:
int a;
int* b;
void foo () {
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder }
56
b = malloc(10*sizeof(int)); b[a] = a;
▪Arrays in Java
▪ Store reference to array allocated dynamically with new statement int[] b = new int[10];
▪Arrays in C
▪Can store static arrays
int bst[10];
▪Can store pointers to other arrays
int *bptr = &bst[0]; // or int *bptr = bst;
▪ Can store pointers to arrays allocated dynamically with call to malloc library function
int *bdyn = malloc(10 * sizeof(int));
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 57
▪Java references are equivalent to C pointers ▪We will use term pointers to refer to either
▪Variable type: pointer to element type
▪Example: pointer to int (represented as int *)
▪Stored internally as an address, regardless of element type
▪Any pointer can be cast to any pointer type
▪C performs no array bounds checking
▪Out-of-bounds access manipulates memory that is not part of array ▪Performance is faster, but source of vulnerability
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 58
▪Dynamic allocation with call to malloc
▪Returns block of bytes with no associated type or initialization
▪Element type and dereferencing determines how the data is interpreted
▪Dynamic allocation will be discussed in more details in future units
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 59
▪Arrays are accessed in C the same way, but declared and allocated differently
▪Static allocation: int b[10];
▪Dynamic allocation:
int *b = malloc(10 * sizeof(int));
0x2000: value of b
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
60
▪When the program runs: ▪Static allocation:
int b[10];
▪Dynamic allocation:
int *b = malloc(10 * sizeof(int));
0x2000: b = 0x3000
0x2000: value of b[0] 0x2004: value of b[1] …
0x2024: value of b[9]
0x3000: value of b[0] 0x3004: value of b[1] …
0x3024: value of b[9]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
61
▪Assembly/machine code requires an additional load b[a] = a;
▪Static allocation: r[0]
← 0x1000 ← m[r[0]] ← 0x2000
▪Dynamic allocation:
r[0] ← r[1] ←
r[2] ← r[3] ← m[r[3]+r[1]*4] ← r[1]
r[1]
r[2]
m[r[2]+r[1]*4] ← r[1]
0x1000 m[r[0]] 0x2000 m[r[2]]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
62
▪Static allocation:
ld $a, r0 # r0 = address of a ld (r0), r1 # r1 = a
ld $b, r2 # r2 = address of b st r1, (r2, r1, 4) # b[a] = a
.pos 0x1000
a: .long 0 # the variable a .pos 0x2000
b: .long 0 # the variable b[0]
.long 0 # the variable b[1] …
.long 0 # the variable b[9]
▪Dynamic allocation:
ld $a, r0 ld (r0), r1 ld $b, r2
# r0 = address of a
# r1 = a
# r2 = address of b
For simulator (since it has no malloc):
ld (r2), r3 # r3 = b = &b[0]
s…t r1, (r3, r1, 4) # b[a] = a .pos 0x2000
.bp:os.l0oxn1g00b0_data # malloc result a.:po.slo0nxg3000# the variable a
.bp_odsat0ax:20.0l0ong 0 # b[0]
b: .long ??? # the variable b .long 0 # b[1]
…
.long 0 # b[9]
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder
63
How many memory load operations are required to handle this instruction?
b[a[0]] = a[b[0]];
A. 2
B. 3
C. 4
D. 5
E. I don’t know
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder 64
▪In Java and C:
▪An array is a list of items of the same type
▪Array elements are named by non-negative integers starting at 0 ▪Syntax for accessing element i of array b is b[i]
▪In Java:
▪Variable stores a pointer to the array
▪In C
▪Variable can store a pointer to the array or the array itself
▪Distinguished by variable type
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 65
▪Pointers are declared as: type *varname;
▪Pointers may be accessed as: varname[0] or *varname varname[i] or *(varname+i)
▪The address of a variable can be obtained with: type a;
type *ptr = &a;
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 66
▪Adding an integer to a pointer:
▪integer is multiplied by type size before adding
▪Example: if a is a pointer of int, then (a+i) is equivalent to 4i bytes ahead of a
▪Works with subtraction as well
▪Subtracting two pointers of the same type
▪returns the number of elements of that type between the addresses ▪equivalent to dividing address difference (in bytes) by type size ▪Example:&a[7] – &a[2] == 5
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 67
int *c;
void foo () {
c = malloc(10*sizeof(int)); c = &c[3];
*c = *&c[3];
}
Which element of the original 10-element array is modified with the instructions on the left?
A. Element with index 3
B. Element with index 6
C. Elementwithindex0
D. No element is modified
E. More than one element is modified
CPSC 213 2020 WT2 © 2021 Jonatan Schroeder
68
ld $0x2000, r0 ld (r0), r1
ld $12, r2
add r1, r2
st r2, (r0)
ld $3, r3
ld (r2,r3,4), r4 # r4 = c[3] st r4, (r2) # *c = c[3]
.pos 0x2000
c: .long 0 # or some data used in simulator to emulate malloc
# r0 = address of c
# r1 = c (malloc result) # r2 = 3*sizeof(int) #r2=c+3
#c=c+3
#r3=3
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 69
▪It’s possible to declare variables as pointers to any types, even pointers!
int **q;
▪ q is a pointer to an int pointer (int **), or a “double-pointer” to int
▪When declaring multiple variables in a line, pointers apply individually:
int *p, i, **q;
▪p is a pointer to an int (int *)
▪i is an int
▪ q is a pointer to an int pointer (int **)
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 70
▪In C, Strings are implemented as arrays of characters ▪Each element is a one-byte integer (type char), representing the
character as an entry in the ASCII table
▪E.g., ‘A’ is represented as 65, ‘a’ as 97, ‘0’ as 48, ‘ ’ (space) as 32 ▪The value 0 (‘\0’) indicates the end of the string
▪So the string “CPSC 213” is represented as:
▪Special characters:
▪‘\n’: line break
▪‘\r’: carriage return (back to start of line, but don’t move to next line)
‘C’
‘P’
‘S’
‘C’
‘’
‘2’
‘1’
‘3’
‘\0’
67
80
83
67
32
50
49
51
0
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 71
▪The main function in C typically receives two arguments:
int main(int argc, char *argv[]) { …
▪argc: the number of arguments
▪argv: an array of strings, each representing an argument
▪argv[0] representes the program name itself, so argc is always >=1
▪So if you call your program:
./myprogram -x 7
▪Your program will receive:
▪argc = 3
▪argv[0] = “./myprogram”, argv[1] = “-x”, argv[2] = “7”
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 72
▪To output text to the console, use printf
▪First argument: format
▪Remaining arguments replace placeholders in format, in same order ▪%s: string
▪%d or %i: signed integer in base 10
▪%u: unsigned integer in base 10
▪%x: hexadecimal integer
▪E.g.,printf(“My grade in %s was %d.\n”, “CPSC 213”, g);
▪To read text from the console, use scanf
▪Similar arguments as printf
▪Arguments must be pointers to destination variables
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 73
▪Static scalars and arrays:
▪Compiler knows the address (memory location) of variable
▪Dynamic arrays:
▪Compiler does not know address of array directly
▪What C does that Java doesn’t
▪Static arrays
▪Array access using pointer dereferencing ▪Pointer arithmetic
▪What Java does that C doesn’t ▪Type-safe dynamic allocation ▪Automatic array-bounds checking
CPSC 213 2020 ST1 © 2020 Jonatan Schroeder 74