CSE 141 Homework 2 – Solutions
Problem 1:
begin loop
finish
addi $t0,$zero,0
addi $t1,$zero,1
slt $t2,$a0,$t1 #
bne $t2,$zero,finish add $t0,$t0,$t1
addi $t1,$t1,2
j loop #
add $v0,$t0,$zero
# return v
$a = n (positive integer) $v = output
# initialize variable v=0; # initialize step=1
# if step >=n goto finish # v=v+step
# step=step+2
The code sum all odd integers between 0 and n
Problem 2: [2 pts each starting from b]
a) move $t5,$t3
add $t5,$zero,$t3
b) clear $t5
add $t5,$zero,$zero
c) li $t5,small
addi $t5,small,$zero
d) li $t5,big
lui $t5,bigH
addi $t5,$t5,bigL
bigH – decimal value of the 16 higher bits of number big;
e) lw $t5,big($t3)
lui $at,bigH
addi $at,$at,bigL add $at,$at,$t3
lw $t5,0($at)
f) addi $t5,$t3,big
lui $at,bigH
addi $at,$at,bigL addi $t5,$at,$t3
g) beq $t5,small,L
addi $at,$zero,small beq $at,$t5,L
h) beq $t5,big,L
addi $at,$zero,big beq $at,$t5,L
i) ble $t5,$t3,L
slt $at,$t5,$t3 beq $at,$zero,L beq $t5,$t3,L
j) bgt $t3,$t5,L
blt $t3,$t5,L
k) bge $t5,$t3,L
ble $t3,$t5,L
# see (f)
#see (i)
Problem 3: [15 pts code + 5 pts for the final statistics]
for(i=0;i<=100;i=i+1){a[i] =b[i] +c}
$a0 - a (base address) $a1 - b (base address) $t0 - i
$s0 - c
add $t0,$zero,$zero
addi $t1,$zero,101 loop slt $t2,$t0,$t1
bne $t2,$zero,exit
# i=0
#limit = 100
add $t3,$t0,$t0 add $t4,$t3,$t3 add $t5,$a1,$t4 lw $t6,0($t5) add $t2,$t6,$s0 add $t3,$a0,$t4 sw $t2,0($t3) addi $t0,$t0,1
j loop
Number of instructions executed = 2+101.11 = 1113 Number of memory data references = 101.2 =202
Problem 4:
a) Percentage of all memory accesses that are for data: Pdata = 33/133 = 25%
b) Percentage of all memory accesses that are reads: Preads = (2/3).33/133 = 16%
Problem 5:
CPIgcc = 0.48*1.3+0.33*1.6+0.17*1.7+0.02*2=1.48 CPIspice=0.5*1.3+0.41*1.6+0.08*1.7+0.01*2=1.46
Problem 6: [10 pts each item]
a)
Execution Time=IC*CPI*ClockPeriod We know that:
CPInew=CPIold ClockPeriodnew=1.1*ClockPeriodold
In order to keep the same performance we must have: Execution Timenew= Execution Timeold
where,
Execution Timenew=ICnew*CPInew*ClockPeriodnew Execution Timeold=ICold*CPIold*ClockPeriodold
Dividing term by term the two equations above we get: ICnew=0.91*ICold
This means that the new gcc must have 91% of the size of the old gcc. This reduction comes from eliminating loads/add pairs. A total of (100-91)/(33*2/3)=41% loads of gcc must be eliminated at least.
b)
Assume that in the above sequence we make the following replacement:
addm $2, addr($3) sub $3,$2,$8
In such a case the information that should be loaded to register $8 is not present there when the subtraction operation is performed. The semantics of the second program is therefore incorrect.
Problem 7: [10 pts code + 10 pts statistics]
a=(x*y)-(b*c)
Acc:
Load[b] Mult[c] Store[temp] Load[x] Mult[y] Subt[temp] Store[a]
Load/Store:
Load $1,[b] Load $2,[c]
lw $8,addr($3) add $2,$2,$8 sub $3,$2,$8
Mult $1,$1,$2 Load $2, [x] Load $3, [y] Mult $2, $2, $3 Sub $1, $2, $1 Store $1, [a]
Stack:
Push[b] Push[c] Mult Push[x] Push[y] Mult Subt Pop[a]
Mem:
Mult [a], [x], [y] Mult [temp], [b], [c] Subt [a], [a], [temp]
Instruction bytes fetched
Acc: 3+3+3+3+3+3+3=21 bytes Load/Store: 4+4+4+4+4+4+4+4=32 bytes Stack: 3+3+1+3+3+1+1+3=18 bytes Mem: 7+7+7=21 bytes
Memory bytes transferred Acc: 7*4=28 bytes Load/Store: 5*4=20 bytes Stack: 5*4=20 bytes
Mem: 9*4=36 bytes
In both cases the most efficient archictecture is the stack.
Problem 8: [20 pts]
faverg: sub $sp,$sp,8 sw $s0,4($sp) sw $s1,0($sp)
# $s0=i
# $s1=sum # sum=0
# i=0
# Important !!! $t1=4*i
loop:
exit:
add $s1,$zero,$zero add $s0,$zero,$zero add $t0,$s0,$s0
add $t1,$t0,$t0
add $t2,$a0,$t1
lw $t3,0($t2)
beq $t3,NULL,exit add $s1,$s1,$t3 addi $t1,$t1,4
j loop
addi $s0,$s0,1
div $s1,$s0
add $v0,$Lo,$zero lw $s0,4($sp)
lw $s1,0($sp)
addi $sp,$sp,8
jr $ra