Objectives:
Assignment 7
CMPT 295 – Fall 2018
- Manipulating floating point numbers in x86-64 assembly code
- Understanding recursion in x86-64 assembly code
- Designing and evaluating instruction sets (ISA)
Submission:
- Staple your assignment.
- When creating your assignment, make sure you include
o the question (or the question number) before the answer,
o the questions/answers in a numerical order (i.e., following the order in which thequestions are listed in the assignment).
- To help the TA’s in their marking:
o Write your name (first name then last name) and your student number on the top right of page 1 of your assignment.
o Make sure you clearly identify your final answers. For example, you may put a box around each of your final answers, when appropriate.
- Submit a hard copy of your completed assignment in the correct CMPT 295’s assignment box, i.e., based on the first letter of your last name.
Due: Assignment 7 is a little unusual (like our Assignment 4)
- Submit your solution to all questions by Wednesday, November 7, 2018, 3:00pm
- Unfortunately, there is no possibility of “late assignments for the 1% bonus” since the
solution will be posted on the submission day: Wednesday, November 7, 2018.
Marking scheme:
This assignment will be marked as follows:o Allquestionswillbemarkedforcorrectness.
The amount of marks for each question is indicated as part of the question.
pxor movl cvttss2si movsd movl cvtsd2ss pxor cvtsi2sdq movss movsd pxor cvtsi2sd ret
%xmm1, %xmm1 (%rdi), %eax (%rsi), %r8d (%rdx), %xmm0 %r8d, (%rdi) %xmm0, %xmm1 %xmm0, %xmm0 %rcx, %xmm0 %xmm1, (%rsi) %xmm0, (%rdx) %xmm0, %xmm0 %eax, %xmm0
You can find a description of each of the above instructions on the Resources web page of our course web site.
2. [7 marks] Understanding recursion in x86-64 assembly code
CMPT 295 – Fall 2018
1. [4 marks] Manipulating floating point numbers in x86-64 assembly code
For the following C code, the expressions val1 to val4 all map to the program variables
i, f, d, and l:
double mystery(int *ip, float *fp, double *dp, long l) { int i = *ip;
float f=*fp;
double d = *dp;
*ip = (int) val1; *fp = (float) val2; *dp=(double) val3; return (double) val4;
}
Determine the mapping (i.e., determine for which variable val1, val2, val3 and val4
stand) based on the following x86-64 code for the function mystery:
.globl mystery mystery:
Consider the following recursive implementation of the function factorial fact written in x86-64 implementation:
1 # Buggy version 2 .globl fact 3 #ninedi
4 fact:
-
5 cmpl $1, %edi
- 6 jg endif
-
7 movq $1, %rax
8 ret
9 endif:
- 10 decl %edi
- 11 pushq %rdi
- 12 call fact
- 13 imulq (%rsp), %rax
- 14 leaq 8(%rsp), %rsp
- 15 popq %rdi
- 16 ret
CMPT 295 – Fall 2018
Refresher:
Mathematically, the formula for a factorial is as follows.
If n is an integer greater than or equal to 1, then n ! = n ( n – 1)( n – 2)( n – 3) … (3)(2)(1)
If n = 0, then n ! = 1.
When we compile and execute the above function fact, we get a segmentation fault error. Your task is to debug the function so that it produces the expected results. You are to do so by hand tracing it with the test case n= 4 (expected result 4! = 24) and by creating its stack diagram and its register table. Remember that the registers are shared amongst all invocations of the function. On your stack diagram, make sure you clearly indicate the %rsp, the value stored in each stack cell and the stack frame for each invocation of fact. Indicate the stack frames on the right hand side of the stack diagram. Also, make use of the line numbers on the left hand side of the code above to indicate return addresses in the stack diagram.
Once you have figured out where the problem(s) is/are located, fix the above code by writing its working version below:
1 # Working version 2 .globl fact
3 #ninedi
4 fact:
endif:
ret
Finally, comment the code. In your comment, make sure you indicate the base case and the recursive case.
CMPT 295 – Fall 2018
3. [9 marks] Designing and evaluating instruction sets (ISA)
Instruction Set 1 – x295
During our lectures 23 and 24 (see Lecture Notes 23), we specified an instruction set architecture (ISA) with the following components:
- Memory model of the computer
o Size of external memory (RAM): 212 × 16 o Memoryaddress:12bits
o Wordsize:16bits
o Numberofregisters:0 - Instruction set
o Maximumnumberofinstructions:16 o Opcode size: 4 bits (24 = 16)
o Operand Model:
- Memory (only) – only memory locations are used to hold values (no registers)
- 3 operands
- Operand order: Dst, Src, Src
o Memoryaddressingmode:Direct o Instructions (so far):
CMPT 295 – Fall 2018
ADD a,b,c SUB a,b,c MUL a,b,c
o Template:
Meaning: M[c] <- M[a] + M[b] Meaning: M[c] <- M[a] – M[b] Meaning: M[c] <- M[a] * M[b]
opcode |
Dest C (12 bits) |
XXXX |
Src A (12 bits) |
XXXX |
Src B (12 bits) |
All three instructions fit in this one template. Data size : 16 bit
We wrote the following C program using the instructions in this instruction set (which we called x295):
C program |
X295 program |
z = (x + y) * (x – y) |
ADD x, y, tmp1 |
In order to evaluate our instruction set, we counted the number of memory accesses our program (written in x295) made during its execution. In other words, we counted how many time the execution of our program required a word (16 bits) to be read from or written to memory.
Note that we first did ascertain the fact that we were able to completely write the above C program using the instructions found in our instruction set (x295) and hand tracing it allowed to ascertain that it would produce the expected result (e.g., test case: x = 3, y = 2, expected result: 5).
The table below show the results of our evaluation (using the metric called “memory traffic”, i.e., counting the number of word size memory accesses):
CMPT 295 – Fall 2018
Instructions of x295 program |
Fetch + Decode |
Execute |
ADD x, y, tmp1 |
since the binary encoding of the ADD instruction is 3-word wide since the ADD instruction requires the value of two operands read from memory 3+2 |
1 since the ADD instruction writes its result to memory |
SUB x, y, tmp2 |
since the binary encoding of the SUB instruction is 3-word wide since the SUB instruction requires the value of two operands read from memory 3+2 |
1 since the SUB instruction writes its result to memory |
MUL tmp1, tmp2, z |
since the binary encoding of the MUL instruction is 3-word wide since the MUL instruction requires the value of two operands read from memory 3+2 |
1 since the MUL instruction writes its result to memory |
Observe that the x295 program comprises of 3 instructions (“static code size” metric). A strategy to reduce the number of memory accesses: registers.
Instruction Set 2 – x295+
With the idea of reducing the number of memory accesses, we specified a second instruction set architecture (ISA) with the same components as x295, but with the addition of:
Memory model of the computer
o Numberofregisters:8×16-bitregisters
CMPT 295 – Fall 2018
Grand Total: 18 |
Total: 9 + 6 |
Total: 3 |
Instruction set
o Operand Model:
Registers o Instructions:
ADD rA,rB,rC SUB rA,rB,rC MUL rA,rB,rC COPY rA,rC
LOAD a,rC
STORE rA,c o Template 1:
Meaning: rC <- rA + rB Meaning: rC <- rA – rB Meaning: rC <- rA * rB Meaning: rC <- rA Meaning: rC <- M[a] Meaning: M[c] <- rA
opcode |
Dst C |
Src A |
Src B |
XXX |
The instructions ADD, SUB, MUL and COPY are defined by this template.
o Create template 2 (by adding to the opcode block below): opcode
Hint: LOAD and STORE must both be defined this template. This template is two word wide. Also, remember some of the ISA design principles (Slide 12 of Lecture 22), namely when creating formats (templates) to encode instruction set, use as few formats as possible and fields in different formats that have the same purpose should appear in the same position.
Now, write the C program using the instructions in this new instruction set (which we called x295+):
CMPT 295 – Fall 2018
C program |
x295+ program |
z = (x + y) * (x – y) |
Show the results of your evaluation (using the metric “memory traffic”, i.e., counting the number of word size memory accesses):
CMPT 295 – Fall 2018
Write each line of your x295+ program |
Fetch + Decode |
Execute |
Grand Total: |
Total: |
Total: |
How many instructions does the x295+ programs contain (“static code size” metric): ______
What happens when we reduce the number of operands?
Instruction Set 2 – x295++
Let’s specified a third instruction set architecture (ISA) with the same components as x295+ and x295, but with the addition of:
Instruction set
o Operand Model:
2 operands o Instructions:
ADD rA,rC SUB rA,rC MUL rA,rC COPY rA,rC LOAD a,rC STORE rA,c
Meaning: rC <- rA + rC Meaning: rC <- rA – rC Meaning: rC <- rA * rC Meaning: rC <- rA Meaning: rC <- M[a] Meaning: M[c] <- rA
o Create both templates (by adding to the opcode blocks below) and indicate which instructions would be defined by each template:
opcode
opcode
CMPT 295 – Fall 2018
Now, write the C program using the instructions in this new instruction set (which we called x295++):
CMPT 295 – Fall 2018
C program |
x295++ program |
z = (x + y) * (x – y) |
Show the results of your evaluation (using the metric “memory traffic”, i.e., counting the number of word size memory accesses):
CMPT 295 – Fall 2018
Write each line of your x295++ program |
Fetch + Decode |
Execute |
Grand Total: |
Total: |
Total: |
How many instructions does the x295++ programs contain (“static code size” metric): ____
Can you rewrite your program (using the instructions from this new instruction set which we called x295++) such that the total number of word size memory accesses is reduced by 3:
Conclusion
Considering the “memory traffic” metric (number of memory accesses required by our test program), which instruction set (x295, x295+ or x295++) wins (i.e., produces the fastest program)?
Considering the “static code size” metric (number of instructions required to implement our test program), which instruction set (x295, x295+ or x295++) wins (i.e., produces the shortest program)?
CMPT 295 – Fall 2018
C program |
x295++ program |
z = (x + y) * (x – y) |