TUT07 – Processor Architecture
COMP1411: Introduction to Computer Systems


Dr. Xianjin XIA
Department of Computing
The Polytechnic University
Spring 2022
These slides are intended for internal use only. Do not publish them anywhere without permission.

Problem 1
Determine the byte encoding of the Y86-64 instruction sequence that follows. The line .pos 0x100 indicates that the starting address of the object code is 0x100. Constants are stored in little-endian byte order.

.pos 0x100 # Start code at address 0x100
irmovq $15,%rbx
rrmovq %rbx,%rcx
loop:
rmmovq %rcx,-3(%rbx)
addq %rbx,%rcx
jmp loop

[Figure: Y86-64 instruction set encodings, including call Dest; cmovXX rA, rB; irmovq V, rB; rmmovq rA, D(rB); mrmovq D(rB), rA; OPq rA, rB. Register ID F means "no register".]

Solution for Problem 1
Produce the machine code (byte encoding) for each assembly instruction.
Determine the starting address of each instruction from the sizes of the instructions before it.
Fill in the jump destination so that the program jumps to the right position.
Encode every value, displacement, and destination in little-endian byte order.

Solution for Problem 1
Instruction            icode:ifun  rA:rB  Value/Displacement/Dest   Size (bytes)  Start address
irmovq $15,%rbx        30          F3     0F 00 00 00 00 00 00 00   10            0x100
rrmovq %rbx,%rcx       20          31     none                      2             0x10A
rmmovq %rcx,-3(%rbx)   40          13     FD FF FF FF FF FF FF FF   10            0x10C
addq %rbx,%rcx         60          31     none                      2             0x116
jmp loop               70          none   0C 01 00 00 00 00 00 00   9             0x118

-3 (decimal) = 0xFFFFFFFFFFFFFFFD → (low addr) FD FF FF FF FF FF FF FF (high addr)
0x10C = 0x000000000000010C → (low addr) 0C 01 00 00 00 00 00 00 (high addr)
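The two conversions above, and the start-address column of the table, can be checked with a short Python sketch. The helper names le8 and start_addresses are illustrative and not part of the course material; the instruction sizes are taken from the solution table.

```python
def le8(value):
    """Hex text of the 8-byte little-endian encoding of a 64-bit value."""
    # Masking maps negative numbers to their 64-bit two's-complement pattern.
    raw = (value & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")
    return " ".join(f"{b:02X}" for b in raw)

def start_addresses(sizes, origin):
    """Start address of each instruction, given the instruction sizes in order."""
    addresses, pc = [], origin
    for size in sizes:
        addresses.append(pc)
        pc += size
    return addresses

print(le8(-3))      # FD FF FF FF FF FF FF FF
print(le8(0x10C))   # 0C 01 00 00 00 00 00 00
# Sizes of irmovq, rrmovq, rmmovq, addq, jmp in the Problem 1 program
print([hex(a) for a in start_addresses([10, 2, 10, 2, 9], 0x100)])
# ['0x100', '0x10a', '0x10c', '0x116', '0x118']
```

The third address, 0x10C, is where the rmmovq instruction starts, which is why it is the destination encoded in jmp loop.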

Problem 2
For each byte sequence listed, determine the Y86-64 instruction sequence it encodes. If the sequence contains an invalid byte, show the instruction sequence up to that point and indicate where the invalid value occurs.
For each sequence, we show the starting address, then a colon, and then the byte sequence.
A. 0x100: 30F3FCFFFFFFFFFFFFFF40630008000000000000
B. 0x200: A06F800C020000000000000030F30A00000000000000
C. 0x300: 5054070000000000000010F0B01F
D. 0x400: 611373000400000000000000
E. 0x500: 6362A0F0


Solution for Problem 2
Parse the first byte (icode:ifun) to identify the instruction type.
From the type, determine whether a register byte and an 8-byte value, displacement, or destination follow.
This gives the boundaries between consecutive machine instructions.
Translate each machine instruction into its Y86-64 assembly form.
A. 0x100: 30 F3 FCFFFFFFFFFFFFFF 40 63 0008000000000000
30 → irmovq; F3 is the operands field; FCFFFFFFFFFFFFFF is the immediate value
F: no register; 3: %rbx; FCFFFFFFFFFFFFFF (little-endian) = -4
0x100: irmovq $-4, %rbx
40 → rmmovq; 63 is the operands field; 0008000000000000 is the displacement
6: %rsi; 3: %rbx; 0008000000000000 (little-endian) = 0x800
0x10A: rmmovq %rsi, 0x800(%rbx)

Solution for Problem 2
B. 0x200: A0 6F 80 0C02000000000000 00 30 F3 0A00000000000000
A0 → pushq; 6F is the operands field
6: %rsi; F: no register
0x200: pushq %rsi
80 → call; 0C02000000000000 is the destination address
0C02000000000000 (little-endian) = 0x20C
0x202: call 0x20C
00 → halt
0x20B: halt
30 → irmovq; F3 is the operands field; 0A00000000000000 is the immediate value
F: no register; 3: %rbx; 0A00000000000000 (little-endian) = 10
0x20C: irmovq $10, %rbx (start of the called function)


Solution for Problem 2
C. 0x300: 50 54 0700000000000000 10 F0 B0 1F
50 → mrmovq; 54 is the operands field; 0700000000000000 is the displacement
5: %rbp; 4: %rsp; 0700000000000000 (little-endian) = 7
0x300: mrmovq 7(%rsp), %rbp
10 → nop
0x30A: nop
F0 → no such instruction in the Y86-64 ISA
0x30B: F0, invalid instruction byte
B0 → popq; 1F is the operands field
1: %rcx; F: no register
0x30C: popq %rcx

Solution for Problem 2
D. 0x400: 61 13 73 0004000000000000 00
61 → subq; 13 is the operands field
1: %rcx; 3: %rbx
0x400: subq %rcx, %rbx
73 → je; 0004000000000000 is the destination address
0004000000000000 (little-endian) = 0x400
0x402: je 0x400 (a jump back to the subq at 0x400, i.e. je loop)
00 → halt
0x40B: halt


Solution for Problem 2
E. 0x500: 63 62 A0 F0
63 → xorq; 62 is the operands field
6: %rsi; 2: %rdx
0x500: xorq %rsi, %rdx
A0 → pushq; F0 is the operands field
0x502: invalid. The rA field of pushq must name a register, so it cannot be F (no register); pushq expects F in the rB field instead.
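The decoding procedure used in parts A to E can be sketched as a small Python disassembler. This is a minimal illustration under some assumptions, not a reference tool: the function names are hypothetical, displacements and destinations are printed in hexadecimal (so 7(%rsp) appears as 0x7(%rsp)), and after an invalid byte the sketch simply resumes decoding at the next byte.

```python
REGS = ["%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
        "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14"]
COND = ["", "le", "l", "e", "ne", "ge", "g"]   # ifun 0..6 for jXX / cmovXX
OPS = ["addq", "subq", "andq", "xorq"]         # ifun 0..3 for OPq
NO_REG = 0xF

def reg(rid):
    """Name of a register ID; F (no register) is rejected."""
    if rid == NO_REG:
        raise ValueError("register ID F where a register is required")
    return REGS[rid]

def const(code, pos):
    """Signed 8-byte little-endian constant starting at offset pos."""
    if pos + 8 > len(code):
        raise ValueError("truncated constant")
    return int.from_bytes(code[pos:pos + 8], "little", signed=True)

def decode_one(code, pc):
    """Return (assembly text, size in bytes) for the instruction at offset pc."""
    byte = code[pc]
    icode, ifun = byte >> 4, byte & 0xF
    if byte in (0x00, 0x10, 0x90):
        return {0x00: "halt", 0x10: "nop", 0x90: "ret"}[byte], 1
    if icode == 7 and ifun < 7:
        return f"j{COND[ifun] or 'mp'} {const(code, pc + 1):#x}", 9
    if byte == 0x80:
        return f"call {const(code, pc + 1):#x}", 9
    if pc + 1 >= len(code):
        raise ValueError("invalid or truncated instruction")
    ra, rb = code[pc + 1] >> 4, code[pc + 1] & 0xF
    if icode == 2 and ifun < 7:
        name = "rrmovq" if ifun == 0 else "cmov" + COND[ifun]
        return f"{name} {reg(ra)}, {reg(rb)}", 2
    if icode == 6 and ifun < 4:
        return f"{OPS[ifun]} {reg(ra)}, {reg(rb)}", 2
    if byte in (0xA0, 0xB0):
        name, operand = ("pushq" if byte == 0xA0 else "popq"), reg(ra)
        if rb != NO_REG:
            raise ValueError("second register field must be F")
        return f"{name} {operand}", 2
    if byte == 0x30:
        if ra != NO_REG:
            raise ValueError("first register field must be F")
        return f"irmovq ${const(code, pc + 2)}, {reg(rb)}", 10
    if byte == 0x40:
        return f"rmmovq {reg(ra)}, {const(code, pc + 2):#x}({reg(rb)})", 10
    if byte == 0x50:
        return f"mrmovq {const(code, pc + 2):#x}({reg(rb)}), {reg(ra)}", 10
    raise ValueError(f"invalid instruction byte {byte:#04x}")

def disassemble(start, hex_text):
    """Decode hex_text loaded at address start; invalid bytes are reported, then skipped."""
    code, pc, lines = bytes.fromhex(hex_text), 0, []
    while pc < len(code):
        try:
            text, size = decode_one(code, pc)
        except ValueError as err:
            text, size = f"invalid ({err})", 1
        lines.append(f"{start + pc:#x}: {text}")
        pc += size
    return lines

for line in disassemble(0x300, "50 54 0700000000000000 10 F0 B0 1F"):
    print(line)
# 0x300: mrmovq 0x7(%rsp), %rbp
# 0x30a: nop
# 0x30b: invalid (invalid instruction byte 0xf0)
# 0x30c: popq %rcx
```

For sequence E the sketch reports the pushq at 0x502 as invalid because its rA field is F, which matches the reasoning in the solution.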

Problem 3
Here you are given the generic steps for executing an mrmovq instruction, which moves 8 bytes of data from the memory location with address D(rB) to a given register. Please:
(1) write out the Y86-64 machine code for the instruction
mrmovq 0x120(%rdx), %rax
(2) fill in the actual steps in each stage, given R[%rax] = 15; R[%rdx] = 0x400; PC = 0x300
Stage        mrmovq D(rB), rA          mrmovq 0x120(%rdx), %rax
Fetch        icode:ifun ← M1[PC]
             rA:rB ← M1[PC+1]
             valC ← M8[PC+2]
             valP ← PC+10
Decode       valB ← R[rB]
Execute      valE ← valB + valC
Memory       valM ← M8[valE]
Write back   R[rA] ← valM
PC update    PC ← valP

Solution for Problem 3
(1) Machine code for the instruction
mrmovq 0x120(%rdx), %rax
mrmovq: icode:ifun = 50
rA = %rax = 0; rB = %rdx = 2; rA:rB = 02
Displacement = 0x120 → 20 01 00 00 00 00 00 00 (little-endian)

Machine code: 50 02 20 01 00 00 00 00 00 00

Solution for Problem 3
(2) Actual steps in each stage for mrmovq 0x120(%rdx), %rax (machine code 50 02 20 01 00 00 00 00 00 00), given R[%rax] = 15; R[%rdx] = 0x400; PC = 0x300
Stage        mrmovq D(rB), rA          mrmovq 0x120(%rdx), %rax
Fetch        icode:ifun ← M1[PC]       icode:ifun ← M1[0x300] = 5:0
             rA:rB ← M1[PC+1]          rA:rB ← M1[0x300+1] = 0:2
             valC ← M8[PC+2]           valC ← M8[0x300+2] = 0x120
             valP ← PC+10              valP ← 0x300 + 10 = 0x30A
Decode       valB ← R[rB]              valB ← R[%rdx] = 0x400
Execute      valE ← valB + valC        valE ← 0x400 + 0x120 = 0x520
Memory       valM ← M8[valE]           valM ← M8[0x520]
Write back   R[rA] ← valM              R[%rax] ← valM
PC update    PC ← valP                 PC ← 0x30A
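The stage-by-stage trace can be reproduced with a small Python sketch. The function trace_mrmovq and the dictionary-based memory and register file are hypothetical helpers for illustration. The slides do not give the contents of M8[0x520], so the value 0x2A stored there is an assumption made only so that the write-back has a visible effect.

```python
def trace_mrmovq(memory, regs, pc):
    """Run one mrmovq through the six stages; memory maps address -> byte value."""
    def read(addr, size, signed=False):
        raw = bytes(memory.get(addr + i, 0) for i in range(size))
        return int.from_bytes(raw, "little", signed=signed)

    # Fetch: instruction byte, register byte, 8-byte displacement, next PC
    icode, ifun = read(pc, 1) >> 4, read(pc, 1) & 0xF
    ra, rb = read(pc + 1, 1) >> 4, read(pc + 1, 1) & 0xF
    val_c = read(pc + 2, 8, signed=True)
    val_p = pc + 10
    assert (icode, ifun) == (5, 0), "this sketch handles mrmovq only"
    # Decode: read the base register
    val_b = regs[rb]
    # Execute: compute the effective address
    val_e = val_b + val_c
    # Memory: load 8 bytes from the effective address
    val_m = read(val_e, 8)
    # Write back: store the loaded value in rA
    regs[ra] = val_m
    # PC update: continue with the next instruction
    return val_p, {"valC": val_c, "valB": val_b, "valE": val_e, "valM": val_m}

RAX, RDX = 0, 2                                  # Y86-64 register IDs
code = bytes.fromhex("50 02 20 01 00 00 00 00 00 00")
memory = {0x300 + i: b for i, b in enumerate(code)}
memory[0x520] = 0x2A                             # assumed data, not given in the slides
regs = {RAX: 15, RDX: 0x400}
new_pc, vals = trace_mrmovq(memory, regs, 0x300)
print(hex(new_pc), hex(vals["valE"]), regs[RAX])  # 0x30a 0x520 42
```

The old value of %rax (15) plays no role in the computation; it is simply overwritten in the write-back stage.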


Problem 4
Suppose we analyze the combinational logic of the figure below and determine that it can be separated into a sequence of six partitions, named A to F, with delays of 80, 30, 60, 50, 70, and 10 ps, respectively. Each pipeline register inserted between two consecutive partitions adds a delay of 20 ps.

A. Insert a single register to produce a two-stage pipeline. Where should the register be inserted to maximize throughput? What would be the throughput and latency?
B. Where should two registers be inserted to maximize the throughput of a three-stage pipeline? What would be the throughput and latency?
C. Where should three registers be inserted to maximize the throughput of a four-stage pipeline? What would be the throughput and latency?
D. What is the minimum number of stages that would yield a design with the maximum achievable throughput? Describe this design, its throughput, and its latency.

Solution for Problem 4
Stages   Register placement: stage delays (ps)      Throughput and latency
2        between A and B:  80 | 220
         between B and C: 110 | 190
         between C and D: 170 | 130  (best)         Throughput = 1 / ((170 + 20) × 10^-12 s) ≈ 5.26 × 10^9 IPS
         between D and E: 220 | 80                  Latency = (170 + 20) × 2 = 380 ps
         between E and F: 290 | 10
3        10 possible placements; the best is
         AB | CD | EF: 110 | 110 | 80               Throughput = 1 / ((110 + 20) × 10^-12 s) ≈ 7.69 × 10^9 IPS
                                                    Latency = (110 + 20) × 3 = 390 ps

Hint: make the partitions as even as possible, or reduce the length of the longest stage to maximize throughput.

Solution for Problem 4
Stages   Register placement: stage delays (ps)      Throughput and latency
4        10 possible placements; the best is
         A | BC | D | EF: 80 | 90 | 50 | 80         Throughput = 1 / ((90 + 20) × 10^-12 s) ≈ 9.09 × 10^9 IPS
                                                    Latency = (90 + 20) × 4 = 440 ps
Max      A | B | C | D | EF: 80 | 30 | 60 | 50 | 80 Throughput = 1 / ((80 + 20) × 10^-12 s) = 10^10 IPS
                                                    Latency = (80 + 20) × 5 = 500 ps
Making each partition its own stage (6 stages) reaches the maximum throughput, because the slowest partition, A (80 ps), limits the cycle time. The same throughput is reached with fewer stages by grouping E and F, so 5 stages is the minimum.

Rethink: in this example, partitioning the logic into more stages improves throughput, but at the cost of more pipeline registers and therefore higher latency.
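The answers to parts A to D can be cross-checked by brute force. The following Python sketch tries every way of placing the registers for a given number of stages and keeps the placement with the shortest cycle time. The function name best_pipeline and the GIPS unit (10^9 instructions per second) are choices made for this illustration; the delays are the ones given in the problem.

```python
from itertools import combinations

DELAYS = {"A": 80, "B": 30, "C": 60, "D": 50, "E": 70, "F": 10}  # ps, from the problem
REG_DELAY = 20                                                    # ps per pipeline register

def best_pipeline(stages, delays=DELAYS, reg_delay=REG_DELAY):
    """Return (layout, cycle time in ps, latency in ps, throughput in GIPS)."""
    names = list(delays)
    best = None
    # A cut at position i places a register between names[i-1] and names[i].
    for cuts in combinations(range(1, len(names)), stages - 1):
        bounds = (0, *cuts, len(names))
        groups = [names[i:j] for i, j in zip(bounds, bounds[1:])]
        cycle = max(sum(delays[n] for n in g) for g in groups) + reg_delay
        if best is None or cycle < best[0]:
            best = (cycle, groups)
    cycle, groups = best
    layout = " | ".join("".join(g) for g in groups)
    return layout, cycle, stages * cycle, 1000 / cycle

for k in range(2, 7):
    layout, cycle, latency, gips = best_pipeline(k)
    print(f"{k}: {layout}  cycle={cycle} ps  latency={latency} ps  throughput={gips:.2f} GIPS")
# 2: ABC | DEF  cycle=190 ps  latency=380 ps  throughput=5.26 GIPS
# 3: AB | CD | EF  cycle=130 ps  latency=390 ps  throughput=7.69 GIPS
# 4: A | BC | D | EF  cycle=110 ps  latency=440 ps  throughput=9.09 GIPS
# 5: A | B | C | D | EF  cycle=100 ps  latency=500 ps  throughput=10.00 GIPS
# 6: A | B | C | D | E | F  cycle=100 ps  latency=600 ps  throughput=10.00 GIPS
```

The last two lines show the trade-off: going from five to six stages leaves the throughput unchanged and only adds latency.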
