MIPS4
CS 2506 Computer Organization II MIPS 4: Branch and Memory
You are permitted to work in pairs for this assignment! 1
You may work in pairs for this assignment. If you choose to work with a partner, make sure only one of you submits a solution, and you paste a copy of the
Partners Template that contains the names and PIDs of both students at the beginning of the file.
Always justify your answer; describe your reasoning clearly. No points will be given if a GTA cannot understand your description. This will help you
practice for the real world; your team/manager will judge you poorly if your team/manager cannot understand your written communication/documentation.
What to submit: Submit a word document containing your answers. No late submissions will be accepted.
Both Pipelines have a cycle time of 200ps. Fetch, Execute, and Memory stages take 200ps each. Writeback stage takes 100ps; it
writes to the register file in the middle of clock cycle. Decode stage may take as little as 100ps.
Pipeline: Independent Insts 3
Computer Organization IICS@VT
Unsimplified Datapath with Hazard Detection
Pipeline A Pipeline B: Resolve branch early +
branch prediction
Handling Branches 24
Computer Organization IICS@VT
Branch Prediction: Misprediction
beq
(taken)sub
CS 2506 Computer Organization II MIPS 4: Branch and Memory
You are permitted to work in pairs for this assignment! 2
1. Implement J in a Pipeline
The pipelines on the cover page do not implement J. This problem asks you to implement J in Pipeline B. Assume the following:
• Pipeline B already has a Jump control output, just like the single-cycle CPU; this Jump control output is just floating (i.e.,
not connected to anything).
• There is a strict design requirement to not increase CPU cycle time.
• Due to area constraint, the design is limited to just a few logic gates.
1) List what current connection(s) to remove, if any.
2) List all new gates/module you want to add.
3) For EACH new gate/module, describe what EACH of its ports should connect to.
CS 2506 Computer Organization II MIPS 4: Branch and Memory
You are permitted to work in pairs for this assignment! 3
2. Control Hazards
The following assembly code is generated by compiling a C program to run on a single-cycle CPU:
1: beq $0, $7, label1 #$7 is currently 1
2: beq $0, $0, label1
3: add $1, $2, $0
4: label1: and $4, $5, $6
5: or $8, $9, $10
6: sub $11, $7, $0
7: beq $11, $0, label2
8: slt $7, $8, $9
9: label2: add $10, $11, $12
a) After the original code is recompiled for pipeline A, how many cycles of penalty will pipeline A encounter due to software
bubbles and/or instruction squash? Assume a simple compiler that uses NOPs to ensure correctness. Show your work if
you want partial credit.
b) After the original code is recompiled for pipeline B, how many cycles of penalty will pipeline B encounter due to software
bubbles and/or instruction squash? Assume a simple compiler than uses adds NOPs to ensure correctness. Show your
work if you want partial credit.
CS 2506 Computer Organization II MIPS 4: Branch and Memory
You are permitted to work in pairs for this assignment! 4
3. Dynamic Branch Prediction
The following code creates a histogram of the number of true and false in a very large Boolean array. Currently, array has a
repeating value of true, false, true, false, true, false… (These array values are unknown at compile time). histogram is an array
with 2 elements.
for (int bucket=0; bucket<=1; bucket++){
for(int i=0;i
matchesFound++;
break;
}
list2Iter = list2Iter ->next;
}
list1Iter=list1Iter->next;
}
a) For a 16KB fully-associative LRU cache with 32B/cacheline, what is the hit rate of the two nested loops as a whole? Only
two digits of significance are required for your answer.
b) For a 16KB fully-associative LRU cache with 8B/cacheline, what is the hit rate of the two nested loops as a whole? Only
two digits of significance are required for your answer.
c) For a 16KB fully-associative LRU cache with 64B/cacheline, how would you optimize the nest loops to achieve >90% hit
rate? After your optimization, the result of matchesFound should still be same as before. Show your optimized code.