CIS 371: Computer Organization & Design
Unit 10: Static & Dynamic Scheduling
Slides originally developed by
, , and at University of Pennsylvania
Copyright By PowCoder代写 加微信 powcoder
CIS 371: Comp. Org & Design | Prof. | Scheduling 1
This Unit: Static & Dynamic Scheduling
System software
• Code scheduling
• To reduce pipeline stalls
• To increase ILP (insn level parallelism)
• Static scheduling by the compiler • Approach & limitations
• Dynamic scheduling in hardware • Register renaming
• Instruction selection
• Handling memory operations
CIS 371: Comp. Org & Design | Prof. | Scheduling 2
Code Scheduling & Limitations
CIS 371: Comp. Org & Design | Prof. | Scheduling 3
Code Scheduling
• Scheduling: act of finding independent instructions • “Static” done at compile time by the compiler (software) • “Dynamic” done at runtime by the processor (hardware)
• Why schedule code?
• Scalar pipelines: fill in load-to-use delay slots to improve CPI • Superscalar: place independent instructions together
• As above, load-to-use delay slots
• Allow multiple-issue decode logic to let them execute at the same time
CIS 371: Comp. Org & Design | Prof. | Scheduling 4
Compiler Scheduling
• Compiler can schedule (move) instructions to reduce stalls • Basicpipelinescheduling:eliminateback-to-backload-usepairs • Example code sequence: a = b + c; d = f – e;
• sp stack pointer, sp+0 is “a”, sp+4 is “b”, etc… After
ld [sp+4]➜r2
ld [sp+8]➜r3
add r2,r3➜r1 //stall st r1➜[sp+0]
ld [sp+16]➜r5
ld [sp+20]➜r6
sub r6,r5➜r4 //stall st r4➜[sp+12]
ld [sp+4]➜r2
ld [sp+8]➜r3
ld [sp+16]➜r5
add r2,r3➜r1 //no stall ld [sp+20]➜r6
st r1➜[sp+0]
sub r6,r5➜r4 //no stall st r4➜[sp+12]
CIS 371: Comp. Org & Design | Prof. | Scheduling 5
Compiler Scheduling Requires
• Largeschedulingscope
• Independent instruction to put between load-use pairs
+ Original example: large scope, two independent computations – This example: small scope, one computation
ld [sp+4]➜r2
ld [sp+8]➜r3
add r2,r3➜r1 //stall st r1➜[sp+0]
After (same!)
ld [sp+4]➜r2
ld [sp+8]➜r3
add r2,r3➜r1 //stall st r1➜[sp+0]
• Compiler can create larger scheduling scopes • For example: loop unrolling & function inlining
CIS 371: Comp. Org & Design | Prof. | Scheduling 6
Scheduling Scope Limited by Branches
r1 and r2 are inputs
jz r1, not_found ld [r1+0]➜r3
sub r2,r3➜r4
jz r4, found
ld [r1+4]➜r1
Aside: what does this code do?
Legal to move load up past branch?
Searches a linked list for an element
No: if r1 is null, will cause a fault
CIS 371: Comp. Org & Design | Prof. | Scheduling 7
Compiler Scheduling Requires
• Enoughregisters
• To hold additional “live” values
• Examplecodecontains7differentvalues(includingsp)
• Before: max 3 values live at any time ® 3 registers enough • After: max 4 values live ® 3 registers not enough
ld [sp+4]➜r2
ld [sp+8]➜r1
add r1,r2➜r1 //stall st r1➜[sp+0]
ld [sp+16]➜r2
ld [sp+20]➜r1
sub r2,r1➜r1 //stall st r1➜[sp+12]
ld [sp+4]➜r2
ld [sp+8]➜r1
ld [sp+16]➜r2
add r1,r2➜r1 // wrong r2 ld [sp+20]➜r1
st r1➜[sp+0] // wrong r1 sub r2,r1➜r1
st r1➜[sp+12]
CIS 371: Comp. Org & Design | Prof. | Scheduling 8
Compiler Scheduling Requires
• Aliasanalysis
• Ability to tell whether load/store reference same memory locations
• Effectively, whether load/store can be rearranged
• Previous example: easy, loads/stores use same base register (sp)
• Newexample:cancompilertellthatr8!=r9? • Must be conservative
ld [r9+4]➜r2
ld [r9+8]➜r3
add r3,r2➜r1 //stall st r1➜[r9+0]
ld [r8+0]➜r5
ld [r8+4]➜r6
sub r5,r6➜r4
st r4➜[r8+8]
ld [r9+4]➜r2
ld [r9+8]➜r3
ld [r8+0]➜r5 //does r8==r9? add r3,r2➜r1
ld [r8+4]➜r6 //does r8+4==r9? st r1➜[r9+0]
sub r5,r6➜r4
st r4➜[r8+8]
CIS 371: Comp. Org & Design | Prof. | Scheduling 9
A Good Case: Static Scheduling of SAXPY
• SAXPY(Single-precisionAXPlusY)
• Linear algebra routine (used in solving systems of equations)
for (i=0;i