Carnegie Mellon
Assignment 3: GraphRats
GraphRats 15-418/618 Spring 2019
1
Carnegie Mellon
Topics
¢ Application
¢ Implementation Issues
¢ Optimizing for Parallel Performance
¢ Useful Advice
15-418/618 Spring 2019
2
Carnegie Mellon
Basic Idea t=0
012
345
678
¢ Graph
§ KXKgrid
¢ Initial State
§ Start with all R rats in corner
¢ Transitions
§ Each rat decides where to move next
§ Don’t like crowds
§ But also don’t like to be alone § Weighted random choice
t=1
012 345 678
15-418/618 Spring 2019
3
Carnegie Mellon
Node Count Representation (K = 12)
t = 0. +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | | | | | | | | | |576| +—+—+—+—+—+—+—+—+—+—+—+—+
15-418/618 Spring 2019
4
t = 1. +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | | | | | | | | |165|164| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | | | | | | | | |168|79| +—+—+—+—+—+—+—+—+—+—+—+—+
Carnegie Mellon
Simulation Example
t = 0. +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | | | | | | | | | |576| +—+—+—+—+—+—+—+—+—+—+—+—+
15-418/618 Spring 2019
5
t = 30. +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||2||1| +—+—+—+—+—+—+—+—+—+—+—+—+ |||||||||||3|2| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||1|2|3|2|2| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | | |1| |3| |9|7|11| | +—+—+—+—+—+—+—+—+—+—+—+—+ | | | |1|2|2|1| |5|16|10|11| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | |2|1| |13|12|10|11|12|7| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | | |1|9|6|5|8|14|13|11| +—+—+—+—+—+—+—+—+—+—+—+—+ | | |1| |9|9|11|7|10|14|13|13| +—+—+—+—+—+—+—+—+—+—+—+—+ | | |1|1|10|9|6|11|10|12|12|17| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | |1|10|7|12|11|14|12|11|10| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | |1|4|11|11|10|9|10|12|12| +—+—+—+—+—+—+—+—+—+—+—+—+
Carnegie Mellon
Visualizations
Text (“a” for ASCII) Heat Map (“h”)
t = 30. +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||||| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||||2||1| +—+—+—+—+—+—+—+—+—+—+—+—+ |||||||||||3|2| +—+—+—+—+—+—+—+—+—+—+—+—+ ||||||||1|2|3|2|2| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | | |1| |3| |9|7|11| | +—+—+—+—+—+—+—+—+—+—+—+—+ | | | |1|2|2|1| |5|16|10|11| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | |2|1| |13|12|10|11|12|7| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | | |1|9|6|5|8|14|13|11| +—+—+—+—+—+—+—+—+—+—+—+—+ | | |1| |9|9|11|7|10|14|13|13| +—+—+—+—+—+—+—+—+—+—+—+—+ | | |1|1|10|9|6|11|10|12|12|17| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | |1|10|7|12|11|14|12|11|10| +—+—+—+—+—+—+—+—+—+—+—+—+ | | | |1|4|11|11|10|9|10|12|12| +—+—+—+—+—+—+—+—+—+—+—+—+
15-418/618 Spring 2019
6
Carnegie Mellon
Running it yourself
linux> cd some directory
linux> git clone https//github.com/cmu15418/asst3-s19.git linux> cd asst3-s19/code
Linux> make demoX
X from 1 to 10
¢ Demos
§ 1: Text visualization, synchronous updates § 2: Heap-map, synchronous updates
15-418/618 Spring 2019
7
Carnegie Mellon
Determining Rat Moves
¢ Count number of rats at current and adjacent locations § Adjacency structure represented as graph
¢ Compute reward value for each location § Based on load factor l = count/average count
15-418/618 Spring 2019
8
§ l * § 𝛂
Ideal load factor (ILF) (varying) Fitting parameter (= 0.4)
Reward ! =
$
$% &'() $%*!+!∗ )
300
8
83
40
120
Carnegie Mellon
Reward Function
Reward ! =
$
$% &'() $%*!+!∗ )
1.2 1 0.8 0.6 0.4 0.2 0
ILF 1.25 ILF 1.75 ILF 2.25
0 1 2 3 4 5 6 7 8 9 10
Load Factor
§ Maximized at ILF
Reward Function
§ Just above average population
§ Drops for smaller loads (too few) and larger loads (too crowded) 15-418/618 Spring 2019
9
Carnegie Mellon
1.1 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0
ILF 1.25 ILF 1.75 ILF 2.25
100 1000
Reward Function (cont.)
$
$% &'() $%*!+!∗ )
Reward ! =
§ Falls off gradually
§ Reward(1000) = 0.0132
15-418/618 Spring 2019
10
Reward Function
1 10
Load Factor
Carnegie Mellon
Computing Ideal Load Factor (ILF)
¢ Suppose node has count cl and neighbor has count cr
¢ Compute imbalance as
β(cl,cr) = Clamp log10 cr,−1,+ 1
[c] l
0.01 0.1
1 0.8 0.6 0.4 0.2 0
-0.2 1 -0.4 -0.6 -0.8
-1
10 100
15-418/618 Spring 2019
11
Carnegie Mellon
Computing Ideal Load Factor (cont.) ¢ For node u with population p(u)
[]
¢ Define ILF as
¢ Minimum 1.25
̂ l*(u) = 1.75+ 0.5⋅β(u)
̂
β(u) = Avg(u,v)∈ E β( p(u), p(v))
§ When adjacent nodes much less crowded
¢ Maximum 2.25
§ When adjacent nodes much more crowded
¢ Changes as rats move around
15-418/618 Spring 2019
12
Carnegie Mellon
Selecting Next Move Population
40
Reward (avg load = 10)
300
0.071
8
83
Weighted Choices
(node followed by row-major ordering of neighbors)
x = 1.24
0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 1.00 1.10 1.20 1.30 1.40 1.50 1.60 1.70
§ Choose random number between 0 and sum of rewards
§ Move according to interval hit 15-418/618 Spring 2019
13
0.678
0.225
0.538
120
0.153
Carnegie Mellon
Update Models
¢ Synchronous § Demo 2
§ Compute next positions for all rats, and then move them § Causes oscillations/instabilities
¢ Rat-order § Demo 3
§ For each rat, compute its next position and then move it § Smooth transitions, but costly
¢ Batch
§ Demo 4
§ For each batch of B rats, compute next moves and then move them § B=0.02*R
§ Smooth enough, with better performance possibilities
15-418/618 Spring 2019
14
Carnegie Mellon
What We Provide
¢ Python version of simulator § Demo 4
§ Very slow
¢ C version of simulator
§ Fast sequential implementation
§ Demo 5: 36X36 grid, 1,290 rats
§ Demo 6: 180X180 grid, 1,036,800 rats
§ That’s what we’ll be using for benchmarks!
¢ Generate visualizations by piping C simulator output into Python simulator
§ Operating in visualization mode § See Makefile for examples
15-418/618 Spring 2019
15
Carnegie Mellon
Correctness
¢ Simulator is Deterministic § Global random seed
§ Random seeds for each rat § Process rats in fixed order
¢ You Must Preserve Exact Same Behavior
§ Python simulator generates same result as C simulator § Use regress.py to check
§ Only checks small cases
§ Useful sanity check
§ Benchmark program compares your results to reference solution
§ Handles full-sized graphs
15-418/618 Spring 2019
16
Carnegie Mellon
Graphs: Tiled (Demos 1–6)
Rats spread quickly within region More slowly across regions
Hub nodes tend to have high counts
¢ Base grid
§ K X K nodes, each with nearest neighbor connectivity
¢ Hub (red) nodes connect to all other nodes in region
¢ ForK=180
§ Most nodes have degree ≤ 5
§ Hubs have degree 899 15-418/618 Spring 2019
17
Other graphs Horizontal
Vertical
15-418/618 Spring 2019
18
¢ Larger regions
¢ k = 180: Max degree = 2,699
Carnegie Mellon
Other graphs
Parquet
15-418/618 Spring 2019
19
¢ Larger regions
¢ k = 180: Max degree = 2,699
Carnegie Mellon
Carnegie Mellon
Initial States (Parquet Graph)
t=0
Right Corner (r) Demo 8
Diagonal (d) Demo 9
Uniform (u) Demo 10
15-418/618 Spring 2019
20
t=5
Graph Representation
N node, M edges
012 345 678
neighbor_start (length = N+1)
0 3 7 10 14 19 23 26 30 33
neighbor
01310242153046 413575248637
Includes self edges length = N+M
15-418/618 Spring 2019
21
7468857
Having pointer to end is useful (why?)
Carnegie Mellon
Carnegie Mellon
Sample Code
¢ From sim.c
¢ Compute reward value for node
/* Compute weight for node nid */
static inline double compute_weight(state_t *s, int nid)
{
int count = s->rat_count[nid];
double ilf = neighbor_ilf(s, nid);
return mweight((double) count/s->load_factor, ilf);
}
¢ Simulation state stored in state_t struct
¢ Reward function computed by mweight
15-418/618 Spring 2019
22
Carnegie Mellon
Sample Code
¢ From sim.c
¢ Compute sum of reward values for node
¢ Store for later reuse
15-418/618 Spring 2019
23
/* Compute sum of weights in region of nid */
static inline double compute_sum_weight(state_t *s, int nid)
{
graph_t *g = s->g;
double sum = 0.0;
int eid;
int eid_start = g->neighbor_start[nid];
int eid_end = g->neighbor_start[nid+1];
for (eid = eid_start; eid < eid_end; eid++) {
int nbrnid = g->neighbor[eid];
double w = compute_weight(s, nbrnid);
s->node_weight[nbrnid] = w;
sum += w;
}
return sum; }
Carnegie Mellon
Sample Code
¢ Compute next move for rat
15-418/618 Spring 2019
24
static inline int next_random_move(state_t *s, int r)
{
int nid = s->rat_position[r];
random_t *seedp = &s->rat_seed[r];
} }
double tsum
graph_t *g
double val
double psum
int eid;
int eid_start
int eid_end
for (eid = eid_start; eid < eid_end; eid++) {
= compute_sum_weight(s, nid);
= s->g;
= next_random_float(seedp, tsum);
= 0.0;
= g->neighbor_start[nid];
= g->neighbor_start[nid+1];
psum += s->node_weight[neighbor[eid]];
if (val < psum) {
return g->neighbor[eid];
}
Carnegie Mellon
Sequential Efficiency Considerations
¢ Consider move computation for rat at node with degree D
§ How many (on average) iterations of loop in next_random_move?
§ Is there a better way?
¢ Provided code uses many optimizations § Precompute weights at start of batch
§ Fast search
15-418/618 Spring 2019
25
Carnegie Mellon
Finding Parallelism
¢ Sequential constraints
§ Must complete time steps sequentially
§ Must complete each batch before starting next
§ ILF values and weights then need to be recomputed
¢ Sources of parallelism § Over nodes
§ Computing ILFs and reward functions § Over rats (within a batch)
§ Computing next moves § Updating node counts
15-418/618 Spring 2019
26
Carnegie Mellon
Performance Measurements
¢ Nanoseconds per move (NPM) § R rats running for S steps
§ Requires time T
§ NPM=109 *T/(R*S)
§ Reference solution:
§ 665 NPM for 1 thread § 84 NPM for 12 threads § 7.9 X speedup
15-418/618 Spring 2019
27
Carnegie Mellon
Performance Targets
¢ Benchmarks
§ 6 combinations of graph/initial state § Each counts 15 points
¢ Target performance
§ T = measured time
§ Tr = time for reference solution
§ Tr / T = How well you reach reference solution performance
§ Full credit when ≥ 0.9 § Partial when ≥ 0.5
15-418/618 Spring 2019
28
Carnegie Mellon
Machines
¢ Latedays cluster
§ 16 worker nodes + 1 head node
§ Each is 12-core Xeon processor (dual socket with 6 cores each) § You submit jobs to batch queue
§ Assigned single processor for entire run
§ Python script provided
¢ Code Development
§ OK to do code development and testing on other machines
§ But, they have different performance characteristics
§ Make sure to use 6 or 12 threads to ensure correct partitioning of
nodes across processors
15-418/618 Spring 2019
29
Carnegie Mellon
Instrumenting Your Code
¢ How do you know how much time each activity takes?
§ Create simple library using cycletimer code
§ Bracket steps in your code with library calls
§ Use macros so that you can disable code for maximum performance
15-418/618 Spring 2019
30
START_ACTIVITY(ACTIVITY_NEXT);
#pragma omp parallel for schedule(static)
for (ri = 0; ri < local_count; ri++) {
int rid = ri + local_start;
s->rat_position[rid] = fast_next_random_move(s, rid);
}
FINISH_ACTIVITY(ACTIVITY_NEXT);
Carnegie Mellon
Evaluating Your Instrumented Code
1 thread
12 threads
194 ms
2077 ms
4029 ms
11733 ms
651 ms
3 ms
1.0 %
11.1 %
21.6 %
62.8 %
3.5 % 0.0 %
startup
compute_weights
compute_sums
find_moves
set_ops
unknown
192 ms
426 ms
940 ms
3168 ms
1325 ms
2 ms
3.2 %
7.0 %
15.5 %
52.3 %
21.9 %
0.0 %
startup
compute_weights
compute_sums
find_moves
set_ops
unknown
¢ Can see which activities account for most time
¢ Can see which activities limit parallel speedup
15-418/618 Spring 2019
31
Carnegie Mellon
Some Logos
15-418/618 Spring 2019
32