Parallel Programming Home Schedule Piazza
Project #4: Cellular Automata in MPI
Due date: Thursday November 19, 11:59:59 pm
The goal of this assignment is to learn to use MPI for lock-step time-series simulation with geometric
decomposition. This is a common paradigm for MPI programs in the sciences.
We will explore these issues in the same simulation of cellular automata that we used for Dask. Cellular automata are of wide interest. Wolfram puts forth that simple automata exhibit the complexity that underlies natural phenomena. This is tenet of the eld of digital physics (http://en.wikipedia.org/wiki/Cellular_automata). To keep it simple, we will focus on Conway¡¯s game of life. One of the earliest and best studied 2-d cellular automata.
Game of Life
The following resources may be helpful in understanding the game of life.
http://psoup.math.wisc.edu/mcell/whatis_life.html gives a simple and intuitive description of the rules of Life and how to evaluate them on a grid. In fact, I used this example to debug my program, making sure that I implemented the rules correctly. http://www.shodor.org/cserd/Resources/Tutorials/MPIExamples/Life.php provides an MPI implementation of Life. I found this to be not very helpful at all, but you are welcome to use it as a reference. Were you to turn this in, you would get no points.
http://www.bitstorm.org/gameo ife/ provides another description of the game of life. The applet there includes many common life patterns that you may use to evaluate your code. You will actually turn in some code that implements the glider pattern.
An Implementation
Submit an MPI program which is a completion of the gameo ife.c code stub. We will use a Makefile to compile your code, much like this:
CC=mpicc
all: gol gol:
clean:
rm -f gameoflife
$(CC) gameoflife.c -o gameoflife
Do not alter the code that prints the nal output. You should print after every iteration, following the format in the code stub, or our grade scripts will not be able to parse your output.
The program should implement the following:
A 16¡Á16 square grid of life that wraps around both the x and y axis, i.e squares x,0 and x,15 are adjacent as are squares 0,y and 15,y.
The glider pattern from http://www.bitstorm.org/gameo ife/ should start in the upper left corner. In the end, your simulation should run for 64 iterations so that the glider returns to its original location.
Initially, test your code using fewer iterations by specifying a lower number on the cmd line (code stub has cmd line parameter to specify # of iterations)
Once it works for lower iterations, try your code using the full 64 iterations
On every iteration, you should output the con guration of the board to standard out from process 0. This means that you need to communicate the grid updates from processes 1..n-1 to process 0. In practice, one would only collect the global data structure at the end of the simulation or every kth time step.
A data decomposition that supports 1,2,4, and 8 processors based on slicing the array horizontally. E.g., for 4 processors, P0 gets rows 0-3, P1 gets 4-7, P2 gets 8-11, P3 gets 12-15.
Implement a safe, deadlock-free messaging discipline using MPI point-to-point send and receive.
Using MPI
You can install the MPI libraries on personal computer, you can spin up an instance on AWS, or you can develop in the terminal in Jupyter in the Gigantum project gigantum.com/jhupp/project3-spring20 which already has MPI installed,
Questions
Submit answers to the following questions in a typeset PDF document with separate pages for each subquestion.
1. Compare the horizontal data decomposition that you implemented with a 2-d geometric decomposition (as in the Dask workbook). This decomposition divides the grid recursively into smaller squares, e.g. an n x n grid may consist of 4 (n/2) x (n/2) squares or 16 (n/4) x (n/4) squares or 64 (n/8) x (n/8) squares. Give exact counts as a function of n (the size of the input) and p (the number of processes).
1. What is the total number of messages sent in each decomposition?
2. What is the total amount of data sent by a node in each decomposition? Express your answer in
number of cells/words transmitted (not bytes).
2. The answers to question 1 de ne a tradeo in which you would choose di erent messaging disciplines:
1. When messages are small and latency dominates messaging performance which decomposition would you prefer and why?
2. When messages are large and throughput dominates messaging performance which decomposition would you prefer and why?
3. For the 2-d decomposition, design a deadlock-free messaging discipline for transmitting the top, bottom, left, and right sides and the top left, top right, bottom left and bottom right corners among neighboring partitions.
Draw and submit pictures of the messages sent and received by a single node. Each drawing will describe one of multiple rounds, akin to the gure in Lecture 16. In each round a node may either send or receive from multiple neighbors.
Note: We understand that not all nodes conduct the same protocol. Draw pictures from the perspective of any node.
Submission
Submit your code le a PDF of your answers to GradeScope. There will be an autograder to verify code. Autograder instructions will be posted by Tuesday November 10.