High Performance Computing
Dr Ligang He
Guest lecture
Design of a Low-Level Interconnection Network
Dean Chester
• Principle and benefits of parallel I/O
• Implementation of parallel I/O (in MPI)
High Performance Parallel I/O
• Why are we looking at parallel I/O?
• I/O is a major bottleneck in some parallel applications:
  • Processing sensor data in earth science
  • Biological sequence analysis in computational biology
Parallel I/O – version 1.0
Assume 4 processes compute the elements of a matrix in parallel and the results need to be written to disk.
Early solution:
All processes send their data to process 0, which then writes them to a file.
Parallel I/O
Bad things about version 1.0:
1. Single node bottleneck
2. Single point of failure
3. Poor performance
4. Poor scalability
Good things about version 1.0:
1. The I/O system only needs to deal with I/O from one process
2. Does not need a specialized I/O library
3. Results in a single file, which is easy to manage
Parallel I/O – version 2.0
Each process writes to a separate file.
Good things about version 2.0:
1. Now we are doing things in parallel
2. High performance
Bad things about version 2.0:
1. We now have lots of small files to manage
2. How do we read the data back when the number of processes changes?
3. Does not interoperate well with other applications
Parallel I/O – version 3.0
Multiple processes of a parallel program access (read/write) data in a common file.
Good things about version 3.0:
• Simultaneous I/O from any number of processes
• Excellent performance and scalability
• Results in a single file, which is easy to manage and interoperates well with other applications
• Maps well onto collective operations
Bad things about version 3.0:
• Requires more complex I/O library support
• Traditionally, when one process is accessing a file, it locks the file and no other process can access it
• Needs support for simultaneous access by multiple processes
What is Parallel I/O?
Multiple processes of a parallel program accessing (reading or writing) different parts of a common file at the same time.
Parallel I/O has been an integral part of MPI since MPI-2.
I/O optimization: Data Sieving Reads
Data sieving is used to combine lots of small accesses into a single larger one,
• reducing the number of I/O operations.
I/O optimization: Data Sieving Writes
Using data sieving for writes is more complicated:
• Read the entire region first
• Then make the changes in memory
• Then write the block back
Requires locking in the file system
• Can result in false sharing
I/O optimization: Collective I/O
Problems with independent, noncontiguous access:
• Lots of small accesses (9 separate accesses in this example)
Collective operations:
• The underlying I/O layers know what data are being requested by each process
• The first phase reads the entire block
• The second "phase" moves data to their final destinations
I/O optimization: Collective I/O
Collective I/O is coordinated access to storage by a group of processes:
• Collective I/O functions must be called at the same time by all processes participating in the I/O
• Allows the I/O layers to know more, as a whole, about the data to be accessed
Parallel I/O example
• Consider a 16×16 array stored on disk in row-major order
• Each of 16 processes accesses a 4×4 subarray
Access pattern 1: MPI_File_seek
Updates the individual file pointer.

int MPI_File_seek(MPI_File mpi_fh, MPI_Offset offset, int whence);

mpi_fh : [in] file handle (handle)
offset : [in] file offset (integer)
whence : [in] update mode (state)

MPI_File_seek updates the individual file pointer according to offset and whence, where whence has the following possible values:
MPI_SEEK_SET: the pointer is set to offset
MPI_SEEK_CUR: the pointer is set to the current pointer position plus offset
MPI_SEEK_END: the pointer is set to the end of the file plus offset
Access pattern 1: MPI_File_read
Read using the individual file pointer.

int MPI_File_read(MPI_File mpi_fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status);

mpi_fh : [in] file handle (handle)
buf : [out] initial address of the buffer (where the data are to be put)
count : [in] number of elements in the buffer (nonnegative integer)
datatype : [in] datatype of each buffer element (handle)
status : [out] status object (Status)
Access pattern 1
• One independent read request is issued for each row of the local array

    MPI_File_open(..., "filename", ..., &fh);
    for (i = 0; i < n_local_rows; i++) {
        /* set offset */
        MPI_File_seek(fh, offset, ...);
        MPI_File_read(fh, row[i], 4, ...);
    }
    MPI_File_close(&fh);

64 independent I/O operations in total (16 processes × 4 rows)
• Individual file pointers per process per file handle
• Each process sets the file pointer to a suitable offset
• The data is then read into the local array
• This is not a collective operation
Access pattern 2: MPI_File_read_all
MPI_File_read_all is a collective version of MPI_File_read.

int MPI_File_read_all(MPI_File mpi_fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status);

mpi_fh : [in] file handle (handle)
buf : [out] initial address of buffer (choice)
count : [in] number of elements in buffer (nonnegative integer)
datatype : [in] datatype of each buffer element (handle)
status : [out] status object (Status)
Access pattern 2
• Similar to access pattern 1, but using collectives
• All processes that opened the file read the data together (each with its own access information)

    MPI_File_open(..., "filename", ..., &fh);
    for (i = 0; i < n_local_rows; i++) {
        /* set offset */
        MPI_File_seek(fh, offset, ...);
        MPI_File_read_all(fh, row[i], ...);
    }
    MPI_File_close(&fh);

16 I/O operations
• read_all is a collective version of the read operation
• This is a blocking read
• Each process accesses the file at the same time
• This may be useful, as independent I/O operations do not convey what other processes are doing at the same time
Access pattern 3: Definitions
• File view
  • The view is the set of data visible to a process in a file, defined by a displacement, an etype and a filetype
• Displacement
  • Defines the location where a view begins
  • Position relative to the beginning of the file
• etype (elementary datatype)
  • The unit of data access and positioning
  • Can be a predefined or derived datatype
  • Displacements are expressed as multiples of etypes
• Filetype
  • Defines a template/pattern in the file accessible by a process
• The view is a repeated pattern defined by the filetype (in units of etypes), beginning at the displacement
• A derived datatype can be constructed for the filetype
  • In this case, use MPI_Type_vector(count, blocklen, stride, oldtype, newtype)
Access pattern 3: complementary views of multiple processes
[Figure: the filetypes of processes 0, 1 and 2, each starting at its own displacement, tile the file without overlap]
• A group of processes uses complementary views to achieve a global data distribution
• Different processes can have different views
• This partitions the file among the parallel processes
MPI_File_set_view
Describes the part of the file accessed by an MPI process.

int MPI_File_set_view(MPI_File mpi_fh, MPI_Offset disp, MPI_Datatype etype, MPI_Datatype filetype, char *datarep, MPI_Info info);

mpi_fh : [in] file handle (handle)
disp : [in] displacement (nonnegative integer)
etype : [in] elementary datatype (handle)
filetype : [in] filetype (handle)
datarep : [in] data representation (string)
info : [in] info object (handle)
MPI_Type_create_subarray
Creates a datatype for a subarray of a regular, multidimensional array.

int MPI_Type_create_subarray(int ndims, int array_of_sizes[], int array_of_subsizes[], int array_of_starts[], int order, MPI_Datatype oldtype, MPI_Datatype *newtype);

Parameters
ndims : [in] number of array dimensions (positive integer)
array_of_sizes : [in] number of elements of type oldtype in each dimension of the full array (array of positive integers)
array_of_subsizes : [in] number of elements of type oldtype in each dimension of the subarray (array of positive integers)
array_of_starts : [in] starting coordinates of the subarray in each dimension (array of nonnegative integers)
order : [in] array storage order flag (state)
oldtype : [in] array element datatype (handle)
newtype : [out] new datatype (handle)
Example of using the Subarray Datatype

    gsizes[0] = 16;  /* no. of rows in global array */
    gsizes[1] = 16;  /* no. of columns in global array */
    psizes[0] = 4;   /* no. of procs. in vertical dimension */
    psizes[1] = 4;   /* no. of procs. in horizontal dimension */
    lsizes[0] = 16 / psizes[0];  /* no. of rows in local array */
    lsizes[1] = 16 / psizes[1];  /* no. of columns in local array */
    dims[0] = 4;
    dims[1] = 4;
    periods[0] = periods[1] = 1;

    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &comm);
    MPI_Comm_rank(comm, &rank);
    MPI_Cart_coords(comm, rank, 2, coords);

    /* global indices of the first element of the local array */
    start_indices[0] = coords[0] * lsizes[0];
    start_indices[1] = coords[1] * lsizes[1];

    MPI_Type_create_subarray(2, gsizes, lsizes, start_indices,
                             MPI_ORDER_C, MPI_FLOAT, &filetype);
    MPI_Type_commit(&filetype);
Cartesian Topology
Names the processes in a communicator using Cartesian coordinates.

int MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *periods, int reorder, MPI_Comm *comm_cart);
Access pattern 3
• Each process creates a derived datatype to describe the non-contiguous access pattern
• We thus have a file view and independent access
• Create a datatype describing a subarray of a multidimensional array, commit the datatype, and open the file as before
• Then change the process's view of the data in the file using set_view
• set_view is collective, although the reads are still independent

    MPI_Type_create_subarray(..., &subarray, ...);
    MPI_Type_commit(&subarray);
    MPI_File_open(..., "filename", ..., &fh);
    MPI_File_set_view(fh, disp, MPI_INT, subarray, ...);
    MPI_File_read(fh, local_array, 1, subarray, ...);
    MPI_File_close(&fh);

16 independent requests; each request contains 4 non-contiguous accesses.
Access pattern 4
• Each process creates a derived datatype to describe the non-contiguous access pattern
• We thus have a file view and collective access
• Create and commit the subarray datatype as before
• Change the process's view of the data in the file using set_view (set_view is collective)
• Reads are now collective: a single collective read

    MPI_Type_create_subarray(..., &subarray, ...);
    MPI_Type_commit(&subarray);
    MPI_File_open(..., "filename", ..., &fh);
    MPI_File_set_view(fh, ..., subarray, ...);
    MPI_File_read_all(fh, local_array, 1, subarray, ...);
    MPI_File_close(&fh);
Access patterns
• We discussed four different styles of parallel I/O
• You should choose your access pattern depending on the application:
  • Combine multiple small I/O requests into a bigger request
  • Collectives are going to do better than individual reads
  • Pattern 4 offers (potentially) the best performance