This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
The OSU College of Engineering DGX System for Advanced GPU Computing
Computer Graphics
dgx_system.pptx
mjb – March 10, 2022
OSU’s College of Engineering has six Nvidia DGX-2 systems
Each DGX server:
• Has 16 NVidia Tesla V100 GPUs
• Has 28TB of disk, all SSD
• Has two 24-core Intel Xeon 8168 Platinum 2.7GHz CPUs
• Has 1.5TB of DDR4-2666 System Memory
• Runs the CentOS 7 Linux operating system
Overall compute power:
• Each NVIDIA Tesla V100 card has 5,120 CUDA cores and 640 Tensor cores
• This gives each 16-V100 DGX server a total of 81,920 CUDA cores and 10,240 Tensor cores
• This gives the entire 6-DGX package a total of 491,520 CUDA cores and 61,440 Tensor cores
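The totals in this list follow directly from the per-card figures. A minimal sketch of the arithmetic, using only the numbers quoted above (the variable names are illustrative):

```shell
# Per-card figures quoted in the list for the Tesla V100.
cuda_per_gpu=5120
tensor_per_gpu=640
gpus_per_server=16
servers=6

# Totals for one DGX server and for all six servers.
cuda_per_server=$(( cuda_per_gpu * gpus_per_server ))
tensor_per_server=$(( tensor_per_gpu * gpus_per_server ))
cuda_total=$(( cuda_per_server * servers ))
tensor_total=$(( tensor_per_server * servers ))

echo "per server:  $cuda_per_server CUDA cores, $tensor_per_server Tensor cores"
echo "all servers: $cuda_total CUDA cores, $tensor_total Tensor cores"
```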
Performance Comparison with one of our previous Systems
BTW, you can also use the rabbit machine:
ssh rabbit.engr.oregonstate.edu
It is a good place to write your code and get it to compile. It is not a good place to do the final run of your code.
How to SSH to the DGX Systems
flip3 151% ssh submit-c.hpc.engr.oregonstate.edu
submit-c 142% module load slurm

First, ssh over to a DGX submission machine (submit-a and submit-b will also work). Then type the module load command right away to set your path correctly.
How to Check on the DGX Systems
Check on the queues:
submit-c 143% squeue
JOBID PARTITION NAME     USER     ST TIME       NODES NODELIST(REASON)
3923  mime4     c_only   jayasurw R  1-10:32:19 1     compute-e-1
3963  mime4     2Dex     jayasurw R  16:21:03   1     compute-e-2
3876  share     CH3COOH_ chukwuk  R  1-23:36:45 1     compute-2-6
3971  nerhp     tcsh     dionnec  R  8:59:45    1     compute-h-8
3881  dgx2      bash     heli     R  1-22:50:44 1     compute-dgx2-1
3965  dgx2      bash     chenju3  R  13:47:36   1     compute-dgx2-4
3645  dgx2      bash     mishrash R  5-16:48:09 1     compute-dgx2-5
3585  dgx2      bash     azieren  R  6-17:34:00 1     compute-dgx2-3
3583  dgx2      bash     azieren  R  6-18:26:44 1     compute-dgx2-3

System information, including your partitions:
submit-c 144% sinfo
PARTITION AVAIL TIMELIMIT  NODES STATE NODELIST
share*    up    7-00:00:00 2     drain compute-4-[3-4]
share*    up    7-00:00:00 1     mix   compute-2-6
sharegpu  up    7-00:00:00 1     mix   compute-dgxs-1
sharegpu  up    7-00:00:00 3     idle  compute-dgxs-[2-3],compute-gpu
dgx2      up    7-00:00:00 1     drain compute-dgx2-2
dgx2      up    7-00:00:00 5     mix   compute-dgx2-[1,3-6]
gpu       up    7-00:00:00 2     mix   compute-gpu[3-4]
gpu       up    7-00:00:00 1     idle  compute-gpu2
gpu       up    7-00:00:00 1     down  compute-gpu1
dgx       up    7-00:00:00 3     mix   compute-dgx2-[4-6]
dgxs      up    7-00:00:00 1     mix   compute-dgxs-1
dgxs      up    7-00:00:00 2     idle  compute-dgxs-[2-3]
class     up    1:00:00    1     mix   compute-dgxs-1
class     up    1:00:00    2     idle  compute-dgxs-[2-3]
eecs      up    7-00:00:00 1     mix   compute-2-6
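When the queue listing gets long, it can help to summarize it by partition. A small sketch, assuming the JOBID and PARTITION values from the squeue listing above; on the cluster, `squeue -h -o "%i %P"` should print the same two columns without the header line.

```shell
# JOBID and PARTITION columns copied from the squeue listing on the slide.
sample='3923 mime4
3963 mime4
3876 share
3971 nerhp
3881 dgx2
3965 dgx2
3645 dgx2
3585 dgx2
3583 dgx2'

# Count how many of the listed jobs are running in the dgx2 partition.
dgx2_jobs=$(printf '%s\n' "$sample" | awk '$2 == "dgx2" { n++ } END { print n + 0 }')
echo "jobs in dgx2: $dgx2_jobs"
```

To look at only your own jobs, `squeue -u $USER` restricts the listing to one user.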
Submitting a CUDA job to the DGX Systems using Slurm
Create a bash shell file that looks like this

submit.bash:
#!/bin/bash
# -J: your job name
#SBATCH -J MatrixMult
# -A: our class account
#SBATCH -A cs475-575
# -p: this is the partition name that we use for classes
#SBATCH -p class
# Note the double dash on the word-length flags
#SBATCH --gres=gpu:1
#SBATCH -o matrixmul.out
#SBATCH -e matrixmul.err
#SBATCH --mail-type=BEGIN,END,FAIL
# These 2 lines are actual bash code
/usr/local/apps/cuda/cuda-10.1/bin/nvcc -o matrixMul matrixMul.cu
./matrixMul

Submit the job described in your shell file:
submit-c 143% sbatch submit.bash
Submitted batch job 474

Check the output (I like sending my output to standard error, not standard output):
submit-c 144% cat matrixmul.err

Note: A single dash (-) is used for a single character flag.
A double dash (--) is used for a word (more than a single character) flag.
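Each single-character flag used in the script has a word-length equivalent that takes a double dash. The sketch below writes the same header with the long forms; `--job-name`, `--account`, `--partition`, `--output`, and `--error` are the standard sbatch spellings, and the scratch file exists only for illustration.

```shell
# Write a long-option version of the header to a scratch file (illustrative only).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=MatrixMult
#SBATCH --account=cs475-575
#SBATCH --partition=class
#SBATCH --gres=gpu:1
#SBATCH --output=matrixmul.out
#SBATCH --error=matrixmul.err
#SBATCH --mail-type=BEGIN,END,FAIL
EOF

# Every directive in this version starts with a double dash.
long_flags=$(grep -c '^#SBATCH --' "$script")
echo "long-form directives: $long_flags"
rm -f "$script"
```

Either spelling should behave the same; the short flags just keep the header compact.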
Auto-Notifications via Email
You don’t have to do this, but if you do,
please be sure you get your own email address right!
Our IT people are getting real tired of fielding the bounced emails when people spell their own email address wrong.
What Showed up in my Email (which I spelled correctly)
Submitting a Loop
submitloop.bash:
#!/bin/bash
#SBATCH -J MatrixMul
#SBATCH -A cs475-575
#SBATCH -p class
#SBATCH --gres=gpu:1
#SBATCH -o matrixmul.out
#SBATCH -e matrixmul.err
#SBATCH --mail-type=BEGIN,END,FAIL
# These 5 lines are actual bash code
for t in 1 2 4 8 16 32
do
    /usr/local/apps/cuda/cuda-10.1/bin/nvcc -DNUMT=$t -o matrixMul matrixMul.cu
    ./matrixMul
done

submit-c 153% sbatch submitloop.bash
Submitted batch job 475
submit-c 154% tail -f matrixmul.err

tail -f displays the latest output added to matrixmul.err. It keeps doing it forever. Control-c to get out of it.
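Before submitting, it can be worth checking what the loop will actually run. This is a dry-run sketch that only echoes the commands rather than calling nvcc; the `NVCC`, `runs`, and `last_cmd` variables are illustrative. The `-DNUMT=$t` option defines the preprocessor macro `NUMT` at compile time, which matrixMul.cu presumably reads.

```shell
# Dry run: print the commands the loop would execute, without calling nvcc.
NVCC=/usr/local/apps/cuda/cuda-10.1/bin/nvcc
runs=0
for t in 1 2 4 8 16 32
do
    last_cmd="$NVCC -DNUMT=$t -o matrixMul matrixMul.cu"
    echo "$last_cmd"
    echo "./matrixMul"
    runs=$(( runs + 1 ))
done
echo "compile-and-run passes: $runs"
```

Because the program is recompiled on every pass, each run sees a different value of `NUMT` without any change to the source file.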
Results for Multiplying two 1024×1024 Matrices: Varying the CUDA Block Size
[Plot: GigaFlops during Matrix Multiplication (vertical axis, 100.0 to 700.0) versus CUDA block size (horizontal axis, 0 to 35)]
(Each CUDA block was actually NUMTxNUMT threads)
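Since each block is NUMT by NUMT threads, the thread count per block grows with the square of NUMT. A short sketch of that arithmetic for the values used in the loop (the variable names are illustrative):

```shell
# Threads per CUDA block for each NUMT value in the submit loop.
for numt in 1 2 4 8 16 32
do
    threads=$(( numt * numt ))
    echo "NUMT=$numt -> $threads threads per block"
done
```

The largest case, NUMT=32, gives 1024 threads per block, which matches the maximum work group size of 1024 that printinfo reports for the card.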
Use slurm’s scancel if your Job Needs to Be Killed
submit-c 163% sbatch submitloop.bash
Submitted batch job 476
submit-c 164% scancel 476
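The number that scancel needs is the last word of sbatch's reply. A minimal sketch of pulling it out in a script, using the reply text from the slide; on the cluster you would capture it with something like `reply=$(sbatch submitloop.bash)`, and the variable names here are illustrative.

```shell
# Text copied from the sbatch reply shown on the slide.
reply='Submitted batch job 476'

# Strip everything up to the last space, leaving only the job ID.
jobid=${reply##* }
echo "scancel $jobid"
```

sbatch also has a `--parsable` option that prints the job ID without the surrounding words, which may be simpler if you script this often.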
Submitting an OpenCL job to the DGX System using Slurm
submit.bash:
#!/bin/bash
#SBATCH -J MatrixMult
#SBATCH -A cs475-575
#SBATCH -p class
#SBATCH --gres=gpu:1
#SBATCH -o printinfo.out
#SBATCH -e printinfo.err
#SBATCH --mail-type=BEGIN,END,FAIL
g++ -o printinfo printinfo.cpp /usr/local/apps/cuda/cuda-10.1/lib64/libOpenCL.so.1.1 -lm -fopenmp
./printinfo
Here’s what printinfo got on one graphics card on the DGX System
Number of Platforms = 1
Platform #0:
    Name = 'NVIDIA CUDA'
    Vendor = 'NVIDIA Corporation'
    Version = 'OpenCL 1.2 CUDA 11.2.153'
    Profile = 'FULL_PROFILE'
    Number of Devices = 1
    Device #0:
        Type = 0x0004 = CL_DEVICE_TYPE_GPU
        Device Vendor ID = 0x10de (NVIDIA)
        Device Maximum Compute Units = 80
        Device Maximum Work Item Dimensions = 3
        Device Maximum Work Item Sizes = 1024 x 1024 x 64
        Device Maximum Work Group Size = 1024
        Device Maximum Clock Frequency = 1530 MHz
Device Extensions:
    cl_khr_global_int32_base_atomics
    cl_khr_global_int32_extended_atomics
    cl_khr_local_int32_base_atomics
    cl_khr_local_int32_extended_atomics
    cl_khr_fp64
    cl_khr_byte_addressable_store
    cl_khr_icd
    cl_khr_gl_sharing
    cl_nv_compiler_options
    cl_nv_device_attribute_query
    cl_nv_pragma_unroll
    cl_nv_copy_opts
    cl_nv_create_buffer
For reference, rabbit’s graphics card has 15 Compute Units
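The compute-unit count can be tied back to the hardware list at the start of these notes. This sketch assumes the 5,120 CUDA cores of a V100 are spread evenly over its 80 compute units; the variable names are illustrative, and the comparison with rabbit counts compute units only, since nothing here says how many cores each of rabbit's units has.

```shell
cuda_cores=5120     # CUDA cores per V100, from the hardware list
compute_units=80    # Device Maximum Compute Units, from printinfo
rabbit_units=15     # rabbit's compute units, from the note above

# Cores per compute unit, assuming an even split.
cores_per_unit=$(( cuda_cores / compute_units ))

# Ratio scaled by 10 so one decimal place survives integer arithmetic.
ratio_x10=$(( compute_units * 10 / rabbit_units ))

echo "CUDA cores per compute unit: $cores_per_unit"
echo "compute units vs rabbit: $(( ratio_x10 / 10 )).$(( ratio_x10 % 10 ))x"
```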