CS计算机代考程序代写 compiler Java algorithm Hive 



KIT308 Multicore Architecture and Programming

Assignment 1 Multithreading

Aims of the assignment

The purpose of this assignment is to give you experience at writing a multithreaded program for the CPU.
This assignment will give you an opportunity to demonstrate your understanding of:

creating and handling multiple threads;
partitioning work;
dynamic scheduling; and
data-sharing between threads.

Due Date

Late assignments will only be accepted in exceptional circumstances and provided that the proper procedures have been followed
(see the School Office or this link for details)
assignments which are submitted late without good reason will be subject to mark
penalties if they are accepted at all
(see the School office or this link for details on this as well).

Forms to request extensions of time to submit assignments are available from the School of Engineering and ICT
office. Requests must be accompanied by suitable documentation and should be submitted before the assignment due date.

Due to the tight schedule of this unit (and the reliance of future assignments on a solution to this one), extensions will be unable to
be granted and late assignments will not be accepted. If exception circumstances occur for an individual student they must contact the lecture-in-charge
before the assignment due date to arrange alternative methods of assessment in this unit.

Assignment Submission

Your assignment is to be submitted electronically via MyLO and should contain:

An assignment
cover sheet;

A .zip (or .rar) containing a Visual Studio Solution containing a project for each attempted stage of the assignment (in the format
provided in the downloadable materials provided below).

A document containing:
A table of timing information comparing the original single-threaded times against each
of three stages on each scene file.
An analysis of the above timing data.

You do not need to (and shouldn’t) submit executables, temporary object files, or images. In particular, you must delete
the “.vs” diretory before submission as it just Visual Studio temporary files and 100s of MBs. Do not however delete the “Scenes” folder or the
“Outputs” folder (but do delete the images within this one).

Marking

This assignment will be marked out of 100. The following is the breakdown of marks:

Task/Topic
Marks

1. Basic Multithreaded CPU Implementation 30%

Correct (5%) and elegant (5%) thread management

(i.e. the correct image for any number of threads (even >64 threads) 10%

Correct use of “-thread” command-line argument 5%

Successful use of “-testMode” and “-colourise” command-line arguments 5%

Correct partitioning of render for images with height equally divisible by thread count

(i.e. the correct image, with exactly equal work area for each thread, and with the correct number of samples rendered for: Tests 1, 2, 4, 5, and 7 with 8 threads) 5%

Correct partitioning of render for images with height NOT equally divisible by thread count

(i.e. the correct image, with work area allocated “evenly” across threads, and with the correct number of samples rendered for: Tests 3 and 6 with 8 threads, and also Test 1 with 3, 5, 6, 7, and 513 threads) 5%

2. Dynamically Balanced Line-Based Multithreaded CPU Implementation 20%

Correct (5%) and elegant (5%) thread management

10%

Correct (5%) and elegant (5%) data-sharing between threads

(i.e. no possibility of threads “fighting” over shared memory locations) 10%

3. Dynamically Balanced Square-Based Multithreaded CPU Implementation 30%

Correct and elegant thread management 5%

Correct and elegant data-sharing between threads 5%

Correct use of “-blockSize” command-line argument 5%

Correct partitioning of render into squares for images with size that is equally divisible by block size

(i.e. the correct image, with exactly equal work area for each block, rendered for: Tests 1, 2, 5, and 7 with block size 8 and 64) 5%

Correct partitioning of render into squares for images with size that is NOT equally divisible by block size

(i.e. the correct image, with work area for each block either at block size or smaller at edges, rendered for: Tests 3, 4, and 6 with block size 8 and 64) 5%

Correct partitioning of render with no extra samples generated

(i.e. the correct image, with work area for each block either at block size or smaller at edges, and with the correct number of samples rendered for: Tests 3, 4, and 6 with block size 8 and 64) 5%

Documentation 20%

Outputs showing timing information for each stage on all applicable scene files with all thread counts

(to get full marks for this part your code needs to pass all tests for all stages above) 10%

Analysis of data with respect to CPU used

(to get full marks for this part your code needs to pass all tests for all stages above) 10%

Penalties

Failure to comply with submission instructions (eg. no cover sheet, incorrect submission of files, submission of unnecessary temporary or image files,
abnormal solution/project structure, etc.) -10%

Poor programming style (eg. insufficient / poor comments, poor variable names, poor
indenting, obfuscated code without documentation, compiler warnings, etc.) up to -20%

Lateness (-20% for up to 24 hours, -50% for up to 7 days, -100% after 7 days)

Late assignments will not be accepted
up to -100%

 

Programming Style

This assignment is not focussed on programming style, but you should endeavour to follow good programming
practices. You should, for example:

comment your code;
use sensible variables names;
use correct and consistent indenting; and
internally document (with comments) any notable design decisions.

[NOTE: any examples in the provided assignment materials that don’t live up to the above criteria, should be
considered to be deliberate examples of what not to do and are provided to aid your learning ;P]

The Assignment Task

You are to implement a “simple” raytracer in a multithreaded fashion (for a quick definition of raytracing
see the wikipedia page).

From the provided single-threaded raytracer implementation, you will create multiple subsequent versions as follows:

A basic multithreaded CPU implementation.
A dynamically balanced line-based multithreaded CPU implementation.
A dynamically balanced square-based multithreaded CPU implementation.

Implementation

1. Basic Multithreaded CPU Implementation

This stage involves splitting up the render across multiple threads by splitting up the render into equal sized chunks
(or almost equal sized chunks when the number of threads doesn’t divide evenly into the image height) with each chunk being
rendered entirely by an individual thread.

In order to complete this step you will need to:

Write code to partition the rendering job into chunks (so that each thread generates a different part
of the final image).
Write code for the to create and manage multiple threads.

At the end of this stage the
program should be able to handle any image size (any dimensions that are multiples of four,
at least, up to the maximum of 2048×2048).

To be eligible for full marks, your assignment should also use the command-line option “-threads” to set the
number of threads to any value that is less than or equal to the height of the image being rendered and
also respect the “-colourise” option to tint each section of the image with a set of colours (colour re-use is expected for
a large number of threads).

2. Dynamically Balanced Line-Based Multithreaded CPU Implementation

This stage involves splitting up the render into lines which are dynamically allocated to threads from the thread pool
as they request more work.

In order to complete this step you will need to:

Start as many threads as required (specified from the command-line arguments) for the thread pool.
Manage some shared data between the threads so that each do the correct work.

3. Dynamically Balanced Square-Based Multithreaded CPU Implementation

This stage involves splitting up the render into squares which are dynamically allocated to thread from the thread pool
as they request more work.

In order to complete this step you will need to:

Partition the render into squares (requires more carefulness when writing results to the image).

NOTE: this implementation is (slighty) harder than Stage 2, but might not result in an increase in performance.

Hints / Tips

The techniques required to complete each stage rely heavily on work done in the tutorials 2 and 3 —
refer to them often.

Documentation

For each stage of the assignment you attempt you should provide:

average (of 3 runs) timing information for each scene file for the total time taken for a render for particular thread counts. See the next section for an example format for this timing table that specifies which tests are required for which scenes; and
an explanation of the results (e.g. why there’s no difference between the performance of stages 1, 2, and 3 (NOTE: this is a made up example and isn’t necessarily what to expect), or why a particular type of implementation works well (or poorly) on a particular scene, etc.). This explanation should be with respect to the CPU on the system on which you ran the tests, and you should discuss how the architectural features of the CPU explain the results.

Tests / Timing

The following table lists all the tests that your code needs to generate correctly at each stage. It also shows the timing tests that need to be performed in order to fully
complete the documentation section of the assignment.

In order to confirm your images match the images created by the base version of the assignment code, it’s strongly recommended you use a image comparison tool. For part of
the marking for this, Image Magick will be used:

Image Magick: download for Windows.
e.g. running this from the command-line to test:

    magick.exe compare -metric mae Outputs\cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp Outputs_REFERENCE\cornell.txt_1024x1024x4_RayTracerAss1.exe.bmp diff.bmp

This produces an image (“diff.bmp”) showing the differences between the two source images (with differences shown as red pixels), and also a numeric summary of the difference, here these images are 0.13266% different):

    0.33827 (0.0013266)

(this example deliberately compares the 1×1 sampled to the 4×4 sampled image to show there is a difference).

Note: to fully complete the empty cells in the table below, around 213 total images will need be calculated (each test needs 3 runs, and the
first six tests need to be run with the base code, and with 8 threads for each stage of the assignment, and test 7 needs
to be run for the base code, 1–8 threads, and 16 and 32 threads for stage 1, stage 2, and stage 3 with two different block sizes), so plan your time accordingly.

Timing Test Average Time of 3 Runs (Milliseconds)
Single Threaded Multithreaded Technique / (Stage) Threads
1 2 3 4 5 6 7 8 16 32
1. -input Scenes/cornell.txt -size 1024 1024 -samples 1 Static Chunks (Stage 1) X X X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 8 X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 64 X X X X X X X X X
2. -input Scenes/cornell.txt -size 1024 1024 -samples 4 Static Chunks (Stage 1) X X X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 8 X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 64 X X X X X X X X X
3. -input Scenes/cornell.txt -size 500 300 -samples 1 Static Chunks (Stage 1) X X X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 8 X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 64 X X X X X X X X X
4. -input Scenes/allmaterials.txt -size 1000 1000 -samples 1 Static Chunks (Stage 1) X X X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 8 X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 64 X X X X X X X X X
5. -input Scenes/cornell-199lights.txt -size 256 256 -samples 1 Static Chunks (Stage 1) X X X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 8 X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 64 X X X X X X X X X
6. -input Scenes/5000spheres.txt -size 480 270 -samples 1 Static Chunks (Stage 1) X X X X X X X X X
Dynamic Lines (Stage 2) X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 8 X X X X X X X X X
Dynamic Squares (Stage 3) — blocksize 64 X X X X X X X X X
7. -input Scenes/donuts.txt -size 256 256 -samples 1 Static Chunks (Stage 1)
Dynamic Lines (Stage 2)
Dynamic Squares (Stage 3) — blocksize 8
Dynamic Squares (Stage 3) — blocksize 64

 

 

The following tests will be run on your code to test the -colourise and -testMode options using the cornell.txt scene file (NOTE: these will NOT be compared using ImageMagick as colourisation
can be done in a variety of acceptable ways, and testMode will potentially produce a different pattern for each execution):

Test
Image Result
1.
Stage1.exe -threads 8 -input -size 1024 1024 -samples 1 -colourise

2.
Stage1.exe -threads 8 -input -size 1024 1024 -samples 1 -testMode

3.
Stage1.exe -threads 513 -input -size 1024 1024 -samples 1 -testMode

4.
Stage2.exe -threads 8 -input -size 1024 1024 -samples 1 -testMode

5.
Stage3.exe -threads 8 -input -size 500 300 -samples 1 -testMode -blockSize 8

6.
Stage3.exe -threads 8 -input -size 500 300 -samples 1 -testMode -blockSize 64

 

The following tests will be run on your code for the cornell.txt scene file for stage 1:

Test
Image Result
1.

-size 1024 1024 -samples 1 -threads 3

-size 1024 1024 -samples 1 -threads 5

-size 1024 1024 -samples 1 -threads 6

-size 1024 1024 -samples 1 -threads 7

-size 1024 1024 -samples 1 -threads 513

 

Provided Materials

The materials provided with this assignment contain:

the source code of the base single-threaded version of the raytracer;
a set of scene files to be supplied to the program;
a set of reference output files created from the single-threaded version of the program;
four batch files that will run all 7 timing tests for the base code, and each of the three stages; and
a further set of five batch files that will run comparison tests (assuming Image Magick is installed in its default location) for each of the three stages.

Download the materials as an ZIP file.

Source Code

The provided code consists of 19 source files.

Raytracing logic:
Raytrace.cpp: this file contains the main function which reads
the supplied scene file, begins the raytracing, and writes the output BMP file. The main
render loop, ray trace function, and handling of reflection and refraction is also in this
file.
Intersection.h and Intersection.cpp: these files define a datastructure
for capturing relevant information at the point of intersection between a ray and a scene
object and functions for testing for individual ray-object collisions and ray-scene collisions.
Lighting.h and Lighting.cpp: these files provide functions to apply a lighting
calculation at a single intersection point.
Texturing.h and Texturing.cpp: these files provide functions for the reading points from
3D procedural textures.
Constants.h: this header provide constant definitions used in the raytracing.

Basic types:
Primitives.h: this header contains definitions for points, vector, and rays. It also
provides functions and overloaded operators for performing calculations with vectors and points.
SceneObjects.h: this header file provides definitions
for scene objects (ie. materials, lights, spheres, planes, and cylinders).
Colour.h: this header defines a datastructure for representing colours (with each colour
component represented as a float) and simple operations on colours, including
conversions to/from the standard BGR pixel format.

Scene definition and I/O:
Scene.h and Scene.cpp: the header file contains the datastructure to represent
a scene and a single function that initialises this datastructure from a file. The scene
datastructure itself consists of properties of the scene and lists of the various scene
objects as described above. The implementation
file contains many functions to aide in the scene loading process. Scene loading relies upon
the functionality provided by the Config class.
Config.h and Config.cpp: this class provide facilities for parsing the scene file.
SimpleString.h: this is helper string class used by the Config class.

Image I/O:
ImageIO.h and ImageIO.cpp: these files contain the definitions of functions
to read and write BMP files.

Miscellaneous:
Timer.h: this class provides a simple timer that makes use of different system
functions depending on whether TARGET_WINDOWS, TARGET_PPU, or
TARGET_SPU is defined (we don’t use the latter two, but I left this file unchanged
in case anyone wanted to see how such cross-platform stuff can be handled).

Executing

The program has the following functionality:

By default it will attempt to load the scene “Scenes/cornell.txt” and render it at 1024×1024 with 1×1 samples.

By default it will output a file named “Outputs/[scenefile-name]_[width]x[height]x[sample-level]_[executable-filename].bmp” (e.g. with all the default options, “Outputs/cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp”)

It takes command line arguments that allow the user to specify the width and height, the anti-aliasing level (must be a power of two),
the name of the source scene file, the name of the destination BMP file, and the number of times to perform the render (to improve
the timing information).
Additionally it accepts some arguments (that are currently unused) for setting the number of threads, whether
each thread will tint the area that it renders, and the size of the block to render (only applicable for stage 3).
Further it accepts an argument “-testMode” that will fill the output with white. When executing with multiple threads
this should be a greyscale colour that represents the thread number.
It loads the specified scene.
It renders the scene (as many times as requested).
It produces timing information for the average time taken to produce a render
ignoring all file IO, and the number of samples generated per run.
It outputs the rendered scene as a BMP file.

For example, running the program at the command line with no arguments would perform the first test (as described in the scene file section):

On execution this would produce output similar to the following (as well as writing the resultant BMP file
to Outputs/cornell.txt_1024x1024x1_RayTracerAss1.exe.bmp):

rendered 1048576 samples, average time taken (1 run(s)): 3672ms

Testing Batch Files

A number of batch files are provided that are intended to be executed from the command line (or run from Windows Explorer), e.g.

For timing: baseTiming.bat will perform all the timing tests required for the base (single-threaded) code (i.e. 3 runs with 8 threads one each Test scene).

stage1Timing.bat will perform all the timing tests required for Stage 1 (i.e. 3 runs with 8 threads one each Test scene).

stage2Timing.bat will perform all the timing tests required for Stage 2 (i.e. 3 runs with 8 threads one each Test scene).

stage3Timing.bat will perform all the timing tests required for Stage 3 at a particular block size (i.e. 3 runs with 8 threads and block size 8 on each Test scene).

For testing (requires Image Magick installation): stage1Tests1.bat will perform all the comparisons required for Stage 1 Tests where the thread count evenly divides into the image height.

stage1Tests2.bat will perform all the comparisons required for Stage 1 Tests where the thread count doesn’t evenly divide into the image height.

stage2Tests1.bat will perform all the comparisons required for Stage 2 Tests.

stage3Tests1.bat will perform all the comparisons required for Stage 3 Tests where the thread count evenly divides into the image height.

stage3Tests2.bat will perform all the comparisons required for Stage 3 Tests where the thread count doesn’t evenly divide into the image height.

(baseTests.bat will compare the base single-threaded code running on your machine against the reference images — this is just to confirm nothing strange is going on.)

 

Ray Tracing

The materials provided with this assignment include an implementation of a simple raytracer based on the
codermind raytracing tutorial. This tutorial also provides a pretty good
introduction to the concepts of ray tracing.

Apart from a general structure rewrite and simplification of the code, the supplied implementation has the
following similarities and differences to the codermind tutorial:

it retains support for:
spheres;
diffuse (Lambert) and specular (Blinn-Phong) lighting;
shadows;
super-sampled anti-aliasing;
simple exposure;
a conic perspective camera model; and
reflection and refraction.

it is extended, in that is provides additional support for:
(infinite) planes;
cylinders;
(simple) procedural 3D textures;
setting the camera’s position and (xz) rotation;
reading and writing BMP files.

it is simplified, in that it doesn’t provide support for:
blobs;
gamma calculations;
perlin noise based procedural textures;
bump mapping;
auto exposure estimation;
bitmap texturing;
cubemap texturing;
an environment cubemap;
depth of field; and
support for reflection and refraction in the same material.

For those of you who want more information (as it’s not entirely necessary to understand these concepts to be
able to complete the assignment) on some of the general and/or mathematical aspects of raytracing, see:

General explanations:
Codermind Tutorial
(archived mirror):
A step-by-step guide and the original source of the codebase of the assignment materials.
Project Amiga Juggler:
A step-by-step guide to the maths and implementation of a raytracer
(handling only spheres and a ground plane) in Java:
Step 1: vectors, rays, dot-product, and cross-product.
Step 5: camera model.
Step 8: ray-sphere intersections and Phong shading.
Step 9: ray-plane intersections.
Step 13: reflection.

Wikipedia’s ray tracing page:
a basic outline of the concepts.
Princeton Ray Casting Lectures:
concepts, maths, and some pseudo code.
Princeton Illumination lectures:
concepts, maths, and some pseudo code.
Softsurfer Algorithms:
geometry algorithms.

Intersections:
Ray-sphere intersections:
wikipedia,
codermind (part 1),
Project Amiga Juggler (step 8),
Princeton Ray Casting Lectures (Slide 11–14).

Ray-plane intersections:
wikipedia,
Princeton Ray Casting Lectures (Slide 17),
Softsurfer.com Algorithms.

Ray-cylinder intersections:
NYU Ray Casting Lectures,
Math Stackexchange,
Interactive Computer Graphics Lectures.

Ray-AABB (axis-aligned bounding box) intersections:
medium.

Ray-triangle intersections:
wikipedia,
source paper,
lighthouse3d,
one more explanation.

Ray-something intersections:
Real-time Rendering.

Lighting:
Diffuse lighting:
codermind (part 1),
Princeton Illumination Lectures (Slide 19–21).

Specular lighting:
codermind (part 2),
Princeton Illumination Lectures (Slide 23–25).

Shadowing:
Project Amiga Juggler (step 8).