View Submission | Gradescope 11/4/22, 6:12 AM
https://www.gradescope.com/courses/430719/assignments/2379059/submissions/145199646 Page 1 of 21
Q1 Multiple Choices 30 Points
Please pick the most accurate answer among the options.
Regarding the performance scaling of programs on multicore processors, please identify the INCORRECT statement.
! If 50% of the program is parallelizable, having an unlimited number of processors can speed up the execution time by 4x
” If parallelizing part of the program adds overhead to the non-parallelized part, the program may be slowed down
” The complexity of the parallel algorithm doesn’t really matter that much for the overall performance if the number of parallel processors is unlimited
” If we have an unlimited number of parallel processors but not every piece of the program can be parallelized, single-core performance will eventually dominate the application performance
Assuming your cache is smaller than your memory space… Your colleague, John, says a “tag” field will be required. Choose the most accurate answer.
” He’s partially right, it does require a tag BUT it also always requires an index
! He’s sometimes right, it depends on whether your cache is direct-mapped or associative
” He’s right, all caches require tags in order to make sure the correct data is in that block
” He’s right, all caches require tags in order to know what block in the cache to access
” He’s wrong – only index bits are required
In the instruction set of the mega-processor, there are 4 types of instructions: W, X, Y, and Z-type. For the most common programs on the mega-processor, each type of instruction takes the following percentages of execution time:
W-type: 25%
X-type: 50%
Y-type: 15%
Z-type: 10%
Your hardware engineers tell you they can only improve the execution time of one type of instruction for the next generation of processor. Which single change would translate to the best overall performance improvement?
” Improving W-type instructions by a factor of 5
” Improving Z-type instructions by a factor of 10
” Improving Y-type instructions by a factor of 8
! Improving X-type instructions by a factor of 2
Comparing the straightforward/baseline matrix multiplication algorithm and the block algorithm, what is/are the main factor(s) in the performance equation allowing the block algorithm to outperform the other?
Baseline Algorithm:
for(i = 0; i < ARRAY_SIZE; i++) {
    for(j = 0; j < ARRAY_SIZE; j++) {
        for(k = 0; k < ARRAY_SIZE; k++) {
            c[i][j] += a[i][k]*b[k][j];
        }
    }
}
Block Algorithm:
for(i = 0; i < ARRAY_SIZE; i+=(ARRAY_SIZE/n)) {
    for(j = 0; j < ARRAY_SIZE; j+=(ARRAY_SIZE/n)) {
        for(k = 0; k < ARRAY_SIZE; k+=(ARRAY_SIZE/n)) {
            for(ii = i; ii < i+(ARRAY_SIZE/n); ii++)
                for(jj = j; jj < j+(ARRAY_SIZE/n); jj++)
                    for(kk = k; kk < k+(ARRAY_SIZE/n); kk++)
                        c[ii][jj] += a[ii][kk]*b[kk][jj];
        }
    }
}
" Instruction Count
" Instruction Count & Cycle time
" Cycle time
! Cycles per instruction
" Instruction Count & Cycles per instruction
Regarding CISC (e.g., x86) and RISC (e.g., MIPS and RISC-V) ISAs, please identify the correct statement.
" The number of instructions needed to implement a program is typically higher when using a CISC ISA.
" The same source code typically performs better when compiled into a RISC ISA and run on RISC hardware.
" A CISC instruction typically needs more bytes to encode than a RISC instruction.
! The hardware implementation of a CISC ISA typically requires the support of more operations.
Your company’s code often traverses large arrays and spends much of its time executing code that closely resembles the following:
for(int i=0; i<1000000; i++)
A[i] = A[i] + X;
Which of the following changes can improve the cache hit rate?
! Increase the cache size and increase the way associativity
" Change from direct-mapped to 4-way set associativity
" Increase the cache size to 64KB
" Increase the block size to 64-byte blocks
" Change from direct-mapped to 2-way set associativity
Which of the following tools can help identify the most time-consuming function in an application?
" lat_mem_rd
" gdb
" valgrind
Suppose you experience a cache miss on a block (let's call it block A). You have accessed block A in the past. There have been precisely 1027 different blocks accessed between your last access to block A and your current miss. Your block size is 64 bytes and you have a 32KB cache. What kind of miss was this?
" Capacity Miss
" Compulsory Miss
! Both Conflict Miss and Compulsory Miss
" Conflict Miss
Prof. Usagi was debating with Prof. Alvarado regarding the nonsense of computational complexity in algorithms in the era of thousands of processor cores. Which of the following laws should Prof. Usagi use to convince Prof. Alvarado?
" CPU Performance Equation
" Moore's Law
! Amdahl's Law
The following chart shows the execution time of two x86 processors in seconds when using Adobe Photoshop CC to apply 6 filters to a 16 MB TIF image.
Which statement explains why the performance of Core i7 is better?
! The CPI of AMD FX-8350 is higher
" The cycle time of Core i7 5960x is longer
" The cycle time of AMD FX-8350 is shorter
" The instruction count of the program running on Core i7 5960x
Under what circumstances can we use inferences per second (IPS) as a fair performance metric?
" Comparing the performance of running the EfficientNet model on the CIFAR-100 dataset using PyTorch on CPU and TFLite on Edge TPU.
! Comparing the performance of running the EfficientNet model on the CIFAR-100 dataset using PyTorch on CPU and TensorFlow on GPU.
" Comparing the performance of running the EfficientNet model on the CIFAR-100 dataset using PyTorch on CPU and GPU.
Regarding virtual memory, please identify the INCORRECT statement.
! Virtual memory abstraction is completely transparent to the programmer as the processor hardware will maintain the abstraction without any software intervention
" Virtual memory abstraction allows programmers to be agnostic to the capacity of physically installed memory.
" In a multiprogrammed environment with virtual memory abstraction, if the total memory usage of all running programs surpasses the installed memory, the system can continue execution.
" Virtual memory abstraction can help improve the performance of a single-threaded program when system resources are exclusive to that program
In the x86-64 architecture, how many main memory accesses would be required if an instruction fetch misses in the instruction cache and also misses in the TLB?
! 1
" 5
" 3
" 4
" 2
Which of the following mechanisms does not help reduce conflict misses?
! Prefetching
" Increasing way-associativity
" Miss caching
" Victim caching
Consider the following data structure:
struct movie
{
    int *ratings;
    double average;
};
If a program declared an array of struct movie as
struct movie movies[1024];
Assuming &movies[0] is 0x10000, what's &movies[1023]?
" 0x14FEC
" 0x17FE0
" 0x15FE8
" 0x16EF4
! 0x18000
Q2 Short Answer Questions 22 Points
Please use fewer than 30 words in each field to explain your answer. If you use more than 30 words, your answer won't count.