CS代写 IA32- family. So she wrote her own software implementation of various funct

15-213 Introduction to Computer Systems
Final Exam May 3, 2005
Name: ID: Recitation Section:
• This is an open-book exam.

Copyright By PowCoder代写 加微信 powcoder

• Notes and calculators are permitted, but not computers. • Write your answer legibly in the space provided.
• You have 180 minutes for this exam.

1. Data Representation (15 points)
An engineer suspected an error in the floating point unit of her processor in the IA32- family. So she wrote her own software implementation of various functions on floating point numbers conforming to the IEEE standard. Below is her implementation of a func- tion fdouble, to be applied to single-precision floating point numbers, manipulated as 32-bit integers.
typedef int float_t;
float_t fdouble (float_t x) { int sign = x & (1 << 31); int exp = (x >> 23) & 0xFF; int frac = x & ((1 << 23) - 1); /* check for NAN or infinity */ if _________________________ /* return x; /* if (exp == 0) { /* /* double denormalized value */ _________________________ /* LINE 2 */ /* check if overflow into a normalized number */ if (frac >= (1 << 23)) { /* fix fractional part */ frac = _________________________ /* LINE 3 */ exp = 1; /* change exponent */ else { /* normalized */ /* double normalized value */ _________________________ /* check if infinity */ if _________________________ /* fix result */ _________________________ return sign | (exp << 23) | frac; } /* get sign bit */ /* get the biased exponent */ /* get the fractional part */ return x if NAN or infinity */ check if denormalized */ /*LINE4*/ /*LINE5*/ /*LINE6*/ 1. (12 pts) Fill in the 6 missing lines to complete the implementation. 2. (3pts)Assumefloatto_float(intx);andintto_int(floatx);letus interpret the bit pattern of an integer as a single-precision floating point number and vice versa. When the engineer tested fdouble with the following function void test(float x) { float x2 = to_float(fdouble(to_int(x))); if (2*x != x2) { printf("Not equal!\n"); exit(0); } she found some discrepancies, even though her implementation of fdouble was correct. Explain these discrepancies in a sentence or two. 2. Assembly Language (23 points) The following is the assembly (GAS) result of compiling a source program in C. pushl %ebp movl %esp, %ebp pushl %esi movl 12(%ebp), %ecx pushl %ebx xorl %ebx, %ebx movl 8(%ebp), %esi decl %ecx cmpl %ecx, %ebx jge .L7 movl (%esi,%ebx,4), %edx movl (%esi,%ecx,4), %eax movl %eax, (%esi,%ebx,4) incl %ebx movl %edx, (%esi,%ecx,4) decl %ecx cmpl %ecx, %ebx popl %ebx popl %esi popl %ebp void mystery (mystery_t A[], int int i = 0; int j = ____________________; while (____________________) { int temp = ____________________; ____________________; ____________________; ____________________; ____________________; 1. (4 pts) Show the association between program variables and registers C Variable 2. (14 pts) Fill in the gaps in the shown C source. 3. (5 pts) Which of the following type definitions are consistent with the assembly code? Circle yes or no. (a) Yes No typedef int mystery_t; (b) Yes No typedef float mystery_t; (c) Yes No typedef double mystery_t; (d) Yes No typedef struct { } *mystery_t; (e) Yes No typedef union { } mystery_t; 3. Out-of-Order Execution (21 points) Consider the following program which counts the positive elements in a linked list. void count_pos (List *p) { int i = 0; while (p) { if (p->data > 0) i++;
p = p->next; }
return i; }
On the left we show the main loop of the program in assembly language, generated with gcc -O2. The corresponding executing unit operations are given on the right.
movl 4(%eax), %ecx
testl %ecx, %ecx jle .L47
movl (%eax), %eax
testl %eax, %eax jne .L48
load 4(%eax.0) testl %ecx.1, %ecx.1 jle-not-taken cc.1 incl %edx.0
load (%eax.0)
testl %eax.1, %eax.1 jne-taken cc.2
→ %ecx.1 → cc.1
→ %eax.1 → cc.2
Consider the data dependency diagram on the next page which only shows the first iteration. It is drawn assuming the inner branch is not taken and the outer branch is taken which is justified if we know that most list elements are positive. We assume an issue time of 1 and latency of 3 for a load operation that is a cache hit. The diagram ignores any processor resource limitations.

1 2 3 4 5 6 7 8 9
testl testl jle jne
Iteration 1
1. (7 pts) Label the empty boxes with the appropriately renamed registers from the
execution unit.
2. (5 pts) What is the theoretically optimal CPE for this loop as drawn, assuming no resource limitations and perfect branch prediction?

3. (3 pts) The processor has to perform branch prediction for the two branch instruc- tions. Which would easier to predict correctly? Explain your answer.
4. (3 pts) Explain how a conditional move instruction might improve the efficiency of this code if the list does not contain predominantly positive or negative elements.

5. (3 pts) Now assume that the processor has only one load unit, but you may still assume an unbounded number of functional units. What is the theoretically optimal CPE under these assumptions. Briefly explain your answer, using a diagram if you find it helpful.

4. Cache Memory (26 points)
Assume we have a 2-way set associative 1024-byte data cache with 16 byte blocks. We assume the machine uses 32-bit addresses and memory is byte-addressable.
1. (5 pts) Determine the following cache parameters.
(a) The block offset takes bits. (b) The set index takes bits.
(c) The tag takes (d) There are
(e) There are
bits. total sets.
total cache lines.
Now Consider the following code, where N is a compile-time constant. This code sums up the first eight columns of the array.
int A[N][N];
int sum8col()
int sum = 0;
for (j = 0; j < 8; j++) for (i = 0; i < N; i++) sum += A[i][j]; return sum; } Assume we are on a machine where integers take up 4 bytes. You may also assume that sum, i, and j are held in a register, so that the only data cache accesses are to elements of the array A. Assume that the array A starts at 0x800000. We consider the cache behavior for N = 32 and N = 16, given an LRU eviction pol- icy. Give your answers in hexadecimal form. 6. (6 pts) First, consider the case where N = 32. (a) TheaddressofA[1][0]is . (b) With i = 0 and j = 0, a block will be read into the cache containing array elements A[0][0], A[0][1], A[0][2], A[0][3]. Will this block be evicted? If so, for what value of i and j will this block be first evicted? Yes, withi = [Hint: Remember that the cache is 2-way set associative, consider the order of the iterations, and keep in mind the LRU eviction policy.] 7. (6 pts) Second, we consider the case where N = 16. (a) TheaddressofA[1][0]is . (b) With i = 0 and j = 0, a block will be read into the cache containing array elements A[0][0], A[0][1], A[0][2], A[0][3]. Will this block be evicted? If so, for what value of i and j will this block be first evicted? Yes, withi = 8. (3 pts) If we changed the order of iterations to int A[N][N]; int sum8col() int sum = 0; for (i = 0; i < N; i++) for (j = 0; j < 8; j++) sum += A[i][j]; return sum; } would the answers change for N = 32? Circle 9. (3 pts) Would the answers change for N = 16? Circle 10. (3pts)Ingeneral,isthefirst(joutermost)orsecond(ioutermost)orderofiteration preferable? Explain your answer briefly. Yes or No. 5. Signals (20 points) Consider the following code, which has been written with the assumption that an unpre- dictable number of SIGINT interrupts can arrive asynchronously. void handler (int sig) { i=0; 7 int main() { 9 sigset_t s; 10 sigemptyset(&s); 11 sigaddset(&s, SIGINT); 12 signal(SIGINT, handler); 13 for(j=0;j<100;j++){ 16 printf("i = %d\n", i); 17 exit(0); Now we consider the following values for i that may be printed at the printf com- mand. 0, 1, 100, 101 For each question, indicate if and where the given calls to sigprocmask need to be in- serted in order to obtain precisely the indicated set of possible outputs among 0, 1, 100, 101. Note that any given run prints just one value, and that the program may also print other values, but we are only interested in 0, 1, 100, and 101. 1. (5 pts) All of 0, 1, 100, 101. (a) Neither needs to be inserted (leave next two questions blank). (b) Insertsigprocmask(SIG_BLOCK,&s,0);rightafterline . Insert sigprocmask(SIG_UNBLOCK, &s, 0); right after line 2. (5 pts) Just 0 and 101, but not 1, or 100. (a) Neither needs to be inserted (leave next two questions blank). (b) Insertsigprocmask(SIG_BLOCK,&s,0);rightafterline . Insert sigprocmask(SIG_UNBLOCK, &s, 0); right after line 3. (5 pts) Just 101, but not 0, 1, or 100. (a) Neither needs to be inserted (leave next two questions blank). (b) Insertsigprocmask(SIG_BLOCK,&s,0);rightafterline . Insert sigprocmask(SIG_UNBLOCK, &s, 0); right after line 4. (5 pts) Just 100 and 101, but not 0, 1. (a) Neither needs to be inserted (leave next two questions blank). (b) Insertsigprocmask(SIG_BLOCK,&s,0);rightafterline . Insert sigprocmask(SIG_UNBLOCK, &s, 0); right after line 6. Network Programming (15 points) In this problem we look at a simple client/server system for a spell-checking service. We first consider the client, then the worker thread on the server, then the main server program. For simplicity, we assume the service runs on 128.2.222.158, port 3340 and the strings being spell-checked never exceed MAXBUF in length. We forego some error checking for the sake of brevity. In each case we first present a program that you should read carefully and then answer the questions. We begin with the client program. #include "csapp.h" #define DEFAULT_PORT 3340 int main(int argc, char** argv) { int fd, num; char buf[MAXBUF]; struct sockaddr_in serveraddr; char *msg = argv[1]; if (strlen(msg) >= MAXBUF-1) exit(1);
if ((fd = socket(AF_INET, SOCK_STREAM, 0) < 0)) exit(1); serveraddr.sin_family = AF_INET; serveraddr.sin_addr.s_addr = ntohl(0x8002de9e); serveraddr.sin_port = ntohs(DEFAULT_PORT); if (connect(fd, (struct sockaddr*)&serveraddr, sizeof(serveraddr)) < 0) if ((num = write(fd, msg, strlen(msg)+1)) < 0) exit(1); if ((num = read(fd, buf, MAXBUF-1)) < 0) exit(1); buf[num] = 0; printf("%s", buf); 1. (5 pts) Circle all that apply. (a) It would be preferable to use rio readn and rio writen instead of read and write because of possible short counts. (b) ntohl and ntohs should be changed to htonl and htons, respectively. (c) The last character of the server response will be lost. (d) We need to close the socket explicitly in order to free system resources. (e) We should never read and write on the same socket. Next, the worker thread for the server. It should be designed so the main server can spawn a separate thread for each request. We assume that spell check(buf) modifies buf so that it contains only the incorrectly spelled words in buf and returns the number of characters in the resulting string. #include "csapp.h" #define DEFAULT_PORT 3340 int bytes_served = 0; void* spell_thread(void *arg) { int n; int clientfd = *(int *)arg; char buf[MAXBUF]; if ((n = rio_readn(clientfd, buf, MAXBUF)) < 0) return NULL; bytes_served += n; n = spell_check(buf); rio_writen(clientfd, buf, n); close(clientfd); return NULL; 2. (5 pts) Circle all that apply. (a) The address passed to spell_thread should not be allocated on the stack, because that could create a race condition. (b) The rio_readn should be contained in a loop rather than an if because of the possibility of short counts from a socket. (c) The thread may run detached or not. (d) There should be a mutex to protect bytes_served. (e) Thespell_checkfunctionshouldprotectbufwithamutex. Finally, the main server function. int main(int argc, char **argv) { int listenfd, clientlen, optval=1; struct sockaddr_in serveraddr, clientaddr; if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) exit(1); if (setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, (const void *)&optval, sizeof(int)) < 0) exit(1); serveraddr.sin_family = AF_INET; serveraddr.sin_addr.s_addr = htonl(INADDR_ANY); serveraddr.sin_port = htons(DEFAULT_PORT); if (bind(listenfd, (struct sockaddr*)&serveraddr, sizeof(serveraddr)) < 0) if (listen(listenfd, 5) < 0) exit(0); clientlen = sizeof(clientaddr); while (1) { int clientfd; pthread_t thread; if ((clientfd = accept(listenfd, (struct sockaddr*)&clientaddr, &clientlen)) < 0) pthread_create(&thread, NULL, spell_thread, (void *)&clientfd); pthread_detach(thread); 3. (5 pts) Circle all that apply. (a) Thecallstobindandlistenareinthewrongorder. (b) Thecalltopthread_createcreatesaracecondition. (c) The calls to htonl and htons should be replaced by ntohl and ntohs, re- spectively. (d) Weshouldcallpthread_join(thread)insteadofpthread_detach(thread) in order to avoid consuming unnecessary system resources. (e) It would make more sense to create processes with fork instead of threads with pthread_create because there are no shared resources anyway. 7. Memory Allocation (10 points) In this problem we consider the interaction between malloc/free and multiple threads. 1. (5 pts) Circle all that apply. (a) Multiple threads share the same heap, so we have to consider the possibility of race conditions for malloc. (b) When a correct program executes, at most one thread should call free on any given pointer. (c) We do not need to worry about race conditions for free because on a correct program, at most one thread will try to call free on any given pointer. (d) The internal data structures of malloc affect where best to place critical re- gions. (e) For a correct program, memory allocated in one thread must always be freed in the same thread. 2. (5 pts) Describe how you would make malloc and free thread-safe. State first what kind of basic implementation strategy (e.g., segregated free lists) is the starting point of your analysis, what kind of semaphore(s) (e.g., split binary semaphores) you would use, and where in the code you would add them. Your choices do not need to be efficient, just sound. 8. Semaphores (20 points) In this problem we develop two symmetric solutions to the Dining Philosophers problem as introduced in lecture. In this problem, there is a round table with 5 plates. There are also 5 forks positioned between the plates. Philosophers think and occasionally get hungry. When they get hungry they sit down at their place and eat, for which they need the forks on both sides of their plate. When they are sated, they go back to thinking. The following code describes a simulation of the dining philosophers for one day, assuming they eat three meals a day. sem_t frk[5]; void *philosopher(void *vargp) { int i = (int)vargp; int j = 0; while (j < 3) { printf("%d thinking\n", i); P(&frk[i]); P(&frk[(i+1)%5]); printf("%d eating\n", i); j++; V(&frk[(i+1)%5]); V(&frk[i]); int main() { int i; pthread_t tid; for (i = 0; i < 5; i++) sem_init(&frk[i], 0, 1); for (i = 0; i < 5; i++) Pthread_create(&tid, NULL, philosopher, (void *)i); Pthread_exit(NULL); 1. (3 pts) Explain the protocol followed by each philosopher in terms of thinking, eat- ing, picking up, or putting down a fork which is embodied in the code. 2. (3 pts) Explain briefly why this simulation can deadlock. 3. (7pts)Inordertoavoidthepotentialdeadlock,weadoptaprotocolwhereweallow at most 4 philosophers to sit down at the table at a time. Add a semaphore to the code to implement this new protocol. Modify the code below by adding new lines to implement this protocol. Indicate clearly where the new code should go. You should not modify existing lines of code. Remember to declare and initialize your semaphore. sem_t frk[5]; void *philosopher(void *vargp) { int i = (int)vargp; int j = 0; while (j < 3) { printf("d thinking\n", i); P(&frk[i]); P(&frk[(i+1)5]); printf("d eating\n", i); j++; V(&frk[(i+1)5]); V(&frk[i]); int main() { int i; pthread_t tid; for (i = 0; i < 5; i++) sem_init(&frk[i], 0, 1); for (i = 0; i < 5; i++) Pthread_create(&tid, NULL, philosopher, (void *)i); Pthread_exit(NULL); 4. (7 pts) Another way to avoid the deadlock is to force every philosopher to pick up the forks to his right and left simultaneously (that is, atomically). Modify the code below by adding new lines to implement this protocol. Indicate clearly where the new code should go. You should not modify existing lines of code. sem_t frk[5]; void *philosopher(void *vargp) { int i = (int)vargp; int j = 0; while (j < 3) { printf("d thinking\n", i); P(&frk[i]); P(&frk[(i+1)5]); printf("d eating\n", i); j++; V(&frk[(i+1)5]); V(&frk[i]); int main() { int i; pthread_t tid; for (i = 0; i < 5; i++) sem_init(&frk[i], 0, 1); for (i = 0; i < 5; i++) Pthread_create(&tid, NULL, philosopher, (void *)i); Pthread_exit(NULL); 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com