COMP 3000B: Operating Systems
Carleton University Winter 2020 Final Exam Solutions
April 13, 2020 (7-9 PM)
There are 29 questions on 3 pages worth a total of 60 points. Answer all questions in the supplied text file template (available on cuLearn, titled comp3000-final-template.txt). Please do not corrupt the template as we will use scripts to divide up questions amongst graders. Your uploaded file should be titled comp3000-final-username.txt, replacing username with your MyCarletonOne username. Make sure you submit a text file and not another format (such as MS Word or PDF). Use a code editor, not a word processor! (You won’t be graded for spelling or grammar, just try to make it clear.)
This test is open book. You may use your notes, course materials, the course textbook, and other online resources. If you use any outside sources during the exam, you must cite the sources. Your citation may be informal but should be unambiguous and specific (i.e., if you refered to the textbook, indicate what chapter and page you looked at rather than just citing the textbook). You may not collaborate with any others on this exam. This exam should represent your own work.
All explanations should be concise and to the point (generally no more than a few sentences, sometimes much less). If you find a question is ambiguous, explain your interpretation and answer the question accordingly.
You have 120 minutes. Good luck!
1. [2] When you interact with bash (via ssh or a graphical terminal) what file does bash read from in order to obtain user input? Is this file a regular file? Explain briefly.
A: bash reads from a numbered file /dev/pts (such as /dev/pts/0) in order to obtain user input. This file is a character device, it is not a regular file.
2. [1] When you type the command ls > ls.log at a bash shell prompt, what process opens the file ls.log for writing, bash, ls, or another?
A: bash
3. [2] When you type ls -la at a bash prompt, what system call does bash use to receive user input? What system call does bash use to pass the -la argument to ls?
A: read, execve
4. [3] What system calls do the following C library functions make (on Ubuntu 18.04)? Note they may generate none, one, or multiple system calls. (a) fgets, (b) ioctl, (c) snprintf.
A: a) read, b) ioctl, c) none
5. [2] Process A creates a child process B to open and write data to a file (and then terminate). What is the standard UNIX mechanism that allows B to inform A that the write to the file failed? Explain, indicating how the error message would be sent and received.
A: B calls exit with a return code indicating failure. A calls wait to get B’s exit code.
6. [2] If you decided you didn’t want to run any pre-installed binaries and instead just wanted to run your own versions when you type in commands at a shell prompt, how could you do this? Would this change prevent other users of the system from using system binaries? Explain briefly.
A: You would change your PATH environment variable (by changing your shell configuration
files, e.g., .bashrc, .bash profile, .profile) to only list your directories, excluding /bin, /usr/bin, etc. This change wouldn’t affect any other users on the system.
7. [2] If a sleeping process receives a signal, will the signal handler run immediately or will it run after the sleep finishes? Explain briefly, giving evidence for your answer.
A: The signal handler will run immediately, and then once the signal finishes the sleep system call will return without having finished the sleep. If the restart flag is specified, the sleep system call will be automatically resumed where it left off. We could see this in 3000shell when we removed the SA RESTART flag, 3000shell would terminate immediately when receiving a signal (e.g., SIGCHLD).
8. [2]SIGPIPEissenttoprocesstoindicateabrokenpipe,i.e.,awritetoapipethathasnoreaders(but did previously). Alice, upon learning about SIGPIPE, says this is stupid, because the write would just return an error. Bob replies that SIGPIPE is useful just like SIGCHLD is. Is Alice right or is Bob? Explain how the signals are similar and a situation when SIGPIPE would be useful.
A: The key benefit of SIGPIPE is informing a process that a pipe is broken even if it isn’t in the middle of doing a write. In a producer/consumer situation using a pipe, the producer could be spending a considerable amount of time producing, assuming that the consumer is waiting; with SIGPIPE it will be immediately informed that something has gone wrong with the consumer and can take corrective action (or just terminate). This is similar to the situation with SIGCHLD, in that SIGCHLD tells the parent immediately that its child has terminated, while a call to wait would result in a significant delay in getting this information to the parent.
9. [2] If a process has a uid=1000, euid=1000, gid=1021, and egid=1021, what files can it read on the system? Why?
A: The process can read files that are owned by uid 1000 with the owner read bit set, files with a group 1021 and with the group read bit set, and any files on the system not owned by 1021 and not with a group 1021 that have the other read bit set. Such regular files must also be accessible via a directory path where each directory is both readable and executable by uid 1000, gid 1021, or by others. This is because UNIX file permission checks involve checking whether the process owner has access, then the process group, and finally whether it has other access, with regular files requiring read access and directories requiring read access to list entries and executable access to get inodes associated with directory entries.
10. [2] With ssh, what is the purpose of the id rsa file? What about id rsa.pub file? What do they contain, and how are they used?
A: The id rsa file contain’s a user’s private key, and id rsa.pub file contain’s a user’s pub- lic key. They are used to authenticate a user to a remote system by copying the contents of id rsa.pub to the remote system’s authorized keys file.
11. [2] In a filesystem, can two files share the same inode? Explain briefly.
A: Two filenames can both be hard links to the same inode. In this case, they are really two names for the same file rather than being two distinct files, as inodes represent the contents of files. Sharing an inode means they are essentially the same.
12. [2] Can you easily recover from erasing the primary and all backup superblocks of an ext4 filesys- tem? Explain why or why not.
A: You cannot easily recover from erasing the primary and backup superblocks of a filesystem, as these are essential to mounting a filesystem. If these are erased, the filesystem is effectively
2
erased, and the only way to recover would be to reconstruct the superblock (say by guessing at the key parameters contained in it for the filesystem) or by doing a forensic recovery of file contents, both of which are non-trivial tasks.
13. [1]Ifyouwantthestandardoutputofoneprogramtobefedtothestandardinputofanotherprogram directly (without storing any data on disk), how could you do this without using the | operator? Explain with a short example.
A: You would create a named pipe, and have one program use the pipe for standard out and the other for standard in. So instead of ls | wc, you would do myfifo mypipe; (ls > mypipe &); wc < mypipe
14. [2] Below is an implementaiton of sem wait(). Does this version cause the process to sleep while waiting for the lock to be freed? Do you expect this implementation to work reliably in practice? Explain briefly.
void sem_wait(int *lock)
{
}
while (*lock == 0) {
/* wait */
}
*lock = 0;
A: This version does not cause the process to sleep if the lock is currently taken; instead, it busy waits (loops, checking on each iteration whether the lock is available). This implementa- tion will not work reliably in practice as the check and assignment are separate instructions, and thus another process could change the state of lock between them. A correct implemen- tation would have to make use of special assembly language instructions (or special language mechanisms that would produce such instructions) that allow for the checking of a value and changing it both as one atomic instruction.
15. [2] From an API perspective, which is simpler to create, a thread or a process? Which do you think requires more work on the kernel’s part? Explain briefly.
A: A process is simpler to create from an API perspective than a thread, as a process can be created using fork, which takes no arguments, while threads are created using pthread create, which takes four arguments. (At the level of system calls they are equivalent as they both result in a clone system call.) Process creation takes more work on the part of the kernel as the address space needs to be (logically) copied while with threads the address space is shared.
16. [2] How could you add support for lseek operations to a character device module? Specifically, what function(s) would you add, and how could you make sure those functions were called at the right time?
A: You would add an llseek function to the module (say mymodule llseek) with the same argu- ments as lseek, but with the fd being replaced with a struct FILE *. To make sure it was called, you would add an llseek entry, referring to mymodule llseek, to the struct file operations used in the module to define available file operations for the device.
17. [2] What is the difference between a process’s uid and euid? Specifically, what does the kernel use each for?
3
A: The kernel uses the uid to determine process ownership (who can send STOP and KILL signals to control the process), while euid is used to determine what resources a process can access (mainly file access, comparing with a file’s owner uid).
18. [2] Does the kernel know the names of the groups a user belongs to? How do you know?
A: The kernel doesn’t know the groups that a user belongs to, not directly. It only knows the gid’s of processes and files. To change the currently active group, a user has to run a program such as newgrp which is setuid root (thus allowing it to escape out of the kernel’s normal security restrictions to allow the changing of a process’s gid).
19. [2] When a process mmap’s a file, can it (mostly) control where the file will be placed in virtual memory? What about physical memory?
A: A process can mostly control where a file is mmap’d into virtual memory, as it has control over its own virtual address space. A process has almost no control over where a process is mapped into physical memory, however, as that is managed by the kernel in a way that is mostly invisible to processes.
20. [2] Can two processes have data at the same virtual address? What about at the same physical ad- dress? Explain.
A: Two processes can have data at the same virtual address, but they will be completely dis- tinct, i.e., data at address 2000 in process A is completely different from data at address 2000 for process B. Indeed, this difference is the whole purpose of separate virtual address spaces. The only exception to this is if two processes have an area of shared memory, but then that shared memory would not normally have the same virtual addresses in both processes.
Two processes can share data at the same physical address by mmap’ing the same file or by creating a shared memory segment. The virtual addresses may be different but the underlying pages in RAM (and thus the underlying physical addresses) would be shared.
21. [3] For each of the following, state and explain whether they support lseek operations always, some- times, or never: regular files, pipes, character devices
A: Regular files always support lseek operations, pipes never do, and character devices some- times do, if they implement lseek support by defining the right entry in the corresponding module’s file operations struct. (In practice character devices almost never implement lseek.)
22. [2]Ateachlevelofthepagetable,welookupanentrywhichcontains(theupperbitsof)apointerto another page, until the last one points to the desired data page. Do you think these pointers contain virtual or physical addresses? Why?
A: These pointers contain physical addresses, because otherwise we’d be using virtual ad- dresses to resolve virtual addresses, and that could lead to a never-ending series of virtual to physical address lookups to figure out one virtual to physical address mapping. (Parts of the page table can be swapped to disk, but for the CPU to use them to map virtual to physical addresses, the necessary parts of the page table must be in RAM.)
23. [2] Is the size of a page consistent on all platforms that Linux runs on, or does it vary between platforms? How do you know?
A: The page size is not consistent because PAGE SIZE varies between different architectures. We can see this in the Linux source code by seeing all the architecture-dependent definitions of PAGE SHIFT and PAGE SIZE (defined in terms of PAGE SHIFT).
4
24. [3]Fillinthemissingpartsinthetablebelowrelatingtofilepermissionsinoctalandsymbolicform.
int main(int argc, char *argv[])
{
}
Octal 0666 (b) 0674 0775 (d) 0544 0511 (f) 0222
Symbolic
(a) rw-rw-rw-
rw-rwxr–
(c) rwxrwxr-x
r-xr–r–
(e) r-x--x--x
-w–w–w-
25. [2] Consider the following code:
#include
printf(“Hello! The number is %d!\n”, NUMBER);
(a) How can you compile this so, when run, it outputs “Hello! The number is 42!” (rather than reporting a compilation error)?
A: You just have to add a -DNUMBER=42 option to the compilation command line.
(b) What is one situation in which the technique you used in a) is useful (beyond those shown in class tutorials)?
A: It is useful for changing constants in code at compile time, say to define system-specific parameters, or more generally to define constants that are used for conditional compila- tion (e.g., #ifdef’s).
26. [2] Python scripts importing from the bcc library can monitor all system calls on a system and arbi- trary function calls in any process and in the kernel. Can regular python scripts or other userspace programs do this? Why or why not? Explain.
A: Regular python scripts, or indeed any non-BPF using userspace program, cannot do this kind of extensive monitoring. Processes are isolated from each other, and normally the only way to go beyond this is using ptrace, which allows one process to monitor exactly one other process. Running as root doesn’t give you additional access without BPF. To monitor arbi- trary kernelspace and userspace events, you need to use BPF. (Yes, there are traditional ways to get low-level access such as /dev/mem; however, even with these interfaces you would only get a static snapshot of the system. It wouldn’t let you watch everything that is going on. The closest traditional mechanism would be system audit logs, but those don’t give nearly the level of access as BPF.)
27. [2] Give an example of how an eBPF program can access a field in a task struct. What task struct is being accessed?
A: There are lots of examples in bpf code, as it is very common to get info from the currently running task’s task struct (i.e., current) in bpf programs. But in bpfprogram.c, the one exam- ple of this was bpf get current pid tgid() which gets the user-visible PID of the current task (i.e., the one that made the system call that is currently being processed).
5
28. [3] How often is the filter() function called in (the original version of) bpfprogram.c when tracing system calls? It it called for every system call made on the system, every system call made by the specified process, or every time a process running 3000shell makes a system call? How can you tell this from the code? What experiment(s) did you do to verify your interpretation of the code? A: It runs for every system call made on the system. You can see this from the code because in bpfprogram.c, line 69, a tracepoint probe of every system call exit is defined. If you delete lines 72 and 73 (so the code doesn’t call filter()) we get system calls from every process on the system. (Try it, it produces lots of output very quickly!)
29. [2] What is a way to load code into the kernel without it being verified for safety? What limits are placed on such code, relative to other kernel code?
A: You can load code using a kernel module (i.e., using insmod). No limits are placed on such code relative to other kernel code, kernel modules can do basically anything.
6