UCL CS 0019
Introduction
Individual Assessed Coursework 5: Implementing a UNIX Shell
Copyright By PowCoder代写 加微信 powcoder
Due date: 4 PM GMT, 21st March 2024 Value: 14% of marks for module
The design of the UNIX1 system call APIs for creating and controlling processes and plumbing together their inputs and outputs is one of the gems of modern software systems architecture. To help you become fluent in using this powerful set of mechanisms in C, in this assignment you will use fork(), exec(), and several other interesting system calls to build a command shell, or shell for short. A shell reads commands on its standard input and executes them. Modern UNIXes include multiple shells (and developers over the years have created still more shells with different syntaxes and features).
In your shell, you will implement simple commands, background commands, conditional commands (i.e., using the && and || operators), I/O redirection, and pipes. You will also imple- ment command interruption (i.e., with Ctrl-C). Your shell will implement a subset of the bash shell’s syntax, as defined at:
http://www.gnu.org/software/bash/manual/bashref.html
and will be generally compatible with bash for the features they share. You may find a tutorial on how to use the UNIX shell useful preparation for CW5; one such tutorial is available at:
http://www.linuxcommand.org/lc3_learning_the_shell.php
This coursework must be done in the official, supported 0019 Linux development environ- ment (either on your own machine in Docker or remotely over ssh to the 0019 Linux machines). Full information on how to set up Docker on your machine and how to develop remotely over ssh can be found in the CW2 handout.
Please note that CW5 does not build or run reliably on the CSRW remote desktop because of outdated software revisions on CSRW. You should not attempt CW5 using CSRW.
Only for those using Docker on ARM hardware:
gdb will not work on your cross-compiled x86-64 CW5 shell binary in Docker on ARM hardware. If you are developing in Docker on ARM hardware, and you feel you need gdb, push your code to GitHub, log into one of the x86-64 Linux ssh boxes we provide, clone your repo there, and run gdb there. Note, though, that regardless of whether you are developing on x86-64 or ARM, because a shell calls fork(), and gdb can only debug one process at a time, gdb is actually not all that great a fit for debugging a shell. So regardless of platform, you may find that debugging with printf() statements is more effective than using gdb.
As with all 0019 CWs, we caution strongly against attempting to develop your CW5 solu- tion in any development environment other than the official, supported 0019 Linux development environment. Students in the past have found that code can pass tests in a non-supported devel- opment environment, but then fail them in the official 0019 development environment (including
1Throughout this handout, by “UNIX” we include all modern variants of UNIX, including the open-source Linux variant.
on the grading server). As ever, we emphasize that the only development environment the 0019 staff will support if you have questions are the 0019 Docker Linux configuration and the 0019 ssh-accessible Linux machines, and your mark will solely be determined by the mark assigned on the grading server, regardless of what results your code might produce in some other non-0019- supported environment.
Chapter 8 of CS:APP/3e covers processes and the UNIX system calls that control them; this is vital background for this coursework. Section 8.4 in particular describes the system calls for creating and controlling processes, and Section 8.5 describes signals, which are important in handling software exceptions that arise while managing the processes your shell starts (e.g., when a process dies, when a user hits Ctrl-C on the shell’s console to interrupt a process, etc.). Chapter 10 of CS:APP/3e covers UNIX I/O system calls, such as those needed to handle redirection of input and output for processes your shell starts; Sections 10.1–10.4 describe opening, reading from, and writing to files, and Section 10.9 describes redirection of I/O. Note that CS:APP/3e doesn’t describe the pipe() system call. We will discuss pipe() in lecture (and of course the UNIX/Linux man pages for pipe() and all other system calls are an invaluable reference).
The shell functionality you will implement in CW5 includes: • execution of simple commands;
• execution of background commands;
• execution of command lists;
• execution of conditional statements; • execution of command pipelines;
• reaping of zombie processes;
• execution of I/O redirection;
• support for interrupting commands from the console;
• support for the cd command (to change directory).
You will complete the above tasks in nine stages described in detail below. We provide tests with the initial code for CW5 for all the above functionality. Details on how grades are computed from tests passed and failed appear later in this handout.
You will also find considerable guidance on how to go about implementing the functionality for each stage of the coursework in this document. Read this handout in its entirety carefully before you begin!
As ever, it is important to get started early. You will need time to work your way through implementing all nine stages of the coursework, including time for debugging. You will almost certainly need the entire two weeks allotted to complete CW5.
Getting Started
Before you follow the instructions below to retrieve the code for CW5, you MUST first complete the 0019 grading server registration process. You only need do so once for the entire term, and you probably did so before beginning work on CW1, CW2, CW3, and/or CW4, in which case you need not register again.
If, however, you did none of CW1–CW4, and have not yet registered with the 0019 grading server, STOP NOW, find the email you received with the subject line “your 0019 grading server token,” retrieve the instructions document at the link in that email, follow those instructions, and only thereafter proceed with the instructions below for retrieving the code for CW5.
We will use GitHub for CW5 in much the same manner as for CW2 through CW4. To obtain a copy of the initial code for CW5, please visit the following URL:
https://classroom.github.com/a/nrupwZPS
If you’d like a refresher on using git with your own local repository and syncing it to GitHub, please refer to the CW2 handout.
Each time you push updated code to your GitHub repository for CW5, our automatic grading server will pull a copy of your code, run our automated tests on your code, and place a grade report in a file grade_report.md in your GitHub repository. Your mark on CW5 will be that produced by the automated tests run by our automatic grading server on the latest commit (update) you make to your GitHub repository before the CW5 deadline.2 More on these tests below.
Parsing the Shell’s Grammar
There are two fairly distinct parts to implementing a shell: parsing the command input and then executing the appropriate commands once they’ve been parsed. Our emphasis in this class is on the UNIX system call interface and process control, not parsing. So we’ve provided you much of the code to parse command input (though not quite all).
The parse shell token() function we provide returns the next “token” from the com- mand line, and it differentiates between normal words like “echo” and special control operators like “;” or “>”. But in the code we hand out to you initially, eval line() treats every token like a normal command word, so “echo foo ; echo bar | wc” would simply print “foo ; echo bar | wc”! Real shells allow users to build up interesting commands from collec- tions of simpler commands, connected by control operators like && and |. Part of your task is to complete the parsing phase. You can complete it all at once, but you don’t need to; see below for staging hints.
sh0019 command lines follow this grammar. Each command is a “commandline” defined as follows:
2The only exception is if the 0019 staff determine that your code doesn’t implement the functionality required in the CW5 handout.
All your code for your solution to CW5 must go in the sh0019.c and sh0019.h files. The grading server will only consider code in these two files (and will use the baseline versions of all other files in the code we initially provide you).
commandline ::= list
| list “;”
| list “&”
list ::= conditional
| list “;” conditional
| list “&” conditional
conditional ::= pipeline
| conditional “&&” pipeline
| conditional “||” pipeline
pipeline ::= command
| pipeline “|” command
command ::= [word or redirection]…
redirection ::= redirectionop filename
redirectionop ::= “<" | ">” | “2>”
This grammar says, for example, that the command “echo foo && echo bar & echo baz” is parsed and executed as follows:
{ { echo foo } && { echo bar } } & { echo baz }
Thatis,the&&is“inside”thebackgroundcommand,so“echo foo && echo bar”runs in the background and “echo baz” runs in the foreground.
A robust shell will detect errors in its input and handle them gracefully, but all the inputs we give your shell in the CW5 tests follow the grammar above.
Executing Commands
The bulk of CW5 is actually implementing the shell—i.e., causing the user’s commands to execute after they’ve been parsed.
If you’re confused about a shell feature and tutorials and man pages don’t help, experiment! Tinkering with tools to understand their behavior is a hallmark of the Systems approach. The bash shell (which is the default on Linux on the 0019 ssh machines and in the 0019 Docker environment; and on Mac OS) is compatible with your shell. You may find the following com- mands particularly useful for testing; find out what they do by reading their man pages and feel free to be creative in how you combine them:
• echo (print arguments to standard output) • true (exit with status 0)
• false (exit with status 1)
• sleep N (wait for “N” seconds then exit) • sort (sort the standard input’s lines)
• wc (count the standard input’s lines)
• cat (print one or more files specified as arguments to standard output)
Run your shell by typing ./sh0019 and entering commands at the prompt. Exit your shell by typing Ctrl-D at the prompt or by going to another window and entering the command killall sh0019.3
The Nine Stages of the Shell
We describe below the nine implementation stages you must complete in CW5: what you need to implement in each, and hints on how to do so.
Stage 1: Simple Commands
Implement support for simple commands like “echo foo” by changing start command() and run list(). You’ll need to use the following system calls: fork(), execvp(), and waitpid(). Consult the man pages for details on how to call each of them and what they do. Also read the function definitions in sh0019.h.
You may also need to exit a forked copy of the shell (for example, if execvp() fails). To exit a child process, call the exit() function. For instance, call exit(1) or, equivalently,
exit(EXIT FAILURE) to exit with status 1.
Stage 2: Background Commands
Next, implement support to run processes in the background, such as sleep 1 &. A back- ground process allows the shell to continue accepting new commands without waiting for the prior command to complete. Implementing background commands will require changes to eval line() (to detect control operators) and run list(), as well as, most likely, to struct command.
Stage 3: Command Lists
Implement support for command lists, which are chains of commands linked by ; and &. The semicolon runs two commands in sequence—the first command must complete before the second begins. The ampersand runs two commands in parallel by running the first command in the background. These operators have equal precedence and associate to the left.
This stage will require changes to run list() and struct command at a minimum.
3In Docker, you can open another terminal in your Docker container by opening the Docker Desktop control panel, clicking on Containers, clicking on your running container, and then clicking on “Terminal.” If you are developing over ssh, just open a second local terminal window and ssh in that window to the same ssh host where your shell is running.
We suggest you implement the features of your shell in the order of the numbered stages below.
Your CW5 code should never call the normal exit() function. exit() performs cleanup actions, including changing the configuration of open stdio files, that shouldn’t happen in child processes (they should only happen in the parent shell). If you call exit() instead of
exit() from child processes, you may see weird behavior where, for example, the parent shell re-executes parts of its command input.
Hint:Howmuchdoyouneedtochangestruct commandtohandlethefullshellgrammar above? All that’s really required is a simple linked list of struct commands. This is possi- ble because the shell’s execution strategy for commands works sequentially, from left to right (with an exception for pipelines, as you’ll see in a later stage). You may be tempted to cre- atewhat’scalledanexpressiontreewithseparatestruct command,struct pipeline, struct conditional, and struct command list types. If that really helps you rea- son about your design, go for it, but it may be significantly more effort than it’s worth.
Stage 4: Conditionals
Implement support for conditionals, which are chains of commands linked by && and/or ||. These operators run two commands, but the second command is run conditionally, based on the status of the first command. For example:
$ true ; echo print
$ false ; echo print
$ true && echo print
$ false && echo print
$ true || echo print
$ false || echo print
# The second command always runs, because ’;’ is an
# unconditional control operator.
# With &&, though, the 2nd command runs ONLY if
# the first command exits with status 0.
# (prints nothing)
# With ||, the 2nd command runs ONLY if the first
# command DOES NOT exit with status 0.
# (prints nothing)
The && and || operators have higher precedence than ; and &, so a command list can contain many conditionals. && and || have the same precedence and they associate to the left. The exit status of a conditional is taken from the last command executed in that conditional. For example,true || falsehasstatus0(theexitstatusoftrue)andtrue && falsehasexit status 1 (the exit status of false).
Explore how conditionals work in the background. For instance, try this command:
$ sleep 10 && echo foo & echo bar
To support conditionals, you’ll probably find you need to make changes to run list(), eval line(), and struct command. You’ll also use the WIFEXITED and WEXITSTATUS macros defined in man waitpid.
Stage 5: Pipelines
Implement support for pipelines, which are chains of commands linked by |. The pipe operator | runs two commands in parallel, connecting the standard output of the left command to the standard input of the right command.
The | operator has higher precedence than && and ||, so a conditional can contain several pipelines. Unlike conditionals and lists, the commands in the pipeline run in parallel. The shell
starts all the pipeline’s commands, but only waits for the last command in the pipeline to finish. The exit status of the pipeline is taken from that last command.
To support pipelines, you’ll need to use some further system calls, namely pipe(), dup2(), and close(), and you’ll need to make changes to start command(), run list(), and struct command.
Stage 6: Reaping Zombie Processes
Your shell should eventually reap all its zombie processes using waitpid().
Stage 7: I/O Redirection
Implement support for I/O redirection, where some of a command’s file descriptors are read from (for input file descriptors) and/or written to (for output file descriptors) disk files. You must handle three kinds of redirection:
• < filename: The command’s standard input is taken from filename. • > filename: The command’s standard output is sent to filename.
• 2> filename: The command’s standard error is sent to filename.
For instance, echo foo > x writes foo into the file named x.
The parse shell token() function returns redirection operators as type TOKEN REDIRECTION. You’llneedtochangeevalline()todetectredirectionsandstoretheminstruct command.
Each redirection operator must be followed by a filename (a TOKEN NORMAL token). You’ll also change run command() to set up the redirections, using system calls open(), dup2(), and close().
The shell sets up a command’s redirections before executing the command. If a redirection fails (because the file can’t be opened), the shell doesn’t actually run the command. Instead, the child process that would normally have run the command prints an error message to standard error and exits with status 1. Your shell should behave this way, too. For example:
$ echo > /tmp/directorydoesnotexist/foo
/tmp/directorydoesnotexist/foo: No such file or directory
$ echo > /tmp/directorydoesnotexist/foo && echo print
/tmp/directorydoesnotexist/foo: No such file or directory
$ echo > /tmp/directorydoesnotexist/foo || echo print
/tmp/directorydoesnotexist/foo: No such file or directory
Howtofigureouttherighterrormessagetodisplayonstandarderror?Tryman strerror.
Hint: You must reap all zombies eventually, but you need not to reap them immediately. We don’t recommend using signal handlers to reap zombies, since a signal handler can interfere with the waitpid() calls used to wait for foreground processes to complete. A well-placed waitpid() loop will suffice to reap zombies. Where should it go?
Hint: Your calls to open() will have different arguments depending on what type of redi- rection is used. How to figure out what those arguments should be? You can use the man page or you can simply use the strace command to check the regular shell’s behavior. For example, try this:
$ strace -o strace.txt -f sh -c “echo foo > output.txt”
The strace output is placed in file strace.txt. Examine that file’s contents. Which flags were provided to open() for output.txt? You can repeat this experiment with different redirection types.
Stage 8: Interruption
Implement support for interruption: pressing Ctrl-C in the shell should kill the currently running command line, if there is one.
Handling Ctrl-C is an initial step into job control, which encompasses UNIX’s functionality to help users interact with sets of related processes. Job control can be a complicated affair involving process groups, controlling terminals, and signals. Luckily, Ctrl-C is not too hard to handle on its own. You will need to take the following steps:
• All processes in each pipeline must have the same process group (see below).
• Your shell should use the claim foreground() function that we provide to inform the
OS about the currently active foreground pipeline.
• If the user presses Ctrl-C while the shell is executing a foreground pipeline, every process
in that pipeline must receive the SIGINT signal. This will kill them.
What are process groups? Job control is designed to create a common-sense mapping between operating system processes and command-line commands. This gets interesting because processes spawn new helper processes. If a user kills a command with Ctrl-C, the helper processes should also die. UNIX’s solution uses process groups, where a process group is a set of processes. The Ctrl-C key kills all members of the current foreground process group, and not just the current foreground process.
Each process is a member of exactly one process group. This group is initially inherited from the process’s parent, but the setpgid() system call can change it:
• setpgid(pid, pgid) sets process pid’s process group to pgid. Process groups use the same ID space as process
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com