CS计算机代考程序代写 Hive Excel compiler data structure ocaml algorithm # Midterm Coding Project: *Guess Who?*

# Midterm Coding Project: *Guess Who?*

*CSci 2041: Advanced Programming Principles, Summer 2021*

**Due:** Saturday, July 10 at 11:59pm

In the `exam2041` repository, you will find files named:

* `chr.ml` (mostly empty)
* `chr.mli`
* `boolean.ml` (mostly empty)
* `boolean.mli`
* `parsebool.ml`
* `guesswho.ml`
* and several example input files: `smallchars.csv`, `nochars.csv`, `nonames.csv`, `dupcols.csv`, `badrows.csv`, `ftest.csv`, `monsters.csv`

Create a directory named `exam` in your personal repository, and copy all of these files
to your `exam` directory. While you work on the exam, the Gitbot will help with some basic
compiler and functional teting, even though you’ll ultimately be submitting your solution to
gradescope. Don’t forget to run `git add` on both the directory and the files!

**Reminder:** This project is a summative activity. Therefore, unlike lab exercises and reading quizzes, you should only submit code that is the result of your own work and should not discuss your solutions with anyone but the course staff.

However, you *may* ask clarifying questions about these instructions on `#exam-questions` in the course discord server. **No code should be posted in this channel.**

## Background: *Guess Who?*

*Guess Who?* is a two-player children’s game involving a grid of characters, each with a name and a (cartoon) portrait. Each player has a copy of this grid. At the start of the game, each player chooses a character that they will “be” for the game. Then players take turns asking each other “yes/no” questions about their character. (For example, “Does your character have brown eyes?”) The players use the answers to these questions to eliminate possible characters, with the goal of being the first to “guess who” the other player has chosen. In some versions of the game, there are multiple character grids to choose from with different themes (e.g. aliens, monsters, kids, disney characters, superheroes…). If you’re curious, [wikipedia article](https://en.wikipedia.org/wiki/Guess_Who%3F) describes more about the game and links to pictures of the various versions of the game.

In this exam, you’ll complete a text-based implementation of this game. When you complete the required modules implementations, the “guesswho.ml” and “parsebool.ml” modules will drive a game between a user and the computer:

* The computer asks the user to select a character definition file, and loads the definitions from the file. (The character definitions list all of the “attributes” of each character that players can ask about and the name of each charcter.)

* The computer then displays the list of characters and asks the player to choose one; the computer also randomly chooses a player.

* Then the computer and user take turns: On the user’s turn, the computer displays all of the characters the user hasn’t ruled out, and asks the player either to “Guess who” the computer is (if they guess right, they win; if wrong, they lose) or to Ask a question. If the player chooses to ask a question, they enter it as a boolean formula: possible `attribute:value` pairs, combined with parentheses, and the logical operations and (`&&`), or (`||`), and not (`!`). The computer checks whether their character makes the formula true, tells the user, and then eliminates all characters with the opposite answer for the player.

* Then the computer tells the player what question it is asking, and answers the question itself, narrowing down its list of suspects. If the computer has eliminated all but one suspect, it informs the user that it has guessed their character, and wins.

* Otherwise, play resumes with the user’s turn.

You can [watch a video](https://canvas.umn.edu/files/21755508/download?download_frd=1) of Dr. Nick playing against his implementation to see the vintage 80s text-based game play.

### Overview of modules

The final implementation will consist of four modules: `chr.ml` implements everything to do with representing characters in the game; `boolean.ml` implements everything having to do with the boolean formulas that represent questions about the players; `parsebool.ml` implements an algorithm that *parses* text strings like “`EYES:compound && !(LIMBS:4 || LIMBS:6)`” into a useable boolean formula; and `guesswho.ml` “drives” the game by calling the appropriate functions from the other modules and getting user input. Let’s go into a little more detail about each module:

#### `Chr` Module

The `Chr` module, which you will define in `chr.ml`, must implement the `chr.mli` interface: it will define data types to represent character attribute names, attribute values, and characters, along with functions that other modules can use to access these properties. These data types are kept abstract in the interface file so that no matter what representation you choose, the other portions of the program should work correctly.

The `Chr` module will also provide functions for reading character definition files, which are stored in “Comma-Separated Value” (csv) format, a standard portable way to represent tables of information (which we’ll explain below), and for printing a list of characters, which are called from `guesswho.ml`. It will be up to you to impose some requirements on character definition files to make life easier for the rest of the game: all files must include “Name” as the first column; no attributes can be empty; and every character must have a distinct name.

The `Chr` module doesn’t depend on any other files, so it should be able to compile cleanly with the compiler command `ocamlc -c chr.mli chr.ml`.

#### `Boolean` module

The `Boolean` module, which you will define in `boolean.ml`, must implement the `boolean.mli` interface: it will define a data type to represent boolean formulas about characters. This will include the data type itself, as well as functions to construct elements of the type (that `parsebool.ml` will call), and to test whether a character makes the formula true (by having the combination of attributes specified in the formula; for instance having compound eyes and eight limbs). You’ll also implement a function to print out a boolean formula in a format that could be read by the parser, and a function to produce a boolean formula that narrows down a list of possible characters.

The `Boolean` module will depend on the `Chr` module to test if a character makes a formula true, so if you want to test it in utop, you’ll need to `#mod_use “chr.ml”` first so that the `chr` definitions are included in a module. To compile it, you’ll need to reference the `Chr` interface, so you’ll need to use the compiler command `ocamlc -c chr.mli boolean.mli boolean.ml`.

#### `Parsebool` module

The file `parsebool.ml` relies on your implementation of boolean formulas in `boolean.ml` to *parse* or translate a string representations of a formula into your ocaml data type representing it. You don’t need to do anything with this file, although you might find it interesting to read; we’ll also use it to test your `Boolean` module’s string translation function. There are a few things you might want to know for when you’re testing out your implementation:

* The parser assumes that all attribute names can be represented by legal Ocaml “identifier” rules, meaning they start with a letter, and contain only alphanumeric characters and underscores.

* The parser assumes that all attribute values are either legal ocaml identifiers, strings enclosed in “double quotes”, or integers.

* Passing a string that includes characters other than `||`, `&&`, `!`, `(`, `)`, or `attributename : attributevalue` to the parser will cause a lexical error, and passing a string that is not syntactically correct (e.g. “`|| ||`” or “`)`”) will cause a ParseError to be raised.

If you want to test `parsebool.ml` and its interaction with your `boolean.ml`, you will need to `#mod_use “chr.ml”` then `#mod_use “boolean.ml”` in utop. If you’ve correctly satisfied the `boolean.mli` and `chr.mli` interfaces, then parsebool should compile with the command `ocamlc -c chr.mli chr.ml boolean.mli boolean.ml parsebool.ml`.

#### `Guesswho` module

The file `guesswho.ml` connects the pieces from the various other modules to play the game. It contains code to get the user’s choice of character definitions, ask the user to select a character, call `Parsebool.parse_bool_exp` on the player’s questions, evaluate the question against both the computer’s character and the list of remaining characters, call `Boolean.split` to devise a question for the computer player, tell the user about the question and answer, and conclude the game. If you have a working implementation of `chr.ml` and `boolean.ml`, you can compile `guesswho.ml` using the compiler command `ocamlc -o guesswho chr.mli chr.ml boolean.mli boolean.ml parsebool.ml unix.cma guesswho.ml`. (The `Unix` module is needed to add “pauses” between phases of the game to make it easier for the user to follow.) Once you’ve compiled it, you can run the program by typing `./guesswho` at the terminal prompt.

Because `guesswho.ml` uses a lot of console IO, it’s best not to try loading it into `utop` with `#use`. You can still test little functions you write as part of the changes you’ll be making to `guesswho.ml` by copy/pasting individual function definitions into utop, and if you do that, you’ll want to make sure you have already executed `#mod_use “chr.ml”`, `#mod_use “boolean.ml”` and `#mod_use “parsebool.ml”` to load those modules into utop.

## Your mission

OK, so now that we’ve seen what all of the parts of the program are supposed to do, let’s get started filling in the parts that aren’t there yet.

### Part 1 (10 points): Data types for attribute names (keys), and attribute values (values)

In the file `chr.ml`, add:

* A type declaration for the type `k` that will represent the names of character attributes, things like name, eye_color, glasses, limbs, and so on. You can choose any type you like that has a reasonable conversion between strings and your internal representation, but you should explain the reasoning for your choice in a comment above the type declaration.

* Implementations of the functions `key_of : string -> k` and `string_of_key : k -> string`. These should have the property that for any string `s`, if we convert it to a key `k` using `let k = key_of s`, and then convert `k` to a string and back using `let k2 = key_of (string_of_key k)`, then `k` and `k2` are the same key. (So converting a string to a key might result in a new string once, but after that we should get the same key and string every time.) Additionally, `key_of` should be case-insensitive, so that (for example) `key_of “KEY” = key_of “keY”` and strings that are not the same after case conversion should result in different keys, so that (for example) `key_of “KEY” <> key_of “HEY”`.

* Add a declaration of the key value `name` that will always represent the name of a character. In order for the example character definition files to work, we require that calling `key_of “Name”` always produces this value.

* A type declaration for the type `v` that will represent the values of character attributes, things like Bob, Blue, Yes, 6, and so on. You can choose any type you like that has a reasonable conversion between strings and your internal representation, but you should explain the reasoning for your choice in a comment above the type declaration.

* Implementations of the functions `val_of : string -> v` and `string_of_val : v -> string`. These should have the property that for any string `s`, if we convert it to a value `v` using `let v = val_of s`, and then convert `v` to a string and back using `let v2 = val_of (string_of_val v)`, then `v` and `v2` are the same value. (So converting a string to a value might result in a new string once, but after that we should get the same value and string every time we do the conversion) As with the `key_of` function, `val_of` should also be case-insensitive (so `val_of “eXaMPLe” = val_of “ExAmplE”`) but should return different values for strings that are different after case conversion, so that `val_of “explore” <> val_of “explode”`.

Once you’ve finished this, we’re ready to start defining the data type and operations for characters.

### Part 2 (15 points): Data type for characters

Continuing to work in `chr.ml`, add

* A type declaration for the type `t` that will represent characters, which we can use to lookup the values of different keys (including, always, the key `name`) for each character. You can use any implementation you like: maybe an associative list, or a list with indices assigned to different keys, a hash table or map object constructed from the Map.Make functor, as long as it supports the operations below, and is justified in a comment above the declaration.

* The function `get : t -> k -> v` that given a character and a key, returns the value associated with that key. `get` should raise the `Not_found` exception if the given key is not defined for a character.

* The function `key_fold : (‘a -> k -> ‘a) -> ‘a -> t -> ‘a ` is meant to allow other parts of the program to “fold” all of the keys defined in a character. So `key_fold (fun acc key -> key::acc) [] t_rex` would evaluate to a list of all of the keys defined in `t_rex` (assuming t_rex is a value of type `Chr.t`).

* The function `kv_fold : (‘a -> (k*v) -> ‘a) -> ‘a -> t -> ‘a` is meant to allow other parts of the program to “fold” all of the key, value pairs defined in a character, so `kv_fold (fun acc (key,vl) -> (key,vl)::acc) [] t_rex` would evaluate to a list of all key, value pairs defined in `t_rex`. For any character `c : t`, The lists `List.map (fun key -> (key, get c key)) (key_fold (fun acc key -> key::acc) [] c)` produced by listing the keys in a character and then getting the values for each key, and `key_fold (fun acc (key,vl) -> (key,vl)::acc) [] c` should contain the same elements (Although depending on your implementation, they could conceivably be in a different order.)

* The function `eq : t -> t -> bool` should return true when applied to `c1` and `c2` if and only if `c1` and `c2` have identical sets of keys and the same values for all keys.

Note: although you can test some of these out yourself, the gitbot testing code won’t be able to do much without working implementations for part 3, because code outside of the `Chr` module won’t have any way to create values of type `t` without loading them from a file.

### Part 3 (20 points): Character definition input and output

OK, so now we need to finish `chr.ml` by filling in two very important functions, `read_from_file` and `print_list`.

`read_from_file` should expect a string as input. It should then use that string as a filename, and read a list of character definitions from a comma-separated values (CSV) file.

> *Wait a minute, what is a CSV file?*
> Well, I’m glad you asked. A CSV file is a standard way to encode a table of values in a text file that can be read by lots of different programs. (For example, Excel and Google sheets can read and write to csv files, but many programming languages also have libraries to read and write them.)
> The basic format is that we turn each row of a table into a line of text, with commas between the columns.
> Sometimes (though not always) there is a “header row” that defines what data is in each column.

In our case, the first row of a character definition file will list the string values of the keys. Each subsequent row will define one character; the string in the i-th column will be the *value* of the key listed in the i-th column of the very first row. So for example, loading the `smallchars.csv` file should result in a list of three characters. One character will have the values `val_of “Ash”` for the key `key_of “Name”`; `val_of “Short”` for the key `key_of “Hair”`; `val_of “Blue”` for the key `key_of “Eyes”`; and `val_of “No”` for the key `key_of “Glasses”`. Another character will have the values in the `Blake` row for the corresponding columns, and another will have the values in the `Charlie` row.

You already know from Episode 4.2 how to open and read a list of lines from a file; you might also find some of the following functions useful: `String.split_on_char`, `List.combine` or `List.map2`. These should allow your function to produce correct results for correctly formatted character definition files, but your implementation should also check a few more things:

* First, a character definition file MUST include the name for each character as the first column, so the first column of the first row should start with a string s such that `(key_of s) = name`.

* Second, the same key should not appear in multiple columns of the header row. (Otherwise, how would we know which is the correct value for the key?) Your implementation should check that this is the case. (You might find the function `List.sort_uniq` useful here, although other implementations are certainly possible.)

* Third, a character definition file that doesn’t actually define any characters (so, with either zero or one rows) isn’t very useful for our game.

* Next, every row in the character definition file should have a non-empty (so not `””`) value for every key in the header, and no additional columns.

* Finally, there should never be two characters with the same name. (“Are you Dave?” “Yes.” “Really?” “Oh no, not *that* Dave…”)

If any of these conditions fails, your implementation of `read_from_file` should raise `ChrFile` with a useful error message. Some examples of files that should cause exceptions to be raised are: `nochars.csv`, which doesn’t define any characters; `nonames.csv`, which doesn’t include a name column; `dupcols.csv` has a duplicate column name; and `badrows.csv`, which has a row with an empty value, a row with too few values, and a row with too many values. (Poor Goldilocks won’t find any rows that are just right!)

The other function for this part is `print_list : t list -> unit`. This should iterate through a list of characters and print out a string representation of each character (one per line) in a way that allows a user to see what keys a character has defined and what values are defined for each key. We will not place any specific formatting requirements on the output of this function (besides having one new line per character), except that calling `key_of` on the string representing a key and `val_of` on the string representing a value should produce a matching key,value pair to the character. But a readable format will make it easier for you to test your final game!

Honestly, it might take longer to read and understand these instructions than it does to implement them.

### Part 4 (15 points): Data types for questions (boolean queries)

OK, so now we can represent characters internally, read them from a file, and print them out for a player. Let’s start on the part of the game that handles questions. In the file `boolean.ml`, create a declaration for the type `Boolean.t` that will represent boolean formulas – logical combinations of questions players will ask about characters. Your type should be able to represent the logical AND of any two formulas, the logical OR of any two formulas, the negation of a formula, and a “test” formula that checks whether a character has a given key, value pair. You can choose any representation you like, as long as you can later work out whether the answer for a particular character is true or false; you should explain your representation in a comment above the type declaration. (One choice would be to extend the type for boolean expressions we defined in Video 2.4.)

Once you’ve added the type declaration, add functions that will allow other modules to build boolean formulas:
+ `make_test : Chr.k -> Chr.v -> t` should return a formula that represents a test whether a character has the key and value pair given as arguments.
+ `make_and : t -> t -> t` should take two formulas and return a formula that represents the logical AND of the two arguments.
+ `make_or : t -> t -> t` should take two formulas and return a formula that represent the logical OR of the two arguments.
+ `make_not : t -> t` should take a formula and return a formula that represents the logical negation of its argument.

Once this is done you should be ready to move on to the next part…

### Part 5 (15 points): Answering questions

Once we have data structures to represent characters and questions about characters, we can write the function that *answers* these questions! In `boolean.ml`, write the definition for the function `eval : t -> Chr.t -> bool` that takes as input a formula and a character, and returns whether the character makes the formula true or not. Suppose we call evaluate `let [testchar] = Chr.read_from_file “ftest.csv”;;` in utop to create a single character for testing, then:

+ `eval (make_test (Chr.key_of “k1”) (Chr.val_of “v1”)) testchar` should evaluate to `true`.
+ `eval (make_test (Chr.key_of “k2”) (Chr.val_of “Z”)) testchar` should evaluate to `false`.
+ `eval (make_not (make_test (Chr.key_of “k2”) (Chr.val_of “Z”))) testchar` should evaluate to `true`.
+ if `f1` and `f2` are formulas, then `eval (make_and f1 f2) testchar` should evaluate to the same thing as `(eval f1 testchar) && (eval f2 testchar)`.
+ if `f1` and `f2` are formulas, then `eval (make_or f1 f2) testchar` should evaluate to the same thing as `(eval f1 testchar) || (eval f2 testchar)`.

A few things to keep in mind: it could be that a test asks about a key that is not defined in a player. In that case, your code should return `false`. Also keep in mind that formulas produced by parsebool can have nested formulas as their arguments, e.g. `make_and (make_or ,,,) (make_not (make_and …))`, so your evaluation code will need to handle all such cases.

So now, combined with the code in `parsebool.ml`, our Guess Who? game can get questions from the player and answer them. We just need to be able to come up with questions for the computer player and print them out for the user to see…

### Part 6 (15 points): Asking questions

The next task in `boolean.ml` is to write the function `string_of : t -> string` that converts a formula back into a string that the parser could understand. So if we take an arbitrary formula `f`, and create the formula `ff = Parsebool.parse_bool_exp (Boolean.string_of f)`, `ff` and `f` should always result in the same value on any character. Recall that if `sk = Chr.string_of_key k` and `sv = Chr.string_of_val v`, then parsebool will turn `sk ^ “:” ^ sv` into `make_test k v`; parsebool reads `!` as negating the following formula, and `&&` for logical and, and `||` for logical OR. Note that it is OK for `string_of` to include more parentheses than the original string version of a query, as long as these parentheses are still balanced, so for example parsing `”k1:v1 && k2:v2 && k3:v3″` and then converting the result back to a string could output `”(K1:v1) && ((K2:v2) && (K3:v3))”`.

The last task to solve in `boolean.ml` is to write the function `split : Chr.t list -> t`. This function has the task of coming up with a “good” question for the computer player to ask given a list of remaining possible characters. The only *requirements* we’ll place on your implementation are:

* If called with a list of length 0 or 1, it should raise `Invalid_argument “Boolean.split”` – there’s no way to “narrow down” a list that’s already narrowed down.

* Otherwise, if there are at least two distinct characters in the list, it should return a formula such that at least one of the characters makes the formula true, and at least one of the characters makes the formula false. This way we can guarantee that the computer player will always eventually narrow the list of possibilities down to one. (If you get stuck, here’s a hint: if two characters are not the same, then they do not have the same value for every key.)

You will probably find the function(s) `Chr.key_fold` and `Chr.kv_fold` useful for implementing this function. In any case, your implementation should include a comment explaining the algorithm you’re trying to implement and why it should always narrow the list. Algorithms that correctly narrow the list more quickly than one per turn (on average) have the opportunity to earn some bonus points; see part 8 below.

Hey, you’re getting really close!

### Part 7 (10 points): Error handling

OK, we gave you some code to drive the game in `guesswho.ml`, but that code is a little… fragile. It doesn’t check for errors that could come up when reading a character definition file, or reading any of the inputs a player enters. Find the places where such errors can arise, and re-write the `guesswho.ml` file to handle these errors, by informing the user what error has occurred. (So, for instance there should be a different error message for “I couldn’t open the file \”nosuchchardeffile.csv\””, than “the file you chose doesn’t appear to be correctly formatted.”) It’s OK to simply print a user-understandable error message and exit the program by calling `exit (-1)` in these cases; we’re not trying to recover from these errors, just give the user a better explanation than `Exception: Sys_error` (for instance).

### Part 8 (up to 5 points, Extra credit, no late submissions): Cool Upgrades

Hey, if you’re down here, you must have made it! Great job! Enjoy a few rounds of Guess Who with your friend the computer. If you’ve got everything else done and want to earn some extra points, you could always make your implementation better in some way. For example, if you correctly construct an algorithm for `split` that always reduces the candidates by a factor of 2, or implements the “optimal” strategy, that would be a pretty cool upgrade! Or maybe you would like to rewrite `guesswho.ml` to actually recover from an error (by giving the user the chance to try again when errors happen); that would be pretty cool too!

Make a separate file named `upgrades.md` that explains your cool upgrade, and why it is correct (ie. doesn’t break any required functionality and acheives your intended goal).

## Other considerations

In addition to satisfying the functional specifications given above, your code should be readable, with comments that explain what you’re trying to accomplish. It must compile with the command:

`ocamlc -c chr.mli chr.ml boolean.mli boolean.ml parsebool.ml unix.cma guesswho.ml`.

In particular, *any file that does not compile correctly will receive zero autograder points*: Please double-check the results of your submission to Gradescope to make sure you have uploaded files that compile. Finally, solutions that pay careful attention to resources like running time and stack space (e.g. using tail recursion wherever feasible) and code reuse are worth more than solutions that do not have these properties.

## Submission instructions and late submission policy

Once you are satisfied with the status of your submission in github, you can upload the files:

* `chr.ml`
* `boolean.ml`
* `guesswho.ml`
* `upgrades.md` (if you are attempting the extra credit portion)

to the “Coding Midterm” assignment on [Gradescope](https://www.gradescope.com/courses/261241). (The autograder will have copies of the original `.mli` files and `parsebool.ml`). These files should be uploaded as standalone files, not part of a zip archive, so that the autograder script can find them. Gradescope will automatically check that these files are present and compile, and give you a report within a few minutes if a file was not found or did not compile: ***Please check this output to make sure your submission is complete***.

After the deadline, we will run automated correctness testing similar to the basic feedback tests described here, and provide some manual feedback on the efficiency, readability, structure and comments of your code, which will be accessible in Gradescope once all exams have been graded.

**Late Submissions**: We will accept late submissions up to three days after the official due date of Saturday, July 10th. Each late day will incur a 5-point penalty.