## Part 1 – Word Count
This problem asks you to write a function “word_count“ that counts
how many times words occur in a piece of text. This function must
have the type
“`
string -> (string * int) list
“`
and be in a file named “word_count.ml“.
For this problem, you must follow a rather prescribed sequence of
steps to implement this function. The reason for this is to provide
an example of how data, in this case text, can be repeatedly
transformed until the result is acquired. In the next problem you
will have to decide what this sequence of transformations is for
yourself.
The prescribed process takes the name of a file as input (as a
“string“) and returns a list of pairs indicating how many times a
word (again as a “string“) appears in the contents of the file.
The file “demo.txt“ in the “Homework/Files“ directory in the
public class repository contains a short paragraph of text. The word
“punctuation” appears 3 times and the word “about” appears 1 time.
Thus the pairs “(“punctuation”, 3)“ and “(“about”, 1)“ should
appear somewhere in the output of
“`
word_count “../../public-class-repo/Homeworks/Files/demo.txt ”
“`
Your program is to transform the original data (the filename) into the
result in 7 steps. In fact, you should use the following
implementation of “word_count“ and provide implementations for the
six “transform“ functions.
“`
let word_count (fn: string) : (string * int) list =
let step1: char list = read_file fn in
let step2: char list list = transform1 step1 in
let step3: string list = transform2 step2 in
let step4: string list = transform3 step3 in
let step5: (string * int) list = transform4 step4 in
let step6: (string * int list) list = transform5 step5 in
let step7: (string * int) list = transform6 step6 in
step7
“`
From this code, it is clear that each of the six “transform“
functions takes a single input, computed in the previous step, and
returns a result consumed in the following step.
Your job is to implement the six “transform“ functions.
** NEW **
A description of the 7 step values is below and ths should clarify
questions you may have.
– “step1“ is just the contents of the input file
– “step2“ is the result of splitting the file contents into chuncks
at any appropriate separator. If two separators (say a space and a
comma) are next to each other, then an empty list would be in thils
list to indicate that there were 0 characters between these two
separators.
– “step3“ should convert these “char list“ words into strings
– “step4“ should remove the empty words.
– “step5“ constructs a pair “(s,1)“ for each string “s“ in
“step4“
– “step6“ groups these together so that all pairs with the same
string are collected together so all the “1“ values are put into the
same list.
– “step7“ adds up the values in the lists in each pair to obtain the count
* NEWER *
Here is a sample interaction using the instructors solution. This
uses a new sample file “pets.txt“ that is now in the
public-class-repo.
“`
utop # let step1 = read_file “../../public-class-repo/Homeworks/Files/pets.txt” ;;
val step1 : char list =
[‘d’; ‘o’; ‘g’; ‘\n’; ‘c’; ‘a’; ‘t’; ‘\n’; ‘c’; ‘a’; ‘t’; ‘\n’; ‘d’; ‘o’;
‘g’; ‘\n’; ‘\n’; ‘\n’; ‘c’; ‘h’; ‘i’; ‘c’; ‘k’; ‘e’; ‘n’; ‘\n’; ‘c’; ‘h’;
‘i’; ‘c’; ‘k’; ‘e’; ‘n’; ‘\n’; ‘c’; ‘o’; ‘w’; ‘\n’; ‘d’; ‘o’; ‘g’; ‘\n’]
©¤( 12:53:34 )©¤< command 25 >©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤{ counter: 0 }©¤
utop # let step2 = transform1 step1 ;;
val step2 : char list list =
[[‘d’; ‘o’; ‘g’]; [‘c’; ‘a’; ‘t’]; [‘c’; ‘a’; ‘t’]; [‘d’; ‘o’; ‘g’];
[]; []; [‘c’; ‘h’; ‘i’; ‘c’; ‘k’; ‘e’; ‘n’];
[‘c’; ‘h’; ‘i’; ‘c’; ‘k’; ‘e’; ‘n’]; [‘c’; ‘o’; ‘w’]; [‘d’; ‘o’; ‘g’];
[]]
©¤( 12:53:49 )©¤< command 26 >©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤{ counter: 0 }©¤
utop # let step3 = transform2 step2 ;;
val step3 : string list =
[“dog”; “cat”; “cat”; “dog”; “”; “”; “chicken”; “chicken”; “cow”; “dog”; “”]
©¤( 12:54:00 )©¤< command 27 >©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤{ counter: 0 }©¤
utop # let step4 = transform3 step3 ;;
val step4 : string list =
[“dog”; “cat”; “cat”; “dog”; “chicken”; “chicken”; “cow”; “dog”]
©¤( 12:54:14 )©¤< command 28 >©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤{ counter: 0 }©¤
utop # let step5 = transform4 step4 ;;
val step5 : (string * int) list =
[(“dog”, 1); (“cat”, 1); (“cat”, 1); (“dog”, 1); (“chicken”, 1);
(“chicken”, 1); (“cow”, 1); (“dog”, 1)]
©¤( 12:54:33 )©¤< command 29 >©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤{ counter: 0 }©¤
utop # let step6 = transform5 step5 ;;
val step6 : (string * int list) list =
[(“dog”, [1; 1; 1]); (“cow”, [1]); (“chicken”, [1; 1]); (“cat”, [1; 1])]
©¤( 12:54:42 )©¤< command 30 >©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤{ counter: 0 }©¤
utop # let step7 = transform6 step6 ;;
val step7 : (string * int) list =
[(“dog”, 3); (“cow”, 1); (“chicken”, 2); (“cat”, 2)]
©¤( 12:55:03 )©¤< command 31 >©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤©¤{ counter: 0 }©¤
utop #
“`
To assist in your effort, you may want to use the following functions
and you are free to copy them into your “word_count.ml“ file:
“`
let read_file (file_name: string) : char list =
let ic = open_in file_name
in
let rec read_chars ic =
(* `input_char` raises an exception when it attempts to read
past the end of the file. This is caught to terminate
the `read_chars` function. *)
try
let next_char = input_char ic
in next_char :: read_chars ic
with
_ -> []
in read_chars ic
let implode (cs: char list) : string =
String.concat “” (List.map (String.make 1) cs)
let explode (s: string) : char list =
let l = String.length s
in
let rec f i =
if i = l then [] else s.[i] :: f (i+1)
in f 0
“`
You might also like to use the “lookup_all“ function found in the
“find_and_lookup.ml“ file in the “Sample Programs“ directory of
the public repository. With this you can add the following simple
tests to the bottom of your file and check that they always evaluate
to “true“:
“`
let demo_count = word_count “../../public-class-repo/Homeworks/Files/demo.txt”
let test1 = lookup_all “new” demo_count = [2]
let test2 = lookup_all “the” demo_count = [5]
let test3 = lookup_all “to” demo_count = [2]
let test4 = lookup_all “punctuation” demo_count = [3]
“`
Finally, you might use the following function *temporarily* in
“transform1“ to split the file contents into words:
“`
let incomplete_split (c: char) (chs: char list) : char list list =
List.map explode (String.split_on_char c (implode chs))
“`
While this will work on the “demo.txt“ file, it is not sufficient
for the other sample file “Liverpool.txt“. The “demo.txt“ file is
rather special because it only contains words separated by a single
newline character. Thus, splitting a character list (“chs“ above)
into a list of list of characters by only splitting on a single
character (where “c“ is passed the character “\n“) is Ok for this
file. But for the “Liverpool.txt“ file you need to recognize that
other characters, such as a space, a comma, a period, or a sequence of
these, also indicates a break between words.
Thus, you will eventually want to write a function “split“
with the type
“`
‘a list -> ‘a list -> ‘a list list
“`
where the first argument is a list of “separators” (such as
“`
[ ‘ ‘; ‘,’; ‘;’; ‘.’; ‘!’; ‘\n’ ]
“`
and maybe other characters that indicate the separation of words.
The second argument is the input to be separated. In our case, this
will be the character list in “step1“.
The function “transform5“ may be the most interesting one. Here,
you might consider using one of your “partition“ functions from
Homework 01. Some other interesting functions might also need to be
written for this step.
### No “rec“
Your solution must be one that uses higher order functions such as
“List.filter“, “List.map“, “List.fold_left“, “List.fold_right“
and others. The only recursive function (using the “rec“ keyword)
allowed is the “read_chars“ function inside the provided function
“read_file“.
## Part 2 – Word Games
This problem asks you to write a function “word_games“ that
generates questions for a rapid-fire word game. This is inspired by
this episode of the “Sunday Puzzle” show on NPR:
https://www.npr.org/2019/01/06/682575357/sunday-puzzle-stuck-in-the-middle
You will write two functions for this part:
– “answers : string -> string list“
– “pretty_answers : string list -> (string * string) list“
The primary work will be done by “answers“. It takes as input the
name of a file containing a list of words are returns all 8 letter
words that if the middle 2 letters are removed results in a 6 letter
word also in the original list of words.
For example, if the file contains “accident” and “accent”, then
“answers“ will return the string “”accident”“ in its output.
### Getting started.
Create a file named “word_games.ml“ that will contain your two
functions.
To it, add the following function for reading files:
“`
let read_file (file_name: string) : char list =
let ic = open_in file_name
in
let rec read_chars ic =
try
let next_char = input_char ic
in next_char :: read_chars ic
with
_ -> []
in read_chars ic
“`
This is the only function in this part that may contain the “rec“ keyword.
This function will return a list of characters. Since our output
requires a list of strings, then you may want to add the following
function to your file as well:
“`
let implode (cs: char list) : string =
String.concat “” (List.map (String.make 1) cs)
“`
Also, your “split“ function from Part 1 above will be useful. Copy
that function into “word_games.ml“ as well.
Finally, add the following two lines
“`
let d1 = “../../public-class-repo/Homeworks/Files/words-small.txt”
let d2 = “../../public-class-repo/Homeworks/Files/words-google-10000.txt”
“`
These are file name paths to two sample dictionaries from inside your
“Hwk_02“ directory. This arrangement assumes that your individual
repository and the public class repository are both cloned into the
same directory, as instructed in Lab 01.
This will allow you to test your work with the simple expressions
“answers d1“ or “answers d2“.
### The “answers“ function
The “answers“ function will need to read in the contents of the
given file. You may then want to use “split“ to group that original
“char list“ into a list of words, that is a “char list list“.
From here, consider extracting the words of length 8 and 6.
It should then be a matter extracting the appropriate words from the
list of 8 letter words. Clarification: Say you have an 8 letter word
named “w8“, you want to check if the first three letters
concatenated to the last 3 letters is a word from the list of 6 letter
words. You might investigate how the library function “String.sub“
works to extract these 3 letter words from an 8 letter word.
There are several ways one can go about solving this problem. It is
probably wise to consider these steps in some details before you start
programming.
That is, ask yourself how the input is transformed and what the OCaml
type of those intermediate values will be.
### The “pretty_answers“ function.
This function is rather simple. It just takes the result from
“answers“ and converts it into a list of string pairs. The first is
the 6 letter word, the second is the 8 letter word that contains the 6
letter word in its first and last 3 letter bits.
For example, here is the result of using both function in utop:
“`
utop # pretty_answers (answers d2) ;;
– : (string * string) list =
[(“mating”, “matching”); (“offers”, “officers”); (“gaming”, “gambling”);
(“estate”, “estimate”); (“accent”, “accident”); (“caring”, “carrying”);
(“equity”, “equality”); (“states”, “statutes”); (“string”, “striking”);
(“though”, “thorough”)]
“`
### Note the file “words-google-10000.txt“ comes from this repository:
https://github.com/first20hours/google-10000-english
### Again, no “rec“
You solution must be one that uses higher order functions such as
“List.filter“, “List.map“, “List.fold_left“, “List.fold_right“
and others. The only recursive function (using the “rec“ keyword)
allowed is the “read_chars“ function inside the provided function
“read_file“.