2021/5/13 CSC 151-02 (2021S2) – Mini-Project 5
Mini-Project 5: Language generation
Assigned
Friday, 7 May 2021
Summary
In this assignment, you will write programs that generate (or attempt to generate) different forms of writing. Along the way, you will explore issues pertaining to randomness, conditional behavior, and textual analysis.
Collaboration
Each student should submit their own responses to this assignment. You may consult other students in the class as you develop your responses. You may also consult with the normal host of other folks: Mentors, tutors, Professor Eikmeier, etc. If you receive help from anyone, make sure to cite them in your responses. You do not need to cite course pages you were instructed to read for this assignment.
Disclaimer: In this assignment, you will read and generate some utterly horrible "poetry". Please accept my apologies if any of the work included herein offends your expectations, cultural or otherwise.
Preliminaries
A syllabic lexicon, or syllax for short, is a collection of words or short phrases arranged by syllables. For the purposes of this class, a syllax is a vector of vectors, where the vector at index i contains only the words with i syllables.
For example, here is a syllax for some words related to CSC-151.
(define csc151-syllax
  (vector
   ;0
   (vector)
   ;1
   (vector "cons" "car" "list" "pair" "Scheme" "sort" "match" "string"
           "lab" "map" "fold" "test")
   ;2
   (vector "vector" "cadr" "cdr" "Racket" "jelly" "sandwich" "syllax"
           "image" "recurse" "eboard" "data" "compose" "lambda" "section"
           "SoLA" "MP")
   ;3
   (vector "recursion" "computer" "digital" "confusing" "programming"
           "CSC" "abstraction" "decompose" "document" "abstraction"
           "boolean" "binary")
   ;4
   (vector "humanities" "exponential" "collaborate" "one-fifty-one"
           "algorithm" "DrRacket" "dictionary" "generalize"
           "tail recursion")
   ;5
   (vector "collaborative" "experiential" "decomposition" "generality")
   ;6
   (vector)
   ;7
   (vector "triskaidekaphobia")))
https://eikmeier.sites.grinnell.edu/csc-151-s221/assignments/project05.html 1/10
Here's another syllax for some words related to Grinnell College.
(define grinnell-syllax
  (vector
   ;0
   (vector)
   ;1
   (vector "Mears" "Noyce" "husk" "train" "corn" "black")
   ;2
   (vector "self-gov" "Stonewall" "The Bear" "first-year" "scarlet"
           "remote" "Webex" "prairie" "need-blind" "soybeans" "Hopkins"
           "Younker" "Dibble" "cluster" "scurry")
   ;3
   (vector "liberal" "JRC" "CLS" "advisor" "FYE" "laurel leaf" "Honor G"
           "ARH" "North Campus" "Iowa" "semester" "Women's quad"
           "Grinnellian")
   ;4
   (vector "curriculum" "Mary B. James" "Tutorial" "Green alien" "convocation"
           "education")
   ;5
   (vector)
   ;6
   (vector "Congregationalist")))
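Because a syllax is a vector of vectors, looking up the words with a given number of syllables is a single vector-ref. The following is a minimal sketch of that lookup; the names tiny-syllax and words-with-syllables are hypothetical and are not part of the assignment.

```racket
#lang racket

;; A small, made-up syllax used only for this illustration.
(define tiny-syllax
  (vector (vector)                       ; 0 syllables
          (vector "cons" "car" "list")   ; 1 syllable
          (vector "vector" "lambda")     ; 2 syllables
          (vector "recursion")))         ; 3 syllables

;;; (words-with-syllables syllax n) -> vector?
;;; syllax : vector of vectors of strings
;;; n : exact integer
;;; Hypothetical helper: returns the words with n syllables, or an
;;; empty vector when the syllax has no subvector at index n.
(define words-with-syllables
  (lambda (syllax n)
    (if (< -1 n (vector-length syllax))
        (vector-ref syllax n)
        (vector))))
```

Returning an empty vector for an out-of-range index is a design choice made here so that callers can treat "no such index" and "no such words" the same way.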
Helpful procedures
In doing this assignment, you may find the following procedures of use, which are based on the lab on random language generation. Be sure to cite this lab, and refer back to it if you need more ideas on random language generation.
;;; (random-vector-element vec) -> any?
;;; vec : vector? (nonempty)
;;; Randomly select an element of `vec`
(define random-vector-element
  (lambda (vec)
    (vector-ref vec (random (vector-length vec)))))

;;; (vector-andmap pred? vec) -> boolean?
;;; pred? : procedure? (a one-parameter predicate)
;;; vec : vector?
;;; Returns true if pred? holds for every entry of vec, and
;;; false otherwise
(define vector-andmap
  (lambda (pred? vec)
    (let ([len (vector-length vec)])
      (letrec ([go (lambda (pos)
                     (if (= pos len)
                         #t
                         (and (pred? (vector-ref vec pos)) (go (+ pos 1)))))])
        (go 0)))))

;;; (strvec? val) -> boolean?
;;; val : any?
;;; Determines if val is a vector of strings
(define strvec?
  (lambda (val)
    (and (vector? val)
         (vector-andmap string? val))))
Turn in details
For this assignment, create one single file titled language.rkt. Include your answers to all parts in this single file. Your file should contain a header (see example below), and should clearly label the various parts and subproblems using comments. Make sure to organize the file so that it's easy for another human (e.g. your professor, mentors, graders) to read it. Turn in your file on Gradescope, and attempt to fix any errors that appear from the autograder.
#lang racket
(require rackunit)
(require csc151)
; language.rkt
; Author: Stu Dent
; Class: 151-02 Spring 02 2021
; Mini-Project 5, Parts 1-??
; Date: May 13, 2021
; Citations:
; XXX
; YYY
; Code here
Part One: Generating Haiku
Haiku are three-line poems that consist of a line with five syllables, a line with seven syllables, and a line with five syllables.
a. Document, design, and implement a recursive procedure, (phrase n syllax), that randomly generates a string containing a phrase of n syllables. The words for the phrase should be taken randomly from the syllax given in the second input. Make sure that you accommodate different "patterns" of n syllables. For example, if n is 4, you might use
a one-syllable phrase (which is just a one-syllable word) and a three-syllable phrase
a two-syllable phrase and a two-syllable phrase
a three-syllable phrase and a one-syllable phrase (which is just a one-syllable word)
a four-syllable word
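The patterns listed above can be produced by one recursive idea: pick a usable word length k, pick a word of that length, and recurse on the remaining n - k syllables. The following is a rough sketch of that idea, not a definitive solution. The names numeral-syllax, usable-lengths, and phrase-sketch are hypothetical, and the sketch assumes the syllax contains at least one one-syllable word so that the recursion can always finish.

```racket
#lang racket

;; A made-up syllax for illustration: each "word" is a numeral naming its
;; own syllable count, which makes generated phrases easy to check.
(define numeral-syllax
  (vector (vector)
          (vector "1" "01")
          (vector "2" "02")
          (vector "3")
          (vector)
          (vector "5")))

;;; (usable-lengths n syllax) -> list of exact positive integers
;;; Hypothetical helper: the lengths k in 1..n for which the syllax
;;; has at least one k-syllable word.
(define usable-lengths
  (lambda (n syllax)
    (filter (lambda (k)
              (and (< k (vector-length syllax))
                   (positive? (vector-length (vector-ref syllax k)))))
            (range 1 (+ n 1)))))

;;; (phrase-sketch n syllax) -> string?
;;; Assumes syllax has at least one one-syllable word; otherwise the
;;; list of choices can be empty and `random` signals an error.
(define phrase-sketch
  (lambda (n syllax)
    (if (zero? n)
        ""
        (let* ([choices (usable-lengths n syllax)]
               [k (list-ref choices (random (length choices)))]
               [word (let ([words (vector-ref syllax k)])
                       (vector-ref words (random (vector-length words))))]
               [tail (phrase-sketch (- n k) syllax)])
          (if (string=? tail "")
              word
              (string-append word " " tail))))))
```

Choosing k uniformly among the usable lengths does not make every pattern equally likely; that may or may not matter for your experiments in part b.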
b. It is difficult to write tests for random procedures, so you will likely have to conduct experiments instead. Please include a record of your experiments, and your analyses of the results (e.g. you should check that the number of syllables is correct).
c. Document and write a procedure, (haiku syllax), that takes as input a syllax and generates a Haiku of the appropriate form, with each line terminated by the strange value "\n".
> (haiku some-syllax)
"exit dog dragon\nbaseball dog television\nelephant eat car\n"
> (display (haiku some-syllax))
exceeding dog car
ant dog eat car baseball ball
exit ball eat car
Part two: Extracting words
In generating some kinds of text it can be useful to have a large corpus of words. And, in many cases, we achieve "interesting" results by using the words of others. Let's consider how we might make a list of all the different words that appear in a book.
a. Document, write tests for, and implement a procedure, (extract-words str), that extracts all of the words from a string. Make sure that your words can include both uppercase and lowercase letters and that they can include hyphens and apostrophes.
If you've already written or seen such a procedure, you can feel free to grab it from elsewhere, but make sure to include citations.
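One possible starting point for word extraction is a regular expression that matches runs of letters, hyphens, and apostrophes. The sketch below assumes that a "word" is exactly such a run; the name extract-words-sketch is hypothetical.

```racket
#lang racket

;;; (extract-words-sketch str) -> list of strings
;;; str : string?
;;; Returns every maximal run of ASCII letters, apostrophes, and hyphens
;;; in str, in order of appearance.
(define extract-words-sketch
  (lambda (str)
    (regexp-match* #rx"[A-Za-z'-]+" str)))
```

Note that this simple pattern also treats a stray hyphen or apostrophe surrounded by spaces as a "word", so a more careful version would need to filter those out.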
b. Document, write tests for, and implement a tail-recursive procedure, (dedup lst), that takes a list and removes all the duplicates from the list. You may not use any existing remove procedures provided by DrRacket.
Your procedure should return the elements in the same order that they appear in lst. Since you're using tail recursion, you'll probably need to reverse so-far at the end of the recursion.
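The accumulate-then-reverse pattern described above might look like the following sketch. It is only one way to structure the recursion; the name dedup-sketch is hypothetical, and it uses member (a search procedure, not a remove procedure) to check whether an element has already been seen.

```racket
#lang racket

;;; (dedup-sketch lst) -> list?
;;; lst : list?
;;; Returns the elements of lst with later duplicates dropped,
;;; keeping the order of first appearance.
(define dedup-sketch
  (lambda (lst)
    (letrec ([go (lambda (remaining so-far)
                   (cond
                     ; so-far was built back to front, so reverse it
                     [(null? remaining) (reverse so-far)]
                     ; already seen: skip this element
                     [(member (car remaining) so-far)
                      (go (cdr remaining) so-far)]
                     ; first time seen: remember it
                     [else
                      (go (cdr remaining)
                          (cons (car remaining) so-far))]))])
      (go lst '()))))
```

Every recursive call to go is in tail position, which is what makes the procedure tail recursive.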
Part three: Identifying syllables
In generating some kinds of text, such as those in a previous problem, it is useful to have a large corpus of words in different categories. One such set of categories groups words by their number of syllables.
a. Document and write a procedure, (syllables word), that attempts to determine how many syllables are in the string word. You can assume that word consists of only lowercase letters.
How do you decide how many syllables are in a word? One technique that works in many cases is to identify how many sequences of vowels there are. In many instances, that strategy provides a good rough estimate. However, there are also many cases in which that estimate fails (potentially, it fails for "syllables", although we could argue that the internal "y" serves as a vowel). So try to be creative in figuring out other special patterns. It is likely that you will need one or more conditionals in your procedure.
It is fine if your procedure does not work perfectly, or even all that well. We'd simply like to see some thought and creativity beyond the basics of "sequences of vowels".
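To make the "sequences of vowels" idea concrete, here is a rough sketch that counts vowel groups and adds a single special pattern, a silent final "e". It treats "y" as a vowel, which is an assumption of this sketch, and the names vowel-groups and syllables-sketch are hypothetical.

```racket
#lang racket

;;; (vowel-groups word) -> exact nonnegative integer
;;; Counts maximal runs of the letters a, e, i, o, u, and y.
(define vowel-groups
  (lambda (word)
    (length (regexp-match* #rx"[aeiouy]+" word))))

;;; (syllables-sketch word) -> exact positive integer
;;; word : string? (lowercase letters only)
;;; Estimates the number of syllables in word.
(define syllables-sketch
  (lambda (word)
    (let ([groups (vowel-groups word)])
      (cond
        ; Silent final "e", as in "cake". Endings in "le", such as
        ; "table", keep their final group.
        [(and (> groups 1)
              (regexp-match? #rx"[^aeiouyl]e$" word))
         (- groups 1)]
        ; Every word is assumed to have at least one syllable.
        [else (max groups 1)]))))
```

This sketch still fails on plenty of words. For example, it reports one syllable for "rhythm", because that word has only one run of vowel letters.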
b. Include some interesting examples of when your procedure works well and some of when it fails to work correctly.
c. Make a copy of The Project Gutenberg version of Jane Eyre, available at http://www.gutenberg.org/files/1260/1260-0.txt. Please name it 1260.txt and place it in the same directory as your Racket program.
d. Using syllables, filter, and any other procedures you deem appropriate, generate lists of the one-syllable, two-syllable, three-syllable, four-syllable, and five-syllable words in Jane Eyre.
e. Use those lists to generate some Haiku. Include examples.
Part four: Rhyming
What makes a poem? While there is no requirement that poetry rhyme, many people associate rhyme with poetry. It is also certainly the case that many forms of poetry, such as the quatrain, make use of rhyme.
As we think about generating or analyzing text, it may be useful to be able to identify rhymes. Of course, we appear to be working in the wonderfully inconsistent language known as English, so precise definitions of rhyme are difficult.
a. One possible metric for rhyming is the end of the word. Write a procedure, (might-rhyme? word1 word2), that takes two strings that represent words (e.g., all lowercase letters plus potential apostrophes) and returns true if the two words share the last three characters.
Note: Your procedure should work correctly if one or both of the words has fewer than three characters.
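The assignment does not say what the "correct" result is for short words, so the following sketch adopts one possible interpretation: a word with fewer than three characters contributes all of its characters to the comparison. The names last-chars and might-rhyme-sketch? are hypothetical.

```racket
#lang racket

;;; (last-chars str n) -> string?
;;; Returns the last n characters of str, or all of str if it has
;;; fewer than n characters.
(define last-chars
  (lambda (str n)
    (substring str (max 0 (- (string-length str) n)))))

;;; (might-rhyme-sketch? word1 word2) -> boolean?
;;; Returns true if the two words have the same last three characters.
;;; Under the interpretation chosen here, a short word is compared whole.
(define might-rhyme-sketch?
  (lambda (word1 word2)
    (string=? (last-chars word1 3) (last-chars word2 3))))
```

Under this interpretation, "at" and "cat" do not pass the test, because "at" is compared against "cat" as a whole. You may reasonably choose a different rule, as long as you document it.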
b. Identify at least six pairs of words that do not rhyme, but pass that test. You might, for example, pick some random words and then use filter to look through a larger list of words to see which seem to rhyme. Include them in a comment.
c. Identify at least six pairs of words that do rhyme, but do not pass that test. Include them in a comment.
d. Describe something you could do to address one or more of the cases in parts b and c. That is, how could you avoid the non-rhymes and how could you incorporate the rhymes? Include this as a comment.
e. Using your additional analysis, write a better (rhymes? word1 word2) procedure. You are free to make this as simple or as complicated as you like, provided you incorporate your ideas from part d. (You should, of course, document rhymes?.)
f. Using rhymes?, write a procedure, (rhymes-with word words), that takes a string and a list of strings as input and finds all of the words in words that appear to rhyme with word. (You should, of course, document rhymes-with.)
g. Write a procedure (abab words) that takes as input a corpus (list) of words and returns a string that represents a "random" quatrain of four lines of four words. The last words of the first and third lines must rhyme, as must the last words of the second and fourth lines. Include a few examples of abab in your submission.
Part five: Nearby words and sentence generation
As you've likely realized, generating actual language is hard, and writing programs that "interpret" language is often even harder. One of the legendary challenges of language generation has to do with the differences between two very similar statements.
Time flies like an arrow.
Fruit flies like an apple.
Can you tell why that pair is complex? If not, ask your faculty member or mentor.
In looking for ways to generate somewhat realistic text, one approach that has shown some promise relies on a relatively straightforward analysis of an existing text.
You start with some word that you know can start a sentence.
You randomly select from among the words that immediately follow that word in the original text.
You randomly select from among the words that immediately follow that word in the original text.
And so on and so forth, until you reach the end of the sentence.
This approach sometimes works surprisingly well and sometimes works relatively poorly. We can often improve it by working with pairs or triplets of words. But for now, we'll stick with single words.
a. Document, write, and test a procedure, (sentence-ends str), that finds all of the words in str that end sentences. The words that end sentences typically come before periods, question marks, and exclamation points.
> (sentence-ends "The cat ate the hat. The rat sat.")
'("hat" "sat")
> (sentence-ends "Do you like blue mac and cheese? No I don't, it makes me sneeze!")
'("cheese" "sneeze")
> (sentence-ends "The cat sat on the hat. 'Where is my hat?' asked the rat. It's now a flat hat. How 'bout that? Will the fat rat jump on that brat cat?")
'("hat" "hat" "rat" "hat" "that" "cat") ; or ("hat" "rat" "that" "cat")
b. Document, write, and test a procedure, (sentence-starts str), that finds all of the words in str that begin sentences, and returns them as a list (without duplicates). Typically, the start of a sentence is given by (a) the start of the string or (b) any word that comes after end punctuation (question mark, exclamation point, period) plus a sequence of whitespace (space, newline, tab).
> (sentence-starts "The cat ate the hat. The rat sat.")
'("The")
> (sentence-starts "Do you like blue mac and cheese? No I don't, it makes me sneeze!")
'("Do" "No")
> (sentence-starts "The cat sat on the hat. \"Where is my hat?\" asked the rat. It's now a flat hat. How 'bout that? Will the fat rat jump on that brat cat who sat?")
'("The" "Where" "asked" "It's" "How" "Will") ; or maybe not "asked"
c. Document, write, and test a procedure, (right-neighbors word words), that finds all of the words that immediately follow word in words, which is a list of strings. For example,
> (define cat-thing (extract-words "The cat sat on the hat. \"Where is my hat?\" asked the rat. It's now a flat hat. How 'bout that? Will the fat rat jump on that brat cat who sat?"))
> (right-neighbors "hat" cat-thing)
'("Where" "asked" "How")
> (right-neighbors "The" cat-thing)
'("cat")
> (right-neighbors "the" cat-thing)
'("hat" "rat" "fat")
> (right-neighbors "will" cat-thing)
'("the")
> (right-neighbors "computer" cat-thing)
'()
d. With these procedures, we should be able to generate things that appear to be similar sentences.
We pick among the starting words. Let's say we pick "The".
We look at all the right neighbors of "The". Unfortunately, there is only one: "cat". We have generated "The cat".
What words follow "cat"? "sat" and "who". Randomly choose among them… so perhaps now we have "The cat who".
We continue on in some fashion until we get an ending word and decide we are finished.
Document and write a procedure, (random-sentence all-words start-words end-words), that takes the lists or vectors we generate from the procedures you've written already, and uses them to build a sentence according to the following algorithm:
1. Start with one of the start words.
2. Call that the current word.
3. Repeatedly apply the following process.
a. If the current word is one of the last words, stop with
some probability (say 50%).
b. Generate the list of following words.
c. If that list or vector is empty, stop.
d. Pick randomly from the list of following words.
e. Add that to the sentence and call it the current word.
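The steps above can be sketched as follows. This is one possible reading of the algorithm, not a definitive implementation: it assumes all three inputs are lists of strings, compares words case-sensitively, and keeps duplicate followers so that common followers are chosen more often. The names right-neighbors-sketch, random-list-element, and random-sentence-sketch are hypothetical.

```racket
#lang racket

;;; (right-neighbors-sketch word words) -> list of strings
;;; Returns every word that immediately follows word in words,
;;; duplicates included.
(define right-neighbors-sketch
  (lambda (word words)
    (cond
      [(or (null? words) (null? (cdr words))) '()]
      [(string=? word (car words))
       (cons (cadr words) (right-neighbors-sketch word (cdr words)))]
      [else (right-neighbors-sketch word (cdr words))])))

;;; (random-list-element lst) -> any?
;;; lst : list? (nonempty)
(define random-list-element
  (lambda (lst)
    (list-ref lst (random (length lst)))))

;;; (random-sentence-sketch all-words start-words end-words) -> string?
;;; Builds a sentence by repeatedly choosing a random follower of the
;;; current word.
(define random-sentence-sketch
  (lambda (all-words start-words end-words)
    (letrec ([go (lambda (current so-far)
                   (let ([sentence (cons current so-far)]
                         [followers (right-neighbors-sketch current all-words)])
                     (if (or (null? followers)             ; step c
                             (and (member current end-words)
                                  (zero? (random 2))))     ; step a, 50%
                         (string-join (reverse sentence) " ")
                         (go (random-list-element followers) ; steps d, e
                             sentence))))])
      (go (random-list-element start-words) '()))))
```

Because stopping depends on chance, a text in which words follow each other in a cycle with no end word could keep this sketch running for a long time; a length limit would be one way to guard against that.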
Part six: Freestyle
There are two options for your freestyle; you only need to complete one of them.
a. You've explored a variety of issues in analyzing and generating text. It's now time to explore creative ways to use what you have learned. Choose any combination of techniques you've learned (random sentence generation, random phrase generation with words by syllable, rhyming, Mad libs, etc.) and document and write a procedure that generates a kind of text of your choice. Make sure to include some "successful" examples of your procedure.
b. Document and write a procedure that attempts to determine how closely a piece of writing matches one of the central poetic forms from poets.org. You'll need to examine syllables per line, rhyming scheme, and such. Include some examples. (You can see the different poetic forms in the list of poems.)
Partial rubric
Redo or above
Submissions that lack any of these characteristics will get an I.
[ ] File named correctly language.rkt
[ ] File includes an appropriate header
[ ] File runs in DrRacket
[ ] If the code references other text files, those files are included in submission.
[ ] If the code references other files, it does so with the base file name, rather than a complex path.
[ ] Part 4: At least 6 examples are given for parts b and c
Meets Expectations or above
Submissions that lack any of these characteristics will get an R.
[ ] Code was reformatted with Ctrl-I before submitting
[ ] All procedures are documented and the documentation is mostly correct
[ ] Procedures that are supposed to return strings, such as `haiku`, return strings rather than using `display`
[ ] Part 1c: Includes Examples of the Haiku in action
[ ] Part 1c: Includes the "\n" at the end of the third line
[ ] Part 1a: Works correctly for different numbers of words in the syllax subvectors
[ ] Part 2: Includes tests for parts b and c
[ ] Part 3a: The `syllables` procedure counts vowel sequences, not just vowels
[ ] Part 4a: Works on words with fewer than three letters
[ ] Part 4g: Includes examples of `abab`
[ ] Part 5: Includes tests for parts a, b, and c
[ ] Part 5d: Works when a word does not have a following word.
[ ] Part 6: Includes examples of the procedure results
Exceeds Expectations
Submissions that lack any of these characteristics will get an M.
[ ] Avoids excessive repeated work
[ ] Variable names are clear
[ ] Every part of every problem is completed
[ ] Part 1a: Works correctly for different numbers of subvectors in the syllax, provided the subvector at index `i` contains only words of `i` syllables. (E.g., if we add six-syllable words, it will, on occasion, include those.)
[ ] Part 3a: The `syllables` procedure has at least two extensions to "count vowel sequences"
[ ] Part 4e: New procedure addresses some of the pairs given in parts b and c
[ ] Part 4g: `abab` procedure ensures words have rhymes
Additional Grading Info
The following syllaxes and helper procedures are used by the auto-grader. If you encounter an error, you may try to debug with this code.
(define number-syllax
  (vector
   (vector)
   (vector "one" "two" "three" "four" "five" "six" "eight" "nine" "ten"
           "twelve")
   (vector "seven" "thirteen" "fourteen" "fifteen" "sixteen" "eighteen"
           "nineteen" "twenty")
   (vector "eleven" "seventeen" "twenty-one")
   (vector)
   (vector)
   (vector "seventy-seven thousand")
   (vector)))

; A syllax to help with phrase-syllables
(define test-syllax
  (vector
   (vector)
   (vector "1" "01" "0001" "00001" "1+0i")
   (vector "2" "02" "0002" "00002" "2+0i")
   (vector "3" "03")
   (vector)
   (vector)
   (vector "6")
   (vector)
   (vector)
   (vector "9")))

; How many syllables in a phrase generated from test-syllax
(define phrase-syllables
  (lambda (phrase)
    (apply + (map string->number (string-split phrase)))))
Copyright © Charlie Curtsinger, Sarah Dahlby Albright, Janet Davis, Nicole Eikmeier, Fahmida Hamid, Titus Klinge, Peter-Michael Osera, Samuel A. Rebelsky, Anya Vostinar, and Jerod Weinman. Selected materials are copyright by John David Stone or Henry Walker and are used with permission.
Unless specified otherwise elsewhere on this page, this work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
This website was built using Jekyll, Twitter Bootstrap, and the Bootswatch Cosmo Theme.