Q1. HTML Correctness Checker:
Implement the two versions of the program (one in C and one in Haskell), covering as much of the specification below as you have time to complete. Use only the standard C and Haskell tools: do not add any third-party libraries or write code that will not work with the standard tools. (For Haskell, see chapter 9 of “Learn You a Haskell” if you need help with reading from a file.)
The amount of code required for each program should be relatively small: just enough to produce a working answer. The code should be well designed and well written.
Each program should read its input from a file containing the HTML document to check, in plain text format (i.e., created using a normal editor such as Visual Studio Code), and:
• Confirm that each tag is a valid HTML tag (see below for which tags).
• Check that each tag has a corresponding closing tag if needed.
• Check that the tags are nested properly.
• Check that there are <html> and </html> tags around the entire document.
• Check that there is a single
If an error is found, an error message should be displayed and the program should stop, so only the first error found is reported. If no errors are found, the program should display a simple message saying that the HTML is correct.
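The checks listed above can be sketched in C with a stack of open tags. This is a minimal sketch, not a reference solution: it assumes the tag names have already been extracted from file.html into an array of strings such as "p" and "/p", with attributes removed, and the names check_tags, is_valid_tag, is_void_tag, inside_p and MAX_DEPTH, as well as the error messages, are hypothetical. Each closing tag must match the tag on top of the stack, so missing and badly nested tags are both caught, and returning the first message found follows the rule that only the first error is reported. <br> and <hr> are treated as the only tags without a closing tag, and the rule that <p> and <div> cannot appear inside a <p> is included; the head and title placement rules are not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 256  /* hypothetical limit on nesting depth */

/* The mandatory subset of tags from the specification. */
static const char *const VALID_TAGS[] = {
    "html", "body", "title", "h1", "h2", "h3",
    "p", "ul", "li", "a", "div", "br", "hr"
};

static bool is_valid_tag(const char *name) {
    for (size_t i = 0; i < sizeof VALID_TAGS / sizeof VALID_TAGS[0]; i++) {
        if (strcmp(name, VALID_TAGS[i]) == 0)
            return true;
    }
    return false;
}

/* <br> and <hr> never take a closing tag. */
static bool is_void_tag(const char *name) {
    return strcmp(name, "br") == 0 || strcmp(name, "hr") == 0;
}

/* True if any currently open tag on the stack is a <p>. */
static bool inside_p(const char *const stack[], size_t depth) {
    for (size_t i = 0; i < depth; i++) {
        if (strcmp(stack[i], "p") == 0)
            return true;
    }
    return false;
}

/* Checks a sequence of tag names such as "html", "p", "/p".
   Returns NULL if the sequence is correct, otherwise a message
   describing the first error found. */
const char *check_tags(const char *const tags[], size_t n) {
    assert(tags != NULL || n == 0);
    const char *stack[MAX_DEPTH];
    size_t depth = 0;

    if (n == 0 || strcmp(tags[0], "html") != 0)
        return "document must start with <html>";

    for (size_t i = 0; i < n; i++) {
        bool closing = tags[i][0] == '/';
        const char *name = closing ? tags[i] + 1 : tags[i];

        if (!is_valid_tag(name))
            return "invalid tag name";
        if (is_void_tag(name)) {
            /* Assumption: an explicit </br> or </hr> is reported as an error. */
            if (closing)
                return "<br> and <hr> do not take a closing tag";
            continue;                 /* nothing to push or pop */
        }
        if (closing) {
            if (depth == 0)           /* defensive guard before reading the stack */
                return "closing tag has no matching opening tag";
            if (strcmp(stack[depth - 1], name) != 0)
                return "tags are not nested properly";
            depth--;
            /* The first tag is <html>, so depth 0 means </html> was just seen. */
            if (depth == 0 && i != n - 1)
                return "tags found after </html>";
        } else {
            if ((strcmp(name, "p") == 0 || strcmp(name, "div") == 0) &&
                inside_p(stack, depth))
                return "<p> or <div> nested inside <p>";
            if (depth == MAX_DEPTH)
                return "tags nested too deeply";
            stack[depth++] = name;
        }
    }
    return depth == 0 ? NULL : "missing closing tag";
}
```

A main function could print the returned message, or a success message when check_tags returns NULL, and then exit.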
Constraints:
• You must work with the following subset of HTML tags only, not every tag!
This mandatory subset is {html, body, title, h1, h2, h3, p, ul, li, a, div, br, hr}.
Don’t include any other tags.
• A <p> tag and a <div> tag cannot be nested inside a <p> tag. A
• <br> is one of the few tags that doesn’t have a closing tag, so documents can contain just <br>. Similarly for <hr>.
• Any attributes in an opening tag are ignored. For example, in
• The
and not in the body section between the body tags.
• A DOCTYPE at the start of the document is not required.
• Assume there are no comments (<!-- -->) or entities (e.g., &lt; to represent <) in the HTML document.
• Assume that the ‘<’ and ‘>’ characters are used only as part of the HTML tags, and do not appear in the text within p, h1, h2, h3, or any other sections, or in attribute values.
• The input file name is fixed as ‘file.html’. You do not need to add code to prompt for and read in a file name; the program always reads from file.html.
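Because the file name is fixed, the C version can read file.html in one step before any checking starts. The sketch below assumes the whole document fits in memory; read_whole_file is a hypothetical helper built only on fopen, fgetc, malloc and realloc from the C standard library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole of the named file into a NUL-terminated buffer.
   Returns NULL if the file cannot be opened or memory runs out.
   The caller must free() the result. */
char *read_whole_file(const char *path) {
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    size_t cap = 1024, len = 0;
    char *buf = malloc(cap);
    int c;
    while (buf != NULL && (c = fgetc(fp)) != EOF) {
        if (len + 1 == cap) {            /* keep room for the final '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                buf = NULL;
                break;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    fclose(fp);
    if (buf != NULL)
        buf[len] = '\0';
    return buf;
}
```

For example, main could call read_whole_file("file.html") and print an error message if it returns NULL.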
An HTML tag is what appears between the angle brackets, for example
or
. A closing tag has a ‘/’ character before the tag name, for example
or
.
<br> does not need a matching closing tag </br>; likewise for <hr>.
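Since attributes in an opening tag are ignored, a checker only needs the name at the start of each tag and whether it begins with '/'. A minimal sketch, assuming the caller passes a pointer to the character just after '<'; extract_tag_name is a hypothetical helper. The name is taken to end at the first whitespace character or '>', which is enough here because the specification guarantees that '<' and '>' do not appear inside attribute values. Tag names are compared exactly as written, with no case folding.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Copies the tag name from the text that follows a '<' into name,
   skipping any attributes, and sets *closing if the tag starts with '/'.
   For example, given: a href="https://www.google.com">  (an illustrative
   link) the name is "a"; given: /p>  the name is "p" and *closing is true.
   Returns false if the name is empty or does not fit in size bytes. */
bool extract_tag_name(const char *after_lt, char *name, size_t size,
                      bool *closing) {
    assert(after_lt != NULL && name != NULL && size > 0 && closing != NULL);
    *closing = (*after_lt == '/');
    if (*closing)
        after_lt++;

    size_t len = 0;
    /* The name ends at the first whitespace character or '>'. */
    while (*after_lt != '\0' && *after_lt != '>' &&
           !isspace((unsigned char)*after_lt)) {
        if (len + 1 >= size) {
            name[len] = '\0';
            return false;             /* name too long for the buffer */
        }
        name[len++] = *after_lt++;
    }
    name[len] = '\0';
    return len > 0;
}
```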
An example of a complete valid HTML document based on the specification above is:
A paragraph with words and sentences.
A SubTitle
More text and a link: Google
-
List item 1
-
List item 2
The class and href words are attribute names followed by an attribute value in quotes. These attributes and values should be skipped over when checking the tags, meaning for example that
Some examples of invalid sections within a document are listed here (these examples are not complete documents and are not an exhaustive list):
1. Invalid nesting
as the closing
tag is not properly nested inside the enclosing
…
tags.
2. p incorrectly nested in another p
3. div incorrectly nested in a p
4. Invalid tag name
5. Missing closing tag
The
between the
has no matching
.
6. Missing opening tag
The
between the
has no matching
.
7. Missing head section
8. Missing angle bracket
9. Invalid use of ‘<’ or ‘>’ characters
This is some text using these characters, < and >.
The angle bracket characters should only be used for tags.
10. title included within the body section
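Examples 8 and 9 can be caught before any tag names are checked, by scanning the document once and tracking whether the current position is inside a tag. This is a sketch under two assumptions: a '<' must be followed directly by a tag name (so "< and" is treated as misused text), and the function name check_angle_brackets and its messages are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Checks that '<' and '>' appear only as tag delimiters.
   Returns NULL if they do, otherwise a message for the first problem. */
const char *check_angle_brackets(const char *doc) {
    assert(doc != NULL);
    bool in_tag = false;   /* true between a '<' and its matching '>' */

    for (const char *c = doc; *c != '\0'; c++) {
        if (*c == '<') {
            if (in_tag)
                return "missing '>' before the next '<'";
            /* Assumption: a tag name follows '<' directly, so "< and" is text. */
            if (c[1] == '\0' || isspace((unsigned char)c[1]))
                return "'<' used outside a tag";
            in_tag = true;
        } else if (*c == '>') {
            if (!in_tag)
                return "'>' used outside a tag (missing '<'?)";
            in_tag = false;
        }
    }
    return in_tag ? "missing '>' at the end of the document" : NULL;
}
```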