comp2022/2922 Assignment 3 – SAT encodings and CFGs s2 2021
Update Sep 24: There was an error in problem 4. It is now fixed.
Update Sept 18: I made a slight change to Problem 2.4 (the input should be an instance and
an assignment, not a formula and an assignment); I clarified some of the language in other
places.
This assignment is due in week 8 on Sunday 10 October 11:59pm Gradescope will say Sunday
11 October 12:00am
All work must be done individually without consulting anyone else’s solutions in accordance
with the University’s “Academic Dishonesty and Plagiarism” policies.
You will be evaluated not just on the correctness of your answers, but on your ability to present
your ideas clearly and logically. You should always justify your answer unless explicitly asked
not to do so. Your goal should be to convince the person reading your work that your answers
are correct and your methods are sound. For clarifications and more details on all aspects of this
assignment (e.g., level of justification expected, late penalties, repeated submissions, what to do
if you are stuck, etc.) see the Ed post “Assignment Guidelines”.
What to submit:
1. Submit a pdf on GS with all your answers (excluding your implementations).
2. Submit the solutions to Problem 1,3, and 4 on Ed Lessons.
3. Submit your code for Problems 2 and 5 on Ed Lessons. The implementation part of the
assignment will be autograded based on hidden input/output test cases.
Problem 1. (6 marks) Write each of these formulas in CNF, and say in as few
words as possible what they express (e.g., x $ y expresses that ”x and y agree
on their truth-values”).
1. x !¬(y$ z).
2. x $(y$ z)
For each item, 1 mark for the CNF formula (no derivation needed), 2 marks for
succinctly saying what it expresses.
Problem 2. (19 marks)
You work for CliqueFinder, a company that mines social-networks for cliques. A
social-network is modeled by an undirected graph G = (V, E). For instance, ver-
tices may represent people and edges may represent the ”are friends” relation.
A clique of G is a subset C ✓ V of the vertices such that every pair of different
vertices in C are related by an edge. For instance, a clique may represent a group
of people who are all friends with each other. See the appendix for a refresher
on graph concepts.
Clients are interested in decomposing their networks into cliques. So, you define
the following computational problem that you call the exact-clique-cover (ECC)
problem. An input instance of the ECC problem consists of V, E, K where (V, E)
is an undirected graph and K is a natural number with 1 K |V|. Given an
instance, the ECC problem asks (a) to say if there is a partition V = [Ki=1Vi of
1
https://sydney.edu.au/students/academic-dishonesty.html
https://edstem.org/courses/6670/discussion/534640
comp2022/2922 Assignment 3 – SAT encodings and CFGs s2 2021
the vertices into K non-empty parts, such that each part Vi is a clique of G, and
(b) if there is such a partition, to return one such partition, called a solution.
For example, consider the graph (V, E) pictured:
If K = 1 then the instance V, E, K has no solution; if K = 2 then the instance
V, E, K has a solution, e.g., V1 = {1, 2, 6} and V2 = {3, 4, 5}1; if K = 3 then the
instance V, E, K has solutions, e.g., V1 = {1, 2, 6}, V2 = {3, 5} and V3 = {4}; and
similarly if 4 K 6 then the instance V, E, K has solutions.2
You remember studying how to encode facts and reasoning into logic, and so
you decide to see if you can encode this problem into logic and use a SAT solver
to solve it.
For concreteness you should assume that the vertex set V is of the form
{1, 2, · · · , N} for some N � 1.
Your plan is to reduce the ECC problem to the satisfiability problem for CNF
formulas. Namely, find an encoding of an instance V, E, K by a CNF formula
FV,E,K such that:
• there is a solution to the instance V, E, K iff the formula FV,E,K is satisfiable,
• you can convert every satisfying assignment of FV,E,K into a solution for the
instance V, E, K,
• and the size of FV,E,K is bounded by a polynomial in N, K.
Explain what your encoding is and why it is correct. To do this, you should:
1. (12 marks) State all the variables you introduce, and say in plain language
what they are supposed to represent. It should be clear from your de-
scription how an assignment encodes a solution. State all the clauses you
introduce, and justify each with a short sentence in plain language saying
what it captures. For full credit , the size of your formula FV,E,K should be
bounded above by a polynomial in N and K.
1Actually, the only other solution with K = 2 swaps the order, i.e., V1 = {3, 4, 5} and V2 = {1, 2, 6}
2K = 1 asks if the graph itself is a clique; and K = N always has solutions, i.e., each vertex is in its own part.
2
comp2022/2922 Assignment 3 – SAT encodings and CFGs s2 2021
For the level of justification and the formatting, look at the solutions to
similar problems in Tutorial 4, i.e., problems 15 and 16.3
2. (2 marks) Find an asymptotic upper bound on the size of your encoding
formula FV,E,K in terms of N and K, and write it in big-Oh notation. You
can assume that each variable can be stored in constant space.
Implement your encoding using the specified file formats (the formats are de-
scribed below). To do this you should:
3. (4 marks) Write a program that takes as input an instance V, E, K (in the
GRAPH format) and returns its encoding formula FV,E,K (in the DIMACS
formula format).
4. (1 mark) Write a program that takes as input (i) an instance V, E, K (in the
GRAPH file format), and (ii) an assignment of the variables (in the DIMACS
assignment format), and returns the solution that the assignment encodes
(in the ECC format).
Here is a refresher of basic terminology about graphs:
• An undirected graph G is a pair (V, E) where V is a non-empty finite set of
vertices and E ✓ V ⇥V is an irreflexive symmetric relation of edges. Irreflex-
ive means that for every u 2 V, (u, u) 62 E, and symmetric means that for
every u, v 2 V, if (u, v) 2 E then also (v, u) 2 E. In other words, there are
no self loops, and edges are not directed.
• A clique (aka complete-graph) is a subset C ✓ V of the vertices such that
(x, y) 2 E for all x, y 2 C with x 6= y.
• Note that although the empty set of vertices C = ∆ ⇢ V is a clique, the
cliques that make up a solution to the ECC problem are required to be
non-empty.
Problem 3. (6 Marks) Consider the following CFG G:
S ! AT | e
T ! TA | BT | e
A ! a
B ! b
1. (1 Mark) List the variables.
2. (1 Mark) List the terminals.
3. (2 Marks) Find a regular expression R such that L(R) = L(G).
4. (2 Marks) Show that the grammar is ambiguous by giving two leftmost
derivations of the same string.
3Make sure you have the latest version of Tutorial 4 from Ed since it was re-organised in week 6.
3
comp2022/2922 Assignment 3 – SAT encodings and CFGs s2 2021
No additional justifications are needed.
Problem 4. (4 Marks) Sep 24: There was an error in this question. It is now
fixed.
Let L be the language over the alphabet S = {a, b, c} consisting of all strings of
the form ucv where u 2 {a, b}⇤, and |v| is the number of a’s in u.
For example, the following strings are in L: c, bbbc, aacbb, babbacab, abaccc, and
the following strings are not in L: e, aabb, baacbba, aabcaab, abbacccc.
Here is a grammar G that generates the language L:
S ! missingstring (1)
T ! a | b | c (2)
B ! Bb | e (3)
1. (2 marks) Fill in the missing string. The string can mention terminals, vari-
ables, and up to one |.
2. (2 marks) Give one or two sentences explaining your answer.
Problem 5. (15 Marks) Implement a program that given a CFG G = (V, S, R, S)
in CNF and a string u 2 S⇤ as input, returns the number of leftmost derivations
of u by the grammar G. Note that this number could be 0.
• Input: is a context-free grammar in Chomsky normal-form followed by a
sequence of input strings. Input is read from standard input (stdin).
• Output: One line per input string, giving the string, a comma, and then the
number of leftmost derivations.
Examples of usage, and of input and output are provided below.
Marks are assigned based on the proportion of test-cases that are passed.
A Appendix: Input/Output for the encoding problem
GRAPH file format
GRAPH is a file format for instances of the ECC problem.
Here is a typical GRAPH file:
c
c this is a comment
c
g 6 8 2
1 2
1 6
2 6
2 3
4
comp2022/2922 Assignment 3 – SAT encodings and CFGs s2 2021
5 6
3 5
3 4
4 5
The file corresponds to the pictured graph, as well as the fact that the number of desired cliques
is 2.
The format is as follows. At the top you can have comment lines that start with a ”c”, like this:
c This line is a comment.
Then comes the problem line, which starts with a ”g” and then states the number of vertices, then
number of edges, and the desired clique-cover size. In the example, there are N = 6 vertices, 8
edges, and K = 1.
Finally, the edges are listed. Each edge is represented as a pair of numbers i, j such that 1 i <
j N. For instance a line with 3 5 says that there is an undirected edge between vertex 3 and
vertex 5.
ECC file format
ECC is a format for solutions of the exact-clique-cover problem.
Here is a typical ECC solution file:
2
1 2 6
3 4 5
The first line lists the number of cliques. This is also the number of remaining lines. Each
successive line is a list of vertices. This file says that the two disjoint cliques that cover the
vertices are the sets V1 = {1, 2, 6} and V2 = {3, 4, 5}.
If there is no solution, then the solution file should be
2
NO SOLUTION
DIMACS formula format
DIMACS is a file format for CNF formulas.
Here is a typical DIMACS formula file:
5
comp2022/2922 Assignment 3 – SAT encodings and CFGs s2 2021
c
c start with comments
p cnf 5 3
1 -5 4 0
-1 5 3 4 0
-3 -4 0
This file corresponds to the following formula over variables x1, x2, x3, x4, x5:
(x1 _ ¬x5 _ x4) ^ (¬x1 _ x5 _ x3 _ x4) ^ (¬x3 _ ¬x4)
The format is as follows. At the top you can have comment lines that start with a ”c”, like this:
c This line is a comment.
Then comes the problem line, which starts with a ”p” and then says how many variables and
clauses there are. For instance, here is a problem line that says this is a CNF problem with 5
variables and 3 clauses:
p cnf 5 3
Finally the clauses are listed. Each clause is represented as a list of numbers like 3 and -4. A
positive number like 3 represents a positive occurrence of variable 3. A negative number like -4
represents a negated occurrence of variable 4. The number 0 is treated in a special way: it is not
a variable, but instead marks the end of each clause. This allows a single clause to be split up
over multiple lines.
DIMACS assignment format
DIMACS is also a file format for assignments in propositional logic. Here is a typical DIMACS
assignment file:
SAT
1 -2 -3 4 5
It says that the formula is satisfiable, and a satisfying assignment maps x1, x4, x5 to 1 and x2, x3
to 0.
If you run ”minisat formula answer” where ”formula” is a DIMACS formula file, minisat will
produce a DIMACS assignment file called ”answer”. If the formula is not satisfiable, then the
”answer” file will be
UNSAT
B Appendix: Input/Output for CFG problem
The input is a sequence of lines:
1. A comma separated list of variable symbols
2. A comma separated list of terminal symbols
6
comp2022/2922 Assignment 3 – SAT encodings and CFGs s2 2021
3. The start variable
4. One or more lines of the form:
• A -> B C
• A -> a
• A -> epsilon
5. The string end
6. One or more lines, each consisting of a non-empty input string.
7. The string end.
The output is a sequence of lines, one for each of the input strings, with a number, i.e., the
number of leftmost derivations of that input string by the grammar.
For example:
A, B , C,D, S , T
a , b
T
T �> A B
T �> B A
T �> S S
T �> A C
T �> B D
T �> eps i lon
S �> A B
S �> B A
S �> S S
S �> A C
S �> B D
C �> S B
D �> S A
A �> a
B �> b
end
ab
a
abab
end
represents the grammar:
hTi ! hAi hBi | hBi hAi | hSi hSi | hAi hCi | hBi hDi | e
hSi ! hAi hBi | hBi hAi | hSi hSi | hAi hCi | hBi hDi
hCi ! hSi hBi
hDi ! hSi hAi
hAi ! a
hBi ! b
7
comp2022/2922 Assignment 3 – SAT encodings and CFGs s2 2021
and input strings ab, a, abab.
and the expected output is
ab , 1
a , 0
abab , 2
because ab has one leftmost derivation, a has no leftmost derivations, and abab has two leftmost
derivations.
8