CS 3304_Programming_Project#3
Programming is 10% science, 20% ingenuity, and
70% getting the ingenuity to work with the science.
Unknown Programming Wisdom
Dr. T.’s Programming Project #3
Sanitizing Social Network Postings
Assigned November 13th; Due December 2nd, 11:59pm
Video Instructions
Disclaimer: This project cannot be completed in one all-nighter, even by a programming genius;
the due date cannot be extended under any circumstances.
Part I: Implementing the ScalaH tool
Background: The Java development ecosystem used to have the command line tool,
javah. It would take a binary class file as input, and generate a Java Native Interface
(JNI) header (.h) file for all the native methods contained in the input class file. Then the
powers that be in Oracle decided to deprecate javah and replace it with a command line
-h switch to the Java compiler, javac. Executing javac -h dir sourceFile.java now
generates the JNI header file for all the native methods declared in sourceFile.java and
places the generated header file into the dir directory. The problem is that now it has
become impossible to generate JNI header files for languages other than Java that are
also compiled into the JVM class format. One of these languages is Scala.
You are to implement a command line tool called ScalaH that would take a class file
generated by the Scala compiler, scalac, and generate the JNI header file for all the
native methods contained in that class file (those that are annotated with the @native
annotation in the Scala source file). Implement your solution as either a Java or Scala
program. You can make your implementation as complex and sophisticated as you like,
but one workable approach is the heuristic below, which makes heavy use of the
java.util.spi.ToolProvider API. Suggested heuristic:
1.) Invoke javap (Java class disassembler) programmatically, so the output is
redirected to a disk file.
2.) Read in the redirected javap output and generate a surrogate Java source file of
the decompiled class that contains only the native methods. Note that native
methods are declared without a body, much like abstract methods, because they are
implemented natively in C. Also note that the output of javap may need to be
adjusted before javac will compile it.
3.) Programmatically compile the generated source file with the flag -h, so the Java
compiler javac would generate the JNI C header file.
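The heuristic above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: the class name ToolSketch and the helpers runTool and nativeDeclarations are hypothetical names chosen for illustration, java.lang.Object merely stands in for a scalac-generated class, and the javap listing is captured in memory rather than redirected to a disk file.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import java.util.spi.ToolProvider;
import java.util.stream.Collectors;

class ToolSketch {
    // Runs a JDK tool (e.g. "javap" or "javac") in-process and returns what it printed to stdout.
    static String runTool(String name, String... args) {
        ToolProvider tool = ToolProvider.findFirst(name)
                .orElseThrow(() -> new IllegalStateException(name + " not found; a full JDK is required"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int exitCode = tool.run(outWriter, errWriter, args);
        outWriter.flush();
        errWriter.flush();
        if (exitCode != 0) {
            throw new IllegalStateException(name + " exited with " + exitCode + ": " + err);
        }
        return out.toString();
    }

    // Keeps only the member declarations that javap marks with the native modifier.
    static List<String> nativeDeclarations(String javapOutput) {
        return javapOutput.lines()
                .map(String::trim)
                .filter(line -> line.endsWith(";"))
                .filter(line -> Arrays.asList(line.split("\\s+")).contains("native"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Step 1: disassemble. java.lang.Object is only a stand-in (assumption) for your class file.
        String target = args.length > 0 ? args[0] : "java.lang.Object";
        String listing = runTool("javap", "-p", target);
        // Step 2 (in part): pick out the native declarations for the surrogate source file.
        nativeDeclarations(listing).forEach(System.out::println);
        // Step 3 would resemble: runTool("javac", "-h", outputDir, "-d", outputDir, surrogateSource);
    }
}
```

To follow step 1 literally, the returned listing could be written to disk (for example with Files.writeString) before it is parsed. The declarations still need the adjustments mentioned in step 2 before javac will accept them.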
Your program takes as input the class file generated by the Scala compiler.
Note: place the generated header file in the same directory as the input class file.
Please make sure that your implementation is robust and bug-free, as you will need it to
complete Part II of the project.
Part II: Sanitizing Social Network Postings
To understand the motivation behind Part II, please watch the following YouTube videos:
You work for a major job agency. Among your clients is an assortment of village idiots; they post
each and every facet of their daily existence on a certain Social Network, without any regard for
how these postings will affect them in the future. As a result, too many of your clients bombed
their job interviews because of some social media indiscretions, thus costing your company
business. The management has decided to remediate the situation by providing a service to the
clients to sanitize their questionable postings in preparation for important job interviews. Asking
a client to delete all of their postings would be unnecessary and suspicious, so you want to
identify those postings that look “questionable” and would be seen by your client’s interviewer.
Then, your client can examine these postings to see if they indeed need to be deleted before a
particular interview. Fortunately for you, the Social Network’s privacy is very lax: all the events
(i.e., who friended whom, postings and their comments/likes, etc.) can be easily screen-scraped
from a certain website and stored in csv files; someone on your team has already implemented
this functionality so you can rely on it, as explained below.
Furthermore, the social network’s visibility rules make a user’s posts visible to:
1.) Friends
2.) Friends of friends, defined transitively (i.e., friend of friend of friend…)
3.) Some users may have restricted the visibility of their posts as detailed below.
We want to check only the posts that are visible to the interviewer.
For the purposes of this project, “a questionable post” satisfies one of the following criteria:
1.) Contains the word “drinking”, “wasted”, “snorted”, or “bong” and
has been liked a number of times that exceeds 20% of the number of the poster’s friends
2.) Has been commented on a number of times that exceeds 30% of the number of the poster’s friends
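The two criteria can be cross-checked by hand with a small worked example. The Java sketch below is only an illustration for verifying expected results; in the project itself this logic belongs in the Prolog layer. It assumes that "exceeds" means strictly greater than, that criterion 1 requires both the word and the like threshold, and that a flagged word counts when it appears anywhere in the message regardless of case. The names QuestionableSketch, exceedsShare, containsFlaggedWord, and isQuestionable are hypothetical.

```java
import java.util.List;
import java.util.Locale;

class QuestionableSketch {
    static final List<String> FLAGGED_WORDS = List.of("drinking", "wasted", "snorted", "bong");

    // True when count is strictly greater than percent% of friends (integer arithmetic, no rounding).
    static boolean exceedsShare(int count, int friends, int percent) {
        return count * 100 > percent * friends;
    }

    // Assumption: a flagged word counts if it appears anywhere in the message, ignoring case.
    static boolean containsFlaggedWord(String message) {
        String lower = message.toLowerCase(Locale.ROOT);
        return FLAGGED_WORDS.stream().anyMatch(lower::contains);
    }

    // Criterion 1 (flagged word AND likes above 20%) or criterion 2 (comments above 30%).
    static boolean isQuestionable(String message, int likes, int comments, int friends) {
        boolean byLikes = containsFlaggedWord(message) && exceedsShare(likes, friends, 20);
        boolean byComments = exceedsShare(comments, friends, 30);
        return byLikes || byComments;
    }

    public static void main(String[] args) {
        // With 10 friends: 2 likes is exactly 20% (not enough), 3 likes exceeds it.
        System.out.println(isQuestionable("I am so wasted!", 2, 0, 10)); // false
        System.out.println(isQuestionable("I am so wasted!", 3, 0, 10)); // true
        // No flagged word, but 4 comments exceed 30% of 10 friends.
        System.out.println(isQuestionable("Prolog is now my favorite language!", 0, 4, 10)); // true
    }
}
```

The third call shows how a harmless post can still be flagged through criterion 2 alone.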
Architecture
Your solution will comprise three layers, written in different languages: Layer 1: a Scala
functional data processing engine; Layer 2: a C++ module that calls Prolog programmatically;
Layer 3: Prolog knowledge bases, one static and another generated at runtime. The Scala layer
will read the input files and generate a Prolog knowledge base. Then the Scala layer would
invoke the C++ layer by means of Scala JNI, so the C++ layer could programmatically invoke
the static and the generated Prolog knowledge bases. Your design should highlight the
strengths and minimize the weaknesses of functional and logic languages and paradigms. Your
solution will read in a collection of .csv files and will output a file containing the ids and
messages of the questionable posts.
The diagram below demonstrates a high-level interaction between the language modules.
Important: All the analysis functionality should be pushed onto the Prolog layer, so it could be
displayed for subsequent analysis for the interested parties. In other words, if someone
challenges you “show me how your system has arrived at this conclusion!”, you should be able
to demonstrate the actual logic that produced the results. Indeed, for non-CS but technically
savvy people, Prolog should be easier to understand than either Scala or C++. Please, notice
that this requirement is important and would incur a heavy points deduction if not fulfilled.
Recommended Implementation Strategy
You can follow this implementation strategy if you find it helpful.
1.) Write the Scala part that reads in the input .csv files and generates a Prolog knowledge
base with the corresponding facts.
2.) Write the sanitize.pl module that works with the generated knowledge bases to flag the
suspicious post ids and messages.
3.) Write the C++ module that invokes sanitize.pl and a generated knowledge base
programmatically. This module will be invoked from Scala by means of JNI.
4.) Write a native method in your Scala layer that invokes the C++ module.
5.) Use the ScalaH utility from Part 1 of the project to generate the .h header for your native
method. The header will contain the JNI method(s) that you will need to implement.
6.) Include the generated C++ header into your C++ module and implement the JNI method,
which will call your earlier implemented functionality for invoking Prolog
programmatically. This method would need to translate between regular C++ data
structures and those used by JNI.
7.) Write the run target of your Makefile for testing your solution. Make sure that the run
target correctly sets all the environment variables that you need to run your solution.
If this implementation strategy does not make sense to you for whatever reason, please feel
free to follow your own strategy. As long as you produce a solution that satisfies the
requirements, it should not make any difference how you arrived at your solution.
Technical Details
The network sanitizer can be implemented as Prolog knowledge bases, specified in two
separate files, containing: (1) facts about a client’s interaction with the social network
(web-scraped and saved in a collection of .csv files for your convenience), and (2) predicates
and rules that can be used for flagging the suspicious posts.
I. Assume that the files are named “joe_info.pl” (generated at runtime from .csv input files) and
“sanitize.pl” (written by hand and shared by all test cases), respectively.
Executing the following query from the Prolog prompt would then produce:
?- setof(Pair, toBeCheckedPost(joe, rick, Pair), S).
S = [(p2, 'I am so wasted!'), (p3, 'This is the biggest bong I have ever smoked!'), (p7,
'Prolog is now my favorite language!')].
These posts will need to be examined and potentially deleted, before joe schedules his
interview with rick. Notice how the built-in setof predicate produces the results at once without
duplication. Obviously, p7 is a false positive that would be dismissed by a human checker. Of
course, in your solution joe_info.pl will be generated at runtime, and both joe_info.pl and
sanitize.pl will be invoked from Scala via C++.
For each test case, you will be given a collection of input .csv files. Please, feel free to define
the Prolog knowledge base for your solution any way you like. But if you are unsure which facts
and rules you would need to implement your solution, consider the following facts that can be
used to hold the data in the provided input .csv files:
%user1 friended user2
friended(user1, user2).
user1 → user name (unique)
user2 → user name (unique)
post(id, user, msg).
id → the unique id for this post
user → the user’s name of this post
msg → the textual message of this post
like(user, postId).
user → the name of the user who liked this post with this postID
postId → the unique id of the liked post
comment(user, postId, msg).
user → the name of the user who made this comment
postId → the unique id of the post being commented on
msg → the actual text of the comment
permission_mfo(user).
“mfo” stands for “my friends only”.
When this user has defined this permission, only the user’s direct friends can see posts of
this user, thus overriding the default transitive visibility.
permission_exclude(user1, user2).
user1 has excluded user2 from seeing its own postings and those user1 gets to see through
its connections. In other words, even though user1 and user2 are friends, user2 does not
get to see any additional postings due to this friendship. However, user2 and user2’s
friends may still be able to see user1’s postings through some other connections, such as
being direct or transitive friends of user1.
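As an illustration of how rows of input data could become facts of the shapes listed above, here is a minimal Java rendering of the idea (the project's own generator lives in the Scala layer). It assumes the fields of a row have already been split and arrive in the same order as the fact's arguments, and it quotes every argument so that messages with spaces or punctuation stay valid Prolog atoms. The names FactSketch, quoteAtom, and fact are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class FactSketch {
    // Wraps a value in single quotes, escaping backslashes and single quotes for a Prolog quoted atom.
    static String quoteAtom(String value) {
        String escaped = value.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }

    // Builds one fact line such as post('p2','joe','I am so wasted!').
    static String fact(String functor, String... args) {
        return Arrays.stream(args)
                .map(FactSketch::quoteAtom)
                .collect(Collectors.joining(",", functor + "(", ")."));
    }

    public static void main(String[] args) {
        // Field order is an assumption; check it against the actual .csv columns.
        System.out.println(fact("friended", "joe", "rick"));
        System.out.println(fact("post", "p2", "joe", "I am so wasted!"));
        System.out.println(fact("permission_mfo", "joe"));
    }
}
```

In Prolog the quoted atom 'joe' is the same atom as joe, so quoting every argument does not affect queries written with unquoted names.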
In essence, you would need to define Prolog rules that would make it possible to execute the
following query:
toBeCheckedPost(User, Interviewer, (ID, M)).
User → the variable identifying your social network’s client
Interviewer → the variable identifying the user who plans to interview your client
Notice, it is assumed that the Interviewer is also
a member of your social network
(ID, M) → a tuple, where ID is the variable that stands for the postIDs of the
questionable posts that need to be further examined by a human analyst, and M is the
corresponding message.
You can download the input files here.
Your solution will be invoked via the run parameters to the Makefile as follows:
make candidate=name1 interviewer=name2 CSVDir=dirName run
Where name1 is the name of the candidate to be interviewed, name2 is the name of the
interviewer, and the dirName is a directory that contains a collection of .csv files, e.g.,
make candidate=ellen interviewer=rich CSVDir=input_ellen run
will create a disk file named ellen_rich.csv
The input .csv files are named based on the following convention:
friended.csv — contains the data that can be stored in friended facts
posts.csv — contains the data that can be stored in post facts
comments.csv — contains the data that can be stored in comment facts
likes.csv — contains the data that can be stored in like facts
p_exclude.csv — contains the data that can be stored in permission_exclude facts
p_mfo.csv — contains the data that can be stored in permission_mfo facts
To test your solution, follow these test cases:
make candidate=joe interviewer=rick CSVDir=input_joe run
Should generate a file named joe_rick.csv containing:
p2,'I am so wasted!'
p3,'This is the biggest bong I have ever smoked!'
p7,'Prolog is now my favorite language!'
https://drive.google.com/file/d/1IIdM5rE6xR3EKJiani2kqHT9TrVrN968/view?usp=sharing
make candidate=li interviewer=jose CSVDir=input_li run
Should generate a file named li_jose.csv containing:
p2,'I am so wasted!'
p3,'This is the biggest bong I have ever smoked!'
p6,'No way the semester is almost over!'
p7,'Prolog is now my favorite language!'
p9,'Scala Prolog and now what?!'
make candidate=li interviewer=annie CSVDir=input_li run
Should generate a file named li_annie.csv containing:
Looking good!
make candidate=shivani interviewer=rich CSVDir=input_shivani run
Should generate a file named shivani_rich.csv containing:
p2,'I am so wasted!'
p3,'This is the biggest bong I have ever smoked!'
p7,'Prolog is now my favorite language!'
p9,'Teamwork overrated!'
make candidate=shivani interviewer=ali CSVDir=input_shivani run
Should generate a file named shivani_ali.csv containing:
Looking good!
make candidate=ellen interviewer=rich CSVDir=input_ellen run
Should generate a file named ellen_rich.csv containing:
p2,'I am so wasted!'
p3,'This is the biggest bong I have ever smoked!'
p6,'No way the semester is almost over!'
p9,'We are a great team!'
make candidate=ellen interviewer=ali CSVDir=input_ellen run
Should generate a file named ellen_ali.csv containing:
Looking good!
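The output convention used in the test cases above can be summarized in a short Java sketch. It is only an illustration under assumptions: the messages are wrapped in plain ASCII single quotes, lines are joined with a newline and no trailing newline is added, and the names ReportSketch, fileName, and render are hypothetical. Which layer writes the file is left to your design.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ReportSketch {
    // Name of the report file for a candidate/interviewer pair, e.g. joe_rick.csv.
    static String fileName(String candidate, String interviewer) {
        return candidate + "_" + interviewer + ".csv";
    }

    // Each entry maps a post id to its message; an empty list yields the "Looking good!" line.
    static String render(List<Map.Entry<String, String>> flagged) {
        if (flagged.isEmpty()) {
            return "Looking good!";
        }
        return flagged.stream()
                .map(post -> post.getKey() + ",'" + post.getValue() + "'")
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> flagged = List.of(
                Map.entry("p2", "I am so wasted!"),
                Map.entry("p7", "Prolog is now my favorite language!"));
        System.out.println(fileName("joe", "rick"));
        System.out.println(render(flagged));
        System.out.println(render(List.of()));
    }
}
```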
Technical Advice
Prolog
In this project, we will use SWI-Prolog, which is installed on rlogin, and can be invoked via the
command swipl. You may want to consider installing Prolog on your laptop, but please, make
sure that your solution works on rlogin.
Your sanitize.pl module will contain several rules that should make it possible to execute the
toBeCheckedPost query described above. How you name your intermediate rules and what
they are is entirely up to you. In terms of design, you need to be able to determine (1) whether
the interviewer would be able to see the posts from your client AND (2) which of your client’s posts
are suspicious. To answer each of these questions, you’d likely want to first create rules to
answer some sub-questions. For example, can user A see the posts of user B? Do this message’s
text, number of likes, or number of comments raise suspicion?
To invoke your Prolog code from C++ programmatically, you would probably want to create a
wrapper rule that takes a user name, an interviewer name, and a list variable, to which the
answer is bound, e.g., to_check(User, Interviewer, AnswerList) :- …
C++
Your C++ code will be compiled as a Unix dynamic library (.so object) that would contain both
Prolog and JNI dependencies. This layer will bridge Scala and Prolog. Since Scala does not
have its own native interface but compiles to Java bytecode, Scala can use the Java Native
Interface (JNI) to interact with C/C++ code. Shared objects can be tricky to work with: to run
your solution, you need to set both the LD_LIBRARY_PATH and LD_PRELOAD
environment variables. LD_LIBRARY_PATH is where the loader looks for your .so file; set this
variable to your current directory. LD_PRELOAD specifies which other dynamic libraries need to
be preloaded into memory before your .so is loaded. These libraries are
/snap/swi-prolog/current/usr/lib/libswipl.so and
/var/lib/snapd/snap/core18/current/lib/x86_64-linux-gnu/libtinfo.so.5
Troubleshooting
1. How to fix the error below that I receive when I run the Scala program?
If you run your solution outside your Makefile, you need to configure the running
environment in your working directory whenever you log in to rlogin by setting the
LD_LIBRARY_PATH and LD_PRELOAD environment variables.
2. How to fix the error below when I run a .pl file?
swipl
cannot open path of the current working directory: Permission denied
SWI-Prolog is a snap app. Snap seems to require that your current working directory
have group read and execute (rx) permissions: chmod g+rx .
3. How to fix the error below when I call a C++ function from Scala?
You need to declare your function name in your Scala file, for example:
@native def callProlog_functionName(s: String): Array[String]
4. Your program crashes with a strange error message
Put these two lines before you start invoking any Prolog API calls:
const char* dir = "SWI_HOME_DIR=/snap/swi-prolog/current/usr/lib/swipl/";
putenv((char*)dir);
Part III: Write a blog post to reflect on your experiences
Write something interesting about your experiences of working on this project.
Grade breakdown
10% ScalaH tool: it must correctly generate a JNI header file given a class file as input.
50% Correctness: we will test the correctness of your solution based on the provided input and
also some student-provided test cases.
30% Adherence to the requirements, broken down as follows:
20% Leveraging the respective strengths of each language: We understand that you can
parse files, do calculations, and return results by just using a single Scala module
without any C++ or Prolog code, but this is not the goal of this project. You are to
demonstrate the achieved mastery in each programming paradigm and in combining
multiple languages within a single solution.
10%: adhering to functional and logic programming styles:
To get all of these points, you must not use any nonfunctional features as mentioned in
the spec AND demonstrate meaningful usage of functional programming features.
10% Blog
Submission Requirements
Part I:
You can follow any naming scheme you want, as we will test your submission by using your
Makefile. Your Makefile should be able to build your solution from scratch, including the
generation of the header file by means of your ScalaH tool. Your solution will also be tested by
using the run target of your Makefile, following the format specified above.
Part III:
The document for your blog (be it the blog itself or its URL).
Name it: Part3_PID1_PID2.pdf
Include all the source files in a zip archive, named PID1_PID2.zip and submit only this zip to
Canvas. Please, one submission per team as always.