CSCI 4144/6405 – Data Mining and Data Warehousing
Assignment 4: Association Rule Mining
Due: 11:55pm, Apr. 3 (Sunday), 2022
TA Office Hours via MS Teams (Specifically, via the channel “Office Hour – TA”):
• Mondays (11:00am-11:59am, Java and Python): -MacLeod
• Tuesdays (11:00am-11:59am, Python Only):
• Thursdays (11:00am-11:59am, Java and Python):
• Fridays (11:30am-12:30pm, Java and Python):
1. Assignment Overview
In this assignment, you need to write a program that uses the Apriori algorithm to generate frequent itemsets and then outputs all strong association rules. The main objective of this assignment is to familiarize yourself with association rule mining.
2. Important Note
There is a zero-tolerance policy on academic offenses such as plagiarism or inappropriate collaboration. By submitting your solution for this assignment, you acknowledge that the code submitted is your own work. You also agree that your code may be submitted to plagiarism detection software (such as MOSS), which may have servers located outside Canada, unless you have notified me otherwise in writing before the submission deadline. Any suspected act of plagiarism will be reported to the Faculty’s Academic Integrity Officer in accordance with Dalhousie University’s regulations regarding Academic Integrity. Please note that:
1) The assignments are individual assignments. You can discuss the problems with your friends/classmates, but you need to write your program by yourself. Your code should not be substantially similar to anyone else’s.
2) When you consult online resources to complete your program, you need to understand the underlying mechanism and then write your own code. Your code should not be similar to those online resources. In addition, you should cite the sources in comments in your program.
3. Detailed Requirements
1) Overview: The Apriori Algorithm is used to generate frequent itemsets. Once the frequent itemsets are available, the association rules can be easily generated. The details of this method can be found in Chapter 6 of the textbook.
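As a rough illustration of the level-wise Apriori search, the sketch below assumes each transaction is a frozenset of "Field=value" strings and that the minimum support count has already been computed; the function name `apriori` is hypothetical. Its join step is a simple variant (union any two frequent (k-1)-itemsets whose union has exactly k items) rather than the textbook's lexicographic join, but after the prune step it yields the same candidates. It uses only `itertools`, which is on the allowed-library list.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    transactions: list of frozensets of items (e.g. "Humidity=normal").
    min_count: minimum support count (an integer).
    """
    # Level 1: count each individual item and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets that give k items.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # One pass over the data to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent
```

For the sample data with min_sup = 0.25, calling `apriori(transactions, 4)` would return every itemset that appears in at least 4 of the 14 tuples, together with its support count.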
2) Sample Data Set: A sample data set file titled “Play_Tennis_Data_Set.csv” is used as the data set for this assignment. Specifically:
a) CSV stands for Comma-Separated Values. A CSV file is a text file that uses commas to separate values. Often, the first record in a CSV file is a header line containing a list of field names. Because a CSV file is plain text, it is easy to inspect: you can open it with any text editor to view its content. More details about CSV can be found here: https://en.wikipedia.org/wiki/Comma-separated_values
b) The sample data set is available via Brightspace. There are 14 records (not including the header line) in the data set. The data set includes 5 fields: Outlook, Temperature, Humidity, Windy, and PlayTennis. Please note that these fields are equally important in terms of association rule mining. Namely, PlayTennis is not a special field; both “{Humidity=normal} => {PlayTennis=P}” and “{PlayTennis=P} => {Humidity=normal}” are possible association rules.
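Since every field is treated equally, one convenient encoding (an assumption of this sketch, not a requirement of the assignment) turns each CSV record into a set of "Field=value" items, so that, for example, Humidity=normal and PlayTennis=P become ordinary items. The function name `load_transactions` is hypothetical, and only the allowed `csv` module is used. The rows in the usage below are made-up examples; the actual values come from the provided file.

```python
import csv


def load_transactions(lines):
    """Turn CSV lines (header first) into a list of frozensets of "Field=value" items.

    `lines` can be an open file object or any iterable of strings.
    """
    reader = csv.reader(lines)
    header = [h.strip() for h in next(reader)]
    transactions = []
    for row in reader:
        if not row:  # skip blank lines, e.g. a trailing empty line
            continue
        # Pair each value with its column name so identical values in
        # different columns stay distinct items.
        items = frozenset(
            field + "=" + value.strip() for field, value in zip(header, row)
        )
        transactions.append(items)
    return transactions
```

With the real file, the call would look like `with open("Play_Tennis_Data_Set.csv", newline="") as f: transactions = load_transactions(f)` under Python 3. Prefixing each value with its field name keeps, say, a value shared by two different columns from being counted as the same item.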
3) Required Program: You need to write a program that generates the association rules based on the frequent itemsets resulting from the Apriori Algorithm. Here are the detailed requirements:
a) You should place “Play_Tennis_Data_Set.csv” in the directory where your program file is located.
b) The name of your program should be “RuleMining”. When RuleMining is executed from the command-line interface, it should prompt the user to enter the minimum support threshold and the minimum confidence threshold (denoted “min_sup” and “min_conf”, respectively, in this assignment), which are used to generate the frequent itemsets and association rules.
a. The user input for min_sup and min_conf should be fractional values, such as 0.25.
b. Note that the Apriori algorithm uses the minimum support count to generate frequent itemsets. In this assignment, the minimum support count is obtained by multiplying min_sup by the total number of tuples in the data set and then rounding the product up to the nearest integer. For example, with the provided sample data set, when min_sup = 0.25, the minimum support count is ⌈0.25 × 14⌉ = ⌈3.5⌉ = 4.
c) Using the provided minimum support and minimum confidence, your program should read “Play_Tennis_Data_Set.csv” and generate all strong association rules. The resulting association rules should be saved in a file named “Rules.txt”, placed in the directory where your program file is located.
d) Appendix 1 at the end of this file illustrates the appropriate structure of “Rules.txt”. Please note:
a. The order of the generated rules does not matter. As long as the set of rules is correct, you will not lose marks.
b. The support and confidence of each generated rule should be rounded to the nearest hundredth. For example, if the original value of the support is 0.256, it should be rounded to 0.26. If the original value of the support is 0.264, it should be rounded to 0.26 too.
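The two numeric rules above (ceiling for the minimum support count, two-decimal rounding for reported values) can be sketched with the allowed `math` module. The helper names are hypothetical. Two choices here are assumptions of this sketch: the product is rounded to 9 decimal places before taking the ceiling, so floating-point noise such as 0.3 * 10 == 3.0000000000000004 does not push the count up by one; and exact ties are rounded half up, since the handout does not say how ties should be handled. Python 3's built-in round() rounds exact ties to even, so round(0.125, 2) returns 0.12, whereas the helper below returns 0.13.

```python
import math


def min_support_count(min_sup, num_tuples):
    """Minimum support count = ceil(min_sup * num_tuples).

    Rounding to 9 decimals first strips floating-point noise before ceil.
    """
    return int(math.ceil(round(min_sup * num_tuples, 9)))


def round_hundredth(x):
    """Round to the nearest hundredth, with exact ties rounded up.

    0.256 -> 0.26 and 0.264 -> 0.26, matching the examples in the handout.
    """
    return math.floor(x * 100 + 0.5) / 100
```

For the sample data, `min_support_count(0.25, 14)` gives 4 and `min_support_count(0.30, 14)` gives 5. When writing the result, "%.2f" % round_hundredth(value) keeps trailing zeros, so 0.5 prints as 0.50.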
4) Testing Platform and Required Language: The details of the testing platform and the required programming language are presented as follows.
a) Testing Data Set: The sample data set is used to test your program. Varied min_sup/min_conf combinations will be used to verify the robustness of your program.
b) Testing Server: “timberlea.cs.dal.ca” is the computer used by the TA to evaluate your program. Therefore, you need to make sure that your program works on timberlea.
a. You can use your CS ID to log on to “timberlea.cs.dal.ca” and write your program there. Alternatively, you can write your program on another machine, transfer it to timberlea, and then test it on timberlea.
b. If you do not know your CS ID, you can visit the following webpage to get it: https://csid.cs.dal.ca/ . If your CS ID does not work or you have a question about your CS ID, please send an email to
c) Required Programming Language: You need to use Java or Python as the programming language because timberlea supports these languages. Note that both Python 2 and Python 3 are available on timberlea.cs.dal.ca. You can use “python2 --version” and “python3 --version” to check the specific versions on timberlea. In addition, you may only use the following packages or libraries in your program:
a. Java: java.io.*, java.util.*, java.lang.Math
b. Python: csv, math, itertools
d) Compiling and running your program on timberlea.cs.dal.ca should not produce errors or warnings. To compile and run your program on timberlea, you need to be able to access the command-line interface of timberlea. In addition, you need to be able to upload files to and download files from timberlea.
a. To access the command-line interface of timberlea, you can use the software tool “putty” on MS Windows computers. “putty” can be downloaded here: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html . On Mac and Linux computers, you can use the “ssh” command in the “Terminal” program to access timberlea.
b. To transfer files between your computer and timberlea, several different methods could be used. Here are two methods for MS Windows and macOS/Linux computers.
i. MS Windows Computers: WinSCP is a popular tool for transferring files between two computers. You can download WinSCP from the following webpage: https://winscp.net/eng/download.php . The documentation for WinSCP can be found here: https://winscp.net/eng/docs/start. In particular, the “Uploading Files” and “Downloading Files” sections of the documentation explain how to transfer files.
ii. Mac and Linux Computers: On Mac and Linux computers, you can use the “scp” command to transfer files. Here is a tutorial on the “scp” command: https://www.linuxtechi.com/scp-command-examples-in-linux/.
5) Readme File: You need to complete a readme file named “Readme.txt”, which includes the instructions that the TA could use to compile and execute your program on timberlea.
6) Submission: Please pay attention to the following submission requirements:
a) You should place “Readme.txt” in the directory where your program file is located.
b) You should place “Play_Tennis_Data_Set.csv” in the directory where your program file is located.
c) Your program file, “Readme.txt”, and “Play_Tennis_Data_Set.csv” should be compressed into a zip file named “ASN4-YourFirstName-YourLastName.zip”. For example, my zip file would be called “ASN4-Qiang-Ye.zip”. Finally, you need to submit your zip file for this assignment via Brightspace.
a. Appendix 2 at the end of this document includes the commands that you can use to compress your files on timberlea.
4. Grading Criteria
The marker will use your submitted zip file to evaluate your assignment. The full grade is 17 points. The details of the grading criteria are presented as follows.
1) Does “Readme.txt” include enough information so that the TA can easily compile and execute the program on timberlea? (1 Point)
2) The marker will use three test cases to evaluate your program. Each test case corresponds to a unique pair of min_sup and min_conf.
a) Test Case #1 (4 Points)
b) Test Case #2 (4 Points)
c) Test Case #3 (4 Points)
3) Overall Quality of the Program (i.e., whether the structure of the program is clear and reasonable, whether the program is properly commented, whether the indentation is appropriate, etc.). (4 Points)
5. Academic Integrity
At Dalhousie University, we respect the values of academic integrity: honesty, trust, fairness, responsibility and respect. As a student, adherence to the values of academic integrity and related policies is a requirement of being part of the academic community at Dalhousie University.
1) What does academic integrity mean?
Academic integrity means being honest in the fulfillment of your academic responsibilities, thus establishing mutual trust. Fairness is essential to the interactions of the academic community and is achieved through respect for the opinions and ideas of others. Violations of intellectual honesty are offensive to the entire academic community, not just to the individual faculty member and students in whose class an offence occurs (see the Intellectual Honesty section of the University Calendar).
2) How can you achieve academic integrity?
– Make sure you understand Dalhousie’s policies on academic integrity.
– Give appropriate credit to the sources used in your assignment such as written or oral work, computer codes/programs, artistic or architectural works, scientific projects, performances, web page designs, graphical representations, diagrams, videos, and images. Use RefWorks to keep track of your research and edit and format bibliographies in the citation style required by the instructor. (See http://www.library.dal.ca/How/RefWorks)
– Do not download the work of another from the Internet and submit it as your own.
– Do not submit work that has been completed through collaboration or previously submitted for another assignment without permission from your instructor.
– Do not write an examination or test for someone else.
– Do not falsify data or lab results.
These examples should be considered only as a guide and not an exhaustive list.
3) What will happen if an allegation of an academic offence is made against you?
I am required to report a suspected offence. The full process is outlined in the Discipline flow chart, which can be found at http://academicintegrity.dal.ca/Files/AcademicDisciplineProcess.pdf and includes the following:
a. Each Faculty has an Academic Integrity Officer (AIO) who receives allegations from instructors.
b. The AIO decides whether to proceed with the allegation, and you will be notified of the process.
c. If the case proceeds, you will receive an INC (incomplete) grade until the matter is resolved.
d. If you are found guilty of an academic offence, a penalty will be assigned, ranging from a warning to suspension or expulsion from the University, and can include a notation on your transcript, failure of the assignment, or failure of the course. All penalties are academic in nature.
4) Where can you turn for help?
– If you are ever unsure about ANYTHING, contact me.
– The Academic Integrity website (http://academicintegrity.dal.ca) has links to policies, definitions, online tutorials, tips on citing and paraphrasing.
– The Writing Center provides assistance with proofreading, writing styles, citations.
– Dalhousie Libraries have workshops, online tutorials, citation guides, Assignment Calculator, RefWorks, etc.
– The Dalhousie Student Advocacy Service assists students with academic appeals and student discipline procedures.
– The Senate Office provides links to a list of Academic Integrity Officers, discipline flow chart, and Senate Discipline Committee.
Appendix 1: Format of “Rules.txt”
1. User Input:
Support=0.30
Confidence=0.60
Rule#1: {Humidity=normal} => {PlayTennis=P} (Support=0.43, Confidence=0.86)
Rule#2: {PlayTennis=P} => {Humidity=normal} (Support=0.43, Confidence=0.67)
….
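One way to produce lines in the layout above is sketched below, assuming a dict that maps every frequent itemset (as a frozenset of "Field=value" strings) to its support count, such as a complete Apriori pass would produce. A rule X => Y taken from a frequent itemset Z = X ∪ Y has support count(Z)/N and confidence count(Z)/count(X). The function names and the ", " separator for multi-item sides are hypothetical, since the example shows only single-item sides; the confidence threshold is compared against the unrounded value, and reported values are rounded half up to two decimals. Only the allowed `math` and `itertools` modules are used.

```python
import math
from itertools import combinations


def hundredth(x):
    # Round to two decimals with exact ties rounded up (0.125 -> 0.13).
    return math.floor(x * 100 + 0.5) / 100


def strong_rules(frequent, num_tuples, min_conf):
    """Yield (antecedent, consequent, support, confidence) for every strong rule.

    frequent maps frozenset(itemset) -> support count and must contain every
    frequent itemset, so each antecedent's count can be looked up directly.
    """
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        support = count / float(num_tuples)
        # Each non-empty proper subset of the itemset is a candidate antecedent.
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                confidence = count / float(frequent[antecedent])
                if confidence >= min_conf:
                    yield antecedent, itemset - antecedent, support, confidence


def format_rule(number, antecedent, consequent, support, confidence):
    """Render one rule in the layout of the Rules.txt example."""
    return "Rule#%d: {%s} => {%s} (Support=%.2f, Confidence=%.2f)" % (
        number,
        ", ".join(sorted(antecedent)),
        ", ".join(sorted(consequent)),
        hundredth(support),
        hundredth(confidence),
    )
```

A caller could number the rules with `enumerate(strong_rules(frequent, 14, min_conf), 1)` and write each `format_rule(...)` line to Rules.txt after the "Support=" and "Confidence=" header lines. Because every subset of a frequent itemset is also frequent, the `frequent[antecedent]` lookup always succeeds when the dict is complete.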
Appendix 2: How to Use Zip and Unzip on Timberlea