Machine Learning Assignment Help: COMP9318 Final-Project fool classifier

## Instructions:
* This notebook contains instructions for the **COMP9318 Final-Project**.

* You are required to complete your implementation in a file `submission.py` provided along with this notebook.

* Do not print unnecessary output. We will ignore anything printed to the screen. All results should be returned by the corresponding functions in the appropriate data structures.

* This notebook contains all the required details for the project. Detailed instructions, including **CONSTRAINTS**, **FEEDBACK** and **EVALUATION**, are provided in the respective sections. If you have further questions, you can post your query on Piazza.

* This project is **time-consuming**, so it is highly advised that you start working on this as early as possible.

* You are allowed to use only the permitted libraries and modules (as mentioned in the **CONSTRAINTS** section). Do not import unnecessary modules or libraries: if such a module cannot be imported at test time, your submission will fail with an error.

* You are **NOT ALLOWED** to use dictionaries and/or external data resources for this project.

* We will provide you with **LIMITED FEEDBACK** on your submission (only **15** attempts allowed per group). Instructions for the **FEEDBACK** and final submission are given in the **SUBMISSION** section.

* For **Final Evaluation** we will be using a different dataset, so your final scores may vary.

* The submission deadline for this assignment is **23:59:59 on 27-May, 2018**.
* **Late Penalty: 10% on day 1 and 20% on each subsequent day.**

 

## Introduction:

In this project, you are required to devise an algorithm/technique to fool a binary classifier named `target-classifier`. You only have access to the following information:

<br>
1. The `target-classifier` is a binary classifier that assigns data to one of two categories, $\textit{i.e.}$, **class-1** and **class-0**.

2. You have access to part of the classifier's training data, $\textit{i.e.}$, a sample of 540 paragraphs: 180 for **class-1** and 360 for **class-0**, provided in the files `class-1.txt` and `class-0.txt` respectively.

3. The `target-classifier` belongs to the SVM family.

4. The `target-classifier` allows **EXACTLY 20 DISTINCT** modifications in each test sample.
5. You are provided with a test sample of **200** paragraphs from **class-1** (in the file `test_data.txt`). You can use these test samples to get feedback from the target classifier (**only 15 attempts** allowed per group).
6. **NOTE: You are not allowed to use the data in `test_data.txt` for your model training (if any). VIOLATIONS in this regard will get a ZERO score**.
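
Points 1-3 say only that the target is an SVM trained on data like `class-0.txt` and `class-1.txt`. One way to use that information is to fit a surrogate linear SVM on the provided classes and inspect its per-token weights. The following is a minimal sketch under several assumptions: it calls scikit-learn's `CountVectorizer` and `LinearSVC` directly on toy strings, whereas in the project any training must go through the `strategy()` class in `helper.py`, whose interface is not shown in this notebook. The function name `rank_tokens_by_weight` and the toy data are hypothetical, and the real target's kernel and features are unknown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC


def rank_tokens_by_weight(class0_docs, class1_docs):
    """Fit a surrogate linear SVM and rank tokens by learned weight.

    Assumption: each document is a string of whitespace-separated tokens.
    Returns (token, weight) pairs sorted from most class-0-like
    (most negative weight) to most class-1-like (most positive weight).
    """
    docs = list(class0_docs) + list(class1_docs)
    labels = [0] * len(class0_docs) + [1] * len(class1_docs)

    # A callable analyzer keeps the tokens exactly as given (no lowercasing).
    vectorizer = CountVectorizer(analyzer=str.split)
    features = vectorizer.fit_transform(docs)

    classifier = LinearSVC(random_state=0).fit(features, labels)
    weights = classifier.coef_[0]  # positive values push towards class 1

    index_to_token = {i: tok for tok, i in vectorizer.vocabulary_.items()}
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    return [(index_to_token[i], float(weights[i])) for i in order]


# Toy usage (hypothetical data, not the project files).
toy_ranking = rank_tokens_by_weight(
    ["apple banana", "apple cherry"],
    ["dog banana", "dog cherry"],
)
```

Under this sketch, tokens near the start of the ranking are candidates for insertion and tokens near the end are candidates for deletion when trying to move a class-1 sample towards class-0.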

<br>
### To-do:
* You are required to come up with an algorithm named `fool_classifier()` that makes the best use of the above-mentioned information (**points 1-4**) to fool the `target-classifier`. By fooling the classifier we mean that your algorithm causes test instances (**point 5**) to be misclassified within the modification budget (**EXACTLY 20 DISTINCT** modifications allowed for each test sample).

* **NOTE::** We impose a **harsh limit** on the number of modifications allowed for each test instance. You are only allowed to modify each test sample by **EXACTLY 20 DISTINCT tokens (NO MORE, NO LESS)**.

* **NOTE::** **ADDING** or **DELETING** one word at a time is **ONE** modification. Replacement will be considered as **TWO** modifications ($\textit{i.e.,}$ **Deletion** followed by **Insertion**).
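
The counting rule above can be sketched as a set comparison. This is one plausible reading, assuming that tokens are whitespace-separated and that "distinct" means each token type counts once regardless of how often it occurs. The function name `count_modifications` is hypothetical; the `check_data()` function provided with the project remains the authoritative check.

```python
def count_modifications(original, modified):
    """Count distinct token modifications between two samples.

    Assumption: a sample is a string of whitespace-separated tokens and
    each distinct token that is added or removed counts once.
    """
    original_tokens = set(original.split())
    modified_tokens = set(modified.split())

    deleted = original_tokens - modified_tokens   # one modification each
    inserted = modified_tokens - original_tokens  # one modification each

    # A replacement shows up as one deletion plus one insertion, i.e. two.
    return len(deleted) + len(inserted)


# Toy usage: "c" is replaced by "x", which costs two modifications.
replacement_cost = count_modifications("a b c", "a b x")
```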

 

 

## Constraints

Your implementation `submission.py` should comply with the following constraints.

* You should implement your methodology using `Python3`.
* You should implement your code in the function `fool_classifier()` in the file `submission.py`.
* You are only allowed to use the pre-defined class `strategy()`, defined in the file `helper.py`, to train your models (if any).
* You **should not** do any pre-processing on the data. We have already pre-processed the data for you.
* You are supposed to implement your algorithm using **scikit-learn (version=0.19.1)**. We will **NOT** accept implementations using other libraries.

* You are **not supposed to augment** the data using external/additional resources. You are only allowed to use the partial training data provided to you ($\textit{i.e.,} $ `class-1.txt` and `class-0.txt`).

* You are **not** allowed to use the test samples ($\textit{i.e.,}$ `test_data.txt`) for model training and/or inference building. You can only use this data for testing, $\textit{i.e.,}$ calculating the success percentage (as described in the **EVALUATION** section). **VIOLATIONS IN THIS REGARD WILL GET A ZERO SCORE**.

* You are **not** allowed to hard-code the ground truth or any other information into your implementation `submission.py`.

* **RUNNING TIME:** Your implementation must read the test data file ($\textit{i.e.,}$ `test_data.txt` with 200 test samples), process it and write the modified file (`modified_data.txt`) within **12 minutes**.

* Each modified test sample in the modified file (`modified_data.txt`) should not differ from the corresponding original test sample in `test_data.txt` by more than 20 tokens.

* **NOTE::** Inserting or deleting a word is **ONE** modification. Replacement will be considered as **TWO** modifications ($\textit{i.e.,}$ deletion followed by insertion).
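
The read, modify and write flow described in these constraints can be sketched as follows. This is an illustration, not the required `fool_classifier()` implementation: it assumes one sample per line with whitespace-separated tokens, and it assumes the caller already has ranked lists of tokens to delete and to insert. The names `modify_sample`, `rewrite_file`, `delete_candidates` and `insert_candidates` are hypothetical.

```python
def modify_sample(tokens, delete_candidates, insert_candidates, budget=20):
    """Apply exactly `budget` distinct token modifications to one sample.

    Deletions are taken first (in candidate order), then insertions fill
    the rest of the budget. Raises ValueError if the budget cannot be met.
    """
    present = set(tokens)

    to_delete = []
    for token in delete_candidates:
        if len(to_delete) == budget:
            break
        if token in present and token not in to_delete:
            to_delete.append(token)

    remaining = budget - len(to_delete)
    to_insert = []
    for token in insert_candidates:
        if len(to_insert) == remaining:
            break
        if token not in present and token not in to_insert:
            to_insert.append(token)

    if len(to_delete) + len(to_insert) != budget:
        raise ValueError("not enough candidates to reach the budget")

    # Deleting a token removes every occurrence of it from the sample.
    deleted = set(to_delete)
    kept = [token for token in tokens if token not in deleted]
    return kept + to_insert


def rewrite_file(in_path, out_path, delete_candidates, insert_candidates):
    """Read samples line by line and write the modified samples."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            modified = modify_sample(
                line.split(), delete_candidates, insert_candidates
            )
            dst.write(" ".join(modified) + "\n")
```

Because inserted tokens are required to be absent from the original sample and deleted tokens are required to be present, the two sets never overlap, so the number of distinct changes is always exactly `budget`. The work per sample is linear in the number of tokens and candidates.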

 

## Submission Instructions:

* Please read these instructions **VERY CAREFULLY**.

### FEEDBACK:
* For this project, we will provide real-time feedback on a test dataset ($\textit{i.e.,}$ the file `test_data.txt` containing **200** test cases).
* Each group is allowed only **15 attempts in TOTAL**, so use your attempts **WISELY**.
* We will only provide **ACCUMULATIVE FEEDBACK** ($\textit{i.e.,}$ how many modified test samples out of **200** were classified as Class-0). We **WILL NOT** provide detailed feedback for individual test cases.
* For the feedback, you are required to submit the modified text file ($\textit{i.e.,}$ `modified_data.txt`) via the submission portal: http://kg.cse.unsw.edu.au:8318/project/ (using Group name and Group password).
* **NOTE::** Please make sure that the modified text file is generated by your program `fool_classifier()` and that it obeys the modification constraints. We have provided a function named `check_data()` in the class `strategy()` to check whether the modified file `modified_data.txt` obeys the constraints.

* Your algorithm should modify each test sample in `test_data.txt` by **EXACTLY 20 DISTINCT TOKENS**.

### Final Submission:
1. For final submission, you need to submit:
* Your code in the file `submission.py`
* A report (`report.pdf`) outlining your approach for this project.
2. We will release the detailed instructions for the final submission via Piazza.