
About COMP9318 (2018 s1)

Wei Wang @ CSE, UNSW

February 24, 2018

Wei Wang @ CSE, UNSW About COMP9318 (2018 s1)

Introduction

Lecturer-in-charge:

Prof. Wei Wang

School of Computer Science and Engineering
Office: K17 507
E-mail: weiw@cse
Ext: 9385 7162
http://www.cse.unsw.edu.au/~weiw

Research Interests:

Knowledge graph / natural language processing

High-dimensional data / Similarity query processing

DB + AI


COMP 9318

Course Info

Homepage: http://www.cse.unsw.edu.au/~cs9318

Communications:

Main form: Piazza Forum: https://piazza.com/configure-classes/spring2018/comp9318
Email: weiw AT cse.unsw.edu.au:

Only for matters that cannot/should not be resolved via
piazza.

Lectures:

1800 – 2100 MON, Keith Burrows Theatre

Tutorials: several online tutorials + ipython notebooks

Consultations: by appointment only.


Assessment

Overview

1 written assignment + 1 programming project + lab

lab = np.mean(sorted([lab1, lab2, lab3, lab4, lab5], reverse=True)[:3])
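The lab formula above averages the best three of the five lab marks. A minimal runnable sketch, assuming each lab mark is a number on the same scale; the function name `lab_mark` and the sample marks are hypothetical, and `statistics.mean` stands in for `np.mean` so that only the standard library is needed:

```python
from statistics import mean


def lab_mark(marks, keep=3):
    """Average of the `keep` highest marks (best 3 of 5 by default)."""
    # Sort from highest to lowest and keep only the top `keep` entries.
    best = sorted(marks, reverse=True)[:keep]
    return mean(best)


# Hypothetical marks: a missed lab (0) and the weakest attempt are dropped.
example = lab_mark([60, 80, 0, 90, 70])
print(example)  # 80
```

Under this rule, the two lowest lab marks do not affect the lab component.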

Read the spec to find out late penalty policies.


Finally . . .

Exam

If you are ill on the day of the exam, do not attend the exam
— I will not accept medical special consideration claims from
people who have already attempted the exam.

Final Mark

Final mark

final mark = 0.15 · (ass1 + proj1 + lab) + 0.55 · exam

Also requires exam ≥ 40.
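The final-mark formula can be sketched as a small function. This assumes every component (ass1, proj1, lab, exam) is on a 0 to 100 scale; the function name and the sample marks are hypothetical. The slides do not say what happens when the exam requirement is not met, so the sketch only reports whether it is satisfied:

```python
def final_mark(ass1, proj1, lab, exam, exam_hurdle=40):
    """Return (weighted mark, whether the exam requirement is met)."""
    # 15% each for the assignment, project, and lab; 55% for the exam.
    mark = 0.15 * (ass1 + proj1 + lab) + 0.55 * exam
    # The course also requires exam >= 40.
    meets_hurdle = exam >= exam_hurdle
    return mark, meets_hurdle


# Hypothetical marks: the weighted mark is approximately 67.5.
mark, ok = final_mark(ass1=80, proj1=70, lab=80, exam=60)
print(round(mark, 1), ok)
```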


Warning I

This course has

Broad coverage

Heavy workload

High fail rate ≥ 20%
Plagiarism is not allowed. Make sure you read about all types of
plagiarism, especially collusion, at
https://student.unsw.edu.au/plagiarism.

In particular, we do not accept personal pleas or excuses; if you have
valid reasons that affect your performance, apply for a UNSW
Special Consideration:
https://student.unsw.edu.au/special-consideration.


Warning II

Example excuses

I spent so much time and effort on this course but still failed?

I did the work by myself and may have shared it with my
classmate for discussion.

If I fail this course, I will […]. Please.


Resources I

Lecture Slides

Contain much material not found in the text/reference
books.

Text Book

Leskovec et al, Mining of Massive Datasets (ver 2.1),
Available at
http://infolab.stanford.edu/~ullman/mmds.html

Jensen et al, Multidimensional Databases and Data
Warehousing. (Accessible from a UNSW IP)

Han et al, Data Mining: Concepts and Techniques, 1st/2nd
edition, Morgan Kaufmann Publishers.

Reference Books

Tan et al, Introduction to Data Mining, Addison-Wesley, 2005.


Resources II

Witten et al, Data Mining: Practical Machine Learning Tools
and Techniques with Java Implementations, 1st/2nd edition,
Morgan Kaufmann.

Charu Aggarwal, Data Mining: The Textbook, Springer, 2015.

Software

Anaconda

Python 3

Jupyter notebook

Python libs such as numpy, pandas, matplotlib,
scikit-learn, . . .

Reading Materials

Papers from machine learning/data mining
conferences/journals, white papers, surveys, etc.

All available from the course Web page.


Schedule (tentative)

Week  Contents                               Assignments

1     Course overview + Introduction         lab
2     Data warehousing and OLAP
3     Maths review + Data Preprocessing      lab
4     Data Preprocessing + Classification
5     Classification                         ass1
      BREAK
6     Classification
7     Classification                         lab, proj1
8     Classification
9     Clustering
10    Clustering + Association Rule Mining   lab
11    Association Rule Mining                lab
12    Advanced topic + review


Course Objective and Requirements

Objectives:

Cover practically useful data mining/machine learning
algorithms and concepts

Foster deeper understanding of maths, models, and
algorithms

Gain hands-on experience with solving real problems

Requirements:

You need to have a solid background in Maths (Linear
Algebra, Calculus, Probability & Statistics) and programming
(mainly Python).

Understand (not memorize) concepts/equations/algorithms.

Ask why.
Describe it in your own words to a layperson.

Feedback welcome (throughout the course).


Example

Example

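John got a positive result for the α test, and the probability that
a patient with the deadly β disease has a positive α test result is
99%. Should John be worried about having the β disease?

$$P(\beta \mid \alpha) = \frac{P(\alpha \mid \beta)\,P(\beta)}{P(\alpha)} = 0.99 \cdot \frac{P(\beta)}{P(\alpha)}$$

$$P(\beta \mid \alpha) = \frac{P(\alpha \mid \beta)\,P(\beta)}{P(\alpha \mid \beta)\,P(\beta) + P(\alpha \mid \neg\beta)\,P(\neg\beta)}$$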


Example

Exercise

Exercise: plot the function P(β | α) with respect to P(α | ¬β),
given P(β) = 8/100,000.

[Figure: P(β | α) (percentage, log scale from 10^-1 to 10^2) plotted against P(α | ¬β) (percentage, 0 to 10).]
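One possible way to approach the exercise is sketched below. It applies Bayes' rule with P(α | β) = 0.99 and P(β) = 8/100,000 over a grid of false-positive rates P(α | ¬β) from 0% to 10%. The function and variable names are hypothetical, and the plotting helper assumes matplotlib is installed; it is imported only when the helper is called.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' rule.

    prior               = P(beta)
    sensitivity         = P(alpha | beta)
    false_positive_rate = P(alpha | not beta)
    """
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


PRIOR = 8 / 100_000   # P(beta), as given in the exercise
SENSITIVITY = 0.99    # P(alpha | beta), from the example

# False-positive rates from 0% to 10% in steps of 0.1 percentage points.
fpr_percent = [0.1 * i for i in range(101)]
posterior_percent = [
    100 * posterior(PRIOR, SENSITIVITY, f / 100) for f in fpr_percent
]


def plot_posterior():
    """Plot the curve with a logarithmic y-axis (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.semilogy(fpr_percent, posterior_percent)
    plt.xlabel("P(alpha | not beta) (percentage)")
    plt.ylabel("P(beta | alpha) (percentage)")
    plt.show()
```

With a false-positive rate of 1%, `posterior(PRIOR, SENSITIVITY, 0.01)` is below 1%: because the disease is so rare, most positive results come from healthy patients even though the test detects 99% of true cases.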


CSE Computing Environment

For those new to the computing environment at CSE, UNSW

Use Linux/command line.

Project marked on Linux servers
You need to be able to upload, run, and test your program
under Linux.

Assignment/Project submission

Give: use it to submit. Watch out for possible error messages.
Classrun: use it to check your submission, marks, etc. Read
https://wiki.cse.unsw.edu.au/give/Classrun
Common errors:

File corrupt (during SFTP?), not in the correct format.
Submission not accepted by the system (wrong filename? too
large? . . . ).

Lab submission: our home-made Web submission system.


Other Specialised Courses

Other specialised courses in the Database or Data Science stream:

COMP9319: Advanced algorithms on compression, text/XML
databases, etc.

COMP9313: Big data systems (Hadoop, Spark, etc.)

COMP6714: Information retrieval, Natural language
processing, Search engines.

Other machine learning courses:

COMP9417: Machine Learning and Data Mining

COMP9444: Neural Networks and Deep Learning

COMP9418: Advanced Machine Learning


Research and Development Opportunities with us

Talk to me about PhD/Master's/Honours/research project
opportunities in the areas of data management, text mining,
machine learning, and natural language processing.

PhD scholarship and/or top-ups available.

Special research project (12UoC or 18UoC) for MIT students:
contact me by the end of this semester.


About Learning

Things to ponder:

The long-term impact of the latest development in
AI/DS/Hardware.

What do you want out of this course?

Requirement:

Plan ahead for the course.

Learning happens outside your comfort zone.

Review teaching materials after the lecture.

Use the Jupyter notebooks.


Make Errors and Learn Something New

Source:

http://combiboilersleeds.com/images/comfort-zone/comfort-zone-0.jpg
