Lecture 1: Introduction and Overview
Introduction to Machine Learning Semester 1, 2022
Copyright @ University of Melbourne 2022. All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the author.
This lecture
• Introduction and Warm-up
• Housekeeping COMP90049
• Machine Learning
Intros & Warm-up
Introductions
• Lecturer in CIS since 2019
• Research in natural language processing
• PhD from Edinburgh University
• 1.5 years research in industry (Amazon)
Please go to: pollev.com/iml2022
Warm-up (3 minutes)
With your neighbor / in the chat:
1. Think of 3 words that describe your expectation of this subject
2. Think of a short definition of machine learning
What is learning?
What is machine learning?
The basic ingredients of ML
Learning what? : Task to accomplish a goal, e.g.,
• Assign continuous values to inputs (essay → grade)
• Group inputs into known classes (email → {spam, no-spam})
• Understand regularities in the data
Learning from what? : Data
• Where do the data come from?
• Is the data reliable? Representative?
How do we learn? : Define a model
• an algorithm to predict a certain outcome for an input
• typically a function with parameters
• derive a learning algorithm to find the best model parameters
How do we know learning is happening?
• The algorithm improves at its task with exposure to more data
• We need to be able to evaluate performance objectively
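The ingredients above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not material from the lecture: the data are synthetic noisy samples of y = 2x + 1, the model is a line with parameters (w, b), the learning algorithm is closed-form least squares, and performance is measured as mean squared error on held-out data. All function names are hypothetical.

```python
import random


def make_data(n, seed):
    """Toy data (an assumption for illustration): y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Learning algorithm: least-squares estimates of slope w and intercept b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def mse(w, b, xs, ys):
    """Evaluation: mean squared error of the line (w, b) on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Held-out data used only for evaluation, never for fitting.
test_x, test_y = make_data(1000, seed=0)

# Fit on increasingly large training sets and report the held-out error.
for n in (5, 50, 500):
    train_x, train_y = make_data(n, seed=1)
    w, b = fit_line(train_x, train_y)
    print(n, round(w, 2), round(b, 2), round(mse(w, b, test_x, test_y), 3))
```

With more training data the estimated parameters typically move closer to the values that generated the data, which is one concrete way to observe that learning is happening.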
About COMP90049
COMP90049 – Who?
Coordinator & Lecturer
Head tutor Tutors
COMP90049 – How?
The subject is offered as dual delivery
• Lectures are on campus and live-streamed (and recorded)
• Workshops are either on campus or live on zoom (not hybrid)
• All recordings and other materials will be made available online through Canvas
• You are the first to experience dual delivery. Your feedback is very welcome!
COMP90049 – What?
• Topics include: classification, clustering, optimization, unsupervised learning, semi-supervised learning, neural networks
• All from a theoretical and practical perspective
• Refreshers on maths and programming basics
• Theory in the lectures (some live-coding and demo-ing of libraries and toolkits)
• Hands-on experience in workshops and projects
• Guest lecture 1: academic writing skills
• Guest lecture 2: bias and fairness in machine learning
COMP90049 – What? : Intended Learning Outcomes
• Understand elementary mathematical concepts used in machine learning
• Derive machine learning models from first principles
• Design, implement, and evaluate machine learning systems for real-world problems
• Identify the correct machine learning model for a given real-world problem
COMP90049 – Lectures
Lecture 1: Wed 17:15-18:15, (MSD)-B117
Lecture 2: Fri 14:15-15:15, Arts West West Wing-B101 ( Theatre)
• Most lectures are live streamed through Lecture Capture
• Later in the semester, some lectures will be pre-recorded, and we’ll have a live Q&A session instead (“Flipped classroom”)
Lecture content
• Derivation of ML algorithms from scratch
• Motivation and context
• Some coding demos in Python
COMP90049 – Workshops
• start from week 2
• 1 hour per week
• ∼ 14 slots, please sign up and stick to one
• Online (Zoom) or face to face
Workshop Content
• Practical exercises
• Working through numerical examples
• Revising theoretical concepts from the lectures
Consultations
See link on Canvas homepage for schedule (subject to change). All consultations are held online via Zoom. Consultations are optional additional support. They are run by different tutors who will answer your questions on the respective materials: come prepared!
Coding consultations
• Starting week 3, for the first half of semester
• 2-hour blocks (join in at any point and stay for as long as you like)
• You can ask questions around Python / the weekly code snippets
• Not an assignment consultation
Maths consultations
• Starting week 3
• 1 hour at different times (changing weekly)
• Clarify mathematical concepts (probability, optimization, …)
Assignment consultations
• 1-2 clarification sessions per assignment, usually held half a week to a week before submission.
COMP90049 – Subject Communication I
For general questions
• Default: Post on the Piazza discussion board
• Backup option 1: Email the head tutor (Hasti) or your tutor
• Backup option 2: Email the lecturer
• Actively engage by asking and answering questions. Peer teaching is the most effective way of learning!
• (Of course no assignment solutions should be given away. Doing – or asking for – this is academic misconduct.)
Personal/private concerns: Email the head tutor or lecturer, e.g.,
• With specific assignment questions
• With private or personal concerns
• Constructive feedback, always very welcome!
COMP90049 – Subject Communication II
We need 2 or 3 student representatives
• Communication channel between class and teaching team
• Collect and pass on (anonymous) feedback or complaints
• Attend a student-staff meeting during the semester (TBD)
• Represent the diversity of the class
Interested? Send me an email with a short paragraph on why you want this role.
Expected Background
Programming concepts
• We will be using Python and Jupyter Notebooks
• Basic familiarity with libraries (numpy, scikit-learn, scipy)
• You need to be able to write code to process your data, apply different algorithms, and evaluate the output
• Optional practice / demo Jupyter notebooks (most weeks)
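As a rough idea of what "process your data, apply an algorithm, evaluate the output" can look like, here is a minimal sketch using only the Python standard library. The inline CSV text, its column names, and the majority-class baseline are all assumptions made for illustration; they are not taken from the subject's projects.

```python
import csv
import io
from collections import Counter

# Stand-in for a data file; the column names and values are made up.
RAW = """\
length_min,genre,label
3.1,pop,hit
2.9,pop,hit
6.5,jazz,miss
3.4,rock,hit
3.0,pop,hit
7.2,jazz,miss
3.3,rock,hit
6.8,folk,miss
"""


def load(text):
    """Process the data: parse CSV rows into feature tuples and labels."""
    rows = list(csv.DictReader(io.StringIO(text)))
    X = [(float(r["length_min"]), r["genre"]) for r in rows]
    y = [r["label"] for r in rows]
    return X, y


def majority_baseline(train_y):
    """Apply an algorithm: always predict the most frequent training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda x: most_common


def accuracy(model, X, y):
    """Evaluate the output: fraction of instances predicted correctly."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


X, y = load(RAW)
train_X, test_X = X[:6], X[6:]   # simple hold-out split
train_y, test_y = y[:6], y[6:]

model = majority_baseline(train_y)
print(accuracy(model, test_X, test_y))
```

A baseline like this ignores the features entirely, so it mainly serves as a reference point that any real model should beat.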
Mathematical concepts
• formal maths notation
• basic probability, statistics, calculus, geometry, linear algebra
• (why?)
What Level of Maths are we Talking?
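\begin{align*}
\ln \frac{P(y = \text{true} \mid x)}{1 - P(y = \text{true} \mid x)} &= w \cdot f \\
\frac{P(y = \text{true} \mid x)}{1 - P(y = \text{true} \mid x)} &= e^{w \cdot f} \\
P(y = \text{true} \mid x) &= e^{w \cdot f} - e^{w \cdot f} \, P(y = \text{true} \mid x) \\
P(y = \text{true} \mid x) + e^{w \cdot f} \, P(y = \text{true} \mid x) &= e^{w \cdot f} \\
P(y = \text{true} \mid x) &= \frac{e^{w \cdot f}}{1 + e^{w \cdot f}} = \frac{1}{1 + e^{-w \cdot f}} \\
P(y = \text{false} \mid x) &= \frac{1}{1 + e^{w \cdot f}} = \frac{e^{-w \cdot f}}{1 + e^{-w \cdot f}}
\end{align*}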
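The algebra in this derivation can be checked numerically. The sketch below writes z for the score w·f (a shorthand introduced here, not in the slides) and confirms for a few values that the two forms of P(y = true | x) agree, that the two class probabilities sum to one, and that the log-odds recover z.

```python
import math


def p_true(z):
    """P(y = true | x) for a score z = w . f, in the 1 / (1 + e^{-z}) form."""
    return 1.0 / (1.0 + math.exp(-z))


def p_false(z):
    """P(y = false | x), in the e^{-z} / (1 + e^{-z}) form."""
    return math.exp(-z) / (1.0 + math.exp(-z))


for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    p = p_true(z)
    # Both forms of P(y = true | x) agree.
    assert math.isclose(p, math.exp(z) / (1.0 + math.exp(z)))
    # The two class probabilities sum to one.
    assert math.isclose(p + p_false(z), 1.0)
    # Taking the log-odds recovers the score, i.e. the first line of the derivation.
    assert math.isclose(math.log(p / (1.0 - p)), z, abs_tol=1e-9)
```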
What Level of Maths are we Talking?
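\begin{align*}
P(y = 1 \mid x; \beta) &= h_\beta(x), \qquad P(y = 0 \mid x; \beta) = 1 - h_\beta(x) \\
\rightarrow P(y \mid x; \beta) &= (h_\beta(x))^{y} \, (1 - h_\beta(x))^{1 - y} \\
\arg\max_\beta \prod_{i=1}^{n} P(y_i \mid x_i; \beta)
&= \arg\max_\beta \prod_{i=1}^{n} (h_\beta(x_i))^{y_i} \, (1 - h_\beta(x_i))^{1 - y_i} \\
&= \arg\max_\beta \sum_{i=1}^{n} \Bigl[ y_i \log h_\beta(x_i) + (1 - y_i) \log (1 - h_\beta(x_i)) \Bigr]
\end{align*}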
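The step from the product to the sum works because the logarithm is monotonic, so both objectives are maximised by the same β. The sketch below illustrates this on a tiny made-up data set, assuming the logistic form h_β(x) = 1 / (1 + e^{-β·x}); the data values and the two candidate parameter vectors are hypothetical.

```python
import math


def h(beta, x):
    """Logistic hypothesis h_beta(x) = 1 / (1 + exp(-beta . x)) (an assumed form)."""
    z = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def likelihood(beta, X, y):
    """Product over instances of h^y * (1 - h)^(1 - y)."""
    prod = 1.0
    for xi, yi in zip(X, y):
        p = h(beta, xi)
        prod *= p ** yi * (1 - p) ** (1 - yi)
    return prod


def log_likelihood(beta, X, y):
    """Sum over instances of y log h + (1 - y) log(1 - h)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = h(beta, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


# Made-up data; the first feature is a constant 1 acting as a bias term.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, 3.0]]
y = [0, 1, 0, 1]

beta_a = [0.0, 0.0]    # predicts 0.5 for every instance
beta_b = [-1.0, 1.0]   # fits the labels better

for beta in (beta_a, beta_b):
    print(beta, likelihood(beta, X, y), log_likelihood(beta, X, y))
```

Whichever candidate has the higher likelihood also has the higher log-likelihood, and the sum form avoids multiplying many small probabilities together.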
Assessment
Two small coding projects (30%)
• Project 1: release week 2, due week 3
• Project 2: release week 5, due week 6
• Jupyter notebooks: read in data, apply ML algorithm(s), evaluate.
Open-ended research project (30%)
• Release week 7, due week 10
• You will be given a data set and will formulate a research question and write a short research paper on your findings. You will be graded based on the quality of your report.
Final exam (40%)
• during exam period
• 2 hours; closed-book
• Hurdle requirement: you have to pass the exam (≥ 50%).
Academic Honesty
• Videos & Quiz
• Linked from Canvas ’Home’ page (or in Modules)
• CIS-specific scenarios
What and Why of Machine Learning?
What is Machine Learning? (A common misconception)
https://xkcd.com/1838/
(you’re sitting in the right class!)
Source: https://www.springboard.com/blog/machine-learning-engineer-salary-guide/
Three ingredients for machine learning
… and related questions
• Discrete vs continuous vs …
• Big data vs small data
• Labeled data vs unlabeled data
• Public vs sensitive data
Three ingredients for machine learning
… and related questions
• function mapping from inputs to outputs
• motivated by a data generating hypothesis
• probabilistic machine learning models
• geometric machine learning models
• parameters of the function are unknown
Three ingredients for machine learning
… and related questions
• Improving (on a task) after data is taken into account
• Finding the best model parameters (for a given task)
• Supervised vs. unsupervised learning
ML Example Problem
ML Example Problems
• Scenario 1
You are a data scientist at a new streaming service for Australian music. You just got access to data collected from your first 10,000 customers (location, clicks, listening time, etc.). You want to get a better idea of the typical kinds of customer behavior to further improve your service. What would you do?
• Solution:
Identify groups of customers that share similar behavior, e.g., like the same kinds of music; or stay for a similar amount of time on the website
CLUSTERING
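One way to make the clustering idea concrete is k-means, shown here as a minimal sketch in plain Python. The choice of k-means, the two features per customer, and all data values are assumptions for illustration; the scenario itself does not prescribe an algorithm.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
    labels = [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]
    return centroids, labels


# Hypothetical customers: (hours listened per week, clicks per session).
customers = [(1.0, 2.0), (1.5, 1.0), (2.0, 2.5), (9.0, 12.0), (10.0, 11.0), (11.0, 13.0)]

# Initialise with two of the data points as starting centroids.
centroids, labels = kmeans(customers, centroids=[customers[0], customers[3]])
print(centroids)
print(labels)
```

No labels are provided to the algorithm; the groups emerge only from similarity between customers, which is what makes this an unsupervised task.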
ML Example Problems
• Scenario 2:
The streaming service has run for a number of months, and collected some popularity data on the listed songs (e.g., number of times listened to). Each week several hundred new bands want to be listed on the platform, and you want to quickly determine which ones are likely promising.
• Solution:
Identify some easily measurable and relevant properties of the bands (genre, song length, etc.) and compare them to the corresponding properties of successful bands (or songs) already listed on the platform.
SUPERVISED CLASSIFICATION
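Comparing a new band to already-labelled bands is the idea behind a nearest-neighbour classifier. The sketch below is one possible instance, not the method the lecture commits to; the two features, the labels, and every data value are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k training instances closest to x."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical features per band: (average song length in minutes, releases per year).
train_X = [(3.0, 4.0), (3.5, 5.0), (2.8, 6.0), (7.0, 1.0), (8.0, 0.5), (6.5, 1.5)]
# Hypothetical labels derived from past popularity on the platform.
train_y = ["promising", "promising", "promising",
           "not promising", "not promising", "not promising"]

for new_band in [(3.2, 4.5), (7.5, 1.0)]:
    print(new_band, knn_predict(train_X, train_y, new_band))
```

Unlike clustering, this relies on known class labels for the existing bands, which is what makes the task supervised.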
ML Example Problems
• Scenario 3:
Now you want to get more specific: for a given new song you want to estimate the number of times it will be listened to in the first two weeks after being listed on the platform.
• Solution:
Again, define a set of relevant properties of songs. Take the songs already listed on the platform, and their associated known clicks. Estimate a function that predicts the number of clicks based on your defined feature set.
REGRESSION
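Estimating such a function can be sketched as linear regression fitted by gradient descent on the mean squared error. Both the linear model and the optimiser are assumptions chosen for illustration, and the features and click counts are made up (here generated exactly from clicks = 1 + 2·x1 + 3·x2 so the result is easy to check).

```python
def predict(weights, bias, x):
    """Linear model: bias plus the weighted sum of the features."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def fit_linear_regression(X, y, lr=0.1, n_steps=2000):
    """Minimise mean squared error with batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(n_steps):
        errors = [predict(weights, bias, xi) - yi for xi, yi in zip(X, y)]
        # Gradient of the mean squared error with respect to each parameter.
        grad_w = [2 / n * sum(e * xi[j] for e, xi in zip(errors, X)) for j in range(d)]
        grad_b = 2 / n * sum(errors)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical, rescaled song features and known clicks (in thousands).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
y = [1.0, 3.0, 4.0, 6.0, 8.0, 9.0]

weights, bias = fit_linear_regression(X, y)
new_song = (2.0, 2.0)
print(weights, bias, predict(weights, bias, new_song))
```

The output is a continuous value rather than a class label, which is the difference between regression and the classification setting of Scenario 2.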
More Applications
• natural language processing
• image classification
• stock market prediction
• movie recommendation
• web search
• medical diagnoses
• spam / malware detection
• …
Machine Learning, Ethics, and Transparency
commons.wikimedia.org/wiki/File:Pseudo-algorithm comparison for my slides on machine learning ethics.svg
Def 1. Discrimination = to make distinctions.
For example, in supervised ML, for a given instance, we might try to discriminate between the various possible classes.
Def 2. Discrimination = to make decisions based on prejudice.
Digital computers have no volition, and consequently cannot be prejudiced. However, the data may contain information which leads to an application where the ensuing behavior is prejudicial, intentionally or otherwise.
Machine Learning gone wrong…
Machine Learning and Ethics
Not everything that can be done, should be done
• Attributes in the data can encode information in an indirect way
• For example, home address and occupation can be used (perhaps with other seemingly-banal data) to infer age and social standing of an individual
• Potential legal exposure due to implicit “knowledge” used by a classifier
• Just because you didn’t realize doesn’t mean that you shouldn’t have realized, or at least, made reasonable efforts to check
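One reasonable check for indirectly encoded information is to measure how strongly a seemingly banal attribute is associated with a sensitive one. The sketch below uses the Pearson correlation on invented records; the attribute names and values are purely hypothetical, and a high correlation is only a warning sign that the attribute may act as a proxy, not proof of harm.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up records: a seemingly banal attribute and a sensitive one.
years_at_address = [1, 2, 3, 10, 15, 20]
age = [22, 25, 24, 45, 52, 60]

r = pearson(years_at_address, age)
print(round(r, 2))
```

If an attribute like this is fed to a classifier, removing the sensitive attribute itself does not stop the model from using the information it carries.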
Questions to Ask
• Who is permitted to access the data?
• For what purpose was the data collected?
• What kinds of conclusions are legitimate?
• If our conclusions defy common sense, are there confounding factors?
• Could my research / application be abused (dual use)?
• COMP90049 Overview
• What is machine learning?
• Why is it important? Some use cases.
• What can go wrong?
Next lecture: Concepts in machine learning