Lecture 23: Recap and Exam Info
Semester 1, 2022
Please complete the End of Semester Survey: https://unimelb.bluera.com/unimelb/
Copyright © University of Melbourne 2022. All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the author.
A semester of machine learning (?)
Source: https://www.evalotta.net/blog/2016/4/19/on-practice
This lecture
• Details on the exam
• Recap of the subject content
Exam Details
Date, time, format…
• Time: Wednesday, June 8th at 3pm
• Duration: 2 hours, with an additional 15 minutes of reading time
• Format: Canvas Assignment. Online. Not invigilated.
• You are allowed to use authorized materials (more soon).
• We accept exams submitted up to 15 minutes after the due time with no penalty, to give you time to scan handwritten sheets and upload all your answers.
• Submissions more than 15 minutes after the due time will not be marked and will be considered a fail.
Aim to submit on time (!)
Exam Content Details
• Worth 40% of your grade
• A number of questions of three different categories (coming up next)
• You should attempt all questions (no pick-and-choose)
• Questions have different weight (!)
• The exam is worth 120 marks, i.e., ≈ 1 mark per minute. The marks associated with a question will give you an idea about how much time you should spend on it.
Exam Format Details
“You should enter your answers in a Word document or PDF, which can include typed and/or hand-written answers.
You should answer each question on a separate page, i.e., start a new page for Question 1, Question 2, etc – parts within questions do not need new pages. Write the question number clearly at the top of each page.
You have unlimited attempts to submit your answer-file, but only your last submission is used for marking. If your last submission arrives more than 15 minutes after the due time, you will fail the exam.”
Exam Format Details
Authorised materials:
Calculators:
Lecture slides, workshop materials, prescribed reading, your own project reports.
You must not use materials other than those authorised above. You are not permitted to communicate with others for the duration of the exam, other than to ask questions of the teaching staff via the exam chat support. Your computer, phone and/or tablet should only be used to access the authorised materials, enter or photograph your answers, and upload these files. The work you submit must be based on your own knowledge and skills, without assistance from any person or unauthorised materials.
Exam Chat Support
We will use Big Blue Button, the standard chat support system integrated into Canvas. It will magically appear in our subject Canvas page shortly before the exam begins.
For more information visit
• https://lms.unimelb.edu.au/students/student-guides/exam-support
• https://students.unimelb.edu.au/your-course/manage-your-course/exams-assessments-and-results/exams/support-services
Section A: Short answer Questions
• Requiring you to explain or compare concepts covered in this subject
• Some may require a small amount of calculation
• To be answered in 1–3 (handwritten) lines, unless otherwise instructed
Section B: Method Questions
• Resembling Workshop Questions
• Demonstrate your conceptual understanding of the methods that we have studied in this subject
• Usually involve some calculations, and you will need to show your calculations, or (less commonly) describe the logical process with which you arrived at an answer (i.e., not just state the answer)
Section C: Design and Application Questions
• Resembling Assignment Questions
• Demonstrate that you have gained a high-level understanding of the methods and algorithms covered in this subject, and can apply that understanding
• Expected answer to each question is from one third of a page to one full page in length (hand-written)
• Require significantly more thought than Sections A or B, and should be attempted last
Recap part I: Basic Concepts in Machine Learning
What is machine learning?
“We are drowning in information, but we are starved for knowledge” – Megatrends
Our definition of Machine Learning
automatic extraction of valid, novel, useful and comprehensible knowledge (rules, regularities, patterns, constraints, models, …) from arbitrary sets of data
Three ingredients for machine learning
Data:
• Discrete vs continuous vs …
• Big data vs small data
• Labeled data vs unlabeled data
• Public vs sensitive data

Models:
• function mapping from inputs to outputs
• parameters of the function are unknown
• probabilistic vs geometric models

Learning:
• Improving (on a task) after data is taken into account
• Finding the best model parameters (for a given task)
• Supervised vs. unsupervised
Terminology
• The input to a machine learning system consists of:
  • Instances: the individual, independent examples of a concept, also known as exemplars
  • Attributes: measuring aspects of an instance, also known as features
  • Concepts: things that we aim to learn, generally in the form of labels or classes
Instance Topology
• Instances characterised as “feature vectors”, defined by a predetermined set of attributes
• Input to learning scheme: set of instances/dataset
• Flat file representation
• No relationships between objects
• No explicit relationship between attributes
• Possible attribute types (levels of measurement):
1. nominal
2. ordinal
3. continuous
Also: Feature Selection Why? How?
Recap part II: Linear Classification
Naive Bayes I
Task: classify an instance D = ⟨x1, x2, …, xn⟩ according to one of the classes cj ∈ C

argmax_{cj ∈ C} P(cj | x1, x2, …, xn)    (1)
= argmax_{cj ∈ C} P(cj) P(x1, x2, …, xn | cj) / P(x1, x2, …, xn)    (2)
= argmax_{cj ∈ C} P(cj) P(x1, x2, …, xn | cj)    (3)
= argmax_{cj ∈ C} P(cj) Π_i P(xi | cj)    (4)

Posterior: P(cj | x1, x2, …, xn) = prior × likelihood / evidence

What does the equality between (3) and (4) imply?
Naive Bayes II: Smoothing and estimation
The problem with unseen features
• If any term P(xm|y) = 0, then the class probability P(y|x) = 0
• Solution: no event is impossible: P(xm|y) > 0 ∀xm ∀y
1. Epsilon Smoothing
2. Laplace Smoothing
Estimation
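A minimal sketch of how these estimates can be computed for categorical features, using Laplace (add-α) smoothing so that no likelihood is ever zero. The function names `train_nb` and `predict` are illustrative, not from the subject materials:

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y, alpha=1.0):
    """Estimate log P(c) and log P(x_i = v | c) from counts,
    with Laplace (add-alpha) smoothing so no likelihood is ever zero."""
    n = len(y)
    class_counts = Counter(y)
    # feature_counts[c][i][v] = count of value v for feature i within class c
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct observed values per feature
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            feature_counts[c][i][v] += 1
            values[i].add(v)
    log_prior = {c: math.log(class_counts[c] / n) for c in class_counts}

    def log_likelihood(i, v, c):
        num = feature_counts[c][i][v] + alpha
        den = class_counts[c] + alpha * len(values[i])
        return math.log(num / den)

    return log_prior, log_likelihood

def predict(x, log_prior, log_likelihood):
    """argmax_c [ log P(c) + sum_i log P(x_i | c) ]"""
    return max(log_prior, key=lambda c: log_prior[c]
               + sum(log_likelihood(i, v, c) for i, v in enumerate(x)))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.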
Logistic Regression
• Is a binary classification model
• Is a probabilistic discriminative model. Why?
• We model probabilities P(y = 1|x; θ) = p(x) as a function of observations x under parameters θ. [ What about P(y = 0|x; θ)? ]
• We want to use a (suitably modified) regression approach

P(y = 1 | x1, x2, …, xF; θ) = 1 / (1 + exp(−(θ0 + Σ_{f=1}^{F} θf xf))) = σ(x; θ)
• We define a decision boundary, e.g., predict y = 1 if P(y = 1|x1,x2,…,xF;θ) > 0.5 and y = 0 otherwise
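The sigmoid and the 0.5 decision boundary can be sketched directly (a toy illustration; `theta[0]` plays the role of the intercept θ0):

```python
import math

def sigmoid(z):
    """The logistic function sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, theta):
    """P(y = 1 | x; theta) for a logistic regression model."""
    z = theta[0] + sum(t * xf for t, xf in zip(theta[1:], x))
    return sigmoid(z)

def predict(x, theta, threshold=0.5):
    """Decision boundary: predict y = 1 when P(y = 1 | x) exceeds the threshold."""
    return 1 if predict_proba(x, theta) > threshold else 0
```

Note that P(y = 0 | x; θ) needs no extra parameters: it is simply 1 − P(y = 1 | x; θ).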
Perceptron: Definition I
• The Perceptron is a minimal neural network
• Neural networks are inspired by the brain – a complex net of neurons
• A (computational) neuron is defined as follows:
• input = a vector x of numeric inputs (⟨1, x1, x2, …, xn⟩)
• output = a scalar y_i ∈ ℝ
• hyper-parameter: an activation function f
• parameters: θ = ⟨θ0, θ1, θ2, …, θn⟩
• Mathematically:

y_i = f( Σ_j θ_j x^i_j )
Recap part III: Non-Linear Classification
Multi-layer Perceptron I
• Input layer with input units x: the first layer, takes features x as inputs
• Output layer with output units y: the last layer, has one unit per possible
output (e.g., 1 unit for binary classification)
• Hidden layers with hidden units h: all layers in between.
[Figure: feed-forward network diagram – input layer, hidden layers 1 and 2, output layer]
Learning the Multi-layer Perceptron
Recall Perceptron learning:
• Pass an input x through the network and compute the prediction ŷ
• Compare yˆ against y
[Figure: perceptron unit – z = Σ_k θk xk, a = g(z)]
• Weight update: θi ← θi + η(y − ŷ) xi
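The update rule above can be sketched as a toy training loop, assuming a step activation and labels in {0, 1} (function names are illustrative):

```python
def perceptron_train(X, y, eta=0.1, epochs=20):
    """Perceptron learning with a step activation; labels y in {0, 1}.
    theta[0] is the bias weight (a fixed input of 1 is assumed)."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xs, target in zip(X, y):
            z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], xs))
            y_hat = 1 if z >= 0 else 0
            # update rule: theta_i <- theta_i + eta * (y - y_hat) * x_i
            theta[0] += eta * (target - y_hat)
            for i, xi in enumerate(xs):
                theta[i + 1] += eta * (target - y_hat) * xi
    return theta

def perceptron_predict(xs, theta):
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], xs))
    return 1 if z >= 0 else 0
```

On a linearly separable problem (e.g. logical AND) this loop is guaranteed to converge.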
Why can’t we use this method to learn parameters of the MLP? What do we do instead?
Backpropagation: The Generalized Delta Rule
• The Generalized Delta Rule:

Δθ²_ij = η ∂E/∂θ²_ij = η (y^p − ŷ^p) g′(z_i) a_j = η δ_i a_j

δ_i = (y^p − ŷ^p) g′(z_i)

• The above δ_i can only be applied to output units, because it relies on the target outputs y^p.
• We do not have target outputs y for the intermediate layers
Backpropagation: The Generalized Delta Rule
• Instead, we backpropagate the errors (δs) from right to left through the network:

Δθ¹_jk = η δ_j a_k

δ_j = ( Σ_i θ²_ij δ_i ) g′(z_j)
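A minimal sketch of these forward and backward passes for one hidden layer, assuming a sigmoid activation (so g′(z) = g(z)(1 − g(z))) and a single output unit; all names and hyper-parameter defaults are illustrative:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(X, y, n_hidden=2, eta=0.5, epochs=4000, seed=0):
    """One-hidden-layer MLP trained with the generalized delta rule.
    theta1[j]: weights (bias first) from the inputs into hidden unit j;
    theta2: weights (bias first) from the hidden units into one output."""
    rng = random.Random(seed)
    n_in = len(X[0])
    theta1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    theta2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for xs, target in zip(X, y):
            # forward pass: hidden activations a_j, then prediction y_hat
            a = [sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], xs)))
                 for w in theta1]
            y_hat = sigmoid(theta2[0] + sum(w * aj for w, aj in zip(theta2[1:], a)))
            # output delta: (y - y_hat) g'(z), with g'(z) = y_hat (1 - y_hat)
            delta_out = (target - y_hat) * y_hat * (1 - y_hat)
            # hidden deltas: backpropagate delta_out through theta2
            delta_h = [theta2[j + 1] * delta_out * a[j] * (1 - a[j])
                       for j in range(n_hidden)]
            # weight updates (delta rule applied at each layer)
            theta2[0] += eta * delta_out
            for j in range(n_hidden):
                theta2[j + 1] += eta * delta_out * a[j]
                theta1[j][0] += eta * delta_h[j]
                for k in range(n_in):
                    theta1[j][k + 1] += eta * delta_h[j] * xs[k]
    return theta1, theta2
```

The hidden deltas reuse the output delta weighted by the outgoing weights, which is exactly what "backpropagating the errors" means.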
Decision Trees
https://xkcd.com/1924/
Decision Trees
• ID3 algorithm: recursive divide and conquer
• Split criteria:
  • entropy/purity: intuition? What's a good value of entropy?
  • information gain
  • gain ratio
Ensemble Learning
Ensemble learning (a.k.a. classifier combination): constructs a set of base classifiers from a given set of training data and aggregates the outputs into a single meta-classifier
• Intuition 1: the combination of lots of weak classifiers can be at least as good as one strong classifier
• Intuition 2: the combination of a selection of strong classifiers is (usually) at least as good as the best of the base classifiers
• Stacking
• Bagging (Random Forests)
• Boosting (Decision Trees, AdaBoost)
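A minimal sketch of the bagging idea, assuming a `base_learner(X, y)` callable that returns a predictor function (all names are illustrative):

```python
import random
from collections import Counter

def bagging_predict(X_train, y_train, x, base_learner, n_models=25, seed=0):
    """Bagging sketch: fit base learners on bootstrap samples of the
    training data, then take a majority vote over their predictions."""
    rng = random.Random(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample n with replacement
        model = base_learner([X_train[i] for i in idx], [y_train[i] for i in idx])
        votes.append(model(x))
    return Counter(votes).most_common(1)[0][0]
```

Random Forests follow the same recipe with decision trees as base learners, plus random feature subsampling at each split.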
Recap part IV: More Food for Thought (or exam preparation…)
Questions to think about I
Choosing a classification (or any ML) Algorithm
• Probabilistic interpretation?
• Restrictive assumptions on features?
• Restrictive assumptions on the problem?
• How well does it perform?
• How long does it take to train?
• How interpretable is it?
• How much data does it require?
Questions to think about II
How do we know we succeeded?
• Choose the right evaluation metric (accuracy, precision, recall, …)
• Know the mechanics behind the metrics.
• What is overfitting and how do we prevent it?
• Choose the right evaluation strategy, maximizing the utility of your data (cross-validation, hold-out, …). What to consider?
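The mechanics behind accuracy, precision, recall and F1 can be sketched for the binary case (an illustrative helper, not subject-provided code):

```python
def prf(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary predictions."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```

Precision asks "of the instances I predicted positive, how many were right?"; recall asks "of the truly positive instances, how many did I find?".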
Questions to think about III
Theoretical considerations and optimization
• Is the problem linearly separable?
• Is my classifier powerful enough to solve my problem?
• What does the objective function of my classifier look like? And what optimization strategy should I choose?
Recap part V: Evaluation
Learning Curves
Underfitting and Overfitting
High bias (underfitting):
• Use a more complex model (e.g., nonlinear models)
• Add features
• Boosting

High Variance (overfitting):
• Reduce model complexity – complex models are prone to high variance
• Reduce features; add data
Recap part VI: Beyond supervised learning…
Semi-supervised learning
Learning from both labelled and unlabeled data
• Semi-supervised classification:
• L is the set of labelled training instances {(xi, yi)}_{i=1}^{l}
• U is the set of unlabeled training instances {xi}_{i=l+1}^{l+u}
• Often u ≫ l
• Goal: learn a better classifier from L ∪ U than is possible from L
Approaches
• Self-training
• Active learning, query strategies
• Data augmentation
• Unsupervised pre-training
Unsupervised Learning: Clustering
Learning in the context where we don’t have (or don’t use) training data labelled with a class value for each instance.
Finding groups of items that are similar.
• k-means clustering
• hierarchical clustering
  • agglomerative clustering
  • divisive clustering
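A minimal sketch of k-means (Lloyd's algorithm) for 2-D points, with random initial centroids; names and defaults are illustrative, and `math.dist` requires Python 3.8+:

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    recompute centroids as cluster means, repeat until convergence."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[j]  # keep an empty cluster's old centroid
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids
```

Because the result depends on the initial centroids, k-means is usually run several times with different seeds.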
Recap part VII: Problems and applications, more generally…
Anomaly Detection
Types of Anomalies
• Global, contextual, collective anomalies
Concepts/scenarios of anomaly detection
• unsupervised, semi-supervised, supervised methods
• Statistical methods: assume data follow a fixed model
• Proximity based: outlier if nearest neighbors are far away
• Density based: outlier, if in region of low density
• Clustering based: outlier, if not part of large and dense cluster
Name a statistical and a proximity-based method
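As one proximity-based illustration: score each point by its mean distance to its k nearest neighbours, and flag the highest-scoring points as outliers (a sketch with illustrative names; `math.dist` requires Python 3.8+):

```python
import math

def knn_outlier_scores(points, k=2):
    """Proximity-based anomaly scoring: the mean distance from each point
    to its k nearest neighbours. High scores suggest global outliers."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores
```

A statistical alternative would fit a distribution (e.g. a Gaussian) and flag low-probability points.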
Fair Machine Learning
https://fairmlclass.github.io/
Sources of bias
• Models and algorithms
Algorithmic Fairness
• Fairness through unawareness (Why (not)?)
• Fairness through awareness: group fairness, equal opportunity,
predictive parity
Approaches towards preventing bias in ML models
• Pre-processing, for example, …
• Modeling, for example, …
• Post-processing, for example, …
Source https://www.aitrends.com/machine-learning/here-are-six-machine-learning-success-stories/
• Understand fundamental mathematical concepts in machine learning (including probability and optimization)
• Understand the theory behind a variety of machine learning algorithms
• Identify the correct ML model given a specific data set
• Meaningfully evaluate the output of a ML model in the context of a
specific problem
• Apply a variety of ML algorithms
• Python programming: ML model implementation, data processing,
evaluation
• Problem solving, Academic writing and presentation
And finally…
Please participate in the end of semester survey!
• What worked well?
• Suggestions for improvements?
Capstone / PhDs
Feel free to get in touch if you’re interested in a project in NLP!