CSC311 Final Project Overview
• Background and Task
• Dataset and Starter Code
• Inspecting a Baseline Model
• Overview of Different Approaches
Background and Task
• Massive Open Online Courses: Khan Academy, Coursera
• Question: How can we personalize education in MOOCs?
• Idea: Measure students’ understanding of the material by introducing a personalized assessment component.
Background and Task
Why a personalized assessment component?
• Each question can be designed to highlight a misconception.
• Lets us adjust the level of difficulty.
Background and Task
Goal: Build a predictive model that predicts whether a student will answer a given question correctly, given the student’s answers to past questions and other students’ answers.
TRAINING DATA: (student_id, question_id, is_correct), e.g. (1, 1, 1)
PREDICTIONS: (student_id, question_id, ?), e.g. (2, 2, ?)
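To make the format concrete, here is a minimal pandas sketch of the two tables; the column names match the triples above, but the file name train_data.csv is an assumption rather than the actual starter-code path.

import pandas as pd

# Toy training triples in the (student_id, question_id, is_correct) format.
train = pd.DataFrame(
    {"student_id": [1, 2], "question_id": [1, 5], "is_correct": [1, 0]}
)

# In the project you would load the provided CSV instead, e.g.:
# train = pd.read_csv("train_data.csv")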
Background and Task
• Part A: Try out established methods you’ve covered in class.
• Part B: Improve on the existing methods.
The project has an (ungraded) Kaggle-based competition component!
Background and Task
Let’s switch to the Colab notebook.
• We’ll inspect the dataset and the starter code.
• We’ll build a baseline model and make a Kaggle submission with it.
• The dataset also contains metadata, including 1) date of birth, 2) gender, and 3) eligibility for “pupil premium”.
• Not used in Part A, but might be relevant for Part B.
Part A: Testing out various models, under the guidance of the project handout.
k-Nearest Neighbors
• Given a notion of similarity, classify a test example by looking at the training examples most similar to it.
• Similarity in terms of student, or similarity in terms of question?
What to analyze?
• Notion of similarity: Compare student-based similarity with item-based similarity (a code sketch follows below).
• Choice of hyperparameter: In both cases, which value of k works better?
• Limitations: What are the limitations of using KNN in this context?
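As a starting point, here is a minimal sketch of both imputation variants using scikit-learn’s KNNImputer; the toy matrix, neighbour count, and thresholding step are illustrative assumptions rather than the starter code’s choices.

import numpy as np
from sklearn.impute import KNNImputer

# Sparse student-by-question matrix: rows are students, columns are
# questions, NaN marks an unanswered question.
matrix = np.array([[1.0, np.nan, 0.0],
                   [1.0, 1.0, np.nan],
                   [np.nan, 1.0, 0.0]])

imputer = KNNImputer(n_neighbors=2)

# User-based: neighbours are similar students (rows).
user_filled = imputer.fit_transform(matrix)

# Item-based: transpose so neighbours are similar questions, impute,
# then transpose back.
item_filled = imputer.fit_transform(matrix.T).T

# Threshold the imputed values to get correct/incorrect predictions.
predictions = (user_filled >= 0.5).astype(int)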
Item Response Theory
• Goal: Assign a probability that a student will answer a given question correctly.
• Simplifying assumption 1: Correct answer probability depends on two parameters:
▶ θi: the ith student’s ability.
▶ βj: the jth question’s difficulty.
• Simplifying assumption 2: Correct answer probability increases monotonically with θi and −βj.
Item Response Theory
• How to train: Maximize the data log likelihood over the model parameters (a sketch follows below)!
p(cij | θi, βj) = sigmoid(θi − βj) = exp(θi − βj) / (1 + exp(θi − βj))
• Connection to logistic regression: Think about how this model relates to logistic regression!
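A minimal sketch of that training procedure with full-batch gradient ascent; the function name, learning rate, and iteration count are illustrative assumptions, and data is a list of (i, j, cij) triples.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_irt(data, num_students, num_questions, lr=0.01, iters=500):
    theta = np.zeros(num_students)    # student abilities
    beta = np.zeros(num_questions)    # question difficulties
    for _ in range(iters):
        grad_theta = np.zeros_like(theta)
        grad_beta = np.zeros_like(beta)
        for i, j, c in data:
            p = sigmoid(theta[i] - beta[j])
            grad_theta[i] += c - p    # d(log-likelihood)/d(theta_i)
            grad_beta[j] += p - c     # d(log-likelihood)/d(beta_j)
        theta += lr * grad_theta      # gradient ascent on the log likelihood
        beta += lr * grad_beta
    return theta, beta

The update directions come from the log likelihood you derive in Part A: each observation pushes θi up and βj down when the student answered correctly, and the reverse when they did not.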
Item Response Theory
• Possible extensions¹ (sketched in code below):
p(cij | θi, βj) = c + (1 − c) · sigmoid(kj(θi − βj))
• c: Probability of getting the question right via a random guess.
• kj: How steep the sigmoid is (i.e., how discriminative the question is).
¹ reference link
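A one-function sketch of the extended model above; the parameter names mirror the equation and are otherwise illustrative.

import numpy as np

def p_correct(theta_i, beta_j, k_j, c):
    # c + (1 - c) * sigmoid(k_j * (theta_i - beta_j))
    s = 1.0 / (1.0 + np.exp(-k_j * (theta_i - beta_j)))
    return c + (1.0 - c) * s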
Item Response Theory
Can you think of other real-life problems where Item Response Theory can be applied?
• healthcare
• recommender systems
• ?
Item Response Theory
What to analyze?
• Log likelihood: Derive the log likelihood and inspect its form.
• Inspecting the results: Using the trained θ and β vectors, plot how the probability of a correct answer changes as “student ability” varies. Why does the plot look the way it does? What can we learn from the plot?
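A minimal matplotlib sketch of such a plot; the difficulty values below are placeholders, and in the project you would use entries of the trained β vector instead.

import numpy as np
import matplotlib.pyplot as plt

abilities = np.linspace(-4, 4, 200)
for beta_j in [-1.0, 0.0, 1.5]:   # placeholder difficulties
    p = 1.0 / (1.0 + np.exp(-(abilities - beta_j)))
    plt.plot(abilities, p, label=f"beta = {beta_j}")
plt.xlabel("student ability (theta)")
plt.ylabel("p(correct answer)")
plt.legend()
plt.show()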
Matrix Factorization
We consider two options in the handout:
• Singular Value Decomposition
• Alternating Least Squares
Matrix Factorization
• Using PCA (via Singular Value Decomposition).
• Goal: Complete the matrix using the top principal components (a sketch follows below).
• Question: Using KNN to fill in missing values requires us to specify whether we’re using question or student similarity. Is there such a distinction for SVD?
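A minimal NumPy sketch of SVD-based completion; filling missing entries with per-question means before decomposing is one common convention, assumed here rather than prescribed by the handout.

import numpy as np

def svd_complete(matrix, k):
    # Fill each NaN with its question's (column's) mean.
    filled = matrix.copy()
    col_means = np.nanmean(filled, axis=0)
    nan_mask = np.isnan(filled)
    filled[nan_mask] = np.take(col_means, np.where(nan_mask)[1])
    # Reconstruct from the top-k singular vectors.
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k, :]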
Matrix Factorization
• Alternating Least Squares: Assign each student and question a vector. Train the values of these vectors so that a high dot product between student i’s and question j’s vectors implies a correct answer.
• Objective:
min_{U,Z} ½ ∑_{(n,m)∈O} (Cnm − un⊤zm)²   (1)
• How to train U and Z: Loop over each un and zm, and solve (1) assuming all other terms are fixed. Repeat until convergence.
Matrix Factorization
• How to train U and Z matrices:
1. Initialize U and Z.
2. Repeat until “convergence”:
3.   For n = 1, …, N:
4.     un = (∑_{j:(n,j)∈O} zj zj⊤)⁻¹ ∑_{j:(n,j)∈O} Cnj zj
5.   For m = 1, …, M:
6.     zm = (∑_{i:(i,m)∈O} ui ui⊤)⁻¹ ∑_{i:(i,m)∈O} Cim ui
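A direct NumPy translation of the updates above; the tiny ridge term added before solving is a numerical-stability assumption, and observations is a list of (n, m, Cnm) triples.

import numpy as np

def als(observations, N, M, k, num_epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(N, k))
    Z = rng.normal(scale=0.1, size=(M, k))
    # Index the observed entries by student and by question.
    by_student = [[] for _ in range(N)]
    by_question = [[] for _ in range(M)]
    for n, m, c in observations:
        by_student[n].append((m, c))
        by_question[m].append((n, c))
    ridge = 1e-6 * np.eye(k)   # keeps the normal equations invertible
    for _ in range(num_epochs):
        for n in range(N):
            if not by_student[n]:
                continue
            A = sum(np.outer(Z[m], Z[m]) for m, _ in by_student[n])
            b = sum(c * Z[m] for m, c in by_student[n])
            U[n] = np.linalg.solve(A + ridge, b)
        for m in range(M):
            if not by_question[m]:
                continue
            A = sum(np.outer(U[n], U[n]) for n, _ in by_question[m])
            b = sum(c * U[n] for n, c in by_question[m])
            Z[m] = np.linalg.solve(A + ridge, b)
    return U, Z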
Matrix Factorization
What to analyze?
• Limitations of SVD: In what way is SVD limited in this context?
• Effect of hyperparameters on ALS performance: How does the choice of hyperparameters affect the training dynamics and the final accuracy?
• Alternative objectives: Can we change the loss function so that the problem is treated as a binary classification problem?
Neural Network
• Learning a “student autoencoder”: Represent each student by a vector of length Nquestions. Train an autoencoder to project the student vectors into a low-dimensional space where similar students are clustered together.
Neural Network
• Learning objective:
min_θ ||v − f(v; θ)||²   (2)
• Network architecture: a two-layer, fully connected network.
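A minimal PyTorch sketch of this setup; the bottleneck width, sigmoid activations, and placeholder question count are illustrative assumptions.

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, num_questions, k=50):
        super().__init__()
        self.g = nn.Linear(num_questions, k)   # encoder
        self.h = nn.Linear(k, num_questions)   # decoder

    def forward(self, v):
        # f(v; theta) from equation (2): encode, then decode.
        return torch.sigmoid(self.h(torch.sigmoid(self.g(v))))

num_questions = 100                    # placeholder; use the real count
model = AutoEncoder(num_questions)
v = torch.rand(1, num_questions)       # dummy student vector
loss = torch.sum((model(v) - v) ** 2)  # objective (2)

For the regularization question below, one option is to add a penalty proportional to torch.norm(model.g.weight) ** 2 + torch.norm(model.h.weight) ** 2 to the loss.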
Neural Network
What to analyze?
• Bottleneck width: How does the dimensionality of the bottleneck layer affect the results?
• Effect of regularization: How does regularizing the network weights by penalizing their Frobenius norm affect the results?
Ensemble
• Try to improve stability and accuracy as follows (a sketch follows after the bagging reminder):
1. Select 3 models (same or different).
2. Generate three alternative datasets by bagging.
3. Train the models on the corresponding bagged dataset.
4. Pick the average of the 3 models as the final decision on the test set.
• Reminder about bagging: build each alternative dataset by sampling N training examples uniformly at random, with replacement, from the original dataset of size N.
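A minimal sketch of the full recipe; the model-training functions and their predict interface are hypothetical stand-ins for whichever Part A methods you choose.

import numpy as np

def bootstrap(data, rng):
    # Sample len(data) examples with replacement.
    idx = rng.integers(0, len(data), size=len(data))
    return [data[i] for i in idx]

def bagged_predict(data, test_points, model_fns, seed=0):
    rng = np.random.default_rng(seed)
    preds = []
    for fit in model_fns:                  # e.g., three training functions
        model = fit(bootstrap(data, rng))  # train on a bootstrap sample
        preds.append(model.predict(test_points))
    return np.mean(preds, axis=0)          # average, then threshold at 0.5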
What to analyze?
• How did using an ensemble affect the accuracy?
• How did it affect the stability of the model?
This part is more open-ended; don’t forget to explain your approach in enough detail that a reader of your report can faithfully reproduce your results.
Questions / Starter Code
If we have time remaining, we can either look deeper into the starter code, or answer student questions.