TA session
Danchen Zhang
Muddiest points & reading summary
Muddiest points & the reading summary will be collected again after this Sunday at midnight.
– Muddiest points from the class on March 24.
– Reading summary for the class on April 7.
– SLP Chapter 18 (Information Extraction)
– SLP Chapter 20 (Semantic Role Labeling)
Assignment 1
BeamSearch & BeamSearch with length normalization
The beam can be implemented as a min-heap with heapq, with the minimum score stored at the root.
BeamSearch1's beam should store at least the probability & the sentence so far, and can optionally store the last word, the word count, and a completion flag.
BeamSearch2's beam should store at least the score & the probability & the sentence so far.
– The score is used for the min-heap add & remove operations.
– The probability is used for the score calculation.
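A minimal sketch (hypothetical class, not the required implementation) of a fixed-size beam kept as a min-heap with heapq, storing (score, probability, sentence) entries as described above:

import heapq

class Beam:
    def __init__(self, size):
        self.size = size
        self.heap = []  # entries: (score, probability, sentence); min score at the root

    def add(self, score, prob, sentence):
        entry = (score, prob, sentence)
        if len(self.heap) < self.size:
            heapq.heappush(self.heap, entry)
        elif score > self.heap[0][0]:            # better than the current worst candidate
            heapq.heapreplace(self.heap, entry)  # pop the root, push the new entry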
A common bug in BeamSearch2
The beam stores only the score without the probability, leading to incorrect scores.
In each iteration, when a word is added, the score is multiplied directly, which is wrong.
Correct: update the probability with the new word's probability, then recompute the length-normalized score from the updated probability.
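A hedged sketch of the fix (it assumes log-probabilities and score = log-probability divided by length; the exact normalization may differ in the assignment):

import math

# Wrong: score = score * p_word  (the stored score is already length-normalized,
# so combining it directly with the next word's probability mixes two quantities)

def extend(log_prob, length, p_word):
    # Correct: update the raw log-probability first ...
    new_log_prob = log_prob + math.log(p_word)
    new_length = length + 1
    new_score = new_log_prob / new_length  # ... then recompute the normalized score
    return new_score, new_log_prob, new_length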
Assignment 2
Bayesian Network
Take P(a|b,e) as an example:
Step 1: recognize that a takes values [1, 2], b takes [1, 2, 3], and e takes [1, 2].
Step 2: count the sample size of each combination of {b, e}, e.g., #(b=1, e=1).
Step 3: for each combination of {b, e}, count the distribution of a, e.g., #(a=1, b=1, e=1) and #(a=2, b=1, e=1).
Step 4: calculate the conditional probability, e.g., P(a=1 | b=1, e=1) = #(a=1, b=1, e=1) / #(b=1, e=1).
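A small counting sketch (hypothetical toy samples) of steps 2–4 for P(a | b, e):

from collections import Counter

samples = [{'a': 1, 'b': 1, 'e': 1}, {'a': 2, 'b': 1, 'e': 1},
           {'a': 1, 'b': 2, 'e': 2}, {'a': 1, 'b': 1, 'e': 1}]

joint = Counter((s['a'], s['b'], s['e']) for s in samples)  # #(a, b, e)
cond = Counter((s['b'], s['e']) for s in samples)           # #(b, e)

def p_a_given_be(a, b, e):
    return joint[(a, b, e)] / cond[(b, e)]

print(p_a_given_be(1, 1, 1))  # 2/3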
A frequent bug in Filtering & Prediction
prior[0] = prior[0]*0.7 + prior[1]*0.3  # rain
prior[1] = prior[0]*0.3 + prior[1]*0.7  # sun
Wrong, because prior[0] has already been modified when prior[1] is computed.
Correct:
value1 = prior[0]*0.7 + prior[1]*0.3  # rain
value2 = prior[0]*0.3 + prior[1]*0.7  # sun
prior[0] = value1
prior[1] = value2
Assignment 3
# Ranking
# Modules discussion
# Interesting bugs
# Grading policy
Task 3 ranking (may be updated due to the deadline extension)
Rank  Email             Accuracy   Hamming loss
1     TIH42@pitt.edu    0.859375   0.140625
2     ZHL141@pitt.edu   0.578125   0.234375
3     RUT15@pitt.edu    0.53125    0.22265625
4     TIW61@pitt.edu    0.53125    0.22265625
5     KSR43@pitt.edu    0.515625   0.1771
6     BIW21@pitt.edu    0.484375   0.24609375
7     QJS2@pitt.edu     0.453125   0.59375
8     BID11@pitt.edu    0.423      0.267
9     SHL171@pitt.edu   0.423      0.262
10    ZIW35@pitt.edu    0.423      0.262
Runs with wrong implementations are removed, and wrong evaluations are fixed. Comments will be posted in CourseWeb. If you find any problem in the grading, please send me an email.
The final top 10 runs will be given 15 points. We will publish their performance & reports later this week.
Useful modules
Numerical feature preprocessing.
Categorical feature preprocessing.
Something about multi-class & multi-label classification.
StandardScaler: mean =0, std =1
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])
scaler = StandardScaler().fit(x)
scaler.transform(x)
# array([[ 0.        , -1.22474487,  1.33630621],
#        [ 1.22474487,  0.        , -0.26726124],
#        [-1.22474487,  1.22474487, -1.06904497]])
Other numerical feature preprocessing methods include:
MinMaxScaler
– Map to [0, 1]
MaxAbsScaler
– Map to [-1, 1]
Binarizer
– Map to 0 & 1
RobustScaler
– Deal with outliers
Normalizer
– Normalizes each sample by its l1, l2, or max norm
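A quick usage sketch of two of these, reusing the x array from the StandardScaler example above:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

x = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])
MinMaxScaler().fit_transform(x)         # each column rescaled to [0, 1]
Normalizer(norm='l2').fit_transform(x)  # each row rescaled to unit l2 norm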
Categorical features
OneHotEncoder encodes a categorical feature to multidimensional binary features.
– Color {pink, yellow, red, blue}
– Data [pink, pink, yellow, blue, blue]
– New feature [[1,0,0,0],[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,0,1]]
– Sparse. Can work together with PCA to reduce the dimensionality.
OrdinalEncoder encodes a categorical feature into a 1-dimensional feature.
– New feature [0, 0, 1, 3, 3]
– This implies blue > pink and red = (yellow + blue)/2, which is not reasonable.
– But it sometimes works very well with tree models.
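A small sketch of both encoders on the color example, with the category order fixed explicitly so the output matches the bullets above:

import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

data = np.array([['pink'], ['pink'], ['yellow'], ['blue'], ['blue']])
order = [['pink', 'yellow', 'red', 'blue']]  # explicit category order

OneHotEncoder(categories=order).fit_transform(data).toarray()
# [[1,0,0,0],[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,0,1]]
OrdinalEncoder(categories=order).fit_transform(data).ravel()
# [0, 0, 1, 3, 3]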
Multi-class classification
OneVsOneClassifier constructs one classifier per pair of classes. At prediction time, the class which received the most votes is selected.
– Too many classifiers to learn, and very slow.
OneVsRestClassifier constructs one classifier per class. For each classifier, the class is fitted against all the other classes.
– Default choice.
Multi-label classification implementation
1. sklearn's OneVsRestClassifier can be used directly.
2. Implement four binary classifiers, one each for "no", "family", "paid", and "school".
3. Some classifiers directly support multi-label classification.
See more in https://scikit-learn.org/stable/modules/multiclass.html
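A minimal sketch (hypothetical toy X and Y) of option 1, OneVsRestClassifier applied to a binary label-indicator matrix:

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

X = np.random.rand(8, 5)                   # toy features
Y = np.array([[1, 0, 0, 1], [1, 0, 1, 1],  # one column per label,
              [0, 1, 0, 0], [1, 1, 0, 0],  # e.g., no, family, paid, school
              [0, 0, 1, 1], [1, 0, 0, 0],
              [0, 1, 1, 0], [0, 0, 0, 1]])
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
pred = clf.predict(X)                      # shape (8, 4), one 0/1 per label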
Interesting bugs – evaluation
Multi-label classification.
groundtruth:
[[1 0 0 1]
 [1 0 1 1]]
prediction:
[[1 1 0 1]
 [1 0 1 0]]
Element-wise comparison (gd == predict):
[[ True False  True  True]
 [ True  True  True False]]
numpy.mean != accuracy_score:
acc_wrong = numpy.mean(gd == predict)      # 0.75
acc_correct = accuracy_score(gd, predict)  # 0.0
accuracy_score uses subset (exact-match) accuracy, so a sample counts as correct only if all of its labels match; numpy.mean gives the element-wise accuracy instead.
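A runnable version of the comparison, also showing that Hamming loss is the complement of the element-wise mean:

import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

gd = np.array([[1, 0, 0, 1], [1, 0, 1, 1]])
predict = np.array([[1, 1, 0, 1], [1, 0, 1, 0]])

print(np.mean(gd == predict))       # 0.75  element-wise accuracy
print(accuracy_score(gd, predict))  # 0.0   subset (exact-match) accuracy
print(hamming_loss(gd, predict))    # 0.25  = 1 - element-wise accuracy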
Columns disappear
train_data has 'G3' when loaded.
pandas.get_dummies(train_data)
'G3' -> one-hot encoded -> 'G3_0', 'G3_1', 'G3_5', 'G3_6', …
train_data['G3'] no longer exists.
Fake multi-label, true multi-class
Eight classes:
1. School
2. Family
3. Paid
4. No
5. School, family
6. School, paid
7. Family, paid
8. School, family, paid
vs.
Three classes:
1. School
2. Family
3. Paid
8-class is a multi-class classification with heavy data imbalance; some classes have very little training data.
3-class is a multi-label/tagging classification; each class is relatively more balanced.
GridSearchCV for parameter tuning
Exhaustive search over specified parameter values for an estimator (classifier).
– It selects parameters based on cross-validation (CV) performance.
– After selecting the best parameters, you should retrain the model on the full training data with this "best param".
– If you directly use the best model learned on 9 folds (e.g., GridSearch with 10-fold CV), your run's performance will vary each time you run it. You need to repeat the experiment many times (e.g., 50) and report the average score with its standard deviation (STD).
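A minimal sketch (hypothetical X, y, and parameter grid) of GridSearchCV followed by retraining on the full training data with the selected parameters:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)

param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
search = GridSearchCV(SVC(), param_grid, cv=10).fit(X, y)

best = SVC(**search.best_params_).fit(X, y)  # retrain on all training data with the best params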
Grading policy
1. Your program should be at least runnable.
2. Correct implementation. If multi-label is implemented as multi-class……
3. Report & parameter tuning & feature engineering. Basically no problem.
4. Performance:
a. Task 1: MSE > 22.0, -3;
b. Task 2: Acc < 0.4, -3;
c. Task 3: top 10, +15.
Assignment 4
Natural Language Understanding
Task 1: extract opinions from online restaurant reviews with CoreNLP.
From the example review, you should extract [menu, large], [portion, large], [price, reasonable].
Popular bugs or failures from last year
1. In extraction, grammar rules should be carefully designed and refined; otherwise, too many opinions will be missed.
   a. Opinion extraction recall is important.
2. The POS tagger can help you prevent wrong extractions.
   a. "Visited, I" could be filtered out.
   b. Opinion extraction precision is also important.
3. A wrong opinion format leads to mismatches, for example, "expensive, food" versus "food, pricy".
4. The "fish taco" problem. Phrase recognition is not required, but it is important in real-world NLP problems.
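A rough, hypothetical sketch for Task 1 (it assumes a CoreNLP server running locally on port 9000 and the pycorenlp wrapper), using only two dependency rules, amod and nsubj; as noted above, real grammar rules need much more refinement and POS filtering:

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')  # assumes a running CoreNLP server
text = 'The menu is large and the price is reasonable.'
out = nlp.annotate(text, properties={'annotators': 'tokenize,ssplit,pos,depparse',
                                     'outputFormat': 'json'})

opinions = []
for sent in out['sentences']:
    for dep in sent['basicDependencies']:
        if dep['dep'] == 'amod':     # e.g., amod(portion, large) -> [portion, large]
            opinions.append([dep['governorGloss'], dep['dependentGloss']])
        elif dep['dep'] == 'nsubj':  # e.g., nsubj(reasonable, price) -> [price, reasonable]
            # a POS check on the governor (keep adjectives only) would filter out
            # pairs like [I, visited] coming from nsubj(visited, I)
            opinions.append([dep['dependentGloss'], dep['governorGloss']])
print(opinions)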
Task 2: find similar opinions with word2vec
Opinion similarity is defined by yourself. For example, to me, "excellent service", "nice waiter", "great service", and "good service" are all similar to "good service".
When tuning the parameters, please balance precision & recall performance. Both are important.
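A small sketch (toy corpus, gensim 4.x API, and one possible self-defined similarity, all assumptions) of comparing opinion pairs with word2vec:

from gensim.models import Word2Vec

sentences = [['good', 'service'], ['nice', 'waiter'], ['great', 'service'],
             ['excellent', 'service'], ['large', 'portion']]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=1)

def opinion_similarity(op1, op2):
    # op = (aspect, opinion word); average the two word-level cosine similarities
    aspect_sim = model.wv.similarity(op1[0], op2[0])
    opinion_sim = model.wv.similarity(op1[1], op2[1])
    return (aspect_sim + opinion_sim) / 2

print(opinion_similarity(('service', 'good'), ('service', 'great')))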