LECTURE 1

Introduction to NLP and Regular Expressions

Arkaitz Zubiaga, 8th January, 2018

 Lectures: Mon (4pm, LIB2) & Wed (10am, L5)

 Seminars: Mon (3pm, OC1.01) (week 2 onwards). The seminars
will cover supplementary material and provide technical detail.

 Labs: Thu 2-4pm in CS0.01 (weeks 2, 3, 5, 7 and 9).

ABOUT THE MODULE: CS918

 Assessment: 70% exam in May/June, 30% from 2 assignments:

 Assign. 1: released week 2, deadline week 8.

 Assign. 2: released week 4, deadline week 11.

ABOUT THE MODULE: CS918

 Give fundamental understanding of NLP methods for processing
linguistic data in textual form.

 Familiarisation with different applications of NLP.

 Give students the skills to apply state of the art NLP methods on
different types of text (newswire, web, social media, scientific
articles).

AIMS OF THE MODULE: CS918

 Essential:

 Jurafsky, Daniel, and James H. Martin. 2009. Speech and Language Processing: An Introduction
to Natural Language Processing, Speech Recognition, and Computational Linguistics. 2nd and
3rd editions.

 Bird, Steven, Ewan Klein, and Edward Loper. Natural Language Processing with Python. O’Reilly
Media, Inc., 2009.

 Recommended:

 Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language
Processing. MIT Press, Cambridge, MA, USA.

 Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning (Information Science
and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.

 Marie-Francine Moens, and Juanzi Li. “Mining User Generated Content and Its Applications.” In
Mining User Generated Content, 3–17. Social Media and Social Computing. Chapman and
Hall/CRC, 2014.

BOOKS FOR THE MODULE

 What is Natural Language Processing (NLP)?

 What are NLP areas and applications?

 Why is NLP challenging?

 Basic text processing with Regular Expressions.

LECTURE 1: CONTENTS

 NLP is the field that studies computational methods for
automatically identifying structure in human language data (e.g.
English or Chinese, written or spoken).

 NLP is also concerned with the insights that such computational
work gives us into human processing of language.

 In this module, we will focus on textual rather than spoken
language.

WHAT IS NATURAL LANGUAGE PROCESSING?

 A lot of today’s knowledge is written in texts, even more so on
the Internet, social media, emails.

 We need automated means to process all that content!

 Communication with chatbots and across languages needs
understanding of language.

WHY IS NLP IMPORTANT?

 NLP is being increasingly used by companies (the examples on the slide were images).

WHY IS NLP IMPORTANT?

 1940s: Used mainly for machine translation.

 1980s: Gained momentum with a focus on computational
grammars for the representation of meaning. Small corpora,
mostly rule-based.

 1990s: Rapid expansion, large collections, Internet.

 2000s: Shift from computational grammars to statistical (machine
learning) methods.

 2013-: Largely influenced by deep learning.

BRIEF HISTORY OF NLP

NLP APPLICATIONS: QUESTION ANSWERING

NLP APPLICATIONS: INFORMATION EXTRACTION

Subject: meeting

Date: 8th January, 2018

To: Arkaitz Zubiaga

Hi Arkaitz, we have finally scheduled the meeting.

It will be in the Ada Lovelace room, next Monday 10am-11am.

-Mike

Create new Calendar entry

Event: Meeting w/ Mike
Date: 15 Jan, 2018
Start: 10:00am
End: 11:00am
Where: A. Lovelace

NLP APPLICATIONS: SENTIMENT ANALYSIS

NLP APPLICATIONS: MACHINE TRANSLATION

NLP APPLICATIONS

mostly solved

 Spam detection: “Let’s go to Agra!” vs. “Buy V1AGRA …”

 Part-of-speech (POS) tagging: “Colorless green ideas sleep furiously.” (ADJ ADJ NOUN VERB ADV)

 Named entity recognition (NER): “Einstein met with UN officials in Princeton” (PERSON, ORG, LOC)

making good progress

 Sentiment analysis: “Best roast chicken in San Francisco!” vs. “The waiter ignored us for 20 minutes.”

 Coreference resolution: “Carter told Mubarak he shouldn’t run again.”

 Word sense disambiguation (WSD): “I need new batteries for my mouse.”

 Parsing: “I can see Alcatraz from the window!”

 Machine translation (MT): “第13届上海国际电影节开幕…” to “The 13th Shanghai International Film Festival…”

 Information extraction (IE): “You’re invited to our dinner party, Friday May 27 at 8:30” to a calendar entry (Party, May 27, add)

still really hard

 Question answering (QA): “Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?”

 Paraphrase: “XYZ acquired ABC yesterday” vs. “ABC has been taken over by XYZ”

 Summarization: “The Dow Jones is up”, “The S&P500 jumped”, “Housing prices rose” to “Economy is good”

 Dialog: “Where is Citizen Kane playing in SF?” “Castro Theatre at 7:30. Do you want a ticket?”

WHY IS NLP CHALLENGING?

 Language is ambiguous, e.g. “Flying planes can be dangerous”

 What is actually meant?

 It can be dangerous for a person to fly planes.

 Planes that are flying in the air can be dangerous.

WHY ELSE IS NLP CHALLENGING?

 Non-standard English: “We had a double room, but was to cold when we complaint”

 Segmentation issues: “the London Euston-Birmingham New Street train” (is Euston-Birmingham a word?)

 Idioms: “a pain in the neck”, “throw in the towel”

 Neologisms: unfriend, retweet, selfie

 Tricky entity names: “Let It Be is a good song…”, “They were listening to One Direction…”

 World knowledge: “Mary and Sue are sisters.” vs. “Mary and Sue are mothers.”

PART 2: BASIC TEXT PROCESSING
WITH REGULAR EXPRESSIONS

REGULAR EXPRESSIONS

 A formal language for specifying text patterns.

 For searching or replacing text.

 How do we search for any of the following in a text?

 woodchuck

 woodchucks

 Woodchuck

 Woodchucks
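
One possible answer, sketched with Python's `re` module, is a single pattern that covers all four forms. The pattern `[wW]oodchucks?` and the sample sentence are illustrative assumptions rather than part of the slides.

```python
import re

# Illustrative pattern: either case for the first letter, optional trailing "s".
WOODCHUCK = re.compile(r"[wW]oodchucks?")

text = "Woodchucks are rodents. A woodchuck is also called a groundhog."
matches = WOODCHUCK.findall(text)
print(matches)  # ['Woodchucks', 'woodchuck']
```

The same pattern also finds `woodchucks` and `Woodchuck`, since the bracketed set and the optional `s` vary independently.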

APPLICATION OF REGULAR EXPRESSIONS

 When the user says “You are X”, ELIZA responds with “What
makes you think I am X?”, for any X.

 X can be obtained with regular expressions.
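
A minimal sketch of how X might be captured in Python follows. The function name `eliza_reply`, the exact pattern and the handling of trailing punctuation are assumptions made for illustration; the original ELIZA rules are not given in the slides.

```python
import re

# Hypothetical rule: capture X in "You are X", ignoring a final "." or "!".
YOU_ARE = re.compile(r"^You are (.+?)[.!]?$", re.IGNORECASE)

def eliza_reply(utterance):
    """Return an ELIZA-style reply, or None if the rule does not apply."""
    match = YOU_ARE.match(utterance.strip())
    if match is None:
        return None
    return f"What makes you think I am {match.group(1)}?"

print(eliza_reply("You are very helpful."))
# What makes you think I am very helpful?
```

The parentheses form a capture group, so whatever text fills the X slot can be copied into the response.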

REGULAR EXPRESSIONS: DISJUNCTIONS

• Letters inside square brackets []

Pattern         Matches
[wW]oodchuck    Woodchuck, woodchuck
[1234567890]    Any digit

• Ranges [A-Z]

Pattern    Matches                 Example
[A-Z]      An upper case letter    Drenched Blossoms
[a-z]      A lower case letter     my beans were impatient
[0-9]      A single digit          Chapter 1: Down the Rabbit Hole
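
The patterns above can be tried out in Python as follows. This is a small sketch using the slide's own example strings; the variable names are arbitrary.

```python
import re

# First character matched by each range in the slide's example strings.
first_upper = re.search(r"[A-Z]", "Drenched Blossoms").group()
first_lower = re.search(r"[a-z]", "my beans were impatient").group()
first_digit = re.search(r"[0-9]", "Chapter 1: Down the Rabbit Hole").group()

# A bracketed set matches exactly one character taken from the set.
both_cases = re.findall(r"[wW]oodchuck", "Woodchuck or woodchuck")

print(first_upper, first_lower, first_digit)  # D m 1
print(both_cases)                             # ['Woodchuck', 'woodchuck']
```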

REGULAR EXPRESSIONS: NEGATIONS

• Negations [^Ss]
• Caret (^) means negation only when first in []

Pattern    Matches                     Example
[^A-Z]     Not an upper case letter    Oyfn pripetchik
[^Ss]      Neither ‘S’ nor ‘s’         reason
[^e^]      Neither e nor ^             Look here
a^b        The pattern a caret b       Look up a^b now
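
The negated sets can be checked in Python with the sketch below, which reuses the slide's example strings. One flavour difference is worth noting, stated here as a property of Python's `re` module rather than of the slides: outside square brackets, Python always treats `^` as an anchor, so a literal caret has to be escaped as `a\^b`.

```python
import re

first_non_upper = re.search(r"[^A-Z]", "Oyfn pripetchik").group()
not_s = re.findall(r"[^Ss]", "reason")
neither_e_nor_caret = re.search(r"[^e^]", "Look here").group()

# In Python an unescaped ^ outside [] is an anchor, so r"a^b" cannot match here.
unescaped = re.search(r"a^b", "Look up a^b now")
escaped = re.search(r"a\^b", "Look up a^b now").group()

print(first_non_upper)      # y
print(not_s)                # ['r', 'e', 'a', 'o', 'n']
print(neither_e_nor_caret)  # L
print(unescaped, escaped)   # None a^b
```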

REGULAR EXPRESSIONS: MORE DISJUNCTION

Pattern                      Matches
groundhog|woodchuck          groundhog, woodchuck
yours|mine                   yours, mine
a|b|c                        = [abc]
[gG]roundhog|[Ww]oodchuck    Groundhog, groundhog, Woodchuck, woodchuck
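
As a quick check in Python, the pipe can be combined with bracketed sets. The sample sentence below is made up for illustration.

```python
import re

PATTERN = re.compile(r"[gG]roundhog|[Ww]oodchuck")

text = "Groundhog Day stars a groundhog; a Woodchuck is just a woodchuck."
found = PATTERN.findall(text)
print(found)  # ['Groundhog', 'groundhog', 'Woodchuck', 'woodchuck']

# For single characters, a|b|c finds the same matches as [abc].
same = re.findall(r"a|b|c", "abacus cab") == re.findall(r"[abc]", "abacus cab")
print(same)  # True
```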

REGULAR EXPRESSIONS: ? * + .

Pattern    Meaning                          Matches
colou?r    Optional previous char           color colour
oo*h!      0 or more of previous char       oh! ooh! oooh! ooooh!
o+h!       1 or more of previous char       oh! ooh! oooh! ooooh!
baa+       1 or more of previous char       baa baaa baaaa baaaaa
beg.n      Any single char in place of .    begin begun beg3n
ba{5}      Exactly 5 of previous char       baaaaa
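
The quantifiers can be compared side by side with a small helper. The helper name `full_matches` and the candidate word lists are assumptions chosen to include one non-matching string per pattern.

```python
import re

def full_matches(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if re.fullmatch(pattern, w)]

colours = full_matches(r"colou?r", ["color", "colour", "colouur"])
ohs = full_matches(r"oo*h!", ["h!", "oh!", "ooh!", "oooh!"])
sheep = full_matches(r"baa+", ["ba", "baa", "baaa"])
begs = full_matches(r"beg.n", ["begin", "begun", "beg3n", "begn"])
five = full_matches(r"ba{5}", ["baaaa", "baaaaa"])

print(colours)  # ['color', 'colour']
print(ohs)      # ['oh!', 'ooh!', 'oooh!']
print(sheep)    # ['baa', 'baaa']
print(begs)     # ['begin', 'begun', 'beg3n']
print(five)     # ['baaaaa']
```

`re.fullmatch` is used so that a word counts only when the whole string fits the pattern, which makes the rejected candidates visible.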

REGULAR EXPRESSIONS: ANCHORS ^ $

Pattern       Matches
^[A-Z]        Coventry
^[^A-Za-z]    1, “Hello”
\.$           The end.
.$            The end? The end!
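
A short Python sketch of the anchors follows, using the slide's examples plus a few extra strings (assumptions) that should be rejected.

```python
import re

starts_upper = bool(re.search(r"^[A-Z]", "Coventry"))

# ^ outside [] anchors to the start; ^ first inside [] negates the set.
openers = ["1", '"Hello"', "Hello"]
starts_non_letter = [s for s in openers if re.search(r"^[^A-Za-z]", s)]

endings = ["The end.", "The end?", "The end!"]
ends_with_period = [s for s in endings if re.search(r"\.$", s)]  # literal "."
ends_with_any = [s for s in endings if re.search(r".$", s)]      # any character

print(starts_upper)       # True
print(starts_non_letter)  # ['1', '"Hello"']
print(ends_with_period)   # ['The end.']
print(ends_with_any)      # ['The end.', 'The end?', 'The end!']
```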

REGULAR EXPRESSIONS: ERRORS

 False positives (Type I errors)

 False negatives (Type II errors)

FALSE POSITIVES: TYPE I ERRORS

 Instances that should not be output.

 For instance, if we search for “[Tt]he”:

There are 10 people in the room, they all have a laptop with
them.
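
The false positives in this sentence can be listed in Python. The stricter pattern with `\b` word boundaries is one possible fix and is an assumption, not something the slides prescribe.

```python
import re

sentence = "There are 10 people in the room, they all have a laptop with them."

# The loose pattern also fires inside "There", "they" and "them".
loose = re.findall(r"[Tt]he", sentence)

# Assumed fix: require word boundaries on both sides.
strict = re.findall(r"\b[Tt]he\b", sentence)

print(loose)   # ['The', 'the', 'the', 'the']
print(strict)  # ['the']
```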

FALSE NEGATIVES: TYPE II ERRORS

 Instances that have been missed.

 For instance, if we search for “the”:

The laptop is in the kitchen.
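
The missed instance shows up when the case-sensitive pattern is compared with one that allows either case, as in this small sketch.

```python
import re

sentence = "The laptop is in the kitchen."

case_sensitive = re.findall(r"the", sentence)  # misses the capitalised "The"
either_case = re.findall(r"[Tt]he", sentence)

print(case_sensitive)  # ['the']
print(either_case)     # ['The', 'the']
```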

EVALUATION

 In NLP we are always dealing with these kinds of errors.

 Reducing the error rate for an application often involves two
antagonistic efforts:

 Increasing precision (minimising false positives)

 Increasing coverage or recall (minimising false negatives).

EVALUATION: PRECISION AND RECALL

Searching for “[Tt]he” in:

There are 10 people in the room, they all have a laptop with them.

 Precision (ratio of correct items among those output):

1/4 = 0.25

 Recall (ratio of reference items that have been output):

1/1 = 1
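
The figures 1/4 and 1/1 can be reproduced in Python under two assumptions that the slide does not state explicitly: the search pattern is `[Tt]he`, as on the false positives slide, and the reference items are the standalone occurrences of the word "the".

```python
import re

sentence = "There are 10 people in the room, they all have a laptop with them."

# Character spans output by the search pattern.
output = {m.span() for m in re.finditer(r"[Tt]he", sentence)}

# Assumed reference: spans of the standalone word "the" (either case).
reference = {m.span() for m in re.finditer(r"\b[Tt]he\b", sentence)}

correct = output & reference
precision = len(correct) / len(output)   # correct items among those output
recall = len(correct) / len(reference)   # reference items that were output

print(precision, recall)  # 0.25 1.0
```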

EVALUATION: F1 SCORE

We want to optimise for both precision and recall:

 F1 = 2PR / (P + R) (harmonic mean of precision P and recall R)

The general equation is as follows, although β is generally set to 1:

 Fβ = (1 + β²)PR / (β²P + R)
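
A minimal sketch of the F-measure in Python, assuming the standard definition Fβ = (1 + β²)PR / (β²P + R), which reduces to the harmonic mean 2PR / (P + R) when β = 1. The function name `f_beta` is arbitrary, and the inputs are the precision and recall values from the previous slide.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when nothing correct was output
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = f_beta(0.25, 1.0)
print(round(f1, 3))  # 0.4
```

Because the harmonic mean is pulled towards the lower of the two values, perfect recall does not compensate for a precision of 0.25.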

SUMMARY

 Regular expressions play a surprisingly large role:

 Sophisticated sequences of regular expressions are often the
first model for any text processing task.

 For many hard tasks, we use machine learning classifiers.

 But regular expressions are used as features in the classifiers
or to preprocess the text.

 Can be very useful in capturing generalisations.

REGULAR EXPRESSIONS: REFERENCES

 Regular expressions with Python:
https://docs.python.org/3.7/howto/regex.html

 Testing regular expressions online:
https://regex101.com/

RESOURCES

 Jurafsky, Daniel, and James H. Martin. 2009. Speech and Language
Processing: An Introduction to Natural Language Processing, Speech
Recognition, and Computational Linguistics. 3rd edition. Chapters 1-2.

 Bird, Steven, Ewan Klein, and Edward Loper. Natural Language
Processing with Python. O’Reilly Media, Inc., 2009. Chapters 1-3.
