
Entity Linking and Relation Recognition
This time:
Entity Linking
— The challenge of entity linking
— Techniques for entity linking
Relation Recognition
— What is relation recognition?
— Identifying related entities
— Classifying relations
Data Science Group (Informatics) NLE/ANLP Autumn 2015 1 / 24

Information Extraction
Recall: IE is the task of extracting information from unstructured text:
Detect entities of interest
— e.g. companies, locations, products
Detect relations of interest between entities:
— COMPANY in LOCATION
— COMPANY sell PRODUCT
— COMPANY acquire COMPANY
— etc.

Determining the Identity of Entities
The task is called:
Named Entity Disambiguation
or
Entity Linking

The Problem
A problem instance consists of:
A knowledge base (KB) such as Wikipedia
An entity mention in a textual context
Goal:
return the canonical KB entry for the entity being mentioned
or
return NIL if the entity is not in the KB

Why This is Challenging
Absence from KB
It is not realistic to expect the KB to contain every entity that is mentioned
— open-class concepts are hard to keep complete and up to date
— e.g. lists of people being talked about

Why This is Challenging
Entity Ambiguity
— many different entities can be referred to by the same string
Manchester
— the city in England
— the town in Bolivia
— one of 32 towns in the USA
— the football club
— the University
— the Airport
— song by The Beautiful South

Why This is Challenging
Name Variations
— many different ways of referring to the same entity
— Manchester United Football Club
— MUFC
— Manchester United FC
— Manchester United
— Man United
— Man U
— The Reds
— Busby Babes
— Newton Heath LYR (Lancashire & Yorkshire Railway)
— The Heathens

Wikipedia as the KB
Many named entities have their own page in Wikipedia
The title of the page is a canonical way of naming the entity
The title of the Wikipedia page for the 42nd US president is
Bill Clinton
not
William Jefferson Blythe III or William Jefferson Clinton

Techniques for Entity Linking
Two phases
Find candidate KB entries for a given entity mention
Rank the candidates to find the most probable one
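The two phases can be sketched as one function that returns either a canonical title or NIL. This is a minimal sketch under stated assumptions, not the course's method: `link`, `generate` and `score` are hypothetical names, the toy KB is made up, and deciding NIL with a score threshold is an assumption (the slides do not say how NIL is chosen).

```python
from typing import Callable, Iterable, Optional

NIL = None  # stands in for "entity not in the KB"


def link(mention: str,
         context: str,
         generate: Callable[[str], Iterable[str]],
         score: Callable[[str, str, str], float],
         threshold: float = 0.0) -> Optional[str]:
    """Phase 1: generate candidate KB titles. Phase 2: rank them.

    Returns the best-scoring title, or NIL if there are no candidates or
    the best score does not exceed `threshold` (an assumed NIL rule).
    """
    candidates = list(generate(mention))                          # phase 1
    if not candidates:
        return NIL
    scored = [(score(mention, context, t), t) for t in candidates]  # phase 2
    best_score, best_title = max(scored, key=lambda pair: pair[0])
    return best_title if best_score > threshold else NIL


# Toy KB: title -> short page text (made-up data for illustration).
KB = {
    "Manchester": "city in north west England",
    "Manchester, New Hampshire": "city in New Hampshire USA",
}


def generate(mention: str) -> list:
    # Hypothetical generator: any title containing the mention string.
    return [t for t in KB if mention.lower() in t.lower()]


def score(mention: str, context: str, title: str) -> int:
    # Hypothetical scorer: number of context words that also occur on the page.
    return len(set(KB[title].lower().split()) & set(context.lower().split()))


print(link("Manchester", "a city in England", generate, score))  # Manchester
```

With the context "a city in England" the English page shares three words and the New Hampshire page two, so the English city wins; a mention with no candidates, or a context sharing no words with any candidate page, comes back as NIL.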

Generating Candidates
This phase addresses the name variation challenge
Need to find all potentially relevant candidates
Familiar tradeoff:
precision versus recall
Need good recall
— so that the correct entity is among candidates
but too many candidates can hurt precision (and efficiency)
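One way to watch this tradeoff on labelled data is to measure candidate recall (the fraction of linkable mentions whose correct entry is among the candidates) next to the average candidate-set size as a rough cost. A sketch only: `candidate_stats` and the toy generator are hypothetical, and mentions whose gold answer is NIL are assumed to count towards size but not recall.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def candidate_stats(examples: List[Tuple[str, Optional[str]]],
                    generate: Callable[[str], Iterable[str]]) -> Tuple[float, float]:
    """examples: (mention, gold KB title) pairs; a gold title of None means NIL.

    Returns (candidate recall over linkable mentions, mean candidate-set size).
    """
    if not examples:
        return 0.0, 0.0
    sizes, hits, linkable = [], 0, 0
    for mention, gold in examples:
        candidates = set(generate(mention))
        sizes.append(len(candidates))
        if gold is not None:
            linkable += 1
            if gold in candidates:
                hits += 1
    recall = hits / linkable if linkable else 0.0
    return recall, sum(sizes) / len(sizes)


# Toy candidate lists (made up): "Becks" gets no candidates at all.
TOY = {
    "Man U": ["Manchester United F.C."],
    "Manchester": ["Manchester", "Manchester Airport", "Manchester United F.C."],
}
examples = [("Man U", "Manchester United F.C."),
            ("Manchester", "Manchester"),
            ("Becks", "David Beckham")]
print(candidate_stats(examples, lambda m: TOY.get(m, [])))
```

Here recall is 2/3 (the alias "Becks" is missed) and the mean set size is 4/3; adding more generation strategies would tend to push both numbers up.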

Strategies for Generating Candidates
Remember, this is just generating candidates!
— typically makes limited use of the context
Mention is exact match with title of Wikipedia page
David Beckham
Mention is a proper substring of the page title, or vice versa
Beckham
Mention is an acronym of the page title
UoS
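The three strategies above can be sketched as simple string tests over a list of page titles. Assumptions, not from the slides: matching is case-insensitive, "substring" is checked on whole tokens (so "Man" does not match "Germany"), acronyms are only tried for multi-word titles, and the function names and titles are hypothetical.

```python
from typing import List


def _proper_sub_sequence(short: List[str], long: List[str]) -> bool:
    """True if `short` is a strictly shorter contiguous run of tokens in `long`."""
    n = len(short)
    return 0 < n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1))


def _acronym(title: str) -> str:
    """Initial letter of every word, e.g. 'University of Sussex' -> 'UoS'."""
    return "".join(word[0] for word in title.split())


def generate_candidates(mention: str, titles: List[str]) -> List[str]:
    m_tokens = mention.lower().split()
    candidates = []
    for title in titles:
        t_tokens = title.lower().split()
        exact = m_tokens == t_tokens
        substring = (_proper_sub_sequence(m_tokens, t_tokens)      # mention in title
                     or _proper_sub_sequence(t_tokens, m_tokens))  # title in mention
        acronym = (len(t_tokens) > 1
                   and mention.lower() == _acronym(title).lower())
        if exact or substring or acronym:
            candidates.append(title)
    return candidates


TITLES = ["David Beckham", "Victoria Beckham", "University of Sussex", "Sussex"]
print(generate_candidates("Beckham", TITLES))  # ['David Beckham', 'Victoria Beckham']
print(generate_candidates("UoS", TITLES))      # ['University of Sussex']
```

Note that "Beckham" returns two candidates: deciding between them is left to the ranking phase.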

Strategies for Generating Candidates
Mention is similar to the page title
— use a string similarity measure, e.g. Levenshtein distance
Mention is a known alias of the page title
— aliases can be extracted from Wikipedia redirects and disambiguation pages
— e.g. UK, Becks
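Below is a sketch of both strategies: the standard dynamic-programming Levenshtein distance, plus a lookup in an alias table. The cutoff of 2 edits is an arbitrary assumption, and the hand-written `ALIASES` dict is a hypothetical stand-in for a table mined from redirects and disambiguation pages.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning `a` into `b` (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def similar_or_alias(mention: str, titles: List[str],
                     aliases: Dict[str, List[str]], max_dist: int = 2) -> List[str]:
    m = mention.lower()
    found = [t for t in titles if levenshtein(m, t.lower()) <= max_dist]
    for title in aliases.get(m, []):      # known aliases, keyed by lower-cased mention
        if title not in found:
            found.append(title)
    return found


# Hand-written stand-in for an alias table mined from redirects.
ALIASES = {"uk": ["United Kingdom"], "becks": ["David Beckham"]}
TITLES = ["David Beckham", "United Kingdom", "Manchester"]
print(similar_or_alias("Manchestr", TITLES, ALIASES))  # ['Manchester']
print(similar_or_alias("Becks", TITLES, ALIASES))      # ['David Beckham']
```

Edit distance catches misspellings, but not aliases such as "Becks", which share few characters with the title; that is why the two strategies complement each other.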

Information for Ranking Candidates
Entity mention occurs within a context
Co-occurrence of entity mentions
— other named entities in the same document
Local context of an entity mention
— neighbouring words
Global context of an entity mention
— the document in which the entity mention occurs
— a bag-of-words for the document captures its topic
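The three kinds of context can be pulled out of a tokenised document roughly as follows. This is a sketch: token-offset spans, the window size and the function names are assumptions, and the entity spans are assumed to come from a named entity recogniser.

```python
from collections import Counter
from typing import List, Tuple

Span = Tuple[int, int]  # token offsets [start, end)


def local_context(tokens: List[str], span: Span, window: int = 3) -> List[str]:
    """Up to `window` tokens on each side of the mention."""
    start, end = span
    return tokens[max(0, start - window):start] + tokens[end:end + window]


def global_context(tokens: List[str]) -> Counter:
    """Bag-of-words (lower-cased token counts) for the whole document."""
    return Counter(t.lower() for t in tokens)


def co_occurring(tokens: List[str], entity_spans: List[Span], target: Span) -> List[str]:
    """Surface strings of the other named entity mentions in the document."""
    return [" ".join(tokens[s:e]) for s, e in entity_spans if (s, e) != target]


tokens = "Rooney scored twice as Manchester beat Chelsea at Old Trafford".split()
spans = [(0, 1), (4, 5), (6, 7), (8, 10)]        # assumed NE recogniser output
print(local_context(tokens, (4, 5), window=2))   # ['twice', 'as', 'beat', 'Chelsea']
print(co_occurring(tokens, spans, (4, 5)))       # ['Rooney', 'Chelsea', 'Old Trafford']
```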

Strategies for Ranking Candidates
Entity relatedness
Do the co-occurring entities also co-occur, with the same types, in KB pages?
Query relevance
Does a candidate KB page contain tokens from the local context of the entity mention?
Document similarity
Does a candidate KB page have high bag-of-words similarity to the document?
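Each question can be turned into a score per candidate. The particular measures here are assumptions chosen for illustration: Jaccard overlap for entity relatedness (ignoring entity types), the fraction of local-context tokens found on the page for query relevance, and cosine similarity of bag-of-words vectors for document similarity, summed with equal weights. The KB format (page tokens plus entities linked from the page) and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical KB format: title -> (page tokens, entities linked from the page).
KB = Dict[str, Tuple[List[str], List[str]]]


def bow(tokens: List[str]) -> Counter:
    return Counter(t.lower() for t in tokens)


def entity_relatedness(doc_entities: List[str], page_entities: List[str]) -> float:
    """Jaccard overlap between the document's other entities and the page's."""
    a, b = set(doc_entities), set(page_entities)
    return len(a & b) / len(a | b) if a | b else 0.0


def query_relevance(local_tokens: List[str], page_tokens: List[str]) -> float:
    """Fraction of local-context tokens that occur on the candidate page."""
    if not local_tokens:
        return 0.0
    page = {t.lower() for t in page_tokens}
    return sum(t.lower() in page for t in local_tokens) / len(local_tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank(candidates: List[str], local_tokens: List[str], doc_tokens: List[str],
         doc_entities: List[str], kb: KB) -> List[str]:
    doc_bow = bow(doc_tokens)

    def total(title: str) -> float:
        page_tokens, page_entities = kb[title]
        return (entity_relatedness(doc_entities, page_entities)
                + query_relevance(local_tokens, page_tokens)
                + cosine(doc_bow, bow(page_tokens)))   # equal weights: an assumption

    return sorted(candidates, key=total, reverse=True)


toy_kb: KB = {
    "Manchester United F.C.": ("football club based at Old Trafford".split(),
                               ["Old Trafford", "Premier League"]),
    "Manchester": ("city in north west England".split(),
                   ["England", "Greater Manchester"]),
}
doc = "Rooney scored twice as Manchester beat Chelsea at Old Trafford".split()
print(rank(["Manchester", "Manchester United F.C."],
           ["twice", "as", "beat", "Chelsea"], doc,
           ["Rooney", "Chelsea", "Old Trafford"], toy_kb))
```

In this toy document the football club wins on both entity relatedness (shared "Old Trafford") and document similarity, even though the local window alone is uninformative. In practice the weights would normally be tuned or learned rather than fixed.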

Relation Extraction
Discovering relationships between entities
In their largest acquisition to date, Google has acquired YouTube for $1.65 billion in an all stock transaction.
<entity>    <relationship>    <entity>
Google      acquire           YouTube
COMPANY     acquire           COMPANY
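An extracted relation can be stored as a typed triple, from which the class-level pattern (COMPANY acquire COMPANY) falls out directly. A minimal sketch; the class and field names are hypothetical.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    text: str       # surface string, e.g. "Google"
    ne_class: str   # named entity class, e.g. "COMPANY"


class Relation(NamedTuple):
    arg1: Entity
    relation: str
    arg2: Entity

    def signature(self) -> str:
        """Class-level view of the relation, e.g. 'COMPANY acquire COMPANY'."""
        return f"{self.arg1.ne_class} {self.relation} {self.arg2.ne_class}"


r = Relation(Entity("Google", "COMPANY"), "acquire", Entity("YouTube", "COMPANY"))
print(r.arg1.text, r.relation, r.arg2.text)  # Google acquire YouTube
print(r.signature())                          # COMPANY acquire COMPANY
```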

Binary Relations
Relation extraction is typically concerned with binary relationships
Named entity recognition can be seen as the unary variant of this task
— entities belong to a specified class
Binary relations are fundamental to meaning

Relation Granularity
Recall that named entities are assigned to classes: PERSON, PLACE, COMPANY, etc.
Relation types can also be organised into classes
X acquired Y =⇒ Y PART-OF X
X married to Y =⇒ X AFFILIATED-WITH Y
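The two mappings above amount to a lookup from a fine-grained relation to a coarse class, together with a note on whether the arguments swap (Y PART-OF X reverses the order). A sketch; the table holds only the two slide examples, and the names are hypothetical.

```python
from typing import Dict, Tuple

# Fine-grained relation -> (coarse class, swap arguments?), following the
# two examples on this slide; a real inventory would be much larger.
COARSE: Dict[str, Tuple[str, bool]] = {
    "acquired": ("PART-OF", True),             # X acquired Y   => Y PART-OF X
    "married to": ("AFFILIATED-WITH", False),  # X married to Y => X AFFILIATED-WITH Y
}


def to_coarse(x: str, fine: str, y: str) -> Tuple[str, str, str]:
    coarse, swap = COARSE[fine]
    return (y, coarse, x) if swap else (x, coarse, y)


print(to_coarse("Google", "acquired", "YouTube"))  # ('YouTube', 'PART-OF', 'Google')
```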

Supervised Approaches
Two phases:
Phase 1: Extract a pair of entities that are related in some way
Phase 2: Categorise the relationship that holds between the entities
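The two phases compose as a filter followed by a labeller over entity pairs. This sketch assumes the trained classifiers are passed in as functions; the toy rules below are hypothetical stand-ins, not trained models.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def extract_relations(entities: List[str],
                      is_related: Callable[[str, str], bool],
                      classify: Callable[[str, str], str]) -> List[Tuple[str, str, str]]:
    """Phase 1 filters entity pairs; phase 2 labels the pairs that survive."""
    triples = []
    for e1, e2 in combinations(entities, 2):
        if is_related(e1, e2):                          # phase 1: binary decision
            triples.append((e1, classify(e1, e2), e2))  # phase 2: relation class
    return triples


# Stand-ins for trained classifiers (hypothetical rules, for illustration only).
def toy_related(a: str, b: str) -> bool:
    return {a, b} == {"Google", "YouTube"}


def toy_label(a: str, b: str) -> str:
    return "acquire"


print(extract_relations(["Google", "YouTube", "California"], toy_related, toy_label))
# [('Google', 'acquire', 'YouTube')]
```

Splitting the work this way lets the multiclass classifier see only pairs that phase 1 judged related.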

Extracting Related Entity Pairs
Needs a binary classifier
Are entities e1 and e2 related in this text?
Classifier trained on positive and negative examples
— positive examples are given in the labelled training data
— negative examples are pairs of entities found in the training data that are not labelled as being related
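Building the training set therefore means enumerating entity pairs and labelling each one. In this sketch, identifying entities by their surface string and pairing every entity with every other one in the same text are simplifying assumptions; `make_pair_examples` is a hypothetical name.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def make_pair_examples(entities: List[str],
                       related: Set[FrozenSet[str]]) -> List[Tuple[str, str, bool]]:
    """One example per unordered entity pair: True if labelled as related
    in the training data, False otherwise (the implicit negatives)."""
    return [(e1, e2, frozenset((e1, e2)) in related)
            for e1, e2 in combinations(entities, 2)]


gold = {frozenset(("Google", "YouTube"))}
for example in make_pair_examples(["Google", "YouTube", "California"], gold):
    print(example)
# ('Google', 'YouTube', True)
# ('Google', 'California', False)
# ('YouTube', 'California', False)
```

Since the number of pairs grows quadratically with the number of entities, negatives usually far outnumber positives.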

Classifying Relations
Needs a multiclass classifier, e.g. Naïve Bayes
Features used for classification:
— class of each of the two target named entities
— tokens appearing in the named entity mentions
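A from-scratch multinomial Naïve Bayes over the two feature types on this slide might look like this. Assumptions: add-one smoothing, features encoded as strings tagged by argument slot (`C1=`, `W2=` and so on), and a tiny made-up training set that reuses relation names from earlier slides.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class NaiveBayes:
    """Multinomial Naive Bayes over string features with add-one smoothing."""

    def fit(self, X: List[List[str]], y: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(y)
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(X, y):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats: List[str]) -> str:
        n_examples = sum(self.class_counts.values())
        v = len(self.vocab)

        def log_score(label: str) -> float:  # unnormalised log posterior
            counts = self.feature_counts[label]
            n_label = sum(counts.values())
            score = math.log(self.class_counts[label] / n_examples)  # log prior
            for f in feats:                                          # log likelihoods
                score += math.log((counts[f] + 1) / (n_label + v))
            return score

        return max(self.class_counts, key=log_score)


def features(class1: str, mention1: str, class2: str, mention2: str) -> List[str]:
    """Entity classes plus the tokens of each mention, tagged by argument slot."""
    return ([f"C1={class1}", f"C2={class2}"]
            + [f"W1={w.lower()}" for w in mention1.split()]
            + [f"W2={w.lower()}" for w in mention2.split()])


X = [features("COMPANY", "Google", "COMPANY", "YouTube"),
     features("COMPANY", "Microsoft", "COMPANY", "LinkedIn"),
     features("PERSON", "Bill Clinton", "PERSON", "Hillary Clinton"),
     features("COMPANY", "Google", "LOCATION", "Mountain View")]
y = ["acquire", "acquire", "married to", "in"]
nb = NaiveBayes().fit(X, y)
print(nb.predict(features("COMPANY", "Facebook", "COMPANY", "Instagram")))  # acquire
```

With unseen mention tokens, the decision here rests almost entirely on the entity-class features and the class prior.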

Classifying Relations
More features used for classification:
Bag-of-words between entity mentions
Distance between entity mentions
Number of other named entity mentions between target named entities
Features of the syntactic structure
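The first three of these features only need token offsets. A sketch, assuming mention spans and the document's other entity spans are available as token offsets; syntactic features are left out, and `between_features` is a hypothetical name.

```python
from typing import Dict, List, Tuple, Union

Span = Tuple[int, int]  # token offsets [start, end)


def between_features(tokens: List[str], span1: Span, span2: Span,
                     entity_spans: List[Span]) -> Dict[str, Union[int, List[str]]]:
    (_, end1), (start2, _) = sorted([span1, span2])  # order the two mentions
    between = tokens[end1:start2]
    n_entities = sum(1 for s, e in entity_spans if s >= end1 and e <= start2)
    return {
        "bow_between": sorted({t.lower() for t in between}),
        "distance": len(between),          # tokens separating the mentions
        "entities_between": n_entities,    # other NE mentions in that gap
    }


tokens = ["Google", ",", "not", "Yahoo", ",", "acquired", "YouTube"]
spans = [(0, 1), (3, 4), (6, 7)]  # Google, Yahoo, YouTube
print(between_features(tokens, (0, 1), (6, 7), spans))
```

For Google and YouTube this gives distance 5, one intervening entity (Yahoo), and the words between them, including "acquired".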

Syntactic Paths
Using features for relation identification
Using syntactic paths as features
… YouTube, a subsidiary of Google …
How are YouTube and Google related in the syntax?
This can be captured with a syntactic path

Syntactic Paths: Example
Parse tree (bracketed notation):
(NP (NP (NNP YouTube)) (PUNC ,) (NP (NP (DT a) (NN subsidiary)) (PP (IN of) (NP (NNP Google)))))
Path from YouTube to Google: NP ↑ NP ↓ NP ↓ PP ↓ NP
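A path like this can be computed by climbing from the first node to the lowest common ancestor and then descending to the second. In this sketch the tree is built by hand from the example (a real system would take it from a parser), and the `Node` class and function names are hypothetical.

```python
from typing import List, Optional, Sequence


class Node:
    def __init__(self, label: str, children: Sequence["Node"] = ()):
        self.label = label
        self.children = list(children)
        self.parent: Optional["Node"] = None
        for child in self.children:
            child.parent = self


def ancestors(node: Optional[Node]) -> List[Node]:
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain  # node, parent, ..., root


def syntactic_path(a: Node, b: Node) -> str:
    up = ancestors(a)
    down = ancestors(b)
    lca = next(n for n in up if n in down)    # lowest common ancestor
    rising = up[:up.index(lca) + 1]           # a ... lca
    falling = down[:down.index(lca)][::-1]    # child of lca ... b
    path = rising[0].label
    for n in rising[1:]:
        path += " ↑ " + n.label
    for n in falling:
        path += " ↓ " + n.label
    return path


youtube_np = Node("NP", [Node("NNP", [Node("YouTube")])])
google_np = Node("NP", [Node("NNP", [Node("Google")])])
tree = Node("NP", [
    youtube_np,
    Node("PUNC", [Node(",")]),
    Node("NP", [
        Node("NP", [Node("DT", [Node("a")]), Node("NN", [Node("subsidiary")])]),
        Node("PP", [Node("IN", [Node("of")]), google_np]),
    ]),
])
print(syntactic_path(youtube_np, google_np))  # NP ↑ NP ↓ NP ↓ PP ↓ NP
```

The resulting string can then be used as a single categorical feature for the relation classifier.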

Next Topic: Information Retrieval
What is information retrieval?
Boolean retrieval
Indexing documents
Retrieval with an inverted index