COMP90042
Workshop
Week 10
25 May
Table of Contents
1. Machine Translation
2. Information Extraction
1. Machine Translation
MT approaches:
Statistical MT
– Pre-deep-learning era
– Complex, with a lot of feature engineering
– Word-based approach
– Phrase-based approach
Neural MT
– Cleaner, less feature engineering
– Encoder-Decoder framework
– End-to-end
– Uses an RNN / Transformer
Statistical MT:
– P(e|f): given a French sentence f, the aim is to find the best English sentence e
– Complex, largely because word alignment between the two languages is difficult
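Spelled out, this is the standard noisy-channel formulation (a background fact, not something specific to these slides): the translation model P(f|e) is learned from aligned bi-texts, and P(e) is an English language model.

$$\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} P(f \mid e)\, P(e)$$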
Neural MT:
– Encoder-Decoder
1. Machine Translation (Encoder-Decoder)
[Diagram: an RNN Encoder reads the English sentence "I come from Melbourne" and compresses it into a sentence vector; an RNN Decoder then generates the German translation "Ich komme aus Melbourne". With Attention, the sentence vector is a weighted average of the encoder states.]
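A minimal sketch of this architecture in PyTorch, just for orientation (the class, names and hyperparameters below are assumptions, not the workshop notebook's code):

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: encode source tokens into a sentence
    vector, then condition a decoder RNN on it to generate the target."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # Encoder: the final hidden state acts as the "sentence vector".
        _, sentence_vec = self.encoder(self.src_emb(src))
        # Decoder: initialised with the sentence vector; reads the
        # (shifted) target sequence and predicts the next word each step.
        dec_states, _ = self.decoder(self.tgt_emb(tgt), sentence_vec)
        return self.out(dec_states)  # logits over the target vocabulary
```

With attention, `sentence_vec` would instead be recomputed at every decoder step as a weighted average of all encoder states.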
1. Machine Translation (Generation)
[Diagram: step-by-step generation. At each step the RNN Decoder combines the previous decoder state, the previously generated word ("Ich" —> "komme" —> "aus"), and the encoder output (sentence vector) to predict the next word.]
Note: Sentence vector is dynamic if you use Attention!
1. Machine Translation (Generation)
[Diagram: generation continues ("aus" —> "Melbourne" —> stop); the decoder emits a stop token to end generation.]
Note: Sentence vector is dynamic if you use Attention!
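The generation loop written as code: a hypothetical greedy decoder for the Seq2Seq sketch above (the start/stop token ids and the length cap are assumptions):

```python
import torch

def greedy_decode(model, src, start_id, stop_id, max_len=50):
    """Generate target tokens one at a time, feeding each prediction
    back in as the next input, until the stop token is emitted."""
    _, state = model.encoder(model.src_emb(src))  # sentence vector
    word = torch.tensor([[start_id]])
    output = []
    for _ in range(max_len):
        # One step: previous word + previous decoder state -> next word.
        dec_out, state = model.decoder(model.tgt_emb(word), state)
        word = model.out(dec_out).argmax(-1)  # most probable next word
        if word.item() == stop_id:
            break
        output.append(word.item())
    return output
```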
1. Machine Translation
Q1
What aspects of human language make automatic translation difficult?
Lexical complexity
Morphology (e.g. English – Turkish)
Syntax (e.g. English – Japanese)
Semantics
All levels of linguistics play a part: lexical complexity, morphology, syntax, semantics, etc. Translation is especially hard when the two languages have very different word forms (e.g., consider translating from a morphologically light language like English into Turkish, which has very complex morphology) or very different syntax, leading to different word orders.
These differences raise difficult learning problems for a translation system, which needs to capture them in order to learn translations from bi-texts and produce them for test examples.
Table of Contents
1. Machine Translation
2. Information Extraction
2. Information Extraction
Q2
What is Information Extraction?
From (Unstructured) Text to Structured Data.
Example:
The University of Tokyo, abbreviated as Todai, is a public research university located in Bunkyo, Tokyo, Japan.
Result:
Capital(Japan, Tokyo)
Location(The University of Tokyo, Japan)
Location(Todai, Japan)
2. Information Extraction
How to perform IE?
Step 1: Named Entity Recognition
– Find the entities: "Tokyo", "The University of Tokyo", "Japan"
– Sequence models: RNN, HMM, CRF
Step 2: Relation Extraction
– Find relations between two entities, e.g. "Tokyo" vs "Japan"
– Mostly classifiers
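A quick illustration of step 1 with an off-the-shelf NER model (spaCy; assumes the en_core_web_sm model is installed, and the exact labels depend on the model):

```python
import spacy

# Off-the-shelf NER: spaCy tags spans with entity types (ORG, GPE, ...).
nlp = spacy.load("en_core_web_sm")
doc = nlp("The University of Tokyo, abbreviated as Todai, is a public "
          "research university located in Bunkyo, Tokyo, Japan.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# Typically yields ORG for "The University of Tokyo" and GPE for
# "Tokyo" / "Japan"; step 2 would then classify the relation between
# each pair of entities.
```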
2. Information Extraction
Q2A
What is NER? Why is it difficult?
NER = Named Entity Recognition
Mostly proper nouns —> people, places, things
Ambiguous:
– Person name vs location —> "Philip" in "Philip is on Phillip Island"
– Organisation vs location —> "New York Times" vs "New York"
Fast but not very effective solution: a gazetteer – a (somewhat) exhaustive list of names of places (see the toy sketch below).
Similar to POS tagging, but with different tags and a different purpose.
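A toy gazetteer matcher (the list and helper are hypothetical), just to show why the approach falls short:

```python
# Toy gazetteer: a fixed set of place names (hypothetical entries).
GAZETTEER = {"Phillip Island", "New York", "Tokyo", "Melbourne"}

def tag_places(tokens, max_span=3):
    """Greedily match token spans against the gazetteer."""
    tags, i = [], 0
    while i < len(tokens):
        for n in range(max_span, 0, -1):  # prefer longer matches
            span = " ".join(tokens[i:i + n])
            if span in GAZETTEER:
                tags.append((span, "LOC"))
                i += n
                break
        else:
            i += 1
    return tags

# Exact matches work, but unseen names, spelling variation, and
# ambiguity ("Philip" the person vs "Phillip Island") are not handled.
print(tag_places("Philip is on Phillip Island".split()))
# [('Phillip Island', 'LOC')]
```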
2. Information Extraction
Q2B
What is the IOB trick in sequence labelling? Why is it important?
Motivation: named entities can consist of more than one token
Ex: [The University of Melbourne] (a 4-word entity) is the best Australian University.
We indicate whether a given token is Beginning a named entity, Inside a named entity, or Outside any named entity.
The-B-ORG University-I-ORG of-I-ORG Melbourne-I-ORG is-O the-O best-O Australian-O University-O .-O
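A small sketch of producing IOB tags from entity spans (the helper and span format are hypothetical):

```python
def to_iob(tokens, spans):
    """Convert (start, end, type) entity spans into per-token IOB tags.
    spans: list of (start_idx, end_idx_exclusive, entity_type)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"       # first token: Beginning
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"       # rest of the span: Inside
    return list(zip(tokens, tags))

tokens = "The University of Melbourne is the best Australian University .".split()
print(to_iob(tokens, [(0, 4, "ORG")]))
# [('The', 'B-ORG'), ('University', 'I-ORG'), ('of', 'I-ORG'),
#  ('Melbourne', 'I-ORG'), ('is', 'O'), ...]
```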
2. Information Extraction
Q2C
What is Relation Extraction?
Attempt to find relationships between entities in a text
Ex: Harry Potter vs J.K. Rowling —> relation: Author
It is done after obtaining the entities (the NER tags)
2. Information Extraction
Q2D
Why are hand-written patterns generally inadequate for IE, and what other approaches can we take?
Why?
– Too many different ways of expressing the same information.
Other approaches:
– Parsing the sentence might lead to a more systematic method of deriving all of the relations, but language variation means it is still quite difficult.
– Frame the problem as supervised machine learning, with general features (like POS tags, NE tags, etc.); see the sketch after this list.
– Bootstrapping patterns: using known relations to derive sentence structures that describe the relationship.
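A minimal sketch of the supervised framing in scikit-learn (the features and tiny training set are made up, just to show the shape of the approach):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each example: features of one entity pair in a sentence (hypothetical).
train_X = [
    {"e1_type": "PER", "e2_type": "WORK", "between_words": "wrote"},
    {"e1_type": "ORG", "e2_type": "GPE", "between_words": "located in"},
]
train_y = ["Author", "Location"]

# One-hot encode the feature dicts, then fit a classifier over relations.
clf = make_pipeline(DictVectorizer(), LogisticRegression())
clf.fit(train_X, train_y)

test = {"e1_type": "PER", "e2_type": "WORK", "between_words": "wrote"}
print(clf.predict([test])[0])  # -> "Author" (on this toy data)
```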
iPython notebook: 11-machine-translation
What's inside?
– Encoder-Decoder for machine translation (character-level)
– Use Colab (it's faster)
To do?
– Modify it to use a GRU
– Modify it to do translation at the word level (see the vocab sketch below):
   French & English tokenisers (spaCy)
   Create the vocab
   Replace low-frequency words with UNK
   Change the corpus-reading function
   Update training and inference to use the vocab to look up words
– Apply the attention mechanism (if you have time)
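A sketch of the vocab step (the tokeniser setup, threshold and special-token names are assumptions; the French side is analogous):

```python
from collections import Counter
import spacy

# Tokenise with spaCy; a blank pipeline gives just the tokeniser.
nlp = spacy.blank("en")
sentences = ["I come from Melbourne", "I come from Tokyo"]
tokens = [[t.text.lower() for t in nlp(s)] for s in sentences]

# Build the vocab, keeping only words above a frequency threshold.
counts = Counter(tok for sent in tokens for tok in sent)
MIN_FREQ = 2  # hypothetical threshold
vocab = {"<unk>": 0, "<start>": 1, "<stop>": 2}
for tok, c in counts.items():
    if c >= MIN_FREQ:
        vocab[tok] = len(vocab)

def encode(sent):
    """Map tokens to ids, falling back to <unk> for rare words."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in sent]

print(encode(tokens[0]))  # "melbourne" (frequency 1) maps to <unk>
```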
Next