Interlingua MT: Translation of Numbers

Topics:
Number systems
Grammar for numbers
Parsing
Semantic processing
Generation

MT “pyramid” (revisited)
Source language
Target language
Interlingua: no transfer process needed
Transfer: deeper rep.
Transfer: semantic rep.
Transfer: functional structure
Transfer: phrase structure
Direct translation: word-for-word translation

Interlingua MT
Interlingua
Language1
Language2
Language3
Others …….
Advantage of interlingua: adding a new language needs only one more language pair: new language ↔ Interlingua

What is interlingua?
An interlingua is supposed to be a universal representation for … what?
Meaning, of course, but what is meaning?
Lacking a clear definition of “meaning”, we may describe an interlingua as a universal representation for what can be conveyed through human language communication.
Question:
What can be conveyed by our languages?

How to design an interlingua?
Any clear idea about it? No.
What we can be sure of is its universality and versatility.

Think about the following:
ontology of human knowledge
conceptions of what we know and can express via speech
ontology of objects in the world and in our languages
ontology of events
ontology of words, etc.

Any example to help us understand it better?

Interlingua MT for numbers

Interlingua: Values
English numbers
Chinese numbers
Others …….
Arabic numbers: used as universal representation for values

Number systems
Decimal numbers
Arabic numbers: yes
Chinese numbers: ?
• <= 10,000, yes
• > 10,000, still?
English numbers: ?
• <= 1,000, yes
• > 1,000, still?
Basically yes, but with quite some variation!

What is the difference?
The distinction between the two can be exemplified by the difficulties in converting or translating between them.

What defines a number system?
Base
the set of digits (or base symbols) used
the cardinality of the digit set (i.e., the number of digits)
decimal numbers: base 10
digits: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
each digit has its own digit value.
Position
the place where a digit shows up, e.g., in 2 3 3 8 8 the positions are 4 3 2 1 0
each position has its position value: |Base|^Pos

What value does a digit represent?
Digits:    2 3 3 8 8
Positions: 4 3 2 1 0
Value of a digit = digit value × position value = Digit × |Base|^Pos
2×10^4, 3×10^3, 3×10^2, 8×10^1, 8×10^0
A digit represents a different value when it shows up in a different position.

What is the value of a number?
A number’s value = the sum of all its digits’ values. E.g.,

23,388 = 2×10^4 + 3×10^3 + 3×10^2 + 8×10^1 + 8×10^0 = 23,388

Hey! So trivial!
What kind of game are you playing?

Let us play with binary numbers
Base 2
Digits = {0, 1} (i.e., only 0 and 1 appear in a number)
11111_2 = 1×2^4 + 1×2^3 + 1×2^2 + 1×2^1 + 1×2^0 = 31
Still trivial?
All computers play such a game.

How about numbers in other bases?

Octal numbers
Base 8
Digits = {0, 1, 2, 3, 4, 5, 6, 7}

Numbers:
0, 1, 2, 3, 4, 5, 6, 7,
10, 11, 12, 13, 14, 15, 16, 17,
20, 21, 22, 23, 24, 25, 26, 27,
30, 31, 32, 33, 34, 35, 36, 37, ……
• 357_8 = ?
= 3×8^2 + 5×8^1 + 7×8^0 = 239_10

Hexadecimal numbers
Base 16
Digits = {0, 1, 2, …, 9, A, B, C, D, E, F}

Numbers:
0, 1, 2, … 9, A, B, C, D, E, F
10, 11, 12, … 19, 1A, 1B, 1C, 1D, 1E, 1F
20, 21, 22, … 29, 2A, 2B, 2C, 2D, 2E, 2F
……
• 357_16 = ?
= 3×16^2 + 5×16^1 + 7×16^0 = 855_10
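
The evaluation rule used in the decimal, binary, octal and hexadecimal examples (digit value × base^position, summed over all positions) can be sketched in Python. The function name `positional_value` and the symbol table are illustrative assumptions, not part of the lecture.

```python
def positional_value(digits: str, base: int) -> int:
    """Sum of digit_value * base**position, positions counted from the right, starting at 0."""
    symbols = "0123456789ABCDEF"  # assumed digit symbols, enough for bases 2..16
    total = 0
    for pos, ch in enumerate(reversed(digits.upper())):
        digit_value = symbols.index(ch)  # raises ValueError for an unknown symbol
        if digit_value >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total += digit_value * base ** pos
    return total


print(positional_value("23388", 10))  # 23388
print(positional_value("11111", 2))   # 31
print(positional_value("357", 8))     # 239
print(positional_value("357", 16))    # 855
```

The same loop works for every base; only the digit set and the position value change.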

Chinese numbers
Base 10, basically

• Digits = {零, 一, 二, 三, 四, 五, 六, 七, 八, 九}
Another set of digits: {壹, 貳, 叁, …, 玖}
Position
Positions in Chinese numbers are explicitly expressed
• Positions: {個}, 十/拾, 百/佰, 千/仟, 万/萬, 亿/億, 兆
• Position values: 1, 10, 10^2, 10^3, 10^4, 10^8, 10^12
• E.g.,
五 千 六 百 七 十 八
= 5×10^3 + 6×10^2 + 7×10^1 + 8×10^0
= 5,678_10

A grammar for Chinese numbers
G –> Digits

S –> {G} 十 {G}

B –> G 百
B –> G 百 S
B –> G 百 零 G

Q –> G 千
Q –> G 千 B
Q –> G 千 零 S
Q –> G 千 零 G

W –> Q/B/S/G 萬
W –> Q/B/S/G 萬 Q
W –> Q/B/S/G 萬 零 B
W –> Q/B/S/G 萬 零 S
W –> Q/B/S/G 萬 零 G

零 in these rules: conjunction, not zero!
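
A minimal sketch of how the rules for G, S, B and Q could drive a recursive-descent analysis that also returns the value of the phrase. All function names are hypothetical, G is restricted here to 一..九, and only the position characters 十, 百, 千 are recognised; these are simplifying assumptions, not the lecture's implementation.

```python
# Recursive-descent analyser for Chinese numbers below 10**4 (sketch).
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9}


def parse_g(s: str) -> int:
    """G –> digit"""
    if s in DIGITS:
        return DIGITS[s]
    raise ValueError(f"not a digit: {s!r}")


def parse_s(s: str) -> int:
    """S –> {G} 十 {G}"""
    head, unit, tail = s.partition("十")
    if not unit:
        raise ValueError(f"missing 十 in {s!r}")
    return (parse_g(head) if head else 1) * 10 + (parse_g(tail) if tail else 0)


def parse_b(s: str) -> int:
    """B –> G 百 | G 百 S | G 百 零 G"""
    head, unit, tail = s.partition("百")
    if not unit:
        raise ValueError(f"missing 百 in {s!r}")
    value = parse_g(head) * 100
    if not tail:
        return value
    if tail.startswith("零"):          # 零 is a conjunction here, not a digit
        return value + parse_g(tail[1:])
    return value + parse_s(tail)


def parse_q(s: str) -> int:
    """Q –> G 千 | G 千 B | G 千 零 S | G 千 零 G"""
    head, unit, tail = s.partition("千")
    if not unit:
        raise ValueError(f"missing 千 in {s!r}")
    value = parse_g(head) * 1000
    if not tail:
        return value
    if tail.startswith("零"):
        rest = tail[1:]
        return value + (parse_s(rest) if "十" in rest else parse_g(rest))
    return value + parse_b(tail)


def parse_number(s: str) -> int:
    """Try Q, B, S, G in that order, chosen by the highest position present."""
    if "千" in s:
        return parse_q(s)
    if "百" in s:
        return parse_b(s)
    if "十" in s:
        return parse_s(s)
    return parse_g(s)


print(parse_number("五千六百七十八"))  # 5678
print(parse_number("五千零八"))        # 5008
```

An input such as 五千八 is rejected with a `ValueError`, because the grammar requires the conjunction 零 when a position is skipped.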

Large numbers in Chinese
W –> Q/B/S/G 萬
W –> Q/B/S/G 萬 Q
W –> Q/B/S/G 萬 零 B
W –> Q/B/S/G 萬 零 S
W –> Q/B/S/G 萬 零 G

Y –> Q/B/S/G 億
Y –> Q/B/S/G 億 W
Y –> Q/B/S/G 億 零 W
Y –> Q/B/S/G 億 零 Q
Y –> Q/B/S/G 億 零 B
Y –> Q/B/S/G 億 零 G

Z –> Q/B/S/G 兆
Z –> Q/B/S/G 兆 Y
Z –> Q/B/S/G 兆 零 Y
Z –> Q/B/S/G 兆 零 W
Z –> Q/B/S/G 兆 零 Q
Z –> Q/B/S/G 兆 零 B
Z –> Q/B/S/G 兆 零 G

Problem:
Ambiguity in analysis

Solution
W –> B/S/G 萬
W –> B/S/G 萬 Q
W –> B/S/G 萬 零 B
W –> B/S/G 萬 零 S
W –> B/S/G 萬 零 G

WQ –> Q 萬
WQ –> Q 萬 Q
WQ –> Q 萬 零 B
WQ –> Q 萬 零 S
WQ –> Q 萬 零 G

Y –> B/S/G 億
Y –> B/S/G 億 WQ
Y –> B/S/G 億 零 W
Y –> B/S/G 億 零 Q
Y –> B/S/G 億 零 B
Y –> B/S/G 億 零 G

YQ –> Q 億
YQ –> Q 億 WQ
YQ –> Q 億 零 W
YQ –> Q 億 零 Q
YQ –> Q 億 零 B
YQ –> Q 億 零 G

Z –> Q/B/S/G 兆
Z –> Q/B/S/G 兆 YQ
Z –> Q/B/S/G 兆 零 Y
Z –> Q/B/S/G 兆 零 W
Z –> Q/B/S/G 兆 零 Q
Z –> Q/B/S/G 兆 零 B
Z –> Q/B/S/G 兆 零 G

Chinese numbers => values
Two steps:
Syntactic analysis
Parsing: to derive a syntactic tree (called a parse tree) for an input sentence / number.
Result: a phrase structure tree.
Semantic interpretation
To convert the parse tree into a semantic / meaning representation, namely, a value.

Semantic rules for interpretation
We need to define a semantic rule for each grammar rule to specify how a phrase structure under the grammar rule is interpreted into a meaning representation, i.e., how to convert a syntactic structure into meaning.

Z –> Q/B/S/G 兆 零 Y
sem(Z) = sem(Q/B/S/G 兆 零 Y) = sem(Q/B/S/G) × sem(兆) + sem(Y)

Example: parsing
[Figure: parse tree with nodes Z, Q, G, Y, B, G, annotated with the × and + operations of the semantic rules]

Example: semantic interpretation
G = 3, 千 = 10^3, so Q = 3×10^3
兆 = 10^12, so Q 兆 = 3×10^15
G = 6, 百 = 10^2, so B = 6×10^2
億 = 10^8, so Y = 6×10^10
Z = 3×10^15 + 6×10^10 = 3,000,060,000,000,000
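
The semantic rule sem(X unit 零 Y) = sem(X) × sem(unit) + sem(Y) can be sketched as a small recursive function. This is only an illustration under stated assumptions: the names `sem` and `sem_small` are hypothetical, only traditional position characters are handled, and the sketch evaluates without checking the input against the grammar.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = [("兆", 10**12), ("億", 10**8), ("萬", 10**4)]  # largest first


def sem_small(phrase: str) -> int:
    """Value of a Q/B/S/G phrase (< 10**4): sum of digit x position value."""
    total, digit = 0, 0
    for ch in phrase:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            total += (digit or 1) * SMALL_UNITS[ch]  # a bare 十 counts as 1 x 10
            digit = 0
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return total + digit


def sem(phrase: str) -> int:
    """sem(X unit rest) = sem(X) * sem(unit) + sem(rest); 零 adds no value."""
    phrase = phrase.lstrip("零")  # the conjunction contributes nothing
    for unit, unit_value in BIG_UNITS:
        head, found, tail = phrase.partition(unit)
        if found:
            return sem_small(head) * unit_value + sem(tail)
    return sem_small(phrase)


print(sem("三千兆零六百億"))  # 3000060000000000
```

Splitting on the largest position first mirrors the top-down order of the Z, Y and W rules.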

Generation (i): Head
Given an Arabic number, generate its Chinese counterpart

Format: N = head * pos + tail
Denoted as: head(N, pos) and tail(N, pos), respectively

Given an input number X, how to generate it? Heads and then tails

head(X, 兆|亿|萬) = X / (10^12 | 10^8 | 10^4)   (integer division!)
tail(X, 兆|亿|萬) = X % (10^12 | 10^8 | 10^4)   (remainder!)
Generate its head
gen(head(X, 兆|亿|萬)) → a Q-number < 10^4
Generate its tail
gen(tail(X, 兆|亿|萬)) → a number < 10^12 | 10^8 | 10^4

Generation (ii): Tail < 10^4
Generating a Q-number X < 10^4
head(X, 千/百/十) = X / (10^3 | 10^2 | 10^1)
tail(X, 千/百/十) = X % (10^3 | 10^2 | 10^1)
Generate its head
gen(head(X, 千/百/十)) → a Q-number < 10
Generate its tail
gen(tail(X, 千/百/十)) → a number < 10^3 | 10^2 | 10^1

Generation: example
1. X = 123,456,789,123,456,789
gen(X) = gen(head(X, 兆)) 兆 gen(tail(X, 兆))
= gen(123,456) 兆 gen(789,123,456,789)
2. X = 123,456
gen(X) = gen(head(X, 萬)) 萬 gen(tail(X, 萬))
= gen(12) 萬 gen(3,456)
3. X = 12
gen(X) = gen(head(X, 十)) 十 gen(tail(X, 十))
= gen(1) 十 gen(2)
4. gen(1) = 一
gen(2) = 二

Example: generation of conjunction
gen(3,000,060,000,000,000): head = 3000, tail = 60,000,000,000
gen(3000): head = 3, tail = 0 → 三千
gen(60,000,000,000): head = 600, tail = 0 → 六百億
gen(600): head = 6 → 六, tail = 0
With the conjunction: 三千 兆 零六百億 = 三千兆零六百億
For Chinese, any time the tail is less than 1/10 of pos, insert the conjunction 零 into the output.

English part?
Interlingua: values
English numbers
Chinese numbers
Others …….
Arabic numbers
???

Grammar for English numbers
Exercise
Design a grammar for English numbers, covering the range [0, 1,000,000,000,000,000-1], and use it to analyse the English number for 123,456,789,123 (or 123,456,789,123,456).
Design the generation procedure for English numbers and illustrate how it works for a real English number, e.g., 123,456.

Hints
In the lecture on interlingua, the following was given as the starting point for your design of the grammar for English numbers for HW2:
D0 –> {zero}
D –> {one, two, …, nine}
D’ –> {ten, eleven, …, nineteen}
T –> {twenty, …, ninety}
and then five rules: H- –> D0 | D | D’ | T | T D
subsuming the following:
H- –> D0
H- –> D
H- –> D’
H- –> T
H- –> T D
to cover numbers under 100. (Do not add any extra symbol in a rule, such as T –> D + D, which is wrong!)
Do not forget N –> H-, for N is our “axiom” (just like S for sentence). The same goes for the rules for Th-, M-, B-, etc.
In the same fashion, we can have Th- –> D hundred {H-} for numbers in the range [100, 999]. As mentioned in class, people actually say “twenty hundred” and even “ninety nine hundred”, so we can extend this rule into the following by replacing D with H-:
Th- –> H- hundred
Th- –> H- hundred H-
Th- –> H- hundred and H-  (For British English)
Originally, Th- is defined to cover [100, 999]. Because H- covers more than D, the Th- rules overgenerate and produce numbers beyond 999. But conceptually, simply thinking of Th- as covering numbers under 1000 is fine for the other rules.

You may merge them into one line (NOT one rule!) as:
Th- –> H- hundred {and} {H-}
where {} means optional. Please check whether any number in the range [100, 999] is missing before moving on to the rules for M-, B-, etc.
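
As a hedged illustration only, the hinted rules for H- and Th- could be paired with semantic values as follows. The function names, the word tables and the treatment of hyphens are assumptions made for this sketch; it covers only the rules quoted in the hints and deliberately keeps their overgeneration.

```python
D0 = {"zero": 0}
D = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
     "six": 6, "seven": 7, "eight": 8, "nine": 9}
D_TEEN = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
          "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19}
T = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
     "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def sem_h(words):
    """H- –> D0 | D | D' | T | T D"""
    if len(words) == 1:
        for table in (D0, D, D_TEEN, T):
            if words[0] in table:
                return table[words[0]]
    if len(words) == 2 and words[0] in T and words[1] in D:
        return T[words[0]] + D[words[1]]
    raise ValueError(f"not an H- phrase: {words}")


def sem_th(words):
    """Th- –> H- hundred | H- hundred H- | H- hundred and H-"""
    i = words.index("hundred")  # raises ValueError if 'hundred' is absent
    head, tail = words[:i], words[i + 1:]
    if tail and tail[0] == "and":
        tail = tail[1:]
        if not tail:
            raise ValueError("'and' must be followed by an H- phrase")
    return sem_h(head) * 100 + (sem_h(tail) if tail else 0)


def sem_n(text):
    """N –> H- | Th-  (only the part of the grammar given in the hints)"""
    words = text.lower().replace("-", " ").split()
    return sem_th(words) if "hundred" in words else sem_h(words)


print(sem_n("one hundred and twenty three"))  # 123
print(sem_n("ninety nine hundred"))           # 9900
```

The second call shows the overgeneration discussed in the hints: because Th- starts with H- rather than D, values beyond 999 are accepted.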

Continued from the previous page:

M- –> [ ] thousand
M- –> Th- thousand Th-
M- –> Th- thousand and H-
M- –> Th- thousand H-
Tidy up as M- –> Th- billion

…..

[…]billion […]million […]thousand […]

Notes:
Replace the arrows with the standard arrow symbol.
Chinese number: gen(23)
= gen(2, 十) + gen(3)
= gen(2) + gen(3)
English number: gen(23)
= gen(2, tens) gen(3)
= twenty gen(3)

gen(19) yields 19 directly.
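
The head/tail generation procedure for Chinese numbers from the Generation slides can be sketched as one recursive function. This is a sketch under assumptions: the function name `gen` follows the slides, but the unit table, the use of traditional characters and the handling of 0 are choices made here. Following the slides literally, 12 comes out as 一十二 (everyday usage would drop the leading 一).

```python
DIGITS = "零一二三四五六七八九"
# Position values with their characters, largest first.
UNITS = [(10**12, "兆"), (10**8, "億"), (10**4, "萬"),
         (10**3, "千"), (10**2, "百"), (10, "十")]


def gen(x: int) -> str:
    """Generate the Chinese number for x >= 0: head, position, then tail."""
    if x < 10:
        return DIGITS[x]
    for pos, name in UNITS:
        if x >= pos:
            head, tail = x // pos, x % pos  # integer division and remainder
            out = gen(head) + name
            if tail == 0:
                return out
            if tail < pos // 10:            # tail skips a position: add the conjunction
                out += "零"
            return out + gen(tail)
    raise ValueError("x must be a non-negative integer")


print(gen(3_000_060_000_000_000))  # 三千兆零六百億
print(gen(123_456))                # 一十二萬三千四百五十六
```

Because the head at 兆 is generated by the same function, heads larger than 10^4 (as in the 123,456 兆 example) are handled by recursion.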