Development of a reasoning system
In practice
Igor Malin, Sean Shang, David Gu
Members of the WTG Data Science Team
Problem statement / Why?
• WiseTech writes software for global logistics, which includes a customs function
• The World Customs Organization (WCO) defines the Harmonized System, the internationally standardized system of names and numbers used to classify traded products
• On import and export, customs brokers have to manually classify goods according to the WCO rules and hierarchy when the goods cross international borders
• With increasing volumes of global trade and growing regulatory complexity, automation becomes essential
Example of rules
• Section I: Live animals; animal products
• 1.- Any reference in this Section to a particular genus or species of an animal, except where
the context otherwise requires, includes a reference to the young of that genus or species.
• 2.- Except where the context otherwise requires, throughout this Schedule any reference to “dried”
products also covers products which have been dehydrated, evaporated or freeze-dried.
• Chapter 3: Fish and crustaceans, molluscs and other aquatic invertebrates
• 1.- This Chapter does not cover :
• Mammals of heading 01.06;
• Meat of mammals of heading 01.06 (heading 02.08 or 02.10);
• Fish or crustaceans, molluscs or other aquatic invertebrates, dead and unfit or unsuitable for human consumption by reason of either their species or their condition
• Caviar or caviar substitutes prepared from fish eggs (heading 16.04).
• Heading 03.04 – Fish fillets and other fish meat (whether or not minced), fresh, chilled or frozen.
• This heading covers fish fillets and other fish meat (whether or not minced) in the following states only :
• Fresh or chilled, whether or not packed with salt or ice or sprinkled with salt water as a temporary preservative during transport.
• Frozen, often presented in the form of frozen blocks.
Manual classification process example
• Section I: Live animals; animal products? => Yes! =>
• Ok, which Chapter?
• Chapter 01 Live animals? => Maybe?
• Oh wait: This Chapter covers all live animals except : (a) Fish
• Chapter 02: Meat And Edible Meat Offal => Maybe?
• Oh wait: This Chapter applies to meat of all animals (except fish and crustaceans) ☺
• Chapter 03: Fish And Crustaceans, Molluscs And Other Aquatic Invertebrates => ??
• Yes!
• Which Heading?
• Heading 03.01 – Live fish
• Heading 03.02 – Fish, fresh or chilled, excluding fish fillets
• Heading 03.03 – Fish, frozen, excluding fish fillets
• Heading 03.04 – Fish fillets and other fish meat (whether or not minced), fresh, chilled or frozen
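The manual walk above (Section, then Chapter, then Heading) can be sketched as a depth-first search over the hierarchy. This is only an illustration: the `Node` class, the attribute names `kind`, `state` and `form`, and the hand-written `covers` predicates are assumptions standing in for the legal notes, not part of the actual system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Product = Dict[str, str]  # hypothetical attribute bag describing the goods


@dataclass
class Node:
    code: str
    title: str
    covers: Callable[[Product], bool]  # hand-written stand-in for the legal notes
    children: List["Node"] = field(default_factory=list)


def classify(node: Node, product: Product) -> List[str]:
    """Return the codes from `node` down to the deepest node covering `product`."""
    if not node.covers(product):
        return []
    for child in node.children:
        path = classify(child, product)
        if path:
            return [node.code] + path
    return [node.code]


def is_fish(p: Product) -> bool:
    return p["kind"] == "fish"


section_1 = Node(
    "Section I", "Live animals; animal products",
    lambda p: p["kind"] in {"fish", "mammal", "bird"},
    [
        # "This Chapter covers all live animals except : (a) Fish"
        Node("Chapter 01", "Live animals",
             lambda p: p["state"] == "live" and not is_fish(p)),
        # "This Chapter applies to meat of all animals (except fish and crustaceans)"
        Node("Chapter 02", "Meat and edible meat offal",
             lambda p: p["form"] in {"meat", "fillet"} and not is_fish(p)),
        Node("Chapter 03", "Fish and crustaceans, molluscs and other aquatic invertebrates",
             is_fish,
             [
                 Node("03.01", "Live fish", lambda p: p["state"] == "live"),
                 Node("03.02", "Fish, fresh or chilled, excluding fish fillets",
                      lambda p: p["state"] in {"fresh", "chilled"} and p["form"] != "fillet"),
                 Node("03.03", "Fish, frozen, excluding fish fillets",
                      lambda p: p["state"] == "frozen" and p["form"] != "fillet"),
                 Node("03.04", "Fish fillets and other fish meat",
                      lambda p: p["form"] == "fillet"
                      and p["state"] in {"fresh", "chilled", "frozen"}),
             ]),
    ],
)

frozen_fillet = {"kind": "fish", "state": "frozen", "form": "fillet"}
print(classify(section_1, frozen_fillet))  # ['Section I', 'Chapter 03', '03.04']
```

Writing the predicates by hand does not scale to the full schedule, which is why the rest of the talk is about deriving them from the legal text.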
How do we approach solving this problem?
• Parts of the big picture:
• Ontologies to be able to reason about the objects we classify, their characteristics and interrelations. E.g.
• Fish is an animal
• An animal can have two disjoint attributes: live, dead
• Fish treatment forms can be: Fresh, Chilled, Frozen, Cooked
• Fillet is a type of meat
• Fish consists of meat
• Legal text parsing for rule generation to be able to understand what objects belong to what classes:
• “This Chapter covers all live animals except : (a) Fish”
• Rule: Chapter 1 -> includes -> animals (live) & excludes -> fish
• Inference/Reasoning through combination of ontology information, rules, and input object attributes
• Input: “Frozen fish”, Question: “Is it part of Chapter 1?”
• Reasoning: Evaluation or Satisfaction
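A minimal sketch of how the three parts fit together, using plain Python dictionaries instead of a real ontology store. The `IS_A` table, the rule layout and the function names are assumptions made for illustration; the `horse` entry is added only to have a positive case.

```python
# Ontology: is-a edges. "fish" and "fillet" come from the slide; "horse" is an
# assumed extra entry for the example.
IS_A = {"fish": "animal", "fillet": "meat", "horse": "animal"}


def ancestors(concept):
    """Yield the concept itself and everything it is-a, following IS_A upwards."""
    while concept is not None:
        yield concept
        concept = IS_A.get(concept)


# Rule generated from: "This Chapter covers all live animals except : (a) Fish"
CHAPTER_1 = {
    "includes": {"concept": "animal", "state": "live"},
    "excludes": [{"concept": "fish"}],
}


def matches(item, pattern):
    """True if the item is (a kind of) the pattern's concept in the required state."""
    if pattern["concept"] not in ancestors(item["concept"]):
        return False
    return "state" not in pattern or item["state"] == pattern["state"]


def satisfies(item, rule):
    """Evaluate a rule: exclusions win over the inclusion clause."""
    if any(matches(item, excluded) for excluded in rule["excludes"]):
        return False
    return matches(item, rule["includes"])


frozen_fish = {"concept": "fish", "state": "frozen"}
live_horse = {"concept": "horse", "state": "live"}

print(satisfies(frozen_fish, CHAPTER_1))  # False
print(satisfies(live_horse, CHAPTER_1))   # True
```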
Ontologies
• Local
• Business-specific representation
• External
• Lexical Database
• WordNet
• Encyclopedia
• DBpedia, Wikidata
• Example: horses
• Why do we need to use it?
• Domain-Specific
• AGROVOC, Food and Agriculture
• Integration
• From Annotations to Linked Data
• Reuse existing knowledge bases/resources
• Design your ontology carefully to make the integration easier
Hierarchies
A Controlled Vocabulary
• Labels
• Taxonomies
• Relations
Reasoning
• RDF Triples
• Deduction
• SPARQL Query
• RDFS and OWL reasoner
• Description Logic
SPARQL Query
RDFS and OWL Reasoner
• Description Logic
• TBox (terminological box)
• Typically describes concept hierarchies
• Classification rules
• ABox (assertional box)
• Typically in the form “A is an instance of B”
• Instance type assertion
• RDFS (RDF Schema)
• OWL (Web Ontology Language)
rdfs:subClassOf
• Given
• ?x rdfs:subClassOf ?y .
• ?instance rdf:type ?x .
• Deduce
• ?instance rdf:type ?y .
• Example
• Given
• :Poultry rdfs:subClassOf :Animal .
• :Chicken rdf:type :Poultry .
• Deduce
• :Chicken rdf:type :Animal .
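The subclass rule can be sketched as naive forward chaining over a set of triples. This is a plain-Python illustration with triples as tuples of strings; the function name `infer_types` is an assumption, and a real RDFS reasoner does considerably more than this one rule.

```python
def infer_types(triples):
    """Apply the rdfs:subClassOf rule until no new rdf:type triple appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in facts if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in facts if p == "rdf:type"]
        for instance, cls in typed:
            for x, y in subclass:
                new = (instance, "rdf:type", y)
                if cls == x and new not in facts:
                    facts.add(new)
                    changed = True
    return facts


given = {
    (":Poultry", "rdfs:subClassOf", ":Animal"),
    (":Chicken", "rdf:type", ":Poultry"),
}
deduced = infer_types(given) - given
print(deduced)  # {(':Chicken', 'rdf:type', ':Animal')}
```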
owl:TransitiveProperty
• Given
• ?p rdf:type owl:TransitiveProperty .
• ?y ?p ?x .
• ?z ?p ?y .
• Deduce
• ?z ?p ?x .
• Example
• Given
• :friendOf rdf:type owl:TransitiveProperty .
• :David :friendOf :Igor .
• :Sean :friendOf :David .
• Deduce
• :Sean :friendOf :Igor .
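The transitivity rule can be sketched the same way: find every property declared transitive, then keep joining pairs of triples until nothing new appears. Triples are again plain tuples of strings, and `transitive_closure` is a hypothetical helper name.

```python
def transitive_closure(triples):
    """Add ?z ?p ?x whenever ?z ?p ?y and ?y ?p ?x hold for a transitive ?p."""
    facts = set(triples)
    transitive = {
        s for s, p, o in facts
        if p == "rdf:type" and o == "owl:TransitiveProperty"
    }
    changed = True
    while changed:
        changed = False
        snapshot = list(facts)  # iterate over a copy while adding to `facts`
        for z, p, y in snapshot:
            if p not in transitive:
                continue
            for y2, p2, x in snapshot:
                new = (z, p, x)
                if p2 == p and y2 == y and new not in facts:
                    facts.add(new)
                    changed = True
    return facts


given = {
    (":friendOf", "rdf:type", "owl:TransitiveProperty"),
    (":David", ":friendOf", ":Igor"),
    (":Sean", ":friendOf", ":David"),
}
deduced = transitive_closure(given) - given
print(deduced)  # {(':Sean', ':friendOf', ':Igor')}
```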
Deduce new triples by SPARQL query
Creating rules
• How can we create rules?
• Challenges
Syntax
• In linguistics, we can use a context-free grammar to describe the syntax of a language. For example, for everyday English we have:
• Noun -> meat | fat | pork | …
• Verb -> have | has | contain | …
• Adjective -> fresh | edible | live | …
• Determiner -> the | a | an | this | that | …
• Preposition -> from | to | of | …
• Conjunction -> and | or | but | …
• S -> NP VP
• NP -> Determiner Nominal | Proper-Noun | …
• Nominal -> Noun Nominal | Noun
• VP -> Verb | Verb NP | Verb NP PP | Verb PP
• PP -> Preposition NP
• …
Parsing
• For the sentence “The dog chases the cat.” we get a parse tree (if you have taken a compilers class, this is very similar to an abstract syntax tree):
[S
  [NP [Det The] [Nom [Noun dog]]]
  [VP [Verb chases]
    [NP [Det the] [Nom [Noun cat]]]]]
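As a rough sketch, the grammar from the Syntax slide can be written as a Python dictionary and parsed with a small backtracking top-down parser. The words "dog", "cat" and "chases" are added to the lexicon as an assumption (the slide's word lists are open-ended), and the function names are hypothetical.

```python
# Grammar rules from the Syntax slide, using the short labels from the tree.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Nom"]],
    "Nom": [["Noun", "Nom"], ["Noun"]],
    "VP": [["Verb", "NP", "PP"], ["Verb", "NP"], ["Verb", "PP"], ["Verb"]],
    "PP": [["Prep", "NP"]],
}
# Lexicon: "dog", "cat" and "chases" are assumed additions for this example.
LEXICON = {
    "Det": {"the", "a", "an", "this", "that"},
    "Noun": {"meat", "fat", "pork", "dog", "cat"},
    "Verb": {"have", "has", "contain", "chases"},
    "Prep": {"from", "to", "of"},
}


def parse(symbol, tokens, i):
    """Yield (tree, next_index) for every way `symbol` can match tokens[i:]."""
    if symbol in LEXICON:
        if i < len(tokens) and tokens[i].lower() in LEXICON[symbol]:
            yield (symbol, tokens[i]), i + 1
        return
    for production in GRAMMAR[symbol]:
        yield from expand(production, tokens, i, (symbol,))


def expand(production, tokens, i, partial):
    """Match the symbols of one production left to right, backtracking on failure."""
    if not production:
        yield partial, i
        return
    for child, j in parse(production[0], tokens, i):
        yield from expand(production[1:], tokens, j, partial + (child,))


def parse_sentence(text):
    """Return the first parse tree that consumes the whole sentence, else None."""
    tokens = text.rstrip(".").split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


tree = parse_sentence("The dog chases the cat.")
print(tree)
```

The grammar has no left-recursive rules, so this naive strategy terminates; a grammar with left recursion would need a chart parser or a rewritten grammar.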
“Legal English”
• In our case, the story is slightly different.
• The texts we are dealing with are written in a strict subset of English, used by lawyers to write texts that have legal effect.
• In Legal English, punctuation plays an important role.
• We can clearly make use of it, just as we do when parsing source code.
Example
• Original sentence:
• Pig fat, free of lean meat, and poultry fat, not rendered or otherwise extracted, fresh, chilled, frozen, salted, in brine, dried or smoked.
• This means:
• Pig fat (free of lean meat) and poultry fat (not rendered or otherwise extracted)
• fresh, chilled, frozen, salted, in brine, dried or smoked
• We can summarize the syntax rules as:
• S -> [List-of NP] [List-of ADJ]
• NP -> Nominal | Nominal “,” ADJ PP “,”
• (‘a : Syntax-Rule) => List-of ‘a -> ‘a “,” [List-of ‘a] | ‘a CONJ ‘a
In action
• We have developed a system in Common Lisp that works as a DSL to deal with syntax.
• To make the parser easier to implement, the syntax from the previous slide can be written as:
• S := [List-of NP] [List-of ADJ] ;;
• NP := Nominal | Nominal “,” ADJ PP “,” ;;
• {a} List-of := a “,” [List-of a] | a CONJ a ;;
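To give a feel for the parametric `List-of` rule, here is a sketch of it as a parser combinator. It is written in Python as a stand-in and is not the Common Lisp DSL itself; the word sets, the simplified `pp` parser (a preposition followed by any single word) and all function names are assumptions.

```python
import re

ADJECTIVES = {"fresh", "chilled", "frozen", "salted", "dried", "smoked"}
PREPOSITIONS = {"in", "of"}
CONJUNCTIONS = {"and", "or"}


def tokenize(text):
    """Split into words and commas; the commas are tokens the grammar relies on."""
    return re.findall(r"[A-Za-z]+|,", text)


def adj(tokens, i):
    if i < len(tokens) and tokens[i] in ADJECTIVES:
        return ("adj", tokens[i]), i + 1
    return None


def pp(tokens, i):
    """Simplified PP: a preposition followed by any single word taken as a noun."""
    if i + 1 < len(tokens) and tokens[i] in PREPOSITIONS and tokens[i + 1].isalpha():
        return ("pp", ("prep", tokens[i]), ("noun", tokens[i + 1])), i + 2
    return None


def desc(tokens, i):
    """One descriptor: an adjective or a prepositional phrase."""
    return adj(tokens, i) or pp(tokens, i)


def list_of(item):
    """{a} List-of := a "," [List-of a] | a CONJ a"""
    def parser(tokens, i):
        first = item(tokens, i)
        if first is None:
            return None
        head, j = first
        # Alternative 1: a "," [List-of a]
        if j < len(tokens) and tokens[j] == ",":
            rest = parser(tokens, j + 1)
            if rest is not None:
                items, k = rest
                return [head] + items, k
        # Alternative 2: a CONJ a
        if j < len(tokens) and tokens[j] in CONJUNCTIONS:
            second = item(tokens, j + 1)
            if second is not None:
                tail, k = second
                return [head, tail], k
        return None
    return parser


tokens = tokenize("fresh, chilled, frozen, salted, in brine, dried or smoked")
items, end = list_of(desc)(tokens, 0)
print(items)
```

Because `list_of` takes the item parser as an argument, the same function serves for `[List-of NP]` and `[List-of ADJ]`, mirroring the `{a}` parameter in the rule.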
Rules generation
• Once we have the parse tree of a sentence, we can generate rules from it
• Again, if you have taken a compiler course or done compiler work, this is similar to AST transformation
• We perform unification-based pattern matching on the parse trees and apply transformation rules to them
• E.g.: the previous example will be turned into something like:
• (descs (or (adj “fresh”) (adj “chilled”) (adj “smoked”) (pp (prep “in”) (noun “brine”)) (adj “dried”) (adj “frozen”)))
• This is then good enough to be translated into RDF
• We generate an ontology from (formal-ish) natural language
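The transformation step can be sketched with a small pattern matcher over parse trees represented as nested tuples. This is one-way matching with variables (a restricted form of unification), written in Python for illustration; the parse-tree labels (`List-of`, `ADJ`, `PP`, `Prep`, `Noun`), the `?name` variable convention and the rule table are assumptions, not the system's actual representation.

```python
def is_var(term):
    """Pattern variables are strings starting with '?', e.g. '?word'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, tree, bindings):
    """Return bindings extended so that `pattern` equals `tree`, or None on failure."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == tree else None
        return {**bindings, pattern: tree}
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        if len(pattern) != len(tree):
            return None
        for sub_pattern, sub_tree in zip(pattern, tree):
            bindings = match(sub_pattern, sub_tree, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == tree else None


def substitute(template, bindings):
    """Fill the variables of `template` with their bound values."""
    if is_var(template):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(part, bindings) for part in template)
    return template


# Hypothetical rewrite rules: (pattern on the parse tree, output template).
RULES = [
    (("ADJ", "?word"), ("adj", "?word")),
    (("PP", ("Prep", "?p"), ("Noun", "?n")), ("pp", ("prep", "?p"), ("noun", "?n"))),
]


def rewrite(tree):
    """Apply the first rule whose pattern matches the tree."""
    for pattern, template in RULES:
        bindings = match(pattern, tree, {})
        if bindings is not None:
            return substitute(template, bindings)
    raise ValueError(f"no rule matches {tree!r}")


def to_descs(list_node):
    """Turn ("List-of", item, item, ...) into ("descs", ("or", ...))."""
    _label, *items = list_node
    return ("descs", ("or", *[rewrite(item) for item in items]))


parse_tree = (
    "List-of",
    ("ADJ", "fresh"),
    ("PP", ("Prep", "in"), ("Noun", "brine")),
    ("ADJ", "smoked"),
)
print(to_descs(parse_tree))
```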
Challenges so far
• Some sentences are difficult even for humans to interpret:
• Pig fat, free of lean meat, and poultry fat, not rendered or otherwise
extracted, fresh, chilled, frozen, salted, in brine, dried or smoked.
• English grammar is CHAOTIC
• The loss of synthetic features (English has evolved towards an isolating language) has made it difficult to determine the part of speech (POS) of a word without knowing the meaning of the entire sentence.
• We all know that nothing really understands natural language yet
• Simple example: “pig fat” is parsed into [Nominal [Noun pig] [Noun fat]]
• In a language that is more synthetic, e.g. Latin, this is:
• “ARVĪNAE PORCŌRUM” / “ARVĪNAE PORCĪ” (depending on the number), which literally translates into: (THE) FATS (OF) PIGS / PIG
• Another example:
• English: Motorcycle helmet (“helmet FOR a motorcycle”, but not “motorcycle AND a helmet”)
• Russian: Мотоциклетный шлем (The part in red converts the word to an adjective)
Challenges so far
• It is necessary to identify all of the patterns that appear in the documents
• This can be solved by implementing them incrementally:
1. We define a few rules
2. Then check them against the sentences one by one
3. As soon as a sentence fails to match, create a new rule for it and go to 1
4. We have finished the system
5. Grab a cup of coffee ☺
* Actually, this is how linguists work out the syntax of languages
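The loop above can be sketched as a coverage check that reports which sentences no rule matches yet. Regular expressions stand in for the real syntax rules here purely to keep the example short; the two patterns and the `uncovered` helper are assumptions.

```python
import re


def uncovered(sentences, rules):
    """Step 2: return the sentences that no current rule matches."""
    return [s for s in sentences if not any(r.fullmatch(s) for r in rules)]


SENTENCES = [
    "Live fish",
    "Fish, fresh or chilled, excluding fish fillets",
    "Fish, frozen, excluding fish fillets",
]

# Step 1: start with a few rules (here a single one: a plain run of words).
rules = [re.compile(r"[A-Za-z]+( [A-Za-z]+)*")]
before = uncovered(SENTENCES, rules)

# Step 3: some sentences failed to match, so add a rule for them and check again.
rules.append(re.compile(r"[A-Za-z]+(, [a-z ]+)+"))
after = uncovered(SENTENCES, rules)

print(before)  # the two comma-separated headings
print(after)   # []
```

Writing the new rule remains a manual step; the check only tells you where the current rule set falls short.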
Summary
• Study this course boys and girls!☺
• Also recommended:
• COMP 3131/9102 (Compilers),
• COMP 3161/9164 (Concepts of Programming Languages),
• COMP 4141 (Theory of Computation),
• COMP 6721 (Formal Methods)
Questions?