COMP3430 / COMP8430 Data wrangling
Lecture 24: Ontology matching (Lecturer: )
Based on slides by Anika Gross and
(University of Leipzig)
Lecture outline
● What are ontologies
● Ontology annotations and mappings
● Ontology evolution and trend discovery
What are ontologies? (1)
● Structured representations of knowledge
● Very large ontologies in many domains, for example in the biomedical domain
[Figure: example concept hierarchy rooted at "Anatomic Structure, System, or Substance", with concepts including Organ, Kidney, Lung, Tissue, Skin, …; example domains: Anatomy, Medicine, Chemistry, Molecular biology]
For examples see: https://en.wikipedia.org/wiki/Ontology_(information_science)#Published_examples
What are ontologies? (2)
● Often multiple interrelated ontologies in a domain (e.g. anatomy)
● We need to identify overlapping information between ontologies
● Create mappings between ontologies
[Figure: interrelated ontologies covering anatomy: Mouse Anatomy, MeSH, SNOMED, UMLS, FMA, NCI Thesaurus, GALEN]
Ontology based annotations
● Standardised semantic descriptions of object properties
● Applications:
– Semantic search, navigation, etc.
– Functional analysis: identification of significant characteristics of specific gene/protein groups
[Figure: annotated objects (genes, proteins, …; electronic health records; publications) and an annotation mapping from Ensembl IDs (ENSP00000352999, ENSP00000344151, ENSP00000230480) and P10646 (TFPI1_HUMAN) to GO IDs: GO:0006915 (apoptosis), GO:0015808 (L-alanine transport), GO:0007596 (blood coagulation), GO:0005615 (extracellular space)]
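As a rough illustration of functional analysis over an annotation mapping, the sketch below counts how often each GO term annotates the members of a protein group. The protein identifiers (`protein_1` etc.) and their term assignments are hypothetical placeholders, not the assignments shown on the slide, and a plain frequency count stands in for a proper statistical significance test.

```python
from collections import Counter

# Hypothetical annotation mapping: object ID -> set of GO term IDs.
# The GO IDs appear on the slide; which object carries which term is made up.
annotations = {
    "protein_1": {"GO:0006915", "GO:0005615"},
    "protein_2": {"GO:0007596", "GO:0005615"},
    "protein_3": {"GO:0015808"},
}

def term_frequencies(annotations, group):
    """Count how many objects in `group` are annotated with each term."""
    counts = Counter()
    for obj in group:
        # Objects without annotations contribute nothing.
        counts.update(annotations.get(obj, ()))
    return counts

freq = term_frequencies(annotations, ["protein_1", "protein_2"])
# The most frequent term is a candidate "characteristic" of the group.
top_term, top_count = freq.most_common(1)[0]
print(top_term, top_count)
```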
Ontology mappings and alignments
● Overlapping ontologies allow the creation of mappings/alignments
● Useful for data integration and analysis across sources
● Ontology mapping: Set of semantic correspondences between concepts of different ontologies
● Manual identification or (semi-)automatic matching approaches
● Use of mappings:
– Ontology merging (such as creation of an integrated cross-species anatomy ontology)
– Knowledge transfer (for example experiments for different species)
– Ontology curation (find missing ontology annotations)
[Figure: two anatomy ontology fragments with concepts such as body, limbs, limb segments, lower extremities, upper extremities, head, neck, tail, trunk, ..., connected by correspondences labelled = and <]
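A mapping in the sense used on this slide, a set of semantic correspondences between concepts of two ontologies, could be represented as in the following minimal sketch. The class name `Correspondence`, the `O1.`/`O2.` prefixes, the relation encoding ("=" for equivalent, "<" for less general) and the specific concept pairs are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Correspondence:
    """One semantic correspondence between concepts of two ontologies."""
    source: str              # concept in the first ontology
    target: str              # concept in the second ontology
    relation: str            # assumed encoding: "=" equivalent, "<" less general
    confidence: float = 1.0  # optional similarity / confidence value

# A mapping is simply a set of correspondences (pairs are illustrative only).
mapping = {
    Correspondence("O1.body", "O2.body", "="),
    Correspondence("O1.limbs", "O2.limbs", "="),
    Correspondence("O1.lower extremities", "O2.limbs", "<"),
}

def equivalents(mapping, concept):
    """Return the targets marked as equivalent to the given source concept."""
    return sorted(c.target for c in mapping
                  if c.source == concept and c.relation == "=")

print(equivalents(mapping, "O1.body"))
```

Making the dataclass frozen keeps correspondences hashable, so a mapping can be an ordinary Python set and duplicates collapse automatically.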
Evolution of ontology-based mappings
● Ontologies are not static!
● Research, new knowledge → Continuous changes
● Release of new versions
● Ontology changes: additions, deletions, updates
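The three kinds of ontology change can be made concrete by comparing two versions. The sketch below assumes, purely for illustration, that each version is a dictionary from concept ID to label; real ontology diffs also consider relationships and other attributes.

```python
def diff_versions(old, new):
    """Compare two ontology versions given as {concept_id: label} dicts."""
    old_ids, new_ids = set(old), set(new)
    return {
        "additions": sorted(new_ids - old_ids),
        "deletions": sorted(old_ids - new_ids),
        # A concept present in both versions whose label changed is an update.
        "updates": sorted(c for c in old_ids & new_ids if old[c] != new[c]),
    }

# Hypothetical toy versions (IDs and labels are made up).
v1 = {"C1": "kidney", "C2": "lung", "C3": "skin"}
v2 = {"C1": "kidney", "C2": "lungs", "C4": "tissue"}

changes = diff_versions(v1, v2)
print(changes)
```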
Ontology matching workflow
● Manual creation of mappings between very large ontologies is too labor-intensive
● Semi-automatic generation of semantic correspondences: linguistic, structural, instance-based matching techniques
(see lecture on schema matching, lecture 11)
[Figure: matching workflow. Input ontologies O1 and O2 (plus further input, e.g. instances, dictionary) → Pre-processing → Matching → Post-processing → Mapping, for example sim(O1.a, O2.b) = 0.8, sim(O1.a, O2.c) = 0.5, sim(O1.c, O2.c) = 1.0]
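The workflow of pre-processing, matching and post-processing can be sketched with a purely linguistic matcher. This is only one possible instantiation: it uses the string similarity from Python's standard `difflib` module, and the threshold of 0.8, the function names and the toy labels are assumptions, not part of the lecture.

```python
from difflib import SequenceMatcher

def preprocess(label):
    """Pre-processing: lower-case the label and collapse whitespace."""
    return " ".join(label.lower().split())

def match(o1, o2):
    """Matching: compute a string similarity for every pair of concepts."""
    sims = {}
    for id1, label1 in o1.items():
        for id2, label2 in o2.items():
            a, b = preprocess(label1), preprocess(label2)
            sims[(id1, id2)] = SequenceMatcher(None, a, b).ratio()
    return sims

def postprocess(sims, threshold=0.8):
    """Post-processing: keep only correspondences above the threshold."""
    return {pair: s for pair, s in sims.items() if s >= threshold}

# Hypothetical ontologies as {concept_id: label}.
o1 = {"a": "Cervical  Vertebra", "c": "Kidney"}
o2 = {"b": "cervical vertebrae", "c": "kidney"}

mapping = postprocess(match(o1, o2))
for (id1, id2), sim in sorted(mapping.items()):
    print(f"sim(O1.{id1}, O2.{id2}) = {sim:.2f}")
```

Comparing every pair is quadratic in the number of concepts, which is one reason reuse of existing mappings and parallel algorithms matter for very large ontologies.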
Mapping composition
● Indirect composition-based matching
● Via intermediate ontology (IO) or hub ontology (HO), synonym dictionary, etc.
[Figure: the unknown mapping (?) between O1 and O2 is derived via an intermediate ontology IO. Example concepts MA_0001421, UBERON:0001092 and NCI_C32239, with names and synonyms including "cervical vertebra 1", "C1 Vertebra" and "Atlas"]
● Find new correspondences via composition
● Reuse existing mappings to increase match quality and save computation time
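Composition can be viewed as a join of two existing mappings on the shared intermediate concept. The sketch below assumes mappings are dictionaries from (source, target) pairs to similarity values; multiplying the two similarities and keeping the maximum over alternative paths is one common choice, taken here as an assumption. Treating UBERON as the intermediate ontology and the similarity values themselves are illustrative.

```python
def compose(map_1_io, map_io_2):
    """Compose an O1->IO mapping with an IO->O2 mapping into O1->O2.

    Each mapping is a dict {(source, target): similarity}.
    """
    composed = {}
    for (c1, io_a), s1 in map_1_io.items():
        for (io_b, c2), s2 in map_io_2.items():
            if io_a == io_b:
                sim = s1 * s2  # assumed aggregation: product of similarities
                # Keep the best path if several intermediates link c1 and c2.
                composed[(c1, c2)] = max(sim, composed.get((c1, c2), 0.0))
    return composed

# Illustrative existing mappings (similarity values are made up).
o1_to_io = {("MA_0001421", "UBERON:0001092"): 1.0}
io_to_o2 = {("UBERON:0001092", "NCI_C32239"): 0.9}

o1_to_o2 = compose(o1_to_io, io_to_o2)
print(o1_to_o2)
```

No label comparison between O1 and O2 is needed here, which is where the saving in computation time comes from.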
Indirect matching
● Use mappings to intermediate ontologies IO1, ..., IOk to indirectly match O1 and O2
● Reduce matching effort by reusing mappings to IO → Very fast composition
→ IO should have a significant overlap with O1 and O2
→ IO1, ..., IOk may complement each other
→ Centralized hub HO
→ Many mappings to other ontologies
→ Onew aligned with any Oi via HO
[Figure: O1 matched to O2 via intermediate ontologies IO1, IO2, ..., IOk; a new ontology Onew aligned with O1, O2, ..., On via the hub ontology HO]
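The point that several intermediate ontologies may complement each other can be illustrated by taking the union of the correspondences found via each one. In this minimal sketch mappings are plain sets of (source, target) pairs; the function names, concept identifiers and the union strategy are assumptions for illustration.

```python
def compose(left, right):
    """Join two mappings (sets of (source, target) pairs) on the middle concept."""
    targets_by_mid = {}
    for mid, target in right:
        targets_by_mid.setdefault(mid, set()).add(target)
    return {(src, tgt)
            for src, mid in left
            for tgt in targets_by_mid.get(mid, ())}

def match_via_intermediates(routes):
    """Union the correspondences obtained via each intermediate ontology.

    `routes` is a list of (O1->IOi mapping, IOi->O2 mapping) tuples.
    """
    result = set()
    for o1_to_io, io_to_o2 in routes:
        result |= compose(o1_to_io, io_to_o2)
    return result

# Hypothetical mappings: IO1 covers concept a, IO2 covers concept c.
routes = [
    ({("O1.a", "IO1.x")}, {("IO1.x", "O2.b")}),
    ({("O1.c", "IO2.y"), ("O1.a", "IO2.z")}, {("IO2.y", "O2.c")}),
]

result = match_via_intermediates(routes)
print(sorted(result))
```

A hub ontology HO is the special case with a single route shared by all ontologies: once Onew is mapped to HO, the same composition aligns it with any Oi that already has a mapping to HO.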
Ontology evolution
● Unstable ontology regions
– Many modifications → Focus of recent development
– Impact of changes on ontology-based algorithms or applications → Redo analyses?
● Stable ontology regions
– Already completed?
– Low interest so far → Further changes necessary?
Where are the changes located?
How has the work progressed?
Potential for future development?
Are there (un)stable ontology regions?
Trend discovery
● Trend discovery based on sliding windows
● Monitor region changes over long periods of time
– Ontology O, ontology region of interest OR
– Time interval (tstart, tend)
– Sliding window of size ω
– Step width Δ
● Call region discovery algorithm within ω
– Collect change intensities for region of interest over time
[Figure: timeline of ontology versions O1, O2, ..., On-1, On between tstart and tend, with a sliding window of size ω = 4 moved in steps of Δ = 1]
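The sliding-window procedure can be sketched as follows. The input list stands in for the output of the region discovery algorithm: entry i is assumed to be the number of changes that version i introduced in the region of interest OR, with the whole list covering the interval (tstart, tend). Measuring the window in versions and using the mean as the change intensity are assumptions made for this example.

```python
def change_intensity_trend(changes_per_version, omega, delta):
    """Slide a window of `omega` versions in steps of `delta`.

    Returns the mean number of region changes per version for each window.
    """
    if omega < 1 or delta < 1:
        raise ValueError("omega and delta must be positive")
    trend = []
    last_start = len(changes_per_version) - omega
    for start in range(0, last_start + 1, delta):
        window = changes_per_version[start:start + omega]
        trend.append(sum(window) / omega)
    return trend

# Hypothetical change counts for eight consecutive versions.
changes = [0, 2, 5, 1, 0, 0, 3, 8]
trend = change_intensity_trend(changes, omega=4, delta=1)
print(trend)
```

A rising tail in the resulting series would point to a region that has recently become unstable, while a flat series near zero suggests a stable region.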
Outlook and research directions
● Ontologies are becoming increasingly important
– In the life sciences (for example, the conference series Data Integration in the Life Sciences, DILS)
– Knowledge bases and the Semantic Web
– Internet of Things
● Various research areas
– Learning to match and map ontologies (semi-)automatically
– Mapping of dynamic ontologies
– Parallel algorithms for large-scale ontology matching, mapping and evolution