COMP3430 / COMP8430 Data wrangling
Lecture 24: Ontology matching (Lecturer: )
Based on slides by Anika Gross and
(University of Leipzig)

Lecture outline
● What are ontologies
● Ontology annotations and mappings
● Ontology evolution and trend discovery

What are ontologies? (1)
● Structured representations of knowledge
● Very large ontologies in many domains, for example in the biomedical domain
Figure: example ontology rooted at "Anatomic Structure, System, or Substance", with concepts including Organ, Kidney, Lung, Tissue and Skin; example domains: Anatomy, Medicine, Chemistry, Molecular biology
For examples see: https://en.wikipedia.org/wiki/Ontology_(information_science)#Published_examples

What are ontologies? (2)
● Often multiple interrelated ontologies in a domain (e.g. anatomy)
● We need to identify overlapping information between ontologies
● Create mappings between ontologies
Figure: interrelated ontologies, for example Mouse Anatomy, MeSH, SNOMED, UMLS, FMA, NCI Thesaurus and GALEN

Ontology-based annotations
● Standardised semantic descriptions of object properties
● Applications:
– Semantic search, navigation, etc.
– Functional analysis: identification of significant characteristics of specific gene/protein groups
Figure: annotation mapping between annotated objects (genes, proteins, electronic health records, publications) and ontology concepts. Example identifiers shown: ENSP00000352999, ENSP00000344151, ENSP00000230480 and P10646 (TFPI1_HUMAN), annotated with GO IDs GO:0006915 (apoptosis), GO:0015808 (L-alanine transport), GO:0007596 (blood coagulation) and GO:0005615 (extracellular space)

Ontology mappings and alignments
● Overlapping ontologies allow the creation of mappings/alignments
● Useful for data integration and analysis across sources
● Ontology mapping: Set of semantic correspondences between concepts of different ontologies
● Manual identification or (semi-)automatic matching approaches
● Use of mappings:
– Ontology merging (such as creation of an integrated cross-species anatomy ontology)
– Knowledge transfer (for example experiments for different species)
– Ontology curation (find missing ontology annotations)
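A minimal sketch of how a mapping, understood as a set of correspondences between concepts of two ontologies, might be generated semi-automatically by comparing concept names. The toy ontologies, the concept identifiers, the `match` function and the 0.8 threshold are all assumptions made for illustration; `difflib.SequenceMatcher` is only one of many possible string similarity measures.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(o1: dict, o2: dict, threshold: float = 0.8) -> list:
    """Return correspondences (id1, id2, sim) whose similarity reaches the threshold."""
    mapping = []
    for id1, name1 in o1.items():
        for id2, name2 in o2.items():
            sim = name_similarity(name1, name2)
            if sim >= threshold:
                mapping.append((id1, id2, round(sim, 2)))
    return mapping


# Hypothetical toy ontologies: concept id -> concept name
O1 = {"O1:1": "head", "O1:2": "neck", "O1:3": "limb"}
O2 = {"O2:a": "Head", "O2:b": "limbs", "O2:c": "trunk"}

mapping = match(O1, O2)
for id1, id2, sim in mapping:
    print(f"sim({id1}, {id2}) = {sim}")
```

Comparing every concept pair is quadratic in the ontology sizes, which is one reason large ontologies call for more careful matching strategies than this sketch.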
Figure: example mapping between two anatomy ontologies, with correspondences marked = and < between concepts such as body, limbs, lower extremities, upper extremities, limb segments, head, neck, tail and trunk

Evolution of ontology-based mappings
● Ontologies are not static!
● Research, new knowledge → Continuous changes
● Release of new versions
● Ontology changes: additions, deletions, updates

Ontology matching workflow
● Manual creation of mappings between very large ontologies is too labor-intensive
● Semi-automatic generation of semantic correspondences: linguistic, structural, instance-based matching techniques (see lecture on schema matching, lecture 11)
Figure: O1, O2 → Pre-processing → Matching (with further input, e.g. instances, dictionary) → Post-processing → Mapping, for example:
sim(O1.a, O2.b) = 0.8
sim(O1.a, O2.c) = 0.5
sim(O1.c, O2.c) = 1.0

Mapping composition
● Indirect composition-based matching
● Via intermediate ontology (IO) or hub ontology (HO), synonym dictionary, etc.
● Find new correspondences via composition
● Reuse existing mappings to increase match quality and save computation time
Figure: O1 and O2 matched indirectly via IO. Example concepts: MA_0001421, UBERON:0001092, NCI_C32239. Names and synonyms shown: Name: cervical vertebra 1, Synonym: cervical vertebra 1; Name: C1 Vertebra, Synonym: Atlas; Name: atlas, Synonym: C1 vertebra

Indirect matching
● Use mappings to intermediate ontologies IO1, ..., IOk to indirectly match O1 and O2
● Reduce matching effort by reusing mappings to IO → Very fast composition
→ IO should have a significant overlap with O1 and O2
→ IO1, ..., IOk may complement each other
→ Centralized hub HO
→ Many mappings to other ontologies
→ Onew aligned with any Oi via HO
Figure: O1 matched to O2 via IO1, IO2, ..., IOk; hub ontology HO connected to O1, O2, ..., On and Onew

Ontology evolution
● Unstable ontology regions
– Many modifications → Focus of recent development
– Impact of changes on ontology-based algorithms or applications → Redo analyses?
● Stable ontology regions
– Already completed?
– Low interest so far → Further changes necessary?
● Questions:
– Where are the changes located?
– How has the work progressed?
– Potential for future development?
– Are there (un)stable ontology regions?

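The composition-based (indirect) matching covered in this lecture can be sketched as a join of two existing mappings on the intermediate ontology. This is only an illustration under stated assumptions: the `compose` function is hypothetical, the similarity values are made up, and multiplying similarities along a path (keeping the best score per pair) is one possible aggregation choice among several. The concept identifiers are taken from the slide example.

```python
from collections import defaultdict


def compose(map_o1_io, map_io_o2):
    """Compose O1->IO and IO->O2 correspondences into O1->O2 candidates.

    Each mapping is a list of (source_id, target_id, similarity) tuples.
    Similarities along a path are multiplied (an assumption); if several
    paths lead to the same concept pair, the best score is kept.
    """
    # Index the second mapping by its intermediate-ontology concept.
    by_io = defaultdict(list)
    for io_id, o2_id, sim in map_io_o2:
        by_io[io_id].append((o2_id, sim))

    composed = {}
    for o1_id, io_id, sim1 in map_o1_io:
        for o2_id, sim2 in by_io.get(io_id, []):
            key = (o1_id, o2_id)
            composed[key] = max(sim1 * sim2, composed.get(key, 0.0))
    return composed


# Existing mappings to the intermediate ontology (similarities are invented).
map_ma_uberon = [("MA_0001421", "UBERON:0001092", 1.0)]
map_uberon_nci = [("UBERON:0001092", "NCI_C32239", 0.9)]

result = compose(map_ma_uberon, map_uberon_nci)
print(result)
```

Because the join only touches correspondences that already exist, no concept names need to be compared again, which is why composition can be much cheaper than direct matching.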
Trend discovery
● Trend discovery based on sliding windows
● Monitor region changes over long periods of time
– Ontology O, ontology region of interest OR
– Time interval (tstart, tend)
– Sliding window of size ω
– Step width Δ
● Call region discovery algorithm within ω
– Collect change intensities for region of interest over time
Figure: ontology versions O1, O2, ..., On between tstart and tend, with a sliding window of ω = 4, Δ = 1

Outlook and research directions
● Ontologies are becoming increasingly important
– In the life sciences (for example, the conference series Data Integration in the Life Sciences, DILS)
– Knowledge bases and the Semantic Web
– Internet of Things
● Various research areas
– Learning to match and map ontologies (semi-)automatically
– Mapping of dynamic ontologies
– Parallel algorithms for large-scale ontology matching, mapping and evolution
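The sliding-window scheme from the trend discovery slide can be sketched as follows, assuming the window size ω and step width Δ are counted in ontology versions. The lecture does not specify the region discovery algorithm, so a simple sum of per-version change counts stands in for it here; the function names and the change counts are hypothetical.

```python
def sliding_windows(n_versions, omega, delta):
    """Yield (start, end) index pairs of windows covering omega versions each."""
    start = 0
    while start + omega <= n_versions:
        yield start, start + omega
        start += delta


def change_intensities(changes_per_version, omega=4, delta=1):
    """Sum the region's change counts inside each window.

    changes_per_version[i] is the (hypothetical) number of changes that
    version i introduced in the region of interest; a real region
    discovery algorithm would replace this simple sum.
    """
    return [
        sum(changes_per_version[start:end])
        for start, end in sliding_windows(len(changes_per_version), omega, delta)
    ]


# Invented change counts for one region across eight ontology versions.
changes = [0, 2, 5, 1, 0, 0, 3, 8]
trend = change_intensities(changes, omega=4, delta=1)
print(trend)
```

A rising sequence of window totals would point to an unstable region under active development, while totals near zero would suggest a stable one.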