Tutorial on Semantic Web
Adapted from Ivan Herman, W3C
www.w3.org/People/Ivan/CorePresentations/SWTutorial/
Handout is only a selection of slides in lecture
This is a generic slide set: it should be adapted, reviewed and, where needed, trimmed for a specific event. Rule of thumb: on average, a slide takes about a minute…
Simplified bookstore data (dataset “A”)
ISBN | Author | Title | Publisher | Year
000651409X | id_xyz | The Glass Palace | id_qpr | 2000

ID | Name | Homepage
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

ID | Publisher’s name | City
id_qpr | Harper Collins | London
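1st: export your data as a set of relations
Figure: the data of dataset “A” as a graph, one line per edge ([author] and [publisher] are unnamed nodes):
http://…isbn/000651409X a:title "The Glass Palace"
http://…isbn/000651409X a:year "2000"
http://…isbn/000651409X a:author [author]
[author] a:name "Ghosh, Amitav"
[author] a:homepage http://www.amitavghosh.com
http://…isbn/000651409X a:publisher [publisher]
[publisher] a:p_name "Harper Collins"
[publisher] a:city "London"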
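A minimal sketch of this export step in plain Python, without any RDF library. It assumes each table is available as a list of dictionaries. Triples are plain tuples of strings, so the sketch does not distinguish URIs from literals. The author and publisher nodes are named after the table keys id_xyz and id_qpr; that is a choice of this sketch, since the figure leaves them unnamed. The URIs keep the slides' "…" abbreviation, so they are placeholders rather than real URIs.

```python
# Rows of the three tables of dataset "A", copied from the slide.
books = [{"ISBN": "000651409X", "Author": "id_xyz", "Title": "The Glass Palace",
          "Publisher": "id_qpr", "Year": "2000"}]
authors = [{"ID": "id_xyz", "Name": "Ghosh, Amitav",
            "Homepage": "http://www.amitavghosh.com"}]
publishers = [{"ID": "id_qpr", "Name": "Harper Collins", "City": "London"}]

ISBN_BASE = "http://…isbn/"  # abbreviated base URI, kept as on the slide


def export_dataset_a():
    """Turn every non-key table cell into a (subject, property, object) triple."""
    triples = set()
    for b in books:
        book = ISBN_BASE + b["ISBN"]
        triples.add((book, "a:title", b["Title"]))
        triples.add((book, "a:year", b["Year"]))
        triples.add((book, "a:author", b["Author"]))        # foreign key becomes an edge
        triples.add((book, "a:publisher", b["Publisher"]))  # foreign key becomes an edge
    for a in authors:
        triples.add((a["ID"], "a:name", a["Name"]))
        triples.add((a["ID"], "a:homepage", a["Homepage"]))
    for p in publishers:
        triples.add((p["ID"], "a:p_name", p["Name"]))
        triples.add((p["ID"], "a:city", p["City"]))
    return triples


graph_a = export_dataset_a()
for triple in sorted(graph_a):
    print(triple)
```

Each foreign key in a table (Author, Publisher) turns into an edge between two nodes. Every other cell turns into an edge to a literal value.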
Some notes on exporting the data
Relations form a graph
the nodes refer to the “real” data or contain some literal
how the graph is represented in the machine is immaterial for now
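Data from another bookstore (dataset “F”)
Spreadsheet with columns A to D; empty rows are omitted, and $A11$ / $A12$ refer to the cells A11 and A12:
Row | A | B | C | D
1 | ID | Titre (title) | Traducteur (translator) | Original
2 | ISBN 2020386682 | Le Palais des Miroirs | $A12$ | ISBN 0-00-651409-X
6 | ID | Auteur (author) | |
7 | ISBN 0-00-651409-X | $A11$ | |
10 | Nom (name) | | |
11 | Ghosh, Amitav | | |
12 | Besse, Christianne | | |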
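2nd: export your second set of data
Figure: the data of dataset “F” as a graph, one line per edge ([translator] and [author] are unnamed nodes):
http://…isbn/2020386682 f:titre "Le palais des miroirs"
http://…isbn/2020386682 f:traducteur [translator]
[translator] f:nom "Besse, Christianne"
http://…isbn/2020386682 f:original http://…isbn/000651409X
http://…isbn/000651409X f:auteur [author]
[author] f:nom "Ghosh, Amitav"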
3rd: start merging your data
Figure: the graph of dataset “F” and the graph of dataset “A” placed side by side; each graph contains its own node http://…isbn/000651409X.
3rd: start merging your data (cont)
Figure: the same two graphs, with both http://…isbn/000651409X nodes highlighted: Same URI!
3rd: start merging your data
Figure: the merged graph; the two http://…isbn/000651409X nodes have become a single node that carries both the a: properties (dataset “A”) and the f: properties (dataset “F”).
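The merge step can be sketched in plain Python. This sketch assumes triples are tuples of strings and uses only small hand-written excerpts of the two graphs. Plain set union is enough here because every node is a URI or a literal. Graphs containing blank nodes need extra care.

```python
# Small excerpts of the two exported graphs (not the full datasets).
graph_a = {
    ("http://…isbn/000651409X", "a:title", "The Glass Palace"),
    ("http://…isbn/000651409X", "a:year", "2000"),
}
graph_f = {
    ("http://…isbn/2020386682", "f:titre", "Le palais des miroirs"),
    ("http://…isbn/2020386682", "f:original", "http://…isbn/000651409X"),
}


def nodes(graph):
    """All subjects and objects occurring in a graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


# With URI (and literal) nodes only, merging is plain set union:
# http://…isbn/000651409X occurs in both graphs and ends up as one node.
merged = graph_a | graph_f
shared = nodes(graph_a) & nodes(graph_f)
print(len(merged), shared)
```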
Start making queries…
A user of dataset “F” can now ask queries like:
“give me the title of the original”
well, … « donne-moi le titre de l’original »
This information is not in the dataset “F”…
…but can be retrieved by merging with dataset “A”!
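The query "give me the title of the original" amounts to following two edges: first f:original, which comes from dataset "F", then a:title, which comes from dataset "A". Below is a minimal sketch over an excerpt of the merged triples, with triples as Python tuples. The helper names are hypothetical.

```python
merged = {
    ("http://…isbn/2020386682", "f:titre", "Le palais des miroirs"),
    ("http://…isbn/2020386682", "f:original", "http://…isbn/000651409X"),
    ("http://…isbn/000651409X", "a:title", "The Glass Palace"),
    ("http://…isbn/000651409X", "a:year", "2000"),
}


def objects(graph, subject, prop):
    """Objects of all triples matching (subject, prop, ?)."""
    return [o for s, p, o in graph if s == subject and p == prop]


def title_of_original(graph, book):
    """Follow f:original (dataset F), then a:title (dataset A)."""
    return [title
            for original in objects(graph, book, "f:original")
            for title in objects(graph, original, "a:title")]


print(title_of_original(merged, "http://…isbn/2020386682"))  # ['The Glass Palace']
```

On dataset "F" alone (only the f: triples), the same function returns an empty list. That is the point of the merge.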
However, more can be achieved…
We “feel” that a:author and f:auteur should be the same
But an automatic merge does not know that!
Let us add some extra information to the merged data:
a:author same as f:auteur
both identify a “Person”
a term that a community may have already defined:
a “Person” is uniquely identified by his/her name and, say, homepage
it can be used as a “category” for certain types of resources
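The "Person" idea can be sketched as a merge of nodes. After typing both author nodes as Persons, nodes that share a name are treated as the same person and collapsed into one. This is a minimal sketch under several assumptions:
- the node labels a_id_xyz and f_A11 are hypothetical;
- treating a:name and f:nom as both giving a Person's name is an extra assumption, not stated on the slide;
- only the name is used as the key, not the homepage.

```python
PERSON = "http://…foaf/Person"

# Excerpt of the merged graph; "a_id_xyz" and "f_A11" are hypothetical labels
# for the author nodes coming from datasets "A" and "F".
merged = {
    ("http://…isbn/000651409X", "a:author", "a_id_xyz"),
    ("a_id_xyz", "a:name", "Ghosh, Amitav"),
    ("a_id_xyz", "a:homepage", "http://www.amitavghosh.com"),
    ("http://…isbn/000651409X", "f:auteur", "f_A11"),
    ("f_A11", "f:nom", "Ghosh, Amitav"),
}

# Extra "glue": both author nodes are typed as Persons.
glue = {("a_id_xyz", "r:type", PERSON), ("f_A11", "r:type", PERSON)}

# Assumption of this sketch: a:name and f:nom both give a Person's name.
NAME_PROPS = {"a:name", "f:nom"}


def smush_persons(graph):
    """Merge Person nodes that share the same name into one node."""
    persons = {s for s, p, o in graph if p == "r:type" and o == PERSON}
    first_with_name = {}  # name -> first Person node seen with that name
    rename = {}           # Person node -> canonical node
    for s, p, o in sorted(graph):
        if s in persons and p in NAME_PROPS:
            rename[s] = first_with_name.setdefault(o, s)

    def canon(node):
        return rename.get(node, node)

    return {(canon(s), p, canon(o)) for s, p, o in graph}


smushed = smush_persons(merged | glue)
print(sorted(o for s, p, o in smushed if p in ("a:author", "f:auteur")))
# ['a_id_xyz', 'a_id_xyz']
```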
3rd revisited: use the extra knowledge
Figure: the merged graph, in which the author nodes of datasets “A” and “F” are now a single “Ghosh, Amitav” node, and r:type edges link the person nodes to http://…foaf/Person.
Start making richer queries!
A user of dataset “F” can now query:
“donne-moi la page d’accueil de l’auteur de l’original”
well… “give me the home page of the original’s ‘auteur’”
The information is not in datasets “F” or “A”…
…but was made available by:
merging datasets “A” and “F”
adding three simple extra statements as extra “glue”
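Here is a minimal sketch of this richer query as a path over the merged triples: f:original, then f:auteur, then a:homepage. It applies the glue statement "a:author same as f:auteur" by copying every triple of one property as a triple of the other. The node labels f_A11 and a_id_xyz, and the helper names, are hypothetical.

```python
# Excerpt of the merged datasets "A" and "F" (hypothetical node labels).
merged = {
    ("http://…isbn/2020386682", "f:original", "http://…isbn/000651409X"),
    ("http://…isbn/000651409X", "f:auteur", "f_A11"),
    ("f_A11", "f:nom", "Ghosh, Amitav"),
    ("http://…isbn/000651409X", "a:author", "a_id_xyz"),
    ("a_id_xyz", "a:homepage", "http://www.amitavghosh.com"),
}

# Glue statement "a:author same as f:auteur", applied in both directions.
SAME = {"f:auteur": "a:author", "a:author": "f:auteur"}


def with_glue(graph):
    return graph | {(s, SAME[p], o) for s, p, o in graph if p in SAME}


def follow(graph, nodes, prop):
    """One step of a path query: all objects reached from `nodes` via `prop`."""
    return {o for s, p, o in graph if s in nodes and p == prop}


def homepage_of_original_auteur(graph, book):
    originals = follow(graph, {book}, "f:original")
    auteurs = follow(graph, originals, "f:auteur")
    return follow(graph, auteurs, "a:homepage")


book = "http://…isbn/2020386682"
print(homepage_of_original_auteur(merged, book))             # set()
print(homepage_of_original_auteur(with_glue(merged), book))  # {'http://www.amitavghosh.com'}
```

Without the glue, the f:auteur edge only reaches the dataset "F" author node, which has no homepage. With it, the same query also reaches the dataset "A" author node.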
Combine with different datasets
Using, e.g., the “Person”, the dataset can be combined with other sources
For example, data in Wikipedia can be extracted using dedicated tools
e.g., the “dbpedia” project can already extract the “infobox” information from Wikipedia…
Merge with Wikipedia data
Figure: the merged graph from before, extended with the DBpedia node http://dbpedia.org/../Amitav_Ghosh, connected through r:type, foaf:name and w:reference edges.
Merge with Wikipedia data
Figure: as before, plus w:author_of edges from http://dbpedia.org/../Amitav_Ghosh to http://dbpedia.org/../The_Hungry_Tide, http://dbpedia.org/../The_Calcutta_Chromosome and http://dbpedia.org/../The_Glass_Palace, and a w:isbn edge connecting the DBpedia data to the bookstore data.
Merge with Wikipedia data
Figure: as before, plus a w:born_in edge from http://dbpedia.org/../Amitav_Ghosh to http://dbpedia.org/../Kolkata, which carries w:long and w:lat values.
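Once the Wikipedia-derived data is merged in, a user of dataset "F" can ask for other books by the author of the original. This is a minimal sketch under clear assumptions: the exact DBpedia edges below, and their direction, are inferred from the edge labels in the figure, and the data is a hypothetical excerpt.

```python
DBP = "http://dbpedia.org/../"  # abbreviated base, kept as on the slides

# Hypothetical excerpt of the DBpedia-derived data; which edges exist, and their
# direction, are assumptions of this sketch based on the edge labels in the figure.
wiki = {
    (DBP + "Amitav_Ghosh", "w:author_of", DBP + "The_Hungry_Tide"),
    (DBP + "Amitav_Ghosh", "w:author_of", DBP + "The_Calcutta_Chromosome"),
    (DBP + "Amitav_Ghosh", "w:author_of", DBP + "The_Glass_Palace"),
    (DBP + "The_Glass_Palace", "w:isbn", "http://…isbn/000651409X"),
}
bookstore = {("http://…isbn/2020386682", "f:original", "http://…isbn/000651409X")}


def other_books_by_author(graph, isbn_uri):
    """Books by the author(s) of `isbn_uri`, excluding the book itself."""
    pages = {s for s, p, o in graph if p == "w:isbn" and o == isbn_uri}
    writers = {s for s, p, o in graph if p == "w:author_of" and o in pages}
    works = {o for s, p, o in graph if p == "w:author_of" and s in writers}
    return sorted(works - pages)


merged = bookstore | wiki
# Start from the French edition, as a user of dataset "F" would.
original = next(o for s, p, o in merged
                if s == "http://…isbn/2020386682" and p == "f:original")
print(other_books_by_author(merged, original))
```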
Is that surprising?
It may look like it but, in fact, it should not be…
What happened via automatic means is done every day by Web users!
The difference: a bit of extra rigour so that machines can do this, too
It could become even more powerful
We could add extra knowledge to the merged datasets
e.g., a full classification of various types of library data
geographical information
etc.
This is where ontologies, extra rules, etc., come in
ontologies/rule sets can be relatively simple and small, or huge, or anything in between…
Even more powerful queries can be asked as a result
What did we do? (alternate view)
Figure: diagram with the components Data on the Web; “Bridges”; Common “Graph” Format & Common Vocabularies; Inferencing; Query and Update; Web of Data Applications (Browser Applications, Stand Alone Applications).
we saw RDF basics yesterday
triples: subject, predicate, object
URI: Uniform Resource Identifier
IRI: Internationalized …
RDF/XML and Turtle are two serialization syntaxes
RDF/XML principles
Encode nodes and edges as elements or literals:
«Element for http://…/isbn/2020386682»
  «Element for original»
    «Element for http://…/isbn/000651409X»
  «/Element for original»
«/Element for http://…/isbn/2020386682»
«Element for http://…/isbn/2020386682»
  «Element for titre»
    Le palais des miroirs
  «/Element for titre»
«/Element for http://…/isbn/2020386682»
Figure: http://…isbn/2020386682 with an f:original edge to http://…isbn/000651409X and an f:titre edge to the literal “Le palais des miroirs”.
RDF/XML principles (cont.)
Encode the resources (i.e., the nodes):
«Element for original»
«/Element for original»
Figure: the same graph (http://…isbn/2020386682 with f:original to http://…isbn/000651409X and f:titre to “Le palais des miroirs”).
RDF/XML principles (cont.)
Encode the properties (i.e., edges) in their own namespaces:
Figure: the same graph (http://…isbn/2020386682 with f:original to http://…isbn/000651409X and f:titre to “Le palais des miroirs”).
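These encoding principles can be tried out with Python's standard xml.etree.ElementTree. In this minimal sketch:
- nodes become rdf:Description elements identified by an rdf:about attribute;
- edges become elements named after the property, in the property's own namespace;
- a literal object becomes element text.

The namespace URI http://example.org/f# for the f: prefix is hypothetical. The "…"-abbreviated URIs are kept from the slides.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
F = "http://example.org/f#"  # hypothetical namespace URI for the f: properties

ET.register_namespace("rdf", RDF)
ET.register_namespace("f", F)


def description(parent, about):
    """A node: an rdf:Description element identified by rdf:about."""
    return ET.SubElement(parent, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})


root = ET.Element(f"{{{RDF}}}RDF")
book = description(root, "http://…/isbn/2020386682")
# An edge: a property element in its own namespace (f:original) ...
original = ET.SubElement(book, f"{{{F}}}original")
# ... whose object is again a node.
description(original, "http://…/isbn/000651409X")
# A literal object becomes the text content of the property element.
book2 = description(root, "http://…/isbn/2020386682")
ET.SubElement(book2, f"{{{F}}}titre").text = "Le palais des miroirs"

print(ET.tostring(root, encoding="unicode"))
```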
Examples of RDF/XML “simplifications”
Le palais des miroirs
Object references can be put into attributes
Several properties on the same resource
There are other “simplification rules”, see the “RDF/XML Serialization” document for details
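To see that the two simplifications do not change the meaning, here is a minimal reader for a tiny RDF/XML subset. It assumes flat rdf:Description elements under rdf:RDF, with no nesting and no blank nodes. It reads the object reference from an rdf:resource attribute and several properties grouped under one resource. The f: namespace URI is hypothetical, and property names come out in ElementTree's {namespace}local form. Real applications should use a full RDF/XML parser such as the one in RDFLib.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abbreviated form: object reference in an rdf:resource attribute, and
# several properties grouped under one rdf:Description (f: URI is hypothetical).
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:f="http://example.org/f#">
  <rdf:Description rdf:about="http://…/isbn/2020386682">
    <f:original rdf:resource="http://…/isbn/000651409X"/>
    <f:titre>Le palais des miroirs</f:titre>
  </rdf:Description>
</rdf:RDF>"""


def triples(doc):
    """Read triples from this tiny RDF/XML subset (flat, no blank nodes)."""
    result = []
    for desc in ET.fromstring(doc).findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            obj = prop.get(f"{{{RDF}}}resource")
            # No rdf:resource attribute: the element text is a literal.
            result.append((subject, prop.tag, obj if obj is not None else prop.text))
    return result


for t in triples(DOC):
    print(t)
```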
“Internal” nodes
Consider the following statement:
“the publisher is a «thing» that has a name and an address”
Until now, nodes were identified with a URI. But…
…what is the URI of «thing»?
Figure: http://…isbn/000651409X has an a:publisher edge to an unnamed node («thing»), which has a:p_name “Harper Collins” and a:city “London”.
One solution: create an extra URI
The resource will be “visible” on the Web
care should be taken to define unique URI-s
Internal identifier (“blank nodes”)
Internal = these resources are not visible outside
_:A234 a:p_name "Harper Collins" .
Figure: http://…isbn/000651409X has an a:publisher edge to the blank node _:A234, which has a:p_name “Harper Collins” and a:city “London”.
Blank nodes: the system can do it
Let the system create a “nodeID” internally (you do not really care about the name…)
…
Figure: http://…isbn/000651409X has an a:publisher edge to an unnamed node with a:p_name “Harper Collins” and a:city “London”.
Exercise: complete the fragment so that it validates with the W3C RDF Validator:
https://www.w3.org/RDF/Validator
Same in Turtle
<http://…isbn/000651409X> a:publisher [
    a:p_name "Harper Collins";
    …
].
Figure: the same graph (an unnamed publisher node with a:p_name “Harper Collins” and a:city “London”).
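The square-bracket notation can be produced mechanically: the unnamed node is written inline as the list of its (property, object) pairs, so no label is ever needed. This is a minimal sketch that assumes a single nesting level. The output is a Turtle fragment; it still needs @prefix declarations for a: before a Turtle parser would accept it. The helper names are hypothetical.

```python
def lit(text):
    """A plain Turtle string literal (escaping only backslashes and quotes)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def turtle_with_blank(subject, prop, inner):
    """Write `subject prop [ p1 o1 ; p2 o2 ] .`, where the object is an unnamed
    (blank) node described by the (property, object) pairs in `inner`."""
    body = " ;\n    ".join(f"{p} {o}" for p, o in inner)
    return f"{subject} {prop} [\n    {body}\n] ."


print(turtle_with_blank("<http://…isbn/000651409X>", "a:publisher",
                        [("a:p_name", lit("Harper Collins")),
                         ("a:city", lit("London"))]))
```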
More on blank nodes
Blank nodes require attention when merging
blank nodes with identical nodeID-s in different graphs are different
implementations must be careful…
Many applications prefer not to use blank nodes and define new URIs “on-the-fly”
From a logic point of view, blank nodes represent an “existential” statement
“there is a resource such that…”
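One way for an implementation to "be careful" is to give each graph's blank nodes fresh labels before taking the union, often called standardizing apart. This is a minimal sketch with triples as Python tuples and blank nodes written "_:label". The second graph and its properties (ex:p, ex:q) are hypothetical.

```python
import itertools

_fresh = itertools.count()

g1 = {("http://…isbn/000651409X", "a:publisher", "_:A234"),
      ("_:A234", "a:p_name", "Harper Collins")}
# Hypothetical second graph that happens to reuse the label _:A234.
g2 = {("http://example.org/x", "ex:p", "_:A234"),
      ("_:A234", "ex:q", "unrelated value")}


def is_blank(node):
    return node.startswith("_:")


def rename_blanks(graph):
    """Give every blank node of `graph` a label never used before."""
    mapping = {}

    def ren(node):
        if not is_blank(node):
            return node
        if node not in mapping:
            mapping[node] = f"_:b{next(_fresh)}"
        return mapping[node]

    return {(ren(s), p, ren(o)) for s, p, o in graph}


def merge(*graphs):
    """Union of graphs after standardizing their blank nodes apart."""
    result = set()
    for g in graphs:
        result |= rename_blanks(g)
    return result


def blanks(graph):
    return {n for s, _, o in graph for n in (s, o) if is_blank(n)}


naive = g1 | g2       # wrongly identifies the two different _:A234 nodes
safe = merge(g1, g2)  # keeps them apart
print(len(blanks(naive)), len(blanks(safe)))
```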
RDF in programming practice
For example, using Python+RDFLib:
a “Graph” object is created
the RDF file is parsed and results stored in the Graph
the Graph offers methods to retrieve (or add):
triples
(property,object) pairs for a specific subject
(subject,property) pairs for a specific object
etc.
the rest is conventional programming…
Similar tools exist in Java, PHP, etc.
Python example using RDFLib
import rdflib

# create a graph and parse an RDF/XML file into it
graph = rdflib.Graph()
graph.parse("filename.rdf", format="xml")
# take a subject with a known URI
subject = rdflib.URIRef("URI_of_Subject")

def do_something(p, o):
    # placeholder for application-specific processing
    print(p, o)

# process all properties and objects for this subject
for (s, p, o) in graph.triples((subject, None, None)):
    do_something(p, o)
Environments merge graphs automatically
e.g., in Python+RDFLib, the Graph can load several files
the load merges the new statements automatically
Need for RDF schemas
First step towards the “extra knowledge”:
define the terms we can use
what restrictions apply
what extra relationships are there?
Officially: “RDF Vocabulary Description Language”
the term “Schema” is retained for historical reasons…
Classes, resources, …
Think of well-known traditional vocabularies:
use the term “novel”
“every novel is a fiction”
“«The Glass Palace» is a novel”
etc.
RDFS defines resources and classes:
everything in RDF is a “resource”
“classes” are also resources, but…
…they are also a collection of possible resources (i.e., “individuals”)
“fiction”, “novel”, …
Classes, resources, … (cont.)
Relationships are defined among resources:
“typing”: an individual belongs to a specific class
“«The Glass Palace» is a novel”
to be more precise: “«http://…/000651409X» is a novel”
“subclassing”: all instances of one are also instances of the other (“every novel is a fiction”)
RDFS formalizes these notions in RDF
Classes, resources in RDF(S)
RDFS defines the meaning of these terms
(these are all special URI-s, we just use the namespace abbreviation)
Figure: http://…isbn/000651409X rdf:type #Novel; #Novel rdf:type rdfs:Class.
Figure (source: https://www.w3.org/TR/rdf-schema/)
Schema example in RDF/XML
The schema part:
The RDF data on a specific novel:
An aside: typed nodes in RDF/XML
…
A frequent simplification rule: instead of
use:
ie:
…
…
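The typed-node simplification replaces an rdf:Description carrying an rdf:type property with an element named after the class itself. Both forms yield the same rdf:type triple. This is a minimal sketch using only the standard library. The vocabulary namespace http://example.org/bookstore# and its class name Novel are hypothetical, and the reader only extracts type triples.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF + "type"
NS = "http://example.org/bookstore#"  # hypothetical vocabulary namespace

LONG = f"""<rdf:RDF xmlns:rdf="{RDF}">
  <rdf:Description rdf:about="http://…/isbn/000651409X">
    <rdf:type rdf:resource="{NS}Novel"/>
  </rdf:Description>
</rdf:RDF>"""

SHORT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:b="{NS}">
  <b:Novel rdf:about="http://…/isbn/000651409X"/>
</rdf:RDF>"""


def type_triples(doc):
    """Collect rdf:type triples from both the long and the typed-node form."""
    found = set()
    for node in ET.fromstring(doc):
        subject = node.get(f"{{{RDF}}}about")
        if node.tag != f"{{{RDF}}}Description":
            # Typed node element: its tag ({namespace}local) is the class URI.
            ns, _, local = node.tag[1:].partition("}")
            found.add((subject, RDF_TYPE, ns + local))
        for prop in node:
            if prop.tag == f"{{{RDF}}}type":
                found.add((subject, RDF_TYPE, prop.get(f"{{{RDF}}}resource")))
    return found


print(type_triples(LONG) == type_triples(SHORT))  # True
```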
Further remarks on types
A resource may belong to several classes
rdf:type is just a property…
“«The Glass Palace» is a novel, but «The Glass Palace» is also an «inventory item»…”
i.e., it is not like a datatype!
The type information may be very important for applications
e.g., it may be used for a categorization of possible nodes
probably the most frequently used RDF property…
(remember the “Person” in our example?)
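Because rdf:type is just a property, categorizing nodes by class is an ordinary lookup, and one resource can land in several categories. This is a minimal sketch with triples as Python tuples. The class name #InventoryItem is a hypothetical rendering of the slide's «inventory item».

```python
from collections import defaultdict

BOOK = "http://…isbn/000651409X"
graph = {
    (BOOK, "rdf:type", "#Novel"),
    (BOOK, "rdf:type", "#InventoryItem"),  # hypothetical class name
    (BOOK, "a:title", "The Glass Palace"),
}


def instances_by_class(g):
    """Group resources by class; rdf:type is handled like any other property."""
    groups = defaultdict(set)
    for s, p, o in g:
        if p == "rdf:type":
            groups[o].add(s)
    return dict(groups)


classes = instances_by_class(graph)
print(sorted(classes))  # ['#InventoryItem', '#Novel']
```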
Inferred properties
(http://…isbn/000651409X rdf:type #Fiction)
is not in the original RDF data…
…but can be inferred from the RDFS rules
RDFS environments return that triple, too
Figure: http://…isbn/000651409X rdf:type #Novel; #Novel rdfs:subClassOf #Fiction; inferred: http://…isbn/000651409X rdf:type #Fiction.
Inference: let us be formal…
The RDF Semantics document has a list of (33) entailment rules:
“if such and such triples are in the graph, add this and this”
do that recursively until the graph does not change
The relevant rule for our example:
If:
uuu rdfs:subClassOf xxx .
vvv rdf:type uuu .
Then add:
vvv rdf:type xxx .
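The rule, applied "recursively until the graph does not change", is a small fixpoint loop. This is a minimal sketch of that single rule (rdfs9 in the RDF Semantics), with triples as Python tuples. The extra class #Book is hypothetical; it is included to show why the repetition matters. Without a transitivity rule for rdfs:subClassOf, the second type is only found on the second pass.

```python
BOOK = "http://…isbn/000651409X"
graph = {
    (BOOK, "rdf:type", "#Novel"),
    ("#Novel", "rdfs:subClassOf", "#Fiction"),
    ("#Fiction", "rdfs:subClassOf", "#Book"),  # hypothetical extra class
}


def rdfs9_closure(g):
    """Apply rule rdfs9 repeatedly until the graph no longer changes:
    if (uuu rdfs:subClassOf xxx) and (vvv rdf:type uuu), add (vvv rdf:type xxx)."""
    g = set(g)
    while True:
        sub = {(u, x) for u, p, x in g if p == "rdfs:subClassOf"}
        new = {(v, "rdf:type", x)
               for v, p, u in g if p == "rdf:type"
               for u2, x in sub if u2 == u}
        if new <= g:  # nothing new: fixpoint reached
            return g
        g |= new


closed = rdfs9_closure(graph)
for triple in sorted(closed - graph):
    print(triple)
```

A full RDFS reasoner applies all the entailment rules in the same loop. This sketch only implements the one rule shown on the slide.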
Properties
Property is a special class (rdf:Property)
properties are also resources identified by URI-s
There is also a possibility for a “sub-property”
all resources bound by the “sub” are also bound by the other
Range and domain of properties can be specified
i.e., what type of resources serve as subject (domain) and object (range)
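The sub-property remark can be made concrete with the corresponding entailment rule (rdfs7 in the RDF Semantics): if p is a sub-property of q, every (s p o) also gives (s q o). This is a minimal fixpoint sketch with triples as Python tuples. The super-property #creator is hypothetical.

```python
BOOK = "http://…isbn/000651409X"
graph = {
    ("a:author", "rdfs:subPropertyOf", "#creator"),  # hypothetical super-property
    (BOOK, "a:author", "id_xyz"),
}


def subproperty_closure(g):
    """If (p rdfs:subPropertyOf q) and (s p o), add (s q o); repeat to a fixpoint."""
    g = set(g)
    while True:
        supers = {(p, q) for p, r, q in g if r == "rdfs:subPropertyOf"}
        new = {(s, q, o) for s, p, o in g for p2, q in supers if p2 == p}
        if new <= g:
            return g
        g |= new


closed = subproperty_closure(graph)
print(closed - graph)
```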
Properties (cont.)
Properties are also resources (named via URI-s)…
So properties of properties can be expressed as… RDF properties
this twists your mind a bit, but you can get used to it
For example, (P rdfs:domain C) means:
P is a property
C is a class
when using P, I can infer that the “subject” is of type C
Property specification example
Property specification serialized
In RDF/XML:
In Turtle:
:title
    rdf:type rdf:Property;
    rdfs:domain :Fiction;
    rdfs:range rdfs:Literal.
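What does this mean?
Again, new relations can be deduced. Indeed, if
:title
    rdf:type rdf:Property;
    rdfs:domain :Fiction;
    rdfs:range rdfs:Literal.
and the data contains, for example,
<http://…isbn/000651409X> :title "The Glass Palace" .
then the system can infer that:
<http://…isbn/000651409X> rdf:type :Fiction .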
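This deduction comes from the domain and range entailment rules (rdfs2 and rdfs3 in the RDF Semantics). Here is a minimal single-pass sketch with triples as Python tuples. Two conventions belong to this sketch only:
- literals are marked by surrounding double quotes;
- the range rule skips literal objects, since a literal cannot be the subject of an RDF triple.

```python
SCHEMA = {
    (":title", "rdf:type", "rdf:Property"),
    (":title", "rdfs:domain", ":Fiction"),
    (":title", "rdfs:range", "rdfs:Literal"),
}
# Literals are marked by surrounding double quotes in this sketch.
DATA = {("http://…isbn/000651409X", ":title", '"The Glass Palace"')}


def is_literal(node):
    return node.startswith('"')


def domain_range(g):
    """rdfs2: (P rdfs:domain C), (s P o) gives (s rdf:type C).
    rdfs3: (P rdfs:range C), (s P o) gives (o rdf:type C); skipped here when o is
    a literal, because a literal cannot be the subject of an RDF triple."""
    inferred = set()
    for prop, rel, cls in g:
        if rel not in ("rdfs:domain", "rdfs:range"):
            continue
        for s, p, o in g:
            if p != prop:
                continue
            if rel == "rdfs:domain":
                inferred.add((s, "rdf:type", cls))
            elif not is_literal(o):
                inferred.add((o, "rdf:type", cls))
    return inferred


print(domain_range(SCHEMA | DATA))
```

In a complete reasoner these rules would be iterated together with the others until a fixpoint is reached. A single pass is enough for this example.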
Literals
Literals may have a data type
floats, integers, booleans, etc., defined in XML Schema
full XML fragments
(Natural) language can also be specified
Examples for datatypes
:page_number "543"^^xsd:integer ;
:publ_date "2000"^^xsd:gYear ;
:price "6.99"^^xsd:float .
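An application typically converts a typed literal's lexical form into a native value according to its datatype. This is a minimal sketch covering only the datatypes used on this slide, plus xsd:boolean. The mapping table is hypothetical. Its simplifications are deliberate:
- xsd:gYear keeps only the year number (no time zone, no negative years);
- xsd:float is read into a Python float, which is double precision;
- unknown datatypes stay strings.

```python
# Hypothetical mapping from a few XSD datatypes to Python constructors.
XSD_TO_PYTHON = {
    "xsd:integer": int,
    "xsd:float": float,                      # Python floats are double precision
    "xsd:boolean": lambda s: s in ("true", "1"),
    "xsd:gYear": int,                        # simplification: year number only
}


def to_python(lexical, datatype):
    """Convert a typed literal's lexical form; unknown datatypes stay strings."""
    return XSD_TO_PYTHON.get(datatype, str)(lexical)


literals = [("543", "xsd:integer"), ("2000", "xsd:gYear"), ("6.99", "xsd:float")]
values = [to_python(lex, dt) for lex, dt in literals]
print(values)  # [543, 2000, 6.99]
```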
Examples for language tags
:title "The Glass Palace"@en ;
fr:titre "Le palais des miroirs"@fr .
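Language tags let an application choose the literal that suits its user. This is a minimal sketch with literals as (text, tag) pairs and a hypothetical helper pick. Tags are compared case-insensitively, since language tags are case-insensitive. A tag such as "fr-CA" is also accepted when "fr" is requested; that matching policy is an assumption of this sketch.

```python
def pick(literals, preferred):
    """Return the first literal whose language tag matches a preferred language.
    Tags are compared case-insensitively; "fr-CA" also matches a request for "fr"."""
    for want in preferred:
        w = want.lower()
        for text, tag in literals:
            t = tag.lower()
            if t == w or t.startswith(w + "-"):
                return text
    return None


labels = [("The Glass Palace", "en"), ("Le palais des miroirs", "fr")]
print(pick(labels, ["fr", "en"]))  # Le palais des miroirs
print(pick(labels, ["de"]))        # None
```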