www.cardiff.ac.uk/medic/irg-clinicalepidemiology
Semantic Web
Information modelling & database systems
Lecture content
Semantic Web
data integration
URI = Uniform Resource Identifier
RDF = Resource Description Framework
RSS = RDF Site Summary
Semantic Web
conceived by Tim Berners-Lee: “a web of data that can be processed directly & indirectly by machines”
data itself becomes part of the Web and can be processed independently of application, platform or domain
information is currently shared on the Web in the form of documents
computers can search for these documents
… but humans have to read & interpret them before any useful information can be extracted
a software consultant has just received a new project to create a series of SOAP-based Web services
they need to learn a bit about SOAP, so they search for the term using a search engine
the search results will contain documents about soap operas, toiletries, detergents as well as SOAP-based Web services
different semantic associations of the word ‘soap’
search results will vary in relevance
manually sifting through a lot of information
Names vs. IDs
names are meant for humans
IDs are meant for computers
e.g. the word ‘soap’ is inherently ambiguous
humans will have no problems disambiguating it using the context (human intelligence)
computers will also have some success disambiguating it using the context (artificial intelligence)
humans will typically modify the search query by adding other relevant terms, but that approach relies on human improvisation
one way of systematically tackling the ambiguity problem on the Web is assigning IDs to ‘things’
Semantic Web
does not rely on artificial intelligence, i.e. guesstimates
instead it relies on structured information & inference rules that allow it to ‘understand’ the relationship between different data resources
the computer does not really understand information the way a human can
… but it can be given enough information to make logical connections & decisions rather than to guess
Data integration
Semantic Web = a “web of data” that not only harnesses the seemingly endless amount of data, but also connects the data
the ability to connect data not only on the Web, but also in relational databases and other types of repositories, increases the usability of the available data
data integration applications for connecting disparate sources typically require one-to-one mappings between elements (i.e. local IDs) in each data repository
Semantic Web supports efficient data integration based on built-in, universally available semantic information that describes each resource (i.e. global IDs)
Semantic Web acts as one huge database
Semantics & relationships
Semantic Web requires adding semantic metadata (data about data) to information resources
semantic metadata allows computers to effectively process the data & make inferences about the data
XML has paved the road by adding metadata in the form of human-readable tags
before XML, data was stored in flat files and databases with proprietary formats
XML made data interoperable within a single domain, i.e. the domain defined by an XML schema
XML provides syntactic interoperability only when both parties know & understand the element names used
Semantic Web requires a universal (or global) schema
Semantic Web standards & technologies
the very minimum needed to enable the Semantic Web includes the means of:
uniquely identifying resources
defining relationships between them
these requirements are addressed by using:
URI = Uniform Resource Identifier
RDF = Resource Description Framework
an official W3C recommendation, RDF is a standard for describing resources, with an XML-based syntax (RDF/XML)
RDF builds on existing XML and URI technologies, using a URI to identify every resource & using URIs to make statements about resources
Uniform Resource Identifier (URI)
resource – anything that has an identity
e.g. an electronic document, an image, a service, a collection of other resources…
NOTE: not all resources are network retrievable!
e.g. human beings, corporations & books in a library can also be considered resources
resource = the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping
identifier = an object that can act as a reference to something that has identity
Uniform Resource Identifier (URI)
the Web is an information space – URIs are the points in that space
URI is simply a Web identifier
a sequence of characters with a restricted syntax
e.g. the strings starting with ‘http:’ or ‘ftp:’ found on the Web
URI can be further classified as a locator (URL) or a name (URN)
URL (Uniform Resource Locator) – identifies a resource via a representation of its primary access mechanism
URN (Uniform Resource Name) – persistent labelling of a resource with a globally unique identifier
Example URIs
ftp://ftp.is.co.za/rfc/rfc1808.txt
gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
http://www.math.uio.no/faq/compression-faq/part1.html
news:comp.infosystems.www.servers.unix
telnet://melvyl.ucop.edu/
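The example URIs above can be split into scheme, authority and path with Python's standard urllib.parse.urlsplit. This is a sketch: Python's parser is written against later URI specifications than the RFC 2396 grammar used in this lecture, so edge cases may differ, but for these examples the components come out as expected. Note that the news: URI has no authority component. split_uri is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# The example URIs listed above, copied verbatim.
EXAMPLE_URIS = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles",
    "http://www.math.uio.no/faq/compression-faq/part1.html",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
]


def split_uri(uri):
    """Return (scheme, authority, path) for a URI."""
    parts = urlsplit(uri)
    # netloc is empty when the URI has no "//" authority part (e.g. news:)
    return parts.scheme, parts.netloc, parts.path


for uri in EXAMPLE_URIS:
    print(split_uri(uri))
```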
URI syntax
a restricted set of characters: digits, letters & a few graphic symbols
reserved characters – their usage within the URI component is limited to their reserved purpose
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
unreserved characters – allowed in a URI but without a reserved purpose
upper & lower case letters, decimal digits & a limited set of punctuation marks and symbols
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
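A minimal sketch of the two character classes above, assuming plain Python strings. char_class is a hypothetical helper; anything that is neither reserved nor unreserved is reported as "other", i.e. it has to be escaped (or is excluded altogether).

```python
# Character sets taken from the RFC 2396 grammar above.
RESERVED = set(";/?:@&=+$,")
MARK = set("-_.!~*'()")


def char_class(ch):
    """Classify one character as 'reserved', 'unreserved' or 'other'."""
    if ch in RESERVED:
        return "reserved"
    # alphanum = US-ASCII letters and digits only
    if (ch.isascii() and ch.isalnum()) or ch in MARK:
        return "unreserved"
    return "other"


for ch in "a7/?~ %#":
    print(repr(ch), char_class(ch))
```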
URI syntax
a character must be escaped if it does not have a representation using an unreserved character
i.e. it does not correspond to a printable character of the US-ASCII coded character set
… or it corresponds to any US-ASCII character that is not allowed in a URI, as explained previously
escaped octet – encoded as a character triplet, consisting of the percent character “%” followed by two hexadecimal digits
e.g. "%20" = the US-ASCII space character, i.e. " "
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
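The escaping rule can be sketched as a small encoder. Two assumptions: text is first converted to UTF-8 octets (RFC 2396 does not fix a character encoding), and, conservatively, every octet that is not an unreserved character is escaped. escape is a hypothetical helper; decoding uses the standard urllib.parse.unquote. Applied to an author search term, it reproduces the escaped term= value of the PubMed URL used as an example in this lecture.

```python
from urllib.parse import unquote

# unreserved = alphanum | mark  (RFC 2396)
MARK = set("-_.!~*'()")


def escape(data: str) -> str:
    """Percent-encode every octet that is not an unreserved character."""
    out = []
    for octet in data.encode("utf-8"):  # assumption: UTF-8 octets
        ch = chr(octet)
        if (ch.isascii() and ch.isalnum()) or ch in MARK:
            out.append(ch)
        else:
            out.append("%{:02X}".format(octet))  # escaped = "%" hex hex
    return "".join(out)


encoded = escape("Spasic, Irena[Full Author Name]")
print(encoded)           # Spasic%2C%20Irena%5BFull%20Author%20Name%5D
print(unquote(encoded))  # Spasic, Irena[Full Author Name]
```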
URI syntax
http://meyerweb.com/eric/tools/dencoder/
URI syntax: excluded URI characters
the reasons for exclusion of some US-ASCII characters
control characters in the US-ASCII coded character set are not allowed, both because they are non-printable and because they are likely to be misinterpreted
control = <US-ASCII coded characters 00-1F and 7F hexadecimal>
space character may disappear or be introduced when transcribed, typeset or word-processed
whitespace is also used to delimit URIs in many contexts
space = <US-ASCII coded character 20 hexadecimal>
URI syntax: excluded URI characters
delims = "<" | ">" | "#" | "%" | <">
"<", ">" and the double quote (") are often used as the delimiters around URIs in text documents & protocol fields
“#” is used to delimit a URI from a fragment identifier in URI references
“%” is used for the encoding of escaped characters
unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
other characters are excluded because gateways and other transport agents are known to sometimes modify such characters, or they are used as delimiters
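A sketch of a checker for the excluded characters. Assumptions: "#" is skipped because it legitimately separates a fragment identifier in URI references, "%" is accepted only as the start of an escape triplet, and non-US-ASCII characters are reported because they must be escaped. excluded_chars is a hypothetical helper, and the test strings simply reuse the www.example.org host from this lecture.

```python
import re

CONTROL_MAX, DEL = 0x1F, 0x7F
DELIMS = set('<>"')              # "#" and "%" are handled separately
UNWISE = set("{}|\\^[]`")
ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def excluded_chars(uri: str) -> list:
    """Return the characters in `uri` that may not appear unescaped."""
    bad = []
    for i, ch in enumerate(uri):
        code = ord(ch)
        if code <= CONTROL_MAX or code == DEL:           # control
            bad.append(ch)
        elif ch == " " or ch in DELIMS or ch in UNWISE:  # space, delims, unwise
            bad.append(ch)
        elif ch == "%" and not ESCAPE.match(uri, i):     # "%" must start "%" hex hex
            bad.append(ch)
        elif code > 0x7F:                                # outside US-ASCII
            bad.append(ch)
    return bad


print(excluded_chars("http://www.example.org/Los%20Angeles"))  # []
print(excluded_chars("http://www.example.org/Los Angeles"))    # [' ']
print(excluded_chars("http://www.example.org/{x}|100%"))       # ['{', '}', '|', '%']
```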
URI syntactic components
the URI syntax is dependent upon the scheme
scheme-specific-part does not have to have any general structure or set of semantics common among all URIs
however, a subset of URIs do share a common syntax for representing hierarchical relationships within the namespace
URI scheme
the top level of a URI
most schemes were originally designed to be used with a particular protocol, and often have the same name
… but URI schemes are not protocols themselves!
e.g. the http scheme is mainly used for interacting with Web resources using HyperText Transfer Protocol
… but URIs within the http scheme are also used for other purposes, e.g. RDF resource identifiers and XML namespaces, which are not related to the protocol
some schemes are not associated with any protocol (e.g. file) or do not use the name of a protocol as their prefix (e.g. news)
URI schemes should be registered with IANA (Internet Assigned Numbers Authority)
URI authority
naming authority: the namespace defined by the remainder of the URI is governed by that authority
typically defined by an Internet-based server or a scheme-specific registry of naming authorities
authority = server | reg_name
preceded by a double slash “//”, terminated by the next slash “/”, question-mark “?” or the end of the URI
path component: contains data, specific to the authority (or the scheme if there is no authority component), identifying the resource within the scope of that scheme and authority
the path may consist of a sequence of path segments separated by a single slash “/” character
each path segment may include a sequence of parameters, indicated by the semicolon “;” character
the parameters are not significant to the parsing of relative references
query component: a string of information to be interpreted by the resource
http://www.ncbi.nlm.nih.gov/pubmed?term=Spasic%2C%20Irena%5BFull%20Author%20Name%5D&cmd=DetailsSearch&report=medline&format=text
http://www.ncbi.nlm.nih.gov/books/NBK3862/
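The path and query of the two NCBI example URLs can be taken apart with the standard library. Two assumptions: path_segments is a hypothetical helper that splits each segment's parameters on ";" as described above, and the query is decoded with urllib.parse.parse_qs, which relies on the common key=value&key=value convention these URLs happen to use; the URI syntax itself only says the query is a string interpreted by the resource.

```python
from urllib.parse import urlsplit, parse_qs

PUBMED = ("http://www.ncbi.nlm.nih.gov/pubmed?term=Spasic%2C%20Irena%5BFull%20Author"
          "%20Name%5D&cmd=DetailsSearch&report=medline&format=text")
BOOKS = "http://www.ncbi.nlm.nih.gov/books/NBK3862/"


def path_segments(uri):
    """Split the path into segments, and each segment into (name, [params])."""
    path = urlsplit(uri).path
    segments = []
    # drop the empty string before the leading "/"; a trailing "/" yields an empty last segment
    for segment in path.split("/")[1:]:
        name, *params = segment.split(";")
        segments.append((name, params))
    return segments


print(path_segments(BOOKS))  # [('books', []), ('NBK3862', []), ('', [])]
print(parse_qs(urlsplit(PUBMED).query))
```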
Resource Description Framework (RDF)
a standard model for data interchange on the Web
extends the linking structure of the Web
RDF lets us use URIs to make statements about resources using triples: (subject, predicate, object)
uses URIs to name the triple elements, i.e. subject, predicate & object
RDF example
English statement:
http://www.example.org/index.html has a creator whose value is
RDF statement:
subject http://www.example.org/index.html
predicate http://purl.org/dc/elements/1.1/creator
object http://www.example.org/staffid/85740
NOTE: URIs are used instead of names such as ‘creator’ and the creator’s name!
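The statement above can be held as a plain (subject, predicate, object) tuple. To print it, this sketch borrows the W3C N-Triples line format, in which each URI is wrapped in angle brackets and the statement ends with " ."; N-Triples is not covered in this lecture, and Triple and to_ntriples are hypothetical names.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI
    predicate: str  # URI
    obj: str        # URI in this example


def to_ntriples(t: Triple) -> str:
    """Serialise a triple whose three elements are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


statement = Triple(
    subject="http://www.example.org/index.html",
    predicate="http://purl.org/dc/elements/1.1/creator",
    obj="http://www.example.org/staffid/85740",
)
print(to_ntriples(statement))
```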
RDF statements
RDF statements are similar to a number of other formats for recording information, e.g.
rows in a simple relational database
XML elements in an XML document
simple assertions in formal logic
information in these formats can be treated as RDF statements
the ability to treat information in these different formats as RDF statements allows RDF to be used as a unifying model for integrating data from many sources
RDF example
RDF statements may be represented as graphs
nodes: subject & object
arc (edge): predicate
labelled directed graph: arcs have labels & point in a specific direction, from subject to object
[Figure: RDF graph with subject node http://www.example.org/index.html and arcs http://purl.org/dc/elements/1.1/creator → http://www.example.org/staffid/85740, http://www.example.org/terms/creation-date → “August 16, 1999”, and http://www.example.org/terms/language (object not recovered)]
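The labelled directed graph can be sketched as an adjacency list keyed by subject, where each arc keeps its predicate as the label. For brevity, the URIs and the date literal are all plain strings here, and only the creator and creation-date statements are included; build_graph is a hypothetical helper.

```python
from collections import defaultdict

EX = "http://www.example.org/"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

triples = [
    (EX + "index.html", DC_CREATOR, EX + "staffid/85740"),
    (EX + "index.html", EX + "terms/creation-date", "August 16, 1999"),
]


def build_graph(triples):
    """Map each subject to its outgoing arcs as (predicate label, object) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))  # arc points from subject to object
    return graph


graph = build_graph(triples)
for label, target in graph[EX + "index.html"]:
    print(EX + "index.html", f"--[{label}]-->", target)
```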
RDF: URIs vs. literals
RDF permits the objects of statements to be constant values (called literals) represented by character strings
… but not the subjects or predicates!
literals are used to identify values such as numbers & dates by means of a lexical representation
resources (URIs) vs. values (literals)
in RDF graph diagrams, URIs are shown as ellipses and literals as boxes
anything represented by a literal could also be represented by a URI, but it is often more convenient or intuitive to use literals
RDF: URIs vs. literals
example: easier to use the literal “7” than the URI http://dbpedia.org/resource/7_(number)
literals are usually abstract values, and describing them is in most cases neither necessary nor practical
literals are end nodes in an RDF graph that do not branch out
similar to the OO model: a URI corresponds to an object, a literal to an object property, which usually belongs to a primitive data type and is represented by a single value
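The URI/literal distinction can be made explicit with two small types. This sketch uses hypothetical URI, Literal and make_triple names and ignores blank nodes, which RDF also allows. It rejects literals in subject or predicate position, which is also why a literal can never have an outgoing arc and so ends up as an end node of the graph.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str  # lexical representation, e.g. "7" or "August 16, 1999"


Node = Union[URI, Literal]


def make_triple(subject: Node, predicate: Node, obj: Node):
    """Build an RDF triple; only the object may be a literal."""
    if not isinstance(subject, URI) or not isinstance(predicate, URI):
        raise TypeError("subjects and predicates must be URIs, not literals")
    return (subject, predicate, obj)


ok = make_triple(
    URI("http://www.example.org/index.html"),
    URI("http://www.example.org/terms/creation-date"),
    Literal("August 16, 1999"),
)
```

Calling make_triple with a Literal such as Literal("7") in the subject position raises TypeError.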
using URIs in RDF statements allows us to begin to develop & use a controlled vocabulary on the Web
this vocabulary reflects a shared understanding of the concepts we talk about
of course, URIs do not automatically solve all our problems because, e.g. people can still use different URIs to refer to the same thing
however, URIs are used in the commonly-accessible Web space, thus creating the opportunity to:
identify equivalences among them
migrate toward the use of common references
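One simple way to identify equivalences and migrate toward common references is to keep a table of URIs known to denote the same thing and rewrite every statement to the preferred URI. In this sketch the alias URI http://people.example.com/85740, the page about.html and the canonicalise helper are all hypothetical, and the mapping is a single step (chains of aliases are not followed).

```python
# Hypothetical equivalence table: alias URI -> preferred (common) URI.
SAME_AS = {
    "http://people.example.com/85740": "http://www.example.org/staffid/85740",
}


def canonicalise(triples, same_as):
    """Rewrite every term that has a known preferred URI; leave other terms unchanged."""
    def canon(term):
        return same_as.get(term, term)
    return [(canon(s), canon(p), canon(o)) for s, p, o in triples]


source_a = [("http://www.example.org/index.html",
             "http://purl.org/dc/elements/1.1/creator",
             "http://www.example.org/staffid/85740")]
source_b = [("http://www.example.org/about.html",
             "http://purl.org/dc/elements/1.1/creator",
             "http://people.example.com/85740")]

merged = canonicalise(source_a + source_b, SAME_AS)
creators = {o for _, _, o in merged}
print(creators)  # {'http://www.example.org/staffid/85740'}
```

After rewriting, statements from both sources refer to the creator through one common URI, so they can be joined directly.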
an XML syntax for RDF: RDF/XML
initial statement:
RDF graph:
Another RDF/XML example
abbreviated
Yet another RDF/XML example
abbreviated
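An RDF/XML document for the index.html statements can be sketched with Python's xml.etree.ElementTree: one rdf:Description per subject, URI-valued properties as empty elements carrying rdf:resource, and literal-valued properties as element text. The rdf and dc namespace URIs are the standard ones; the exterms prefix for http://www.example.org/terms/ and the to_rdfxml helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
EXTERMS = "http://www.example.org/terms/"  # prefix name "exterms" is an assumption

for prefix, uri in (("rdf", RDF), ("dc", DC), ("exterms", EXTERMS)):
    ET.register_namespace(prefix, uri)


def to_rdfxml(subject, uri_props, literal_props):
    """Describe one subject: URI objects via rdf:resource, literals as element text."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
    for prop, obj in uri_props:
        ET.SubElement(desc, prop, {f"{{{RDF}}}resource": obj})
    for prop, value in literal_props:
        ET.SubElement(desc, prop).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_rdfxml(
    "http://www.example.org/index.html",
    [(f"{{{DC}}}creator", "http://www.example.org/staffid/85740")],
    [(f"{{{EXTERMS}}}creation-date", "August 16, 1999")],
)
print(xml_text)
```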
RSS 1.0: RDF Site Summary
aka Really Simple Syndication (strictly, the name used for RSS 2.0)
RSS 1.0 – RDF Site Summary
RSS 1.0 (RDF Site Summary) – a lightweight multipurpose extensible metadata description & syndication format
lightweight: an XML document
extensible via XML namespaces & RDF
metadata = data about data – descriptive information structured in such a way that allows Web pages to be properly searched & processed, in particular by computers
RDF allows for representation of rich metadata
syndication – making data available online for further transmission, aggregation or online publication
an XML application that conforms to RDF specification
one of the most widely deployed RDF applications on the Web
packages content into easily distinguishable sections
the feed can be requested by any application able to speak HTTP
ideal for dynamic information: news sites, web logs, sports scores, stock quotes…
an RSS summary is a document describing a channel consisting of URL-retrievable items
RSS
the channel element contains metadata describing the channel itself:
brief description
the rdf:about attribute is a URI which identifies the channel, most commonly:
URL of the homepage being described, or
URL where the RSS file can be found
RSS
an RDF table of contents:
associates the document’s items with the given RSS channel
each item’s rdf:resource {item_uri} must be the same as the associated item element’s rdf:about {item_uri}
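The table-of-contents rule can be checked mechanically. This sketch parses a small hypothetical RSS 1.0 feed (all www.example.org URIs are made up) with xml.etree.ElementTree and compares the rdf:resource values listed under the channel's items with the rdf:about values of the item elements; the feed deliberately lists one item that has no matching item element. It assumes the RSS 1.0 namespace http://purl.org/rss/1.0/ and the rdf:resource form described above; check_toc is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RSS = "{http://purl.org/rss/1.0/}"

# Hypothetical RSS 1.0 feed: the table of contents lists two items,
# but only the first one has a matching <item> element.
FEED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.example.org/news.rss">
    <title>Example News</title>
    <link>http://www.example.org/</link>
    <description>A hypothetical news feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.example.org/news/1"/>
        <rdf:li rdf:resource="http://www.example.org/news/2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.example.org/news/1">
    <title>First story</title>
    <link>http://www.example.org/news/1</link>
  </item>
</rdf:RDF>"""


def check_toc(feed_xml):
    """Return (URIs listed but without an item element, item URIs not listed)."""
    root = ET.fromstring(feed_xml)
    channel = root.find(RSS + "channel")
    listed = [li.get(RDF + "resource") for li in channel.iter(RDF + "li")]
    items = [item.get(RDF + "about") for item in root.findall(RSS + "item")]
    missing = [uri for uri in listed if uri not in items]
    unlisted = [uri for uri in items if uri not in listed]
    return missing, unlisted


print(check_toc(FEED))  # (['http://www.example.org/news/2'], [])
```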
RSS
RSS reader
an application which aggregates syndicated web content such as news headlines, blogs, podcasts & video blogs in one location for easy viewing
Demo – Google News