
Semantic Web
CMT207 Information modelling & database systems
4/26/2016
Lecture content
- Semantic Web
  - data
  - resources
  - IDs
  - data integration
- URI = Uniform Resource Identifier
- RDF = Resource Description Framework
  - RDF/XML
- RSS = RDF Site Summary
Names vs. IDs
- names are meant for humans
- IDs are meant for computers
- e.g. the word ‘soap’ is inherently ambiguous
  - humans will have no problem disambiguating it using the context (human intelligence)
  - computers will also have some success disambiguating it using the context (artificial intelligence)
  - humans will typically modify the search query by adding other relevant terms, but that approach relies on human improvisation
- one way of systematically tackling the ambiguity problem on the Web is to assign IDs to ‘things’
Example
- a software consultant has just received a new project to create a series of SOAP-based Web services
- they need to learn a bit about SOAP, so they search for the term using a search engine
- the search results will contain documents about soap operas, toiletries and detergents as well as SOAP-based Web services
- different semantic associations of the word ‘soap’ → search results will vary in relevance
  → a lot of information has to be sifted through manually
Semantic Web
- conceived by Tim Berners-Lee: “a web of data that can be processed directly & indirectly by machines”
- data itself becomes part of the Web and can be processed independently of application, platform or domain
- information is currently shared on the Web in the form of documents
  1. computers can search for these documents
  2. … but humans have to read & interpret them before any useful information can be extracted
Semantic Web
- does not rely on artificial intelligence, i.e. guesstimates
- instead, it relies on structured information & inference rules that allow it to ‘understand’ the relationships between different data resources
- the computer does not really understand information the way a human can
- … but it can be given enough information to make logical connections & decisions rather than guess
Semantics & relationships
- the Semantic Web requires adding semantic metadata (data about data) to information resources
- semantic metadata allows computers to process the data effectively & make inferences about it
- XML has paved the way by adding metadata in the form of human-readable tags
  - before XML, data was stored in flat files and databases with proprietary formats
  - XML made data interoperable within a single domain, i.e. the domain defined by an XML schema
  - XML provides syntactic interoperability only when both parties know & understand the element names used
- the Semantic Web requires a universal (or global) schema
Data integration
- Semantic Web = a “web of data” that not only harnesses the seemingly endless amount of data, but also connects the data
- the ability to connect data not only on the Web, but also in relational databases and other types of repositories, increases the usability of the available data
- data integration applications that connect disparate sources typically require one-to-one mappings between elements (i.e. local IDs) in each data repository
- the Semantic Web supports efficient data integration based on built-in, universally available semantic information that describes each resource (i.e. global IDs)
- the Semantic Web acts as one huge database
Semantic Web standards & technologies
- the very minimum needed to enable the Semantic Web includes the means of:
  1. uniquely identifying resources
  2. defining relationships between them
- these requirements are addressed by using:
  1. URI = Uniform Resource Identifier
  2. RDF = Resource Description Framework
- an official W3C recommendation, RDF is a standard for describing resources, with an XML-based syntax (RDF/XML)
- RDF builds on existing XML and URI technologies, using a URI to identify every resource & using URIs to make statements about resources
Uniform Resource Identifier (URI)
- resource = anything that has an identity
  - e.g. an electronic document, an image, a service, a collection of other resources…
  - NOTE: not all resources are network retrievable!
  - e.g. human beings, corporations & books in a library can also be considered resources
- resource = the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping
- identifier = an object that can act as a reference to something that has identity
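The following is a minimal sketch of the contrast drawn on the Data integration slide. It holds two hypothetical repositories as Python dictionaries. With local IDs, the sources can only be joined through an explicit one-to-one mapping. With a shared global ID (a URI), merging reduces to combining records that have equal keys. The record contents and the URI http://www.example.org/books/42 are made up for illustration.

```python
from collections import defaultdict

# Local IDs: each (hypothetical) repository uses its own key for the same book.
repo_a = {"bk-001": {"title": "Example Book"}}
repo_b = {"item/77": {"publisher": "Example Press"}}

# With local IDs, integration needs an explicit one-to-one mapping table.
local_mapping = {"bk-001": "item/77"}

def integrate_local(a, b, mapping):
    """Merge b into a by following the mapping from a's keys to b's keys."""
    merged = {}
    for key_a, props in a.items():
        record = dict(props)
        key_b = mapping.get(key_a)
        if key_b in b:
            record.update(b[key_b])
        merged[key_a] = record
    return merged

# Global IDs: both repositories identify the book by the same (made-up) URI.
BOOK = "http://www.example.org/books/42"
repo_a_global = {BOOK: {"title": "Example Book"}}
repo_b_global = {BOOK: {"publisher": "Example Press"}}

def integrate_global(*repos):
    """No mapping needed: records with the same URI simply combine."""
    merged = defaultdict(dict)
    for repo in repos:
        for uri, props in repo.items():
            merged[uri].update(props)
    return dict(merged)

print(integrate_local(repo_a, repo_b, local_mapping))
print(integrate_global(repo_a_global, repo_b_global))
```

The mapping table is the part that has to be built and maintained by hand for every pair of sources. A shared URI removes that step.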
Uniform Resource Identifier (URI)
- the Web is an information space – URIs are the points in that space
- a URI is simply a Web identifier
  - a sequence of characters with a restricted syntax
  - e.g. the strings starting with ‘http:’ or ‘ftp:’ found on the Web
- a URI can be further classified as a locator (URL) or a name (URN)
  - URL (Uniform Resource Locator) – identifies a resource via a representation of its primary access mechanism
  - URN (Uniform Resource Name) – persistent labelling of a resource with a globally unique identifier
Example URIs
- ftp://ftp.is.co.za/rfc/rfc1808.txt
- gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
- http://www.math.uio.no/faq/compression-faq/part1.html
- mailto:mduerst@ifi.unizh.ch
- news:comp.infosystems.www.servers.unix
- telnet://melvyl.ucop.edu/
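A short Python sketch can split the example URIs into the generic `<scheme>:<scheme-specific-part>` form. The scheme pattern follows the RFC 2396 grammar: a letter followed by letters, digits, "+", "-" or ".". The code treats a scheme-specific part that starts with "//" as hierarchical. That is a simplifying heuristic, not a full URI parser.

```python
import re

# RFC 2396: scheme = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):(.*)$")

def split_uri(uri):
    """Split an absolute URI into (scheme, scheme-specific-part)."""
    m = SCHEME_RE.match(uri)
    if m is None:
        raise ValueError(f"not an absolute URI: {uri!r}")
    # Scheme names are case-insensitive, so normalise them to lower case.
    return m.group(1).lower(), m.group(2)

EXAMPLES = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles",
    "http://www.math.uio.no/faq/compression-faq/part1.html",
    "mailto:mduerst@ifi.unizh.ch",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
]

for uri in EXAMPLES:
    scheme, rest = split_uri(uri)
    # Heuristic: a scheme-specific part starting with "//" uses the hierarchical form.
    form = "hierarchical" if rest.startswith("//") else "opaque"
    print(f"{scheme:<7} {form:<12} {rest}")
```

Here `mailto` and `news` come out as opaque, while the other four use the hierarchical `//` form.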
URI syntax
- URL decoder/encoder: http://meyerweb.com/eric/tools/dencoder/
- a restricted set of characters: digits, letters & a few graphic symbols
- reserved characters – their usage within a URI component is limited to their reserved purpose
  - reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
- unreserved characters – allowed in a URI but without a reserved purpose
  - upper & lower case letters, decimal digits & a limited set of punctuation marks and symbols
  - unreserved = alphanum | mark
  - mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
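The character classes above can be expressed directly as Python sets. This is a sketch of the RFC 2396 grammar quoted on the slide and ignores the newer RFC 3986 sets. Any character that is neither reserved nor unreserved is reported as "other", meaning it has to be escaped when it appears as data.

```python
import string

# Character classes from the RFC 2396 grammar above.
RESERVED = set(";/?:@&=+$,")
MARK = set("-_.!~*'()")
ALPHANUM = set(string.ascii_letters + string.digits)
UNRESERVED = ALPHANUM | MARK

def classify(ch):
    """Return 'reserved', 'unreserved' or 'other'.

    'other' characters must always be escaped when used as data; reserved
    characters must be escaped when they would clash with their reserved purpose.
    """
    if ch in RESERVED:
        return "reserved"
    if ch in UNRESERVED:
        return "unreserved"
    return "other"

for ch in "a7-~/?& %#":
    print(repr(ch), classify(ch))
```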
URI syntax: excluded URI characters
- the reasons for excluding some US-ASCII characters:
  1. control characters in the US-ASCII coded character set are not allowed, both because they are non-printable and because they are likely to be misinterpreted
     - control = <US-ASCII coded characters 00-1F and 7F hexadecimal>
  2. the space character is excluded because significant spaces may disappear, and insignificant spaces may be introduced, when URIs are transcribed, typeset or word-processed
     - whitespace is also used to delimit URIs in many contexts
     - space = <US-ASCII coded character 20 hexadecimal>
  3. delims = "<" | ">" | "#" | "%" | <">
     - "<", ">" and the double quote (") are often used as delimiters around URIs in text documents & protocol fields
     - "#" is used to delimit a URI from a fragment identifier in URI references
     - "%" is used for the encoding of escaped characters
  4. unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
     - these characters are excluded because gateways and other transport agents are known to sometimes modify them, or because they are used as delimiters
URI syntax
- a character must be escaped if it does not have a representation using an unreserved character
  - i.e. it does not correspond to a printable character of the US-ASCII coded character set
  - … or it corresponds to a US-ASCII character that is not allowed in a URI, as explained previously
- escaped octet – encoded as a character triplet, consisting of the percent character "%" followed by two hexadecimal digits
  - e.g. "%20" = the US-ASCII space character, i.e. " "
- escaped = "%" hex hex
- hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
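The escaping rule can be sketched as a small encoder. It keeps unreserved characters and replaces every other octet with a "%" hex hex triplet. A hand-written function is used because Python's `urllib.parse.quote` follows the newer RFC 3986 unreserved set, so it also escapes `!`, `*`, `'`, `(` and `)`. The `safe` parameter and the UTF-8 encoding of non-ASCII text are assumptions of this sketch; RFC 2396 leaves the character encoding unspecified.

```python
import string
from urllib.parse import unquote

# Unreserved characters per RFC 2396: alphanum | mark
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

def escape(text, safe=""):
    """Percent-encode every octet that is not unreserved (or listed in `safe`).

    Assumption: text is encoded as UTF-8 before escaping.
    """
    out = []
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        if octet < 128 and (ch in UNRESERVED or ch in safe):
            out.append(ch)
        else:
            out.append(f"%{octet:02X}")  # escaped = "%" hex hex
    return "".join(out)

print(escape("Los Angeles"))
print(escape("/Weather/California/Los Angeles", safe="/"))
print(unquote(escape("50% <off>")))
```

For example, `escape("Los Angeles")` gives `Los%20Angeles`, as in the gopher example URI, and `urllib.parse.unquote` reverses the encoding.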

URI syntactic components
- the URI syntax is dependent upon the scheme
- <scheme>:<scheme-specific-part>
- the scheme-specific-part does not have to have any general structure or set of semantics common among all URIs
- however, a subset of URIs do share a common syntax for representing hierarchical relationships within the namespace
- <scheme>://<authority><path>?<query>
URI scheme
- <scheme>://<authority><path>?<query>
- the top level of a URI
- most schemes were originally designed to be used with a particular protocol, and often have the same name
- … but URI schemes are not protocols themselves!
  - e.g. the http scheme is mainly used for interacting with Web resources using the HyperText Transfer Protocol
  - … but URIs within the http scheme are also used for other purposes, e.g. RDF resource identifiers and XML namespaces, which are not related to the protocol
- some schemes are not associated with any protocol (e.g. file) or do not use the name of a protocol as their prefix (e.g. news)
- URI schemes should be registered with IANA (Internet Assigned Numbers Authority)
URI authority
- <scheme>://<authority><path>?<query>
- naming authority: the namespace defined by the remainder of the URI is governed by that authority
- typically defined by an Internet-based server or a scheme-specific registry of naming authorities
- authority = server | reg_name
- preceded by a double slash "//" and terminated by the next slash "/", question mark "?" or the end of the URI
URI path
- <scheme>://<authority><path>?<query>
- contains data, specific to the authority (or the scheme if there is no authority component), identifying the resource within the scope of that scheme and authority
- the path may consist of a sequence of path segments separated by a single slash "/" character
- each path segment may include a sequence of parameters, indicated by the semicolon ";" character
- the parameters are not significant to the parsing of relative references
URI query
- <scheme>://<authority><path>?<query>
- a string of information to be interpreted by the resource
- Demo…
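Python's standard `urllib.parse.urlsplit` separates a hierarchical URI into the components discussed above, although it calls the authority `netloc`. It leaves `;` parameters inside the path, so a small helper is added to split the path into segments and their parameters. `path_segments` is a hypothetical helper, not part of the library. The URI `http://www.example.org/docs;v=2/index.html?lang=en` is invented for illustration.

```python
from urllib.parse import urlsplit

def components(uri):
    """Map urlsplit's fields onto <scheme>://<authority><path>?<query>."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urlsplit calls the authority "netloc"
        "path": parts.path,
        "query": parts.query,
    }

def path_segments(path):
    """Hypothetical helper: split a path into (segment, [parameters]) pairs."""
    segments = []
    for segment in path.lstrip("/").split("/"):
        name, *params = segment.split(";")  # parameters follow ";" in each segment
        segments.append((name, params))
    return segments

uri = "http://www.example.org/docs;v=2/index.html?lang=en"  # invented example
print(components(uri))
print(path_segments(components(uri)["path"]))
```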
Resource Description Framework (RDF)
- a standard model for data interchange on the Web
- extends the linking structure of the Web
- RDF lets us use URIs to make statements about resources in the form of triples: (subject, predicate, object)
- uses URIs to name the triple elements, i.e. subject, predicate & object
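One way to picture the triple model in code is a set of (subject, predicate, object) tuples queried by pattern, with `None` as a wildcard. This is only an illustrative sketch. The second document, http://www.example.org/about.html, is hypothetical. Objects are plain strings here, so the sketch does not model the URI/literal distinction.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str

INDEX = "http://www.example.org/index.html"
ABOUT = "http://www.example.org/about.html"  # hypothetical second document
CREATOR = "http://purl.org/dc/elements/1.1/creator"
STAFF = "http://www.example.org/staffid/85740"

graph = {
    Triple(INDEX, CREATOR, STAFF),
    Triple(ABOUT, CREATOR, STAFF),
}

def match(graph, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    )

# "Which resources were created by staff member 85740?"
for t in match(graph, p=CREATOR, o=STAFF):
    print(t.subject)
```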
RDF example
- RDF statements may be represented as graphs
  - nodes: subject & object
  - arc (edge): predicate
- labelled directed graph: arcs have labels & point in a specific direction, from subject to object
[Figure: RDF graph of http://www.example.org/index.html with three arcs: http://www.example.org/terms/creation-date → "August 16, 1999"; http://www.example.org/terms/language → "English"; http://purl.org/dc/elements/1.1/creator → http://www.example.org/staffid/85740]
RDF example
- English statement:
  - http://www.example.org/index.html has a creator whose value is John Smith
- RDF statement:
  - subject: http://www.example.org/index.html
  - predicate: http://purl.org/dc/elements/1.1/creator
  - object: http://www.example.org/staffid/85740
- NOTE: URIs are used instead of names such as ‘creator’ & ‘John Smith’!
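The example statement can be written in N-Triples together with the two literal-valued arcs from the graph on the previous slide. N-Triples is the W3C line-based text format for RDF: one triple per line, URIs in angle brackets, literals in double quotes, and a terminating full stop. These slides do not use N-Triples. The sketch below only shows the three statements as plain triples, and its `Literal` wrapper is an assumption of this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    """A plain string literal; any other str is treated as a URI."""
    value: str

def term(t):
    if isinstance(t, Literal):
        # N-Triples escapes backslashes, double quotes and newlines inside literals.
        v = t.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{v}"'
    return f"<{t}>"

def to_ntriples(triples):
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

PAGE = "http://www.example.org/index.html"
EX = "http://www.example.org/terms/"
triples = [
    (PAGE, "http://purl.org/dc/elements/1.1/creator", "http://www.example.org/staffid/85740"),
    (PAGE, EX + "creation-date", Literal("August 16, 1999")),
    (PAGE, EX + "language", Literal("English")),
]
print(to_ntriples(triples))
```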
RDF: URIs vs. literals
- RDF permits the objects of statements to be constant values (called literals) represented by character strings
  - … but not the subjects or predicates!
- literals are used to identify values such as numbers & dates by means of a lexical representation
- resources (URIs) vs. values (literals)
  - in RDF graphs, URIs are shown as ellipses, literals are shown as boxes
- anything represented by a literal could also be represented by a URI, but it is often more convenient or intuitive to use literals
  - example: it is easier to use the literal “7” than the URI http://dbpedia.org/resource/7_(number)
- literals are usually abstract values & describing them further is in most cases neither necessary nor practical
- literals are end nodes in an RDF graph that do not branch out
- similar to the OO model: URI ≈ object, literal ≈ object property, which usually belongs to a primitive data type and is represented by a single value
RDF statements
- RDF statements are similar to a number of other formats for recording information, e.g.
  - rows in a simple relational database
  - XML elements in an XML document
  - simple assertions in formal logic
  - etc.
- information in these formats can be treated as RDF statements
- this allows RDF to be used as a unifying model for integrating data from many sources
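Treating relational rows as RDF statements can be sketched by turning one row into triples. The primary key becomes a subject URI, each other column becomes a predicate URI, and each value becomes a literal. The check in `Statement` also encodes the rule that only objects may be literals. The row, its column names and the http://www.example.org/terms/ namespace are hypothetical; only the staff URI pattern follows the earlier example.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class URIRef:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

@dataclass(frozen=True)
class Statement:
    subject: URIRef
    predicate: URIRef
    object: Union[URIRef, Literal]

    def __post_init__(self):
        # Only the object of an RDF statement may be a literal.
        if not isinstance(self.subject, URIRef) or not isinstance(self.predicate, URIRef):
            raise TypeError("subject and predicate must be URIs, not literals")
        if not isinstance(self.object, (URIRef, Literal)):
            raise TypeError("object must be a URI or a literal")

BASE = "http://www.example.org/"  # hypothetical namespace

def row_to_statements(table, key_column, row):
    """Primary key -> subject URI, other columns -> predicate URIs, values -> literals."""
    subject = URIRef(f"{BASE}{table}/{row[key_column]}")
    return [
        Statement(subject, URIRef(f"{BASE}terms/{column}"), Literal(str(value)))
        for column, value in row.items()
        if column != key_column
    ]

row = {"id": 85740, "name": "John Smith", "dept": "Sales"}  # hypothetical staff row
for st in row_to_statements("staffid", "id", row):
    print(st.subject.value, st.predicate.value, repr(st.object.value))
```

A literal used as a subject, e.g. `Statement(Literal("7"), ...)`, is rejected with a `TypeError`. This mirrors the point that literals are end nodes that do not branch out.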

RDF: URIs
- using URIs in RDF statements allows us to begin to develop & use a controlled vocabulary on the Web
  - this vocabulary reflects a shared understanding of the concepts we talk about
- of course, URIs do not automatically solve all our problems, e.g. people can still use different URIs to refer to the same thing
- however, URIs are used in the commonly accessible Web space, thus creating the opportunity to:
  - identify equivalences among them
  - migrate toward the use of common references
RDF/XML
- an XML syntax for RDF: RDF/XML
[Figure: an initial statement, its RDF graph, the corresponding triple and its RDF/XML serialisation]
Another RDF/XML example
[Figure: RDF/XML example]
Yet another RDF/XML example
[Figure: RDF/XML example]
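The RDF/XML figures on these slides cannot be recovered. The idea can still be sketched by serialising the running index.html example with Python's `xml.etree.ElementTree`, although this is not necessarily the slides' own example. The `exterms` prefix is an assumption, and so is emitting every property as a child of a single `rdf:Description`. A URI object is written with `rdf:resource` and a literal object as element text.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
EX_NS = "http://www.example.org/terms/"
for prefix, uri in (("rdf", RDF_NS), ("dc", DC_NS), ("exterms", EX_NS)):
    ET.register_namespace(prefix, uri)

def to_rdfxml(subject, properties):
    """Serialise statements about one subject as RDF/XML.

    properties: list of (namespace, local_name, value, is_uri) tuples.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject})
    for ns, local, value, is_uri in properties:
        prop = ET.SubElement(desc, f"{{{ns}}}{local}")
        if is_uri:
            prop.set(f"{{{RDF_NS}}}resource", value)  # URI object
        else:
            prop.text = value  # literal object
    return ET.tostring(root, encoding="unicode")

xml_text = to_rdfxml(
    "http://www.example.org/index.html",
    [
        (DC_NS, "creator", "http://www.example.org/staffid/85740", True),
        (EX_NS, "creation-date", "August 16, 1999", False),
        (EX_NS, "language", "English", False),
    ],
)
print(xml_text)
```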
RSS 1.0 – RDF Site Summary
- RSS 1.0 (RDF Site Summary) – a lightweight, multipurpose, extensible metadata description & syndication format
  - lightweight → an XML document
  - extensible → via XML namespaces & RDF
- metadata = data about data – descriptive information structured in such a way that allows Web pages to be properly searched & processed, in particular by computers
  - RDF allows for the representation of rich metadata
- syndication – making data available online for further transmission, aggregation or online publication
- in RSS 1.0 the abbreviation RSS stands for RDF Site Summary; it is also expanded as Really Simple Syndication, the name used by RSS 2.0, which is not RDF-based
RSS example
[Figure: RSS example]
RSS
- an XML application that conforms to the RDF specification
- one of the most widely deployed RDF applications on the Web
- packages content into easily distinguishable sections
- the feed can be requested by any application able to speak HTTP
- ideal for dynamic information: news sites, web logs, sports scores, stock quotes…
- an RSS summary is a document describing a channel consisting of URL-retrievable items
RSS
- the channel element contains metadata describing the channel itself:
  - title
  - brief description
  - URL link
- the rdf:about attribute is a URI which identifies the channel, most commonly:
  - the URL of the homepage being described, or
  - the URL where the RSS file can be found
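A minimal RSS 1.0 document can be generated with `xml.etree.ElementTree`. It contains a channel element as described above, with `rdf:about`, title, link and description. It also contains the `items`/`rdf:Seq` table of contents that RSS 1.0 uses, and one `item` element per entry. The feed URI, titles and item URLs are hypothetical. The `rss` prefix stands in for the default namespace that published feeds usually use.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rss", RSS)

def build_feed(channel_uri, title, link, description, items):
    """Build an RSS 1.0 document; items is a list of (item_uri, item_title)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    channel = ET.SubElement(root, f"{{{RSS}}}channel", {f"{{{RDF}}}about": channel_uri})
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, f"{{{RSS}}}{tag}").text = text
    # Table of contents: items -> rdf:Seq -> one rdf:li per item
    seq = ET.SubElement(ET.SubElement(channel, f"{{{RSS}}}items"), f"{{{RDF}}}Seq")
    for item_uri, item_title in items:
        ET.SubElement(seq, f"{{{RDF}}}li", {f"{{{RDF}}}resource": item_uri})
        # Each item is a sibling of the channel, identified by the same URI
        item = ET.SubElement(root, f"{{{RSS}}}item", {f"{{{RDF}}}about": item_uri})
        ET.SubElement(item, f"{{{RSS}}}title").text = item_title
        ET.SubElement(item, f"{{{RSS}}}link").text = item_uri
    return ET.tostring(root, encoding="unicode")

feed = build_feed(
    "http://www.example.org/news.rdf",  # hypothetical: URL where the RSS file is found
    "Example News",
    "http://www.example.org/",
    "A hypothetical news channel",
    [("http://www.example.org/news/1.html", "First story"),
     ("http://www.example.org/news/2.html", "Second story")],
)
print(feed)
```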
RSS reader
- an application which aggregates syndicated web content such as news headlines, blogs, podcasts & video blogs in one location for easy viewing
- Demo – Google News
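RSS
- an RDF table of contents: an <items> element containing an <rdf:Seq> with one <rdf:li rdf:resource="{item_uri}"/> entry per item
  - associates the document’s items with the given RSS channel
  - each rdf:li’s rdf:resource {item_uri} must be the same as the rdf:about {item_uri} of the associated item element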
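The consistency rule above can be checked mechanically. Doing so is roughly what an RSS reader needs in order to list a feed's items in table-of-contents order. The sketch parses a hypothetical RSS 1.0 feed and raises an error in two cases: an `rdf:li` points to a missing item, or an item is not listed in the table of contents. It is illustrative only, not a complete RSS 1.0 validator.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical feed used for illustration
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.example.org/news.rdf">
    <title>Example News</title>
    <link>http://www.example.org/</link>
    <description>A hypothetical feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.example.org/news/1.html"/>
        <rdf:li rdf:resource="http://www.example.org/news/2.html"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.example.org/news/1.html">
    <title>First story</title>
    <link>http://www.example.org/news/1.html</link>
  </item>
  <item rdf:about="http://www.example.org/news/2.html">
    <title>Second story</title>
    <link>http://www.example.org/news/2.html</link>
  </item>
</rdf:RDF>"""

def toc_in_order(xml_text):
    """Return (title, uri) pairs in table-of-contents order.

    Raises ValueError if an rdf:li resource has no matching item rdf:about,
    or if an item is not listed in the table of contents.
    """
    root = ET.fromstring(xml_text)
    toc = [li.get(f"{{{RDF}}}resource")
           for li in root.iterfind(f"{{{RSS}}}channel/{{{RSS}}}items/{{{RDF}}}Seq/{{{RDF}}}li")]
    items = {item.get(f"{{{RDF}}}about"): item.findtext(f"{{{RSS}}}title")
             for item in root.iterfind(f"{{{RSS}}}item")}
    missing = [uri for uri in toc if uri not in items]
    unlisted = [uri for uri in items if uri not in toc]
    if missing or unlisted:
        raise ValueError(f"ToC mismatch: missing items {missing}, unlisted items {unlisted}")
    return [(items[uri], uri) for uri in toc]

for title, uri in toc_in_order(FEED):
    print(title, "->", uri)
```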
Summary