Document Processing and the Semantic Web Example Applications Unit Practicalities
COMP3220 — Document Processing and the Semantic Web
Week 01 Lecture 1: Introduction and Overview
Diego Mollá
Department of Computer Science, Macquarie University
COMP3220 2021H1
Acknowledgement of Country
I would like to acknowledge the traditional custodians of the land where I am located, the Darug and Guringai peoples, and pay my respects to their Elders both past and present.
Welcome to COMP3220!
... in which you will learn
how to build software applications that use
1 data mining
2 knowledge about language
to do useful things with documents
with particular emphasis on Web solutions and documents.
Programme
1 Document Processing and the Semantic Web
2 Example Applications
3 Unit Practicalities
Reading
Lecture Notes
Unit guide
Document Processing
Information Overload
A great deal of digital information is available as free text.
Free text is the most natural form in which to write information.
People can read and understand free text easily.
But it is very hard for machines to process!
Document Processing and the Web
The Web
The Web was initially conceived as a means to hyperlink documents.
Most of the information available on the Web is (still) in the form of free text.
This is what is often called unstructured data.
Why Document Processing for the Web?
1 Web search: We want to find information.
2 Spam filtering: We want to ignore (some) information.
3 Sentiment analysis: We want to classify information.
4 Text mining: We want to discover information.
The Semantic Web
Adding Semantics to the Web
Web 1.0: The good, old-fashioned Web.
Web 2.0: The social web.
Web 3.0: The semantic web.
The Semantic Web is about adding metadata to Web content so that machines can process it.
Conversational Interfaces
Many platforms offer conversational interfaces that you can talk or write to in plain language.
The aim is to produce a seamless user experience.
Siri (Apple iOS) and Google Assistant (Google, Android) are personal digital assistants that, among other things, answer your questions.
Amazon’s Echo and Google Home are products that use a speech interface to provide information and control smart devices.
Web Search
Results to queries asked in current search engines may be enriched with information mined from:
Knowledge sources such as Google’s Knowledge Graph.
Text mining based on the characteristics of the query.
Google Search (16 Feb 2021)
Sentiment Analysis
Very often used for analysis of opinions in social media.
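As a rough illustration of what classifying opinions can look like, here is a minimal lexicon-based sketch. The word lists and the function name classify_sentiment are hypothetical and chosen only for this example; real systems typically learn from labelled data rather than rely on a hand-written list.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}


def classify_sentiment(text):
    """Label a text as positive, negative or neutral by counting lexicon words."""
    # Lowercase the text and keep runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in ["I love this phone, the camera is great!",
                 "Terrible service, I hate waiting.",
                 "The parcel arrived on Monday."]:
        print(classify_sentiment(post), "-", post)
```

Counting words ignores negation ("not good") and sarcasm, which is one reason free text is hard for machines to process.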
Machine Translation
Deep learning has dramatically improved the quality of machine translation.
The Semantic Web
Berners-Lee et al. (2001)
The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.
The Semantic Web annotates the contents of Web documents with meaning.
The Semantic Web provides mechanisms to specify meaning and reason with meaning.
The Semantic Web is still largely unrealised, but it has produced various technologies that are becoming increasingly useful.
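To make "specify meaning and reason with meaning" a little more concrete, here is a minimal sketch in plain Python. It does not use any Semantic Web library; the facts, the predicate names is_a and subclass_of, and the function infer_types are all hypothetical and chosen only for illustration.

```python
# Hypothetical facts written as (subject, predicate, object) triples.
TRIPLES = {
    ("Sydney", "is_a", "City"),
    ("City", "subclass_of", "Place"),
    ("Place", "subclass_of", "Thing"),
}


def infer_types(triples):
    """Add every is_a fact that follows from the subclass_of facts.

    Rule applied repeatedly until nothing new is found:
    if X is_a C and C subclass_of D, then X is_a D.
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(inferred):
            if p != "is_a":
                continue
            for (s2, p2, o2) in list(inferred):
                if p2 == "subclass_of" and s2 == o:
                    new_fact = (s, "is_a", o2)
                    if new_fact not in inferred:
                        inferred.add(new_fact)
                        changed = True
    return inferred


if __name__ == "__main__":
    for fact in sorted(infer_types(TRIPLES) - TRIPLES):
        print(fact)
```

The machine was never told that Sydney is a Place; it derives that fact from the annotations and the rule.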
What This Unit is About
COMP3220 explores the issues involved in building significant text processing applications.
Emphasis on non-interactive natural-language text processing systems.
Emphasis also on text processing relative to the Web.
Programming language: Python.
This unit has the following prerequisites:
COMP2110/COMP249, or COMP2200/COMP257.
Staff
Rolf Schwitter: Unit convenor, lecturer (rolf.schwitter@mq.edu.au).
Diego Mollá: Lecturer (diego.molla-aliod@mq.edu.au).
Abdus Salam: Tutor (abdus.salam@mq.edu.au).
Delivery
Lectures:
Live Zoom sessions on Monday 9-11am. Recordings will be available in iLearn.
Practicals:
Register for your 2-hour block.
There are online sessions (via Zoom) and on-campus sessions.
See timetables.mq.edu.au/2021/
Please Note
Practicals start from this week.
Web Resources
The unit is available in iLearn: http://ilearn.mq.edu.au
All the administrative material presented in this lecture is also available at this site.
Unit Outline.
Administrative Information.
Lecture Notes and recordings.
Pointers to Reading.
Other Useful Stuff.
You are expected to keep up-to-date by using iLearn for:
Relevant news and information.
Discussions.
Submission of assignments.
Github
Some of the material of this unit is available in a public GitHub repository: https://github.com/COMP3220/2021S1
Lecture notes
Practicals
Code
If you know how to use git, this will be the best way to make sure you have the latest versions.
git is one of the most popular version control systems.
Search the Web for tutorials and additional information on git.
You can use the github browser interface to download individual files.
Learning Outcomes
1 Explain the main techniques that are used to develop and implement intelligent document processing applications.
2 Describe the functionality of the key components in document processing architectures.
3 Implement text processing applications using a programming language.
4 Apply web technology to document processing.
Textbooks
Weeks 1 to 6 will use (mostly):
“NLTK Book”: Steven Bird, Ewan Klein, Edward Loper. Natural Language Processing with Python — Analyzing Text with the Natural Language Toolkit. http://www.nltk.org/book
“Deep Learning Book”: François Chollet. Deep Learning with Python (available in the library).
Dan Jurafsky, James H. Martin. Speech and Language Processing. 3rd ed. draft. https://web.stanford.edu/~jurafsky/slp3/
Weeks 7 to 12 are not based on any textbook; we will provide a list of online texts.
Every week there will be assigned readings; these readings are essential.
The web site also has pointers to online resources. Recommendations for additions are welcome.
Assessment
Assessment Components
Assignment 1: 5%, due Week 3.
Assignment 2: 20%, due Week 7.
Assignment 3: 15%, due Week 12.
Exam: 60%, online, during the examination period.
Final Assessment
Your final mark and grade are entirely determined by the sum of marks of the individual assessment tasks.
To pass the unit, the sum of marks must be at least 50% of the total assessment marks.
This unit does not have hurdle assessments.
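The rule above is simple arithmetic, sketched below. The weights come from the assessment components listed on this slide; the names WEIGHTS, final_mark and passes, and the choice to express each result as a fraction of that task's maximum, are assumptions made for this example.

```python
# Assessment weights (in % of the final mark), as listed on this slide.
WEIGHTS = {
    "assignment1": 5,
    "assignment2": 20,
    "assignment3": 15,
    "exam": 60,
}


def final_mark(results):
    """Sum the weighted marks.

    results maps each task to the fraction (0.0 to 1.0) of that task's
    maximum mark that was achieved; this representation is an assumption.
    """
    return sum(WEIGHTS[task] * results[task] for task in WEIGHTS)


def passes(results):
    """Pass requires at least 50% of the total; there are no hurdle tasks."""
    return final_mark(results) >= 50


if __name__ == "__main__":
    example = {"assignment1": 1.0, "assignment2": 0.75,
               "assignment3": 0.8, "exam": 0.5}
    print(final_mark(example), passes(example))
```

Because there are no hurdles, a weak result in one task can be offset by the others, but the exam alone carries 60% of the mark.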
Practical Assignments
1 Simple Document Processing (5%, due Week 3)
Use of pre-packaged tools.
Can be used as a diagnostic test (before census date).
2 Document Processing (20%, due Week 7)
Use of techniques used in commercial and research applications.
Use of real (messy) text data.
3 Semantic Web (15%, due Week 12)
Integration of Semantic Web technologies.
Submitting your Assignment
Read the assignment specifications.
Submit in iLearn.
Hard deadlines:
10% of the maximum mark off per day of delay (or part thereof).
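A worked sketch of the late penalty follows. The rounding up of partial days follows "or part thereof"; the function names and the decision to stop the mark at zero are assumptions for this example, so check the assignment specifications for the authoritative rule.

```python
import math


def late_penalty(max_mark, days_late):
    """Marks deducted: 10% of the maximum mark per day, or part of a day, late."""
    if days_late <= 0:
        return 0.0
    whole_days = math.ceil(days_late)  # part of a day counts as a full day
    # Assumption: the penalty never exceeds the maximum mark.
    return min(float(max_mark), max_mark * whole_days / 10)


def mark_after_penalty(raw_mark, max_mark, days_late):
    """Apply the penalty; assumption: the result cannot drop below zero."""
    return max(0.0, raw_mark - late_penalty(max_mark, days_late))


if __name__ == "__main__":
    # An assignment marked out of 20, submitted a day and a half late.
    print(late_penalty(20, 1.5))
    print(mark_after_penalty(15, 20, 1.5))
```

Note that the deduction is based on the maximum mark, not on the mark you obtained.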
Plagiarism
You may discuss the assignments with others, but you must write your own submission.
Read the Academic Honesty Policy. https://staff.mq.edu.au/work/strategy-planning-and-governance/university-policies-and-procedures/policies/academic-honesty
Tentative Lecture Schedule — Diego
1 Python for Text Processing (NLTK Ch 1)
2 Information Retrieval (Manning et al.)
3 Text Classification (NLTK Ch 6)
4 Deep Learning for Text (Chollet, Ch. 2 & 3)
5 Text Sequences (Chollet, Ch. 6)
6 Advanced Deep Learning for Text (lecture notes)
(recess) – use this time for working on the assignment
Lecture Schedule — Rolf
7 Semantic Technologies (A Review of the Semantic Web Field)
8 RDF, RDF Schema and SPARQL (RDF Primer, SPARQL at W3C)
9 DBPedia and Wikidata (Wikipedia and DBPedia: a Comparative Study)
10 Ontologies (OWL Primer)
11 Rule Languages (Applications of Answer Set Programming)
12 Recent Trends in Semantic Technologies (lecture notes)
13 Revision
Important Things To Do
Print out the lecture notes before attending the lecture.
Read the practical exercises before attending the session:
Time in the sessions is gold.
Read the online Unit Outline; this is your “contract”.
Schedule an average of 10 hours per week for working on this unit:
As in every 10-credit-point unit.
This includes the mid-semester break.
What’s Next
Week 1
Python for Text Processing
Workshop: Python and Text Processing
Reading
NLTK Chapter 1
http://docs.python.org/tut/tut.html