COMP3430_Sem2_2021: generate-student-datasets-rl.py
COMP3430/COMP8430 – Data Wrangling – Sem 2 2021
generate-student-datasets-rl.py
9KB Text file Uploaded 24/08/21, 23:02
Click the generate-student-datasets-rl.py link to view the file.
Announcements Forum
Discussion Forum
Echo360 Active Learning Platform
Important: On/Off-Campus Declaration
Feedback on Software Setup for Online Access
Welcome from the Course Convener
Learning Expectations
Course Outline
Course Schedule
Course Resources
Set up necessary software to use in practical labs
Lecture 1 slides (PDF) – Overview and course introduction
Recording lecture 1, part 1 (WEBM format)
Recording lecture 1, part 2 (WEBM format)
Recording lecture 1, part 1 (MP4 format)
Recording lecture 1, part 2 (MP4 format)
Lecture 2 slides (PDF) – The data wrangling process and understanding data
Recording lecture 2 (WEBM format)
Recording lecture 2 (MP4 format)
Lecture 3 slides (PDF) – Data extraction and storage, data warehousing
Recording lecture 3, part 1 (WEBM format)
Recording lecture 3, part 2 (WEBM format)
Recording lecture 3, part 1 (MP4 format)
Recording lecture 3, part 2 (MP4 format)
Interactive lecture 1 slides (PDF) – Overview and administrative issues
Week 1 Interactive Lecture
Data Cleaning: Problems and Current Approaches (Rahm and Do, 2000)
For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights (New York Times, 2014)
Sample python scripts (for you to get an understanding of the level of Python we use during the practical lab sessions)
Lecture 4 slides (PDF) – Web scraping and geocoding of data
Recording lecture 4, part 1 (WEBM format)
Recording lecture 4, part 2 (WEBM format)
Recording lecture 4, part 1 (MP4 format)
Recording lecture 4, part 2 (MP4 format)
Lecture 5 slides (PDF) – Data quality assessment and data profiling
Recording lecture 5, part 1 (WEBM format)
Recording lecture 5, part 2 (WEBM format)
Recording lecture 5, part 1 (MP4 format)
Recording lecture 5, part 2 (MP4 format)
Lecture 6 slides (PDF) – Resolving data quality issues and data cleaning
Recording lecture 6, part 1 (WEBM format)
Recording lecture 6, part 2 (WEBM format)
Recording lecture 6, part 1 (MP4 format)
Recording lecture 6, part 2 (MP4 format)
Week 2 Interactive Lecture Slides (PDF)
Week 2 Interactive Lecture Recording
Data quality in context (Strong, Lee and Wang, 1997)
Quiz 1 (covering material from weeks 1 and 2)
Assignment 1 Specification
dw_assignment_master.csv.gz
generate-student-dataset.py
student-check-codes-assign1.txt
Lab signup (opens Monday 2 August 4 pm)
Lecture 7 slides (PDF) – Data transformation, aggregation and reduction
Recording lecture 7, part 1 (WEBM format)
Recording lecture 7, part 2 (WEBM format)
Recording lecture 7, part 1 (MP4 format)
Recording lecture 7, part 2 (MP4 format)
Lecture 8 slides (PDF) – Data parsing and standardisation
Recording lecture 8, part 1 (WEBM format)
Recording lecture 8, part 2 (WEBM format)
Recording lecture 8, part 1 (MP4 format)
Recording lecture 8, part 2 (MP4 format)
Lecture 9 slides (PDF) – Data pre-processing using Rattle and Python
Recording lecture 9, part 1 (WEBM format)
Recording lecture 9, part 2 (WEBM format)
Recording lecture 9, part 1 (MP4 format)
Recording lecture 9, part 2 (MP4 format)
Week 3 Review Lecture Slides (PDF)
Week 3 Review Lecture Recording
Week 3 Review Lecture Demo
Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations (Krishnan, Haas, Franklin and Wu, 2016) (copy)
Lab 1 (week 3) Specification
Lecture 10 slides (PDF) – Overview of data integration
Additional material for lecture 10: WOO: A Scalable and Multi-tenant Platform for Continuous Knowledge Base Synthesis (Bellare et al., 2013)
Recording lecture 10, part 1 (WEBM format)
Recording lecture 10, part 2 (WEBM format)
Recording lecture 10, part 1 (MP4 format)
Recording lecture 10, part 2 (MP4 format)
Lecture 11 slides (PDF) – Schema mapping and matching
Recording lecture 11, part 1 (WEBM format)
Recording lecture 11, part 2 (WEBM format)
Recording lecture 11, part 1 (MP4 format)
Recording lecture 11, part 2 (MP4 format)
Lecture 12 slides (PDF) – Overview of record linkage
Recording lecture 12 (WEBM format)
Recording lecture 12 (MP4 format)
Week 4 Interactive Lecture Slides (PDF)
Week 4 Interactive Lecture Recording
Data matching – Chapters 1 and 2 (Christen, 2012)
Lab 2 (week 4) Specification
Lecture 13 slides (PDF) – Data cleaning for record linkage and blocking (1)
Recording lecture 13 (WEBM format)
Recording lecture 13 (MP4 format)
Lecture 14 slides (PDF) – Blocking / indexing (2)
Recording lecture 14 (WEBM format)
Recording lecture 14 (MP4 format)
Week 5 Interactive Lecture Slides (PDF)
Week 5 Interactive Lecture Recording
Quiz 2 (covering material from weeks 2 to 5)
Lecture 15 slides (PDF) – Record pair comparison (1)
Recording lecture 15 (WEBM format)
Recording lecture 15 (MP4 format)
Lecture 16 slides (PDF) – Record pair comparison (2)
Recording lecture 16 (WEBM format)
Recording lecture 16 (MP4 format)
Week 6 Interactive Lecture Recording
Interactive lecture 6 slides 2021
Lab 3 (week 6) Specification
comp3430_comp8430-reclink-lab-3-6.zip (Data sets and Python record linkage programs)
SLK-581 guide for usage
Example solution blocking.py
Lab 3 slides
Assignment 1 submission
General marking feedback assignment 1
Assignment 2 Specification
dw_assignment_master2.csv.gz
generate-student-dataset2.py
student-check-codes-assign2.txt
Education data set description
Assignment 3 Specification
dw_assignment_master_rl1.csv.gz
dw_assignment_master_rl2.csv.gz
dw_assignment_master_rlgt.csv.gz
student-check-codes-assign3.txt
Assignment 4 Specification (COMP8430 students only)
Adaptive Temporal Entity Resolution on Dynamic Databases (Christen and Gayler, 2013)
Efficient Interactive Training Selection for Large-Scale Entity Resolution (Wang, Vatsalan, and Christen, 2015)
Improving Temporal Record Linkage Using Regression Classification (Hu, Wang, Vatsalan, and Christen, 2017)
Pattern-Mining Based Cryptanalysis of Bloom Filters for Privacy-Preserving Record Linkage (Christen, Vidanage, Ranbaduge, and Schnell, 2018)
A Scalable and Efficient Subgroup Blocking Scheme for Multidatabase Record Linkage (Ranbaduge, Vatsalan, and Christen, 2018)
Robust Temporal Graph Clustering for Group Record Linkage (Nanayakkara, Christen, and Ranbaduge, 2019)
Secure and Accurate Two-Step Hash Encoding for Privacy-Preserving Record Linkage (Ranbaduge, Christen, and Schnell, 2020)
Lecture 17 slides (PDF) – Record pair classification (1)
Recording lecture 17 (WEBM format)
Recording lecture 17 (MP4 format)
Lecture 18 slides (PDF) – Record pair classification (2)
Recording lecture 18 (WEBM format)
Recording lecture 18 (MP4 format)
Week 7 Interactive Lecture Recording
Interactive lecture 7 slides
Lab 4 (week 7) specification
Example solution comparison.py
Lab 4 slides
Lecture 19 slides (PDF) – Record linkage evaluation (1)
Recording lecture 19, part 1 (WEBM format)
Recording lecture 19, part 2 (WEBM format)
Recording lecture 19, part 1 (MP4 format)
Recording lecture 19, part 2 (MP4 format)
Lecture 20 slides (PDF) – Record linkage evaluation (2)
Recording lecture 20 (WEBM format)
Recording lecture 20 (MP4 format)
Week 8 Interactive Lecture Recording
Interactive lecture 8 slides
Quiz 3 (covering material from weeks 6 to 8)
Lab 5 (week 8) specification
Example solution classification.py
Lab 5 slides
Lecture 21 slides (PDF) – Data fusion
Recording lecture 21 (WEBM format)
Recording lecture 21 (MP4 format)
Lecture 22 slides (PDF) – Advanced record linkage techniques
Recording lecture 22 (WEBM format)
Recording lecture 22 (MP4 format)
Lecture 23 slides (PDF) – Privacy aspects in data wrangling and privacy-preserving record linkage
Recording lecture 23 (WEBM format)
Recording lecture 23 (MP4 format)
Week 9 Interactive Lecture Recording
Interactive lecture 9 slides
Privacy-preserving record linkage using Bloom filters
Assignment 2 submission
Lab 6 (week 9) specification
Example solution evaluation.py
Lab 6 slides
Lecture 24 slides (PDF) – Ontology matching
Recording lecture 24 (WEBM format)
Recording lecture 24 (MP4 format)
Lecture 25 slides (PDF) – Wrangling dynamic and spatial data
Recording lecture 25 (WEBM format)
Recording lecture 25 (MP4 format)
Interactive lecture 10
Interactive lecture 10 slides
Lab 7 (week 10) specification
comp3430_comp8430-reclink-lab7-datasets.zip
Python module saveLinkResult.py
Lab 7 slides
Extra lab specification (for those who are interested)
Python module privacyPreservingRecordLinkage.py
Assignment 3 submission
Assignment 4 submission (COMP8430 students only!)