
Today
• What is data linkage, when and why is it needed and by whom?
• What are some challenges?
  • How to define similarity
  • Scalability
  • How to merge/group records
• Thanks to Dr Ben Rubinstein for use of lecture materials on the movies example

Data linkage: what is it?
• We collect information about entities such as people, products, images, songs, …
• Combining, grouping and matching electronic records of the same real-world entities.
• Example: two hospitals H1 and H2 want to know whether the same patients visited both hospitals.
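As a minimal sketch of the hospital scenario, the Python below assumes each hospital holds patient records as (name, date of birth) pairs; the field choice, the sample records and the `normalise` helper are all hypothetical. It contrasts an exact-key join with a join on lightly normalised keys.

```python
def normalise(name: str) -> str:
    """Hypothetical minimal cleaning: lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def linked_patients(h1, h2, clean=False):
    """Return the (name, dob) keys that appear in both hospitals' records."""
    if clean:
        key = lambda r: (normalise(r[0]), r[1])
    else:
        key = lambda r: r
    return {key(r) for r in h1} & {key(r) for r in h2}


# Hypothetical records: (name, date of birth)
h1 = [("Alice Smith", "1980-02-01"), ("Bob Jones", "1975-07-12")]
h2 = [("alice  smith", "1980-02-01"), ("Carol White", "1990-03-30")]

print(linked_patients(h1, h2))              # set(): exact keys miss the variant spelling
print(linked_patients(h1, h2, clean=True))  # {('alice smith', '1980-02-01')}
```

Normalisation like this only absorbs trivial differences in case and spacing; typos, nicknames or changed surnames need a proper similarity measure.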
Example from Data Matching book by : security
Match data about people scheduled to fly to Australia against information held in different databases, to identify high-risk passengers before boarding. These databases hold information such as:
• Previous visits to Australia
• Previous visa applications/cancellations
• Crime databases …
A famous senator Sen. told the Senate Judiciary Committee in 2004 that he had been stopped and interrogated on at least five occasions as he
attempted to board flights at several different airports. A Bush
administration official explained to the Washington Post that Kennedy
had been held up because the name “T. Kennedy” had become a popular pseudonym among terror suspects.
http://edition.cnn.com/2015/12/07/politics/no-fly-mistakes-cat-stevens-ted-kennedy-john-lewis/
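The “T. Kennedy” case is a false positive caused by matching on too coarse a key. The sketch below assumes, hypothetically, that a watch list stores a name only as first initial plus surname and that any passenger whose name reduces to the same key is flagged; the passenger names are invented for illustration.

```python
def coarse_key(first: str, surname: str) -> tuple[str, str]:
    """Hypothetical matching key: first initial plus surname, case-insensitive."""
    return (first[:1].lower(), surname.lower())


# Hypothetical watch-list entry recorded only as an initial and a surname
watch_list = {coarse_key("T.", "Kennedy")}

# Hypothetical passengers
passengers = [("Tim", "Kennedy"), ("Tom", "Kennedy"), ("Ann", "Kennedy")]

flagged = [p for p in passengers if coarse_key(*p) in watch_list]
print(flagged)  # [('Tim', 'Kennedy'), ('Tom', 'Kennedy')]
```

A richer key (for example adding date of birth) would cut these false positives, at the risk of missing suspects who vary their details.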

Applications: business
• Two businesses collaborate on a marketing campaign and need a combined database of individuals to target
• Bob moves into a new home and wants to have electricity connected. For verification, the provider matches the address Bob supplies against its “master” list of street addresses. This is not always reliable!
• Online shopping comparison
• Is product X in Store A the same as product Y in Store B?
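For the product-comparison question, one common choice (an assumption here, not something the slides prescribe) is to compare product titles by the Jaccard similarity of their word sets and declare a match above a threshold. The titles and the 0.6 threshold below are hypothetical.

```python
def tokens(title: str) -> set[str]:
    """Lower-cased word set of a product title."""
    return set(title.lower().split())


def jaccard(a: str, b: str) -> float:
    """|A & B| / |A | B| over the two titles' word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty titles are treated as identical
    return len(ta & tb) / len(ta | tb)


# Hypothetical listings from two stores
store_a = "Acme Electric Kettle 1.7L Steel"
store_b = "Acme Kettle Steel 1.7L"
store_c = "Acme Electric Toaster Steel"

THRESHOLD = 0.6  # hypothetical cut-off

for other in (store_b, store_c):
    score = jaccard(store_a, other)
    print(f"{other!r}: {score:.2f} match={score >= THRESHOLD}")
```

The second kettle listing scores 0.80, but the toaster still scores 0.50 because it shares brand and material words, which is why the threshold has to be tuned; a word-set measure also ignores word order and misspellings.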

Data linkage applications – cont.
Src: K. C4.0

Centrelink: “Robo-debt” collection
Herald 10/4/17
• Data matching between Centrelink data and Tax Office data
• System checks for “discrepancies” in income
• Example data-matching issue:
• A welfare recipient reports to Centrelink that they work for a company, giving the company’s trading name. Tax Office records show the company’s registered name, which is different.
• The failure to match the two names led to the conclusion that some income was not being declared
• Automated notice …
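The failure can be reproduced with a plain string comparison. The sketch below assumes, hypothetically, that names are compared after light normalisation (case, punctuation and a trailing “Pty Ltd”); all company names are invented. Normalisation reconciles formatting differences, but no amount of string cleaning links a trading name to a different registered name.

```python
import re


def normalise_company(name: str) -> str:
    """Hypothetical cleaning: lower-case, drop punctuation and a trailing 'pty ltd'."""
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    name = re.sub(r"\s+pty\s+ltd$", "", name.strip())
    return " ".join(name.split())


registered = "Acme Widgets Pty. Ltd."    # hypothetical name in tax records
same_formatted = "ACME WIDGETS PTY LTD"  # same name, different formatting
trading = "Acme Hardware Store"          # hypothetical trading name reported

print(normalise_company(registered) == normalise_company(same_formatted))  # True
print(normalise_company(registered) == normalise_company(trading))         # False
```

Linking the trading name to the registered name needs external knowledge, such as a business register, rather than a better string comparison.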

Examples of problematic matching
• Income reported to Centrelink:
  May’16: Maccas $7,000
  June’16: Maccas $4,000 (total $11,000)
• ATO income:
  2015-16: McDonald’s $11,000
• Discrepancy detected (“Maccas” does not match “McDonald’s”) -> potential undeclared income -> automated process triggered -> letter to …
• Airport security: the senator repeatedly stopped because the name “T. Kennedy” had become a popular pseudonym among terror suspects (see the security example above).
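The income example above can be sketched as a per-employer comparison. The sketch assumes, hypothetically, that reported income is summed per employer name and compared with the ATO figure for the same name; the data layout and the alias table mapping “Maccas” to “McDonald’s” are illustrative assumptions, not part of the system described.

```python
from collections import defaultdict

# Amounts from the example; the data layout is hypothetical
reported = [("May'16", "Maccas", 7000), ("June'16", "Maccas", 4000)]
ato = {"McDonald's": 11000}


def undeclared(reported, ato, aliases=None):
    """Return employer -> ATO income not matched by reported income."""
    aliases = aliases or {}
    totals = defaultdict(int)
    for _month, employer, amount in reported:
        # Map a reported employer name to its canonical name, if known
        totals[aliases.get(employer, employer)] += amount
    return {emp: amt - totals[emp] for emp, amt in ato.items() if amt != totals[emp]}


print(undeclared(reported, ato))                            # {"McDonald's": 11000}
print(undeclared(reported, ato, {"Maccas": "McDonald's"}))  # {}
```

Without the alias the whole $11,000 looks undeclared even though the reported amounts sum to exactly the ATO figure; with it the discrepancy disappears.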

Record linkage – terminology
Combine related/equivalent records across sources (sometimes within a single source)
Studied across communities, with different terminology:
• Statistics: Record linkage [Dunn’46]
• Databases: Entity resolution, data integration, deduplication
• Natural Language Processing: coreference resolution, named-entity recognition
…meaning and scope vary
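Deduplication, the single-source case mentioned above, can be sketched by grouping records on a normalised key so that each group holds the records believed to describe one entity. The record fields and the key (normalised name plus postcode) are hypothetical; a practical system would typically need similarity-based matching rather than an exact key.

```python
from collections import defaultdict

# Hypothetical customer records from a single source
records = [
    {"id": 1, "name": "Jane Doe", "postcode": "3010"},
    {"id": 2, "name": "jane  doe", "postcode": "3010"},
    {"id": 3, "name": "John Roe", "postcode": "3052"},
]


def dedup_key(rec):
    """Hypothetical key: lower-cased, whitespace-collapsed name plus postcode."""
    return (" ".join(rec["name"].lower().split()), rec["postcode"])


# Group record ids that share the same key
groups = defaultdict(list)
for rec in records:
    groups[dedup_key(rec)].append(rec["id"])

print(dict(groups))
# {('jane doe', '3010'): [1, 2], ('john roe', '3052'): [3]}
```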