COMP9321 Data Services Engineering
Term 1, 2021 Week 1: Introduction
Source: https://www.infoq.com/articles/narayanan-soa-data-services
Data Services
• What? Data services are software services that encapsulate operations on key data entities relevant to the consumer
• Why? Data nowadays is stored in multiple systems and requires multiple interfaces or mechanisms to interact with it. There are also varying channels (e.g., legacy systems, online, third-party) and mechanisms (e.g., event driven, on demand, batch process) that need to be served, which adds further challenges for data services. Without an abstraction layer that insulates data consumers from this complexity, we will end up with a spaghetti of point-to-point integrations between data sources and data consumers
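The abstraction layer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Customer` entity, the two source classes, and the `CustomerDataService` name are all hypothetical and do not come from the slides or from any real library. The point is that the consumer calls one operation on a key data entity and never sees how many systems sit behind it.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class Customer:
    customer_id: str
    name: str
    email: Optional[str] = None


class CustomerSource(Protocol):
    """Hypothetical interface: anything that can look up a customer by id."""

    def find(self, customer_id: str) -> Optional[Customer]: ...


class LegacySource:
    """Stands in for a legacy system that keys records by upper-case id."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = rows

    def find(self, customer_id: str) -> Optional[Customer]:
        name = self._rows.get(customer_id.upper())
        return Customer(customer_id, name) if name is not None else None


class OnlineSource:
    """Stands in for an online channel that stores dict records."""

    def __init__(self, records: Dict[str, dict]) -> None:
        self._records = records

    def find(self, customer_id: str) -> Optional[Customer]:
        rec = self._records.get(customer_id)
        if rec is None:
            return None
        return Customer(customer_id, rec["name"], rec.get("email"))


class CustomerDataService:
    """Single entry point: consumers never talk to the sources directly."""

    def __init__(self, *sources: CustomerSource) -> None:
        self._sources = sources

    def get_customer(self, customer_id: str) -> Optional[Customer]:
        # Ask each source in order and return the first match.
        for source in self._sources:
            customer = source.find(customer_id)
            if customer is not None:
                return customer
        return None


service = CustomerDataService(
    LegacySource({"C1": "Ada"}),
    OnlineSource({"c2": {"name": "Grace", "email": "grace@example.com"}}),
)
```

A consumer only ever calls `service.get_customer(...)`. Adding a third source means writing one more class with a `find` method, not one more point-to-point integration per consumer.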
Let’s Go Deeper
Data Recording… The Beginning
Source: Wikipedia.org
Not that Deep
Data-oriented Services …
Information Systems/Applications Integration
A set of services and solutions for bringing together disparate application and business processes as needed to meet the diverse information requirements of your customers, partners, suppliers and employees.
Motivations: Streamlining business operations, globalisation, competition, mergers and acquisition, new business models, technology development, etc.
– e.g., merger of two companies (data + processes)
Problems: systems to be integrated are not homogeneous.
• they are individually developed (ad-hoc) systems over time
• some are “off-the-shelf” packages
• different execution platforms, technologies and business rules
Heterogeneity at different levels: language, platform, schema (data, process)
• Data integration, Process/Systems integration
Picture from Mashups: Concepts, Models and Architectures
Data Level Integration …
Data integration = combining data from different sources and providing users with a unified view over them
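As a minimal sketch of this definition, the snippet below combines two sources with different schemas into one unified view. The two sources, their field names, and the target schema (`id`, `name`, `balance`) are made up for illustration; real integration also has to deal with conflicting values and entity matching, which this sketch ignores.

```python
from typing import Dict, List

# Source A: a CRM-style system (hypothetical), keyed by "id".
crm_rows = [
    {"id": 1, "full_name": "Ada Lovelace"},
    {"id": 2, "full_name": "Alan Turing"},
]

# Source B: a billing-style system (hypothetical), keyed by "customer_id".
billing_rows = [
    {"customer_id": 1, "balance": 120.0},
    {"customer_id": 3, "balance": 15.5},
]


def unified_view(crm: List[dict], billing: List[dict]) -> List[dict]:
    """Map both source schemas onto one target schema and combine by id."""
    view: Dict[int, dict] = {}
    for row in crm:
        view[row["id"]] = {"id": row["id"], "name": row["full_name"], "balance": None}
    for row in billing:
        # Create a placeholder entry if the customer is unknown to the CRM.
        entry = view.setdefault(
            row["customer_id"],
            {"id": row["customer_id"], "name": None, "balance": None},
        )
        entry["balance"] = row["balance"]
    return [view[key] for key in sorted(view)]


customers = unified_view(crm_rows, billing_rows)
```

The user of `customers` sees one schema. Fields that a source could not supply are left as `None` rather than guessed.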
Data Level Integration …
Picture from View-based Integration, Yannis Katsis, CSE, UC San Diego
Picture from Mashups: Concepts, Models and Architectures
Data Level Integration …
Picture from Mashups: Concepts, Models and Architectures
Data Level Integration …
System Level Integration …
In enterprise environments, pick any sizeable organisation. You will see many departments performing different functions in silos, often supported by software systems
A Typical Purchase Order Process
In reality: communication/coordination between the silos is needed
An example of (real) Purchase Order Process
Going outside of your system boundary …
The evolution of programming abstractions
Services: “customer” and “service provider”
Lines of code vs. Services – consider the software-building exercise as ‘building services’, ‘discovering services’ and ‘combining services’
Web services
Web = platform/language neutral
The evolution of programming abstractions
In SOA, we talk about software as a service … That is, SOA is about building software systems composed of a collection of (software) services
A software service:
• A software asset that is deployed at an endpoint and is continuously maintained by a provider for use by one or multiple clients
• Services have explicit contracts that establish their purpose and how they should be used
• Software services are (supposed to be) reusable (“compose-able”) …
– like lego blocks
– “my” (the developer) service could be used in scenarios that I never anticipated
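The ideas of an explicit contract and of composing services like lego blocks can be sketched as follows. Everything here is hypothetical: the `PriceService` and `StockService` contracts, the in-memory providers, and `QuoteService` are invented for illustration. In a real SOA setting the contract would be a published interface description and the call would cross a network endpoint, not a Python method call.

```python
from typing import Dict, Protocol


class PriceService(Protocol):
    """Contract: given a product code, return its unit price in dollars."""

    def unit_price(self, product_code: str) -> float: ...


class StockService(Protocol):
    """Contract: given a product code, return the units in stock."""

    def units_in_stock(self, product_code: str) -> int: ...


class InMemoryPrices:
    """One possible provider of the PriceService contract."""

    def __init__(self, prices: Dict[str, float]) -> None:
        self._prices = prices

    def unit_price(self, product_code: str) -> float:
        return self._prices[product_code]


class InMemoryStock:
    """One possible provider of the StockService contract."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = stock

    def units_in_stock(self, product_code: str) -> int:
        return self._stock.get(product_code, 0)


class QuoteService:
    """A composed service that relies only on the two contracts above."""

    def __init__(self, prices: PriceService, stock: StockService) -> None:
        self._prices = prices
        self._stock = stock

    def quote(self, product_code: str, quantity: int) -> float:
        # Refuse to quote more units than the stock service reports.
        if quantity > self._stock.units_in_stock(product_code):
            raise ValueError("not enough stock")
        return quantity * self._prices.unit_price(product_code)


quotes = QuoteService(InMemoryPrices({"lego": 2.5}), InMemoryStock({"lego": 10}))
```

`QuoteService` never inspects how prices or stock levels are stored, so either provider can be swapped for another that honours the same contract. That is also how a service ends up being used in scenarios its developer never anticipated.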
Simplified view of services (or API ?!)
Service orientation
– a way of integrating your applications as a set of linked services.
– If you can define the services, you can begin to link the services to realise more complicated ‘services’
So again…Why Data Services?
Sexy Job
“I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?”
“The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data.”
Hal Varian, Google’s Chief Economist
Data is Popular
Data is Popular?
Data is Popular
Data is valuable
Economist.com
Data is Massive
“There are 2.5 quintillion bytes (2.5 EB) of data created each day at our current pace, but that pace is only accelerating with the growth of the Internet of Things (IoT).” [1]
» 2.7 Zettabytes of data exist in the digital universe today. [2]
» Facebook stores, accesses, and analyzes 30+ Petabytes of user generated data. [3]
» Akamai analyzes 75 million events per day to better target advertisements. [3]
» Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data. [4]
» In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day. [5]
1. Forbes, How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read
2. Wikibon, The Rapid Growth in Unstructured Data
3. Wikibon, Taming Big Data
4. SAS, Big Data Meets Big Data Analytics
5. TechCrunch, Google Processing 20,000 Terabytes A Day, And Growing
Data Every Minute
Source: https://www.domo.com/learn/data-never-sleeps-5?aid=ogsm072517_1&sf100871281=1
Where does data come from?
Where does data come from?
The connected world
Even more data…
What can we do with the data?
Life Sciences
Clinical research is a slow and expensive process, with trials failing for a variety of reasons. Advanced analytics, artificial intelligence (AI) and the Internet of Medical Things (IoMT) unlock the potential of improving speed and efficiency at every stage of clinical research by delivering more intelligent, automated solutions.
Banking
Financial institutions gather and access analytical insight from large volumes of unstructured data in order to make sound financial decisions. Big data analytics allows them to access the information they need when they need it, by eliminating overlapping, redundant tools and systems.
Source: https://www.sas.com/en_au/insights/analytics/big-data-analytics.html
What can we do with the data?
Manufacturing
For manufacturers, solving problems is nothing new. They wrestle with difficult problems on a daily basis – from complex supply chains, to motion applications, to labor constraints and equipment breakdowns. That’s why big data analytics is essential in the manufacturing industry, as it has allowed competitive organizations to discover new cost saving opportunities and revenue opportunities.
Health Care
Big data is a given in the health care industry. Patient records, health plans, insurance information and other types of information can be difficult to manage – but are full of key insights once analytics are applied. That’s why big data analytics technology is so important to health care. By analyzing large amounts of information – both structured and unstructured – quickly, health care providers can provide lifesaving diagnoses or treatment options almost immediately.
Source: https://www.sas.com/en_au/insights/analytics/big-data-analytics.html
What can we do with the data?
Retail
Customer service has evolved in the past several years, as savvier shoppers expect retailers to understand exactly what they need, when they need it. Big data analytics technology helps retailers meet those demands. Armed with endless amounts of data from customer loyalty programs, buying habits and other sources, retailers not only have an in-depth understanding of their customers, they can also predict trends, recommend new products – and boost profitability.
Government
Certain government agencies face a big challenge: tighten the budget without compromising quality or productivity. This is particularly troublesome with law enforcement agencies, which are struggling to keep crime rates down with relatively scarce resources. And that’s why many agencies use big data analytics; the technology streamlines operations while giving the agency a more holistic view of criminal activity.
Source: https://www.sas.com/en_au/insights/analytics/big-data-analytics.html
Also
• Spam/False information detection
• Credit card fraud detection
• Recommendation systems
• Human activity recognition/prediction
• Machine translation
• Face/Scene recognition
• Image caption
• Self-driving cars
Unraveling Power of Deeply Connected World
• Produce a treasure trove of big data
» data that can help cities predict accidents and crimes
• Give doctors real-time insight into information from pacemakers or biochips
• Enable optimized productivity across industries through predictive maintenance on equipment and machinery
• Create true smart homes with connected appliances
• Provide critical communication between self-driving cars
• …
Looks promising…Yet how?
What do you do with all this data?
» Too much data to search through manually or to process in traditional ways…
But there is valuable information in the data:
» How can we use it for fun, profit, and/or the greater good?
The boost in computing power helps.
» Machine learning is a key tool we use to make sense of very large datasets.
So What is Next?
• In order to build a data service you need to know how to work with data
• Accessing the data from multiple sources
• Cleansing the data (e.g., removing corrupted or useless data)
• Manipulating the data (e.g., merging, transformation, normalization)
• Presenting the data (visualization)
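The four steps listed above can be sketched end to end with nothing but the Python standard library. The two inline sources, their field names, and the temperature values are invented for illustration. A real data service would read from files, databases or web APIs and would more likely use a dataframe library and a plotting library for the last two steps.

```python
import csv
import io
import json

# Two hypothetical sources holding temperature readings in different formats.
CSV_SOURCE = "city,temp_c\nSydney,24.0\nPerth,\nHobart,17.0\n"
JSON_SOURCE = '[{"city": "Darwin", "temp_f": 86.0}, {"city": "Sydney", "temp_f": 77.0}]'


def access():
    """Access: read raw records from both sources."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def cleanse(csv_rows):
    """Cleanse: drop rows whose temperature is missing."""
    return [row for row in csv_rows if row["temp_c"].strip()]


def manipulate(csv_rows, json_rows):
    """Manipulate: normalise units to Celsius, then merge by city (mean)."""
    readings = {}
    for row in csv_rows:
        readings.setdefault(row["city"], []).append(float(row["temp_c"]))
    for row in json_rows:
        celsius = (row["temp_f"] - 32) * 5 / 9
        readings.setdefault(row["city"], []).append(celsius)
    return {city: round(sum(v) / len(v), 1) for city, v in readings.items()}


def present(summary):
    """Present: a plain-text bar chart, one '#' per 2 degrees Celsius."""
    lines = []
    for city in sorted(summary):
        bar = "#" * int(summary[city] // 2)
        lines.append(f"{city:<8}{summary[city]:>5} {bar}")
    return "\n".join(lines)


raw_csv, raw_json = access()
summary = manipulate(cleanse(raw_csv), raw_json)
chart = present(summary)
```

Calling `print(chart)` shows one line per city. The Perth row is dropped during cleansing because its reading is missing, and the two Sydney readings are merged into a single mean after unit normalisation.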
Useful Reading
• View-based Data Integration, Yannis Katsis (http://db.ucsd.edu/wp-content/uploads/pdfs/355.pdf)
• Mashups: Concepts, Models and Architectures, Daniel, Florian, Matera, Maristella (https://link.springer.com/book/10.1007%2F978-3-642-55049-2)
Q&A