AI Primer (IMDA Publications)
TYPES OF DATA
ARTIFICIAL INTELLIGENCE DESCRIBED ON A SINGLE CHART
Source: Schulte Research Estimates
[Chart: physical data captured by roughly 40bn IoT sensors and 5bn mobile devices flows into digital data infrastructure (cloud and quantum computing), where neural networks and machine learning (AI) power applications in: 1. Financial Services, 2. Cognitive Services, 3. Lifestyle/Health, 4. Autonomous Cars, 5. Robotics, 6. Advertising.]
DATA SCIENCE VS. BIG DATA VS. DATA ANALYTICS
Data Science: Dealing with both unstructured and structured data, Data Science is a field that comprises everything related to data cleansing, preparation, and analysis. It combines statistics, mathematics, programming, problem-solving, capturing data in ingenious ways, the ability to look at things differently, and the activity of cleansing, preparing, and aligning data. In simple terms, it is the umbrella of techniques used to extract insights and information from data.
Big Data: Big Data refers to humongous volumes of data that cannot be processed effectively with traditional applications. The processing of Big Data begins with raw data that is not aggregated and is most often impossible to store in the memory of a single computer. Gartner's definition of Big Data is: "Big data is high-volume, and high-velocity or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."
Data Analytics: the science of examining raw data to draw conclusions from that information. Data Analytics involves applying an algorithmic or mechanical process to derive insights, for example by running through several data sets to look for meaningful correlations between them. It is used across industries to allow organizations and companies to make better decisions, as well as to verify or disprove existing theories and models. The focus of Data Analytics lies in inference, the process of deriving conclusions based solely on what the researcher already knows.
https://www.simplilearn.com/data-science-vs-big-data-vs-data-analytics-article
DATA STRUCTURE
STRUCTURED VS. UNSTRUCTURED DATA
https://www.prowebscraper.com/blog/structured-vs-unstructured-data-best-thing-you-need-to-know/
WHAT IS STRUCTURED DATA?
Structured data refers to any data that resides in
a fixed field within a record or file. This includes
data contained in relational databases and
spreadsheets.
Examples of structured data:
Meta-data (Time and date of creation, File
size, Author etc.)
Library Catalogues (date, author, place,
subject, etc)
Census records (birth, income, employment,
place etc.)
Economic data (GDP, PPI, ASX etc.)
Facebook like button
Phone numbers (and the phone book)
Databases (structuring fields)
WHAT IS UNSTRUCTURED DATA?
Unstructured data (or unstructured information)
is the kind of information that either does not
have a predefined data model or is not
organized in a pre-defined manner.
Examples of unstructured data are as follows:
Text files (word processing, spreadsheets, presentations, etc.)
Email body
Social media (data from Facebook, Twitter, LinkedIn)
Websites (YouTube, Instagram, photo sharing sites)
Mobile data (text messages)
Communications (chat, instant messaging, phone recordings, collaboration software)
Media (MP3 (MPEG Audio Layer-3) compressed files, digital photos, audio and video files)
COMPUTER OR MACHINE-GENERATED
Machine-generated structured data sources:
Sensor data: Radio frequency ID tags, smart meters, medical devices, and Global Positioning System data are all machine-generated structured data. Supply chain management and inventory control are what get companies interested in this.
Web log data: As systems and mechanisms such as servers, applications and networks operate, they soak up many kinds of data about whatever operation is running, producing enormous piles of data of diverse kinds. Based on this data, you can monitor service-level agreements or predict security breaches.
Point-of-sale data: When digital transactions take place over the counter of a shopping mall, the machine captures a lot of data. This is machine-generated structured data related to the barcode and other relevant details of the product.
Financial data: Computer programs are now used with financial data far more, and processes are automated with their help. Take the case of stock trading: it carries structured data such as the company symbol and dollar value. Part of this data is machine generated and some of it is human generated.
Machine-generated unstructured data sources:
Satellite images: Weather data, or the data that government agencies procure through satellite surveillance imagery, is machine-generated unstructured data. Google Earth and similar services aptly illustrate the point.
Scientific data: Scientific data including seismic imagery, atmospheric data, high-energy physics data and so forth is machine-generated unstructured data.
Photographs and video: When machines capture images and video for security, surveillance and traffic purposes, the data produced is machine-generated unstructured data.
Radar or sonar data: This includes vehicular, meteorological, and oceanographic seismic profiles.
HUMAN-GENERATED
Human-generated structured data sources:
Input data: When a human user enters input such as name, age, income, or non-free-form survey responses into a computer, it is human-generated structured data. Companies can find this type of data quite useful in studying customer behaviour.
Clickstream data: This is the type of data generated when a user clicks a link on a website. Businesses like this type of data because it allows them to study customer behaviour and purchase patterns.
Gaming-related data: When a human user makes a move in a game on a virtual platform, it produces a piece of information. How users navigate a gaming portfolio is a source of a lot of interesting data.
Human-generated unstructured data sources:
Text internal to your company: This is the type of data that is restricted to a given company, such as documents, logs, survey results and emails. Such enterprise information forms a big part of the unstructured text information in the world.
Social media data: This kind of data is generated when human users interact with social media platforms such as Facebook, Twitter, Flickr, YouTube and LinkedIn.
Mobile data: This type of data includes information such as text messages and location information.
Website content: This type of data is derived from sites delivering unstructured content, such as YouTube, Flickr and Instagram.
CHARACTERISTICS
Flexibility: structured data is schema-dependent (rigid schema); unstructured data has no schema and is very flexible.
Scalability: scaling a structured database schema is difficult; unstructured data is highly scalable.
Robustness: structured data is robust (no equivalent claim is made for unstructured data).
Query performance: structured data allows complex joins via structured queries; only textual queries are possible on unstructured data.
Availability: structured data makes up a lower percentage of all data; unstructured data a higher percentage.
Accessibility: structured data is easy to access; unstructured data is hard to access.
Association: structured data is organised; unstructured data is scattered and dispersed.
Analysis: structured data is efficient to analyse; unstructured data needs additional preprocessing.
Appearance: structured data is formally defined; unstructured data is free-form.
STRUCTURED DATA STORAGE TECHNIQUE
This type of data storage is used in the context of storage-area network (SAN) environments. In such environments, data is stored in volumes, also referred to as blocks.
An arbitrary identifier is assigned to every block. It allows the block to be stored and
retrieved but there would be no metadata providing further context.
Virtual machine file system volumes and structured database storage are the use cases of
block storage.
When it comes to block storage, raw storage volumes are created on the device. With the
aid of a server-based system, the volumes are connected and each one is treated as an
individual hard drive.
UNSTRUCTURED
DATA STORAGE
TECHNIQUE
This particular technique is basically a way of storing,
organizing and accessing data on disk. The difference
however is that it is done so in a more scalable and cost-
effective manner.
This kind of storage system makes it possible to retain huge volumes of unstructured data. When it comes to storing photos on Facebook, songs on Spotify, or files in collaboration services such as Dropbox, object storage comes into play.
Each object incorporates the data itself, a rich set of metadata and a unique identifier. This kind of storage can be applied at different levels: device level, system level and interface level.
Since objects are robust, this kind of storage works well
for long-term storage of data archives, analytics data and
service provider storage with SLAs (Service-level
agreement) linked with data delivery.
Source for the last 9 slides: https://www.prowebscraper.com/blog/structured-vs-unstructured-data-best-thing-you-need-to-know/
8 VITAL ALTERNATIVE DATA TYPES
App Usage: behavioural data from purchases, etc.
Credit/Debit Card: Buying patterns and choices
Geo-Location: Tracking Wi-Fi or Bluetooth beacons
Public Data: Patents, Government Contracts, Import/Export data, etc
Satellite: Satellite feed and low-level drones for supply chain, tracking agriculture yields
and oil and gas storage, etc
Social or Sentiment: Social media, news, management communications, comments, shares,
likes on social media.
Web Data: data scraped from websites for product descriptions, flight bookings, real estate listings, etc.
Web Traffic: demographics of visitors to a particular website, for travel bookings and e-commerce, for example.
THE 8 V’S OF BIG DATA FROM 3 V’S (1, 5, 6)
https://www.m-brain.com/home/technology/big-data-with-8-vs/
https://www.educba.com/small-data-vs-big-data/
STORAGE
DISTRIBUTED VS CENTRALIZED NETWORKS FOR STORAGE
Centralized data networks are those that keep all the data on a single computer in a single location; to access the information, you must connect to the main computer of the system, known as the "server".
A distributed data network, on the other hand, works as a single logical data network installed across a series of computers (nodes) located in different geographic locations. The nodes are not connected to a single processing unit but are fully connected among themselves, providing integrity and accessibility of information from any point. In this system all the nodes contain information and all the clients of the system are on an equal footing. In this way, distributed data networks can perform autonomous processing. The clearest example is the blockchain, but there are others, such as Spanner, a distributed database created by Google.
Source for the next six slides: https://icommunity.io/en/redes-centralizadas-vs-distribuidas/
ADVANTAGES AND
DISADVANTAGES OF
CENTRALIZED, DECENTRALIZED
AND DISTRIBUTED DATA
NETWORKS.
Centralized and distributed networks have different characteristics, and with them different advantages and disadvantages. For example, centralized networks are the easiest to maintain since they have only a single point of failure; this is not the case for distributed networks, which in theory are more difficult to maintain.
[Diagram: centralised vs. decentralised vs. distributed networks]
BLOCKCHAIN IS A DISTRIBUTED DATA NETWORK
There are other types of distributed data networks besides the blockchain. In fact, consensus and the immutability of data are not unique characteristics of the blockchain; other distributed data networks also have these characteristics, such as Paxos, Raft, Google HDFS, Zebra, CouchDB and Datomic, among others.
But there are two characteristics that really differentiate the blockchain from the rest of the
data networks: the access control for writing and reading data is truly decentralized, unlike
other distributed data networks where it is logically centralized; and the ability to secure
transactions as there is no need for trusted third parties in a competitive environment.
The blockchain has unique characteristics over the rest of the available data networks.
However, this does not mean that for all possible cases of data storage, the best option is
always to use the blockchain. This really depends on the needs and requirements of a
company or organization when using a database.
COMPARATIVE SUMMARY
1. Security:
CENTRALIZED: If someone has access to the server with the information, any data can be added,
modified and deleted.
DISTRIBUTED: All data is distributed among the nodes of the network. If something is added, edited or deleted on any computer, it will be reflected on all the computers in the network. If a change is accepted as legitimate, the new information will be disseminated to the other users throughout the network; otherwise, the data will be overwritten to match the other nodes. The system is therefore self-sufficient and self-regulating, and the databases are protected against deliberate attacks or accidental changes of information.
2. Availability:
CENTRALIZED: Under many simultaneous requests, the server can break down and stop responding.
DISTRIBUTED: Can withstand significant pressure on the network. All the nodes in the network have the
data. Then, the requests are distributed among the nodes. Therefore, the pressure does not fall on a
computer, but on the entire network. In this case, the total availability of the network is much greater
than in the centralized one.
COMPARATIVE SUMMARY
3. Accessibility:
CENTRALIZED: If the central storage has problems, you will not be able to obtain your
information unless the problems are solved. In addition, different users have different needs,
but the processes are standardized and can be inconvenient for customers.
DISTRIBUTED: Given that the number of computers in a distributed network is large, DDoS attacks are possible only if their capacity is much greater than that of the network, and that would be a very expensive attack. In a centralized model, the response time is very similar in this case. Therefore, distributed networks can be considered secure.
COMPARATIVE SUMMARY
4. Data transfer rates:
CENTRALIZED: If the nodes are located in different countries or continents, the connection with
the server can become a problem.
DISTRIBUTED: In distributed networks, the client can choose the node and work with all the
required information.
5. Scalability:
CENTRALIZED: Centralized networks are difficult to scale because the capacity of the server is limited and the traffic cannot be infinite. In a centralized model, all clients are connected to the server, and only the server stores all the data, so all requests to receive, change, add or delete data go through the main computer. But server resources are finite; as a result, the server can work effectively only for a specific number of participants. If the number of clients is greater, the server load may exceed the limit at peak times.
DISTRIBUTED: Distributed models do not have this problem, since the load is shared among several computers.
ARTIFICIAL
INTELLIGENCE
TECHNOLOGY
STUDY – AI
PRIMER
Definition of AI
History of AI
• Symbolic AI
• Machine Learning
• Deep Learning
Primer Knowledge of AI
• Machine Learning
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning
• Deep Learning
DEFINITION
OF AI
AI originated more than 50 years ago, and it is generally agreed that John McCarthy coined the phrase "artificial intelligence" in a written proposal for a workshop at Dartmouth in 1956. AI is now commonly understood as the study and engineering of computations that make it possible to perceive, reason, act, learn and adapt.
In the widely referenced book, “Artificial Intelligence: A
Modern Approach ”, Dr Stuart Russell and Dr Peter
Norvig define AI as:
“The study of agents that receive percepts from the
environment and perform actions.”
DEFINITIONS
AND NOT
DEFINITION
The various definitions of AI can be laid out along two dimensions.
The definitions on top are concerned with thought processes and reasoning, whereas the ones on the bottom address behaviour.
The definitions on the left measure success in terms of fidelity to human performance, whereas the ones on the right measure against an ideal performance measure, rationality.
• SYMBOLIC AI
In the 1940s and 1950s, a handful of scientists from a variety of fields, including mathematics, psychology, engineering, economics, and political science, began to discuss the possibility of creating an artificial brain.
The term "Artificial Intelligence" was coined at a Dartmouth conference, and AI research was founded as an academic discipline in 1956.
At this early stage, teaching machines how to play chess was one of the main research focuses of AI. Chess has well-defined playing rules, and many experts believed that AI could be achieved by having programmers handcraft a sufficiently large set of explicit rules for manipulating knowledge; these rules are human-readable representations of problems and logic.
This is known as "Symbolic AI", and it was the dominant paradigm in AI from the 1950s to the late 1980s. Figure 2 illustrates how Symbolic AI works.
ILLUSTRATION OF SYMBOLIC AI
Symbolic AI reached its peak popularity during the "Expert Systems" boom of the 1980s.
Expert systems are a logical, knowledge-based approach.
Their power came from the expert knowledge they contained, but that same dependence also limited their further development.
The knowledge acquisition problem, along with the difficulty of growing and updating the knowledge base, were the major challenges of expert systems.
A new type of AI approach was needed to move beyond rule-based technologies at that time.
• MACHINE
LEARNING
Machine learning, reorganized as a subfield of AI, started to flourish in the 1990s.
Unlike Symbolic AI, machine learning does not require humans to supply the rules in advance.
It arises from this question: could a computer go beyond "what we know how to order it to perform" (Symbolic AI) and learn on its own how to perform a specified task?
With machine learning, humans input the data as well as the expected answers for that data, and the machine "learns" by itself and outputs the rules.
These learned rules can then be applied to new data to produce new answers.
Figure 3 on the next page illustrates the simple structure of machine learning.
ILLUSTRATION OF MACHINE LEARNING
(FRANCOIS, 2017)
THE DIFFERENCE
Starting in the 1990s, machine learning changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature.
It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics and probability theory (Langley, 2011).
• DEEP
LEARNING
AI has gone through a series of ups and downs, often referred to as "AI summers and winters", as interest in the field has alternately grown and diminished.
This is illustrated in Figure 4. In this evolution roadmap, we can see that AI is a general field which covers machine learning.
Deep learning is one particularly hot branch of machine learning, and it is the symbol of the current AI boom, which began roughly eight years ago.
EVOLUTION OF AI
LEARNING – MACHINE VERSUS DEEP
Compared to machine learning, deep learning automates the feature engineering of the
input data (the process of learning the optimal features of the data to create the best
outcome), and allows algorithms to automatically discover complex patterns and
relationships in the input data.
Deep learning is based on Artificial Neural Networks (ANNs), which were inspired by
information processing and distributed communication nodes in biological systems, like
the human brain.
Figure 5 shows the information-processing framework of the human brain and ANNs.
An ANN imitates the human brain's process by using multiple layers to progressively extract different levels of features/interpretations from the raw input data (each hidden layer represents one feature/interpretation of the data).
In essence, deep learning algorithms “learn how to learn”.
ALTHOUGH AI
RESEARCH
STARTED IN THE
1950S, ITS
EFFECTIVENESS
AND PROGRESS
HAVE BEEN MOST
SIGNIFICANT
OVER THE LAST
DECADE, DRIVEN
BY THREE
MUTUALLY
REINFORCING
FACTORS:
The availability of big data: various data
sources including businesses, e-commerce,
social media, science, wearable devices,
government, etc.
Dramatic improvement of machine learning algorithms: the sheer amount of available data accelerates algorithm innovation.
More powerful computing and cloud-based services: these make it possible to realize and implement advanced AI algorithms, like deep neural networks.
Significant progress in algorithms, hardware, and big data technology, combined with the financial incentives to find new products, has also contributed to the AI technology renaissance.
Today, AI has moved from "let the machine know what we know" to "let the machine learn what we may not know" to "let the machine automatically learn how to learn".
Researchers are working on much wider applications of AI that will revolutionize the ways in which people work, communicate, study and enjoy themselves.
Products and services incorporating such innovation will become part of people's day-to-day lives in the near future.
HUMAN BRAIN AND ARTIFICIAL NEURAL NETWORKS
Dendrites are the segments
of the neuron that receive
stimulation in order for the
cell to become active. They
conduct electrical messages
to the neuron cell body for
the cell to function.
The axon, also called a nerve fibre, is the portion of a nerve cell (neuron) that carries nerve impulses away from the cell body. A neuron typically has one axon that connects it with other neurons or with muscle or gland cells. Some axons may be quite long, reaching, for example, from the spinal cord down to a toe.
The function of the synapse is
to transfer electric activity
(information) from one cell to
another. The transfer can be
from nerve to nerve (neuro-
neuro), or nerve to muscle
(neuro-myo). The region
between the pre- and
postsynaptic membrane is very
narrow, only 30-50 nm.
ACTIVATION
Activation functions are
mathematical equations
that determine the output
of a neural network.
The function is attached to
each neuron in the network,
and determines whether it
should be activated
(“fired”) or not, based on
whether each neuron’s input
is relevant for the model’s
prediction.
A cost function is then adopted to measure the "error", that is, the difference between the true output value and the predicted output value.
It basically judges how wrong the learned model is in its current form.
The ideal goal is to have zero cost.
Usually, a minimum cost value is set as a stopping criterion.
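To make these two ideas concrete, here is a minimal sketch (not from the original slides) of a sigmoid activation applied to one neuron's weighted input, and a mean-squared-error cost comparing predictions to true outputs. The weights, inputs and target values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example: one neuron with three inputs.
inputs  = np.array([0.5, -1.2, 3.0])    # x1, x2, x3
weights = np.array([0.4,  0.7, -0.2])   # one weight per connection
bias    = 0.1

z = np.dot(weights, inputs) + bias      # weighted sum fed into the activation
activation = sigmoid(z)                 # how strongly the neuron "fires"

# Cost function: mean squared error between true and predicted outputs.
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
cost = np.mean((y_true - y_pred) ** 2)  # the "error" the network tries to minimise
print(activation, cost)
```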
FEED-FORWARD AND
BACKPROPAGATION LEARNING
1. BACKPROPAGATION
2. COST FUNCTION = ERROR
3. ADJUST THE WEIGHTS
4. MINIMISE THE COST
FUNCTION
5. GET THE “OPTIMUM”
WEIGHTS LAYER BY LAYER
ERROR,
BACKPROPAGA
TION,
GRADIENT
DESCENT
After obtaining the "error", the backpropagation process follows to reduce the current error cost.
Backpropagation tweaks the weights of the previous layer, aiming to get the value we want in the current layer.
We do this recursively through however many layers are in the network.
Gradient Descent is usually used to tweak the weights. It is a first-order iterative optimization algorithm for finding a minimum of the cost function.
In general, when we adjust the current weight, we move to the left or right of the current value, figure out which direction produces a slope with a lower value than the current one, take a small step in that direction, and then try again (Figure 9).
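As an illustration of "taking small steps downhill", the sketch below (not from the slides) runs plain gradient descent on a toy one-dimensional cost function; the cost function, starting weight and learning rate are arbitrary choices for demonstration.

```python
# Minimal gradient-descent sketch on a toy cost function C(w) = (w - 3)^2.
# Its derivative is dC/dw = 2 * (w - 3); the minimum lies at w = 3.

def cost(w):
    return (w - 3.0) ** 2

def gradient(w):
    return 2.0 * (w - 3.0)

w = 10.0              # arbitrary starting weight
learning_rate = 0.1   # size of each step downhill

for step in range(50):
    w -= learning_rate * gradient(w)   # move against the slope

print(round(w, 4), cost(w))            # w is now close to 3.0, where the cost is lowest
```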
GRADIENT
DESCENT
FORWARD AND
BACKWARD
Feed-forward and backpropagation form a cyclical learning process.
We may need to repeat it thousands or even millions of times before we find the minimum value of the cost function.
Once a neural network is trained, it may be
used to analyze new data.
That is, the practitioner stops the training and
allows the network to function in forward
propagation mode only.
The forward propagation output is the
predicted model used to interpret and make
sense of previously unknown input data.
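The following sketch (an illustration written for this primer, not the deck's Demo 1) puts the whole cycle together: a tiny two-layer network trained on the classic XOR problem with feed-forward, backpropagation and gradient descent. The layer sizes, learning rate and number of iterations are arbitrary; with an unlucky random initialisation it may need more iterations to converge.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data set (XOR): 4 samples, 2 input features, 1 output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialised weights and biases: 2 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
lr = 0.5  # learning rate

for epoch in range(10000):
    # Feed-forward: propagate the inputs through the network.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Error: difference between prediction and the true answer.
    error = output - y

    # Backpropagation: compute gradients layer by layer, from output back to input.
    d_out = error * output * (1 - output)                 # sigmoid derivative at output
    d_hid = (d_out @ W2.T) * hidden * (1 - hidden)        # error pushed back to hidden layer

    # Gradient descent: nudge every weight against its gradient.
    W2 -= lr * hidden.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hid;       b1 -= lr * d_hid.sum(axis=0, keepdims=True)

print(np.round(output, 2))  # predictions should approach [0, 1, 1, 0]
```

Once training stops, only the feed-forward half of this loop is reused to make predictions on new inputs, exactly as described above.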
DEMO 1: LET US DO A HANDS-ON ON NEURAL
NETWORK
Open the file Demo 1 Neural Network_David Lee
Watch the Videos
FURTHER ON
DEEP LEARNING
PRIMER
KNOWLEDGE
OF AI
To further understand how current AI works, this section introduces the primer knowledge of deep learning.
As machine learning is the basis of deep learning, a general introduction to some basic machine learning knowledge is given first.
MACHINE
LEARNING
Machine learning involves the creation of algorithms that can modify and adjust themselves, without human intervention, to produce the desired output by learning from input data.
Through this learning process, the machine can categorize similar people or things, discover or identify hidden or unknown patterns and relationships, and detect anomalous behaviour in the given data, which allows it to predict or estimate possible outcomes or actions for future data.
To do machine learning, we usually follow five steps, from data collection and data preparation to modelling, understanding and delivering the results (as shown in Figure 6).
MACHINE LEARNING WORKFLOW
Steps 1 and 2 are data preparation: they transform the raw data into structured data that the machine can read.
For example, to do image classification ("dog" or "cat"), we should know what kind of image features we need to extract and how to extract them, such as texture, edges, and shape.
We call these features the input data, usually represented as a vector or matrix (x1, x2, x3, ..., xn), where xi is one structured feature.
The output data is the corresponding label ("dog" or "cat").
Steps 4 and 5 are straightforward and easy to understand.
Step 3, model building, is the key process of machine learning. The processes machines use to learn are known as algorithms.
Based on the algorithm used at this step, machine learning can be further categorized into four big types: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning.
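As a small illustration of steps 1 and 2, the sketch below (written for this primer, with made-up feature choices) turns a raw greyscale image array into a simple structured feature vector (x1, x2, x3) of the kind a classic machine-learning model would consume in step 3.

```python
import numpy as np

def extract_features(image):
    """Turn a raw greyscale image (2-D array of pixel intensities)
    into a small structured feature vector (x1, x2, x3)."""
    gy, gx = np.gradient(image.astype(float))     # intensity gradients across the image
    edge_strength = np.sqrt(gx ** 2 + gy ** 2)
    x1 = image.mean()                              # overall brightness
    x2 = image.std()                               # contrast / texture proxy
    x3 = edge_strength.mean()                      # edge-density proxy
    return np.array([x1, x2, x3])

# Hypothetical raw input: a random 64x64 "image" standing in for a dog/cat photo.
raw_image = np.random.default_rng(1).integers(0, 256, size=(64, 64))
print(extract_features(raw_image))                 # structured input ready for model building
```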
• SUPERVISED
LEARNING
As its name suggests, a supervised learning algorithm is trained/taught using given examples.
The examples are labelled, meaning the desired output for each input is known.
For example, a credit card application can be labelled either as approved or rejected.
The algorithm receives a set of inputs (the applicants' information) along with the corresponding outputs (whether the application was approved or not) to foster learning.
Model building, or algorithm learning, is a process of minimizing the error between the estimated output and the correct output.
Learning stops when the algorithm achieves an acceptable level of performance, for example when the error falls below a pre-defined minimum.
The trained algorithm is then applied to unlabelled data to predict the likely output value, such as whether a new credit card application should be approved or not.
This is helpful for what we are familiar with in banking as Know Your Customer (KYC).
There are multiple supervised learning algorithms: Bayesian statistics, regression analysis, decision trees, random forests, support vector machines (SVM), ensemble models and so on.
Practical applications include risk assessment, fraud detection, image,
speech and text recognition, etc.
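A minimal supervised-learning sketch of the credit-card example above is shown below. The scikit-learn decision tree is an assumed tool choice, and the applicant features and labels are invented purely for illustration.

```python
# Supervised learning sketch: learn "approve / reject" from labelled past applications.
from sklearn.tree import DecisionTreeClassifier

# Each row is one past applicant: [age, annual income (k$), number of existing debts].
X_train = [[25, 30, 2], [40, 85, 0], [35, 60, 1], [22, 18, 3], [50, 120, 0], [30, 25, 4]]
# Labels (the "right answers"): 1 = application approved, 0 = rejected.
y_train = [0, 1, 1, 0, 1, 0]

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X_train, y_train)          # learning: minimise error against the known labels

# Apply the trained model to a new, unlabelled application.
new_applicant = [[28, 55, 1]]
print(model.predict(new_applicant))  # e.g. [1] -> approve
```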
• UNSUPERVISED
LEARNING
Unlike supervised learning, in unsupervised learning the algorithm is not trained/taught on the "right answer". The algorithm tries to explore the given data and detect or mine the hidden patterns and relationships within it. In this case there is no answer key; learning is based on the similarity/distance among the given data points.
Take bank customer understanding as an example: unsupervised learning can be used to identify several groups of bank customers. The customers in a specific group share similar demographic information or the same bank product selections. The learned homogeneous groups can help the bank figure out the hidden relationship between customers' demographics and their bank product selections.
This provides useful insights for customer targeting when the bank would like to promote a product to new customers. Unsupervised learning also works well with transactional data: it can be used to identify a group of individuals with similar purchase behaviour who can then be treated as a single homogeneous unit during marketing promotions.
Association rule mining, clustering (such as K-means), nearest-neighbour mapping, self-organizing maps, and dimensionality reduction (such as principal component analysis) are common and popular unsupervised learning algorithms.
Practical applications cover market basket analysis, customer
segmentation, anomaly detection and so on.
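The customer-segmentation example above can be sketched with K-means clustering. The use of scikit-learn, and the two demographic/spend features below, are illustrative assumptions rather than part of the original slides.

```python
# Unsupervised learning sketch: K-means clustering of bank customers into segments.
import numpy as np
from sklearn.cluster import KMeans

# Each row is one customer: [age, average monthly card spend (S$)].
customers = np.array([
    [23, 400], [25, 450], [27, 500],      # younger customers, lower spend
    [45, 2000], [48, 2200], [50, 2100],   # middle-aged customers, higher spend
    [65, 800], [68, 750], [70, 900],      # older customers, moderate spend
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # segment index assigned to each customer (no labels were given)
print(kmeans.cluster_centers_)  # the "typical" customer of each discovered segment
```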
DEMO 2: LET US DO A HANDS-ON ON K-MEAN AND K-NEAREST
NEIGHBOUR
Open Demo 2 K Mean and K NN_David Lee
Exercise in Data_k Mean_David Lee Exercise (1).xls
Answers in Data_k Mean_David Lee Exercise (2).xls
FURTHER ON
DEEP LEARNING
• SEMI-
SUPERVISED
LEARNING
Semi-supervised learning is used to address
similar problems as supervised learning.
However, in semi-supervised learning, the
machine is provided both labelled and
unlabelled data.
A small amount of labelled data is combined
with a large amount of unlabelled data.
When the cost associated with labelling is too
high to allow for a fully labelled training process,
semi-supervised learning is normally utilized.
Semi-supervised learning algorithms first train a model on the labelled data, then use that model to assign provisional (pseudo) labels to the large amount of unlabelled data.
A new model is then trained using this enlarged labelled data set.
For example, an online news portal wants to do web
pages classification or labelling.
Let’s say the requirement is to classify web pages into
different categories (i.e. Sports, Politics, Business,
Entertainment, etc.).
In this case, it is prohibitively expensive to go through
hundreds of millions of web pages and manually label
them.
Therefore the intent of semi-supervised learning is to
take as much advantage of the unlabelled data as
possible, to improve the trained model.
Image classification and text classification are good
practical applications of semi-supervised learning.
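The sketch below hand-rolls this self-training idea for the web-page example; the classifier choice, confidence threshold and toy features are all illustrative assumptions (scikit-learn also ships a ready-made SelfTrainingClassifier that follows the same pattern).

```python
# Semi-supervised learning sketch: self-training with a small labelled set
# and a large pool of unlabelled "web pages".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# A small hand-labelled set (e.g. pages tagged 0 = Sports, 1 = Business) ...
X_labelled = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y_labelled = np.array([0, 0, 1, 1])
# ... and a much larger pool of unlabelled pages (random features for illustration).
X_unlabelled = rng.random((200, 2))

# 1. Train an initial model on the labelled data only.
model = LogisticRegression(C=10.0).fit(X_labelled, y_labelled)

# 2. Pseudo-label the unlabelled pool, keeping only confident predictions.
probs = model.predict_proba(X_unlabelled)
confident = probs.max(axis=1) > 0.8
X_pseudo, y_pseudo = X_unlabelled[confident], probs[confident].argmax(axis=1)

# 3. Retrain on the enlarged "labelled" set.
X_all = np.vstack([X_labelled, X_pseudo])
y_all = np.concatenate([y_labelled, y_pseudo])
model = LogisticRegression(C=10.0).fit(X_all, y_all)
print(len(X_pseudo), "unlabelled pages were pseudo-labelled and reused for training")
```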
• REINFORCEMENT
LEARNING
The intent of reinforcement learning is to find the
best actions that lead to maximum reward or drive
the most optimal outcome.
The machine is provided with a set of allowed
actions, rules, and potential end states. In other
words, the rules of the game are defined. By
applying the rules, exploring different actions and
observing resulting reactions the machine learns
to exploit the rules to create the desired outcome.
The machine thus determines what series of actions, in what circumstances, will lead to an optimal or optimized result.
Reinforcement learning is the equivalent of teaching someone to play a game. The rules and objectives are clearly defined.
However, the outcome of any single game depends on the judgment of the player, who must adjust their approach in response to the environment and the skill and actions of a given opponent.
It is often utilized in gaming and robotics.
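A minimal tabular Q-learning sketch of these ideas is shown below (a toy example written for this primer, not from the slides): an agent in a five-cell corridor learns, by trial, error and reward, that moving right reaches the goal. The corridor, reward and hyperparameters are invented for illustration.

```python
import numpy as np

n_states, n_actions = 5, 2            # corridor cells 0..4; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # table of expected future reward per state/action
alpha, gamma, epsilon = 0.5, 0.9, 0.3 # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

for episode in range(1000):
    state = 0
    while state != n_states - 1:                   # an episode ends at the goal cell
        # Explore occasionally; otherwise exploit the best action found so far.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state = max(state - 1, 0) if action == 0 else state + 1
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: blend the observed reward with discounted future expectation.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

# Greedy policy per non-terminal cell; it should converge to 1 ("move right") everywhere.
print(Q[:-1].argmax(axis=1))
```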
A BEGINNER’S
GUIDE TO DEEP
REINFORCEMENT
LEARNING
https://pathmind.com/wiki/deep-reinforcement-learning
Deep reinforcement learning combines artificial neural networks with a reinforcement learning architecture that enables software-defined agents to learn the best possible actions in a virtual environment in order to attain their goals.
While neural networks are responsible for recent AI breakthroughs in problems like computer vision, machine translation and time series prediction, they can also be combined with reinforcement learning algorithms to create something astounding like DeepMind's AlphaGo, an algorithm that beat the world's top players of the board game Go.
Google DeepMind’s Deep Q-learning playing Atari Breakout
https://youtu.be/V1eYniJ0Rnk
MarI/O – Machine Learning for Video Games
https://youtu.be/qv6UVOQ0F44
GOOGLE DEEPMIND’S DEEP Q-LEARNING
https://www.youtube.com/watch?v=V1eYniJ0Rnk&feature=youtu.be
The algorithm plays Atari Breakout.
The most important thing to know is that all the agent is given is sensory input (what you see on the screen), and it is ordered to maximize the score on the screen.
No domain knowledge is involved! This means that the algorithm does not know the concept of a ball or what the controls exactly do.
Starting out, after 10 minutes of training: the algorithm tries to hit the ball back, but it is still too clumsy to manage.
After 120 minutes of training, it plays like an expert.
After 240 minutes of training, the magic happens: it realizes that digging a tunnel through the wall is the most effective technique to beat the game.
MORE – ALPHAGO
Reinforcement Learning – Ep. 30 (Deep Learning SIMPLIFIED)
https://www.youtube.com/watch?v=e3Jy2vShroE
Simulation and Automated Deep Learning
https://youtu.be/EHP47tM6ctc
Data is to machine learning what life is to human
learning. The output of a machine learning algorithm is
entirely dependent on the input data it is exposed to.
Therefore, to train a good machine learning model,
experts need to do good data preparation beforehand.
To some extent, machine learning performance depends
on the quality of the input data.
Deep learning follows a workflow similar to that of machine learning; its main advantage is that it does not necessarily need structured data as input.
Imitating the way the human brain works to solve problems, by passing queries through various hierarchies of concepts and related questions to find an answer, deep learning uses artificial neural networks to hierarchically define specific features via multiple layers (as shown in Figure 5).
Deep learning weakens machine learning's dependence on feature engineering, which makes it more general and easier to apply to more fields. The following section illustrates the primer knowledge of how deep learning works.
DEEP
LEARNING
Deep learning maps inputs to outputs via a sequence of simple data transformations (layers) in an Artificial Neural Network.
Take face recognition as an example. As shown in Figure 7, data (a face image) is presented to the network via the input layer, which connects to one or more hidden layers; the hidden layers further connect to an output layer.
Each hidden layer represents one level of face image features (greyscale, eye shape, facial contours, etc.).
Every node on each layer is connected to the nodes on the neighbouring layer with a weight value.
The actual processing of deep learning is done by adjusting the weights of each connection to realize the input-output mapping.
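A minimal sketch of such a layered network is shown below using the Keras API (an assumed tool choice, not something the slides prescribe). The layer sizes, the 64x64 greyscale input and the ten output classes are illustrative only.

```python
# Sketch of a layered network: input layer -> hidden layers -> output layer.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),           # input layer: the raw face image
    layers.Flatten(),
    layers.Dense(128, activation="relu"),     # hidden layer 1: lower-level features
    layers.Dense(64, activation="relu"),      # hidden layer 2: higher-level features
    layers.Dense(10, activation="softmax"),   # output layer: one score per identity (10 assumed)
])

# Compiling attaches the cost function and the weight-adjustment rule (optimiser);
# model.fit(images, labels, epochs=...) would then run feed-forward + backpropagation.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```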
EXAMPLE OF AN ANN USED FOR FACE RECOGNITION
Source for the next 4 slides: https://fortune.com/longform/ai-artificial-intelligence-deep-machine-learning/
GOOGLE
Google launched the deep-learning-focused Google Brain project in 2011, introduced
neural nets into its speech-recognition products in mid-2012, and retained neural nets
pioneer Geoffrey Hinton in March 2013. It now has more than 1,000 deep-learning projects
underway, it says, extending across search, Android, Gmail, photo, maps, translate,
YouTube, and self-driving cars. In 2014 it bought DeepMind, whose deep reinforcement
learning project, AlphaGo, defeated the world’s go champion, Lee Sedol, in March,
achieving an artificial intelligence landmark.
MICROSOFT
Microsoft introduced deep learning into its commercial speech-recognition products,
including Bing voice search and X-Box voice commands, during the first half of 2011. The
company now uses neural nets for its search rankings, photo search, translation systems,
and more. “It’s hard to convey the pervasive impact this has had,” says Lee. Last year it won
the key image-recognition contest, and in September it scored a record low error rate on a
speech-recognition benchmark: 6.3%.
FACEBOOK
In December 2013, Facebook hired French neural nets innovator Yann LeCun to direct its new
AI research lab. Facebook uses neural nets to translate about 2 billion user posts per day in more
than 40 languages, and says its translations are seen by 800 million users a day. (About half its
community does not speak English.) Facebook also uses neural nets for photo search and photo
organization, and it’s working on a feature that would generate spoken captions for untagged
photos that could be used by the visually impaired.
BAIDU
In May 2014, Baidu hired Andrew Ng, who had earlier helped launch and lead the Google Brain
project, to lead its research lab. China’s leading search and web services site, Baidu uses neural
nets for speech recognition, translation, photo search, and a self-driving car project, among
others. Speech recognition is key in China, a mobile-first society whose main language, Mandarin,
is difficult to type into a device. The number of customers interfacing by speech has tripled in the
past 18 months, Baidu says.
ARTIFICIAL
NEURAL
NETWORKS
(ANN)
Most ANNs contain a learning scheme that modifies the connection weights based on the input patterns and connection types they are presented with. Different schemes and architectures give rise to different deep neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
Here we take a simple ANN to illustrate the learning process of feed-forward and backpropagation, which is similar to how biological neural networks learn. Human brains learn to do complex things, such as recognizing objects, not by processing exhaustive rules but through experience, feedback and adjustment. Figure 8 gives an illustration of this process.
In the beginning, all the connections are randomly assigned weight values. In the feed-forward step, all the input nodes receive their respective values from the given input and pass a combination (for example, a weighted linear combination) to the nodes in the hidden layers.
Upon receiving the initial input, the hidden layers make a random guess as to what that pattern might be, using the assigned weights. There are various activation functions for the calculations at the hidden and output layers; the sigmoid (logistic) function remains the most popular among users.
DEMO 3: LET US MOVE TO THE DEMO FOR CNN, GAN AND VAE
Open Demo 3 Deep Learning Demo_David Lee
APPLICATIONS: MORE EXAMPLES
TESLA AUTOPILOT
https://www.tesla.com/autopilot?redirect=no
https://medium.com/@tomyuz/a-sentiment-analysis-approach-to-predicting-stock-returns-d5ca8b75a42
DEMO 4: NATURAL LANGUAGE PROCESSING
Open Demo 4 NLP_David Lee
REFERENCES
Langley, P. (2011). The changing science of machine learning. Machine Learning, 82(3), 275-279.
Francois, C. (2017). Deep learning with Python. Manning Publications Co., NY, USA.
The rest of the materials are prepared with the SUSS FinTech and Blockchain Team (Prof Reng Jin, Wang Yu, Low Swee Won) and a joint book with IMDA based on https://www.imda.gov.sg/infocomm-media-landscape/services-40
DEMO 5: DEEP FAKES
Open Demo 5 Deepfakes_David Lee