2. Topics to be covered
• Introduction to Machine Learning
• Supervised Learning
• Unsupervised Learning
• Python libraries for Machine Learning
3. ML
Machine Learning (ML) is automated learning
with little or no human intervention. It involves
programming computers so that they learn from
the available inputs. The main purpose of
machine learning is to explore and construct
algorithms that can learn from previous data
and make predictions on new input data.
4. What is Machine Learning?
• The capability of Artificial Intelligence systems
to learn by extracting patterns from data is
known as Machine Learning.
• Machine Learning is the idea of learning from
examples and experience, without being
explicitly programmed. Instead of writing task-specific code,
you feed data to a generic algorithm, and it
builds its logic from the data given.
5. Introduction to Machine Learning
• Python is a popular platform for research
and for developing production systems. It is
a large language with a number of modules,
packages and libraries that provide multiple
ways of achieving a task.
• Python and its libraries like NumPy, Pandas,
SciPy, Scikit-Learn, Matplotlib are used in data
science and data analysis. They are also
extensively used for creating scalable machine
learning algorithms.
6. • Python libraries implement popular machine learning
techniques such as Classification, Regression,
Recommendation, and Clustering.
• Python offers ready-made frameworks for
performing data mining tasks on large
volumes of data effectively and in less time.
7. Growth of Machine Learning
• Machine learning is the preferred approach to
– Speech recognition, Natural language processing
– Computer vision
– Medical outcomes analysis
– Robot control
– Computational biology
• This trend is accelerating
– Improved machine learning algorithms
– Improved data capture, networking, faster computers
– Software too complex to write by hand
– New sensors / IO devices
– Demand for self-customization to user, environment
– Knowledge turns out to be difficult to extract from human experts (the failure of
expert systems in the 1980s)
8. Types of Learning
• Supervised (inductive) learning
– Training data includes desired outputs
• Unsupervised learning
– Training data does not include desired outputs
• Reinforcement learning
– Rewards from sequence of actions
9. • Machine learning algorithms are commonly grouped
into four categories:
• Supervised learning algorithm
• Unsupervised learning algorithm
• Semi-supervised learning algorithm
• Reinforcement learning algorithm
10. Supervised Learning
• Supervised learning, as the name indicates, involves
a supervisor acting as a teacher.
We teach or train the machine using data that is well
labeled, meaning each training example is already tagged
with the correct answer. After training, the machine is
given a new set of examples (data), and the
supervised learning algorithm uses what it learned
from the training data (the set of training examples)
to predict the correct outcome for them.
11. • For instance, suppose you are given a basket filled
with different kinds of fruit. The first step is
to train the machine on the different fruits one by
one, like this:
If the object is rounded, has a depression at the top, and is
red in color, then it is labeled Apple.
If the object is a long curving cylinder with a green-yellow
color, then it is labeled Banana.
12. • Now suppose that, after training, you are
given a new fruit from the basket, say a banana,
and asked to identify it.
• The machine has already learned from the previous
data, and this time it has to apply that knowledge. It first classifies the fruit
by its shape and color, confirms the fruit name as
BANANA, and puts it in the Banana category. Thus the machine
learns from the training data (the basket of fruits) and
then applies the knowledge to test data (the new fruit).
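The train-then-identify steps in the fruit example can be sketched in a few lines of Python. This is only an illustration, not a production classifier: the two numeric features (roundness and redness, each scored from 0 to 1), the sample values, and the nearest-neighbour rule are all assumptions made for the sketch.

```python
import math

# Hypothetical labeled training data: (roundness, redness) -> fruit name.
# Both features are made-up scores between 0 and 1.
training_data = [
    ((0.90, 0.90), "Apple"),
    ((0.85, 0.80), "Apple"),
    ((0.20, 0.10), "Banana"),
    ((0.30, 0.20), "Banana"),
]


def predict(features, examples):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        examples, key=lambda example: math.dist(example[0], features)
    )
    return nearest_label


# A new, unlabeled fruit: long and green-yellow, so low on both scores.
new_fruit = (0.22, 0.12)
print(predict(new_fruit, training_data))
```

The labels in `training_data` play the role of the teacher: the program never contains a hand-written rule for what a banana looks like.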
13. Supervised learning is classified into two categories
of algorithms:
• Classification: A classification problem is when the
output variable is a category, such as “red” or
“blue”, or “disease” or “no disease”.
• Regression: A regression problem is when the
output variable is a real value, such as “dollars” or
“weight”.
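The fruit example earlier is a classification problem, because the output is a category. For contrast, here is a minimal regression sketch whose output is a real number. The height and weight figures are invented for illustration, and the ordinary least-squares line fit is just one simple way to do regression.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: height in cm -> weight in kg.
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 58, 66, 74]

slope, intercept = fit_line(heights_cm, weights_kg)

# The prediction for an unseen height is a real value, not a category.
predicted_weight = slope * 175 + intercept
print(round(predicted_weight, 2))
```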
14. Supervised learning deals with or learns with “labeled” data. This
implies that some data is already tagged with the correct answer.
Types:
• Regression
• Logistic Regression (despite its name, used for classification)
• Classification
• Naive Bayes Classifiers
• K-NN (k nearest neighbors)
• Decision Trees
• Support Vector Machine
15. Advantages:
• Supervised learning uses collected data to produce
outputs based on previous experience.
• Helps to optimize performance criteria with the help of experience.
• Supervised machine learning helps to solve various types of real-world
computation problems.
Disadvantages:
• Classifying big data can be challenging.
• Training for supervised learning needs a lot of computation time.
16. Unsupervised learning
• Unsupervised learning is the training of a machine using
information that is neither classified nor labeled,
allowing the algorithm to act on that information without
guidance. Here the task of the machine is to group
unsorted information according to similarities, patterns,
and differences without any prior training on labeled data.
• Unlike supervised learning, no teacher is provided, which
means the machine is given no labeled examples. Therefore
the machine is left to find the hidden structure in
unlabeled data by itself.
17. For instance, suppose the machine is given pictures of
dogs and cats that it has never seen.
The machine has no idea about the features of dogs and cats, so it cannot
label the pictures as ‘dogs’ and ‘cats’. But it can group them according to their
similarities, patterns, and differences, i.e., it can split the pictures
into two parts. The first may contain all pictures with dogs in them and the second
may contain all pictures with cats in them. The machine has not learned anything beforehand,
which means there is no training data and there are no labeled examples.
Unsupervised learning allows the model to work on its own to discover patterns and information that were
previously undetected. It mainly deals with unlabeled data.
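Grouping by similarity without labels can be sketched with a small k-means routine. This is a simplified illustration under stated assumptions: each animal is reduced to two invented measurements (body weight in kg, ear length in cm), and the first k points serve as the starting centers so the run is deterministic.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns a list of k lists of points."""
    # Deterministic start (an assumption of this sketch): first k points.
    centers = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centers[i]))
            clusters[nearest].append(point)
        # Update step: move each center to the mean of its cluster.
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Hypothetical unlabeled measurements: (weight in kg, ear length in cm).
animals = [(4.0, 6.5), (30.0, 10.0), (4.5, 7.0),
           (28.0, 11.0), (5.0, 6.0), (32.0, 12.0)]

groups = k_means(animals, k=2)
for number, group in enumerate(groups):
    print("group", number, group)
```

The algorithm only produces "group 0" and "group 1". It never learns the words "cat" or "dog", which is exactly the point of the example.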
18. Unsupervised learning is classified into two categories
of algorithms:
• Clustering: A clustering problem is where you want
to discover the inherent groupings in the data, such
as grouping customers by purchasing behavior.
• Association: An association rule learning problem is
where you want to discover rules that describe
large portions of your data, such as people who buy
X also tend to buy Y.
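A rule such as "people who buy X also tend to buy Y" is usually scored by its support (how often X and Y appear together) and its confidence (how often Y appears when X does). The sketch below computes both for a handful of invented shopping baskets. The data and the helper names are assumptions for illustration.

```python
# Hypothetical shopping baskets, one set of items per customer.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


# Rule under test: people who buy bread also tend to buy butter.
print(support({"bread", "butter"}, transactions))
print(confidence({"bread"}, {"butter"}, transactions))
```

Here 3 of the 4 baskets with bread also contain butter, so the rule holds for most bread buyers in this toy data set.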
20. Supervised vs. Unsupervised Machine Learning
Parameter                | Supervised machine learning                | Unsupervised machine learning
Input data               | Algorithms are trained using labeled data. | Algorithms are used against data that is not labeled.
Computational complexity | Simpler method                             | Computationally complex
Accuracy                 | Highly accurate                            | Less accurate
21. Reinforcement Learning
Reinforcement learning is an area of Machine Learning. It is
about taking suitable actions to maximize reward in a particular
situation. It is employed by various software and machines to find
the best possible behavior or path to take in a specific
situation. Reinforcement learning differs from supervised learning:
in supervised learning the training data comes with the answer
key, so the model is trained with the correct answers,
whereas in reinforcement learning there is no answer key and the
reinforcement agent decides what to do to perform the given task. In
the absence of a training dataset, it is bound to learn from its
experience.
22. • Example: We have an agent and a reward, with
many hurdles in between. The agent is supposed to find the best possible path
to reach the reward. The following scenario illustrates the
problem.
The scenario involves a robot, a diamond, and fire. The goal of the robot is to
get the reward, which is the diamond, and avoid the hurdles, which are the fire. The robot
learns by trying the possible paths and then choosing the path that gives it
the reward with the fewest hurdles. Each right step gives the robot a reward and
each wrong step subtracts from the robot's reward. The total reward is
calculated when it reaches the final reward, the diamond.
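The robot, diamond, and fire scenario can be sketched with tabular Q-learning, one common reinforcement learning method. Everything specific here is an assumption for illustration: the 3x3 grid layout, the reward values (+10 for the diamond, -10 for the fire, -1 for any other step), and the learning parameters.

```python
import random

ROWS, COLS = 3, 3
START, FIRE, DIAMOND = (0, 0), (1, 1), (2, 2)   # hypothetical layout
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply an action; return (next_state, reward, episode_finished)."""
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), ROWS - 1)   # walls keep the robot inside
    col = min(max(state[1] + d_col, 0), COLS - 1)
    next_state = (row, col)
    if next_state == DIAMOND:
        return next_state, 10.0, True
    if next_state == FIRE:
        return next_state, -10.0, True
    return next_state, -1.0, False                  # small cost for every other step


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(ROWS) for c in range(COLS) for a in ACTIONS}
    for _episode in range(episodes):
        state = START
        for _move in range(50):                     # cap the episode length
            if rng.random() < epsilon:              # explore occasionally
                action = rng.choice(list(ACTIONS))
            else:                                   # otherwise act greedily
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the best-valued action from START until the episode ends."""
    path, state = [START], START
    for _move in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _reward, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


q_table = train()
print(greedy_path(q_table))
```

No labeled "correct path" is ever supplied. The robot discovers a route to the diamond only from the rewards and penalties it collects while exploring.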
23. Main points in Reinforcement learning:
• Input: The input should be an initial state from which the model
will start.
• Output: There are many possible outputs, as there are a variety of
solutions to a particular problem.
• Training: The training is based upon the input. The model
returns a state, and the user decides whether to reward or punish the
model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
24. Types of Reinforcement: There are two types of
Reinforcement:
• Positive
Positive Reinforcement occurs when an event that follows a particular behaviour
increases the strength and the frequency of that behaviour. In other words, it has a positive effect
on behaviour. Advantages of positive reinforcement:
– Maximizes performance
– Sustains change for a long period of time
Disadvantage:
– Too much reinforcement can lead to an overload of states, which can diminish the results
• Negative
Negative Reinforcement is the strengthening of a behaviour because a negative condition is
stopped or avoided. Advantages of negative reinforcement:
– Increases behaviour
– Provides defiance to a minimum standard of performance
Disadvantage:
– It only provides enough to meet the minimum behaviour
25. Libraries and Packages
• To work through machine learning examples, you need basic
knowledge of Python programming. In addition, a
number of libraries and packages are generally used in
performing various machine learning tasks, as listed below:
– numpy – provides N-dimensional array objects
– pandas – a data analysis library that includes dataframes
– matplotlib – a 2D plotting library for creating graphs and plots
– scikit-learn – provides algorithms for data analysis and data mining
tasks
– seaborn – a data visualization library based on matplotlib