Artificial intelligence
PRESENTED BY
A. KEERTHIKA
M.Sc. (IT)
Suppose that we observe a response Y and p different
predictors X = (X₁, X₂, …, Xp). In general, we can say:
 Y = f(X) + ε
Here f is an unknown function, and ε is the random error term.
 In essence, statistical learning refers to a set of
approaches for estimating f.
 In cases where a set of X values is readily available but
the output Y is not, and since the error averages to zero,
we can say:
 Ŷ = f̂(X)
 where f̂ represents our estimate of f and Ŷ represents the
resulting prediction.
 Hence, for a given set of predictors X, we can show:
 E(Y − Ŷ)² = E[f(X) + ε − f̂(X)]² = [f(X) − f̂(X)]² + Var(ε)
where:
 E(Y − Ŷ)² represents the expected value of the squared
difference between the actual and the predicted result.
 [f(X) − f̂(X)]² represents the reducible error. It is
reducible because we can potentially improve the accuracy
of f̂ through better modeling.
 Var(ε) represents the irreducible error. It is irreducible
because no matter how well we estimate f, we cannot
reduce the error introduced by the variance of ε.
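The decomposition above can be checked with a small simulation; the true function, the deliberately imperfect estimate f̂, and the noise level below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                      # the (normally unknown) true function
    return np.sin(x)

def f_hat(x):                  # a deliberately imperfect estimate of f
    return 0.9 * np.sin(x) + 0.1

sigma = 0.5                    # std. dev. of the irreducible noise ε
x = rng.uniform(0, 2 * np.pi, 100_000)
y = f(x) + rng.normal(0, sigma, x.size)

mse = np.mean((y - f_hat(x)) ** 2)           # E(Y - Ŷ)²
reducible = np.mean((f(x) - f_hat(x)) ** 2)  # average of [f(X) - f̂(X)]²
irreducible = sigma ** 2                     # Var(ε)

# The two sides of the decomposition agree up to sampling noise
print(mse, reducible + irreducible)
```

However good f̂ becomes, the simulated error never drops below Var(ε), which is the point of the decomposition.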
 Response variables, Y, can be broadly characterised
as quantitative or qualitative (also known
as categorical). Quantitative variables take on
numerical values, e.g., age, height, income, and
price. Estimating quantitative responses is often
termed a regression problem. Qualitative
variables take on categorical values, e.g., gender,
brand, and part of speech. Estimating
qualitative responses is often termed
a classification problem.
 Variance refers to the amount by which f̂ would change if
we estimated it using a different training data set. In general,
when we over-fit a model to a given training data
set (the reducible error on the training set is very low but on
the test set is very high), we get a model with high variance, since
any change in the data points would result in a
significantly different model.
 Bias refers to the error introduced by approximating a real-
life problem, which may be extremely complicated, by a
much simpler model: for example, modeling a non-linear
problem with a linear model. In general, over-fitting
a model to a given data set results in very low bias.
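This trade-off can be illustrated by fitting polynomials of different flexibility to many resampled noisy training sets; the true function, noise level, and degrees below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 20)
x0 = 0.25                                  # point at which predictions are compared

def true_f(x):
    return np.sin(2 * np.pi * x)

def predictions_at_x0(degree, n_datasets=500):
    """Fit a polynomial of the given degree to many freshly resampled
    noisy training sets; collect each fit's prediction at x0."""
    preds = []
    for _ in range(n_datasets):
        y = true_f(x_train) + rng.normal(0, 0.3, x_train.size)
        coeffs = np.polyfit(x_train, y, degree)
        preds.append(np.polyval(coeffs, x0))
    return np.array(preds)

simple = predictions_at_x0(degree=1)       # under-fit: high bias, low variance
flexible = predictions_at_x0(degree=9)     # over-fit: low bias, high variance

print("degree 1:  bias", abs(simple.mean() - true_f(x0)), "var", simple.var())
print("degree 9:  bias", abs(flexible.mean() - true_f(x0)), "var", flexible.var())
```

The flexible fit tracks each noisy training set closely, so its prediction at x0 swings from data set to data set (high variance), while the straight line barely moves but is systematically wrong (high bias).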
Linear regression is a supervised statistical learning
method used for predicting quantitative responses.
 The Simple Linear Regression approach predicts a
quantitative response Ŷ based on a single variable X,
assuming a linear relationship. We can say:
 Ŷ ≈ β₀ + β₁X
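The least-squares estimates of β₀ and β₁ have a closed form; here is a sketch on synthetic data with arbitrarily chosen true coefficients:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, x.size)   # true β₀ = 2, β₁ = 3, plus noise

# Closed-form least-squares estimates for simple linear regression
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

y_hat = beta0 + beta1 * x                        # Ŷ ≈ β₀ + β₁X
print(beta0, beta1)
```

With enough data the estimates land close to the true coefficients used to generate the sample.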
Potential problems when fitting a linear regression model include:
 Non-constant variance of the error terms (heteroscedasticity).
 Outliers: points where the actual response is very far from the
predicted one; these can arise from inaccurate recording of
data.
 High-leverage points: unusual values of the predictors that
strongly impact the regression line.
 Collinearity: when two or more predictor variables are
closely related to each other, it may be challenging to separate
out the individual effect of a single predictor variable.
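Two of these problems can be checked numerically. A sketch on synthetic data, using the diagonal of the hat matrix to flag high-leverage points and pairwise correlation to flag collinearity:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(0, 1, n)
x1[0] = 8.0                           # plant one high-leverage point
x2 = x1 + rng.normal(0, 0.05, n)      # nearly collinear with x1

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept

# Leverage: diagonal of the hat matrix H = X (XᵀX)⁻¹ Xᵀ
H = X @ np.linalg.pinv(X.T @ X) @ X.T
leverage = np.diag(H)
print("max leverage:", leverage.max(), "at index", leverage.argmax())

# Collinearity: predictor correlation close to ±1 is a warning sign
corr = np.corrcoef(x1, x2)[0, 1]
print("corr(x1, x2):", corr)
```

The planted point dominates the leverage values, and the near-duplicate predictors show a correlation close to 1, which is exactly what makes their individual effects hard to separate.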
 KNN Regression is a non-parametric approach to
estimating or predicting values that does not assume
a form for f(X). It estimates f̂(x₀), where x₀ is
a prediction point, by averaging the responses in N₀,
the neighborhood of the K training points
closest to x₀. We can say:
 f̂(x₀) = (1/K) Σ over xᵢ ∈ N₀ of yᵢ
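A minimal KNN regression sketch following this definition (the toy data and the choice K = 3 are arbitrary):

```python
import numpy as np

def knn_regress(x_train, y_train, x0, k=3):
    """Predict f̂(x0) as the average response of the k training
    points closest to x0 (Euclidean distance in 1-D)."""
    dist = np.abs(x_train - x0)
    nearest = np.argsort(dist)[:k]        # indices of the k nearest neighbours
    return y_train[nearest].mean()

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.0, 1.0, 4.0, 9.0, 16.0])   # y = x², no noise

pred = knn_regress(x_train, y_train, x0=2.1, k=3)
# the 3 nearest points to 2.1 are x = 2, 3, 1 → mean of (4, 9, 1) = 14/3
print(pred)
```

No functional form for f is assumed anywhere; the prediction is driven entirely by the local neighborhood.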
 The responses, as discussed so far, may not always
be quantitative; they can also be qualitative. Predicting these
qualitative responses is called classification.
 We will discuss various statistical approaches to classification
including:
 SVM
 Logistic Regression
 KNN Classifier
 GAM
 Trees
 Random Forest
 Boosting
 SVM, or support vector machine, is the classifier that
maximizes the margin. The goal of the classifier is to
find a line or, more generally, an (n−1)-dimensional
hyperplane that separates the two classes present in
the n-dimensional space. I have written a
detailed article explaining the derivation and
formulation of SVM. In my opinion, it is one of the
most powerful techniques in our toolbox of statistical
methods in AI.
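The margin-maximizing idea can be sketched with sub-gradient descent on the hinge loss, i.e. a soft-margin linear SVM in primal form; the data and hyper-parameters below are arbitrary choices, not the full derivation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two linearly separable 2-D classes, labels in {-1, +1}
X = np.vstack([rng.normal([2, 2], 0.5, (50, 2)),
               rng.normal([-2, -2], 0.5, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

w = np.zeros(2)
b = 0.0
lam = 0.01          # regularization strength (controls margin width)
lr = 0.1            # learning rate

for epoch in range(200):
    for i in rng.permutation(len(y)):
        margin = y[i] * (X[i] @ w + b)
        if margin < 1:                       # inside margin: hinge-loss sub-gradient
            w += lr * (y[i] * X[i] - lam * w)
            b += lr * y[i]
        else:                                # outside margin: only shrink w
            w -= lr * lam * w

accuracy = np.mean(np.sign(X @ w + b) == y)
print("training accuracy:", accuracy)
```

Shrinking w while keeping every margin above 1 is what pushes the separating hyperplane toward the maximum-margin solution.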
 KNN (K nearest neighbors) Classifier is a lazy learning
technique, where the training data set is represented in a
Euclidean space and each test point is assigned a label
based on its K nearest neighbors by Euclidean distance.
 Practical Aspects
 K should be chosen empirically, and preferably odd, to
avoid ties.
 KNN can handle both discrete and continuous target
functions.
 A weighted contribution (e.g., distance-based) from the
different neighbors can be used when computing the final label.
 Advantages of KNN
 We can learn a complex target function.
 No information is lost, since the training data is retained in full.
 Disadvantages of KNN
 The cost of classifying new instances is very high.
 Significant computation takes place at classification
time rather than at training time.
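A minimal KNN classifier sketch using Euclidean distance and a simple majority vote (the toy points and K are arbitrary):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x0, k=3):
    """Assign x0 the majority label among its k nearest
    training points by Euclidean distance."""
    dist = np.linalg.norm(X_train - x0, axis=1)
    nearest = np.argsort(dist)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(X_train, y_train, np.array([0.5, 0.5])))   # → red
print(knn_classify(X_train, y_train, np.array([5.5, 5.5])))   # → blue
```

Note that all the work, including the distance computation, happens at prediction time, which is exactly the "lazy learning" cost mentioned above.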
All the above methods relied on some form of annotated data
set. When we want to learn patterns in our data
without any annotations, unsupervised learning comes
into the picture.
The most widely used statistical method for
unsupervised learning is K-Means Clustering. We take
K random points in our data set and map every other
point to one of K regions based on its closeness
to the chosen points. Then we move each of the K
points to the centroid of the cluster thus
formed. We repeat this until we observe a negligible
change in the clusters after each iteration.
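The iteration described above can be sketched as follows (the data and initialization are arbitrary; a production implementation would also handle empty clusters and multiple restarts):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # k random data points
    for _ in range(n_iter):
        # Assign each point to its closest centroid
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):          # negligible change: stop
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs → two recovered centers
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(np.sort(centroids[:, 0]))
```

On well-separated blobs like these, the recovered centroids land near the blob centers regardless of which random points start the iteration.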