Discriminant analysis

Presented by
Amritashish Bagchi, Anshuman Mishra & Sukanta Goswami

Definition
Discriminant analysis is a multivariate
statistical technique used for classifying a
set of observations into predefined groups.

OBJECTIVE
To understand group differences and to predict the likelihood that a particular entity will belong to a particular class or group, based on the independent variables.

Purpose
1) The main purpose is to classify a subject into one of two groups on the basis of some independent traits.
2) A second purpose of discriminant analysis is to study the relationship between group membership and the variables used to predict it.

Situations for its use
• When the dependent variable is dichotomous or multichotomous.
• When the independent variables are metric, i.e. interval or ratio.

Application of discriminant analysis
To identify the characteristics on the basis of which one can classify an individual as:
1. A basketballer or volleyballer, on the basis of anthropometric variables.
2. A high or low performer, on the basis of skill.
3. Belonging to the junior or senior category, on the basis of maturity parameters.

What we do in discriminant analysis
• It is also known as discriminant function analysis.
• In discriminant analysis, the dependent variable is a categorical variable, whereas the independent variables are metric.
• After developing the discriminant model, the discriminant function Z is computed for a given set of new observations, and the subject/object is assigned to the first group if the value of Z is less than 0 and to the second group if it is more than 0. This criterion holds true if an equal number of observations are taken in both groups for developing the discriminant function.
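The assignment rule above can be sketched in a few lines of Python. This is only an illustration: it assumes Z is a constant plus a weighted sum of the independent variables, the coefficient values and the new observation are made up, and `discriminant_score` and `assign_group` are hypothetical helper names.

```python
def discriminant_score(constant, coefficients, observation):
    """Return Z = c + b1*X1 + ... + bn*Xn for one observation."""
    if len(coefficients) != len(observation):
        raise ValueError("need one coefficient per independent variable")
    return constant + sum(b * x for b, x in zip(coefficients, observation))


def assign_group(z, cutoff=0.0):
    """Assign to group 1 if Z is below the cutoff, otherwise to group 2.

    A cutoff of 0 assumes equal group sizes. A score exactly equal to the
    cutoff is sent to group 2 here, which is an arbitrary choice.
    """
    return 1 if z < cutoff else 2


# Hypothetical model with two independent variables (illustration only).
c = -10.0
b = [0.05, 0.02]
new_subject = [180.0, 40.0]

z = discriminant_score(c, b, new_subject)
print(z, assign_group(z))
```

With unequal group sizes the cutoff would no longer be 0, so it is kept as a parameter rather than hard-coded.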

Assumptions
1. Sample size
• Group sizes of the dependent variable should not be grossly different, e.g. 80:20; in that case logistic regression may be preferred.
• The sample size should be at least five times the number of independent variables.
2. Normal distribution
• Each of the independent variables is normally distributed.
3. Homogeneity of variances / covariances
• All variables have linear and homoscedastic relationships.
4. Outliers
• Outliers should not be present in the data. DA is highly sensitive to the inclusion of outliers.
5. Non-multicollinearity
• There should not be any high correlation among the independent variables.
6. Mutually exclusive
• The groups must be mutually exclusive, with every subject or case belonging to only one group.
7. Classification
• Each of the allocations for the dependent categories in the initial classification is correctly classified.
8. Variability
• No independent variable should have zero variability in either of the groups formed by the dependent variable.
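Some of these assumptions (sample size, group balance and non-zero variability) can be screened mechanically before running the analysis. The sketch below is one possible way to do so. The function name `check_basic_assumptions`, the 0.8 share threshold (taken from the 80:20 example above) and the sample data are assumptions made for illustration.

```python
from statistics import pvariance


def check_basic_assumptions(groups, min_cases_per_variable=5, max_share=0.8):
    """Return a list of warnings about sample size, balance and variability.

    groups maps a group label to a list of cases; each case is a list of
    values, one per independent variable.
    """
    warnings = []
    sizes = {label: len(cases) for label, cases in groups.items()}
    total = sum(sizes.values())
    n_vars = len(next(iter(groups.values()))[0])

    # Assumption 1: total sample at least five times the number of predictors.
    if total < min_cases_per_variable * n_vars:
        warnings.append("sample size is below 5 cases per independent variable")

    # Assumption 1: group sizes should not be grossly different (e.g. 80:20).
    if max(sizes.values()) / total >= max_share:
        warnings.append("group sizes are grossly different")

    # Assumption 8: no variable may have zero variance within any group.
    for label, cases in groups.items():
        for j in range(n_vars):
            if pvariance([case[j] for case in cases]) == 0:
                warnings.append(f"variable {j} has zero variance in group {label}")
    return warnings


# Made-up data: two variables, five cases per group.
sample = {
    "A": [[170, 30], [175, 32], [180, 31], [172, 33], [178, 30]],
    "B": [[185, 30], [190, 30], [188, 30], [192, 30], [186, 30]],
}
print(check_basic_assumptions(sample))
```

Normality, homogeneity of covariances and outliers need proper statistical tests and are deliberately left out of this check.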

Terminology
1) Variables in the analysis
2) Discriminant function
• A discriminant function is a latent variable which is constructed as a linear combination of independent variables, such that
  Z = c + b_1 X_1 + b_2 X_2 + … + b_n X_n
  where c is the constant and b_1, …, b_n are the discriminant coefficients.
• The discriminant function is also known as the canonical root. It is used to classify the subjects/cases into one of the two groups on the basis of the observed values of the predictor variables.
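To make the linear combination concrete, here is a minimal sketch that estimates c, b1 and b2 for two groups and two predictors using Fisher's classical rule (pooled within-group covariance, constant chosen so that Z = 0 midway between the group means). This is a swapped-in textbook technique, not necessarily the procedure used in the presentation. The scaling of the coefficients may differ from what a package such as SPSS reports, and all names and data are hypothetical.

```python
def mean_vector(cases):
    """Mean of each of the two predictors."""
    n = len(cases)
    return [sum(case[j] for case in cases) / n for j in range(2)]


def scatter_matrix(cases, mean):
    """Sum of squares and cross-products of deviations from the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for case in cases:
        d = [case[0] - mean[0], case[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_two_group_function(group1, group2):
    """Return (c, [b1, b2]) for Z = c + b1*X1 + b2*X2."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    dof = len(group1) + len(group2) - 2
    # Pooled within-group covariance matrix.
    w = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    inv = [[w[1][1] / det, -w[0][1] / det], [-w[1][0] / det, w[0][0] / det]]
    # Coefficients point from the group 1 mean towards the group 2 mean,
    # so group 1 scores fall below 0 and group 2 scores above 0.
    diff = [m2[0] - m1[0], m2[1] - m1[1]]
    b = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]
    # Constant chosen so that Z = 0 midway between the two group means.
    mid = [(m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2]
    c = -(b[0] * mid[0] + b[1] * mid[1])
    return c, b


# Made-up data with equal group sizes.
sample_group1 = [[1, 2], [2, 1], [2, 3], [3, 2]]
sample_group2 = [[5, 6], [6, 5], [6, 7], [7, 6]]
c, b = fit_two_group_function(sample_group1, sample_group2)
print(c, b)
```

The 2x2 inverse is written out by hand to keep the example free of dependencies; with more predictors a linear-algebra library would be the natural choice.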







3) Classification matrix
In DA, it serves as a yardstick for measuring the accuracy of a model in classifying an individual/case into one of the two groups. It is also known as the confusion matrix, assignment matrix, or prediction matrix. It tells us what percentage of the existing data points are correctly classified by the model developed in DA.
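As an illustration of how such a matrix and the percentage of correctly classified cases are obtained, consider the following sketch. The group labels and the actual/predicted memberships are made up, and the function names are hypothetical.

```python
def classification_matrix(actual, predicted, labels=(1, 2)):
    """Rows are actual groups, columns are predicted groups."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def percent_correct(matrix):
    """Percentage of cases on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Made-up memberships for eight cases.
actual = [1, 1, 1, 1, 2, 2, 2, 2]
predicted = [1, 1, 1, 2, 2, 2, 1, 2]
matrix = classification_matrix(actual, predicted)
print(matrix, percent_correct(matrix))  # prints [[3, 1], [1, 3]] 75.0
```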
4) Stepwise method of discriminant analysis
The discriminant function can be developed either by entering all independent variables together or stepwise, depending upon whether the study is confirmatory or exploratory.










5) Power of discriminatory variables
After developing the model in the discriminant
analysis based on the selected independent
variables, it is important to know the relative
importance of the variables so selected.
6) Box's M Test
Using Box's M test, we test the null hypothesis that the covariance matrices do not differ between the groups formed by the dependent variable. If Box's M test is insignificant, it indicates that the assumption of equal covariance matrices required for DA holds true.
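7) Eigenvalues
The eigenvalue is an index of overall fit: it is the ratio of the between-group to the within-group sum of squares of the discriminant scores, so a larger eigenvalue indicates better discrimination.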
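For a single discriminant function, the eigenvalue can be computed directly from the discriminant scores as the between-group sum of squares divided by the within-group sum of squares. The sketch below assumes the scores have already been computed; the scores shown are made up and `eigenvalue_from_scores` is a hypothetical name.

```python
def eigenvalue_from_scores(scores_by_group):
    """Between-group / within-group sum of squares of discriminant scores."""
    all_scores = [z for scores in scores_by_group for z in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = 0.0
    ss_within = 0.0
    for scores in scores_by_group:
        group_mean = sum(scores) / len(scores)
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        ss_within += sum((z - group_mean) ** 2 for z in scores)
    return ss_between / ss_within


# Made-up discriminant scores for two groups of three cases each.
print(eigenvalue_from_scores([[-3, -2, -1], [1, 2, 3]]))  # prints 6.0
```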






8) Wilks' lambda
It measures the efficiency of the discriminant function in the model. Its value shows the proportion of variability in the dependent variable that is not explained by the independent variables, so smaller values indicate a better model.






9) Canonical correlation
The canonical correlation is the multiple correlation between the predictors and the discriminant function. With only one function it provides an index of overall model fit, and its square (R²) is interpreted as the proportion of variance explained.
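With a single discriminant function, Wilks' lambda and the canonical correlation are both simple functions of the eigenvalue: lambda = 1 / (1 + eigenvalue) and R = sqrt(eigenvalue / (1 + eigenvalue)), so that R² + lambda = 1. The sketch below illustrates these relationships for a made-up eigenvalue; the function names are hypothetical.

```python
import math


def wilks_lambda(eigenvalue):
    """Unexplained proportion of variance for a single discriminant function."""
    return 1.0 / (1.0 + eigenvalue)


def canonical_correlation(eigenvalue):
    """Canonical correlation for a single discriminant function."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


eigenvalue = 3.0  # made-up value for illustration
lam = wilks_lambda(eigenvalue)
r = canonical_correlation(eigenvalue)
print(lam, r, r ** 2 + lam)
```

The explained proportion (R²) and the unexplained proportion (Wilks' lambda) add up to 1, which is why a small lambda and a large canonical correlation go together.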

STEPS IN ANALYSIS:
STEP 1. The independent variables which have discriminating power are chosen.
STEP 2. A discriminant function model is developed by using the coefficients of the independent variables.

STEP 3. Wilks' lambda is computed for testing the significance of the discriminant function.
STEP 4. The independent variables which are important in discriminating between the groups are identified.


STEP 5. The subjects are classified into their respective groups.

APPLICATION OF SPSS
E.g. to classify players into different categories during the selection process.

Box's Test of Equality of Covariance Matrices

Where the variables are:
• Height
• Back explosive power
• Judgement
• Patience

Means of the Transformed Group Centroids
[Figure: means of group 1 (Batsmen) and group 2 (Bowler) on the discriminant axis; values shown: 4.390, -4.390 and 0.]
