Presented by Amritashish Bagchi, Anshuman Mishra & Sukanta Goswami
Definition
Discriminant analysis is a multivariate statistical technique used for classifying a set of observations into predefined groups.

OBJECTIVE
• To understand group differences and to predict the likelihood that a particular entity will belong to a particular class or group, based on independent variables.
Purpose
1) The main purpose is to classify a subject into one of two groups on the basis of some independent traits.
2) A second purpose of discriminant analysis is to study the relationship between group membership and the variables used to predict that membership.
Situations for its use
• When the dependent variable is dichotomous or multichotomous.
• When the independent variables are metric, i.e. interval or ratio.
Application of discriminant analysis
• To identify the characteristics on the basis of which one can classify an individual as:
1. A basketballer or a volleyballer, on the basis of anthropometric variables.
2. A high or low performer, on the basis of skill.
3. A junior or a senior, on the basis of maturity parameters.
What we do in discriminant analysis
• It is also known as discriminant function analysis.
• In discriminant analysis, the dependent variable is categorical, whereas the independent variables are metric.
• After the discriminant model has been developed, the discriminant function Z is computed for each new observation. The subject/object is assigned to the first group if Z is less than 0 and to the second group if Z is greater than 0. This criterion holds only if an equal number of observations from each group is used to develop the discriminant function.
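
The assignment rule above can be sketched in a few lines of Python. This is a minimal illustration, not the output of any fitted model: the constant, the coefficients and the function names `discriminant_score` and `assign_group` are hypothetical, and the cutoff of 0 assumes equal group sizes, as stated above.

```python
def discriminant_score(constant, coefficients, values):
    """Compute Z = c + b1*X1 + ... + bn*Xn for one observation."""
    if len(coefficients) != len(values):
        raise ValueError("one coefficient is needed per independent variable")
    return constant + sum(b * x for b, x in zip(coefficients, values))


def assign_group(z, cutoff=0.0):
    """Return 1 (first group) if Z is below the cutoff, otherwise 2.

    The rule in the text does not cover Z exactly equal to the cutoff;
    this sketch places that case in group 2 (an arbitrary choice).
    """
    return 1 if z < cutoff else 2


# Hypothetical constant and coefficients, chosen only for illustration.
c = -10.0
b = [0.05, 0.02]
z = discriminant_score(c, b, [150.0, 60.0])  # -10 + 7.5 + 1.2, about -1.3
print(z, assign_group(z))
```
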
Assumptions
1. Sample size
• Group sizes of the dependent variable should not be grossly different (e.g. 80:20); in that case logistic regression may be preferred.
• The sample size should be at least five times the number of independent variables.
2. Normal distribution
• Each of the independent variables is normally distributed.
3. Homogeneity of variances / covariances
• All variables have linear and homoscedastic relationships.
4. Outliers
• Outliers should not be present in the data. DA is highly sensitive to the inclusion of outliers.
5. Non-multicollinearity
• There should not be any strong correlation among the independent variables.
6. Mutually exclusive
• The groups must be mutually exclusive, with every subject or case belonging to only one group.
7. Classification
• Each case is assumed to be correctly assigned to its category of the dependent variable in the initial classification.
8. Variability
• No independent variable should have zero variability in any of the groups formed by the dependent variable.
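
Some of the assumptions above can be screened mechanically before running the analysis. The sketch below checks only three of them (total sample size, group balance, and zero variability within a group). The function name `screen_assumptions` and the 4:1 imbalance threshold (taken from the 80:20 example) are assumptions made for illustration, not fixed rules.

```python
def screen_assumptions(groups, n_predictors, max_ratio=4.0):
    """Return a list of warnings for a dict {group label: list of rows}.

    Each row is a list of predictor values for one subject.
    max_ratio is a hypothetical threshold for "grossly different" sizes.
    """
    warnings = []
    sizes = {label: len(rows) for label, rows in groups.items()}

    # Sample size: at least five times the number of independent variables.
    if sum(sizes.values()) < 5 * n_predictors:
        warnings.append("sample size is below 5 x number of predictors")

    # Group balance: flag splits as uneven as 80:20 or worse.
    if max(sizes.values()) >= max_ratio * min(sizes.values()):
        warnings.append("group sizes are grossly different")

    # Variability: no predictor may be constant within a group.
    for label, rows in groups.items():
        for j in range(n_predictors):
            if len({row[j] for row in rows}) == 1:
                warnings.append(
                    f"predictor {j} has zero variability in group {label}"
                )
    return warnings


# Made-up data: predictor 0 is constant in group B.
data = {
    "A": [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]],
    "B": [[1, 7], [1, 8], [1, 9], [1, 6], [1, 5]],
}
print(screen_assumptions(data, n_predictors=2))
```

Normality, outliers, multicollinearity and homogeneity of covariances are not covered by this sketch and need their own tests.
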
Terminology
1) Variables in the analysis
2) Discriminant function
• A discriminant function is a latent variable constructed as a linear combination of the independent variables, such that
  Z = c + b1X1 + b2X2 + … + bnXn
  where c is a constant and b1, …, bn are the discriminant coefficients.
• The discriminant function is also known as a canonical root. It is used to classify the subjects/cases into one of the two groups on the basis of the observed values of the predictor variables.
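
One common way to obtain c and b1…bn for two groups is Fisher's linear discriminant, which uses the pooled within-group covariance matrix. The sketch below is a pure-Python illustration for exactly two predictors, with made-up data. The function names are hypothetical, and the coefficients are left unscaled, so they may differ in scale from those reported by a statistical package. The sign is chosen so that group 1 scores fall below 0 and group 2 scores above 0.

```python
def mean_vector(rows):
    """Mean of each of the two predictors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 sums of squares and cross-products about the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fit_two_group_discriminant(group1, group2):
    """Return (c, [b1, b2]) for Z = c + b1*X1 + b2*X2."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    dof = len(group1) + len(group2) - 2
    pooled = [[(s1[a][b] + s2[a][b]) / dof for b in range(2)] for a in range(2)]

    # Invert the 2x2 pooled covariance matrix.
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]

    # b = S_pooled^-1 (mean2 - mean1); c centres Z at the midpoint of the means.
    diff = [m2[0] - m1[0], m2[1] - m1[1]]
    b = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    c = -(b[0] * (m1[0] + m2[0]) + b[1] * (m1[1] + m2[1])) / 2
    return c, b


def score(c, b, x):
    """Discriminant score Z for one observation x = [X1, X2]."""
    return c + b[0] * x[0] + b[1] * x[1]


# Made-up measurements for two groups of four subjects each.
group1 = [[170, 40], [172, 42], [168, 41], [171, 39]]
group2 = [[180, 50], [183, 49], [179, 52], [182, 51]]
c, b = fit_two_group_discriminant(group1, group2)
print([round(score(c, b, x), 2) for x in group1 + group2])
```

Because c is set at the midpoint of the two group means, the two group centroids on Z are equal in size and opposite in sign.
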






3) Classification matrix
In DA, it serves as a yardstick for measuring the accuracy of a model in classifying an individual/case into one of the two groups. It is also known as the confusion matrix, assignment matrix, or prediction matrix. It tells us what percentage of the existing data points are correctly classified by the model developed in DA.
4) Stepwise method of discriminant analysis
The discriminant function can be developed either by entering all independent variables together (confirmatory study) or stepwise (exploratory study).
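
A classification matrix can be tabulated directly from the actual and predicted group labels. The sketch below is illustrative only: the labels, the data and the function names `classification_matrix` and `percent_correct` are hypothetical.

```python
def classification_matrix(actual, predicted, labels):
    """Count cases by actual group (rows) and predicted group (columns)."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def percent_correct(matrix):
    """Percentage of cases on the diagonal, i.e. correctly classified."""
    total = sum(sum(row.values()) for row in matrix.values())
    hits = sum(matrix[label][label] for label in matrix)
    return 100.0 * hits / total


# Made-up actual and predicted memberships for five players.
actual = ["Batsman", "Batsman", "Bowler", "Bowler", "Bowler"]
predicted = ["Batsman", "Bowler", "Bowler", "Bowler", "Bowler"]
matrix = classification_matrix(actual, predicted, ["Batsman", "Bowler"])
print(matrix)
print(percent_correct(matrix))
```
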









5) Power of discriminatory variables
After developing the model in discriminant analysis based on the selected independent variables, it is important to know the relative importance of the variables so selected.
6) Box’s M test
Box’s M test is used to test the null hypothesis that the covariance matrices do not differ between the groups formed by the dependent variable. If the test is not significant, the assumption of equal covariance matrices required for DA holds.
7) Eigenvalue
The eigenvalue is an index of overall fit. It is the ratio of between-group to within-group variability in the discriminant scores, so a larger value indicates better discrimination.
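
For reference, Box's M can be computed from the group covariance matrices as M = (N - g) ln|S_pooled| - Σ (n_i - 1) ln|S_i|, where N is the total sample size, g the number of groups and n_i the size of group i. The sketch below is a pure-Python illustration of that standard formula with its usual chi-square approximation. The function names and the data are hypothetical, and the chi-square value still has to be compared with a chi-square distribution on the returned degrees of freedom.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, det = len(m), 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        if m[pivot][i] == 0:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det
        det *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for col in range(i, n):
                m[r][col] -= factor * m[i][col]
    return det


def box_m(groups):
    """Return (M, chi-square approximation, degrees of freedom).

    Assumes every group covariance matrix is non-singular.
    """
    p, g = len(groups[0][0]), len(groups)
    sizes = [len(rows) for rows in groups]
    total = sum(sizes)
    covs = [covariance_matrix(rows) for rows in groups]
    pooled = [[sum((n - 1) * s[a][b] for n, s in zip(sizes, covs)) / (total - g)
               for b in range(p)] for a in range(p)]
    m_stat = (total - g) * math.log(determinant(pooled)) - sum(
        (n - 1) * math.log(determinant(s)) for n, s in zip(sizes, covs))
    # Scaling factor of the chi-square approximation.
    c = ((2 * p * p + 3 * p - 1) / (6 * (p + 1) * (g - 1))) * (
        sum(1 / (n - 1) for n in sizes) - 1 / (total - g))
    return m_stat, m_stat * (1 - c), p * (p + 1) * (g - 1) // 2


# Made-up data: two groups, two predictors.
group_a = [[170, 40], [172, 42], [168, 41], [171, 39]]
group_b = [[180, 50], [183, 49], [179, 52], [182, 51]]
print(box_m([group_a, group_b]))
```

When the group covariance matrices are identical, M is 0; the more they differ, the larger M becomes.
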





8) Wilks' lambda
It measures the efficiency of the discriminant function in the model. Its value shows what proportion of the total variability in the discriminant scores is not explained by differences between the groups, so a smaller value indicates better discrimination.
9) Canonical correlation
The canonical correlation measures the association between the discriminant function scores and group membership. With only one function it provides an index of overall model fit, and its square (R²) is interpreted as the proportion of variance explained.
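
For a single discriminant function, the eigenvalue, Wilks' lambda and the canonical correlation can all be derived from the between-group and within-group sums of squares of the discriminant scores. The sketch below assumes the scores have already been computed; the function name `fit_indices` and the example scores are hypothetical.

```python
import math


def fit_indices(scores_by_group):
    """Return (eigenvalue, Wilks' lambda, canonical correlation).

    scores_by_group is a list with one list of discriminant scores per group.
    Valid for a single discriminant function only.
    """
    all_scores = [z for scores in scores_by_group for z in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = 0.0
    ss_within = 0.0
    for scores in scores_by_group:
        group_mean = sum(scores) / len(scores)
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        ss_within += sum((z - group_mean) ** 2 for z in scores)
    ss_total = ss_between + ss_within
    eigenvalue = ss_between / ss_within      # between / within
    wilks_lambda = ss_within / ss_total      # unexplained share
    canonical_r = math.sqrt(ss_between / ss_total)
    return eigenvalue, wilks_lambda, canonical_r


# Made-up discriminant scores for two groups of three subjects.
print(fit_indices([[-3, -2, -1], [1, 2, 3]]))
```

With one function, the squared canonical correlation and Wilks' lambda add up to 1, and Wilks' lambda equals 1 / (1 + eigenvalue).
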
STEPS IN ANALYSIS:
STEP 1. The independent variables that have discriminating power are chosen.
STEP 2. A discriminant function model is developed using the coefficients of the independent variables.
STEP 3. Wilks' lambda is computed to test the significance of the discriminant function.
STEP 4. The independent variables that are important in discriminating between the groups are identified.
STEP 5. The subjects are classified into their respective groups.
APPLICATION OF SPSS
E.g. to classify players into different categories during the selection process.
Group statistics
Box's Test of Equality of Covariance Matrices
Where the variables are:
• Height
• Back explosive power
• Judgement
• Patience
Means of the Transformed Groups Centroids
• Mean of group 1 (Batsmen): 4.390
• Mean of group 2 (Bowler): -4.390
• Midpoint between the two centroids: 0
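
The cutoff implied by the two centroids above can be checked with a short calculation. The sketch assumes equal group sizes, so the cutoff is the simple midpoint of the centroids, and it assigns a new score to the group with the nearer centroid. The function names and the new scores are hypothetical.

```python
# Group centroids on the discriminant function, as reported above.
centroids = {"Batsmen": 4.390, "Bowler": -4.390}


def midpoint_cutoff(centroids):
    """Cutoff score for equal group sizes: the mean of the centroids."""
    return sum(centroids.values()) / len(centroids)


def nearest_group(z, centroids):
    """Assign a discriminant score to the group with the closest centroid."""
    return min(centroids, key=lambda group: abs(z - centroids[group]))


print(midpoint_cutoff(centroids))
print(nearest_group(2.1, centroids))   # hypothetical new score
print(nearest_group(-0.5, centroids))  # hypothetical new score
```

In this example the Batsmen centroid is positive, so scores above the cutoff go to Batsmen and scores below it go to Bowler.
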