Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall.
Chapter 6
Logistic Regression: Regression with a Binary Dependent Variable
LEARNING OBJECTIVES
Upon completing this chapter, you should be able to do the following:
• State the circumstances under which logistic regression should be used instead of multiple regression.
• Identify the types of dependent and independent variables used in the application of logistic regression.
• Describe the method used to transform binary measures into the likelihood and probability measures used in logistic regression.
LEARNING OBJECTIVES continued . . .
Upon completing this chapter, you should be able to do the following:
• Interpret the results of a logistic regression analysis and assess predictive accuracy, with comparisons to both multiple regression and discriminant analysis.
• Understand the strengths and weaknesses of logistic regression compared to discriminant analysis and multiple regression.
Logistic Regression Defined
Logistic Regression . . . is a specialized form of regression that is designed to predict and explain a binary (two-group) categorical variable rather than a metric dependent measure. Its variate is similar to regular regression and made up of metric independent variables. It is less affected than discriminant analysis when the basic assumptions, particularly normality of the independent variables, are not met.
Logistic Regression May Be Preferred . . .
When the dependent variable has only two groups, logistic
regression may be preferred for two reasons:
• Discriminant analysis assumes multivariate normality and equal
variance-covariance matrices across groups, and these
assumptions are often not met. Logistic regression does not
face these strict assumptions and is much more robust when
these assumptions are not met, making its application
appropriate in many situations.
• Even if the assumptions are met, some researchers prefer
logistic regression because it is similar to multiple regression. It
has straightforward statistical tests, similar approaches to
incorporating metric and nonmetric variables and nonlinear
effects, and a wide range of diagnostics.
Logistic Regression Decision Process
Stage 1: Objectives of Logistic Regression
Stage 2: Research Design for Logistic Regression
Stage 3: Assumptions of Logistic Regression
Stage 4: Estimation of the Logistic Regression Model and Assessing Overall Fit
Stage 5: Interpretation of the Results
Stage 6: Validation of the Results
Stage 1: Objectives of Logistic Regression
Logistic regression is best suited to address two research objectives . . .
• Identifying the independent variables that impact group membership in the dependent variable.
• Establishing a classification system based on the logistic model for determining group membership.
Stage 2: Research Design for Logistic Regression
• The binary nature of the dependent variable (0 – 1) means the error term has a binomial distribution instead of a normal distribution, and it thus invalidates all testing based on the assumption of normality.
• The variance of the dichotomous variable is not constant, creating instances of heteroscedasticity as well.
• Neither of the above violations can be remedied through transformations of the dependent or independent variables. Logistic regression was developed to specifically deal with these issues.
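The non-constant variance mentioned above can be seen directly: a 0/1 outcome with event probability p has variance p(1 − p), so the variance changes whenever the predicted probability changes. The following is a minimal Python sketch; the function name is illustrative and not taken from the chapter.

```python
def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 outcome whose event probability is p."""
    return p * (1.0 - p)


# The variance is largest at p = 0.5 and shrinks toward 0 as p nears 0 or 1,
# so the error variance cannot be constant across cases with different p.
for p in (0.1, 0.3, 0.5, 0.9):
    print(f"p = {p:.1f}  variance = {bernoulli_variance(p):.2f}")
```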
Stage 3: Assumptions of Logistic Regression
• The advantages of logistic regression are primarily the result of the general lack of assumptions.
• Logistic regression does not require any specific distributional form for the independent variables.
• Homoscedasticity of the independent variables is not required.
• Linear relationships between the dependent and independent variables are not required.
Stage 4: Estimation of Logistic Regression Model and Assessing Overall Fit
• Transforming the dependent variable
• Estimating the coefficients
• Transforming a probability into odds and logit values
• Model estimation
• Assessing the goodness of fit
Estimating the Coefficients
Two basic steps . . .
1. Transforming a probability into odds and logit values
2. Model estimation using a maximum likelihood approach, not least squares as in multiple regression
• The estimation process maximizes the likelihood that an event will occur – the event being that a respondent is assigned to one group versus another
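To make the maximum likelihood step concrete, here is a minimal Python sketch for a model with one metric predictor. The slides do not say which numerical algorithm is used; Newton-Raphson is assumed here as one common choice, and the data and function names are made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Convert a logit value into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Log likelihood of the observed 0/1 outcomes under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson search for the coefficients that maximize the likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * x      # gradient with respect to b1
            h00 += w               # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up data: the groups overlap, so finite estimates exist.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic(xs, ys)
print("intercept:", round(b0_hat, 3), "slope:", round(b1_hat, 3))
print("log likelihood:", round(log_likelihood(b0_hat, b1_hat, xs, ys), 3))
```

Unlike least squares, no closed-form solution exists, which is why the estimates are refined iteratively until the likelihood stops improving.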
Transforming a Probability into Odds and Logit Values
o The logistic transformation has two basic steps:
  • Restating a probability as odds, and
  • Calculating the logit values.
o Instead of using ordinary least squares to estimate the model, the maximum likelihood method is used.
o The basic measure of how well the maximum likelihood estimation procedure fits is the likelihood value.
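The two steps of the logistic transformation can be sketched in a few lines of Python. The function names are illustrative; the formulas are the standard definitions odds = p / (1 − p) and logit = ln(odds).

```python
import math


def odds(p: float) -> float:
    """Step 1: restate a probability as odds."""
    return p / (1.0 - p)


def logit(p: float) -> float:
    """Step 2: take the natural log of the odds."""
    return math.log(odds(p))


def probability_from_logit(z: float) -> float:
    """Reverse the transformation: logit value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.2, 0.5, 0.8):
    print(f"p = {p:.1f}  odds = {odds(p):.2f}  logit = {logit(p):.3f}")
```

Probabilities are bounded by 0 and 1, odds by 0 and infinity, and logit values are unbounded in both directions, which is what allows a linear variate to be estimated on the logit scale.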
Model Estimation Fit – Between Model Comparisons . . .
Comparisons of the likelihood values follow three steps:
1. Estimate a Null Model – which acts as the “baseline” for making comparisons of improvement in model fit.
2. Estimate Proposed Model – the model containing the independent variables to be included in the logistic regression.
3. Assess the -2LL Difference.
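The three steps can be sketched as follows. This is an illustration under stated assumptions: the null model predicts the overall event rate for every case, the proposed model's probabilities are hypothetical stand-ins for fitted values, and the proposed model adds exactly one predictor, so the chi-square test has 1 degree of freedom.

```python
import math


def neg2_log_likelihood(ys, probs):
    """-2LL for 0/1 outcomes given each case's predicted event probability."""
    ll = sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
             for y, p in zip(ys, probs))
    return -2.0 * ll


def chi_square_p_value_1df(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


ys = [0, 0, 1, 0, 1, 0, 1, 1]  # made-up outcomes

# Step 1: null model, every case gets the overall event rate.
base_rate = sum(ys) / len(ys)
neg2ll_null = neg2_log_likelihood(ys, [base_rate] * len(ys))

# Step 2: proposed model (hypothetical predicted probabilities).
proposed_probs = [0.10, 0.20, 0.35, 0.45, 0.55, 0.65, 0.80, 0.90]
neg2ll_proposed = neg2_log_likelihood(ys, proposed_probs)

# Step 3: a drop in -2LL signals improved fit; test it with chi-square.
difference = neg2ll_null - neg2ll_proposed
p_value = chi_square_p_value_1df(difference)
print("-2LL null:", round(neg2ll_null, 3))
print("-2LL proposed:", round(neg2ll_proposed, 3))
print("difference:", round(difference, 3), "p-value:", round(p_value, 4))
```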
Comparison to Multiple Regression . . .
Correspondence of Primary Elements of Model Fit
Multiple Regression            Logistic Regression
Total Sum of Squares           -2LL of Base Model
Error Sum of Squares           -2LL of Proposed Model
Regression Sum of Squares      Difference of -2LL for Base and Proposed Models
F test of model fit            Chi-square Test of -2LL Difference
Coefficient of determination   “Pseudo” R2 measures
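One commonly used pseudo R2 follows directly from this correspondence: the reduction in -2LL divided by the base model's -2LL, mirroring regression sum of squares over total sum of squares. The sketch below uses hypothetical -2LL values; other pseudo R2 measures (for example Cox and Snell or Nagelkerke) are defined differently.

```python
def pseudo_r2_logit(neg2ll_base: float, neg2ll_proposed: float) -> float:
    """Proportional reduction in -2LL from the base to the proposed model."""
    return (neg2ll_base - neg2ll_proposed) / neg2ll_base


# Hypothetical -2LL values, for illustration only.
print(pseudo_r2_logit(100.0, 75.0))
```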
Stage 5: Interpretation of the Results
• Testing for significance of the coefficients – based on the Wald statistic
• Interpreting the coefficients
• Directionality of the relationship
• Magnitude of the relationship of metric independent variables
• Interpreting nonmetric independent variables
Directionality of the Relationship
A positive relationship means an increase in the independent variable is associated with an increase in the predicted probability, and vice versa. But the direction of the relationship is reflected differently for the original and exponentiated logistic coefficients.
• Original coefficient signs indicate the direction of the relationship.
• Exponentiated coefficients are interpreted differently since they are the antilogs of the original coefficients and do not have negative values. Thus, exponentiated coefficients above 1.0 represent a positive relationship and values less than 1.0 represent negative relationships.
Magnitude of the Relationship . . .
The magnitude of the relationship for metric independent variables is interpreted differently for original and exponentiated logistic coefficients:
• Original logistic coefficients – are less useful in determining the magnitude of the relationship since they reflect the change in the logit (logged odds) value.
• Exponentiated coefficients – directly reflect the magnitude of the change in the odds value. But their impact is multiplicative and a coefficient of 1.0 denotes no change (1.0 times the odds = no change).
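The multiplicative effect can be checked numerically: raising the independent variable by one unit multiplies the odds by the exponentiated coefficient. The sketch below assumes a one-predictor model with hypothetical coefficients; the names are illustrative.

```python
import math


def predicted_probability(b0: float, b1: float, x: float) -> float:
    """Predicted probability from the logit b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def odds(p: float) -> float:
    return p / (1.0 - p)


b0, b1 = -2.0, 0.5          # hypothetical original (logit) coefficients
exp_b1 = math.exp(b1)       # exponentiated coefficient

odds_at_3 = odds(predicted_probability(b0, b1, 3.0))
odds_at_4 = odds(predicted_probability(b0, b1, 4.0))
odds_ratio = odds_at_4 / odds_at_3

# The ratio of the two odds matches exp(b1), up to rounding error.
print("exp(b1):", round(exp_b1, 4), "odds ratio:", round(odds_ratio, 4))
```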
Rules of Thumb 6–1
Logistic Regression
• Logistic regression is the preferred method for two-group (binary) dependent variables due to its robustness, ease of interpretation and diagnostics.
• Sample size considerations for logistic regression are primarily focused on the size of each group, which should have 10 times the number of estimated model coefficients (the number of variables).
• Sample size should be met in both the analysis and holdout samples.
• Model significance tests are made with a chi-square test on the differences in the log likelihood values (-2LL) between two models.
Rules of Thumb 6–1 continued . . .
Logistic Regression
• Coefficients are expressed in two forms: original and exponentiated to assist in interpretation.
• Interpretation of the coefficients for direction and magnitude is:
  • Direction can be directly assessed in the original coefficients (positive or negative signs) or indirectly in the exponentiated coefficients (less than 1 are negative, greater than 1 are positive).
  • Magnitude is best assessed by the exponentiated coefficient, with the percentage change in the odds shown by: Percentage change = (Exponentiated Coefficient – 1.0) * 100
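The percentage-change rule can be applied as in the sketch below. The coefficient values are hypothetical and the function names are illustrative; the change is expressed in the odds for a one-unit increase in the independent variable.

```python
import math


def percentage_change_in_odds(original_coefficient: float) -> float:
    """(Exponentiated coefficient - 1.0) * 100 for a one-unit increase."""
    return (math.exp(original_coefficient) - 1.0) * 100.0


def direction(original_coefficient: float) -> str:
    """Direction read from the sign of the original coefficient."""
    if original_coefficient > 0:
        return "positive"
    if original_coefficient < 0:
        return "negative"
    return "none"


for b in (0.4, 0.0, -0.4):  # hypothetical original coefficients
    print(f"b = {b:+.1f}  exp(b) = {math.exp(b):.3f}  "
          f"direction = {direction(b)}  "
          f"change in odds = {percentage_change_in_odds(b):+.1f}%")
```

Note that equal-sized positive and negative coefficients do not give equal-sized percentage changes, because the exponentiated scale is multiplicative.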
Stage 6: Validation of the Results
• Involves ensuring both the internal and external validity of the results.
• The most common form of estimating external validity is creation of a holdout or validation sample and calculating the hit ratio.
• A second approach is cross-validation, typically achieved with a jackknife or “leave-one-out” process of calculating the hit ratio.
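The hit ratio is the proportion of cases classified into the correct group. A minimal sketch for a holdout sample is shown below; the outcomes and predicted probabilities are made up, and the 0.5 cutoff is an assumption rather than something the slides specify.

```python
def classify(probabilities, cutoff=0.5):
    """Assign each case to group 1 if its predicted probability meets the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def hit_ratio(actual, predicted):
    """Proportion of cases whose predicted group matches the actual group."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


# Hypothetical holdout sample: actual groups and model-predicted probabilities.
holdout_actual = [1, 0, 1, 1, 0, 0, 1, 0]
holdout_probs = [0.8, 0.3, 0.4, 0.9, 0.2, 0.6, 0.7, 0.1]

holdout_predicted = classify(holdout_probs)
ratio = hit_ratio(holdout_actual, holdout_predicted)
print("hit ratio:", ratio)
```

In a leave-one-out process the same calculation is used, except that each case is classified by a model estimated on all of the other cases.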
Description of HBAT Primary Database Variables
Variable  Description                                        Variable Type
Data Warehouse Classification Variables
X1        Customer Type                                      nonmetric
X2        Industry Type                                      nonmetric
X3        Firm Size                                          nonmetric
X4        Region                                             nonmetric
X5        Distribution System                                nonmetric
Performance Perceptions Variables
X6        Product Quality                                    metric
X7        E-Commerce Activities/Website                      metric
X8        Technical Support                                  metric
X9        Complaint Resolution                               metric
X10       Advertising                                        metric
X11       Product Line                                       metric
X12       Salesforce Image                                   metric
X13       Competitive Pricing                                metric
X14       Warranty & Claims                                  metric
X15       New Products                                       metric
X16       Ordering & Billing                                 metric
X17       Price Flexibility                                  metric
X18       Delivery Speed                                     metric
Outcome/Relationship Measures
X19       Satisfaction                                       metric
X20       Likelihood of Recommendation                       metric
X21       Likelihood of Future Purchase                      metric
X22       Current Purchase/Usage Level                       metric
X23       Consider Strategic Alliance/Partnership in Future  nonmetric

  • 1. Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall. 6-1 Chapter 6 Chapter 6 Logistic Regression: Regression with a Logistic Regression: Regression with a Binary Dependent Variable Binary Dependent Variable
  • 2. Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall. 6-2 LEARNING OBJECTIVES LEARNING OBJECTIVES Upon completing this chapter, you should be able to Upon completing this chapter, you should be able to do the following: do the following: • State the circumstances under which logistic State the circumstances under which logistic regression should be used instead of multiple regression should be used instead of multiple regression. regression. • Identify the types of dependent and independent Identify the types of dependent and independent variables used in the application of logistic variables used in the application of logistic regression. regression. • Describe the method used to transform binary Describe the method used to transform binary measures into the likelihood and probability measures into the likelihood and probability measures used in logistic regression. measures used in logistic regression. Chapter 6 Chapter 6 Logistic Regression: Regression with a Logistic Regression: Regression with a Binary Dependent Variable Binary Dependent Variable
  • 3. Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall. 6-3 LEARNING OBJECTIVES continued . . . LEARNING OBJECTIVES continued . . . Upon completing this chapter, you should be able to Upon completing this chapter, you should be able to do the following: do the following: • Interpret the results of a logistic regression Interpret the results of a logistic regression analysis and assessing predictive accuracy, with analysis and assessing predictive accuracy, with comparisons to both multiple regression and comparisons to both multiple regression and discriminant analysis. discriminant analysis. • Understand the strengths and weaknesses of Understand the strengths and weaknesses of logistic regression compared to discriminant logistic regression compared to discriminant analysis and multiple regression. analysis and multiple regression. Chapter 6 Chapter 6 Logistic Regression: Regression with a Logistic Regression: Regression with a Binary Dependent Variable Binary Dependent Variable
  • 4. Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall. 6-4 Logistic Regression . . . is a specialized Logistic Regression . . . is a specialized form of regression that is designed to predict form of regression that is designed to predict and explain a binary (two-group) categorical and explain a binary (two-group) categorical variable rather than a metric dependent variable rather than a metric dependent measure. Its variate is similar to regular measure. Its variate is similar to regular regression and made up of metric regression and made up of metric independent variables. It is less affected than independent variables. It is less affected than discriminant analysis when the basic discriminant analysis when the basic assumptions, particularly normality of the assumptions, particularly normality of the independent variables, are not met. independent variables, are not met. Logistic Regression Defined Logistic Regression Defined
  • 5. Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall. 6-5 Logistic Regression May Be Preferred . . . Logistic Regression May Be Preferred . . . When the dependent variable has only two groups, logistic regression may be preferred for two reasons: • Discriminant analysis assumes multivariate normality and equal variance-covariance matrices across groups, and these assumptions are often not met. Logistic regression does not face these strict assumptions and is much more robust when these assumptions are not met, making its application appropriate in many situations. • Even if the assumptions are met, some researchers prefer logistic regression because it is similar to multiple regression. It has straightforward statistical tests, similar approaches to incorporating metric and nonmetric variables and nonlinear effects, and a wide range of diagnostics.
  • 6. Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall. 6-6 Multiple Regression Decision Process Multiple Regression Decision Process Stage 1: Objectives of Logistic Regression Stage 1: Objectives of Logistic Regression Stage 2: Research Design for Logistic Regression Stage 2: Research Design for Logistic Regression Stage 3: Assumptions of Logistic Regression Stage 3: Assumptions of Logistic Regression Stage 4: Estimation of the Logistic Regression Model Stage 4: Estimation of the Logistic Regression Model and Assessing Overall Fit and Assessing Overall Fit Stage 5: Interpretation of the Results Stage 5: Interpretation of the Results Stage 6: Validation of the Results Stage 6: Validation of the Results
  • 7. Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall. 6-7 Logistic regression is best suited to address Logistic regression is best suited to address two research objectives . . . two research objectives . . . • Identifying the independent variables that Identifying the independent variables that impact group membership in the dependent impact group membership in the dependent variable. variable. • Establishing a classification system based on Establishing a classification system based on the logistic model for determining group the logistic model for determining group membership. membership. Stage 1: Objectives of Logistic Regression Stage 1: Objectives of Logistic Regression
  • 8. Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall. 6-8 Stage 2: Research Design for Stage 2: Research Design for Logistic Regression Logistic Regression • The binary nature of the dependent variable (0 – 1) The binary nature of the dependent variable (0 – 1) means the error term has a binomial distribution means the error term has a binomial distribution instead of a normal distribution, and it thus invalidates instead of a normal distribution, and it thus invalidates all testing based on the assumption of normality. all testing based on the assumption of normality. • The variance of the dichotomous variable is not The variance of the dichotomous variable is not constant, creating instances of heteroscedasticity as constant, creating instances of heteroscedasticity as well. well. • Neither of the above violations can be remedied Neither of the above violations can be remedied through transformations of the dependent or through transformations of the dependent or independent variables. Logistic regression was independent variables. Logistic regression was developed to specifically deal with these issues. developed to specifically deal with these issues.
  • 9. Stage 3: Assumptions of Logistic Regression
      • The advantages of logistic regression are primarily the result of the general lack of assumptions.
      • Logistic regression does not require any specific distributional form for the independent variables.
      • Homoscedasticity of the independent variables is not required.
      • Linear relationships between the dependent and independent variables are not required.
  • 10. Stage 4: Estimation of Logistic Regression Model and Assessing Overall Fit
      • Transforming the dependent variable
      • Estimating the coefficients
      • Transforming a probability into odds and logit values
      • Model estimation
      • Assessing the goodness of fit
  • 11. Estimating the Coefficients
      Two basic steps . . .
      1. Transforming a probability into odds and logit values
      2. Model estimation using a maximum likelihood approach, not least squares as in multiple regression
      • The estimation process maximizes the likelihood that an event will occur – the event being a respondent is assigned to one group versus another
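The maximum likelihood step can be illustrated with a minimal sketch. It assumes a single predictor and plain gradient ascent on the log-likelihood, which is a stand-in for the iterative routines that statistical packages actually use. The function names, the learning rate, and the toy data are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps a logit value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of binary outcomes ys under logit = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Estimate (b0, b1) by gradient ascent on the log-likelihood.

    Assumption: a fixed step size and step count are enough for this
    small, non-separable toy data set.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the log-likelihood: sum of (observed - predicted)
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: group membership (0/1) against one metric predictor.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1, log_likelihood(b0, b1, xs, ys))
```

Unlike least squares, no closed-form solution exists here, so the coefficients are improved step by step until the likelihood stops increasing.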
  • 12. Transforming a Probability into Odds and Logit Values
      o The logistic transformation has two basic steps:
          Restating a probability as odds, and
          Calculating the logit values.
      o Instead of using ordinary least squares to estimate the model, the maximum likelihood method is used.
      o The basic measure of how well the maximum likelihood estimation procedure fits is the likelihood value.
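The two transformation steps can be sketched in a few lines. The function names are illustrative and do not come from the slides.

```python
import math

def prob_to_odds(p):
    """Step 1: restate a probability as odds, p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)

def odds_to_logit(odds):
    """Step 2: the logit value is the natural log of the odds."""
    return math.log(odds)

def logit_to_prob(logit):
    """Inverse transformation: recover the probability from a logit."""
    return 1.0 / (1.0 + math.exp(-logit))

# A probability of 0.5 gives odds of 1 and a logit of 0.
print(prob_to_odds(0.5), odds_to_logit(prob_to_odds(0.5)))
```

Probabilities are bounded by 0 and 1 and odds by 0 from below, while logit values are unbounded in both directions, which is what allows a linear model on the logit scale.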
  • 13. Model Estimation Fit – Between Model comparisons . . .
      Comparisons of the likelihood values follow three steps:
      1. Estimate a Null Model – which acts as the “baseline” for making comparisons of improvement in model fit.
      2. Estimate Proposed Model – the model containing the independent variables to be included in the logistic regression.
      3. Assess –2LL Difference.
  • 14. Comparison to Multiple Regression . . .
      Correspondence of Primary Elements of Model Fit
      Multiple Regression             Logistic Regression
      Total Sum of Squares            -2LL of Base Model
      Error Sum of Squares            -2LL of Proposed Model
      Regression Sum of Squares       Difference of -2LL for Base and Proposed Models
      F test of model fit             Chi-square Test of -2LL Difference
      Coefficient of determination    “Pseudo” R² measures
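The correspondence can be made concrete with a small sketch that computes -2LL for a base (null) model and a proposed model and takes their difference. It assumes the null model predicts the overall proportion of 1s for every case. The pseudo R² shown is the ratio of the -2LL difference to the base -2LL, which is only one of several pseudo R² measures. The function names and data are hypothetical.

```python
import math

def neg2ll(ys, probs):
    """-2 times the log-likelihood of outcomes ys given predicted probabilities."""
    ll = sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
             for y, p in zip(ys, probs))
    return -2.0 * ll

def compare_models(ys, proposed_probs):
    """Return (-2LL base, -2LL proposed, difference, pseudo R^2)."""
    base_rate = sum(ys) / len(ys)               # null model: same probability for all
    base = neg2ll(ys, [base_rate] * len(ys))    # analogue of total sum of squares
    proposed = neg2ll(ys, proposed_probs)       # analogue of error sum of squares
    diff = base - proposed                      # analogue of regression sum of squares
    return base, proposed, diff, diff / base

# Hypothetical outcomes and predicted probabilities from a proposed model.
ys = [0, 0, 1, 1]
base, proposed, diff, pseudo_r2 = compare_models(ys, [0.2, 0.3, 0.7, 0.8])
print(base, proposed, diff, pseudo_r2)
```

The difference is then referred to a chi-square distribution whose degrees of freedom equal the number of coefficients added to the base model.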
  • 15. Stage 5: Interpretation of the Results
      • Testing for significance of the coefficients – based on the Wald statistic
      • Interpreting the coefficients
      • Directionality of the relationship
      • Magnitude of the relationship of metric independent variables
      • Interpreting nonmetric independent variables
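A minimal sketch of the Wald test for a single coefficient follows. It assumes the common form in which the squared ratio of the coefficient to its standard error is compared to a chi-square distribution with one degree of freedom. The coefficient and standard error in the usage line are made-up values.

```python
import math

def wald_test(coef, std_err):
    """Return (Wald statistic, p-value) for one logistic coefficient.

    Assumes the statistic (coef / std_err)^2 follows a chi-square
    distribution with 1 degree of freedom under the null hypothesis.
    """
    wald = (coef / std_err) ** 2
    # Upper-tail probability of chi-square with 1 df, via the
    # complementary error function of the standard normal.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Hypothetical coefficient 0.9 with standard error 0.4.
print(wald_test(0.9, 0.4))
```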
  • 16. Directionality of the Relationship
      A positive relationship means an increase in the independent variable is associated with an increase in the predicted probability, and vice versa. But the direction of the relationship is reflected differently for the original and exponentiated logistic coefficients.
      • Original coefficient signs indicate the direction of the relationship.
      • Exponentiated coefficients are interpreted differently since they are the antilogarithms (exponentials) of the original coefficients and do not have negative values. Thus, exponentiated coefficients above 1.0 represent a positive relationship and values less than 1.0 represent negative relationships.
  • 17. Magnitude of the Relationship . . .
      The magnitude of metric independent variables is interpreted differently for original and exponentiated logistic coefficients:
      • Original logistic coefficients – are less useful in determining the magnitude of the relationship since they reflect the change in the logit (logged odds) value.
      • Exponentiated coefficients – directly reflect the magnitude of the change in the odds value. But their impact is multiplicative and a coefficient of 1.0 denotes no change (1.0 times the odds = no change).
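Direction and magnitude can both be read off with a short sketch. The helper name and the example coefficients are hypothetical. The percentage change is computed as (exponentiated coefficient - 1.0) * 100 and refers to the change in the odds for a one-unit increase in the independent variable.

```python
import math

def interpret_coefficient(b):
    """Return (exponentiated coefficient, % change in odds, direction)."""
    exp_b = math.exp(b)                   # always positive; 1.0 means no change
    pct_change = (exp_b - 1.0) * 100.0    # percentage change in the odds
    if b > 0:
        direction = "positive"            # exp(b) > 1.0
    elif b < 0:
        direction = "negative"            # exp(b) < 1.0
    else:
        direction = "none"                # exp(b) == 1.0
    return exp_b, pct_change, direction

# Hypothetical original coefficients.
for b in (0.7, 0.0, -0.7):
    print(b, interpret_coefficient(b))
```

Because the effect is multiplicative, an exponentiated coefficient of 2.0 doubles the odds, while 0.5 halves them.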
  • 18. Rules of Thumb 6–1
      Logistic Regression
      • Logistic regression is the preferred method for two-group (binary) dependent variables due to its robustness, ease of interpretation and diagnostics.
      • Sample size considerations for logistic regression are primarily focused on the size of each group, which should have 10 times the number of estimated model coefficients (the number of variables).
      • Sample size should be met in both the analysis and holdout samples.
      • Model significance tests are made with a chi-square test on the differences in the log likelihood values (-2LL) between two models.
  • 19. Rules of Thumb 6–1 continued . . .
      Logistic Regression
      • Coefficients are expressed in two forms: original and exponentiated to assist in interpretation.
      • Interpretation of the coefficients for direction and magnitude is as follows:
          Direction can be directly assessed in the original coefficients (positive or negative signs) or indirectly in the exponentiated coefficients (less than 1 are negative, greater than 1 are positive).
          Magnitude is best assessed by the exponentiated coefficient, with the percentage change in the odds of the dependent variable shown by: Percentage change = (Exponentiated Coefficient – 1.0) * 100
  • 20. Stage 6: Validation of the Results
      • Involves ensuring both the internal and external validity of the results.
      • The most common form of estimating external validity is creation of a holdout or validation sample and calculating the hit ratio.
      • A second approach is cross-validation, typically achieved with a jackknife or “leave-one-out” process of calculating the hit ratio.
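Both validation approaches reduce to computing a hit ratio, the proportion of cases classified correctly. The sketch below assumes a 0.5 classification cutoff, and the `fit` and `predict` arguments are hypothetical callables standing in for whatever estimation routine is used.

```python
def classify(probs, cutoff=0.5):
    """Assign group 1 when the predicted probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probs]

def hit_ratio(actual, predicted):
    """Proportion of cases whose predicted group matches the actual group."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

def leave_one_out_hit_ratio(xs, ys, fit, predict):
    """Jackknife validation: refit without each case, then classify that case.

    fit(train_x, train_y) returns a model; predict(model, x) returns 0 or 1.
    """
    hits = 0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        if predict(model, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)

# Holdout example with hypothetical predicted probabilities.
print(hit_ratio([1, 0, 1, 0], classify([0.9, 0.4, 0.3, 0.2])))
```

For the holdout approach, the probabilities passed to `classify` would come from a model estimated on the analysis sample only.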
  • 22. Description of HBAT Primary Database Variables
      Variable  Description                                          Variable Type
      Data Warehouse Classification Variables
      X1        Customer Type                                        nonmetric
      X2        Industry Type                                        nonmetric
      X3        Firm Size                                            nonmetric
      X4        Region                                               nonmetric
      X5        Distribution System                                  nonmetric
      Performance Perceptions Variables
      X6        Product Quality                                      metric
      X7        E-Commerce Activities/Website                        metric
      X8        Technical Support                                    metric
      X9        Complaint Resolution                                 metric
      X10       Advertising                                          metric
      X11       Product Line                                         metric
      X12       Salesforce Image                                     metric
      X13       Competitive Pricing                                  metric
      X14       Warranty & Claims                                    metric
      X15       New Products                                         metric
      X16       Ordering & Billing                                   metric
      X17       Price Flexibility                                    metric
      X18       Delivery Speed                                       metric
      Outcome/Relationship Measures
      X19       Satisfaction                                         metric
      X20       Likelihood of Recommendation                         metric
      X21       Likelihood of Future Purchase                        metric
      X22       Current Purchase/Usage Level                         metric
      X23       Consider Strategic Alliance/Partnership in Future    nonmetric