BUSINESS ANALYTICS FOUNDATION WITH R TOOLS
Lesson 4 - Predictive Modeling Techniques, Part 2
Copyright 2016, Beamsync. All rights reserved.
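COEFFICIENT OF DETERMINATION (R²)
• A measure of goodness of fit: how well does your model fit the data?
• R² ranges from 0 to 1. R² = 0 means the model explains none of the variation in the dependent variable; R² = 1 means it explains all of it.
• The direction of a linear relationship is given by the correlation coefficient r, not by R²:
r = 0, no linear relationship
r = -1, perfect negative linear relationship
r = +1, perfect positive linear relationship
• In simple linear regression, R² = r².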
HOW GOOD IS THE MODEL?
• The R² value tells us how well the model explains the data: it is the proportion of the variation in the dependent variable that the model accounts for.
• The difference between an observed value and the value predicted by the model is the error term, or residual.
• Suppose the R² value is 0.74. This means that 74% of the variance in the dependent variable is explained by the model; the remaining 26% is unexplained (residual) variance.
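As a rough illustration of the explained / unexplained split described here, the sketch below computes R² as 1 - SS_residual / SS_total in plain Python. The function name `r_squared` and the sample numbers are assumptions made for illustration; they are not taken from the course material.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: share of variance explained by the model."""
    mean_actual = sum(actual) / len(actual)
    # Total variation of the observations around their mean
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    # Variation left unexplained by the model (sum of squared residuals)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical observations and model predictions
actual = [10.0, 12.0, 15.0, 19.0, 24.0]
predicted = [9.0, 13.0, 16.0, 18.0, 24.0]

r2 = r_squared(actual, predicted)
print(f"explained: {r2:.0%}, unexplained: {1 - r2:.0%}")
```

A model that predicts every observation exactly gives R² = 1, and a model that always predicts the mean gives R² = 0.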
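HOW TO FIND THE LINEAR REGRESSION EQUATION

SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY      X²      Y²
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Σ         247       486                 20485   11409   40022

With n = 6 subjects:
b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²) = (6 × 20485 - 247 × 486) / (6 × 11409 - 247²) = 2868 / 7445 ≈ 0.3852
a = (ΣY - bΣX) / n = (486 - 0.3852 × 247) / 6 ≈ 65.14
Y = a + bX  =>  Y = 65.14 + 0.385X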
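The arithmetic behind this equation can be checked with a short script. This is a minimal sketch using the standard least-squares formulas for the slope and intercept; the variable names are chosen here for illustration.

```python
# Data from the table above: age (X) and glucose level (Y) for six subjects
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

n = len(ages)
sum_x = sum(ages)
sum_y = sum(glucose)
sum_xy = sum(x * y for x, y in zip(ages, glucose))
sum_x2 = sum(x * x for x in ages)

# Least-squares slope and intercept computed from the column totals
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(f"Y = {intercept:.2f} + {slope:.3f}X")  # prints "Y = 65.14 + 0.385X"
```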
LOGISTIC REGRESSION
• It is a statistical method for analyzing datasets in which one or more independent variables determine an outcome.
• In this type of regression the dependent variable is binary (a dichotomous characteristic), coded as 1 for TRUE and 0 for FALSE.
• The goal of logistic regression is to find the best-fitting model to describe the relationship between the dichotomous characteristic and a set of independent variables.
• Logistic regression generates the coefficients of a formula that predicts a logit transformation of the probability of presence of the characteristic of interest:
logit(p) = β0 + β1x1 + β2x2 + β3x3 + ... + βnxn
where p is the probability of presence of the characteristic of interest.
• The logit transformation is defined as the log of the odds:
odds = p / (1 - p) and logit(p) = ln(p / (1 - p))
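The odds and logit definitions can be tried out directly. The sketch below is illustrative only: the coefficient and predictor values are hypothetical, and `inverse_logit` is a helper added here to turn a logit value back into a probability.

```python
import math


def odds(p):
    """Odds of the event: p / (1 - p), defined for 0 <= p < 1."""
    return p / (1 - p)


def logit(p):
    """Log-odds of the event: ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(odds(p))


def inverse_logit(z):
    """Map a logit value back to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients (b0, b1, b2) and predictor values (x1, x2)
coefficients = [-1.5, 0.8, 0.3]
predictors = [2.0, 1.0]

# logit(p) = b0 + b1*x1 + b2*x2
z = coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], predictors))
p = inverse_logit(z)
print(f"logit(p) = {z:.2f}, p = {p:.3f}")
```

Applying `logit` to the resulting `p` returns the original linear combination, which is what makes the logit scale convenient for a linear formula.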
METHOD TO DEVELOP A LOGISTIC MODEL
Data → Logistic Regression Model
• Observation-performance windows
• Data preparation, data treatment, data hygiene
• Derived variables identification
• Fine and coarse classing
• Logistic modeling and diagnostics
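To give a feel for the final modeling step, here is a minimal sketch that fits a one-variable logistic model to a tiny, made-up dataset. The data, the function names, and the use of plain gradient ascent are all assumptions made for brevity; this is not the procedure the course prescribes.

```python
import math


def sigmoid(z):
    """Logistic function: converts a logit value into a probability."""
    return 1 / (1 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Fit logit(p) = b0 + b1*x by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Observed outcome minus predicted probability, per observation
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


# Hypothetical data: hours of product demo watched, and purchase (1) or not (0)
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
bought = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, bought)
print(f"logit(p) = {b0:.2f} + {b1:.2f} * hours")
```

A positive `b1` here means the estimated probability of purchase rises with the predictor.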
LINEAR REGRESSION VS LOGISTIC REGRESSION
• Linear regression is mainly used to establish a relationship between a dependent variable and one or more independent variables. It helps estimate the impact of an independent variable on the dependent variable.
• Example: using linear regression, the relationship between temperature (T) and ice cream sales (I) is found to be
I = 2T + 4000
• This equation says that every 1 degree rise in temperature increases demand by 2 ice creams; at T = 1, for instance, predicted demand is 4002 ice creams.
• Logistic regression helps find the probability of an event, where the event is recorded in binary form, i.e. 0 or 1.
• Example: to predict whether customers will buy a product or not, run a logistic regression on the data. The dependent variable is a binary variable.
• Graphically, linear regression produces a straight line when the fitted values are plotted, whereas logistic regression produces an S-shaped curve.
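The contrast between the two output shapes can be seen numerically. In this sketch `ice_cream_sales` encodes the equation I = 2T + 4000 from the example, while `purchase_probability` is a generic logistic curve with no fitted coefficients; both names are chosen here for illustration.

```python
import math


def ice_cream_sales(temperature):
    """Linear model from the example: I = 2T + 4000."""
    return 2 * temperature + 4000


def purchase_probability(z):
    """Logistic (S-shaped) curve: maps any real value to a probability."""
    return 1 / (1 + math.exp(-z))


# Linear: each extra degree adds the same 2 units, at any temperature
print(ice_cream_sales(1), ice_cream_sales(31) - ice_cream_sales(30))

# Logistic: the output flattens towards 0 and 1 at the extremes
for z in (-6, -2, 0, 2, 6):
    print(z, round(purchase_probability(z), 3))
```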
CLUSTER ANALYSIS
• It groups data objects based on information found in the data that describes the objects and their relationships.
• The goal is that the objects in a group are similar to one another and different from the objects in other groups.
• The greater the similarity within a group and the greater the difference between groups, the more distinct the clustering.
• Cluster analysis provides a way for users to discover potential relationships and construct systematic structures in large numbers of variables and observations.
• In a good clustering, intra-cluster distance is minimized and inter-cluster distance is maximized.
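The two distances can be measured on a toy example. The sketch below assumes the observations have already been assigned to clusters (it does not perform the clustering itself), uses hypothetical 2-D points, and takes the distance between cluster centroids as one simple way to express inter-cluster distance. It relies on `math.dist`, available in Python 3.8 and later.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean position of a cluster of 2-D points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def mean_intra_cluster_distance(points):
    """Average distance between every pair of points in one cluster."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2-D observations already assigned to two clusters
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

intra_a = mean_intra_cluster_distance(cluster_a)
intra_b = mean_intra_cluster_distance(cluster_b)
inter = math.dist(centroid(cluster_a), centroid(cluster_b))

print(f"intra A: {intra_a:.2f}, intra B: {intra_b:.2f}, inter: {inter:.2f}")
```

For a distinct clustering, the intra-cluster values should be small relative to the inter-cluster value, as they are for these points.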
Thank You
Beamsync provides business analytics training in Bangalore, along with certification. If you are looking to build a career in analytics, schedule your training here: http://beamsync.com/business-analytics-training-bangalore/