Datasets using R-Studio
Usha Rani Singh
Datasets for cars
A dataset is a collection of related information that can be
analyzed to derive results
A dataset may hold information in various forms, so it is not
always straightforward for the analyst to extract the data and
present it to the business
Preparing Dataset for cars
Preparing and analyzing the dataset is an important step for any
analysis (including threat information), because it helps to
provide accurate data
We have to consider the data that provides the most value or is
most relevant to the problem
Categorize the task as regression, classification, clustering,
or ranking
It is difficult to establish a data collection mechanism when
data is scattered across various forms and departments
We have to make the data consistent
Reduce the data sample where possible, while making sure it
still contains the required information
Preparing Dataset for cars
We have to clean the data so that processing is faster and the
results are more accurate
Complex datasets have to be decomposed into multiple parts
Data normalization has to be performed to improve the quality
of the data
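As a rough illustration of the cleaning and normalization steps above, the sketch below uses R's built-in mtcars data as a stand-in for the car dataset (an assumption; the slides do not name the file). The helper min_max is a hypothetical name, and min-max scaling is only one of several possible normalizations.

```r
# Stand-in car data: R's built-in mtcars (assumption, not the slides' dataset)
cars_raw <- mtcars[, c("mpg", "hp", "wt")]

# Inject one missing value so the cleaning step has something to remove
cars_raw[1, "hp"] <- NA

# Cleaning: drop rows that contain missing values
cars_clean <- na.omit(cars_raw)

# Normalization: rescale each column to the range [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
cars_norm <- as.data.frame(lapply(cars_clean, min_max))

summary(cars_norm)
```

After this step every column has the same scale, which matters for distance-based methods such as clustering.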
R Studio for dataset
Pie Diagram for data set
ggplot_1 for dataset
ggplot_2 for data set
ggplot_3 for dataset
Dataset used
Thank you!
Week 10 – Analysing Data sets in RapidMiner
The data sets used for this week's analysis relate to the CSRIC
best practices:
The CSRIC Best Practices Search Tool allows you to search
CSRIC's collection of Best Practices using a variety of criteria
including Network Type, Industry Role, Keywords, Priority
Levels, and BP Number. The Communications Security,
Reliability and Interoperability Council's (CSRIC) mission is to
provide recommendations to the FCC to ensure, among other
things, optimal security and reliability of communications
systems, including telecommunications, media, and public
safety. CSRIC’s members focus on a range of public safety and
homeland security-related communications matters, including:
(1) the reliability and security of communications systems and
infrastructure, particularly mobile systems; (2) 911, Enhanced
911 (E911), and Next Generation 911 (NG911); and (3)
emergency alerting.
The CSRIC's recommendations will address the prevention and
remediation of detrimental cyber events, the development of
best practices to improve overall communications reliability,
the availability and performance of communications services
and emergency alerting during natural disasters, terrorist
attacks, cyber security attacks or other events that result in
exceptional strain on the communications infrastructure, the
rapid restoration of communications services in the event of
widespread or major disruptions and the steps communications
providers can take to help secure end-users and servers.
I have used RapidMiner to analyze the data set:
The statistical view of various names, types and attributes
related to the data set.
Visualization of public safety vs prioritization
Overall prioritization pie chart
Bar graph comparing various network types and internet/data
usage
customer-segmentation-data set.zip
Mall_Customers.csv
CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
6,Female,22,17,76
7,Female,35,18,6
8,Female,23,18,94
9,Male,64,19,3
10,Female,30,19,72
11,Male,67,19,14
12,Female,35,19,99
13,Female,58,20,15
14,Female,24,20,77
15,Male,37,20,13
16,Male,22,20,79
17,Female,35,21,35
18,Male,20,21,66
19,Male,52,23,29
20,Female,35,23,98
21,Male,35,24,35
22,Male,25,24,73
23,Female,46,25,5
24,Male,31,25,73
25,Female,54,28,14
26,Male,29,28,82
27,Female,45,28,32
28,Male,35,28,61
29,Female,40,29,31
30,Female,23,29,87
31,Male,60,30,4
32,Female,21,30,73
33,Male,53,33,4
34,Male,18,33,92
35,Female,49,33,14
36,Female,21,33,81
37,Female,42,34,17
38,Female,30,34,73
39,Female,36,37,26
40,Female,20,37,75
41,Female,65,38,35
42,Male,24,38,92
43,Male,48,39,36
44,Female,31,39,61
45,Female,49,39,28
46,Female,24,39,65
47,Female,50,40,55
48,Female,27,40,47
49,Female,29,40,42
50,Female,31,40,42
51,Female,49,42,52
52,Male,33,42,60
53,Female,31,43,54
54,Male,59,43,60
55,Female,50,43,45
56,Male,47,43,41
57,Female,51,44,50
58,Male,69,44,46
59,Female,27,46,51
60,Male,53,46,46
61,Male,70,46,56
62,Male,19,46,55
63,Female,67,47,52
64,Female,54,47,59
65,Male,63,48,51
66,Male,18,48,59
67,Female,43,48,50
68,Female,68,48,48
69,Male,19,48,59
70,Female,32,48,47
71,Male,70,49,55
72,Female,47,49,42
73,Female,60,50,49
74,Female,60,50,56
75,Male,59,54,47
76,Male,26,54,54
77,Female,45,54,53
78,Male,40,54,48
79,Female,23,54,52
80,Female,49,54,42
81,Male,57,54,51
82,Male,38,54,55
83,Male,67,54,41
84,Female,46,54,44
85,Female,21,54,57
86,Male,48,54,46
87,Female,55,57,58
88,Female,22,57,55
89,Female,34,58,60
90,Female,50,58,46
91,Female,68,59,55
92,Male,18,59,41
93,Male,48,60,49
94,Female,40,60,40
95,Female,32,60,42
96,Male,24,60,52
97,Female,47,60,47
98,Female,27,60,50
99,Male,48,61,42
100,Male,20,61,49
101,Female,23,62,41
102,Female,49,62,48
103,Male,67,62,59
104,Male,26,62,55
105,Male,49,62,56
106,Female,21,62,42
107,Female,66,63,50
108,Male,54,63,46
109,Male,68,63,43
110,Male,66,63,48
111,Male,65,63,52
112,Female,19,63,54
113,Female,38,64,42
114,Male,19,64,46
115,Female,18,65,48
116,Female,19,65,50
117,Female,63,65,43
118,Female,49,65,59
119,Female,51,67,43
120,Female,50,67,57
121,Male,27,67,56
122,Female,38,67,40
123,Female,40,69,58
124,Male,39,69,91
125,Female,23,70,29
126,Female,31,70,77
127,Male,43,71,35
128,Male,40,71,95
129,Male,59,71,11
130,Male,38,71,75
131,Male,47,71,9
132,Male,39,71,75
133,Female,25,72,34
134,Female,31,72,71
135,Male,20,73,5
136,Female,29,73,88
137,Female,44,73,7
138,Male,32,73,73
139,Male,19,74,10
140,Female,35,74,72
141,Female,57,75,5
142,Male,32,75,93
143,Female,28,76,40
144,Female,32,76,87
145,Male,25,77,12
146,Male,28,77,97
147,Male,48,77,36
148,Female,32,77,74
149,Female,34,78,22
150,Male,34,78,90
151,Male,43,78,17
152,Male,39,78,88
153,Female,44,78,20
154,Female,38,78,76
155,Female,47,78,16
156,Female,27,78,89
157,Male,37,78,1
158,Female,30,78,78
159,Male,34,78,1
160,Female,30,78,73
161,Female,56,79,35
162,Female,29,79,83
163,Male,19,81,5
164,Female,31,81,93
165,Male,50,85,26
166,Female,36,85,75
167,Male,42,86,20
168,Female,33,86,95
169,Female,36,87,27
170,Male,32,87,63
171,Male,40,87,13
172,Male,28,87,75
173,Male,36,87,10
174,Male,36,87,92
175,Female,52,88,13
176,Female,30,88,86
177,Male,58,88,15
178,Male,27,88,69
179,Male,59,93,14
180,Male,35,93,90
181,Female,37,97,32
182,Female,32,97,86
183,Male,46,98,15
184,Female,29,98,88
185,Female,41,99,39
186,Male,30,99,97
187,Female,54,101,24
188,Male,28,101,68
189,Female,41,103,17
190,Female,36,103,85
191,Female,34,103,23
192,Female,32,103,69
193,Male,33,113,8
194,Female,38,113,91
195,Female,47,120,16
196,Female,35,120,79
197,Female,45,126,28
198,Male,32,126,74
199,Male,32,137,18
200,Male,30,137,83
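A minimal sketch of how this file could be read and summarized in base R. The first five rows are inlined so the example runs on its own; the shortened column names are hypothetical choices, not part of the original file.

```r
# First five rows of Mall_Customers.csv, inlined so the sketch is self-contained
csv_text <- "CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40"

# check.names = FALSE keeps the original headers instead of mangling them
customers <- read.csv(text = csv_text, check.names = FALSE)

# Shorter, syntactic column names (hypothetical choices)
names(customers) <- c("CustomerID", "Gender", "Age",
                      "AnnualIncome", "SpendingScore")

str(customers)

# Mean spending score per gender
aggregate(SpendingScore ~ Gender, data = customers, FUN = mean)
```

With the real file on disk, the same call would presumably be read.csv("Mall_Customers.csv", check.names = FALSE).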
Mall Customer Segmentation Data Analysis.pptx
Mall Customer Segment Data Analysis using RFM
Vivek Ijjagiri
Agenda
Introduction
Mall Customer Segmentation data
Mall Customer Segment analysis data using RFM
Problem Solving
Clustering
Conclusion
References
Introduction
When we want to increase sales, plan marketing spend, or
formulate a new promotion, as retail marketers we have to be
careful about how we segment and target customers. It would be a
waste of time and money if, for example, we launched an ad
campaign aimed generically at all customers. Such untargeted
marketing and advertising is unlikely to have a high conversion
rate and may even hurt our company's value.
Retailers now use sophisticated strategies to segment their
customers and target their marketing efforts at these segments.
RFM analysis is one such popular customer segmentation technique
that can help retailers maximize the return on their advertising
investments.
Why RFM?
It improves customer segmentation for marketing and is widely
used for surveys.
It is considered superior to, and simpler than, other methods
(such as CHAID and logistic regression).
It focuses on transaction information and on delivering better
marketing to customers.
What is RFM?
R => Recency
F => Frequency
M=> Monetary
How are we using RFM to target customers?
Simply put, we score the customers on recency, frequency, and
monetary value, from high to low.
The greater the score, the more likely the customer is to buy a
product or take up a new offer or promotion.
This helps us identify the customers that are most likely to
respond to a new offer or promotion.
Note that identifying the most valuable RFM segments can
capitalize on chance relationships in the data used for this
analysis.
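The scoring idea can be sketched as follows. This is only one possible scheme, assuming rank-based bins from 1 (worst) to 3 (best) and a simple sum of the three scores; the slides do not specify the number of bins or how the scores are combined. The data values and the helper score are made up for illustration.

```r
# Toy RFM table (made-up values)
rfm <- data.frame(
  CustomerID = c("A", "B", "C", "D", "E", "F"),
  recency    = c(5, 40, 200, 15, 90, 365),  # days since last purchase
  frequency  = c(12, 3, 1, 8, 2, 1),        # number of purchases
  monetary   = c(80, 35, 20, 60, 45, 10)    # average spend per purchase
)

# Score a vector from 1 (worst) to n_bins (best) using equal-width bins of ranks
score <- function(x, n_bins = 3, higher_is_better = TRUE) {
  r <- rank(if (higher_is_better) x else -x, ties.method = "first")
  as.integer(cut(r, breaks = n_bins, labels = FALSE))
}

rfm$R_score <- score(rfm$recency, higher_is_better = FALSE)  # recent is better
rfm$F_score <- score(rfm$frequency)
rfm$M_score <- score(rfm$monetary)
rfm$RFM_score <- rfm$R_score + rfm$F_score + rfm$M_score

# Customers with the highest combined score come first
rfm[order(-rfm$RFM_score), ]
```

Summing the three scores weights them equally; a concatenated score such as "333" or weighted sums are common alternatives.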
Mall Customer Segment analysis data using RFM
Recency: Recency is the most important predictor of future
purchases. Customers who have purchased a product recently are
more likely to purchase again from your store/mall than those
who have not purchased recently.
Frequency: The second most important factor is how frequently
these customers purchase from you. The higher the frequency,
the higher the chance of them purchasing again.
Monetary: The third factor is the amount of money these
customers have spent on purchases. Customers who have spent
more are more likely to purchase again than those who have
spent less.
How are we going to calculate RFM?
To implement the RFM analysis, we need to further process the
data set using the following steps:
Find the most recent purchase date for each customer ID and
calculate the number of days from it to today or some other
reference date, to get the Recency data
Count the number of transactions of each customer, to get the
Frequency data
Sum the amount of money a customer spent and divide it by
Frequency, to get the average amount per transaction, that is
the Monetary data.
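The three steps above can be sketched in base R on a toy transaction log. All values are made up; the column name Amount and the variable reference_date are assumptions for illustration (the slides' own code derives the amount from Quantity and UnitPrice).

```r
# Toy transaction log (made-up values for illustration only)
tx <- data.frame(
  CustomerID  = c("A", "A", "B", "B", "B", "C"),
  InvoiceNo   = c("i1", "i2", "i3", "i4", "i5", "i6"),
  InvoiceDate = as.Date(c("2011-11-01", "2011-12-20", "2011-06-15",
                          "2011-07-01", "2011-08-10", "2011-12-30")),
  Amount      = c(20, 40, 10, 30, 50, 100)
)
reference_date <- as.Date("2012-01-01")

# One row per customer: recency in days, frequency, average spend
rfm <- do.call(rbind, lapply(split(tx, tx$CustomerID), function(d) {
  frequency <- length(unique(d$InvoiceNo))
  data.frame(
    CustomerID = d$CustomerID[1],
    recency    = as.numeric(reference_date - max(d$InvoiceDate)),
    frequency  = frequency,
    monetary   = sum(d$Amount) / frequency
  )
}))

print(rfm)
```

Customer C has the lowest recency value here (the most recent purchase), while customer B has the highest frequency.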
Problem Solving
Make sure the following libraries are available before
proceeding with the data analysis; if a library is not found in
your R Studio, install the corresponding package.
library(data.table)
library(dplyr)
library(ggplot2)
library(tidyr)
library(knitr)
library(rmarkdown)
Load and examine data
> Mall_Customers<- fread('data.csv')
> glimpse(Mall_Customers)
Ijjagiri, Vivek (IV) - glimpse() is like a transposed version of
print(): columns run down the page, and data runs across. This
makes it possible to see every column in a data frame. It's a
little like str() applied to a data frame, but it tries to show
you as much data as possible. (And it always shows the
underlying data, even when applied to a remote data source.)
View Data
Data Cleanup or Wrangle
> Mall_Customers<- Mall_Customers%>%
mutate(Quantity = replace(Quantity, Quantity<=0, NA),
UnitPrice = replace(UnitPrice, UnitPrice<=0, NA))
> Mall_Customers<- Mall_Customers%>%
drop_na()
Recode Variables
> df_data <- Mall_Customers %>%
  mutate(InvoiceNo = as.factor(InvoiceNo),
         StockCode = as.factor(StockCode),
         InvoiceDate = as.Date(InvoiceDate, format = '%m/%d/%Y %H:%M'),
         CustomerID = as.factor(CustomerID),
         Country = as.factor(Country))
> df_data <- df_data %>%
  mutate(total_dolar = Quantity*UnitPrice)
> glimpse(df_data)
> summary(df_data)
Calculate RFM
> df_RFM <- df_data %>%
  group_by(CustomerID) %>%
  summarise(recency = as.numeric(as.Date("2012-01-01") - max(InvoiceDate)),
            frequency = n_distinct(InvoiceNo),
            monitery = sum(total_dolar) / n_distinct(InvoiceNo))
> summary(df_RFM)
Calculate RFM
> kable(head(df_RFM))
K-means clustering is one of the simplest and most popular
unsupervised machine learning algorithms.
The objective of K-means is simple: group similar data points
together and discover underlying patterns.
To achieve this objective, K-means looks for a fixed number (k)
of clusters in a dataset.
A cluster refers to a collection of data points aggregated
together because of certain similarities.
In other words, the K-means algorithm identifies k centroids
and then allocates every data point to the nearest centroid,
while keeping the clusters as compact as possible (that is,
minimizing the within-cluster sum of squared distances).
K Means Clustering Algorithm
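1. Specify the number of clusters K.
2. Initialize the centroids by first shuffling the dataset and
then randomly selecting K data points for the centroids without
replacement.
3. Assign each data point to its nearest centroid, then
recompute each centroid as the mean of the points assigned to it.
4. Keep iterating step 3 until there is no change to the
centroids, i.e. the assignment of data points to clusters isn't
changing.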
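The algorithm described above can be sketched in a few lines of base R. This is a teaching sketch on made-up 2-D data, not the slides' own code; the function name simple_kmeans is hypothetical, and in practice R's built-in kmeans() would normally be used instead.

```r
set.seed(42)  # for a reproducible random initialization

# Toy data: two well-separated groups of 2-D points (made-up values)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.5), ncol = 2))

simple_kmeans <- function(x, k, max_iter = 100) {
  # Pick k distinct data points as the initial centroids
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every point to every centroid
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centroids[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")  # index of nearest centroid
    # Stop once the assignments no longer change
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Recompute each centroid as the mean of its assigned points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centroids[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centroids)
}

fit <- simple_kmeans(x, k = 2)
table(fit$cluster)
```

Because the result depends on the random starting centroids, production code usually runs several restarts, which kmeans() supports through its nstart argument.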
Recency
Recency – How recently did the customer purchase?
> Customer_Purchase_Recency <- df_RFM$recency
> hist(Customer_Purchase_Recency, main = 'Recency')
Frequency
Frequency – How often do they purchase?
> Customer_Purchase_Frequency <- df_RFM$frequency
> hist(Customer_Purchase_Frequency, main = 'Frequency')
Monetary
Monetary Value – How much do they spend?
> Customer_Purchase_Monitery <- df_RFM$monitery
> hist(Customer_Purchase_Monitery, main = 'Monetary',
  breaks = 50)
Monetary Log
Because the data is skewed, we use log scale to normalize
> MoniteryLog <- log(df_RFM$monitery)
> hist(MoniteryLog, main ='MoniteryLog')
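To see why the log transform helps, the sketch below compares skewness before and after taking logs. It uses simulated log-normal values as a stand-in for df_RFM$monitery (an assumption), and the moment-based skewness helper is hypothetical rather than taken from the slides.

```r
set.seed(1)

# Simulated right-skewed spend values, standing in for df_RFM$monitery
spend <- rlnorm(1000, meanlog = 3, sdlog = 1)

# Simple moment-based skewness (hypothetical helper, not from the slides)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw <- skewness(spend)       # strongly positive: long right tail
skew_log <- skewness(log(spend))  # close to zero after the log transform

c(raw = skew_raw, log = skew_log)
```

Note that log() is undefined for values less than or equal to zero, so such values have to be filtered or shifted first; the cleanup step earlier already drops non-positive Quantity and UnitPrice values.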
Ijjagiri, Vivek (IV) -
https://www.rdocumentation.org/packages/amap/versions/0.8-17/topics/hcluster
Ijjagiri, Vivek (IV) - This function is a mix of the functions
hclust and dist: hcluster(x, method = "euclidean", link =
"complete") = hclust(dist(x, method = "euclidean"), method =
"complete"). It uses half as much memory, as it doesn't store
the distance matrix.
For more details, see the documentation of hclust and Dist.
Clustering
> DataFrame_Clustering <- as.data.frame(df_RFM)
> # keep the customer IDs as row names, then drop the ID column
> # so that only the numeric RFM columns are scaled
> row.names(DataFrame_Clustering) <- DataFrame_Clustering$CustomerID
> DataFrame_Clustering$CustomerID <- NULL
> DataFrame_Clustering <- scale(DataFrame_Clustering)
> summary(DataFrame_Clustering)
Clustering
> d <- dist(DataFrame_Clustering)
> c <- hclust(d, method = 'ward.D2')
> plot(c)
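Once the tree is built, cluster labels can be extracted by cutting it at a chosen number of groups. The sketch below uses made-up data in place of the scaled RFM matrix, and k = 3 is an assumption chosen to match how the toy data was generated; for real data the dendrogram would guide the choice of k.

```r
set.seed(7)

# Toy stand-in for the scaled RFM matrix (made-up values): three loose groups
toy <- scale(rbind(matrix(rnorm(30, mean = 0), ncol = 3),
                   matrix(rnorm(30, mean = 6), ncol = 3),
                   matrix(rnorm(30, mean = 12), ncol = 3)))

tree <- hclust(dist(toy), method = "ward.D2")

# Cut the tree into k groups; k = 3 is an assumption for this toy data
groups <- cutree(tree, k = 3)
table(groups)

# Mean of each scaled variable per cluster, to help interpret the segments
aggregate(as.data.frame(toy), by = list(cluster = groups), FUN = mean)
```

The per-cluster means are what turn the clusters into describable customer segments, for example "recent, frequent, high spend".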
Ijjagiri, Vivek (IV) - A dendrogram is a diagram that shows the
hierarchical relationship between objects. It is most commonly
created as an output from hierarchical clustering. The main use
of a dendrogram is to work out the best way to allocate objects
to clusters. The dendrogram below shows the hierarchical
clustering of six observations shown on the scatterplot to the
left. (Dendrogram is often miswritten as dendogram.)
Plotting with less data
Conclusion
The customer segmentation process can be performed using various
clustering algorithms.
We focused on k-means clustering in R.
The algorithm is quite simple to implement. However,
representing the data in the correct format and interpreting the
results is the difficult part.
RFM analysis can be used to segment customers, design offers and
promotions for a specific audience, and develop products based
on customer profile and interests.
References
Shubhankar Rawat (May 2019), Mall Customers Segmentation - Using
Machine Learning, retrieved from
https://towardsdatascience.com/mall-customers-segmentation-using-machine-learning-274ddf5575d5
What is market segmentation, Different types explained,
retrieved from
https://www.qualtrics.com/experience-management/brand/what-is-market-segmentation/
Bradley, P. S., Bennett, K. P., & Demiriz, A. (2000).
Constrained k-means clustering (Technical Report
MSR-TR-2000-65). Microsoft Research, Redmond, WA.
K means clustering, AlindGupta, retrieved from
https://www.geeksforgeeks.org/k-means-clustering-introduction/
Thank you
Any Questions
RcodeProject.R
##########################################
# section 3.3 Statistical Methods for Evaluation
##########################################
##########################################
# section 3.3.1 Hypothesis Testing
##########################################
# generate random observations from the two populations
x <- rnorm(10, mean=100, sd=5) # normal distribution centered at 100
y <- rnorm(20, mean=105, sd=5) # normal distribution centered at 105
# Student's t-test
t.test(x, y, var.equal=TRUE) # run the Student's t-test
# obtain t value for a two-sided test at a 0.05 significance level
qt(p=0.05/2, df=28, lower.tail= FALSE)
# Welch's t-test
t.test(x, y, var.equal=FALSE) # run the Welch's t-test
# Wilcoxon Rank-Sum Test
wilcox.test(x, y, conf.int = TRUE)
##########################################
# section 3.3.6 ANOVA
##########################################
offers <- sample(c("offer1", "offer2", "nopromo"), size=500, replace=TRUE)
# Simulated 500 observations of purchase sizes on the 3 offer options
purchasesize <- ifelse(offers=="offer1", rnorm(500, mean=80, sd=30),
                ifelse(offers=="offer2", rnorm(500, mean=85, sd=30),
                       rnorm(500, mean=40, sd=30)))
# create a data frame of offer option and purchase size
offertest <- data.frame(offer=as.factor(offers),
purchase_amt=purchasesize)
# display a summary of offertest where offer="offer1"
summary(offertest[offertest$offer=="offer1",])
# display a summary of offertest where offer="offer2"
summary(offertest[offertest$offer=="offer2",])
# display a summary of offertest where offer="nopromo"
summary(offertest[offertest$offer=="nopromo",])
# fit ANOVA test
model <- aov(purchase_amt ~ offer, data=offertest)
summary(model)
# Tukey's Honest Significant Difference (HSD) on all
# pair-wise tests for difference of means
TukeyHSD(model)
Lesson 2
© 2015 Pearson Education, Inc. Publishing as Prentice Hall
Chapter 4
“It is a set of beliefs that one party holds
about the other and how these beliefs are
formed from the interactions of […]
individuals as they engage in tasks
associated with an IT service” (Day 2007)
It is a multifaceted interaction of people
and processes.
It is complex. Different expectations and
accountabilities may lead to lack of trust.
It tends to cluster into patterns (e.g., IT is
a necessary evil; IT is a support but not a
partner; business and IT are partners).
IT has to keep proving itself.
The business is often disengaged from IT
work.
Business expectations of IT change
continually.
Business assumptions of IT tend to cluster.
The relationship is affected by the
interaction of many people and
processes at multiple levels.
Clarity is often lacking around
expectations and accountabilities.
There are many “disconnects”
between the two groups.
Trust
Credibility
Competence
Value
Interpersonal Interaction
Expertise – the ability to support a technical
recommendation and have up-to-date knowledge.
Financial awareness – the ability to
identify the value of IT in terms of ROI
and total cost of ownership.
Execution – the ability to understand
the business, develop a vision and
operationalize strategies.
Find ways to develop business knowledge in
all IT staff.
Link IT’s success criteria to business metrics.
Make business value an explicit criterion in all
IT decisions.
Ensure effective execution in all IT activities.
Credibility is the belief that others can be
counted on to do what they say they will do.
It is built by:
Keeping agreements.
Acting with integrity, honesty and openness.
Being responsive (e.g., delivering on time
and under budget).
Communicate frequently and explicitly.
Pay attention to the “little things”.
Utilize external cues to credibility.
Assess all business touch points.
Professionalism - can be developed by five
sets of attitudes and behaviors:
on the job)
good organization.
job well)
Nontechnical communication
The ability to translate and interpret needs,
not only from business to technology and
vice versa, but also between business units.
Social skills
The ability to build mutual understanding, to
enable all parties to get comfortable with one
another and to uncover hidden assumptions.
Management of politics and conflict
The ability to understand the role of politics
and how it can affect IT work (i.e.,
addressing conflict and using it to deliver
creative solutions).
Expect professionalism.
Promote a wide variety of social interactions
at all levels.
Develop “soft skills” in IT staff.
The most important way to build trust is through
effective governance:
Integrated planning, defined accountabilities,
and clarity of roles and responsibilities are key
aspects of effective governance.
Effective governance addresses the business’
expectations of its IT function.
Design governance for clarity and
transparency.
Mandate the relationship.
Design IT for business expectations.
Business-IT relationships are complex, with
interactions of many types, at many levels,
and between both individuals and across
functional and organizational entities.
Four major components are needed to
build a strong business-IT relationship:
competence, credibility, interpersonal skills,
and trust.
Chapter 5
Communication is a key social element of
the organizational alignment between IT
and business.
One of the most important skills IT staff
needs to develop is how to communicate
effectively with businesses.
Good communication is essential for:
trust and partnerships between
the business and IT
perceptions of IT
of the business
Principle 1: The effectiveness of communication
is measured by its outcomes.
Principle 2: Communication is social behavior.
Principle 3: Shared knowledge improves
communication.
Principle 4: Mature organizations have better
communication.
Communication should be measured by its
outcomes rather than our intentions.
Communication can get distorted through
filters such as politics, culture, and
personal points of view.
Communication not only transmits ideas;
it also negotiates relationships.
How you say what you mean is just as
important as what you say.
IT staff and managers need to become
aware of the power of different linguistic
styles in communication situations.
The more IT staff learns about the business, the better
communication becomes.
Shared knowledge is the beginning of the “virtuous circle”.
THE VIRTUOUS COMMUNICATION CYCLE: Shared Knowledge, Increased
Communication, Mutual Understanding and “Common Sense”,
Implementation Success
Strong organizational practices support and
reinforce good interpersonal communication.
Mature IT organizations embed appropriate
communication at the operational and
strategic level.
“You can’t be a partner unless
you’re a mature IT organization”
The changing nature of IT work:
IT work has become more complex over
time. Multiple cultures, different political
contexts, various time zones, and virtual
contacts make communication more
challenging.
Hiring practices:
IT skills are changing to become more
consultative and collaborative, rather
than focused exclusively on technology.
“IT organizations can no longer support smart,
super-talented but socially disruptive people”
IT and business organization
structures:
IT staff is expected to play a “knowledge
broker” role, not only between IT and
business but also between business units.
Thus, business silos can make this
communication challenging.
Nature and frequency of
communication:
Formal interactions improve communication,
but communication should not exclusively
occur in formal interactions (e.g., through IT
governance).
Attitude:
Many IT staff are motivated by the desire
to be right rather than the desire to
communicate effectively.
“We definitely need a ‘we’ attitude in IT,
rather than ‘us-them’ attitude”
Translation: A four-step process
[Diagram labels: Business Issues, Technology Issues, IT Solutions,
Business Impact of Technology]
Datasets using R-StudioUsha Rani Singh.docx

More Related Content

PDF
Using R for customer segmentation
Kumar P
 
PPTX
Moduel 2 _KPMG.pptx
JehanzebXheikh
 
PDF
IRJET- Credit Profile of E-Commerce Customer
IRJET Journal
 
PDF
Customer segmentation with RFM models and demographic variable using DBSCAN a...
TELKOMNIKA JOURNAL
 
PPTX
Customer Segmentation Course 21102024(1).pptx
shivalikba25
 
PPTX
Cdac -Project Presentation [Autosaved].pptx
anushriasati
 
PPTX
"Ecommerce Customer Segmentation & Prediction: Enhancing Business Strategies ...
Boston Institute of Analytics
 
PDF
Coin Loyalty CRM
Sajosh Joy
 
Using R for customer segmentation
Kumar P
 
Moduel 2 _KPMG.pptx
JehanzebXheikh
 
IRJET- Credit Profile of E-Commerce Customer
IRJET Journal
 
Customer segmentation with RFM models and demographic variable using DBSCAN a...
TELKOMNIKA JOURNAL
 
Customer Segmentation Course 21102024(1).pptx
shivalikba25
 
Cdac -Project Presentation [Autosaved].pptx
anushriasati
 
"Ecommerce Customer Segmentation & Prediction: Enhancing Business Strategies ...
Boston Institute of Analytics
 
Coin Loyalty CRM
Sajosh Joy
 

Similar to Datasets using R-StudioUsha Rani Singh.docx (20)

PDF
A Study on Customer Segmentation using Recency Frequency and Monetary Analysi...
ijtsrd
 
PPTX
[Big] Data For Marketers: Targeting the Right Market
Panji Winata
 
PPTX
E-Commerce Customer Segmentation and Behavior Prediction: A Data-Driven Strategy
Boston Institute of Analytics
 
PPTX
E-commerce Customer Segmentation: Unlocking Consumer Insights
Boston Institute of Analytics
 
PPTX
E-Commerce Customer Segmentation and Prediction: Unlocking Insights for Smart...
Boston Institute of Analytics
 
PPTX
Improving Organizational Decision Making Using a SAF-T based Business Intelli...
BrunoOliveira631137
 
PPT
Data Visions Big Data Visual Analytics Tool
Double Check ĆŐNSULTING
 
PPTX
Data mining for the online retail industry
ATUL SHARMA
 
PDF
Business Segmentation
beckerdave
 
PPTX
An Introduction to RFM in Analytics
SAS Canada
 
PPT
RFM.ppt
OmvirGautam
 
PDF
Fam1 Big Data + Visualization
Seth Familian
 
PPTX
Customer Segmentation: Seeing the Trees and the Forest Simultaneously
Sione Palu
 
PDF
Shrinking big data for real time marketing strategy - A statistical Report
Manidipa Banerjee
 
PPTX
RFMAnalysisShort.pptx
AyushSrivastava8761
 
PPTX
Marketing & Retail Analytics-PPT-Milestone-1-Shor.pptx
AshishKumar797592
 
PPTX
Customer analytics
Karl Melo
 
PDF
Introduction to sas
Zienab Allam
 
PPSX
Data Refinement: The missing link between data collection and decisions
Vivastream
 
A Study on Customer Segmentation using Recency Frequency and Monetary Analysi...
ijtsrd
 
[Big] Data For Marketers: Targeting the Right Market
Panji Winata
 
E-Commerce Customer Segmentation and Behavior Prediction: A Data-Driven Strategy
Boston Institute of Analytics
 
E-commerce Customer Segmentation: Unlocking Consumer Insights
Boston Institute of Analytics
 
E-Commerce Customer Segmentation and Prediction: Unlocking Insights for Smart...
Boston Institute of Analytics
 
Improving Organizational Decision Making Using a SAF-T based Business Intelli...
BrunoOliveira631137
 
Data Visions Big Data Visual Analytics Tool
Double Check ĆŐNSULTING
 
Data mining for the online retail industry
ATUL SHARMA
 
Business Segmentation
beckerdave
 
An Introduction to RFM in Analytics
SAS Canada
 
RFM.ppt
OmvirGautam
 
Fam1 Big Data + Visualization
Seth Familian
 
Customer Segmentation: Seeing the Trees and the Forest Simultaneously
Sione Palu
 
Shrinking big data for real time marketing strategy - A statistical Report
Manidipa Banerjee
 
RFMAnalysisShort.pptx
AyushSrivastava8761
 
Marketing & Retail Analytics-PPT-Milestone-1-Shor.pptx
AshishKumar797592
 
Customer analytics
Karl Melo
 
Introduction to sas
Zienab Allam
 
Data Refinement: The missing link between data collection and decisions
Vivastream
 
Ad

More from edwardmarivel (20)

DOCX
deadline 6 hours 7.3 y 7.47.4.docx
edwardmarivel
 
DOCX
Deadline 6 PM Friday September 27, 201310 Project Management Que.docx
edwardmarivel
 
DOCX
DEADLINE 15 HOURS6 PAGES UNDERGRADUATECOURSEWORKHARV.docx
edwardmarivel
 
DOCX
De nada.El gusto es mío.Encantada.Me llamo Pepe.Muy bien, grac.docx
edwardmarivel
 
DOCX
DDBA 8307 Week 4 Assignment TemplateJohn DoeDDBA 8.docx
edwardmarivel
 
DOCX
DDL 24 hours reading the article and writing a 1-page doubl.docx
edwardmarivel
 
DOCX
DCF valuation methodSuper-normal growth modelApplicatio.docx
edwardmarivel
 
DOCX
ddr-.docx
edwardmarivel
 
DOCX
DDBA 8307 Week 2 Assignment ExemplarJohn Doe[footnoteRef1] .docx
edwardmarivel
 
DOCX
DBM380 v14Create a DatabaseDBM380 v14Page 2 of 2Create a D.docx
edwardmarivel
 
DOCX
DBA CAPSTONE TEMPLATEThe pages in this template are correctl.docx
edwardmarivel
 
DOCX
DB3.1 Mexico corruptionDiscuss the connection between pol.docx
edwardmarivel
 
DOCX
DB2Pepsi Co and Coke American beverage giants, must adhere to th.docx
edwardmarivel
 
DOCX
DB1 What Ive observedHave you ever experienced a self-managed .docx
edwardmarivel
 
DOCX
DB Response 1I agree with the decision to search the house. Ther.docx
edwardmarivel
 
DOCX
DB Response prompt ZAKChapter 7, Q1.Customers are expecting.docx
edwardmarivel
 
DOCX
DB Topic of Discussion Information-related CapabilitiesAnalyze .docx
edwardmarivel
 
DOCX
DB Instructions Each reply must be 250–300 words with a minim.docx
edwardmarivel
 
DOCX
DB Defining White Collar CrimeHow would you define white co.docx
edwardmarivel
 
DOCX
DAVID H. ROSENBLOOMSECOND EDITIONAdministrative Law .docx
edwardmarivel
 
 

 
 

Datasets using R-StudioUsha Rani Singh.docx

  • 1. Datasets using R-Studio. Usha Rani Singh. Datasets for cars: A dataset is a collection of related information that can be analyzed to derive results. The dataset contains information in various forms, so it is not straightforward for the analyst to extract the data and present it to the business. Preparing Dataset for cars
  • 2. Preparing and analyzing the dataset is very important for any threat information, because it helps to provide accurate data. We have to select the data that provides the most value or is most relevant to the problem. Categorize the data by task: regression, classification, clustering, or ranking. It is difficult to establish a data collection mechanism when data is scattered across various forms and departments. We have to make the data consistent. The data sample should be reduced while still containing the required information. Preparing Dataset for cars: We have to clean the data so that processing is faster and more accurate. Complex datasets have to be decomposed into multiple parts. Data normalization has to be performed to improve the quality of the data. R Studio for dataset
  • 3. Pie diagram for dataset. ggplot_1 for dataset. ggplot_2 for dataset. ggplot_3 for dataset.
  • 5. Week 10 – Analysing Data sets in RapidMiner. The data sets used for this week's analysis relate to the CSRIC best practices: The CSRIC Best Practices Search Tool allows you to search CSRIC's collection of Best Practices using a variety of criteria including Network Type, Industry Role, Keywords, Priority Levels, and BP Number. The Communications Security, Reliability and Interoperability Council's (CSRIC) mission is to provide recommendations to the FCC to ensure, among other things, optimal security and reliability of communications systems, including telecommunications, media, and public safety. CSRIC’s members focus on a range of public safety and homeland security-related communications matters, including: (1) the reliability and security of communications systems and infrastructure, particularly mobile systems; (2) 911, Enhanced 911 (E911), and Next Generation 911 (NG911); and (3) emergency alerting.
  • 6. The CSRIC's recommendations will address the prevention and remediation of detrimental cyber events; the development of best practices to improve overall communications reliability; the availability and performance of communications services and emergency alerting during natural disasters, terrorist attacks, cyber security attacks or other events that result in exceptional strain on the communications infrastructure; the rapid restoration of communications services in the event of widespread or major disruptions; and the steps communications providers can take to help secure end-users and servers. I have used RapidMiner to analyze the data set: the statistical view of the various names, types and attributes related to the data set. Visualization of public safety vs prioritization
  • 7. Overall prioritization pie chart. Bar graph comparing various network types and internet/data usage. customer-segmentation-data set.zip: Mall_Customers.csv
    CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
    1,Male,19,15,39
    2,Male,21,15,81
    3,Female,20,16,6
    4,Female,23,16,77
    5,Female,31,17,40
    6,Female,22,17,76
    7,Female,35,18,6
  • 19. Mall Customer Segment Data Analysis using RFM. Vivek Ijjagiri. Agenda: Introduction; Mall Customer Segmentation data; Mall Customer Segment analysis data using RFM; Problem Solving;
  • 20. Clustering; Conclusion; References. Introduction: When we plan marketing spend to increase sales, or formulate a new promotion, we as retail marketers have to be careful about how we segment and target customers. It would be a waste of time and money if, for example, we launched an ad campaign aimed generically at a large number of customers. Such untargeted marketing and advertising is unlikely to have a high conversion rate and may even hurt our brand value. Retailers now use sophisticated strategies to segment their customers and target their marketing efforts at these segments. RFM analysis is one such popular customer segmentation technique that can help retailers maximize the return on their marketing investments. Why RFM? It improves customer segmentation for marketing and is widely used for surveys. It is superior to and simpler than other methods (CHAID and
  • 21. logistic regression). It focuses on transaction information and on delivering better marketing to customers. What is RFM? R => Recency, F => Frequency, M => Monetary. How are we using RFM to target customers? Simply put, we score customers on R, F and M from high to low. The greater the score, the more likely the customer is to buy a product or take up a new offer or promotion. This helps us identify the customers who are most likely to respond to a new offer or promotion. Identifying the most valuable RFM segments can capitalize on chance relationships in the data used for this analysis.
  • 22. Mall Customer Segment analysis data using RFM. Recency: Recency is the most important predictor. Customers who have purchased a product recently are more likely to purchase again from your store/mall than those who have not purchased recently. Frequency: The second most important factor is how frequently these customers purchase from you. The higher the frequency, the higher the chance of them purchasing again. Monetary: The third factor is the amount of money these customers have spent on purchases. Customers who have spent more are more likely to purchase again than those who have spent less. How are we going to calculate RFM? To implement the RFM analysis, we need to process the data set further with the following steps: Find the most recent purchase date for each customer ID and calculate the number of days from it to now (or to some other reference date) to get the Recency data. Count the number of transactions of each customer to get the Frequency data. Sum the amount of money a customer spent and divide it by
  • 23. Frequency, to get the average amount per transaction, which is the Monetary data. Problem Solving: Make sure the following libraries are available before proceeding with the data analysis; if they are not found in your R Studio, install those packages.
    library(data.table)
    library(dplyr)
    library(ggplot2)
    library(tidyr)
    library(knitr)
    library(rmarkdown)
    Load and examine data
    > Mall_Customers <- fread('data.csv')
    > glimpse(Mall_Customers)
  • 24. Ijjagiri, Vivek (IV) - This is like a transposed version of print: columns run down the page, and data runs across. This makes it possible to see every column in a data frame. It's a little like str applied to a data frame but it tries to show you as much data as possible. (And it always shows the underlying data, even when applied to a remote data source.) View Data. Data Cleanup or Wrangle
    > # treat non-positive quantities and prices as missing
    > Mall_Customers <- Mall_Customers %>%
        mutate(Quantity = replace(Quantity, Quantity <= 0, NA),
               UnitPrice = replace(UnitPrice, UnitPrice <= 0, NA))
  • 25. > Mall_Customers <- Mall_Customers %>% drop_na()
    Recode Variables
    > # the slides never create df_data; here it is built from the cleaned Mall_Customers table
    > df_data <- Mall_Customers %>%
        mutate(InvoiceNo = as.factor(InvoiceNo),
               StockCode = as.factor(StockCode),
               InvoiceDate = as.Date(InvoiceDate, '%m/%d/%Y %H:%M'),
               CustomerID = as.factor(CustomerID),
               Country = as.factor(Country))
    > df_data <- df_data %>% mutate(total_dolar = Quantity * UnitPrice)
    > glimpse(df_data)
    > summary(df_data)
    Calculate RFM
    > df_RFM <- df_data %>%
        group_by(CustomerID) %>%
        summarise(recency = as.numeric(as.Date("2012-01-01") - max(InvoiceDate)),
                  frequency = n_distinct(InvoiceNo),
                  monitery = sum(total_dolar) / n_distinct(InvoiceNo))
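The RFM aggregation above can be tried on a small table before running it on the real file. The sketch below is only an illustration: the six transactions are invented, the helper name rfm_one is hypothetical, and base R is used instead of dplyr so that it runs without extra packages. The reference date 2012-01-01 is the one used in the slides.

```r
# Invented toy transactions; column names follow the slides, values are made up.
tx <- data.frame(
  CustomerID  = c("A", "A", "B", "B", "B", "C"),
  InvoiceNo   = c("i1", "i2", "i3", "i4", "i4", "i5"),
  InvoiceDate = as.Date(c("2011-11-01", "2011-12-15", "2011-10-05",
                          "2011-10-20", "2011-10-20", "2011-06-30")),
  Quantity    = c(2, 1, 5, 3, 1, 10),
  UnitPrice   = c(10, 20, 4, 5, 15, 1)
)
tx$total <- tx$Quantity * tx$UnitPrice

ref_date <- as.Date("2012-01-01")  # reference date taken from the slides

# Hypothetical helper: one row of R, F and M for a single customer's rows.
rfm_one <- function(d) {
  n_inv <- length(unique(d$InvoiceNo))          # distinct invoices = frequency
  data.frame(
    CustomerID = d$CustomerID[1],
    recency    = as.numeric(ref_date - max(d$InvoiceDate)),  # days since last purchase
    frequency  = n_inv,
    monetary   = sum(d$total) / n_inv            # average spend per invoice
  )
}

# Split by customer, summarise each group, and stack the results.
rfm <- do.call(rbind, lapply(split(tx, tx$CustomerID), rfm_one))
print(rfm)
```

Customer B has three rows but only two distinct invoices, which is why frequency counts distinct InvoiceNo values rather than rows.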
  • 26. > summary(df_RFM)
    Calculate RFM
    > kable(head(df_RFM))
    K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. The objective of K-means is simple: group similar data points together and discover underlying patterns. To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset. A cluster refers to a collection of data points aggregated together because of certain similarities. In other words, the K-means algorithm identifies k centroids and then allocates every data point to the cluster of its nearest centroid, while keeping the distances between points and their centroid as small as possible.
  • 27. K Means Clustering Algorithm: 1. Specify the number of clusters K. 2. Initialize centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement. 3. Keep iterating (assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points) until there is no change to the centroids, i.e., the assignment of data points to clusters isn’t changing.
    Recency – How recently did the customer purchase?
    > Customer_Purchase_Recency <- df_RFM$recency
    > hist(Customer_Purchase_Recency, main = 'Recency')
    Frequency – How often do they purchase?
    > Customer_Purchase_Frequency <- df_RFM$frequency
    > hist(Customer_Purchase_Frequency, main = 'Frequency')
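As a rough illustration of the K-means description above, the sketch below runs the kmeans() function from base R's stats package on two invented groups of customers. The data, the choice of two clusters and the seed are assumptions made for the example; the slides do not show a K-means call of their own.

```r
set.seed(42)  # K-means starts from random centroids, so fix the seed

# Two well-separated, made-up groups described by recency and monetary value.
pts <- rbind(
  cbind(recency = rnorm(20, mean = 10,  sd = 2),
        monetary = rnorm(20, mean = 100, sd = 5)),
  cbind(recency = rnorm(20, mean = 200, sd = 2),
        monetary = rnorm(20, mean = 20,  sd = 5))
)

# Scale first so that neither variable dominates the distance calculation.
scaled <- scale(pts)

# centers = number of clusters k; nstart reruns from several random starts
# and keeps the solution with the lowest total within-cluster sum of squares.
fit <- kmeans(scaled, centers = 2, nstart = 10)

print(table(fit$cluster))  # cluster sizes
print(fit$centers)         # centroids in scaled units
print(fit$tot.withinss)    # the quantity K-means tries to minimise
```

On real RFM data the groups are rarely this clean, so the number of clusters usually has to be chosen by comparing tot.withinss across several values of k.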
  • 28. Monetary Value – How much do they spend?
    > Customer_Purchase_Monitery <- df_RFM$monitery
    > hist(Customer_Purchase_Monitery, main = 'Monetary', breaks = 50)
    Monetary Log: Because the data is skewed, we use a log scale to normalize.
    > MoniteryLog <- log(df_RFM$monitery)
    > hist(MoniteryLog, main = 'MoniteryLog')
    Ijjagiri, Vivek (IV) - https://www.rdocumentation.org/packages/amap/versions/0.8-17/topics/hcluster
    Ijjagiri, Vivek (IV) - This function is a mix of function hclust and function dist. hcluster(x, method = "euclidean", link = "complete") = hclust(dist(x, method = "euclidean"), method = "complete"). It uses half the memory, as it doesn't store the distance matrix. For more details, see the documentation of hclust and Dist.
    Clustering
    > DataFrame_Clustering <- df_RFM
  • 29. > DataFrame_CustomerID <- DataFrame_Clustering$CustomerID
    > row.names(DataFrame_Clustering) <- DataFrame_CustomerID
    > DataFrame_Clustering$CustomerID <- NULL  # drop the ID column so scale() only sees numeric columns
    > DataFrame_Clustering <- scale(DataFrame_Clustering)
    > summary(DataFrame_Clustering)
    Clustering
    > d <- dist(DataFrame_Clustering)
    > c <- hclust(d, method = 'ward.D2')
    > plot(c)
    Ijjagiri, Vivek (IV) - A dendrogram is a diagram that shows the hierarchical relationship between objects. It is most commonly created as an output from hierarchical clustering. The main use of a dendrogram is to work out the best way to allocate objects to clusters. The dendrogram below shows the hierarchical clustering of six observations shown on the scatterplot to the left. (Dendrogram is often miswritten as dendogram.)
    Plotting with less data
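The scale, dist and hclust sequence above is easier to follow on a handful of rows, where the dendrogram stays readable. The sketch below is an assumption-laden illustration: the ten customers and their values are invented, and cutree() with k = 2 is added here to turn the tree into cluster labels, which the slides do not show.

```r
set.seed(1)

# Ten made-up customers: five recent, frequent, high spenders and five lapsed ones.
rfm <- data.frame(
  recency   = c(rnorm(5, 10, 1), rnorm(5, 300, 1)),
  frequency = c(rnorm(5, 20, 1), rnorm(5, 2, 1)),
  monetary  = c(rnorm(5, 50, 1), rnorm(5, 5, 1)),
  row.names = paste0("cust", 1:10)
)

scaled <- scale(rfm)                       # standardise each column
tree <- hclust(dist(scaled), method = "ward.D2")

# Cut the tree into two groups; the result is a named vector of cluster labels.
groups <- cutree(tree, k = 2)
print(groups)

# plot(tree) would draw the dendrogram with the customer IDs as leaf labels.
```

With a large table, the same idea applies to a random sample of rows, for example scaled[sample(nrow(scaled), 50), ], before calling dist and hclust.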
  • 30. Plotting with less data. Conclusion: Customer segmentation can be performed using various clustering algorithms. We focused on k-means clustering in R. The algorithm is quite simple to implement. However, representing data in the correct format and interpreting results is the difficult part.
  • 31. RFM analysis can be used to segment customers, design offers and promotions specific to an audience, and produce products based on customer profiles and interests. References:
    Shubhankar Rawat (May 2019), Mall Customers Segmentation - Using Machine Learning, retrieved from https://towardsdatascience.com/mall-customers-segmentation-using-machine-learning-274ddf5575d5
    What is market segmentation, Different types explained, retrieved from https://www.qualtrics.com/experience-management/brand/what-is-market-segmentation/
    Bradley, P. S., Bennett, K. P., & Demiriz, A. (2000). Constrained k-means clustering (Technical Report MSR-TR-2000-65). Microsoft Research, Redmond, WA.
    K means clustering, AlindGupta, retrieved from https://www.geeksforgeeks.org/k-means-clustering-introduction/
    Thank you. Any Questions
  • 32. RcodeProject.R
    ##########################################
    # section 3.3 Statistical Methods for Evaluation
    ##########################################
    ##########################################
    # section 3.3.1 Hypothesis Testing
    ##########################################
    # generate random observations from the two populations
    x <- rnorm(10, mean=100, sd=5)  # normal distribution centered at 100
    y <- rnorm(20, mean=105, sd=5)  # normal distribution centered at 105
    # Student's t-test
    t.test(x, y, var.equal=TRUE)    # run the Student's t-test
    # obtain t value for a two-sided test at a 0.05 significance level
    qt(p=0.05/2, df=28, lower.tail= FALSE)
    # Welch's t-test
    t.test(x, y, var.equal=FALSE)   # run the Welch's t-test
  • 33. # Wilcoxon Rank-Sum Test
    wilcox.test(x, y, conf.int = TRUE)
    ##########################################
    # section 3.3.6 ANOVA
    ##########################################
    offers <- sample(c("offer1", "offer2", "nopromo"), size=500, replace=TRUE)
    # Simulated 500 observations of purchase sizes on the 3 offer options
    purchasesize <- ifelse(offers=="offer1", rnorm(500, mean=80, sd=30),
                    ifelse(offers=="offer2", rnorm(500, mean=85, sd=30),
                           rnorm(500, mean=40, sd=30)))
    # create a data frame of offer option and purchase size
    offertest <- data.frame(offer=as.factor(offers), purchase_amt=purchasesize)
    # display a summary of offertest where offer="offer1"
    summary(offertest[offertest$offer=="offer1",])
    # display a summary of offertest where offer="offer2"
    summary(offertest[offertest$offer=="offer2",])
    # display a summary of offertest where offer="nopromo"
    summary(offertest[offertest$offer=="nopromo",])
    # fit ANOVA test
    model <- aov(purchase_amt ~ offers, data=offertest)
    summary(model)
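To make the link between t.test and the qt critical value above more concrete, the following sketch recomputes the Student's t statistic by hand with the pooled variance and compares it with the built-in result. The seed and the variable names t_manual, t_builtin and t_crit are assumptions introduced for the illustration; the sample sizes (10 and 20, hence 28 degrees of freedom) are those of the script.

```r
set.seed(123)  # assumed seed so the example is reproducible
x <- rnorm(10, mean = 100, sd = 5)
y <- rnorm(20, mean = 105, sd = 5)

n1 <- length(x)
n2 <- length(y)

# Pooled variance, valid under the equal-variance assumption of Student's t-test.
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# t statistic: difference of sample means over its standard error.
t_manual <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# The same statistic as reported by t.test.
t_builtin <- unname(t.test(x, y, var.equal = TRUE)$statistic)

# Two-sided critical value at the 0.05 level with n1 + n2 - 2 = 28 degrees of freedom.
t_crit <- qt(p = 0.05 / 2, df = n1 + n2 - 2, lower.tail = FALSE)

# Reject the null hypothesis of equal means when |t| exceeds the critical value.
reject <- abs(t_manual) > t_crit
print(c(t_manual = t_manual, t_builtin = t_builtin, t_crit = t_crit))
print(reject)
```

Comparing the absolute t statistic with the critical value gives the same decision as checking whether the p-value from t.test is below 0.05.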
  • 34. # Tukey's Honest Significant Difference (HSD) on all
    # pair-wise tests for difference of means
    TukeyHSD(model)
    Lesson 2. © 2015 Pearson Education, Inc. Publishing as Prentice Hall. Chapter 4. “It is a set of beliefs that one party holds about the other and how these beliefs are formed from the interactions of […] individuals as they engage in tasks associated with an IT service” (Day 2007). It is a multifaceted interaction of people and processes.
  • 35. It is complex. Different expectations and accountabilities may lead to lack of trust. It tends to cluster into patterns (e.g., IT is a necessary evil; IT is a support but not a partner; business and IT are partners). IT has to keep proving itself. The business is often disengaged from IT work. Business expectations of IT change continually. Business assumptions of IT tend to cluster. The relationship is affected by the interaction of many people and processes at multiple levels. Clarity is often lacking around expectations and accountabilities. There are many “disconnects” between the two groups.
  • 36. Trust; Credibility; Competence; Value; Interpersonal Interaction. Expertise – the ability to support a technical recommendation and have up-to-date knowledge. Financial awareness – the ability to identify the value of IT in terms of ROI and total cost of ownership. Execution – the ability to understand the business, develop a vision and operationalize strategies. Find ways to develop business knowledge in all IT staff.
  • 37. Link IT’s success criteria to business metrics. Make business value an explicit criterion in all IT decisions. Ensure effective execution in all IT activities. Credibility is the belief that others can be counted on to do what they say they will do. It is built by: keeping agreements; acting with integrity, honesty and openness; being responsive (e.g., delivering on time and under budget). Communicate frequently and explicitly. Pay attention to the “little things”. Utilize external cues to credibility. Assess all business touch points.
  • 38. Professionalism - can be developed by five sets of attitudes and behaviors: on the job) good organization. job well) Nontechnical communication: The ability to translate and interpret needs, not only from business to technology and vice versa, but also between business units. Social skills: The ability to build mutual understanding, to
  • 39. enable all parties to get comfortable with one another and to uncover hidden assumptions. Management of politics and conflict: The ability to understand the role of politics and how it can affect IT work (i.e., addressing conflict and using it to deliver creative solutions). Expect professionalism. Promote a wide variety of social interactions at all levels. Develop “soft skills” in IT staff. The most important way to build trust is through effective governance: Integrated planning, defined accountabilities,
  • 40. and clarity of roles and responsibilities are key aspects of effective governance. Effective governance addresses the business’ expectations of its IT function. Design governance for clarity and transparency. Mandate the relationship. Design IT for business expectations. Business-IT relationships are complex, with interactions of many types, at many levels, and both between individuals and across functional and organizational entities. Four major components are needed to build a strong business-IT relationship: competence, credibility, interpersonal skills, and trust. Chapter 5
  • 41. Communication is a key social element of the organizational alignment between IT and business. One of the most important skills IT staff need to develop is how to communicate effectively with the business. Good communication is essential for: trust and partnerships between the business and IT perceptions of IT of the business
  • 42. Principle 1: The effectiveness of communication is measured by its outcomes. Principle 2: Communication is social behavior. Principle 3: Shared knowledge improves communication. Principle 4: Mature organizations have better communication. Communication should be measured by its outcomes rather than our intentions. Communication can get distorted through filters such as politics, culture, and personal points of view. Communication not only transmits ideas; it also negotiates relationships. How you say what you mean is just as
  • 43. important as what you say. IT staff and managers need to become aware of the power of different linguistic styles in communication situations. The more IT staff learn about the business, the better communication becomes. Shared knowledge is the beginning of the “virtuous circle”. The virtuous communication cycle: Shared Knowledge, Increased Communication, Mutual Understanding and “Common Sense”, Implementation Success.
  • 44. Strong organizational practices support and reinforce good interpersonal communication. Mature IT organizations embed appropriate communication at the operational and strategic level. “You can’t be a partner unless you’re a mature IT organization”. The changing nature of IT work: IT work has become more complex over time. Multiple cultures, different political contexts, various time zones, and virtual contacts make communication more challenging. Hiring practices: IT skills are changing to become more consultative and collaborative, rather than focused exclusively on technology. “IT organizations can no longer support smart,
  • 45. super-talented but socially disruptive people”. IT and business organization structures: IT staff are expected to play a “knowledge broker” role, not only between IT and business but also between business units. Thus, business silos can make this communication challenging. Nature and frequency of communication: Formal interactions improve communication, but communication should not occur exclusively in formal interactions (e.g., through IT governance). Attitude:
  • 46. Many IT staff are motivated by the desire to be right rather than the desire to communicate effectively. “We definitely need a ‘we’ attitude in IT, rather than an ‘us-them’ attitude”. Translation: A four-step process Business Impact of Technology Issues Business Technology Issues IT Solutions Business