Report : Web-Analytics Insights
Author : Abhishek Agrawal
Email-Id : akagrawa@ncsu.edu
Language : R [base, knitR, ggplot2]
Synopsis
This document presents a brief report that provides insights into visitors' behavior and activities on the
website. Before going deeper into any analytics, let us discuss what we are trying to achieve. The answer is
clear: “trends in the website visitors' behavior and activities”. The question now is how this information can be retrieved
and, more importantly, how it can tell a story.
For that we need a set of the right questions that can capture our story. So, let's begin our story with a set of questions and
answers.
Note: This report was generated using the knitr package in R, and the results are reproducible using the R packages. The code is
embedded in the same file.
Question 1 : How is the data loaded and what does it look like?
The data was obtained from an e-commerce vendor as dummy data based on some of their actual data streams and saved in .csv format
as “webdata.csv”. Want to see it? OK, below is the code to load the data and peek into it.
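rm(list=ls()) # Start from a clean workspace
# Loading data; stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0), as in the summary output further down
webData <- read.csv("webdata.csv", header = TRUE, stringsAsFactors = TRUE)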
names(webData)
##  [1] "day"                "site"               "new_customer"
##  [4] "platform"           "visits"             "distinct_sessions"
##  [7] "orders"             "gross_sales"        "bounces"
## [10] "add_to_cart"        "product_page_views" "search_page_views"
head(webData,1)
##           day site new_customer platform visits distinct_sessions orders
## 1 1/1/13 0:00 Acme            1  Android     24                16     14
##   gross_sales bounces add_to_cart product_page_views search_page_views
## 1        1287       4          16                104               192
Now you have an idea of what the data looks like and which attributes are present in it (day, site,
visits, orders, gross_sales, page views, etc.). Now let's move on to the next question.
Question 2 : Does the data look clean and, if not, is any pre-processing or feature engineering required?
Yes, the data looks mostly clean, but it does have some missing values, and some features such as day and month can be created
that will give us better insights later.
So, as a first step, let's see the summary statistics of the data:
summary(webData) # Summary statistics of the data
## day site new_customer platform
## 12/19/13 0:00: 86 Acme :7392 Min. :0.000 iOS :3435
## 11/29/13 0:00: 85 Botly : 804 1st Qu.:0.000 Android:3172
## 12/11/13 0:00: 85 Pinnacle:5725 Median :0.000 Windows:2399
## 12/7/13 0:00 : 85 Sortly :5532 Mean :0.448 MacOSX :2054
## 12/2/13 0:00 : 84 Tabular : 804 3rd Qu.:1.000 Linux :2036
## 12/5/13 0:00 : 84 Widgetry: 804 Max. :1.000 Unknown:1641
## (Other) :20552 NA's :8259 (Other):6324
## visits distinct_sessions orders gross_sales
## Min. : 0 Min. : 0 Min. : 0.00 Min. : 1
## 1st Qu.: 3 1st Qu.: 2 1st Qu.: 0.00 1st Qu.: 79
## Median : 24 Median : 19 Median : 0.00 Median : 851
## Mean : 1935 Mean : 1515 Mean : 62.38 Mean : 16473
## 3rd Qu.: 360 3rd Qu.: 274 3rd Qu.: 7.00 3rd Qu.: 3145
## Max. :136057 Max. :107104 Max. :4916.00 Max. :707642
## NA's :9576
## bounces add_to_cart product_page_views search_page_views
## Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 3 1st Qu.: 4
## Median : 5.0 Median : 4.0 Median : 53 Median : 82
## Mean : 743.3 Mean : 166.3 Mean : 4358 Mean : 8584
## 3rd Qu.: 97.0 3rd Qu.: 43.0 3rd Qu.: 708 3rd Qu.: 1229
## Max. :54512.0 Max. :7924.0 Max. :187601 Max. :506629
##
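The missing values visible in the summary can also be counted directly. The sketch below is only an illustration: `toy` and its values are made up and stand in for webData, and the check counts NA entries and blank strings per column.

```r
# Hypothetical toy data standing in for webData; values are made up for illustration
toy <- data.frame(
  new_customer = c(1, 0, NA, NA, 1),
  platform     = c("iOS", "", "Android", "", "Linux"),
  visits       = c(24, 3, 360, 0, 19),
  stringsAsFactors = FALSE
)

# Number of NA values in each column
naCount <- colSums(is.na(toy))

# Number of blank strings in each column (NA entries are not counted as blank)
blankCount <- sapply(toy, function(col) sum(col == "", na.rm = TRUE))

print(naCount)
print(blankCount)
```

Running the same two checks on the real data frame should point to the columns that need recoding before any aggregation.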
Now, let's replace the missing values in the new_customer column with the numeric value 2 (used as a nominal code).
# Assign code 2 to customers whose new_customer value is missing
webData$new_customer[which(is.na(webData$new_customer))] <- 2
# Similarly, assign rows with a blank platform to "Unknown"
webData$platform[which(webData$platform == "")] <- "Unknown"
webData$platform <- factor(webData$platform)
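Now, in the final step, let's extract some date-related fields from the data and append them to our main data.
# Keep only the date part of "m/d/yy h:mm"; a fixed-width substring would cut dates such as 12/19/13 short
webData$date <- sub(" .*", "", webData$day)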
webData$day <- weekdays(as.Date(webData$date, "%m/%d/%y"))
webData$day <- as.factor(webData$day)
webData$month <- months(as.Date(webData$date, "%m/%d/%y"))
webData$month <- as.factor(webData$month)
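The date handling can be checked on a couple of sample timestamps. The sketch below takes two values in the same layout as the day column, keeps the date part, parses it with an explicit format, and derives numeric weekday and month fields. The variable names are illustrative only.

```r
# Two timestamps in the same "m/d/yy h:mm" layout as the day column
stamps <- c("1/1/13 0:00", "12/19/13 0:00")

# Drop everything from the first space onwards to keep only the date part
dateText <- sub(" .*", "", stamps)

# Parse with an explicit format so nothing is left to guessing
dates <- as.Date(dateText, format = "%m/%d/%y")

# Weekday as a number (0 = Sunday ... 6 = Saturday) and month as a number (1-12);
# both are locale-independent, unlike weekdays() and months()
weekdayNum <- as.POSIXlt(dates)$wday
monthNum   <- as.POSIXlt(dates)$mon + 1

print(data.frame(stamps, dates, weekdayNum, monthNum))
```

Numeric fields like these are convenient for sorting; weekdays() and months() remain the simpler choice when readable labels are wanted.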
OK, enough with boring code. Now let's see something cool and answer some real questions.
Question 3 : Who visits the website and what do they do?
From the data, it is clear that users fall into the categories “New User”, “Returning User” and “Neither”. Let's
segment them and observe the trends in their visits, bounces, add-to-cart actions and orders.
library(ggplot2)
library(reshape2)
library(gridExtra)
## Loading required package: grid
# Aggregating visits, bounces, add_to_cart and orders, grouped by user category
getAggregate <- function(funcName){
  activity <- aggregate(list(webData$visits, webData$bounces,
    webData$add_to_cart, webData$orders), list(webData$new_customer), FUN=funcName)
  names(activity) <- c("Customers","Visits","Bounces","Add to Cart", "Order")
  activity[,-1] # drop the grouping column and return the aggregated measures
}
activity <- getAggregate("mean")
Customers=c("New User","Returning User","Neither") # create list of names
data=data.frame(cbind(activity),Customers) # combine them into a data frame
data.m <- melt(data, id.vars='Customers')
# Plotting Code
plot1 <- ggplot(data.m, aes(Customers, value)) + ggtitle("Customer Visit Pattern [On Average]") +
  geom_bar(aes(fill = variable), position = "dodge", stat="identity") + ylab("Average Count")
activity <- getAggregate("sum")
data=data.frame(cbind(activity),Customers) # combine them into a data frame
data.m <- melt(data, id.vars='Customers')
# Totals are shown on a log scale so the smaller categories stay visible
plot2 <- ggplot(data.m, aes(Customers, value)) + ggtitle("Customer Visit Pattern [Total, Log Scale]") +
  geom_bar(aes(fill = variable), position = "dodge", stat="identity") + scale_y_log10("Total Count")
grid.arrange(plot1, plot2, nrow = 2)
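Note that aggregate() returns one row per group in sorted order of the grouping values (0, 1, 2 here), so a hand-typed label vector has to follow that same order. A minimal sketch of tying labels to codes explicitly, on made-up numbers, is given below. It assumes that 1 marks a new customer, 0 a returning one and 2 the recoded missing value; this encoding is inferred from the column name and the color legend in Question 6 and is not confirmed by the data source.

```r
# Hypothetical toy data; the codes follow the assumed encoding described above
toy <- data.frame(
  new_customer = c(2, 0, 1, 1, 0, 2),
  visits       = c(900, 10, 30, 50, 20, 1100),
  orders       = c(0, 4, 6, 10, 8, 0)
)

# Attach labels to codes once, so later row order cannot mislabel a group
toy$customer <- factor(toy$new_customer, levels = c(0, 1, 2),
                       labels = c("Returning User", "New User", "Neither"))

# The formula interface keeps the grouping column in the result
avgActivity <- aggregate(cbind(visits, orders) ~ customer, data = toy, FUN = mean)
print(avgActivity)
```

Because the label travels with each row, the plot axis can be built from the customer column directly instead of from a separate vector.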
Here, from both panels, we can make the following observations:
1. The “Neither” category is exceptionally high compared with returning and new users. This means that either the web logs
failed to capture the user category for most visits or there are many bot visits on the website.
2. Users in the “Neither” category have no orders on average, which strengthens the bot-visit claim.
3. There are more new users than returning users. This is both good and bad news for the website. The good news is that
new customers joined this year, which means the business is not flopping. The bad news is that the returning-user counts
are lower, which might indicate slight customer dissatisfaction.
4. The second plot gives a zoomed-in view of the visitor trends. For example, returning users have a lower
bounce rate than new users.
5. The add-to-cart-to-order conversion rate is similar for returning and new users, unlike the “Neither” category.
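The bounce rate and the add-to-cart-to-order conversion mentioned in the list can be computed as simple ratios. The sketch below uses made-up totals (not results from webdata.csv) and assumes the common definitions: bounce rate as bounces divided by visits, and conversion as orders divided by add-to-cart events.

```r
# Hypothetical totals per user category (made-up numbers for illustration)
totals <- data.frame(
  customer    = c("New User", "Returning User", "Neither"),
  visits      = c(1000, 800, 5000),
  bounces     = c(400, 200, 3000),
  add_to_cart = c(200, 160, 50),
  orders      = c(100, 80, 0)
)

# Bounce rate: share of visits that bounced
totals$bounce_rate <- totals$bounces / totals$visits

# Cart-to-order conversion: orders per add-to-cart event
totals$cart_to_order <- totals$orders / totals$add_to_cart

print(totals[, c("customer", "bounce_rate", "cart_to_order")])
```

Ratios of this kind are easier to compare across categories than raw counts, because the categories differ so much in size.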
OK, now let's dig deeper into another set of trends.
Question 4 : Is there any temporal pattern in visitors' behavior?
To find such patterns, we need to group the visits by day and by month.
# Average activity per month; the grouping column returned by aggregate() is kept as the label,
# so every row keeps its own month
activity <- aggregate(list(webData$visits, webData$bounces, webData$add_to_cart,
  webData$orders), list(webData$month), FUN="mean")
names(activity) <- c("Months","Visits","Bounces","Add to Cart", "Order")
data.m <- melt(activity, id.vars='Months')
# month.name holds the English month names in calendar order
data.m$Months <- factor(data.m$Months, levels=month.name)
plot3 <- ggplot(data.m, aes(x = Months, y = value)) +
  geom_line(size=1, aes(group=variable,color=factor(variable))) + geom_point(color="blue") +
  ggtitle("Monthly Activity Trend") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ylab("Average Count")
# Same aggregation per weekday
activity <- aggregate(list(webData$visits, webData$bounces, webData$add_to_cart,
  webData$orders), list(webData$day), FUN="mean")
names(activity) <- c("Days","Visits","Bounces","Add to Cart", "Order")
data.m <- melt(activity, id.vars='Days')
data.m$Days <- factor(data.m$Days, levels=c("Monday", "Tuesday","Wednesday",
  "Thursday","Friday","Saturday","Sunday"))
plot4 <- ggplot(data.m, aes(x = Days, y = value)) +
  geom_line(size=1, aes(group=variable,color=factor(variable))) + geom_point(color="blue") +
  ggtitle("Day-Wise Activity Trend") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ylab("Average Count")
grid.arrange(plot3, plot4, nrow = 2)
Now let's see what we can interpret from the above plots:
1. We see some seasonal traffic in February and October, but it is not very significant.
2. Similar activity trends are observed in the month-wise and day-wise views.
Note: Since this is dummy data, we do not see any real-world temporal patterns in it, such as seasonal
patterns or weekday-weekend patterns.
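On real data, a weekday-weekend pattern could be checked with a comparison along these lines. This is only a sketch on one made-up week of visit counts; `toy` and `dayType` are hypothetical names.

```r
# Hypothetical daily visit counts for one made-up week
toy <- data.frame(
  date   = as.Date("2013-12-16") + 0:6,   # Monday 16 Dec 2013 to Sunday 22 Dec 2013
  visits = c(100, 110, 90, 100, 100, 40, 60)
)

# wday is 0 for Sunday and 6 for Saturday, independent of locale
wday <- as.POSIXlt(toy$date)$wday
toy$dayType <- ifelse(wday %in% c(0, 6), "Weekend", "Weekday")

# Average visits for each day type
byType <- aggregate(visits ~ dayType, data = toy, FUN = mean)
print(byType)
```

A clear gap between the two averages would be the kind of weekday-weekend pattern that the dummy data does not show.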
OK, now let's look at some other patterns.
Question 5 : Is there any trend based on the user platform and the particular site visited?
To see that, let's check each site's share of the platform-wise customer visits.
# Stacked bar chart: geom_bar() counts the rows for each platform, colored by site
ggplot(webData, aes(x = factor(platform), fill = factor(site))) + geom_bar() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Platform") +
  ylab("Visitor Count") + ggtitle("Platform Vs Site Share")
A few interpretations from the above plot:
1. It is clear that the website's users are predominantly on Android, iOS and Windows.
2. The Acme, Pinnacle and Sortly sites are the most visited and are opened on almost all platforms.
3. A few sites, such as “Botly”, “Widgetry” and “Tabular”, are not popular on every platform. This means that these sites
either lack cross-platform support or are simply much less popular. There is scope for increasing the customer base of
these less popular sites among users of different platforms.
4. Symbian and Macintosh have a much smaller customer base than the others, which indicates that maintenance and
new-feature development support for these platforms can be given lower priority.
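The site shares read off the bar chart can also be tabulated. The sketch below uses made-up visit counts and, unlike the bar chart (which counts rows), weights each row by its visits column; whether that weighting is preferable is a judgment call rather than part of the original analysis.

```r
# Hypothetical visit counts by platform and site (made-up numbers)
toy <- data.frame(
  platform = c("Android", "Android", "iOS", "iOS", "Symbian"),
  site     = c("Acme", "Botly", "Acme", "Pinnacle", "Acme"),
  visits   = c(60, 40, 30, 70, 5)
)

# Total visits for every platform/site combination
visitTable <- xtabs(visits ~ platform + site, data = toy)

# Row-wise proportions: each site's share of a platform's visits
siteShare <- prop.table(visitTable, margin = 1)
print(round(siteShare, 2))
```

Each row of the result sums to 1, so small platforms such as Symbian can be compared with large ones on the same footing.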
OK, now let's check some hidden patterns. Yes, I am talking about data mining.
Question 6 : Are there any natural clusters in the data and, if present, what do they indicate?
We could check for clusters using K-means and other clustering algorithms, but let's look for natural clusters using
Exploratory Data Analysis itself.
clustData <- subset(webData,
select=c("new_customer","orders","bounces","add_to_cart","product_page_views"))
plot(clustData[,2:ncol(clustData)], pch=20, col=clustData$new_customer+1,
main="Scatter Matrix on User Activities")
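For comparison, the K-means check mentioned above could look roughly like the following. It is a sketch on synthetic data with three deliberately well-separated groups, not a result from webdata.csv; on the real data one would presumably pass the activity columns of clustData instead, which has not been tried here.

```r
set.seed(42)  # make the random toy data and the k-means starts reproducible

# Hypothetical activity data: three well-separated groups of 30 points each
group <- rep(c("A", "B", "C"), each = 30)
toy <- data.frame(
  orders  = rnorm(90, mean = rep(c(0, 10, 20), each = 30), sd = 1),
  bounces = rnorm(90, mean = rep(c(20, 0, 10), each = 30), sd = 1)
)

# Standardize the columns so no single measure dominates the distance
scaled <- scale(toy)

# k-means with 3 centers; nstart tries several random starts and keeps the best
fit <- kmeans(scaled, centers = 3, nstart = 10)

# Cross-tabulate the known groups against the clusters found
clusterTable <- table(group, cluster = fit$cluster)
print(clusterTable)
```

If the clusters found by K-means line up with the known user categories in such a table, that supports the visual impression from the scatter matrix.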
Now, let's see what the above plot means:
1. We can see some natural clusters in the data. In many of the panels, there are 3 clear clusters [Red -> “New User”,
Black -> “Returning User”, Green -> “Neither”].
2. These clusters are simply the different user categories. This means that we can easily segment customer types by
their site-visit and activity patterns.
3. Since the clusters appear linearly separable, customer behavior is predictable. Such predictions
can help in building targeted customer campaigns.
Other Patterns and Scope
Pattern finding in data is a never-ending process. With more high-quality data and richer techniques, we can uncover more
hidden knowledge.
Here too, if we had product-visit information, we could check for “Association Rules” among the products. These results
can help in building recommendation systems for the website.
Similarly, user-signature data can help in analyzing individual customer behavior, which can help in building personalized
advertisements.
Geo-spatial information can also provide a more detailed analysis of location-wise customer behavior.
End Of Report
