
Telecom Customer Churn Analysis in R

Last Updated : 19 Jun, 2025

Customer churn is an important concern for the telecom industry, as retaining customers is just as important as acquiring new ones. In this article, we analyze a telecom customer churn dataset to understand why customers leave and what can be done to retain them.

Project Overview

The primary goal of this project is to understand the factors that influence customer churn in the telecom industry. By identifying these factors, telecom companies can implement targeted interventions to retain customers. This has broader implications, helping telecom companies improve profitability, customer satisfaction and community stability through reliable services.

Understanding the Dataset

The dataset describes each customer with columns such as customerID, gender, SeniorCitizen, Partner, Dependents, tenure, PhoneService, InternetService, Contract, PaymentMethod and Churn, along with other service and billing information.

Dataset Link: Telecom Customer Churn

1. Load Packages and Data

In this step, we load the required libraries, read the dataset and inspect its first few rows. head(churn_data) displays the first six rows of the churn_data data frame, which is a quick way to check the structure and contents of the data.

R
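# Install once if the packages are not already available
install.packages(c("dplyr", "ggplot2", "tidyverse"))

library(dplyr)
library(tidyverse)
library(ggplot2)

# stringsAsFactors = TRUE reads text columns as factors, so summary() reports category counts
churn_data <- read.csv("/content/Telcom-Customer-Churn.csv", stringsAsFactors = TRUE)
head(churn_data)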

Output:

First six rows of churn_data

From this initial glance, we can see that the dataset contains both categorical and numerical data, such as customer information, service types and billing methods. This forms the basis for understanding patterns related to customer churn.
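Since R 4.0, read.csv() and data.frame() keep text columns as character vectors instead of factors, which changes how summary() reports them. A minimal sketch, using a small hypothetical data frame in place of churn_data, of checking column types with str() and converting the text columns (except the ID) to factors:

```r
# Hypothetical toy data standing in for churn_data (not the real dataset)
toy <- data.frame(
  customerID = c("0001-A", "0002-B", "0003-C"),
  Contract   = c("Month-to-month", "One year", "Month-to-month"),
  tenure     = c(1L, 24L, 5L),
  Churn      = c("Yes", "No", "No"),
  stringsAsFactors = FALSE
)

str(toy)  # text columns show as chr, tenure as int

# Convert every character column except the ID column to a factor
chr_cols <- setdiff(names(toy)[sapply(toy, is.character)], "customerID")
toy[chr_cols] <- lapply(toy[chr_cols], factor)

summary(toy$Contract)  # counts per level once Contract is a factor
```

Passing stringsAsFactors = TRUE to read.csv() converts all text columns at once; the selective version above avoids turning a unique ID column into a factor with one level per customer.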

2. Performing Exploratory Data Analysis (EDA)

EDA is the process of describing and summarizing data to bring its important characteristics into focus before further analysis.

2.1. Checking for missing values

We begin by checking for missing values and printing the dimensions of the dataset.

  • colSums(is.na(churn_data)) counts the number of missing values in each column.
  • dim(churn_data) gives the dimensions (rows and columns) of the dataset.
R
# Count missing (NA) values in each column
mis_val <- colSums(is.na(churn_data))
print("Missing values in all columns")
print(mis_val)

print("Dimensions of the dataset")
print(dim(churn_data))

Output:

Checking for Missing values

No column contains NA values, and the dataset has 7032 rows and 21 columns. Checking this early helps ensure the quality of the analysis.
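Note that is.na() only flags true NA values. A text column can instead hold empty or whitespace-only strings, which colSums(is.na()) counts as zero missing values. A small sketch on a hypothetical data frame that counts such blanks per column:

```r
# Hypothetical toy data: TotalCharges has a blank entry that is.na() will not catch
toy <- data.frame(
  tenure       = c(0L, 12L, 30L),
  TotalCharges = c(" ", "845.5", "2100.2"),
  Churn        = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

colSums(is.na(toy))  # all zero, even though TotalCharges has a blank

# Count empty or whitespace-only strings in each text column (0 for other columns)
blank_counts <- sapply(toy, function(col) {
  if (is.character(col) || is.factor(col)) sum(trimws(as.character(col)) == "") else 0L
})
print(blank_counts)
```

If blanks turn up, they can be read as NA directly, for example with read.csv(..., na.strings = c("", " ")), and then handled like any other missing value.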

2.2. Checking the summary of the data

The summary() function in R provides a statistical summary of each column in the data frame. For numeric columns, it shows the minimum, 1st quartile, median, mean, 3rd quartile and maximum values. For factor columns, it displays the frequency of each category; plain character columns are reported only by their length and class.

R
summary(churn_data)

Output:

Checking the summary of the data

This summary gives an overview of the range and central tendency of the numeric columns, such as tenure, and of how customers are spread across the categories of each factor column.

3. Performing Data Visualization

We will visualize the churn distribution and explore how various factors such as contract type, tenure and internet service affect churn.

3.1. Churn Distribution using Pie Chart

We build a single stacked bar showing the number of churned and retained customers, then convert it into a pie chart.

  • geom_bar() is used to create a bar chart and coord_polar() transforms the bar chart into a pie chart.
  • geom_text() adds the percentage labels to the chart.
R
churn_counts <- table(churn_data$Churn)

churn_df <- as.data.frame(churn_counts)
names(churn_df) <- c("Churn", "Count")

ggplot(churn_df, aes(x = "", y = Count, fill = Churn)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = scales::percent(Count / sum(Count))), 
            position = position_stack(vjust = 0.5)) +
  ggtitle("Churn Distribution") +
  theme_void()

Output:

Churn Distribution

The pie chart shows the distribution of churn (whether a customer left or stayed). It helps us see the proportion of customers who churned versus those who didn’t.
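The pie chart is easier to interpret alongside the underlying numbers. A minimal sketch, using a hypothetical Churn vector in place of churn_data$Churn, of computing the same shares with table() and prop.table():

```r
# Hypothetical Churn values standing in for churn_data$Churn
churn <- c("No", "No", "Yes", "No", "Yes", "No", "No", "No")

churn_counts <- table(churn)
churn_share  <- prop.table(churn_counts)  # each count divided by the total

print(churn_counts)
print(round(100 * churn_share, 1))  # percentages, as labelled on the pie chart
```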

3.2. Churn Distribution of Contract Status

This bar plot shows how churn varies with respect to contract type (Month-to-month, One year, Two year).

R
ggplot(churn_data, aes(x = Churn, fill = Contract)) +
  geom_bar(position = "dodge") +
  labs(title = "Churn Distribution w.r.t Contract Status", x = "Churn") +
  theme_minimal()

Output:

Churn Distribution w.r.t Contract Status

The bar chart here helps visualize whether different contract types (Month-to-month, One year and Two year) have an impact on customer churn. It uses side-by-side bars to make the comparison clear.
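Raw counts can be misleading when the groups differ in size. A sketch of the per-group churn rate (the share of customers in each contract type who churned), using prop.table() with margin = 1 on a hypothetical sample rather than the article's data:

```r
# Hypothetical sample standing in for churn_data[, c("Contract", "Churn")]
toy <- data.frame(
  Contract = c("Month-to-month", "Month-to-month", "Month-to-month", "Month-to-month",
               "One year", "One year", "Two year", "Two year"),
  Churn    = c("Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No"),
  stringsAsFactors = FALSE
)

tab <- table(toy$Contract, toy$Churn)   # rows: contract type, columns: churn status
rates <- prop.table(tab, margin = 1)    # proportions within each row (rows sum to 1)
churn_rate <- rates[, "Yes"]            # share of each contract type that churned
print(round(churn_rate, 2))
```

On the real data, the same two calls applied to table(churn_data$Contract, churn_data$Churn) give one churn rate per contract type, which the dodged bar chart only lets you compare by eye.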

3.3. Churn Distribution of Tenure

This bar chart shows the churn distribution in relation to the tenure (number of months a customer has been with the telecom company).

R
ggplot(churn_data, aes(x = tenure, fill = Churn)) +
  # Default bar width keeps bars for adjacent tenure values from overlapping
  geom_bar(position = "dodge", colour = "black") +
  labs(title = "Churn Distribution w.r.t Tenure", x = "Months", y = "Count") +
  theme_minimal()

Output:

Churn Distribution w.r.t Tenure

This visualization helps us understand how the number of months a customer has been with the telecom company correlates with the likelihood of churn.
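Because tenure takes many distinct values, grouping it into bands can make the trend easier to read. A sketch using cut() and tapply() on hypothetical tenure and churn values; the 12-month band edges are an illustrative choice, not taken from the article:

```r
# Hypothetical tenure (months) and churn values
tenure <- c(1, 3, 8, 14, 20, 30, 45, 60, 70, 72)
churn  <- c("Yes", "Yes", "No", "Yes", "No", "No", "No", "No", "No", "No")

# Group tenure into 12-month bands; include.lowest = TRUE keeps tenure 0 in the first band
band <- cut(tenure, breaks = seq(0, 72, by = 12), include.lowest = TRUE)

# Churn rate within each band: the mean of a logical "churned" flag
rate_by_band <- tapply(churn == "Yes", band, mean)
print(round(rate_by_band, 2))
```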

3.4. Churn Distribution of Internet Services

This bar plot shows the churn distribution for each type of internet service (DSL, Fiber optic, or no internet service).

R
ggplot(churn_data, aes(x = InternetService, fill = Churn)) +
  geom_bar(position = "dodge") +
  labs(title = "Churn Distribution w.r.t Internet Services", x = "Internet Service") +
  theme_minimal()

Output:

Churn Distribution w.r.t Internet Services

This bar chart helps visualize whether the type of internet service a customer uses is related to churn.
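The bar chart makes the comparison visually; a chi-squared test of independence is one standard way to check whether churn and internet service type are associated. A minimal sketch with chisq.test() on a hypothetical contingency table (the counts are made up, not the article's results):

```r
# Hypothetical counts of internet service type vs churn status
counts <- matrix(c(180,  40,
                   120, 110,
                   140,  10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(InternetService = c("DSL", "Fiber optic", "No"),
                                 Churn = c("No", "Yes")))

test <- chisq.test(counts)
print(test)

# A small p-value suggests churn rates differ across service types
print(test$p.value < 0.05)
```

With the real data, the table would come from table(churn_data$InternetService, churn_data$Churn). The test indicates association only; it does not say which service type drives churn or why.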

3.5. Churn Based on Senior Citizen Status

Identifying the number of senior citizens helps in tailoring services and promotions specifically for this segment. This bar plot shows how many customers are senior citizens and how many are not.

R
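# Count senior and non-senior customers from the data instead of hard-coding totals.
# SeniorCitizen may be coded 0/1 or "No"/"Yes"; both are mapped to "No"/"Yes" here.
senior_data <- churn_data %>%
  mutate(SeniorCitizen = ifelse(SeniorCitizen %in% c(1, "1", "Yes"), "Yes", "No")) %>%
  count(SeniorCitizen, name = "Count")

ggplot(senior_data, aes(x = SeniorCitizen, y = Count, fill = SeniorCitizen)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Senior Citizen Status", x = "Senior Citizen", y = "Count") +
  scale_fill_manual(values = c("No" = "#66B3FF", "Yes" = "#FF9999"))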

Output:

Senior Citizen Status

This bar chart shows the number of senior citizens vs. non-senior citizens in the dataset, helping to tailor services specifically for this segment.
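Group sizes alone do not show how often each group churns. A sketch comparing the churn proportions of senior and non-senior customers with prop.test(), using hypothetical counts rather than values from the dataset:

```r
# Hypothetical counts: churned customers and group sizes (illustrative only)
churned <- c(NonSenior = 200, Senior = 90)
total   <- c(NonSenior = 900, Senior = 210)

rates <- churned / total
print(round(rates, 3))

# Two-sample test of equal proportions
res <- prop.test(x = churned, n = total)
print(res$p.value)
```

With the real data, churned and total would be taken from table(churn_data$SeniorCitizen, churn_data$Churn).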

3.6. Churn Based on Payment Method

Understanding how customers prefer to pay for services can inform billing and payment strategy. This bar plot shows the distribution of different payment methods used by customers.

R
# Count customers per payment method from the data instead of hard-coding totals
payment_data <- churn_data %>%
  count(PaymentMethod, name = "Count")

ggplot(payment_data, aes(x = PaymentMethod, y = Count, fill = PaymentMethod)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Payment Method Distribution", x = "Payment Method", y = "Count") +
  scale_fill_brewer(palette = "Set3")

Output:

Payment Method Distribution

This bar chart visualizes the preferred payment methods among customers, helping telecom companies better understand customer behavior and billing preferences.
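The distribution shows which methods are popular, not which are linked to churn. A sketch that computes the churn rate per payment method with aggregate() and sorts it from highest to lowest, on a hypothetical sample standing in for churn_data:

```r
# Hypothetical sample standing in for churn_data[, c("PaymentMethod", "Churn")]
toy <- data.frame(
  PaymentMethod = c("Electronic check", "Electronic check", "Electronic check",
                    "Mailed check", "Mailed check",
                    "Credit card (automatic)", "Bank transfer (automatic)"),
  Churn = c("Yes", "Yes", "No", "Yes", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Churn rate per payment method: mean of a logical "churned" flag
rates <- aggregate(Churned ~ PaymentMethod,
                   data = transform(toy, Churned = Churn == "Yes"),
                   FUN = mean)

# Sort from highest to lowest churn rate
rates <- rates[order(-rates$Churned), ]
print(rates)
```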

Conclusion

From our analysis, we observed that:

  • Month-to-month contracts are associated with higher churn, suggesting that offering longer-term contracts could improve retention.
  • Newer customers with shorter tenures are at higher risk of churn, indicating the need for targeted retention strategies early in the subscription.
  • Fiber optic customers churn at a noticeably higher rate than DSL customers, which suggests reviewing the price and service quality of the fiber optic offering.
  • Senior citizens are a smaller segment but churn at a higher rate than non-senior citizens, which points to services and support tailored to older customers.
  • Payment method preferences, especially the popularity of electronic checks, suggest opportunities for optimizing billing strategies.

By addressing these factors, telecom companies can implement more effective retention strategies to reduce churn.

