dplyr Package
Introduction
Helps transform and manipulate data
Powerful tool to summarise data sets
Install: install.packages("dplyr")
Activate: library(dplyr)
File: Excel
Variables: 7
Observations: 153
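The Excel workbook itself is not part of this deck, so the later examples cannot be run as they stand. As a rough stand-in, the sketch below builds a tiny data frame named sales with the column names the examples rely on (Month, Dealer, SalesManager, Item, Qty, Amount). Every value and person name in it is invented for illustration; the real file has 7 variables and 153 observations.

```r
library(dplyr)

# Invented stand-in for the workshop data; only the column names follow
# the slides, every value below is made up.
sales <- data.frame(
  Month        = c("April", "May", "May", "June", "May", "June"),
  Dealer       = c("Anand", "Bala", "Anand", "Chitra", "Chitra", "Bala"),
  SalesManager = c("Ravi", "Ravi", "Meena", "Meena", "Ravi", "Meena"),
  Item         = c("Pen", "Pencil", "Pen", "Eraser", "Pencil", "Pen"),
  Qty          = c(40, 60, 80, 20, 90, 50),
  Amount       = c(400, 300, 750, 100, 450, 550)
)

dim(sales)      # number of rows and columns
glimpse(sales)  # one line per column: name, type, first values
```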
Functions in dplyr
▪ select
▪ filter
▪ arrange
▪ distinct
▪ mutate
▪ transmute
▪ group_by
▪ summarise
▪ pipe operator (%>%)
▪ slice
▪ count
select()
• Keeps only the variables (columns) that you want to retain or extract.
• Syntax: select(dataset,[column1],[column2],…)
Examples:
▪ Select columns Month, Dealer, Item, Quantity: select(sales,Month,Dealer,Item,Qty)
▪ Select columns from Month to Quantity: select(sales,Month:Qty)
▪ Drop column Month from the dataset: select(sales,-Month)
▪ Select columns ending with the letter “r”: select(sales,ends_with("r"))
▪ Select columns containing the letter “r”: select(sales,contains("r"))
▪ Select columns matching a regular expression, here “m” followed by any character: select(sales,matches("m."))
▪ Select columns named in a character vector: select(sales,one_of(c("Month","Dealer")))
▪ Select columns starting with the letter “d”: select(sales,starts_with("d"))
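The selection helpers are easy to confuse, so here is a minimal sketch on an invented two-row data frame (only the column names follow the slides). It assumes dplyr 1.0.0 or later, where any_of() is available as the newer counterpart of one_of(). Note that matches() takes a regular expression, so "m." and "^m" select different columns.

```r
library(dplyr)

# Invented two-row stand-in; only the column names follow the slides.
sales <- data.frame(
  Month  = c("April", "May"),
  Dealer = c("Anand", "Bala"),
  Item   = c("Pen", "Pencil"),
  Qty    = c(40, 60),
  Amount = c(400, 300)
)

range_cols  <- names(select(sales, Month:Item))        # a contiguous range of columns
ends_r      <- names(select(sales, ends_with("r")))    # names ending in "r"
starts_m    <- names(select(sales, starts_with("m")))  # helpers ignore case by default
regex_m_any <- names(select(sales, matches("m.")))     # "m" plus one more character, anywhere in the name
regex_m_1st <- names(select(sales, matches("^m")))     # ^ anchors the match to the start of the name
# "Region" is a hypothetical column that does not exist; any_of() skips it silently
by_name     <- names(select(sales, any_of(c("Month", "Dealer", "Region"))))

print(list(range_cols, ends_r, starts_m, regex_m_any, regex_m_1st, by_name))
```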
filter()
• Keeps only the records (rows) that you want to retain or extract.
• Syntax: filter(dataset,criteria)
Examples:
▪ Item is Pen: filter(sales,Item=="Pen")
▪ Quantity is more than 50: filter(sales,Qty>50)
▪ Item is Pencil and Quantity is more than 50: filter(sales,Item=="Pencil"&Qty>50)
▪ Quantity is between 50 and 80 (both excluded): filter(sales,Qty>50&Qty<80)
▪ Item is Pencil or Quantity is more than 50: filter(sales,Item=="Pencil"|Qty>50)
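A short sketch of how the logical operators behave, using an invented six-row data frame. It also contrasts the strict comparison Qty>50&Qty<80 with dplyr's between(), which includes both end points, and shows %in% as a compact way to test several values.

```r
library(dplyr)

# Invented stand-in; only the column names follow the slides.
sales <- data.frame(
  Item = c("Pen", "Pencil", "Pen", "Eraser", "Pencil", "Pen"),
  Qty  = c(40, 60, 80, 20, 90, 50)
)

both      <- filter(sales, Item == "Pencil" & Qty > 50)
both_too  <- filter(sales, Item == "Pencil", Qty > 50)   # comma-separated conditions are combined with AND
either    <- filter(sales, Item == "Pencil" | Qty > 50)
strict    <- filter(sales, Qty > 50 & Qty < 80)          # excludes 50 and 80
inclusive <- filter(sales, between(Qty, 50, 80))         # includes 50 and 80
several   <- filter(sales, Item %in% c("Pen", "Eraser")) # any of the listed items

print(list(both = both, either = either, strict = strict, inclusive = inclusive))
```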
select() and filter()
Examples:
▪ We want to extract the Sales Manager, Item and Quantity, but only for Pencil:
i) k=select(sales,SalesManager,Item,Qty)
filter(k,Item=="Pencil")
ii) select(filter(sales,Item=="Pencil"),SalesManager,Item,Qty)
iii) filter(select(sales,SalesManager,Item,Qty),Item=="Pencil")
▪ We want to extract Dealer, Item and Quantity for the month of May:
i) filter(select(sales,Dealer,Item,Qty),sales$Month=="May") works, because sales$Month is read from the original dataset
ii) filter(select(sales,Dealer,Item,Qty),Month=="May") fails, because select() has already dropped Month
iii) select(filter(sales,Month=="May"),Dealer,Item,Qty) filters first, while Month is still available
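The order of the two calls matters whenever the filter condition uses a column that select() removes. The sketch below, on invented data, filters first and then selects; it also wraps the reversed order in tryCatch() to show that it stops with an error instead of returning rows.

```r
library(dplyr)

# Invented stand-in; only the column names follow the slides.
sales <- data.frame(
  Month  = c("April", "May", "May", "June", "May", "June"),
  Dealer = c("Anand", "Bala", "Anand", "Chitra", "Chitra", "Bala"),
  Item   = c("Pen", "Pencil", "Pen", "Eraser", "Pencil", "Pen"),
  Qty    = c(40, 60, 80, 20, 90, 50)
)

# Filter on Month while it still exists, then keep three columns.
may_sales <- select(filter(sales, Month == "May"), Dealer, Item, Qty)

# Selecting first drops Month, so the later filter cannot find it.
outcome <- tryCatch(
  {
    filter(select(sales, Dealer, Item, Qty), Month == "May")
    "ran without error"
  },
  error = function(e) "error: Month is no longer a column"
)

print(may_sales)
print(outcome)
```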
arrange()
• Sorts the records (rows) based on one or more variables.
• By default the arrangement is in ascending order.
• Syntax: arrange(dataset,column1,[column2],…)
Examples:
▪ Sort the dataset by Month: arrange(sales,Month)
▪ Sort the dataset by Month and then Dealer: arrange(sales,Month,Dealer)
▪ Arrange the data in descending order of Quantity: arrange(sales,desc(Qty))
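One caveat when sorting by Month: if the column holds month names as text, arrange() sorts them alphabetically, not in calendar order. The sketch below uses invented data and assumes Month contains full English month names, so that base R's month.name can supply the calendar position.

```r
library(dplyr)

# Invented stand-in; only the column names follow the slides.
sales <- data.frame(
  Month  = c("May", "April", "June", "April"),
  Dealer = c("Bala", "Chitra", "Anand", "Anand"),
  Qty    = c(60, 40, 20, 75)
)

alphabetical <- arrange(sales, Month)                     # text order: April, June, May
calendar     <- arrange(sales, match(Month, month.name))  # calendar order: April, May, June
two_keys     <- arrange(sales, Month, desc(Qty))          # ties on Month broken by largest Qty first

print(alphabetical)
print(calendar)
print(two_keys)
```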
distinct()
• Extracts the unique values of a variable, or the unique combinations of several variables.
• Syntax: distinct(dataset,[column1],[column2],…)
Examples:
▪ Find the names of the Dealers: distinct(sales,Dealer)
▪ Find the items sold by each Dealer: arrange(distinct(sales,Dealer,Item),Dealer)
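A small sketch of distinct() on invented data. Besides the two calls above, it shows the .keep_all argument, which keeps the first full row for each unique value, and n_distinct(), which only counts the unique values.

```r
library(dplyr)

# Invented stand-in; only the column names follow the slides.
sales <- data.frame(
  Dealer = c("Anand", "Bala", "Anand", "Chitra", "Chitra", "Bala"),
  Item   = c("Pen", "Pencil", "Pen", "Eraser", "Pencil", "Pen"),
  Qty    = c(40, 60, 80, 20, 90, 50)
)

dealers    <- distinct(sales, Dealer)                      # one column, one row per dealer
pairs      <- arrange(distinct(sales, Dealer, Item), Dealer)  # unique Dealer/Item combinations
first_rows <- distinct(sales, Dealer, .keep_all = TRUE)    # first row per dealer, all columns kept
n_dealers  <- n_distinct(sales$Dealer)                     # just the count

print(dealers)
print(pairs)
print(first_rows)
```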
mutate()
• Adds a new variable (column) to the existing dataset.
• Syntax: mutate(dataset,newcolumn=expression)
Example:
▪ Add a new column Target that is twice the Quantity: mutate(sales,Target=Qty*2)
transmute()
• Creates a new variable (column) but drops the existing ones.
• Syntax: transmute(dataset,newcolumn=expression)
Example:
▪ Create a new column Target that is twice the Quantity, keeping only that column: transmute(sales,Target=2*Qty)
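The difference between mutate() and transmute() is easiest to see side by side. The sketch below uses invented data and assumes dplyr 1.0.0 or later, where mutate() accepts .keep = "none" as an alternative spelling of the same idea for ungrouped data. Gap is a hypothetical extra column added only to show that a later expression can use a column created earlier in the same call.

```r
library(dplyr)

# Invented stand-in; only the column names Item and Qty follow the slides.
sales <- data.frame(
  Item = c("Pen", "Pencil"),
  Qty  = c(40, 60)
)

with_target  <- mutate(sales, Target = Qty * 2)                   # keeps Item and Qty, adds Target
only_target  <- transmute(sales, Target = Qty * 2)                # keeps Target only
only_target2 <- mutate(sales, Target = Qty * 2, .keep = "none")   # same result on ungrouped data
chained      <- mutate(sales, Target = Qty * 2, Gap = Target - Qty)  # Gap uses the new Target

print(with_target)
print(only_target)
print(chained)
```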
group_by()
• Creates groups in a dataset based on one or more variables.
• Useful when nested with other functions, which then work within each group.
• Syntax: group_by(dataset,column1,[column2],…)
• Ungroup Syntax: ungroup(dataset)
Examples:
▪ Create groups in the data based on Items: group_by(sales,Item)
▪ Get the maximum units sold for each item: filter(group_by(sales,Item),Qty==max(Qty))
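A minimal sketch of what grouping changes, on invented data: max(Qty) inside filter() is evaluated once per group, and the result stays grouped until ungroup() is called. The last lines show a common slip: group_by() has no by argument, so writing by = Item creates a new grouping column called by.

```r
library(dplyr)

# Invented stand-in; only the column names follow the slides.
sales <- data.frame(
  Item   = c("Pen", "Pencil", "Pen", "Eraser", "Pencil", "Pen"),
  Dealer = c("Anand", "Bala", "Anand", "Chitra", "Chitra", "Bala"),
  Qty    = c(40, 60, 80, 20, 90, 50)
)

by_item <- group_by(sales, Item)
group_vars(by_item)                          # the grouping column(s)

best <- filter(by_item, Qty == max(Qty))     # max() is computed within each Item
best <- ungroup(best)                        # drop the grouping before further work

misnamed <- group_by(sales, by = Item)       # adds a column named "by" and groups on it
group_vars(misnamed)

print(best)
```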
summarise()
• Reduces the dataset, or each group of a grouped dataset, to a single row of summary statistics.
• Syntax: summarise(dataset,newvariable=function,…)
Examples:
▪ Total number of units sold across all Items:
summarise(sales,total=sum(Qty))
▪ Total number of units sold and total amount:
summarise(sales,t_Qty=sum(Qty),t_Amount=sum(Amount))
▪ Total number of records in the dataset:
summarise(sales,rowscount=n())
▪ Get the total number of records, quantity sold and amount for each item:
summarise(group_by(sales,Item),rcount=n(),unitqty=sum(Qty),totalamount=sum(Amount))
▪ The same statistics for each dealer and their respective items:
summarise(group_by(sales,Dealer,Item),rcount=n(),unitqty=sum(Qty),totalamount=sum(Amount))
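A short sketch of summarise() with and without grouping, on invented data. It assumes dplyr 1.0.0 or later for the .groups argument, which is used here to return an ungrouped result.

```r
library(dplyr)

# Invented stand-in; only the column names follow the slides.
sales <- data.frame(
  Item   = c("Pen", "Pencil", "Pen", "Eraser", "Pencil", "Pen"),
  Qty    = c(40, 60, 80, 20, 90, 50),
  Amount = c(400, 300, 750, 100, 450, 550)
)

# One row for the whole dataset.
overall <- summarise(sales, records = n(), total_qty = sum(Qty), total_amount = sum(Amount))

# One row per Item; .groups = "drop" returns a plain, ungrouped table.
per_item <- summarise(
  group_by(sales, Item),
  records      = n(),
  total_qty    = sum(Qty),
  total_amount = sum(Amount),
  .groups      = "drop"
)

print(overall)
print(per_item)
```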
Assignment
▪ We want to extract the top 6 records for Dealers who have sold the Item Pen only:
filter(sales,Item=="Pen")
select(filter(sales,Item=="Pen"),Item,Dealer,Qty)
arrange(select(filter(sales,Item=="Pen"),Item,Dealer,Qty),Dealer)
head(arrange(select(filter(sales,Item=="Pen"),Item,Dealer,Qty),Dealer))
▪ We want the maximum quantity of every item for the month of May with just Dealer, Item and Quantity variables:
filter(sales,Month=="May")
select(filter(sales,Month=="May"),Dealer,Item,Qty)
group_by(select(filter(sales,Month=="May"),Dealer,Item,Qty),Item,Dealer)
summarise(group_by(select(filter(sales,Month=="May"),Dealer,Item,Qty),Item,Dealer),max(Qty))
pipe operator %>%
• Belongs to the magrittr package; dplyr makes it available when loaded.
• Structures a sequence of operations as a single expression read from left to right.
• Helps avoid nesting of functions.
• Operator: %>%
Examples:
▪ We want to extract the top 6 records for Dealers who have sold the Item Pen only:
sales%>%filter(Item=="Pen")%>%select(Dealer,Item,Qty)%>%arrange(Dealer)%>%head()
▪ We want the maximum quantity of every item for the month of May with just Dealer, Item and Quantity variables:
sales%>%filter(Month=="May")%>%select(Dealer,Item,Qty)%>%group_by(Item,Dealer)%>%summarise(max(Qty))
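To make the equivalence concrete, the sketch below writes the same May summary twice on invented data, once nested and once piped, and checks that both give the same table. The pipe passes the left-hand result as the first argument of the next call. For brevity it groups by Item alone, and the column name max_qty is an illustrative choice.

```r
library(dplyr)

# Invented stand-in; only the column names follow the slides.
sales <- data.frame(
  Month  = c("April", "May", "May", "June", "May", "June"),
  Dealer = c("Anand", "Bala", "Anand", "Chitra", "Chitra", "Bala"),
  Item   = c("Pen", "Pencil", "Pen", "Eraser", "Pencil", "Pen"),
  Qty    = c(40, 60, 80, 20, 90, 50)
)

# Nested form: read from the innermost call outwards.
nested <- summarise(
  group_by(select(filter(sales, Month == "May"), Dealer, Item, Qty), Item),
  max_qty = max(Qty)
)

# Piped form: read from top to bottom.
piped <- sales %>%
  filter(Month == "May") %>%
  select(Dealer, Item, Qty) %>%
  group_by(Item) %>%
  summarise(max_qty = max(Qty))

print(piped)
print(isTRUE(all.equal(nested, piped)))
```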
slice()
• Extracts records (rows) based on their position.
• Syntax: slice(dataset,row numbers)
Examples:
▪ Select first ten rows: slice(sales,1:10)
▪ Select rows fifteen to twenty: slice(sales,15:20)
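A brief sketch of positional selection on an invented six-row data frame. Negative positions drop rows, and n() inside slice() refers to the last row. The final line assumes dplyr 1.0.0 or later, which provides slice_max() for picking rows by value instead of position.

```r
library(dplyr)

# Invented stand-in; only the column names follow the slides.
sales <- data.frame(
  Dealer = c("Anand", "Bala", "Anand", "Chitra", "Chitra", "Bala"),
  Qty    = c(40, 60, 80, 20, 90, 50)
)

first_three   <- slice(sales, 1:3)            # rows 1 to 3
middle        <- slice(sales, 2:4)            # rows 2 to 4
without_first <- slice(sales, -1)             # everything except row 1
last_row      <- slice(sales, n())            # n() is the number of rows
top_two       <- slice_max(sales, Qty, n = 2) # the two rows with the largest Qty

print(middle)
print(top_two)
```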
count()
• Counts the number of times each value appears in a variable.
• Syntax: count(dataset,[column1],[column2],…)
Examples:
▪ Count the number of times each Dealer has appeared: count(sales,Dealer)
▪ Count the number of times Pen has appeared (the TRUE row of the result): count(sales,Item=="Pen")
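A minimal sketch of count() on invented data. The result always has a column n holding the counts; sort = TRUE puts the most frequent value first. Counting on a condition gives one row for FALSE and one for TRUE, and naming the condition (is_pen is an illustrative name) makes the output column easier to use.

```r
library(dplyr)

# Invented stand-in; only the column names follow the slides.
sales <- data.frame(
  Dealer = c("Anand", "Bala", "Anand", "Chitra", "Chitra", "Bala"),
  Item   = c("Pen", "Pencil", "Pen", "Eraser", "Pencil", "Pen")
)

by_item   <- count(sales, Item, sort = TRUE)         # most frequent item first
by_pair   <- count(sales, Dealer, Item)              # one row per Dealer/Item combination
pen_flag  <- count(sales, is_pen = Item == "Pen")    # rows for FALSE and TRUE
pen_total <- sum(sales$Item == "Pen")                # a single number, if that is all you need

print(by_item)
print(pen_flag)
```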
Thanks!
Any questions?
You can find me at
▪ cc@wkvedu.com
