dplyr Package in R

Introduction
Helps transform and manipulate data
Powerful tool to summarise data sets
Install: install.packages("dplyr")
Activate: library(dplyr)

File: Excel
Variables: 7
Observations: 153

▪ select
▪ filter
▪ arrange
▪ distinct
▪ mutate
▪ transmute
▪ group_by
▪ summarise
▪ pipe operator (%>%)
▪ slice
▪ count
Functions in dplyr

• Keeps only those variables (columns) that you
want to retain/extract.
• Syntax: select(dataset,[column1],[column2],…)
Examples:
 Select columns Month, Dealer, Item, Quantity: select(sales,Month,Dealer,Item,Qty)
 Select columns from Month to Quantity: select(sales,Month:Qty)
 Deselect column Month from the dataset: select(sales,-Month)
 Select columns ending with the letter “r”: select(sales,ends_with("r"))
 Select columns containing the letter “r”: select(sales,contains("r"))
 Select columns whose names match the regular expression "m." (an "m" followed by any character): select(sales,matches("m."))
 Select columns named in a character vector: select(sales,one_of(c("Month","Dealer"))) (newer dplyr versions prefer any_of() or all_of())
 Select columns starting with the letter “d”: select(sales,starts_with("d"))
select()
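
The select() helpers above can be tried on a small made-up data frame. This is only a sketch: the `sales` values below are hypothetical stand-ins for the real Excel file, and the result names (`by_range`, `no_month`, `ends_r`, `regex_m`) are illustrative.

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

by_range <- select(sales, Month:Item)        # a run of adjacent columns
no_month <- select(sales, -Month)            # everything except Month
ends_r   <- select(sales, ends_with("r"))    # names ending in "r"
regex_m  <- select(sales, matches("m."))     # "m" followed by any character

names(regex_m)
```

Note that matches("m.") picks up any name containing an "m" followed by another character, not only names that start with "m", and that the helpers ignore case by default.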

• Keeps only those records (rows) that you want to
retain/extract.
• Syntax: filter(dataset,criteria)
Examples:
 Item is Pen: filter(sales,Item=="Pen")
 Quantity is more than 50: filter(sales,Qty>50)
 Item is Pencil and Quantity is more than 50: filter(sales,Item=="Pencil"&Qty>50)
 Quantity is between 50 and 80: filter(sales,Qty>50&Qty<80)
 Item is Pencil or Quantity is more than 50: filter(sales,Item=="Pencil"|Qty>50)
filter()
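
A minimal sketch of the filter() conditions above, run on a hypothetical `sales` data frame (the values are made up for illustration, and the result names are assumptions):

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

pens        <- filter(sales, Item == "Pen")                # one condition
big_pencils <- filter(sales, Item == "Pencil" & Qty > 50)  # both must hold
either      <- filter(sales, Item == "Pencil" | Qty > 50)  # at least one holds
mid_range   <- filter(sales, Qty > 50 & Qty < 80)          # strictly between 50 and 80

nrow(pens)
```

Because the comparisons are strict, a row with Qty exactly 50 is not returned by Qty>50.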

Examples:
 We want to extract the Sales Manager, Item and Quantity but only for Pencil:
i) k=select(sales,SalesManager,Item,Qty)
filter(k,Item=="Pencil")
ii) select(filter(sales,Item=="Pencil"),SalesManager,Item,Qty)
iii) filter(select(sales,SalesManager,Item,Qty),Item=="Pencil")
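 We want to extract Dealer, Item and Quantity for the Month of May:
i) filter(select(sales,Dealer,Item,Qty),sales$Month=="May") (works only because the rows of sales still line up with the selected rows)
ii) select(filter(sales,Month=="May"),Dealer,Item,Qty) (safer: filter on Month before select() drops it)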
select() and filter()
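
The order of select() and filter() matters when the filter column is not among the selected columns. The sketch below uses a hypothetical `sales` data frame with made-up values; `ok` and `failed` are illustrative names.

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

# Filter first, then select: Month is still available to filter()
ok <- select(filter(sales, Month == "May"), Dealer, Item, Qty)

# Select first, then filter on a dropped column: filter() cannot find Month
failed <- tryCatch(
  filter(select(sales, Dealer, Item, Qty), Month == "May"),
  error = function(e) "error"
)

ok
```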

• Orders or sorts the records (rows) based on the
variable(s).
• By default the arrangement is in ascending order.
• Syntax: arrange(dataset,column1,[column2],…)
Examples:
 Sort the dataset based on Months: arrange(sales,Month)
 Sort the dataset based on Months and Dealer: arrange(sales,Month,Dealer)
 Arrange the data in descending order of Quantity: arrange(sales,desc(Qty))
arrange()
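
A short sketch of arrange() on a hypothetical `sales` data frame (made-up values, illustrative result names). It assumes Month is stored as text, in which case the sort is alphabetical rather than by calendar order.

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

by_month        <- arrange(sales, Month)          # ascending by default
by_month_dealer <- arrange(sales, Month, Dealer)  # ties in Month broken by Dealer
by_qty_desc     <- arrange(sales, desc(Qty))      # largest Qty first

by_qty_desc
```

If calendar order is needed, one option is to convert Month to a factor with its levels in month order before sorting.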

• Helps extract unique values from a variable.
• Syntax: distinct(dataset,[column1],[column2],…)
Examples:
 Find the names of the Dealers: distinct(sales,Dealer)
 Find the items sold by each Dealer: arrange(distinct(sales,Dealer,Item),Dealer)
distinct()
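
A minimal sketch of distinct() on a hypothetical `sales` data frame (made-up values; `dealers` and `pairs` are illustrative names):

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

dealers <- distinct(sales, Dealer)                         # one row per Dealer
pairs   <- arrange(distinct(sales, Dealer, Item), Dealer)  # unique Dealer/Item pairs

dealers
```

Only the columns named in the call are returned; the other columns are dropped unless .keep_all = TRUE is set.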

• Adds a new variable (column) to the existing
dataset
• Syntax: mutate(dataset,newcolumn=expression)
Example:
 Add a new column Target that is twice the Quantity: mutate(sales,Target=Qty*2)
mutate()
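
A quick sketch of mutate() on a hypothetical `sales` data frame (made-up values; `with_target` is an illustrative name). It also shows that the original data is left unchanged unless the result is assigned.

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

# The new column is appended after the existing ones
with_target <- mutate(sales, Target = Qty * 2)

names(with_target)
names(sales)        # still has no Target column
```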

• Creates a new variable (column) but drops the
existing ones
• Syntax: transmute(dataset,newcolumn=expression)
Example:
 Create a new column tgt that is twice the Quantity and drop all other columns: transmute(sales,tgt=2*Qty)
transmute()
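
A short sketch of transmute() on a hypothetical `sales` data frame (made-up values; `only_tgt` is an illustrative name). As far as I know, recent dplyr releases mark transmute() as superseded and suggest mutate() with .keep = "none" instead, so treat this as one of two possible spellings.

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

# Only the newly created column survives
only_tgt <- transmute(sales, tgt = 2 * Qty)

names(only_tgt)
```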

• Helps create groups in a dataset based on one or more
variables.
• Useful when nested with other functions.
• Syntax: group_by(dataset,column1,[column2]…)
• Ungroup Syntax: ungroup(dataset)
Example:
 Create groups in the data based on Items: group_by(sales,Item)
 Get the maximum units sold for each item: filter(group_by(sales,Item),Qty==max(Qty))
group_by()
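
A minimal sketch of group_by() nested inside filter(), on a hypothetical `sales` data frame (made-up values; `top` and `flat` are illustrative names). Within each group, max(Qty) is computed per Item rather than over the whole dataset.

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

# Keep, for every Item, the row(s) with that Item's largest Qty
top <- filter(group_by(sales, Item), Qty == max(Qty))

# The result is still grouped; ungroup() removes the grouping
flat <- ungroup(top)

top
```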

• Helps generate a single number/statistic for the dataset
• Syntax: summarise(dataset,newvariable=function(column),…)
Examples:
 Total number of units sold across all Items:
summarise(sales,total=sum(Qty))
 Total number of units sold and total amount:
summarise(sales,t_Qty=sum(Qty),t_Amount=sum(Amount))
 Total number of records in the dataset:
summarise(sales,rowscount=n())
 Get the total number of records, quantity sold and amount for each item:
summarise(group_by(sales,Item),rcount=n(),unitqty=sum(Qty),totalamount=sum(Amount))
 The same statistics for each Dealer and their respective Items:
summarise(group_by(sales,Dealer,Item),rcount=n(),unitqty=sum(Qty),totalamount=sum(Amount))
summarise()
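
A sketch of summarise() with and without grouping, on a hypothetical `sales` data frame (made-up values; `overall` and `per_item` are illustrative names):

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

# Without groups: one row for the whole dataset
overall <- summarise(sales, total = sum(Qty), rowscount = n())

# With groups: one row per Item
per_item <- summarise(
  group_by(sales, Item),
  rcount      = n(),
  unitqty     = sum(Qty),
  totalamount = sum(Amount)
)

per_item
```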

 We want to extract the top 6 records for Dealers who have sold the Item Pen only:
filter(sales,Item=="Pen")
select(filter(sales,Item=="Pen"),Item,Dealer,Qty)
arrange(select(filter(sales,Item=="Pen"),Item,Dealer,Qty),Dealer)
head(arrange(select(filter(sales,Item=="Pen"),Item,Dealer,Qty),Dealer))
 We want the maximum quantity of every item for the month of May with just Dealer, Item and
Quantity variables:
select(sales,Dealer,Item,Qty)
filter(select(sales,Dealer,Item,Qty),sales$Month=="May")
group_by(filter(select(sales,Dealer,Item,Qty),sales$Month=="May"),Item,Dealer)
summarise(group_by(filter(select(sales,Dealer,Item,Qty),sales$Month=="May"),Item,Dealer),max(Qty))
Assignment

• Comes from the magrittr package (dplyr makes it available too).
• Helps structure a sequence of operations as a single
statement read from left to right.
• Helps avoid nesting of functions.
• Operator: %>%
Examples:
 We want to extract the top 6 records for Dealers who have sold the Item Pen only:
sales%>%filter(Item=="Pen")%>%select(Dealer,Item,Qty)%>%arrange(Dealer)%>%head
 We want the maximum quantity of every item for the month of May with just Dealer, Item and
Quantity variables:
sales%>%select(Dealer,Item,Qty)%>%filter(sales$Month=="May")%>%group_by(Item,Dealer)%>%summarise(max(Qty))
pipe operator %>%
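
To see that a piped chain is just another way of writing the nested call, the sketch below builds both on a hypothetical `sales` data frame (made-up values; `nested` and `piped` are illustrative names) and compares them.

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

# Nested form: read from the inside out
nested <- head(arrange(select(filter(sales, Item == "Pen"), Dealer, Item, Qty), Dealer))

# Piped form: read from left to right, one step per line
piped <- sales %>%
  filter(Item == "Pen") %>%
  select(Dealer, Item, Qty) %>%
  arrange(Dealer) %>%
  head()

identical(nested, piped)
```

Each %>% passes the result on its left as the first argument of the function on its right, which is why the dataset name appears only once.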

• Helps extract records (rows) based on their
position.
• Syntax: slice(dataset,row numbers)
Examples:
 Select first ten rows: slice(sales,1:10)
 Select rows fifteen to twenty: slice(sales,15:20)
slice()
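
A minimal sketch of slice() on a hypothetical four-row `sales` data frame (made-up values; the row ranges are shortened to fit the toy data, and the result names are illustrative):

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

first_two <- slice(sales, 1:2)   # rows 1 and 2
middle    <- slice(sales, 2:3)   # rows 2 and 3

middle
```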

• Helps count the number of times a value has
appeared in a variable.
• Syntax: count(dataset, [column1],[column2],…)
Examples:
 Count the number of times each Dealer has appeared: count(sales,Dealer)
 Count how often Item is Pen (gives one row each for TRUE and FALSE): count(sales,Item=="Pen")
count()
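
A short sketch of count() on a hypothetical `sales` data frame (made-up values; `by_dealer`, `pen_flags` and `n_pen` are illustrative names). Counting a condition returns one row per TRUE/FALSE value, so a single number needs one more step.

```r
library(dplyr)

# Hypothetical stand-in for the sales data (not the real file)
sales <- data.frame(
  Month  = c("Apr", "May", "May", "Jun"),
  Dealer = c("Asha", "Ravi", "Asha", "Ravi"),
  Item   = c("Pen", "Pencil", "Pen", "Pencil"),
  Qty    = c(40, 60, 75, 50),
  Amount = c(400, 300, 750, 250)
)

by_dealer <- count(sales, Dealer)          # one row per Dealer, count in column n
pen_flags <- count(sales, Item == "Pen")   # one row for FALSE, one for TRUE

# One way to get just the number of Pen records
n_pen <- nrow(filter(sales, Item == "Pen"))

by_dealer
```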

Thanks!
Any questions?
You can find me at
▪ cc@wkvedu.com
