Manipulating Data
Using base R package
Rupak Roy

Sub-setting: rows
Sub-setting (pulling out selected rows and columns) is a core part of data management. In base R we can index a data frame in the format [rows, columns] by position, by name, or with a logical expression on its columns; the subset() function does the same row filtering, letting us write the logical expression with bare column names.
#load the built-in R sample data set mtcars
>data(mtcars)
#sub-setting the first 4 rows and the first 3 columns in the format [rows, columns]
>mtcars[1:4, 1:3]
#sub-setting a specific list of rows and columns in the format [rows, columns]
>mtcars[c(2,4,6), c(1,3,6)]
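A minimal sketch of the positional indexing above, plus the subset() function mentioned in the text. It uses only base R and the built-in mtcars data; the object names (first_block, picked, light_cars) and the wt < 2.5 condition are illustrative choices, not from the slides.

```r
data(mtcars)

# [rows, columns] by position: first 4 rows, first 3 columns
first_block <- mtcars[1:4, 1:3]
dim(first_block)   # 4 rows, 3 columns

# specific rows and columns by position (columns 1, 3, 6 are mpg, disp, wt)
picked <- mtcars[c(2, 4, 6), c(1, 3, 6)]
names(picked)

# subset() evaluates the condition inside the data frame,
# so column names can be used without the mtcars$ prefix
light_cars <- subset(mtcars, wt < 2.5, select = c(mpg, wt))
light_cars
```

The select argument of subset() plays the role of the column part of [rows, columns].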

Logical sub-setting
#using the equality operator ( == )
>mtcars_93<-mtcars[mtcars$hp == 93, ] #with the comma, i.e. [row, column] format: keeps the rows where hp equals 93
>mtcars_equal93<-mtcars[mtcars$hp == 93] #without the comma, R treats the logical vector as a column selector, so this returns columns rather than the rows where hp equals 93
#using the OR operator ( | )
>mtcars_or<-mtcars[mtcars$hp == 93 | mtcars$carb == 1, ]
>View(mtcars_or)
#using the AND operator ( & )
>mtcars_and<-mtcars[mtcars$mpg >= 20 & mtcars$hp <= 110, ]
>View(mtcars_and)
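The comma matters because the position of the logical vector decides whether it filters rows or picks columns. This sketch, using only base R and mtcars, makes the difference visible and checks the OR/AND examples; the subset() comparison at the end is an added illustration, not part of the slides.

```r
data(mtcars)

# With the comma: the logical vector sits in the row position, so matching rows are kept
rows_93 <- mtcars[mtcars$hp == 93, ]
rownames(rows_93)            # "Datsun 710" is the only car with hp == 93

# Without the comma: the same logical vector is used to pick columns instead.
# Its only TRUE is at position 3, so column 3 (disp) comes back for all 32 cars.
cols_not_rows <- mtcars[mtcars$hp == 93]
dim(cols_not_rows)           # 32 rows, 1 column

# OR keeps a row when either condition holds; AND only when both hold
either <- mtcars[mtcars$hp == 93 | mtcars$carb == 1, ]
both   <- mtcars[mtcars$mpg >= 20 & mtcars$hp <= 110, ]

# subset() selects the same rows as the AND example
# (it also treats an NA condition as FALSE; mtcars has no NAs)
both_subset <- subset(mtcars, mpg >= 20 & hp <= 110)
```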

Sub-setting: columns
#selecting columns by name
>mtcars_col<-mtcars[ , c("mpg","cyl","hp")]
>View(mtcars_col)
#logical row sub-setting + sub-setting of columns
>mtcars_rc<-mtcars[mtcars$mpg >= 20 & mtcars$hp <= 110, c("wt","disp")]
>View(mtcars_rc)
#adding a new column, where "add" is the new column name
>mtcars$add<-ifelse(mtcars$mpg <= 15, "luxury", ifelse(mtcars$mpg <= 20, "sports", "economy"))
>View(mtcars)
#removing a column by assigning NULL to it
>mtcars$add<-NULL
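As a cross-check of the nested ifelse() above, this sketch builds the same three categories with base R's cut(). cut() is swapped in here as an alternative technique, not the slide's method; with its default right-closed intervals (-Inf, 15], (15, 20] and (20, Inf] it reproduces the "<= 15" and "<= 20" tests exactly.

```r
data(mtcars)

# Nested ifelse(), as in the slide: <= 15 mpg, <= 20 mpg, everything else
by_ifelse <- ifelse(mtcars$mpg <= 15, "luxury",
                    ifelse(mtcars$mpg <= 20, "sports", "economy"))

# cut() with right-closed intervals (-Inf,15], (15,20], (20,Inf] gives the same labels
by_cut <- cut(mtcars$mpg,
              breaks = c(-Inf, 15, 20, Inf),
              labels = c("luxury", "sports", "economy"))

mtcars$add <- by_cut          # add the new column (stored as a factor)
table(mtcars$add)             # counts per category
mtcars$add <- NULL            # remove it again
```

One design difference: cut() returns a factor whose levels keep the category order, while ifelse() returns a plain character vector.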

Manipulating Data: which()
#load the data (file.choose() opens a dialog to pick the CSV file; "NA" and empty cells are read as missing values)
>sampledata<-read.csv(file.choose(), na.strings = c("NA",""), header = TRUE)
#identify the row positions using the which() function
>index_position<-which(sampledata$number.of.doors == "two")
which() returns only the indices, i.e. the positions where the logical condition is TRUE; positions where the condition is NA are left out.
>head(index_position)
#sub-setting the same data using the indices
>newdata<-sampledata[index_position, ]
>View(newdata)
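The CSV behind sampledata is not included, so this sketch mocks up a tiny hypothetical data frame (autos) with only the two columns the slides use, Sl.No and number.of.doors. It shows why which() is handy when the data contains missing values: indexing with the raw logical vector turns an NA comparison into an all-NA row, while which() simply skips it.

```r
# Hypothetical stand-in for sampledata: the CSV itself is not available here,
# so only the two columns the slides use are mocked up.
autos <- data.frame(
  Sl.No = 1:5,
  number.of.doors = c("two", "four", NA, "two", "four"),
  stringsAsFactors = FALSE
)

# which() returns the positions where the condition is TRUE; NA counts as FALSE
index_position <- which(autos$number.of.doors == "two")
index_position                      # 1 4

by_which <- autos[index_position, ]

# Indexing with the raw logical vector keeps the NA, which produces an all-NA row
by_logical <- autos[autos$number.of.doors == "two", ]
nrow(by_which)                      # 2
nrow(by_logical)                    # 3 (includes a row of NAs)
```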
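
Manipulating Data: order()
#get the row positions in sorted order using the order() function
>descending_order<-order(-sampledata$Sl.No)
>head(descending_order)
#sort the data set using the row positions stored in descending_order
>ordered<-sampledata[descending_order, ]
Or directly:
>ordered<-sampledata[order(-sampledata$Sl.No), ]
>View(ordered)
The minus sign "-" reverses the sort so that the largest Sl.No comes first (descending order). It only works on numeric columns; for other columns, use order(x, decreasing = TRUE).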
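Since sampledata is not available, this sketch applies the same order() ideas to mtcars: descending order via the minus sign or decreasing = TRUE, sorting by several keys, and the character-column case where the minus sign cannot be used. Object names are illustrative.

```r
data(mtcars)

# Descending by a numeric column: negate it, or use decreasing = TRUE
desc_minus <- mtcars[order(-mtcars$mpg), ]
desc_flag  <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]

# Several keys: cylinders ascending, then mpg descending within each cyl group
by_cyl_mpg <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]

# The minus sign fails on character vectors ("invalid argument to unary operator"),
# so use decreasing = TRUE there instead
models <- rownames(mtcars)
desc_names <- mtcars[order(models, decreasing = TRUE), ]

head(desc_minus[, c("mpg", "cyl")], 3)
```

Tied values may come out in a different relative order between the two descending variants, so compare them by the sorted column rather than by row names.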

Manipulating Data: aggregate()
#average/mean mileage (mpg) by number of cylinders (cyl)
>aggregate(mtcars$mpg, by = list(mtcars$cyl), FUN = mean)
Or
>tapply(mtcars$mpg, mtcars$cyl, mean)
Or, using the formula interface
>aggregate(mtcars$mpg ~ mtcars$cyl, FUN = mean)
>aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
#average/mean mileage by cyl and carb
>aggregate(mpg ~ cyl + carb, data = mtcars, FUN = mean)
#average/mean of every column by cylinder (the grouping column is named Group.1)
>average_mtcars<-aggregate(mtcars, by = list(mtcars$cyl), FUN = mean)
>class(average_mtcars) #returns "data.frame"
>View(average_mtcars)
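This sketch checks that the aggregate() formula call and tapply() give the same group means, and shows two small variations that are standard base R but not on the slide: naming the by list so the group column is called cyl instead of Group.1, and cbind() on the left-hand side to summarise several columns at once.

```r
data(mtcars)

# Formula interface: one row per cylinder group
by_formula <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# tapply() returns the same means as a named vector
by_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Naming the list passed to 'by' gives a readable group column instead of Group.1
named_by <- aggregate(mtcars["mpg"], by = list(cyl = mtcars$cyl), FUN = mean)

# cbind() on the left-hand side summarises several columns at once
two_cols <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)

by_formula
```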

Cross tabulation
?table: table() uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels.
#number of cars in mtcars by transmission (am) and number of cylinders (cyl)
>table(mtcars$am, mtcars$cyl)
     4  6  8
  0  3  4 12
  1  8  3  2
The columns 4, 6 and 8 are the numbers of cylinders, the rows are the transmission type (0 = automatic, 1 = manual), and the counts add up to all 32 cars.
Alternatively, we can use xtabs().
?xtabs: creates a contingency table (optionally a sparse matrix) from cross-classifying factors, usually contained in a data frame, using a formula interface.
#general format: xtabs(numeric ~ factor1 + factor2); the numeric column is summed within each cell
#(expenditure and category below are placeholder column names, not mtcars columns)
>xtabs(expenditure ~ category)
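A runnable version of the xtabs() idea on mtcars, since expenditure and category above are placeholders. With no left-hand side, xtabs() counts rows and should match table(); with a numeric left-hand side (hp here, chosen only for illustration) it sums that column within each am/cyl cell.

```r
data(mtcars)

# Counts: with no left-hand side, xtabs() counts rows in each cell, like table()
counts_table <- table(mtcars$am, mtcars$cyl)
counts_xtabs <- xtabs(~ am + cyl, data = mtcars)

# Sums: with a numeric left-hand side, xtabs() adds up that column in each cell
# (here: total horsepower per transmission/cylinder combination)
hp_totals <- xtabs(hp ~ am + cyl, data = mtcars)

counts_xtabs
```

An advantage of the formula form is that the dimension names (am, cyl) are printed with the table, which table() on bare vectors does not do.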

Manipulating Data
Next: we will learn an easier way to manipulate data using a dedicated package called dplyr.
