Introduction to Programming in R
Przemyslaw (Pshemek) Pawluk, MSc. Eng.
Introduction
• Basic programming concepts in R
• Tools - RStudio
Data Science / Data Analysis
Why do we use programming tools if we can do so much in Excel?
About R
• R is an interpreted language, not a compiled one: commands typed at the console are executed directly, without first building a complete program as in compiled languages (C, Java, C#, ...).
• R’s syntax is simple and intuitive.
About R
• When R is running, variables, data, functions, results, etc. are stored in the computer’s active memory as objects, each of which has a name.
• The user acts on these objects with operators (arithmetic, logical, comparison, ...) and functions (which are themselves objects).
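A minimal sketch of the points above, using only base R. The object names (x, total, f) and the values are made up for illustration.

```r
x <- c(2, 4, 6)      # a data object stored in memory under the name x
total <- sum(x)      # a function applied to the object
above_three <- x > 3 # a comparison operator applied element by element
f <- mean            # functions are objects too, so they can be assigned
f(x)                 # calling the function through its new name
class(f)             # reports that f is a function
ls()                 # lists the names of the objects currently in memory
```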
Data frame
• A data frame is the most common way of storing data
in R and, generally, is the data structure most often used
for data analyses.
• Under the hood, a data frame is a list of equal-length
vectors.
• Each element of the list can be thought of as a column
and the length of each element of the list is the number of
rows.
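The list-of-vectors view can be checked directly in base R. This is a small sketch with a made-up data frame; the name students and its values are illustrative only.

```r
# A small data frame built from two equal-length vectors (made-up values)
students <- data.frame(
  name  = c("Ann", "Bob", "Cy"),
  score = c(90, 85, 77)
)

is.list(students)          # a data frame is a list ...
length(students)           # ... with one element per column
sapply(students, length)   # every column has the same length
nrow(students)             # and that length is the number of rows
students$score             # a single column is an ordinary vector
```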
Basic commands
• help(): shows the documentation for a given R command
• example(): runs the examples from a command’s documentation
• c() or scan(): enter data manually into a vector
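A short sketch of these commands in use. The vector ages and its values are made up; the help and example calls are wrapped in interactive() here only so the snippet also runs cleanly as a script.

```r
if (interactive()) {
  help(mean)      # open the documentation for mean(); ?mean is a shortcut
  example(mean)   # run the examples from that help page
}

ages <- c(23, 35, 41, 29)   # combine typed-in values into a vector
length(ages)                # number of elements
mean(ages)                  # average of the values
```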
Running Calculations
• You can evaluate expressions directly in the R console
• Results are printed in the console
• This approach is good for testing
1 / 200 * 30
#> [1] 0.15
(59 + 73 + 2) / 3
#> [1] 44.66667
sin(pi / 2)
#> [1] 1
Objects
• In the console, you can also store a result in an object (some call it a variable; it is a named space in memory)
• This lets you reuse the value
• Assignment does not print the value
• The expression can be a plain value or a calculation
name <- expression
x <- 3 * 4
(x <- 3 * 4)
#> [1] 12
Objects’ Names
• An object name must start with a letter
• It can contain only letters, numbers, _ and .
• You want your object names to be descriptive, so you’ll need a convention for multiple words. We recommend snake_case, where you separate lowercase words with _.
• An alternative is camelCase, where each word after the first starts with a capital letter
use_snake_case
orUseCamelCase
some.people.use.periods
And_aFew.People_RENOUNCEconvention
R is case sensitive
• R is a case-sensitive language
• Be careful with naming: R does not guess what you meant, so every character must match exactly
• To see the value assigned to a variable, surround the whole assignment with ()
r_rocks <- 2 ^ 3
r_rock
#> Error: object 'r_rock' not found
R_rocks
#> Error: object 'R_rocks' not found
Functions
• R has a large collection of built-in functions
• You call a function by writing its name followed by parentheses ( and )
• Inside the parentheses you provide the arguments for the function (if it requires any)
function_name(arg1 = val1, ...)
seq(1, 10)
#> [1] 1 2 3 4 5 6 7 8 9 10
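As a small illustration of the arg1 = val1 pattern, arguments can be passed by name or by position. The object names below are made up; seq() and round() are base R functions.

```r
odds_named      <- seq(from = 1, to = 10, by = 2)  # arguments passed by name
odds_positional <- seq(1, 10, 2)                   # same call, matched by position
odds_named
odds_positional

rounded <- round(3.14159, digits = 2)              # mixing a positional and a named argument
rounded
```

Naming arguments makes a call easier to read, especially when a function has many optional arguments.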
[Figure: annotated screenshots showing Demos, Console, Help etc., Results & Graphs]
Working with data
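• data(): load a built-in dataset
• View(): view a loaded dataset in a spreadsheet-style viewer
• read.csv(): read a CSV file; requires a path or URL to the file
• read.table(): read a general delimited text file (read.csv() is a wrapper around it)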
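A minimal sketch of loading data, using only base R. The CSV file is written to a temporary location first so the example is self-contained; the table contents and the names csv_path and cities are made up.

```r
data(mtcars)        # load a built-in data set
head(mtcars, 3)     # show its first three rows

# Write a small made-up table to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, city = c("Oslo", "Lima", "Pune")),
          csv_path, row.names = FALSE)

cities <- read.csv(csv_path)   # returns a data frame
str(cities)                    # compact overview of columns and types
```

In an interactive RStudio session, View(cities) would open the same data frame in the viewer pane.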
Data Types
• Basic data types (abbreviations shown when a tibble is printed)
• int stands for integers.
• dbl stands for doubles, or real numbers.
• chr stands for character vectors, or strings.
• dttm stands for date-times (a date + a time).
• lgl stands for logical, vectors that contain only TRUE or FALSE.
• fctr stands for factors, which R uses to represent categorical variables
with fixed possible values.
• date stands for dates.
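The types listed above can be inspected in base R with class() and typeof(). This is a sketch with made-up values; the object names are illustrative.

```r
int_class  <- class(1L)         # integer
dbl_class  <- class(2.5)        # "numeric", which R stores as a double
dbl_type   <- typeof(2.5)       # the underlying storage type
chr_class  <- class("text")     # character
lgl_class  <- class(TRUE)       # logical
fctr_class <- class(factor(c("low", "high", "low")))  # factor
date_class <- class(Sys.Date()) # date
dttm_class <- class(Sys.time()) # date-time

print(c(int_class, dbl_class, dbl_type, chr_class, lgl_class, fctr_class, date_class))
print(dttm_class)
```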
Data Transformation in R
• Pick observations by their values (filter()).
• Reorder the rows (arrange()).
• Pick variables by their names (select()).
• Create new variables with functions of existing variables
(mutate()).
• Collapse many values down to a single summary
(summarise()).
Filter
• filter() allows you to subset observations based on their
values. The first argument is the name of the data frame.
• The second and subsequent arguments are the
expressions that filter the data frame.
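library(dplyr)          # provides filter() and the other verbs
library(nycflights13)   # assumed source of the flights data set
jan1 <- filter(flights, month == 1, day == 1)          # flights on January 1
filter(flights, month == 11 | month == 12)             # November or December
filter(flights, !(arr_delay > 120 | dep_delay > 120))  # the next two calls return the same rows
filter(flights, arr_delay <= 120, dep_delay <= 120)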
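The last two calls above express the same condition in two ways. A small sketch, assuming the dplyr package is installed, checks this on the built-in mtcars data set; the thresholds and the names fast_a and fast_b are made up.

```r
library(dplyr)

# "not (A or B)" written as a single negated expression ...
fast_a <- filter(mtcars, !(mpg > 20 | hp > 150))

# ... and as two comma-separated conditions, which filter() combines with "and"
fast_b <- filter(mtcars, mpg <= 20, hp <= 150)

all.equal(fast_a, fast_b)   # both calls keep the same rows
```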
Arrange
• arrange() works similarly to filter() except that instead
of selecting rows, it changes their order. It takes a data
frame and a set of column names (or more complicated
expressions) to order by.
• If you provide more than one column name, each
additional column will be used to break ties in the values
of preceding columns
arrange(flights, year, month, day)
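A brief sketch of tie-breaking, assuming dplyr is installed and using the built-in mtcars data set instead of flights. desc() is the dplyr helper for descending order; the name sorted is made up.

```r
library(dplyr)

# Sort by number of cylinders; within each cylinder count,
# break ties by miles per gallon from highest to lowest
sorted <- arrange(mtcars, cyl, desc(mpg))
head(sorted[, c("cyl", "mpg")])
```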
Select
• select() allows you to rapidly zoom in on a useful
subset using operations based on the names of the
variables.
select(flights, year, month, day)
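Beyond listing names one by one, select() accepts ranges, helper functions and exclusions. A sketch on the built-in mtcars data set, assuming dplyr is installed; the result names are made up.

```r
library(dplyr)

first_four <- select(mtcars, mpg:hp)            # all columns from mpg through hp
d_columns  <- select(mtcars, starts_with("d"))  # columns whose names start with "d"
no_carb    <- select(mtcars, -carb)             # everything except carb

names(first_four)
names(d_columns)
names(no_carb)
```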
Mutate
• mutate() adds new columns at the end of your dataset, so it helps to work with a narrower dataset in order to see the new variables.
• A new column can refer to columns created earlier in the same call (a column can only use columns defined before it, so avoid circular definitions)
mutate(flights,
  gain = dep_delay - arr_delay,
  speed = distance / air_time * 60
)
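A sketch of both bullets on the built-in mtcars data set, assuming dplyr is installed: the data set is narrowed first, and the second new column is defined in terms of the first. The column names km_per_l and l_per_100km are made up, and 0.425144 is the approximate conversion factor from US miles per gallon to kilometres per litre.

```r
library(dplyr)

# Narrow the data set so the new columns are easy to see
cars_small <- select(mtcars, mpg, wt)

cars_metric <- mutate(cars_small,
  km_per_l    = mpg * 0.425144,   # new column
  l_per_100km = 100 / km_per_l    # uses the column defined on the line above
)
head(cars_metric)
```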
Summarize
• summarise() is not terribly useful unless we pair it
with group_by().
• This changes the unit of analysis from the complete
dataset to individual groups. Then, when you use the
dplyr verbs on a grouped data frame they’ll be
automatically applied “by group”.
by_day <- group_by(flights, year, month, day)
summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))
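The same group-then-summarise pattern can be tried without the flights data. A sketch on the built-in mtcars data set, assuming dplyr is installed; the names by_cyl, cyl_summary, n_cars and mean_mpg are made up.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)   # one group per cylinder count

cyl_summary <- summarise(by_cyl,
  n_cars   = n(),                      # number of rows in each group
  mean_mpg = mean(mpg, na.rm = TRUE)   # average fuel economy per group
)
cyl_summary
```

The result has one row per group rather than one row for the whole data set.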
Hands-on
Task 1 – Code analysis
• Why does this code not work?
my_variable <- 10
my_varıable
#> Error in eval(expr, envir, enclos): object 'my_varıable' not found
Task 2 – Code analysis
• What is wrong with these two function calls?
fliter(mpg, cyl = 8)
filter(diamond, carat > 3)
Task 3 – Filter data
• Find all diamonds in the diamonds data set that are
smaller than 3 carats and cost more than 15,000
filter(diamonds,…)
References
• https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
• https://www.rdocumentation.org/
