4. Why do we use programming tools if we can do so much in Excel?
5. About R
• R is an interpreted language, not a compiled one: commands typed at the keyboard are executed directly, without first building a complete program as in compiled languages such as C, Java, or C#.
• R’s syntax is very simple and intuitive.
6. About R
• When R is running, variables, data, functions, results, etc., are stored in the active memory of the computer in the form of named objects.
• The user can act on these objects with operators (arithmetic, logical, comparison, …) and functions (which are themselves objects).
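A minimal sketch of these ideas in base R. The object names heights and average are hypothetical, chosen only for illustration.

```r
# Store a numeric vector as a named object
heights <- c(1.62, 1.75, 1.80)

# Arithmetic operator: applied element by element
heights * 100

# Comparison operator: returns a logical vector
heights > 1.7

# Logical operator: combine two comparisons
heights > 1.7 & heights < 1.8

# Functions are objects too, so one can be bound to a new name
average <- mean
average(heights)
is.function(average)   # TRUE

# List the names of the objects currently in the workspace
ls()
```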
7. Data frame
• A data frame is the most common way of storing data
in R and, generally, is the data structure most often used
for data analyses.
• Under the hood, a data frame is a list of equal-length
vectors.
• Each element of the list can be thought of as a column
and the length of each element of the list is the number of
rows.
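The list-of-columns structure can be checked directly in base R. A minimal sketch; the data frame and its columns are made up for illustration.

```r
# A small data frame: two columns (vectors) of equal length
df <- data.frame(
  name  = c("Ann", "Bob", "Cleo"),
  score = c(90, 75, 82)
)

is.list(df)       # TRUE: under the hood, a data frame is a list
length(df)        # 2: one list element per column
nrow(df)          # 3: the length of each column
length(df$score)  # 3: same as the number of rows
```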
8. Basic commands
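• help(): displays the documentation for a given R function (shortcut: ?name)
• example(): runs the examples from a function's help page
• c() or scan(): enter data manually into a vector (c() combines the values you list; scan() reads values you type at the console)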
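A short sketch of entering data by hand. The vector names are hypothetical; scan() is given its text argument here only so the example also runs outside an interactive session.

```r
# c() combines the listed values into a vector
ages <- c(23, 35, 41)

# Interactively, scan() with no arguments waits for typed values and
# stops at an empty line; text = supplies the same input as a string
scores <- scan(text = "7 8.5 9", quiet = TRUE)

# In an interactive session:
# help(mean)     # or ?mean, opens the documentation
# example(mean)  # runs the examples from the help page
```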
9. Running Calculations
• You can run (calculate) expressions directly in the R console
• Results are printed in the console
• This approach is good for quick tests
1 / 200 * 30
#> [1] 0.15
(59 + 73 + 2) / 3
#> [1] 44.66667
sin(pi / 2)
#> [1] 1
10. Objects
• In the console, you can also store results in an object (some call it a variable; it is a named location in memory)
• This allows you to reuse the value
• Assignment does not print the value
• The expression can be a single value or a calculation
name <- expression
x <- 3 * 4
(x <- 3 * 4)
#> [1] 12
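Reusing a stored value might look like this minimal sketch; y is a hypothetical second object.

```r
x <- 3 * 4     # assignment: nothing is printed
y <- x / 2     # reuse the stored value in a new calculation
x <- x + 1     # overwrite x with a new value
x              # typing a name prints its value: [1] 13
y              # y keeps the value computed earlier: [1] 6
```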
11. Objects’ Names
• An object name must start with a letter (or with a period that is not followed by a number)
• It can contain only letters, numbers, _ and .
• You want your object names to be descriptive, so you'll need a convention for multiple words. We recommend snake_case, where you separate lowercase words with _.
• An alternative is camelCase, where each word after the first starts with a capital letter
use_snake_case
orUseCamelCase
some.people.use.periods
And_aFew.People_RENOUNCEconvention
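Base R's make.names() converts arbitrary strings into syntactically valid names, which makes the rules above visible. A small sketch; the example strings are arbitrary.

```r
# Invalid characters become "." and an "X" is prepended when a string
# does not start with a valid first character
make.names(c("my var", "2nd_try", "ok_name"))
# returns "my.var" "X2nd_try" "ok_name"

# A name that breaks the rules can still be used if wrapped in backticks,
# but this is best avoided
`my var` <- 1
`my var` + 1
```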
13. R is case sensitive
• R is a case-sensitive language
• Be careful with names: R does not guess what you meant, so r_rock and R_rocks both fail even though r_rocks exists
• To see the value assigned to a variable, surround the whole assignment line with ()
r_rocks <- 2 ^ 3
r_rock
#> Error: object 'r_rock' not found
R_rocks
#> Error: object 'R_rocks' not found
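A quick way to check which spelling exists without triggering an error is base R's exists(). This sketch also shows the parentheses trick from the slide.

```r
r_rocks <- 2 ^ 3

# Names must match exactly, including capitalization
exists("r_rocks")   # TRUE
exists("R_rocks")   # FALSE: different capitalization, different object

# Wrapping an assignment in () assigns and prints in one step
(r_rocks <- 2 ^ 3)
#> [1] 8
```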
14. Functions
• R has a large collection of built-in functions
• You call a function by writing its name followed by parentheses ( and )
• Inside the parentheses you provide the arguments for the function (if it requires any)
function_name(arg1 = val1, ...)
seq(1, 10)
#> [1] 1 2 3 4 5 6 7 8 9 10
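Arguments can be passed by position or by name. A short sketch with two base R functions, seq() and round():

```r
# Positional arguments: from = 1, to = 10, plus a named step size
seq(1, 10, by = 2)
#> [1] 1 3 5 7 9

# Named arguments can be given in any order
seq(length.out = 5, from = 0, to = 1)
#> [1] 0.00 0.25 0.50 0.75 1.00

# Optional arguments have defaults
round(3.14159)               # digits defaults to 0, gives 3
round(3.14159, digits = 2)   # gives 3.14
```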
19. Working with data
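• data(): loads a built-in dataset
• View(): opens a dataset in a spreadsheet-style viewer
• read.csv(): reads a CSV file into a data frame; requires a path or URL to the file
• read.table(): reads a general delimited text file into a data frame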
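A self-contained sketch of loading data. To avoid depending on an external file, it first writes a small CSV to a temporary path; csv_path and the column names are hypothetical.

```r
# Load a dataset that ships with R and look at the first rows
data(mtcars)
head(mtcars, 3)

# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
          csv_path, row.names = FALSE)

# read.csv() returns a data frame
df <- read.csv(csv_path)
str(df)

# read.table() reads the same file when told the separator and that
# the first line holds the column names
df2 <- read.table(csv_path, header = TRUE, sep = ",")

# View(df) would open the data in a viewer (interactive sessions only)
```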
20. Data Types
• Basic data types, as abbreviated in tibble printouts:
• int stands for integers.
• dbl stands for doubles, or real numbers.
• chr stands for character vectors, or strings.
• dttm stands for date-times (a date + a time).
• lgl stands for logical, vectors that contain only TRUE or FALSE.
• fctr stands for factors, which R uses to represent categorical variables
with fixed possible values.
• date stands for dates.
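The abbreviations correspond to base R types and classes, which typeof() and class() report. A sketch; the example values are arbitrary.

```r
typeof(1L)                               # "integer"   (int)
typeof(2.5)                              # "double"    (dbl)
typeof("hello")                          # "character" (chr)
typeof(TRUE)                             # "logical"   (lgl)
class(factor(c("low", "high", "low")))   # "factor"    (fctr)
class(as.Date("2024-01-15"))             # "Date"      (date)
class(as.POSIXct("2024-01-15 10:30:00", tz = "UTC"))  # "POSIXct" "POSIXt" (dttm)
```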
21. Data Transformation in R
• Pick observations by their values (filter()).
• Reorder the rows (arrange()).
• Pick variables by their names (select()).
• Create new variables with functions of existing variables
(mutate()).
• Collapse many values down to a single summary
(summarise()).
22. Filter
• filter() allows you to subset observations based on their
values. The first argument is the name of the data frame.
• The second and subsequent arguments are the
expressions that filter the data frame.
jan1 <- filter(flights, month == 1, day == 1)
filter(flights, month == 11 | month == 12)
filter(flights, !(arr_delay > 120 | dep_delay > 120))
filter(flights, arr_delay <= 120, dep_delay <= 120)
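Assuming the dplyr package is installed, the same kinds of calls can be tried on a small hand-made data frame. toy_flights is a hypothetical stand-in for flights with only a few columns. The sketch also shows %in% as a shorter form of repeated == joined by |, and that rows where a condition evaluates to NA are dropped.

```r
library(dplyr)

# Hypothetical toy data standing in for flights
toy_flights <- data.frame(
  month     = c(1, 1, 11, 12, 6),
  day       = c(1, 2, 5, 24, 1),
  dep_delay = c(5, 130, NA, -3, 20)
)

# Conditions separated by commas are combined with AND
filter(toy_flights, month == 1, day == 1)

# %in% is a compact alternative to month == 11 | month == 12
filter(toy_flights, month %in% c(11, 12))

# Rows where the condition is NA (unknown) are dropped, not kept
filter(toy_flights, dep_delay <= 120)
```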
23. Arrange
• arrange() works similarly to filter() except that instead
of selecting rows, it changes their order. It takes a data
frame and a set of column names (or more complicated
expressions) to order by.
• If you provide more than one column name, each
additional column will be used to break ties in the values
of preceding columns
arrange(flights, year, month, day)
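A sketch of tie-breaking and descending order, again assuming dplyr is installed and using a hypothetical toy_flights data frame in place of flights:

```r
library(dplyr)

toy_flights <- data.frame(
  year      = 2013,
  month     = c(2, 1, 1),
  day       = c(1, 15, 3),
  dep_delay = c(10, NA, 45)
)

# Sort by year, then month, then day (later columns break ties)
arrange(toy_flights, year, month, day)

# desc() sorts in descending order; missing values are placed at the end
arrange(toy_flights, desc(dep_delay))
```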
24. Select
• select() allows you to rapidly zoom in on a useful
subset using operations based on the names of the
variables.
select(flights, year, month, day)
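Besides listing names one by one, select() accepts column ranges, exclusions, and name-matching helpers. A sketch assuming dplyr is installed, on a hypothetical one-row toy_flights:

```r
library(dplyr)

toy_flights <- data.frame(
  year = 2013, month = 1, day = 1,
  dep_delay = 2, arr_delay = 11, carrier = "UA"
)

# All columns from year to day (inclusive)
select(toy_flights, year:day)

# All columns except those from year to day
select(toy_flights, -(year:day))

# Helper functions match columns by name pattern
select(toy_flights, ends_with("delay"))
```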
25. Mutate
• mutate() always adds new columns at the end of your dataset, so it helps to start from a narrower dataset (fewer columns) to see the new variables.
• You can refer to columns created earlier in the same mutate() call when defining later ones (columns are created in order, so a column cannot use one defined after it)
mutate(flights,
gain = dep_delay - arr_delay,
speed = distance / air_time * 60
)
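A sketch of both points above, assuming dplyr is installed: first narrow the data with select(), then define a column from ones created earlier in the same call. toy_flights and flights_sml are hypothetical stand-ins.

```r
library(dplyr)

toy_flights <- data.frame(
  dep_delay = c(10, 5),
  arr_delay = c(2, 20),
  distance  = c(1000, 500),
  air_time  = c(120, 60),
  carrier   = c("UA", "AA")
)

# A narrower dataset makes the new columns easy to see
flights_sml <- select(toy_flights, ends_with("delay"), distance, air_time)

mutate(flights_sml,
  gain          = dep_delay - arr_delay,
  hours         = air_time / 60,
  gain_per_hour = gain / hours   # uses the two columns created above
)
```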
26. Summarize
• summarise() collapses a data frame to a single row, so it is not very useful unless we pair it with group_by().
• group_by() changes the unit of analysis from the complete dataset to individual groups. Then, when you use the dplyr verbs on a grouped data frame, they are automatically applied “by group”.
by_day <- group_by(flights, year, month, day)
summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))
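The difference grouping makes can be seen on a small hypothetical toy_flights data frame (assuming dplyr is installed); n() counts the rows in each group.

```r
library(dplyr)

toy_flights <- data.frame(
  month     = c(1, 1, 1, 2, 2),
  dep_delay = c(10, NA, 20, 4, 6)
)

# Without grouping: the whole data frame collapses to one row
summarise(toy_flights, delay = mean(dep_delay, na.rm = TRUE))

# With grouping: one row per month
by_month <- group_by(toy_flights, month)
summarise(by_month,
  delay = mean(dep_delay, na.rm = TRUE),
  n     = n()
)
```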
28. Task 1 – Code analysis
• Why does this code not work?
my_variable <- 10
my_varıable
#> Error in eval(expr, envir, enclos): object 'my_varıable' not found
29. Task 2 – Code analysis
• What is wrong with these two function calls?
fliter(mpg, cyl = 8)
filter(diamond, carat > 3)
30. Task 3 – Filter data
• Find all diamonds in the diamonds data set that weigh less than 3 carats and cost more than $15,000
filter(diamonds,…)